>>>>>>Acceleration!

Following on from the addition of the Two Way Cache in my last post, I’ve made some more speed improvements to the Chameleon 64’s Minimig core.

Firstly, I’ve added a single-word write buffer, which means when the CPU writes to Fast RAM it doesn’t have to wait for the write to complete. Provided completing the write takes priority over any potential reads from the same address, and the Cache is updated to reflect the new data, then the CPU can continue merrily processing.
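To make the two conditions concrete, here is a minimal behavioural sketch in Python. It is not the VHDL from the core: the class name, the `work` method and the 8-cycle write cost are all made up for illustration. It only shows the idea that a write returns immediately, the cache is updated at the same time, and any read that misses the cache waits for the buffered write to land first.

```python
class WriteBufferedRam:
    """Toy model: one-word write buffer in front of slow RAM, plus a cache."""

    def __init__(self, write_cycles=8):
        self.ram = {}                     # stands in for the SDRAM
        self.cache = {}                   # stands in for the cache (no eviction)
        self.pending = None               # at most one buffered (addr, data)
        self.remaining = 0                # cycles left on the buffered write
        self.write_cycles = write_cycles  # hypothetical cost of a RAM write
        self.stalled = 0                  # cycles the CPU has spent waiting

    def _commit(self):
        addr, data = self.pending
        self.ram[addr] = data
        self.pending, self.remaining = None, 0

    def work(self, cycles):
        """CPU does non-memory work; a buffered write drains in the background."""
        if self.pending is not None:
            self.remaining -= cycles
            if self.remaining <= 0:
                self._commit()

    def _wait_for_buffer(self):
        """Stall the CPU until any buffered write has reached RAM."""
        if self.pending is not None:
            self.stalled += self.remaining
            self._commit()

    def write(self, addr, data):
        self._wait_for_buffer()           # only stalls if the buffer is still busy
        self.pending, self.remaining = (addr, data), self.write_cycles
        self.cache[addr] = data           # keep the cache coherent with the write

    def read(self, addr):
        if addr in self.cache:            # hit: the cache already has the new data
            return self.cache[addr]
        self._wait_for_buffer()           # miss: the buffered write goes first
        data = self.ram.get(addr, 0)
        self.cache[addr] = data
        return data
```

In this model a write followed by at least `write_cycles` of other work costs the CPU nothing, while two back-to-back writes make the second one wait for the first to drain, which is the limit of a buffer that holds only a single word.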

Secondly, I’ve added a second access slot to the SDRAM controller, which means in many cases the wait for RAM service is reduced from a worst-case 24 cycles to 17 cycles. The downside is that because RAM operations between the two access slots overlap, they can’t be to the same RAM bank. For this reason I’ve remapped the RAM so that bank 0 contains Chip RAM, Slow RAM, Kickstart ROM and OSD RAM, leaving banks 1-3 free for Fast RAM. Chip RAM and Kickstart ROM accesses can thus now overlap with Fast RAM accesses.
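The bank rule can be sketched as a small Python check. The region names, the 8 MB bank size and the assumption that Fast RAM fills banks 1 to 3 linearly are all illustrative guesses rather than details taken from the core; the only thing taken from the mapping above is which regions share bank 0.

```python
BANK_SIZE = 8 * 1024 * 1024   # hypothetical: four equal 8 MB banks

# Regions that the remapping places in bank 0.
BANK0_REGIONS = {"chip", "slow", "kickstart", "osd"}


def bank_of(region, offset=0):
    """Return the SDRAM bank (0-3) for an access to `region` at byte `offset`."""
    if region in BANK0_REGIONS:
        return 0
    if region == "fast":
        # Assumption: Fast RAM fills banks 1, 2 and 3 in address order.
        bank = 1 + offset // BANK_SIZE
        if bank > 3:
            raise ValueError("offset is beyond the three Fast RAM banks")
        return bank
    raise ValueError(f"unknown region: {region!r}")


def can_overlap(access_a, access_b):
    """Two accesses can use both slots only if they target different banks."""
    return bank_of(*access_a) != bank_of(*access_b)
```

With this mapping `can_overlap(("chip", 0), ("fast", 0))` is true, while a Chip RAM access and a Kickstart ROM access still have to take turns because both live in bank 0.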

Finally, I’ve simplified the TG68 wrapper so that it no longer uses the enaWRreg signal to synchronize the CPU to the Amiga’s 28MHz clock.

Between them, these changes give an average speed increase of about 65%.

Again, no binaries as yet, but the source is in my GitHub repo.

Test        Docking13   Dual Slot reduced wait
EmuTest     1.13        1.97
WritePixel  0.5         0.57
Sieve       0.72        1.33
Dhrystone   0.88        1.86
Sort        0.78        1.61
Matrix      0.66        1.28
IMath       1.26        1.72
MemTest     0.63        1.2
TGTest      0.7         0.76
Savage      1.37        2
Beachball   1.22        1.89
CplxTest    0.95        1.74
TranTest    1.32        1.97
SysInfo     0.91        1.86
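For anyone who wants to check the average against the figures, here is a short Python sketch. It assumes the first column is the earlier build, the second is the new dual-slot build, and that higher scores mean faster; the function names are just for illustration.

```python
# (test name, earlier build score, dual-slot build score), copied from the table
SCORES = [
    ("EmuTest", 1.13, 1.97),
    ("WritePixel", 0.5, 0.57),
    ("Sieve", 0.72, 1.33),
    ("Dhrystone", 0.88, 1.86),
    ("Sort", 0.78, 1.61),
    ("Matrix", 0.66, 1.28),
    ("IMath", 1.26, 1.72),
    ("MemTest", 0.63, 1.2),
    ("TGTest", 0.7, 0.76),
    ("Savage", 1.37, 2.0),
    ("Beachball", 1.22, 1.89),
    ("CplxTest", 0.95, 1.74),
    ("TranTest", 1.32, 1.97),
    ("SysInfo", 0.91, 1.86),
]


def speedups(scores):
    """Per-test ratio of the new score to the old one."""
    return {name: after / before for name, before, after in scores}


def mean_speedup(scores):
    """Arithmetic mean of the per-test ratios."""
    ratios = speedups(scores).values()
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    for name, ratio in speedups(SCORES).items():
        print(f"{name:<11}{ratio:.2f}x")
    print(f"mean       {mean_speedup(SCORES):.2f}x")
```

Averaged this way the ratio lands a little under 1.7x, in the same region as the "about 65%" quoted above; a different averaging method will shift the exact figure slightly. The smallest gains are in TGTest and WritePixel.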

44 thoughts on “>>>>>>Acceleration!”

  1. Again I’m starting to wonder about something, and it seems that I was correct in my earlier comments. Someone enters the Amiga world and creates wonders. In a few months you have done more than those who have been talking loudly for the past 15 years.

  2. I was considering the DE0-Nano development board, containing a Cyclone IV EP4CE22F17C6N FPGA with
    22,320 Logic elements (LEs)
    594 Embedded memory (Kbits)
    66 Embedded 18 x 18 multipliers
    4 General-purpose PLLs
    153 Maximum FPGA I/O pins

    I’ve been looking for an affordable FPGA board that has enough space to do something useful, and maybe even something fun, and at $79.00 this was looking pretty good to me. Your project looks like fun, though: do you think a DE0-Nano is large enough to load it?

    Thanks.

    Paul.

      • I don’t know if I offended you with my post before, if so sorry…but I just wanted to ask again, what the LE count is on your Amiga core now…I suppose I could just compile your source code to check but I just reformatted both my home and work PC and don’t have the Altera suite installed anymore. Also, as for the 16-bit bus…would it be safe to say that if there were a board with a 32-bit data bus…you could implement a real A3000? or even a zorroIII or PCI slot from one of the gpio’s…this keeps getting more exciting…you guys are really pushing the envelope…I like your wording before when you said how Chaos is doing a great job “shoe-horning” the new builds to fit in the DE1.

        • Relax, you won’t offend me accidentally – any terseness in replies is purely down to time constraints 🙂

          The total LE usage on my C3 board is currently 18,414.

          As for the A3000, you could match A3000 performance just by giving the Minimig a 32-bit internal bus and using burst accesses to the 16-bit-wide SDRAM. That’s already how the Fast RAM works on the Chameleon and C3 board, and for most things it’s now faster than the A3000.

          A Zorro slot is doable, but would require voltage translation, since FPGAs aren’t generally 5V tolerant.

          • I was thinking…I think that you were talking about putting this in an old Acer case or something…I just found an easy Arduino A500 keyboard to PS/2 website…It would be cool to put the C3 board into a gutted A500 case…I wonder how difficult it would be to implement a real FD interface…voltage again…but that would be really cool…I am checking forums…using anyone of the FPGAmiga’s to upgrade is actually cheaper…and the performance and compatibility is continual and flexible…I have been checking out the NATAMI site and they actually are making modifications to modernize the chipset yet retain compatibility…like your cache….cool.

          • I’ve wondered about the FD interface idea too. As you say, the biggest hurdle is probably voltage translation. An FPGA-based replacement motherboard for an A500 would be a very cool development. (Especially since so many A500+ machines suffer death-by-battery!)

          • I was looking at the DE0-NANO and there is only one 32MB chip on the back side of the board…is that an 8-bit-wide?

          • At a quick glance I can’t see any mention of the chip’s width – but I’d be very surprised if it wasn’t 16-bits wide.

          • The DE0-Nano features a Synchronous Dynamic Random Access Memory (SDRAM) device providing 32MB with a 16-bit data lines connected to the FPGA.

            Taken from page 14 of the User’s Manual

            So it’s a 16M x 16-bit device

          • Did you just send the files as is to futuretec? or did you say you had to explain something to them?
            I am thinking about ordering two boards from them…
            Got any advice? I just don’t want to end up with the boards backwards…want the chirality correct…hehehe.

          • I sent the files, along with a covering email explaining I was new to this so if I’d made any obvious mistakes please let me know. They queried the traces being on Layer 1 rather than layer 16, and once I’d confirmed that was what I wanted, it was OK.

            I’ve only used two of the three boards I had made, though – if it’ll save you the hassle of having some made would you like the third one?

          • I don’t have the connectors but looks like I could scavenge most of the stuff off of an old VGA card and sound card and a mother board with some solder braid and stuff..I got all the through hole resistors…and the 40 connector came with the board…so should be good to go.

          • Wait…are other ports addressed through that same GPIO header? Maybe we could get together with some of the other people using the same board and make another PCB with the DB9 ports and such on it as well…is the parallel, serial and (external FD) implemented in the VHDL too!?

        • I am looking at that C3 board on ebay…now what should I tell them? I want another 16MB chip on it? Or do they put two 8MB chips on it?

          • What I did was to ask a question through eBay to ask if they could add a second SDRAM chip, then sent another message (just to be sure) when I bought the board. They’ll put 2 16MB chips on, so you’ll have 32 meg to play with.

          • The C3 board arrived today! Yahoo! Just wondering…do you have any pics of the I/O board that you made? Just connect it with a HDD cable? Been gone on spring break so haven’t started it yet…maybe it is already posted somewhere here on the site but I forgot where…

          • Are the Eagle files with corrections that you mentioned with routing on the back-side with through-hole components on the front? Just looking at it in EagleCAD looks like it is still the original version?…

        • Sure! Just let me know how you want to split the cost plus your time and shipping here to Japan…don’t think it is that much if you just wrap it in bubble wrap or cardboard and send it as a letter…hehehe, and I’ll “gift” you on PayPal or whatever…

          • Yup, don’t worry, I understood what you meant 🙂 I sent you a couple of emails on this subject – (the comments here probably aren’t the best place to sort out the finer details!) – but let me know if for any reason you didn’t get them? 🙂

  3. Very interesting work!
    I have a suggestion. I don’t know if it’s possible with Altera’s FPGAs, but with Xilinx’s you can use a higher frequency internally (which you can obtain using a DCM).
    So why not leave the output port at 82MHz and try to drive the internal core and cache at over 100MHz? (For a Spartan III they say 500MHz, but that’s theoretical…)

    I wonder if you will continue this project!

    Clément

    • That’s a good idea – which is why the core already works that way! The cache and SDRAM controller run at 113.45MHz. The CPU’s clocked at the same speed, but only runs when an enable signal is active, which for cached accesses is about 1 cycle in 6.

      • I’m not sure I understand.

        You said that the CPU is clocked at the same speed as the cache and the SDRAM controller BUT it is active only when an enable signal is active.

        So actually, the CPU runs slower than the cache controller and the memory controller ?

        What I was suggesting to you is keeping the memory controller at the bus clock rate and then increasing the cpu speed (to gain mips). Then the cache controller clock will be one side at memory controller rate and the other side at cpu rate.

        • Unfortunately the TG68 CPU has an fmax of somewhere in the region of 35MHz. You can clock it faster than that, but it won’t work reliably unless you insert wait states (by means of the clkena signal) – so there’s not really any scope to increase speed that way.

          It may be possible to increase the system clock speed, though – the Chameleon’s SDRAM is apparently capable of operating reliably at 200MHz! The CPU core will still need to be limited to an effective speed of around 30-odd MHz though.

      • By the way,
        do you keep the original 16-bit data bus?
        Otherwise, the next step to improve data fetch performance would be switching to a 32-bit-wide data bus.

        There are so many improvements to make 🙂

        • Yes, the physical connection to the SDRAM is 16 bits wide, and the TG68 CPU has 16 data lines. Increasing the CPU’s bus to 32-bits is way beyond my abilities at the moment.

          The RAM runs in burst mode, though – so data is fetched and cached in 64-bit bursts.

  4. Is there still a DE1 trunk in the source? I am not sure how to use the GitHub source stuff; every time I clone a directory onto my hard drive the projects never load…anyway I was also wondering if you could post which board you sourced on eBay…maybe I will follow your design…I remember that you kind of implemented AGA before…is that coming next? Can’t wait to follow your progress…it sure was great to get Chaos’ build just before Christmas!

    • No, I’m afraid there’s no DE1 trunk in this repo. Chaos’s DE1 port is an amazing exercise in shoe-horning as much into the DE1’s FPGA as possible – but it’s a separate port. We’d both like to see improved code-sharing between different ports, but that’s going to be a long-term thing.

      The Two-Way Cache won’t fit into the DE1. There’s a chance it’d be possible to fit the extra access slot and the write buffer, though.

      As for AGA, it’s definitely something I want to support, but again it’s a long-term thing. All I managed to do so far was add support for the extra colour registers. Before that would be useful I’ll need to figure out increasing chip RAM bandwidth to allow extra bitplanes.

      • So was that AGA demo that Mike had running on FPGA Arcade his own private thing, do you know? Or was that “beta” AGA-supporting Minimig core released to a small group within the Minimig development community? Actually, I never asked him…you don’t know…I’ll just ask him by mail later…

        But back a bit…you think that your core will fit on a DE0 NANO? that has 22K LE……the DE1 has 20K so it’s just over the limit for it…or is the format for the C2, C3, and C4 different so that it takes less LE’s for some things with the newer cyclone boards? I noticed that adding support for the C3 and C4 adds MB’s of extra libs and stuff to the IDE…the lib for the C2 is tiny…

        • I haven’t seen much difference in LE count between designs built for the Cyclone II and Cyclone III – I’ve not used a Cyclone IV yet so can’t comment, but the 22KLE on the DE0-nano should be enough.

          As for the FPGAArcade thing, I think it’s currently still closed beta, but last I heard it’s nearing completion – so fingers crossed 🙂

          • I ordered a DE0-Nano today as a birthday present for myself; at $79 + $12 shipping it was too good a deal. For me, Minimig is an educational avenue into FPGAs; if the DE0-Nano won’t work, I’ll come up with a new project.

            I’ll report back when I get the beastie and have had a go at it.

    • The straight DE0 board has an EP3C16, which has a shade under 16,000 logic elements, which is unfortunately not enough to hold the Minimig design.

      • Yeah, in the beginning I was looking for a cheap alternative to the DE1 to get into FPGA and I picked up a used MCC-216 for $100 and it already had the JTAG header on it…but I didn’t understand about the number naming scheme I just saw it had a Cyclone4 and was excited…but it is like the EP4C16 or something and I was out of luck…but I did download the docs and it shows all the data for the “pins” to the controller ports, SD, etc…I still think it is possible to get the OneChipMSX to work on it though…I still think it looks like a really great board to develop on with all the ports already on it…double as GPIO’s.
