>>>>>>Acceleration!

Posted on December 27, 2012 by AMR

Following on from the addition of the Two Way Cache in my last post, I’ve made some more speed improvements to the Chameleon 64’s Minimig core.

Firstly, I’ve added a single-word write buffer, which means when the CPU writes to Fast RAM it doesn’t have to wait for the write to complete. Provided completing the write takes priority over any potential reads from the same address, and the Cache is updated to reflect the new data, then the CPU can continue merrily processing.
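
The write buffer's behaviour can be sketched as a small software model. This is only an illustration in Python, not the core's actual HDL: the class name, the dictionaries standing in for SDRAM and the cache, and the choice to update the cache only for addresses it already holds are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class WriteBufferedRam:
    """Toy model (hypothetical) of Fast RAM behind a one-word write buffer."""
    ram: Dict[int, int] = field(default_factory=dict)    # stands in for SDRAM
    cache: Dict[int, int] = field(default_factory=dict)  # stands in for the cache
    pending: Optional[Tuple[int, int]] = None             # buffered (address, data)
    stalls: int = 0                                       # times the CPU had to wait

    def _drain(self) -> None:
        """Complete the buffered write to RAM, if there is one."""
        if self.pending is not None:
            addr, data = self.pending
            self.ram[addr] = data
            self.pending = None

    def write(self, addr: int, data: int) -> None:
        # The buffer holds a single word, so a second write has to wait
        # for the first one to complete.
        if self.pending is not None:
            self.stalls += 1
            self._drain()
        self.pending = (addr, data)
        # Keep the cache coherent so that later reads see the new data.
        # (Assumption: only lines already in the cache are updated.)
        if addr in self.cache:
            self.cache[addr] = data

    def read(self, addr: int) -> int:
        if addr in self.cache:
            return self.cache[addr]
        # Cache miss: the buffered write takes priority over the read,
        # so RAM is up to date before it is read.
        self._drain()
        value = self.ram.get(addr, 0)
        self.cache[addr] = value
        return value


if __name__ == "__main__":
    mem = WriteBufferedRam()
    mem.write(0x100, 42)    # returns immediately; the word sits in the buffer
    print(mem.read(0x100))  # the pending write completes first, then the read
```

The point of the model is the ordering: the CPU-facing `write` returns without touching RAM, and correctness rests on draining the buffer before any read that goes to RAM and on keeping cached copies in step.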

Secondly, I’ve added a second access slot to the SDRAM controller, which means in many cases the wait for RAM service is reduced from a worst-case 24 cycles to 17 cycles. The downside is that because RAM operations between the two access slots overlap, they can’t be to the same RAM bank. For this reason I’ve remapped the RAM so that bank 0 contains Chip RAM, Slow RAM, Kickstart ROM and OSD RAM, leaving banks 1-3 free for Fast RAM. Chip RAM and Kickstart ROM accesses can thus now overlap with Fast RAM accesses.
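
The bank rule can be written down as a quick check. The sketch below is Python rather than HDL and is only an illustration: the region names come from the text above, but `FAST_RAM_BANK_SIZE` and the way Fast RAM offsets are spread across banks 1-3 are hypothetical choices made for the example, not the controller's real address map.

```python
from enum import Enum
from typing import Tuple


class Region(Enum):
    CHIP_RAM = "chip"
    SLOW_RAM = "slow"
    KICKSTART_ROM = "kickstart"
    OSD_RAM = "osd"
    FAST_RAM = "fast"


# Hypothetical size of one SDRAM bank, used only to spread Fast RAM
# across banks 1-3 in this example.
FAST_RAM_BANK_SIZE = 8 * 1024 * 1024


def bank_of(region: Region, offset: int = 0) -> int:
    """Return the SDRAM bank an access falls in under the remapped layout."""
    if region is not Region.FAST_RAM:
        return 0  # Chip RAM, Slow RAM, Kickstart ROM and OSD RAM share bank 0
    return 1 + (offset // FAST_RAM_BANK_SIZE) % 3  # Fast RAM lives in banks 1-3


def can_overlap(a: Tuple[Region, int], b: Tuple[Region, int]) -> bool:
    """Two accesses may occupy the two slots together only if their banks differ."""
    return bank_of(*a) != bank_of(*b)


if __name__ == "__main__":
    print(can_overlap((Region.CHIP_RAM, 0), (Region.FAST_RAM, 0)))       # different banks
    print(can_overlap((Region.CHIP_RAM, 0), (Region.KICKSTART_ROM, 0)))  # both in bank 0
```

Under this layout a Chip RAM or Kickstart ROM access never collides with a Fast RAM access, which is the case the remapping is meant to help.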

Finally, I’ve simplified the TG68 wrapper so that it no longer uses the enaWRreg signal to synchronize the CPU to the Amiga’s 28MHz clock.

Between them, these changes give an average speed increase of about 65% across the benchmarks below.

Again, no binaries as yet, but the source is in my GitHub repo.

Test	Docking13	Dual Slot reduced wait
EmuTest	1.13	1.97
WritePixel	0.5	0.57
Sieve	0.72	1.33
Dhrystone	0.88	1.86
Sort	0.78	1.61
Matrix	0.66	1.28
IMath	1.26	1.72
MemTest	0.63	1.2
TGTest	0.7	0.76
Savage	1.37	2
Beachball	1.22	1.89
CplxTest	0.95	1.74
TranTest	1.32	1.97
SysInfo	0.91	1.86
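
For anyone who wants to check the average against the table, here is a short Python sketch that recomputes the per-test speedups. The figures are copied from the table above; which kind of average the 65% figure refers to is not stated, so the script prints both the arithmetic and the geometric mean as an assumption-free comparison.

```python
from math import prod

# (Docking13, Dual Slot reduced wait) scores, copied from the table above.
RESULTS = {
    "EmuTest": (1.13, 1.97),
    "WritePixel": (0.5, 0.57),
    "Sieve": (0.72, 1.33),
    "Dhrystone": (0.88, 1.86),
    "Sort": (0.78, 1.61),
    "Matrix": (0.66, 1.28),
    "IMath": (1.26, 1.72),
    "MemTest": (0.63, 1.2),
    "TGTest": (0.7, 0.76),
    "Savage": (1.37, 2.0),
    "Beachball": (1.22, 1.89),
    "CplxTest": (0.95, 1.74),
    "TranTest": (1.32, 1.97),
    "SysInfo": (0.91, 1.86),
}

# Speedup of the new build over the old one, per test.
speedups = {name: new / old for name, (old, new) in RESULTS.items()}

arithmetic_mean = sum(speedups.values()) / len(speedups)
geometric_mean = prod(speedups.values()) ** (1 / len(speedups))

if __name__ == "__main__":
    for name, ratio in speedups.items():
        print(f"{name:<11}{ratio:5.2f}x")
    print(f"arithmetic mean speedup: {(arithmetic_mean - 1) * 100:.1f}%")
    print(f"geometric mean speedup:  {(geometric_mean - 1) * 100:.1f}%")
```

Both averages land in the mid-to-high 60s percent, with WritePixel and TGTest gaining the least and Dhrystone, Sort and SysInfo roughly doubling.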

44 thoughts on “>>>>>>Acceleration!”

majsta on December 28, 2012 at 12:36 pm said:

Again I’m starting to wonder about something, and it seems that I’m correct about my earlier comments. Someone enters the Amiga world and creates wonders. In a few months you have done more than those who have been talking loudly for the past 15 years.

- AMR on December 28, 2012 at 8:36 pm said:
  
  The good news is that this work is directly applicable to your project too! 🙂
  
  - Gerard Braad on December 30, 2012 at 5:15 pm said:
    
    I applaud Alistair for developing this in the open as true open source. Not releasing early and often is what held back progress for the others…
    
- Robert Johnston on January 25, 2013 at 8:14 pm said:
  
  I have been following majsta’s work on the A600 project and it looks awesome, too! Cool work and discovery learning experiences are the most meaningful.
  
PaulDriver on January 2, 2013 at 5:01 am said:

I was considering the DE0-Nano development board, containing a Cyclone IV EP4CE22F17C6N FPGA with:
22,320 Logic elements (LEs)
594 Embedded memory (Kbits)
66 Embedded 18 x 18 multipliers
4 General-purpose PLLs
153 Maximum FPGA I/O pins

I’ve been looking for an affordable FPGA board that has enough space to do something useful, and maybe even something fun, and at $79.00 this was looking pretty good to me. But your project looks like fun: do you think a DE0-Nano is large enough to load your project?

Thanks.

Paul.

- AMR on January 2, 2013 at 7:15 pm said:
  
  Yup, it should fit – you’d just have to build yourself an interface board for VGA, PS/2 Keyboard and Mouse, sound and SD card.
  
  - Robert Johnston on February 1, 2013 at 3:06 am said:
    
    I don’t know if I offended you with my post before, if so sorry…but I just wanted to ask again, what the LE count is on your Amiga core now…I suppose I could just compile your source code to check but I just reformatted both my home and work PC and don’t have the Altera suite installed anymore. Also, as for the 16-bit bus…would it be safe to say that if there were a board with a 32-bit data bus…you could implement a real A3000? or even a zorroIII or PCI slot from one of the gpio’s…this keeps getting more exciting…you guys are really pushing the envelope…I like your wording before when you said how Chaos is doing a great job “shoe-horning” the new builds to fit in the DE1.
    
    - AMR on February 1, 2013 at 8:58 am said:
      
      Relax, you won’t offend me accidentally – any terseness in replies is purely down to time constraints 🙂
      
      The total LE usage on my C3 board is currently 18,414.
      
      As for the A3000, you could match A3000 performance just by giving the Minimig a 32-bit internal bus and using burst accesses to the 16-bit-wide SDRAM. That’s already how the Fast RAM works on the Chameleon and C3 board, and for most things it’s now faster than the A3000.
      
      A Zorro slot is doable, but would require voltage translation, since FPGAs aren’t generally 5V tolerant.
      
      - Robert Johnston on February 6, 2013 at 2:24 pm said:
        
        I was thinking…I think that you were talking about putting this in an old Acer case or something…I just found an easy Arduino A500 keyboard to PS/2 website…It would be cool to put the C3 board into a gutted A500 case…I wonder how difficult it would be to implement a real FD interface…voltage again…but that would be really cool…I am checking forums…using anyone of the FPGAmiga’s to upgrade is actually cheaper…and the performance and compatibility is continual and flexible…I have been checking out the NATAMI site and they actually are making modifications to modernize the chipset yet retain compatibility…like your cache….cool.
      - AMR on February 7, 2013 at 1:22 am said:
        
        I’ve wondered about the FD interface idea too. As you say, the biggest hurdle is probably voltage translation. An FPGA-based replacement motherboard for an A500 would be a very cool development. (Especially since so many A500+ machines suffer death-by-battery!)
      - Sjamaan on February 13, 2013 at 3:29 pm said:
        
        I was looking at the DE0-Nano and there is only one 32MB chip on the back side of the board…is that 8 bits wide?
      - AMR on February 13, 2013 at 7:58 pm said:
        
        At a quick glance I can’t see any mention of the chip’s width – but I’d be very surprised if it wasn’t 16-bits wide.
      - PaulDriver on February 26, 2013 at 4:45 am said:
        
        The DE0-Nano features a Synchronous Dynamic Random Access Memory (SDRAM) device providing 32MB with a 16-bit data lines connected to the FPGA.
        
        Taken from page 14 of the User’s Manual
        
        So it’s a 16M x 16-bit device.
      - sjamaan on March 18, 2013 at 4:27 am said:
        
        Did you just send the files as is to futuretec? or did you say you had to explain something to them?
        I am thinking about ordering two boards from them…
        Got any advice? I just don’t want to end up with the boards backwards…want the chirality correct…hehehe.
      - AMR on March 18, 2013 at 6:52 pm said:
        
        I sent the files, along with a covering email explaining I was new to this so if I’d made any obvious mistakes please let me know. They queried the traces being on Layer 1 rather than layer 16, and once I’d confirmed that was what I wanted, it was OK.
        
        I’ve only used two of the three boards I had made, though – if it’ll save you the hassle of having some made would you like the third one?
      - sjamaan on March 22, 2013 at 5:04 am said:
        
        I don’t have the connectors, but it looks like I could scavenge most of the stuff off an old VGA card, a sound card and a motherboard with some solder braid and stuff…I’ve got all the through-hole resistors…and the 40-pin connector came with the board…so I should be good to go.
      - Robert Johnston on March 23, 2013 at 9:14 pm said:
        
        Wait…are other ports addressed through that same GPIO header? Maybe we could get together with some of the other people using the same board and make another PCB with the DB9 ports and such on it as well…is the parallel, serial and (external FD) implemented in the VHDL too!?
    - sjamaan on February 14, 2013 at 3:11 am said:
      
      I am looking at that C3 board on ebay…now what should I tell them? I want another 16MB chip on it? Or do they put two 8MB chips on it?
      
      - AMR on February 14, 2013 at 8:13 am said:
        
        What I did was to ask a question through eBay to ask if they could add a second SDRAM chip, then sent another message (just to be sure) when I bought the board. They’ll put 2 16MB chips on, so you’ll have 32 meg to play with.
      - Robert Johnston on March 12, 2013 at 8:26 am said:
        
        The C3 board arrived today! Yahoo! Just wondering…do you have any pics of the I/O board that you made? Just connect it with a HDD cable? Been gone on spring break so haven’t started it yet…maybe it is already posted somewhere here on the site but I forgot where…
      - AMR on March 12, 2013 at 8:29 pm said:
        
        Not many pics – but there are details here: http://retroramblings.net/?p=190
        I actually used a 40-pin female header so the board clips directly onto the FPGA board.
        Eagle files for the board can be found here, if they’re any help: http://retroramblings.net/C3Board/C3_VGABoard.zip
      - sjamaan on March 13, 2013 at 7:50 am said:
        
        Are the Eagle files with corrections that you mentioned with routing on the back-side with through-hole components on the front? Just looking at it in EagleCAD looks like it is still the original version?…
      - AMR on March 14, 2013 at 10:53 pm said:
        
        It’s still the original version – I haven’t recreated it as yet.
    - sjamaan on March 22, 2013 at 5:01 am said:
      
      Sure! Just let me know how you want to split the cost plus your time and shipping here to Japan…I don’t think it is that much if you just wrap it in bubble wrap or cardboard and send it as a letter…hehehe…and I’ll “gift” you on PayPal or whatever…
      
      - Robert Johnston on March 23, 2013 at 9:17 pm said:
        
        When I said split the cost…I just meant your original costs from future tech…(hehehe) I would of course pay for all of the shipping…
      - AMR on March 24, 2013 at 9:54 am said:
        
        Yup, don’t worry, I understood what you meant 🙂 I sent you a couple of emails on this subject – (the comments here probably aren’t the best place to sort out the finer details!) – but let me know if for any reason you didn’t get them? 🙂
  - Robert Johnston on February 21, 2013 at 11:58 am said:
    
    I just ordered one of those boards! I downloaded your github sources and they compiled right away…making the I/O board while I wait…can’t wait to get it going!
    
    - AMR on February 22, 2013 at 9:56 pm said:
      
      Excellent – I’m sure you’ll have a great deal of fun with it! 🙂
      
cclecle on January 9, 2013 at 11:51 am said:

Very interesting work!
I have a suggestion. I don’t know if it’s possible with Altera’s FPGAs, but with Xilinx’s you can use a higher frequency internally (which you can obtain using a DCM).
So why not leave the output port at 82MHz and try to drive the internal core and cache at over 100MHz? (For a Spartan III they say 500MHz, but that’s theoretical…)

I hope you will continue this project!

Clément

- AMR on January 10, 2013 at 1:25 am said:
  
  That’s a good idea – which is why the core already works that way! The cache and SDRAM controller run at 113.45MHz. The CPU’s clocked at the same speed, but only runs when an enable signal is active, which for cached accesses is about 1 cycle in 6.
  
  - cclecle on January 10, 2013 at 12:01 pm said:
    
    I’m not sure I understand.
    
    You said that the CPU is clocked at the same speed as the cache and the SDRAM controller BUT it is active only when an enable signal is active.
    
    So actually, the CPU runs slower than the cache controller and the memory controller ?
    
    What I was suggesting to you is keeping the memory controller at the bus clock rate and then increasing the cpu speed (to gain mips). Then the cache controller clock will be one side at memory controller rate and the other side at cpu rate.
    
    - AMR on January 10, 2013 at 6:46 pm said:
      
      Unfortunately the TG68 CPU has an fmax of somewhere in the region of 35MHz. You can clock it faster than that, but it won’t work reliably unless you insert wait states (by means of the clkena signal) – so there’s not really any scope to increase speed that way.
      
      It may be possible to increase the system clock speed, though – the Chameleon’s SDRAM is apparently capable of operating reliably at 200MHz! The CPU core will still need to be limited to an effective speed of around 30-odd MHz though.
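
The arithmetic behind the clock-enable scheme can be sketched in a few lines of Python. The clock figures are the ones quoted in this thread (113.45MHz system clock, an fmax of roughly 35MHz, an enable about 1 cycle in 6 for cached accesses); treating fmax as a hard limit on how closely enables can be spaced is a simplifying assumption, and the function names are made up for the example.

```python
from math import ceil

SYSTEM_CLOCK_MHZ = 113.45  # cache and SDRAM controller clock quoted above
CPU_FMAX_MHZ = 35.0        # approximate TG68 fmax quoted above


def effective_mhz(system_mhz: float, enable_every_n: int) -> float:
    """Effective CPU rate when clkena is active once every n system clocks."""
    return system_mhz / enable_every_n


def min_enable_interval(system_mhz: float, fmax_mhz: float) -> int:
    """Smallest enable spacing that keeps the effective rate at or below fmax."""
    return ceil(system_mhz / fmax_mhz)


if __name__ == "__main__":
    n = min_enable_interval(SYSTEM_CLOCK_MHZ, CPU_FMAX_MHZ)
    print(f"tightest enable spacing: 1 in {n}")
    print(f"effective rate at 1 in {n}: {effective_mhz(SYSTEM_CLOCK_MHZ, n):.2f} MHz")
    print(f"effective rate at 1 in 6: {effective_mhz(SYSTEM_CLOCK_MHZ, 6):.2f} MHz")
    # The same check for a hypothetical 200MHz system clock:
    n200 = min_enable_interval(200.0, CPU_FMAX_MHZ)
    print(f"at 200MHz: 1 in {n200}, {effective_mhz(200.0, n200):.2f} MHz")
```

Under these assumptions a faster system clock mostly buys finer-grained enable spacing and quicker RAM service, while the CPU's effective rate stays pinned in the low 30s of MHz at best.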
      
  - cclecle on January 10, 2013 at 12:17 pm said:
    
    By the way,
    Do you keep the original 16-bit data bus?
    Otherwise, the next step to improve data fetch performance would be switching to a 32-bit-wide data bus.
    
    There are so many improvements to make 🙂
    
    - AMR on January 10, 2013 at 6:49 pm said:
      
      Yes, the physical connection to the SDRAM is 16 bits wide, and the TG68 CPU has 16 data lines. Increasing the CPU’s bus to 32-bits is way beyond my abilities at the moment.
      
      The RAM runs in burst mode, though – so data is fetched and cached in 64-bit bursts.
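
To make the burst figure concrete, here is a small Python sketch of how a 64-bit burst maps onto a 16-bit bus. The widths come from the comment above; the assumptions that bursts are aligned to their own size and that words arrive in ascending address order are mine, as are the function names.

```python
from typing import List

BUS_WIDTH_BITS = 16  # physical SDRAM data width
BURST_BITS = 64      # amount fetched and cached per burst
BEATS = BURST_BITS // BUS_WIDTH_BITS  # 16-bit transfers per burst
BURST_BYTES = BURST_BITS // 8
WORD_BYTES = BUS_WIDTH_BITS // 8


def burst_base(addr: int) -> int:
    """Start address of the burst containing addr (assumes aligned bursts)."""
    return addr & ~(BURST_BYTES - 1)


def burst_addresses(addr: int) -> List[int]:
    """Byte addresses of each 16-bit word fetched alongside addr."""
    base = burst_base(addr)
    return [base + WORD_BYTES * beat for beat in range(BEATS)]


if __name__ == "__main__":
    print(BEATS)
    print([hex(a) for a in burst_addresses(0x1006)])
```

So one miss brings in four consecutive 16-bit words, and the following accesses to neighbouring addresses can be served from the cache instead of the SDRAM.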
      
Robert Johnston on January 25, 2013 at 7:50 pm said:

Is there still a DE1 trunk in the source? I am not sure how to use the GitHub source stuff: every time I clone a directory to my hard drive, the projects never load…anyway, I was also wondering if you could post which board you sourced on eBay…maybe I will follow your design…I remember that you kind of implemented AGA before…is that coming next? Can’t wait to follow your progress…it sure was great to get Chaos’ build just before Christmas!

- AMR on January 25, 2013 at 11:00 pm said:
  
  No, I’m afraid there’s no DE1 trunk in this repo. Chaos’s DE1 port is an amazing exercise in shoe-horning as much into the DE1’s FPGA as possible – but it’s a separate port. We’d both like to see improved code-sharing between different ports, but that’s going to be a long-term thing.
  
  The Two-Way Cache won’t fit into the DE1. There’s a chance it’d be possible to fit the extra access slot and the write buffer, though.
  
  As for AGA, it’s definitely something I want to support, but again it’s a long-term thing. All I managed to do so far was add support for the extra colour registers. Before that would be useful I’ll need to figure out increasing chip RAM bandwidth to allow extra bitplanes.
  
  - sjamaan on January 30, 2013 at 10:30 am said:
    
    So was that AGA demo that Mike had running on fpga ARCADE his own private thing, do you know? or was that ” beta” AGA support minimig core released into a small group within the minimig development community? Actually, I never asked him…you don’t know…I’ll just ask him by mail later…
    
    But back a bit…you think that your core will fit on a DE0 NANO? that has 22K LE……the DE1 has 20K so it’s just over the limit for it…or is the format for the C2, C3, and C4 different so that it takes less LE’s for some things with the newer cyclone boards? I noticed that adding support for the C3 and C4 adds MB’s of extra libs and stuff to the IDE…the lib for the C2 is tiny…
    
    - AMR on February 3, 2013 at 7:58 pm said:
      
      I haven’t seen much difference in LE count between designs built for the Cyclone II and Cyclone III – I’ve not used a Cyclone IV yet so can’t comment, but the 22KLE on the DE0-nano should be enough.
      
      As for the FPGAArcade thing, I think it’s currently still closed beta, but last I heard it’s nearing completion – so fingers crossed 🙂
      
      - PaulDriver on February 26, 2013 at 12:39 am said:
        
        I ordered a DE0-nano today as a birthday for myself, at $79 + $12 shipping it was too good a deal. For me, minimig is an educational avenue to FPGA, is the DE0-Nano won’t work, I’ll come up with a new project.
        
        I’ll report back when I get the beastie and have had a go at it.
      - PaulDriver on February 26, 2013 at 4:46 am said:
        
        Must learn to proof read posts better after being inturrupted, LOLz
- AMR on January 26, 2013 at 9:33 am said:
  
  Forgot to mention, by the way – the board I bought on eBay was http://www.ebay.co.uk/itm/270910317225?ssPageName=STRK:MEWAX:IT&_trksid=p3984.m1423.l2649 – but make sure you ask them to add a second SDRAM chip if you buy one. By default it comes with a single chip which is only 8 bits wide. The Minimig design needs a 16-bit bus to RAM.
  
Alexandre 'Tabajara' Souza on January 30, 2013 at 8:59 pm said:

Congratulations for the nice work, AMR! But would that fit into a DE0 (not nano) board? Thanks! 🙂

- AMR on February 3, 2013 at 7:57 pm said:
  
  The straight DE0 board has an EP3C16, which has a shade under 16,000 logic elements, which is unfortunately not enough to hold the Minimig design.
  
  - Sjamaan on February 13, 2013 at 1:52 pm said:
    
    Yeah, in the beginning I was looking for a cheap alternative to the DE1 to get into FPGA and I picked up a used MCC-216 for $100 and it already had the JTAG header on it…but I didn’t understand about the number naming scheme I just saw it had a Cyclone4 and was excited…but it is like the EP4C16 or something and I was out of luck…but I did download the docs and it shows all the data for the “pins” to the controller ports, SD, etc…I still think it is possible to get the OneChipMSX to work on it though…I still think it looks like a really great board to develop on with all the ports already on it…double as GPIO’s.
    

Retro Ramblings

Musings on FPGA and Retro Computing