Acceleration!

Following on from the addition of the Two Way Cache in my last post, I’ve made some more speed improvements to the Chameleon 64’s Minimig core.

Firstly, I’ve added a single-word write buffer, so when the CPU writes to Fast RAM it no longer has to wait for the write to complete. Provided the pending write takes priority over any subsequent read from the same address, and the cache is updated to reflect the new data, the CPU can carry on merrily processing.
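To make the two ordering rules concrete, here is a minimal behavioural sketch in Python rather than the core’s own HDL. The class and method names are hypothetical and the model ignores timing entirely. It only shows that a buffered write returns immediately, that a read from the same address forces the pending write to complete first, and that the cache is updated as soon as the write is posted.

```python
class WriteBufferedRAM:
    """Hypothetical model of a single-word write buffer in front of cached Fast RAM."""

    def __init__(self, size_words):
        self.ram = [0] * size_words  # backing store, standing in for SDRAM
        self.cache = {}              # address -> word, standing in for the CPU cache
        self.pending = None          # (address, data) awaiting completion, or None
        self.stalls = 0              # times the CPU had to wait on a full buffer

    def cpu_write(self, addr, data):
        # Only one word can be buffered, so a second write must wait
        # for the first one to reach RAM.
        if self.pending is not None:
            self.stalls += 1
            self.drain()
        self.pending = (addr, data)
        # Keep the cache consistent with the buffered data immediately.
        if addr in self.cache:
            self.cache[addr] = data
        # Return at once: the CPU does not wait for the RAM write.

    def drain(self):
        # Models the SDRAM controller completing the buffered write.
        if self.pending is not None:
            addr, data = self.pending
            self.ram[addr] = data
            self.pending = None

    def cpu_read(self, addr):
        # A pending write to the same address takes priority over the read,
        # so the CPU can never fetch stale data from RAM.
        if self.pending is not None and self.pending[0] == addr:
            self.drain()
        if addr not in self.cache:
            self.cache[addr] = self.ram[addr]
        return self.cache[addr]
```

For example, `mem.cpu_write(3, 0xBEEF)` followed immediately by `mem.cpu_read(3)` returns `0xBEEF`, while two back-to-back writes to different addresses cost one stall, because the buffer holds only a single word.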

Secondly, I’ve added a second access slot to the SDRAM controller, which in many cases reduces the worst-case wait for RAM service from 24 cycles to 17. The downside is that because RAM operations in the two access slots overlap, they must target different RAM banks. For this reason I’ve remapped the RAM so that bank 0 contains Chip RAM, Slow RAM, Kickstart ROM and OSD RAM, leaving banks 1-3 free for Fast RAM. Chip RAM and Kickstart ROM accesses can therefore now overlap with Fast RAM accesses.
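The bank constraint can be expressed as a simple check. The sketch below is an illustration under stated assumptions, not the controller’s actual address decoding: the 8 MB bank size (`BANK_SIZE`), the region names and the `bank_for` / `can_overlap` helpers are all hypothetical. It follows the remapped layout described above, with every non-Fast region in bank 0 and Fast RAM spread across banks 1-3.

```python
# Assumed bank size, for illustration only (e.g. 32 MB split into four banks).
BANK_SIZE = 8 * 1024 * 1024

# Regions that all live in bank 0 under the remapped layout.
BANK0_REGIONS = {"chip", "slow", "kickstart", "osd"}


def bank_for(region: str, offset: int) -> int:
    """Return the SDRAM bank that serves `offset` within `region`."""
    if region in BANK0_REGIONS:
        return 0
    if region == "fast":
        # Fast RAM occupies banks 1-3.
        bank = 1 + offset // BANK_SIZE
        if bank > 3:
            raise ValueError("offset lies beyond the Fast RAM banks")
        return bank
    raise ValueError(f"unknown region {region!r}")


def can_overlap(a: tuple[str, int], b: tuple[str, int]) -> bool:
    """Two accesses may occupy the two slots at once only if their banks differ."""
    return bank_for(*a) != bank_for(*b)
```

With this layout, a Chip RAM access paired with any Fast RAM access always passes the check, while Chip RAM and Kickstart accesses can never share the two slots, since both live in bank 0.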

Finally, I’ve simplified the TG68 wrapper so that it no longer uses the enaWRreg signal to synchronize the CPU to the Amiga’s 28MHz clock.

Between them these changes give an average speed increase of about 65%.

Experimenting with TG68

Part 12b – a better cache

I’ve *finally* found the subtle bug that was causing my two-way cache to fail.  The symptom was that the boot process would appear to work fine, loading an S Record off the SD card, but having loaded it, the code would fail to start.  It was as though the bootloader was ignoring the S9 record at the end of the firmware file.

After various fruitless attempts to track this down, and to construct artificial test cases that would trigger the behaviour, I finally added some serial debugging breadcrumbs to the first-stage bootloader. From these I found that the final S9 record was being treated as though it were an S1!
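To see why that symptom is fatal, consider how an S-record loader dispatches on the type digit. The sketch below is a hypothetical Python loader, not the TG68MiniSOC bootloader. S1 to S3 records carry data, and S7 to S9 records terminate the file with a start address. The record checksum covers only the byte count, address and data bytes, not the type digit. An S9 whose type reads back as S1 therefore still passes the checksum, is stored as an empty data record, and the loader never finds its termination record.

```python
# Address field width in bytes for each Motorola S-record type.
ADDR_LEN = {"0": 2, "1": 2, "2": 3, "3": 4, "5": 2, "7": 4, "8": 3, "9": 2}


def parse_srecord(line: str):
    """Decode one S-record into (type digit, address, data bytes)."""
    line = line.strip()
    if len(line) < 4 or line[0] != "S" or line[1] not in ADDR_LEN:
        raise ValueError(f"not an S-record: {line!r}")
    rtype = line[1]
    raw = bytes.fromhex(line[2:])
    if raw[0] != len(raw) - 1:
        raise ValueError("byte count mismatch")
    # The checksum is the ones' complement of the low byte of the sum of the
    # count, address and data bytes, so including it makes the total 0xFF.
    # Note that the type digit is not covered by the checksum.
    if sum(raw) & 0xFF != 0xFF:
        raise ValueError("bad checksum")
    alen = ADDR_LEN[rtype]
    address = int.from_bytes(raw[1:1 + alen], "big")
    return rtype, address, raw[1 + alen:-1]


def load(lines):
    """Load data records into a dict and return (memory, start address)."""
    memory = {}
    for line in lines:
        rtype, address, data = parse_srecord(line)
        if rtype in ("1", "2", "3"):
            for i, byte in enumerate(data):
                memory[address + i] = byte
        elif rtype in ("7", "8", "9"):
            return memory, address  # termination record: start here
    raise ValueError("no termination record found")
```

Loading `["S10510004E712B", "S9031000EC"]` yields two bytes at 0x1000 and a start address of 0x1000. Changing only the final record’s type digit to `S1031000EC` leaves the checksum valid, yet the loader now runs off the end of the input without ever starting the code.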

Experimenting with TG68

Part 12a – a better cache

The TG68MiniSOC project has so far used a very simple cache for the CPU. The SDRAM controller is set up to use four-word bursts, so the cache simply stores each complete burst. Under this scheme, when data is read sequentially from RAM, only one read in four needs to wait for the SDRAM controller. This is the simplest possible example of a direct-mapped cache, with just a single cache line.
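A minimal Python model of that single-line scheme makes the one-in-four figure easy to check. This is an illustrative sketch, not the project’s HDL. The class name is hypothetical and addresses are treated as word indices into a list standing in for SDRAM.

```python
BURST_WORDS = 4  # the SDRAM controller delivers four-word bursts


class SingleLineCache:
    """Direct-mapped cache with exactly one line, holding one burst."""

    def __init__(self, ram):
        self.ram = ram
        self.tag = None  # burst-aligned address of the cached line
        self.line = []
        self.misses = 0

    def read(self, addr):
        base = addr - addr % BURST_WORDS  # start of the containing burst
        if base != self.tag:
            # Miss: wait for the SDRAM controller to fetch the whole burst.
            self.misses += 1
            self.line = self.ram[base:base + BURST_WORDS]
            self.tag = base
        return self.line[addr - base]
```

Reading 64 consecutive words incurs 16 misses, one per burst. Alternating between two addresses in different bursts, such as code and data, misses on every access, because the single line keeps being evicted.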