Porting an arcade core to the Turbo Chameleon 64 – Part 1 – 2020-02-26
There are many, many arcade re-creation cores in existence now – and only a handful have been ported to the Turbo Chameleon 64. When I discovered that Rampage exists for both MiSTer and MiST my interest was piqued, because this is a game I played on the Amiga as a kid, and while the Amiga version’s not a bad conversion, the original arcade game is significantly better.
The MiST board is quite similar in some ways to the original Minimig. It has an FPGA which is connected to the onboard memory and the video and audio outputs, and a separate microcontroller which is connected to the keyboard, mouse, SD card and joystick ports. The microcontroller is responsible for loading the FPGA core file from SD card, and then communicating with the core via an SPI interface, providing input data, ROM uploads, drive emulation, etc.
The Turbo Chameleon 64 is quite different. It has the same FPGA (Cyclone III with 25,000 logic elements – or in the case of the Chameleon 64 V2, a Cyclone 10LP, which is basically the same thing in a different package), and 32 megabytes of 16-bit-wide SDRAM with the same layout (13 bits of row address, 9 bits of column address). However, input devices and the SD card are purely the core’s responsibility, so to port a core from the MiST we need to replace the missing functionality of the MiST’s microcontroller.
Cores on the Chameleon are stored in flash, and there’s a certain amount of space in each flash slot that can be used for associated ROMs – but the protocol for accessing it is complicated enough that I don’t fancy trying to implement it purely in logic. If you have to go to the lengths of adding a soft microcontroller to a project you might just as well load the data from SD card, so that’s what I’ve done with most of my projects so far.
Initially I wanted to implement a soft-replica of the microcontroller within the FPGA core – which is what was done for the Minimig core. I may well still do this at some point, but as you will soon see there are difficulties with using that approach for Rampage…
The first thing to do is to take a look at the MiST version of the core in question and see how big it is, and whether we’re going to have any difficulty making it fit the Chameleon. The original core can be found in Gehstock’s repository at https://github.com/Gehstock/Mist_FPGA – the one in question is the “Midway MCR 3 Monoboard” core, which supports three games. For now I’m only going to focus on Rampage, though. So having downloaded and built the core, we can take a look at the build report – specifically the “Fitter -> Resource Section” part of the report. The front page says:
Total logic elements: 13,160 / 24,624 ( 53 % )
Total memory bits: 464,096 / 608,256 ( 76 % )
Well that’s encouraging – plenty of logic elements left, and also plenty of memory. But wait – look more closely: deeper within the report we see:
M9Ks: 63 / 66 ( 95 % )
That’s not so good – that means we have only three M9K memory blocks to spare. Each block holds 9,216 bits (608,256 / 66), so basically a kilobyte. If we’re going to add a soft microcontroller to deal with the SD card then we’re going to need, realistically, at least 8K of ROM, which means 8 M9Ks spare – and many soft CPUs require at least one memory block for their internal registers.
If nearly all the blocks are in use, but there are still plenty of memory bits free, then the core as written must be making less than optimal use of the memory blocks – and if we look more closely at the fitter report’s “Resource utilisation by entity” section we can see how many memory bits each entity is using. Of particular interest are the entities “gen_ram:sprlinebuf1a” through “gen_ram:sprlinebuf2b”: four separate RAM blocks of just 1024 bits each, but each using a whole M9K block. Even worse, gen_ram:palette needs just 576 bits but uses up a whole M9K.
It is possible to trade logic usage for ram usage, by defining a “ramstyle” attribute, so let’s do that. In the gen_ram definition, the RAM itself is defined like so:
type ramDef is array(addressRange) of std_logic_vector((dWidth-1) downto 0);
signal ram: ramDef;
So we’ll copy this to a new VHDL file, rename the entity from gen_ram to gen_ram_logic, and immediately after the RAM definition, add the lines:
attribute ramstyle : string;
attribute ramstyle of ram : signal is "logic";
We now change the mcr3mono.vhd file to use the new gen_ram_logic entity instead of gen_ram for the five RAMs identified above, and we now have five extra M9Ks available after building, at the cost of about 4,500 extra logic elements used.
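For reference, here is a minimal sketch of what the complete gen_ram_logic file might look like. Only the ramDef type, the ram signal and the two attribute lines come from the text above; the generics, the port names and the read/write process are my assumptions about a typical single-port synchronous RAM, so check them against the real gen_ram before copying anything.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Sketch only: generic and port names are assumed, not taken from the core.
entity gen_ram_logic is
  generic (
    dWidth : integer := 8;   -- data width in bits
    aWidth : integer := 10   -- address width in bits
  );
  port (
    clk  : in  std_logic;
    we   : in  std_logic;
    addr : in  std_logic_vector((aWidth-1) downto 0);
    d    : in  std_logic_vector((dWidth-1) downto 0);
    q    : out std_logic_vector((dWidth-1) downto 0)
  );
end entity;

architecture rtl of gen_ram_logic is
  subtype addressRange is integer range 0 to ((2**aWidth)-1);
  type ramDef is array(addressRange) of std_logic_vector((dWidth-1) downto 0);
  signal ram: ramDef;

  -- Ask Quartus to build this array from logic cells instead of an M9K.
  attribute ramstyle : string;
  attribute ramstyle of ram : signal is "logic";
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(unsigned(addr))) <= d;
      end if;
      -- Registered read: q presents the word at addr one clock later.
      -- Because ram is a signal, a read during a write returns the old word.
      q <= ram(to_integer(unsigned(addr)));
    end if;
  end process;
end architecture;
```

Keeping the interface identical to gen_ram means the change in mcr3mono.vhd is just a swap of the entity name on each of the five instances. It also makes it easy to switch individual RAMs back if the extra logic usage turns out to be a problem.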
There’s one other trick we can use to free up an M9K: gen_ram:sprite_ram and gen_ram:sprite_ram_cache are both single-port RAMs of 4096 bits. We can combine them into a dual-port RAM of 8192 bits, which would then only use a single M9K. We just need to add an extra address bit, which we tie low for the first RAM and high for the second.
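A rough sketch of that combined RAM follows. The entity name sprite_ram_pair, the port names and the 512 x 8 organisation of each half are all assumptions on my part (the report only tells us that each RAM is 4096 bits). The shared-variable style follows the usual Quartus pattern for inferring a true dual-port RAM with a single clock.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Sketch only: names and the 512 x 8 geometry per half are assumptions.
entity sprite_ram_pair is
  generic (
    dWidth : integer := 8;
    aWidth : integer := 9    -- address width of EACH original RAM
  );
  port (
    clk    : in  std_logic;
    -- Port A takes the place of the first RAM (lower half).
    we_a   : in  std_logic;
    addr_a : in  std_logic_vector((aWidth-1) downto 0);
    d_a    : in  std_logic_vector((dWidth-1) downto 0);
    q_a    : out std_logic_vector((dWidth-1) downto 0);
    -- Port B takes the place of the second RAM (upper half).
    we_b   : in  std_logic;
    addr_b : in  std_logic_vector((aWidth-1) downto 0);
    d_b    : in  std_logic_vector((dWidth-1) downto 0);
    q_b    : out std_logic_vector((dWidth-1) downto 0)
  );
end entity;

architecture rtl of sprite_ram_pair is
  -- One array twice the size of either original RAM.
  type ramDef is array(0 to (2**(aWidth+1))-1) of
    std_logic_vector((dWidth-1) downto 0);
  shared variable ram : ramDef;

  signal full_addr_a : std_logic_vector(aWidth downto 0);
  signal full_addr_b : std_logic_vector(aWidth downto 0);
begin
  -- The extra address bit: tied low for port A, high for port B,
  -- so the two ports can never touch the same word.
  full_addr_a <= '0' & addr_a;
  full_addr_b <= '1' & addr_b;

  process (clk)
  begin
    if rising_edge(clk) then
      if we_a = '1' then
        ram(to_integer(unsigned(full_addr_a))) := d_a;
      end if;
      q_a <= ram(to_integer(unsigned(full_addr_a)));
    end if;
  end process;

  process (clk)
  begin
    if rising_edge(clk) then
      if we_b = '1' then
        ram(to_integer(unsigned(full_addr_b))) := d_b;
      end if;
      q_b <= ram(to_integer(unsigned(full_addr_b)));
    end if;
  end process;
end architecture;
```

One thing to watch with this style: because ram is a shared variable, a read during a write on the same port returns the new data, whereas a signal-based RAM returns the old word. If the sprite logic ever reads back the address it is writing in the same cycle, that difference would need checking.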