Porting an arcade core to the Turbo Chameleon 64 – Part 1 – 2020-02-26
There are many, many arcade re-creation cores in existence now – and only a handful have been ported to the Turbo Chameleon 64. When I discovered that Rampage exists for both MiSTer and MiST my interest was piqued, because this is a game I played on the Amiga as a kid, and while the Amiga version’s not a bad conversion, the original arcade game is significantly better.
I was looking today at ways of improving the throughput of the EightThirtyTwo CPU. The design as it stands is very simple, and makes no attempt to perform result forwarding or instruction fusing. These are both strategies for improving the performance of certain constructs, and I wasn’t sure which of these two techniques I should use.
In brief, without either mechanism implemented, when the CPU encounters code such as:
    li 0
    mr r0
it has to wait until the first instruction has finished writing to the tmp register before moving its new contents into the pipeline, and only then finally writing it to r0.
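To make the cost of that wait concrete, here is a toy cycle-counting sketch in Python. It is not a model of the real EightThirtyTwo pipeline: the latencies (3 cycles before a written register can be read without forwarding, 1 cycle with it) and the names `Instr` and `total_cycles` are assumptions picked purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    reads: frozenset   # registers this instruction needs before it can start
    writes: frozenset  # registers this instruction produces


def total_cycles(program, result_latency):
    """Count cycles to issue a program in order, one instruction per cycle at best.

    result_latency is a made-up figure: the number of cycles after an
    instruction starts before a register it writes can be read by a later one.
    """
    ready = {}   # register -> first cycle at which its new value is readable
    cycle = 0
    for ins in program:
        # Stall until every source register is readable.
        start = max([cycle] + [ready.get(r, 0) for r in ins.reads])
        for r in ins.writes:
            ready[r] = start + result_latency
        cycle = start + 1
    return cycle


# "li 0" writes tmp; "mr r0" reads tmp and writes r0.
program = [
    Instr("li 0", frozenset(), frozenset({"tmp"})),
    Instr("mr r0", frozenset({"tmp"}), frozenset({"r0"})),
]

no_forwarding = total_cycles(program, result_latency=3)    # assumed latency
with_forwarding = total_cycles(program, result_latency=1)  # assumed latency
print(no_forwarding, with_forwarding)
```

Under these assumed latencies the pair takes 4 cycles without forwarding and 2 with it. Instruction fusing would attack the same stall from the other direction, by treating the pair as a single operation so that the dependency never appears in the pipeline at all.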
In my last post I touched briefly on the 832a assembler which I wrote as the first part of my solution to improving the code density of compiled C code.
An assembler that takes a single source file and spits out a ready-to-run binary file is not particularly difficult to write, but it’s not particularly useful either – in order to be useful we need to be able to link together multiple code modules.
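As a rough illustration of what linking multiple modules involves, here is a minimal two-pass sketch in Python. It has nothing to do with the actual 832a object format: the module layout (`code`, `exports`, `refs`), the `link` function and the use of 32-bit little-endian absolute addresses are all assumptions made for the sake of the example.

```python
import struct


def link(modules, base=0):
    """Concatenate modules and patch cross-module references (toy format)."""
    # Pass 1: give each module a load address and collect exported symbols.
    symbols = {}
    starts = []
    addr = base
    for mod in modules:
        starts.append(addr)
        for name, offset in mod["exports"].items():
            if name in symbols:
                raise ValueError(f"duplicate symbol: {name}")
            symbols[name] = addr + offset
        addr += len(mod["code"])

    # Pass 2: copy each module's code, then patch every reference with the
    # final address of the symbol it names.
    image = bytearray()
    for mod, start in zip(modules, starts):
        image += mod["code"]
        for offset, name in mod["refs"]:
            if name not in symbols:
                raise ValueError(f"undefined symbol: {name}")
            struct.pack_into("<I", image, start - base + offset, symbols[name])
    return bytes(image), symbols


# Two hypothetical modules: main refers to a symbol defined in lib.
main_mod = {"code": bytes(8), "exports": {"_start": 0}, "refs": [(4, "puts")]}
lib_mod = {"code": b"\xaa" * 4, "exports": {"puts": 0}, "refs": []}

image, symbols = link([main_mod, lib_mod])
print(symbols["puts"], image[4:8].hex())
```

The point of the two passes is that a reference in one module can only be filled in once every module's final position is known, which is exactly the information a single-file assembler never has.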
I’ve joked a few times in this series about being too lazy to write an assembler – but it would be more true to say that the stop-gap solution I was using was adequate, so my time was better spent on the more enjoyable aspects of the project. I am now feeling the limitations of using the GNU assembler to produce a bytestream for a target it knows nothing about, and to improve either the performance or code density of the vbcc backend’s output any further, I need to address the problem I’ve had so far with cross-module references…