Things are getting interesting with the ZPU core, and it’s now pretty much where I want it for inclusion into other projects.
For use as a ROM loader in the PACE project I needed something that wouldn’t eat either too many logic elements or Block RAMs, and the ZPU looked to be the ideal candidate. It may also be possible to use it as an IDE-to-SD Card bridge for certain older projects that expect to run from Compact Flash card, but time will tell.
My first experiments with loading files from SD card worked out fairly well, but the code size was quite large – I was struggling to stay within 8KB (16 M4K blocks!), and had not a hope of squeezing it into 4KB. A big part of the problem was the emulation block at the beginning of the ROM. The first kilobyte or so in a ZPU program is devoted to software implementations of the optional instructions. Whether this code is ever used depends upon which variant of the CPU core is in use, and also which compiler flags are supplied when building the program, so my goal over the last few days has been to eliminate the need for emulated instructions in the Boot ROM without bloating the ZPU itself too much.
It’s possible to prevent the compiler emitting some of the optional instructions, by way of the -mno-??? switches. A complete list can be found by typing “zpu-elf-gcc –target-help”.
The first snag is that the pre-compiled toolchain has a bug which prevents the -mno-neg switch from working. I’ve submitted a patch, and the repo here contains the fix, but there’s no fixed binary as yet.
The -mno-*shift* flags also don’t work, and cause GCC to throw an internal error instead of emitting equivalent code. This means that in order to remove the emulation table I have no choice but to implement the shift instructions in HDL. This I have now done, along with a few other instructions. If I ask the compiler not to emit a select few instructions, like so:
-mno-poppcrel -mno-pushspadd -mno-callpcrel -mno-byteop -mno-shortop -mno-neg
Then I can get away with omitting the exception table, provided I don’t use division or modulo in the Boot ROM.
In combination with enabling “relaxation” while linking, by supplying the following argument to the linker:
the Boot ROM is now under 4k.
The Boot ROM currently loads a splash screen into SDRAM, then loads and runs dhry.bin, which does a Dhrystone benchmark and prints the result to the RS232 serial port.
The CPU itself now takes just under 965 logic elements, runs at 133Mhz and turns in a shade over 1.5 DMIPS when running from uncached SDRAM.
The entire SoC with CPU, SDRAM controller, VGA controller, SPI and UART takes 2001 logic elements.
For anyone interested in playing with this, source is on my GitHub page, and I’ve also made a binary snapshot for the DE1 board, based on this tag.
(Put dhry.bin and splash.raw on an SD card, and upload the .sof file to the DE1 board.)