I’ve been playing some more with the small version of the ZPU core, and have successfully integrated it into a cut-down version of my previous MiniSOC project. The “official” small core only supports BlockRAM access, with external access reserved for IO. I’ve reversed this so that only the stack is in the CPU core’s internal BlockRAM, and program data comes from external RAM.
In addition, I’ve added optional hardware implementations of a few of the emulated instructions, namely mult (the Cyclone II has hardware multipliers, so why not use them?), eq, eqbranch and neqbranch. I think I can add the comparison instructions without bloating the core too much, as well.
By way of a benchmark, I’ve written a simple framebuffer test which simply writes a pattern of ascending longwords into the framebuffer. With the optional instructions disabled, this achieves about 1.9 frames per second, and the CPU takes up 621 logic elements.
With eq and eqbranch/neqbranch in hardware, the frame rate goes up to about 5.25, and the core takes 781 logic elements.
There’s lots still to do – reading from SDRAM is untested, and writes are currently always 32-bit – but the project is available here (currently DE1 toplevel only) for anyone who might be interested.