I wrote once before about the HelloWorld example in my ZPUDemos repository. This demo used the ZPU processor and a minimal UART to send the archetypal “Hello World!” message to the serial port, and used just 684 logic elements to do so.
I’ve revisited the project, and added a new demo to ZPUDemos, which shows how the ZPU’s size can be reduced even further.
By bringing a copy of the zpu_config.vhd file into the local project directory and modifying some of the constants within it, we can reduce the size of the CPU’s internal address registers. This saves quite a bit of logic: unused address bits can’t be optimised out automatically, because addresses can be written to and read back from the Stack RAM.
Since the HelloWorld ROM is between 1 and 2KB in size, we need 11 address bits (10 downto 0), plus one extra bit to provide some IO space, so we set maxAddrBitIncIO to 11. Because we’re using the de-facto ZPU convention of letting the MSB designate IO space, any operations that don’t involve IO can use a max bit of 10, so we leave maxAddrBit set to maxAddrBitIncIO-1.
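As a rough illustration of that arithmetic, here’s a small Python model (not the ZPU’s VHDL) that derives the two constants from a ROM size and checks the IO-select bit. The helper names are made up for this sketch; only maxAddrBitIncIO and maxAddrBit correspond to constants in zpu_config.vhd, and the `narrow` step assumes the narrowed registers simply drop any address bits above maxAddrBitIncIO.

```python
def zpu_address_constants(rom_bytes: int) -> tuple[int, int]:
    """Hypothetical helper: return (maxAddrBitIncIO, maxAddrBit) for a ROM size."""
    # Bits needed to address every byte of the ROM; 1-2KB needs 11 (10 downto 0).
    rom_bits = (rom_bytes - 1).bit_length()
    # One extra bit sits just above the ROM range to designate IO space.
    max_addr_bit_inc_io = rom_bits
    # Non-IO accesses only need the bits below the IO-select bit.
    max_addr_bit = max_addr_bit_inc_io - 1
    return max_addr_bit_inc_io, max_addr_bit


def is_io(addr: int, max_addr_bit_inc_io: int) -> bool:
    """De-facto ZPU convention: the top address bit selects IO space."""
    return (addr >> max_addr_bit_inc_io) & 1 == 1


def narrow(addr: int, max_addr_bit_inc_io: int) -> int:
    """Model a narrowed address register: bits above maxAddrBitIncIO are dropped."""
    return addr & ((1 << (max_addr_bit_inc_io + 1)) - 1)


inc_io, max_bit = zpu_address_constants(1536)       # a ROM between 1 and 2KB
print(inc_io, max_bit)                              # 11 10
print(is_io(0x800, inc_io), is_io(0x7FC, inc_io))   # True False
print(hex(narrow(0xFFFFF804, inc_io)))              # 0x804
```

The `narrow` function is the point of the exercise: with a 12-bit register, a full 32-bit address pushed through the Stack RAM keeps only bits 11 downto 0, so the upper 20 bits of register and adder logic are never built.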
This change brings the size of the ZPU itself down from 549 to 462 logic elements. This, combined with adjusting the peripheral code to deal with the narrower address space and disabling the UART RX, brings the entire HelloWorld project from 692 logic elements (there have been tweaks and bug fixes since the previous result of 684 was achieved) down to 546!
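To put those numbers in proportion, a trivial calculation of the savings from the logic-element counts quoted above (the `saving` helper is just for illustration):

```python
def saving(before: int, after: int) -> tuple[int, float]:
    """Return (LEs saved, percentage saved) for a before/after logic-element count."""
    saved = before - after
    return saved, 100.0 * saved / before


cpu = saving(549, 462)      # ZPU core alone
project = saving(692, 546)  # whole HelloWorld project
print(f"ZPU core: {cpu[0]} LEs saved ({cpu[1]:.1f}%)")
print(f"HelloWorld project: {project[0]} LEs saved ({project[1]:.1f}%)")
```

That works out to 87 LEs (about 15.8%) on the core and 146 LEs (about 21.1%) on the whole project.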
Besides logic area, there is another benefit to trimming unused address logic: simplifying the logic – especially of the 32-bit adder which increments the program counter – relaxes the CPU’s timing requirements somewhat, and there’s a noticeable improvement in fmax as the address space narrows.
So the 549 LEs is a CPU core with one UART, internal memory, and maybe a timer?
How many RAM/ROM blocks is the synthesis tool using?
Sweet 😉
It’s very minimalist – just the CPU core and Tx-only UART. The ZPU doesn’t use a register file, and the combined ROM / Stack RAM uses 16384 bits of RAM, so 4 M4K blocks on the DE1.
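A quick sanity check of that block count, counting data bits only: each Cyclone II M4K block stores 4,096 data bits (4,608 including parity). How the RAM is actually split into blocks depends on the width and depth configuration the fitter picks, so treat this as a lower bound rather than a prediction of what Quartus will report.

```python
M4K_DATA_BITS = 4096  # data bits per Cyclone II M4K block (parity bits excluded)


def m4k_blocks_min(ram_bits: int) -> int:
    """Minimum number of M4K blocks needed to hold ram_bits of data."""
    return -(-ram_bits // M4K_DATA_BITS)  # ceiling division


ram_bits = 16384
print(ram_bits // 8, "bytes =", ram_bits // 32, "32-bit words")  # 2048 bytes = 512 32-bit words
print(m4k_blocks_min(ram_bits), "M4K blocks")                    # 4 M4K blocks
```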
And for entertainment, did you try running Dhrystone on it?
The Dhrystone ROM is significantly larger than the Hello World ROM, at 16KB, so I can’t run Dhrystone on the ZPU with exactly the same address width as the minimal Hello World. That only affects the fmax slightly, though, not the actual performance per MHz. With all the optional instructions disabled and running entirely from ROM/Stack RAM, the ZPU turns in about 1.38 DMIPS @ 100MHz.
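For reference, DMIPS figures are conventionally obtained by dividing the Dhrystones-per-second result by 1757, the score of the VAX 11/780 (the 1 MIPS reference machine), and dividing by the clock gives the DMIPS/MHz figure that the address width doesn’t change. A minimal sketch of both conversions; the function names are illustrative only.

```python
VAX_11_780_DHRYSTONES_PER_SEC = 1757  # reference machine defined as 1 DMIPS


def dmips(dhrystones_per_sec: float) -> float:
    """Convert a raw Dhrystone result into DMIPS."""
    return dhrystones_per_sec / VAX_11_780_DHRYSTONES_PER_SEC


def dmips_per_mhz(dmips_value: float, clock_mhz: float) -> float:
    """Normalise a DMIPS figure by clock speed."""
    return dmips_value / clock_mhz


print(dmips(1757))                                  # 1.0
print(f"{dmips_per_mhz(1.38, 100):.4f} DMIPS/MHz")  # 0.0138 DMIPS/MHz
```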
On the DE1’s Cyclone II I can clock the minimal variant at a shade under 140MHz and still meet timing, which gives me about 1.93 DMIPS.
If I ignore the timing reports and keep cranking up the clock speed until it stops working, I can get it up to 216MHz, giving me 2.99 DMIPS! (My constraints must be overly stringent; there must be a multicycle path I’ve missed.)
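Because the core runs entirely from on-chip ROM/Stack RAM on the same clock, DMIPS/MHz stays fixed and the DMIPS figure should scale linearly with the clock. A minimal sketch of that scaling, assuming the 1.38 DMIPS @ 100MHz result as the reference point (the `scale_dmips` name is made up here). Since 1.38 is itself rounded, the results can differ from the quoted figures in the last digit: this gives 1.93 at 140MHz and 2.98 at 216MHz.

```python
def scale_dmips(dmips_ref: float, mhz_ref: float, mhz_new: float) -> float:
    """Scale a DMIPS result to a new clock, assuming constant DMIPS/MHz."""
    # Valid only while memory runs at the core clock with no added wait states.
    return dmips_ref * mhz_new / mhz_ref


for mhz in (140, 216):
    print(f"{mhz} MHz: {scale_dmips(1.38, 100, mhz):.2f} DMIPS")
```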