… can dance on the head of a pin?
I have no idea – but I do know how many ZPUFlex processors will fit in a DE1 dev board: 25! [Edit: I’ve since managed to squeeze in another one – so 26!]
OK, I’m stretching things a little to claim that 25 fully-featured ZPUFlex CPUs will fit on the DE1, because these have a really tiny ROM and very minimal supporting logic, so Quartus can optimise the hell out of the design – but it still seemed like a fun demo.
Source, as always, can be found on GitHub.
[EDIT: The M4K memory blocks on the DE1’s Cyclone II support 32-bit data width, but not in full dual-port mode, where the data width is limited to 16-bit. This means that no matter how small the ZPU’s ROM/StackRAM, it will occupy a minimum of 2 M4Ks. If it weren’t for this fact, I think the DE1 could take 3 further ZPUs. As it is, it’s possible to add just one more instance of the ZPU, and the code on github now reflects this. The record for the maxmium number of soft CPUs in a single project on the DE1 board is now 26. Unless you know diffferent!]
I wrote once before about the HelloWorld example in my ZPUDemos repository. This demo used the ZPU processor and a minimal UART to send the archetypal “Hello World!” message to the serial port, and used just 684 logic elements to do so.
I’ve revisited the project, and added a new demo to ZPUDemos, which shows how the ZPU’s size can be reduced even further.
By bringing a copy of the zpu_config.vhd file into the local project directory and modifying some of the constants within we can reduce the size of the CPU’s internal address registers. This saves quite some logic, since any unused address bits can’t be automatically optimised out, thanks to the fact that addresses can be written to and from the Stack RAM.
Since the HelloWorld ROM is between 1 and 2kb in size we need to use 11 address bits (10 downto 0), and allow one extra bit to produce some IO space, so we set maxAddrBitIncIO to 11. Since we’re using the de-facto ZPU convention of allowing the MSB to designate IO space, then any operations that don’t involve IO can use a max bit of 10 – thus we leave maxAddrBit set to maxAddrBitIncIO-1.
This change brings the size of the ZPU itself from 549 to 462 logic elements. This combined with adjusting the peripheral code to deal with the narrower address space, and disabling the UART RX brings the entire HelloWorld project from 692 logic elements (there have been tweaks and bug fixes since the previous result of 684 was achieved) down to 546!
Besides logic area, there is another benefit to trimming unused address logic: simplifying the logic – especially of the 32-bit adder which increments the program counter – relaxes the CPU’s timing requirements somewhat, and there’s a noticeable improvement in fmax as the address space narrows.
When I was debugging the ZPUFlex CPU core, I found myself using the ever-useful SignalTap to trace what was going on inside the CPU. One technique I wanted to use was to follow the program flow, and compare it against a simulated run through the program, thus spotting CPU bugs where the two diverged. To do this I needed a ZPU simulator.