More ZPU experiments

Since my last post I’ve been continuing to work on the ZPU small core, adding hardware implementations of the optional instructions while trying to keep the core as small as possible.

Certain instructions are closely related enough that once one is implemented the others come almost for free. Sub, Eq, Neq, Lessthan and Lessthanorequal come under that category.

The simplest way to compare two values is to subtract one from the other, and see whether the result is positive, negative or zero, so all five instructions need the result of a subtraction: (sp+1)-(sp).

Eq and Neq could be implemented as a direct comparison, but if the result of the subtraction is available anyway, we can compare that result against zero, which is theoretically more efficient. (There’s a good chance that even if we did a bitwise comparison, the synthesis tool would spot that the subtraction result’s available and use it anyway – the tools are very smart these days!)

Thus we perform the subtraction on every rising clock edge, like so:

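if COMPARISON_SUB=true then
	-- (sp+1) minus (sp), both sign-extended to 33 bits.
	-- The (sp+1) operand is assumed here to arrive on memBRead.
	comparison_sub_result <= unsigned(memBRead(wordSize-1)&memBRead)
		- unsigned(memARead(wordSize-1)&memARead);
end if;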

We make comparison_sub_result 33 bits wide rather than 32 because we need an extra bit for signed comparisons. If we were only doing unsigned comparisons we wouldn't need to worry about this.
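To see why the extra bit matters, here is a small Python model of the subtraction. It is only a sketch for checking the arithmetic, not code from the core, and the helper names (sign_extend_33, comparison_sub) are invented for illustration. It sign-extends both 32-bit operands by one bit, in the same way as memARead(wordSize-1)&memARead in the VHDL, and picks an operand pair where the truncated 32-bit result has the wrong sign but bit 32 is still correct.

```python
WORD_SIZE = 32
MASK32 = (1 << WORD_SIZE) - 1
MASK33 = (1 << (WORD_SIZE + 1)) - 1

def sign_extend_33(word):
    """Copy bit 31 of a 32-bit word into bit 32, giving a 33-bit value."""
    word &= MASK32
    sign = (word >> (WORD_SIZE - 1)) & 1
    return (sign << WORD_SIZE) | word

def comparison_sub(op_sp1, op_sp):
    """Model of the 33-bit result of (sp+1)-(sp), operands sign-extended."""
    return (sign_extend_33(op_sp1) - sign_extend_33(op_sp)) & MASK33

# As signed numbers this is (2^31 - 1) - (-2^31), which is positive,
# yet the low 32 bits alone have bit 31 set and so look negative.
result = comparison_sub(0x7FFFFFFF, 0x80000000)
bit31 = (result >> 31) & 1  # sign bit of the truncated 32-bit result
bit32 = (result >> 32) & 1  # sign bit of the full 33-bit result
print(hex(result), bit31, bit32)
```

Here bit31 is 1 while bit32 is 0, so only the 33-bit result reports the correct (positive) sign.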

Since we also need to know whether this result is zero to implement four of the instructions, we'll make a comparison_eq signal and assign it like so. This is done outside a clock edge, so it's combinatorial:

comparison_eq <= '0';	-- default; assumed, as is the assignment below
if COMPARISON_SUB=true and comparison_sub_result='0'&X"00000000" then
	comparison_eq <= '1';
end if;

The implementation of the instructions is surprisingly straightforward. Sub is the simplest:

when State_Sub =>
	memAAddr        <= sp;
	memAWriteEnable <= '1';
	memAWrite       <= comparison_sub_result(wordSize-1 downto 0);
	state           <= State_Fetch;

Eq and Neq are also fairly straightforward. The two instructions are complementary: Eq pushes 1 onto the stack if the two operands are equal, and 0 otherwise, while Neq reverses this, pushing 0 if the operands are equal and 1 otherwise. We could implement the two instructions separately, but it's also possible to implement them together, like so:

when State_EqNeq =>
	memAAddr <= sp;
	memAWriteEnable <= '1';
	memAWrite       <= (others =>'0');
	memAWrite(0) <= comparison_eq xor opcode(4);
	state <= State_Fetch;

The opcode for Eq is 46 ("101110"), and the opcode for Neq is 48 ("110000"), so to reverse the sense of the test for Neq we can use a simple exclusive-or against opcode bit 4, which is 0 for Eq and 1 for Neq.
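The exclusive-or trick is easy to check with a few lines of Python. This is a sketch of the logic only, using the two opcode values quoted above; the function name eq_neq_bit0 is made up for illustration, and equality is tested directly rather than through the subtraction result, which gives the same truth value for 32-bit operands.

```python
MASK32 = 0xFFFFFFFF
OPCODE_EQ = 0b101110   # 46, as quoted in the post
OPCODE_NEQ = 0b110000  # 48, as quoted in the post

def eq_neq_bit0(opcode, op_sp1, op_sp):
    """Model of bit 0 of the word written by State_EqNeq."""
    # Same truth value as "the subtraction result is zero".
    comparison_eq = int((op_sp1 & MASK32) == (op_sp & MASK32))
    opcode_bit4 = (opcode >> 4) & 1  # 0 for Eq, 1 for Neq
    return comparison_eq ^ opcode_bit4

for name, opcode in (("Eq", OPCODE_EQ), ("Neq", OPCODE_NEQ)):
    print(name, eq_neq_bit0(opcode, 7, 7), eq_neq_bit0(opcode, 7, 8))
```

For equal operands this gives 1 for Eq and 0 for Neq, and the reverse for unequal operands.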

Lessthan and Lessthanorequal are a bit trickier, because they perform signed comparison.
Unsigned comparison is simple - we just have to subtract one operand from the other, and check the highest bit of the result, which tells us whether the result underflowed. For signed comparison we can do exactly the same, but we need one extra bit's headroom in the subtraction result. Because the subtraction is (sp+1)-(sp), that highest bit is zero if (sp) is less than or equal to (sp+1), so we need to invert it. We also need to distinguish Lessthan from Lessthanorequal: when the operands are equal the inverted bit is 1, which is right for Lessthanorequal but has to be forced to 0 for Lessthan. The code for signed comparison ends up looking like this:

when State_Comparison =>
	memAAddr <= sp;
	memAWriteEnable <= '1';
	memAWrite <= (others => '0');
	memAWrite(0) <= not (comparison_sub_result(wordSize)
		xor (not opCode(0) and comparison_eq));
	state <= State_Fetch;
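
The expression above can be checked against ordinary signed comparison with a small Python model. This is a sketch under two assumptions: that Lessthan tests whether the top of stack (sp) is less than the next entry (sp+1), and that opcode bit 0 is 0 for Lessthan and 1 for Lessthanorequal, as the use of opCode(0) implies. The function names are invented for illustration.

```python
import itertools

WORD_SIZE = 32
MASK32 = (1 << WORD_SIZE) - 1
MASK33 = (1 << (WORD_SIZE + 1)) - 1

def sign_extend_33(word):
    """Copy bit 31 of a 32-bit word into bit 32."""
    word &= MASK32
    return (((word >> 31) & 1) << WORD_SIZE) | word

def to_signed(word):
    """Interpret a 32-bit pattern as a two's complement integer."""
    word &= MASK32
    return word - (1 << WORD_SIZE) if word >> 31 else word

def comparison_bit0(opcode_bit0, op_sp1, op_sp):
    """Model of bit 0 of the word written by State_Comparison."""
    diff = (sign_extend_33(op_sp1) - sign_extend_33(op_sp)) & MASK33
    sign = (diff >> WORD_SIZE) & 1          # comparison_sub_result(wordSize)
    comparison_eq = int(diff == 0)
    return 1 ^ (sign ^ ((opcode_bit0 ^ 1) & comparison_eq))

EDGE_CASES = [0, 1, 2, 0x7FFFFFFF, 0x80000000, 0x80000001, 0xFFFFFFFF]

def matches_signed_compare():
    """Compare the model with Python's signed < and <= on edge cases."""
    for op_sp1, op_sp in itertools.product(EDGE_CASES, repeat=2):
        top, nxt = to_signed(op_sp), to_signed(op_sp1)
        if comparison_bit0(0, op_sp1, op_sp) != int(top < nxt):
            return False
        if comparison_bit0(1, op_sp1, op_sp) != int(top <= nxt):
            return False
    return True

print(matches_signed_compare())
```

The edge cases include the most positive and most negative 32-bit values, which are the pairs where a 32-bit subtraction would overflow.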

So how does the core perform with these changes?
My current test case, which writes to the framebuffer in halfwords, performs like this:

  • Emulated sub, eq, lessthan, etc., and emulated eqbranch/neqbranch: 0.58 fps (579 logic elements)
  • Hardware sub, eq, lessthan, etc., and hardware eqbranch/neqbranch: 1.55 fps (745 logic elements)

Full source for the project can be found on github for anyone that's interested.

9 thoughts on “More ZPU experiments”

  1. Nice site with several interesting subjects. Question on the Minimig-C3 project. Even though I’ve read through the readme and todo.txt files I haven’t got it. Is the Minimig-C3 a perfectly working Minimig, just like the other variants? If not, what is missing? As I understand, some versions of Minimig are using SRAM and some SDRAM.. is this correct?

    • The original Minimig had 2 meg of SRAM and a real 68000 CPU, along with a separate microcontroller – either a PIC or an ARM – which handles such things as SD card access, and configuring the FPGA at poweron. It’s possible to hack another 2 meg onto the board by piggy-backing the chips.

      The DE1, Chameleon, MIST and C3-board versions of the Minimig all use SDRAM and Tobias Gubener’s TG68 soft CPU. The MIST has a separate ARM microcontroller, but the other SDRAM-based versions all use a second soft-CPU within the FPGA instead.

      The only feature missing from the SDRAM-based versions is that the Action Replay is currently broken. Apart from that, they’re fully-functional, and support real FastRAM (up to 24-meg of it) – unlike the original Minimig.

      The original is still slightly more compatible though.

      The C3 variant would be the easiest version to port to a new device.

      • Nice, I have a couple of FPGA boards, all of which connect to my own ArcadeExtender peripheral board.
        It has:
        – SD-Card/MMC connector
        – VGA connector – 12-bit/4096 colors
        – Stereo Sound – Line out connector
        – Joystick connector
        – PS/2 connector
        – MIDI In connector

        My FPGA-boards are:
        -Xilinx Spartan-3 Starterkit- 200K (too small for Minimig) and 1MB SRAM
        -Digilent Microblaze Starter kit – 1600E (Good FPGA and 64MB DDR memory)
        -Altera/Arrow BeMicroSDK – 64 MB Mobile DDR SDRAM

        Would it be possible with a simple rewrite of the Minimig-C3 for one of those boards? As the current project is Altera I would think that the BeMicroSDK would be a good target? What do you think?

        • I’ve never worked with DDR SDRAM, so I’ve no idea how difficult it will be to adapt the SDRAM controller. Apart from that, it should just be a case of changing the toplevel file – I’ve done my best to keep everything board-specific in the toplevel. The current C3 build takes about 18,500 logic elements, though – I don’t know how big the BeMicro FPGA is?

          • The BemicroSDK is populated with a “Cyclone IV EP4CE22F17C6N” which includes 22’300 LE so it will probably work and is quite close to the current target (24’624 LE).. Might be a tight fit depending on the conversion rate between SDR (560 LE) => DDR-mem. controller (??). Initial build/Synthesis with just swapping the device gives approx. of 19’600 LE so that looks ok. I’ll try to find a Mobile DDR-controller that works..

          • 22KLE should be fine – and I think if you disable SignalTap in the project you should be able to get the LE count down a bit further.

            As for the SDRAM controller, the Minimig’s needs are quite specialised – the SDRAM controller runs on a fixed 16-phase cycle at 113.5MHz, which allows it to remain in sync with the Amiga’s 7.09MHz clock – so whichever controller you use, you’ll need three r/w ports and a guaranteed response time for the “Chip” port.

            It may turn out to be easier to adapt the existing controller to double-pump the data – but as I say I’ve never played with DDR so I’m not entirely sure what’s involved.

  2. Thanks again for your insightful blog postings. I’m years late regarding this one, but I’m a bit confused regarding the following paragraph:

    “Unsigned comparison is simple – we just have to subtract one operand from the other, and check the highest bit of the result, which tells us whether the result underflowed. For signed comparison we can do exactly the same, but we need one extra bit’s headroom in the subtraction result.”

    Any chance that perhaps signed and unsigned are swapped there? Looking at

    it appears that

    – for signed comparisons it suffices to look at the sign-bit of the result
    – for unsigned comparisons you need one extra bit (“borrow-bit”) and look at that one.

    I’m currently implementing a RISC-V core (32 bit flavor “RV-32I”) as a VHDL exercise, trying to be rather compact in terms of LEs (much like I hope to get the CPU into roughly 1000 LEs) – your blog postings are a very valuable resource regarding frugal FPGA designs and certainly sparked my interest in this area!

    • You’re right that there’s some inaccuracy in here. However, you need the extra bit’s headroom in both signed and unsigned cases. Given two 32-bit operands, the ZPUFlex code currently zero-extends both operands to 33 bits, subtracts one from the other, and looks at result(32) to determine the result of the comparison. In the unsigned case this is sufficient – for signed comparison result(32) is exclusive-ored with (op1(31) xor op2(31)) which takes care of reversing the sense of the results for negative operands.

      Good luck with the project, and let me know if you come up with a better way! 🙂

      • Thanks for your reply! Seems I misread my own table yesterday (d’oh, it’s been in plain sight!), I now see that you indeed need the extra headroom also in the signed case. It appears that the way of xor-ing the extension-bit with the sign-bits of the operands is indeed the most efficient way!
