Some real-life code…

The EightThirtyTwo ISA – Part 7 – 2019-09-26

Last time I said I would show some real-life code for this new ISA. I’ve already shown some very simple “Hello World” code, and I’m now in the slow, careful process of implementing a CPU to run this – and exploring the wonderful world of HDL simulation in the process! But in the meantime, by far the best way to get a feel for how well an instruction set works is to write some actual code for it, and see which issues and limitations you bump into.

One of the projects I’ve been playing with recently involved the f32c MIPS-compatible CPU core, which is rather nice, and very fast when running from block RAM. (I’ve forked it just to fix a minor initialisation bug, which only matters if you want the initial PC to be something other than zero.) With its 32-bit MIPS instruction set, however, this core has rather poor code density, and I found myself needing more block RAM than I wanted just for the boot code.

I was able to bring some SD-card boot code down to under 4KB using LZ4 compression, so I needed some lightweight LZ4 decompression code – and I happened upon Arnaud Carré’s lz4-68k project on GitHub, which includes several 68K assembly decompressors, the smallest of which is a mere 74 bytes long.
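LZ4’s block format is simple enough that a decompressor really can be tiny. Each sequence starts with a token byte. The high nibble is a literal count and the low nibble is a match length minus 4, and a nibble of 15 means extra length bytes follow. Then come the literals, a 16-bit little-endian back-reference offset, and any match-length extension bytes. The Python below is a minimal reference model of that format, written for illustration. It is not a translation of Arnaud Carré’s code, and the function name lz4_block_decompress is made up. It assumes a single raw, headerless block with no frame header or checksums.

```python
def lz4_block_decompress(src: bytes) -> bytes:
    """Decompress one headerless LZ4 block (no frame header, no checksums)."""
    dst = bytearray()
    pos = 0

    def read_length(nibble: int) -> int:
        # A nibble of 15 means "more length bytes follow": each byte is added,
        # and a byte of 255 means yet another byte follows.
        nonlocal pos
        length = nibble
        if nibble == 15:
            while True:
                b = src[pos]
                pos += 1
                length += b
                if b != 255:
                    break
        return length

    while pos < len(src):
        token = src[pos]
        pos += 1

        # High nibble: number of literal bytes to copy straight through.
        lit_len = read_length(token >> 4)
        dst += src[pos:pos + lit_len]
        pos += lit_len

        # The final sequence carries literals only, with no match part.
        if pos >= len(src):
            break

        # 16-bit little-endian offset back into the already-decoded output.
        offset = src[pos] | (src[pos + 1] << 8)
        pos += 2
        if offset == 0 or offset > len(dst):
            raise ValueError("invalid match offset")

        # Low nibble: match length, stored minus the minimum match of 4.
        match_len = read_length(token & 15) + 4

        # Copy byte by byte so overlapping matches (offset < length) work.
        start = len(dst) - offset
        for i in range(match_len):
            dst.append(dst[start + i])

    return bytes(dst)


block = bytes([0x35]) + b"abc" + bytes([3, 0, 0x10]) + b"!"
print(lz4_block_decompress(block))  # b'abcabcabcabc!'
```

The byte-at-a-time match copy is what makes overlapping references work. An offset of 1 with a long length, for example, repeats a single byte. The assembly routines copy matches one byte per iteration for the same reason.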

Since I’m familiar with 68K assembly, and can fumble my way through MIPS assembly too, I transliterated this routine into MIPS, and while it works, it’s some 204 bytes in size thanks to the 32-bit code words. Someone with a better grasp of MIPS assembly may well be able to make it smaller, of course.

Anyhow, having transliterated the routine once already, it seemed like an ideal program to translate into EightThirtyTwo assembly, so that’s what I’ve done!

This program decompresses some (headerless) LZ4 data and outputs it to a UART – or rather the shell window, when using 832Sim (based on my earlier ZPUSim project). The code looks like this (assembled using the GNU assembler and a bunch of #defines in an include file, since, as mentioned in a previous post, I’m too lazy to write an assembler…):


#include "assembler.pp"

start:
    // Setup the stack
    li      IMW2(0x1000)
    li      IMW1(0x1000)
    li      IMW0(0x1000)
    mr      r6

    // Setup source and destination pointers
    li      IMW1(compressed-start)
    li      IMW0(compressed-start)
    mr      r0
    li      IMW1(decompressed-start)
    li      IMW0(decompressed-start)
    mr      r1
    mr      r2

    // Call the depack routine
    li      IMW0(PCREL(lz4_depack))
    add     r7

    li      0
    stbinc  r1      // Write a zero-termination.

    // Now write the depacked buffer to UART.
    li      IMW1(decompressed-start)
    li      IMW0(decompressed-start)
    mr      r1

    li      IMW1(0xffffffc0)    // UART register
    li      IMW0(0xffffffc0)
    mr      r0

.txwait:
    li      IMW1(0x100)         // TX Ready flag
    li      IMW0(0x100)
    mr      r2
    ld      r0
    and     r2
    cond    EQ
    li      IMW0(PCREL(.txwait))
    add     r7

    ldbinc  r1
    cond    NEQ
    st      r0
    li      IMW0(PCREL(.txwait))
    add     r7

    cond    NEX     // terminate simulation

// r0 packed buffer
// r1 destination pointer
// r2 packed buffer end

lz4_depack:
    stdec   r6
    li      PCREL(.tokenLoop)
    add     r7

.lenOffset:
    ldbinc  r0
    mr      r3
    li      8
    ror     r3
    ldbinc  r0
    or      r3
    li      24
    ror     r3
    mt      r4
    mr      r5
    mt      r1
    mr      r4
    mt      r3
    sub     r4

    li      IMW1(PCREL(.readLen-1))
    li      IMW0(PCREL(.readLen))
    add     r7

    li      4
    add     r5

.copy:
    ldbinc  r4
    stbinc  r1
    li      1
    sub     r5
    cond    NEQ
    li      IMW0(PCREL(.copy))
    add     r7

.tokenLoop:
    ldbinc  r0
    mr      r4
    mr      r5
    li      15
    and     r4
    li      4
    shr     r5
    cond    EQ
    li      IMW1(PCREL(.lenOffset-1))
    li      IMW0(PCREL(.lenOffset))
    add     r7

    li      IMW0(PCREL(.readLen))
    add     r7

.litCopy:
    ldbinc  r0
    stbinc  r1
    li      1
    sub     r5
    cond    NEQ
    li      IMW0(PCREL(.litCopy))
    add     r7

    mt      r2
    cmp     r0
    cond    SGT
    li      IMW1(PCREL(.lenOffset-1))
    li      IMW0(PCREL(.lenOffset))
    add     r7

.over:
    ldinc   r6
    mr      r7

.readLen:
    stdec   r6
    li      15
    cmp     r5
    cond    NEQ
    li      IMW0(PCREL(.readEnd))
    add     r7

.readLoop:
    ldbinc  r0
    mr      r3
    add     r5
    li      IMW1(255)
    li      IMW0(255)
    xor     r3
    cond    EQ
    li      IMW0(PCREL(.readLoop))
    add     r7

.readEnd:
    ldinc   r6
    mr      r7

compressed:
    .incbin "compressed.lz4"
decompressed:
    .fill   550,1,-1    // Reserve space for the decompressed data
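The least obvious part of the listing is probably the way .lenOffset builds LZ4’s 16-bit little-endian match offset from two byte loads with rotates and an OR. The Python model below traces that sequence. It assumes that ror rotates the 32-bit register right by the amount held in the temporary register, and that ldbinc zero-extends the byte it loads. The names ror32 and read_offset are illustrative only, not part of any toolchain.

```python
MASK32 = 0xFFFFFFFF


def ror32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def read_offset(first: int, second: int) -> int:
    """Model the .lenOffset byte-assembly sequence (assumed ISA semantics)."""
    r3 = first            # ldbinc r0 / mr r3: first byte, zero-extended
    r3 = ror32(r3, 8)     # li 8 / ror r3: first byte moves to bits 31..24
    r3 |= second          # ldbinc r0 / or r3: second byte lands in bits 7..0
    r3 = ror32(r3, 24)    # li 24 / ror r3: same as rotating left by 8
    return r3             # first byte now in bits 7..0, second in bits 15..8


print(hex(read_offset(0x34, 0x12)))  # 0x1234
```

Under those assumptions, the rotate-by-24 at the end serves as a rotate-left-by-8. This saves shifting the second byte separately, and it leaves the first byte in the low half and the second in the high half, which is exactly the little-endian value LZ4 expects.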

I was pleasantly surprised by how comfortable this instruction set is to use – obviously it has its quirks, but I didn’t feel as though I was fighting against its limitations too much. But given that my main reason for exploring this is to evaluate code density, how does it stack up against 68k and MIPS? The answer is “surprisingly well”! The 68k original was 74 bytes, the MIPS transliteration was 204 bytes, and the EightThirtyTwo version is a mere 72 bytes, beating even the 68k original!
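To put those numbers side by side, here’s a trivial Python calculation of each routine’s size relative to the EightThirtyTwo version, using only the byte counts quoted above. The instruction count for MIPS assumes the 204-byte routine is pure code with no embedded data.

```python
# Byte counts quoted in the post for the same LZ4 depack routine.
sizes = {
    "68k (original)": 74,
    "MIPS (transliteration)": 204,
    "EightThirtyTwo": 72,
}

baseline = sizes["EightThirtyTwo"]
for name, size in sizes.items():
    ratio = size / baseline
    print(f"{name:<24} {size:>4} bytes  {ratio:.2f}x the EightThirtyTwo size")

# Every instruction in the 32-bit MIPS encoding is 4 bytes, so (assuming no
# embedded data) 204 bytes of code corresponds to 51 instructions.
mips_instructions = sizes["MIPS (transliteration)"] // 4
print(f"MIPS instruction count: {mips_instructions}")
```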
