Some real-life code…

The EightThirtyTwo ISA – Part 7 – 2019-09-26

Last time I said I would show some real-life code for this new ISA. I’ve already shown some very simple “Hello World” code, and I’m now in the slow, careful process of implementing a CPU to run this – and exploring the wonderful world of HDL simulation in the process! But in the meantime, by far the best way to get a feel for how well an instruction set works is to write some actual code for it, and see which issues and limitations you bump into.

One of the projects I’ve been playing with recently involved the f32c MIPS-compatible CPU core, which is rather nice, and very fast when running from block RAM. (I’ve forked it just to fix a minor initialisation bug which is only relevant if you want the initial PC to be something other than zero.) With the 32-bit MIPS instruction set, however, this core has rather poor code density, and I found myself needing to use more block RAM than I wanted just for the boot code.

I was able to bring some SD-card boot code down to under 4KB using LZ4 compression, and thus needed some lightweight LZ4 decompression code – and I happened upon Arnaud Carré’s lz4-68k project on GitHub, which includes some 68K assembly decompressors, the smallest of which is a mere 74 bytes long.

Since I’m familiar with 68K assembly, and can fumble my way through MIPS assembly too, I transliterated this routine into MIPS, and while it works, it’s some 204 bytes in size thanks to the 32-bit code words. Someone with a better grasp of MIPS assembly may well be able to make it smaller, of course.

Anyhow, having transliterated the routine once already, it seemed like an ideal program to translate into EightThirtyTwo assembly, so that’s what I’ve done!

This program decompresses some (headerless) LZ4 data and outputs it to a UART – or rather the shell window, when using 832Sim (based on my earlier ZPUSim project). The code looks like this (assembled using the GNU assembler and a bunch of #defines in an include file since, as mentioned in a previous post, I’m too lazy to write an assembler…):



#include "assembler.pp"


start:
	// Setup the stack
	li	IMW2(0x1000)
	li	IMW1(0x1000)
	li	IMW0(0x1000)
	mr	r6

	// Setup source and destination pointers
	li	IMW1(compressed-start)
	li	IMW0(compressed-start)
	mr	r0

	li	IMW1(decompressed-start)
	li	IMW0(decompressed-start)
	mr	r1
	mr	r2	// The packed data ends where the output buffer begins

	// Call the depack routine

	li	IMW0(PCREL(lz4_depack))
	add	r7

	li	0
	stbinc	r1	// Write a zero-termination.

	// Now write the depacked buffer to UART.

	li	IMW1(decompressed-start)
	li	IMW0(decompressed-start)
	mr	r1

	li	IMW1(0xffffffc0)	// UART register
	li	IMW0(0xffffffc0)
	mr	r0

.txwait:
	li	IMW1(0x100)	// TX Ready flag
	li	IMW0(0x100)
	mr	r2
	ld	r0
	and	r2
	cond	EQ
	  li	IMW0(PCREL(.txwait))
	  add	r7

	ldbinc	r1
	cond	NEQ
	  st	r0
	  li	IMW0(PCREL(.txwait))
	  add	r7

	cond	NEX	// terminate simulation


//	r0 packed buffer
//	r1 destination pointer
//	r2 packed buffer end

lz4_depack:
	stdec	r6
	li	PCREL(.tokenLoop)
	add	r7
			
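.lenOffset:
	// Read the 16-bit little-endian match offset into r3
	ldbinc	r0
	mr	r3
	li	8
	ror	r3	// Low byte now sits in bits 31-24
	ldbinc	r0
	or	r3	// High byte in bits 7-0
	li	24
	ror	r3	// Rotate into place: r3 = (high << 8) | low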

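	mt	r4
	mr	r5	// r5 = match length nibble from the token

	mt	r1
	mr	r4
	mt	r3
	sub	r4	// r4 = destination - offset, i.e. the match source

	li	IMW1(PCREL(.readLen-1))
	li	IMW0(PCREL(.readLen))
	add	r7	// Call .readLen to extend the length if the nibble was 15

	li	4
	add	r5	// Matches are at least four bytes long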
.copy:
	ldbinc	r4
	stbinc	r1
	li	1
	sub	r5
	cond	NEQ
	  li	IMW0(PCREL(.copy))
	  add	r7
			
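.tokenLoop:	
	ldbinc	r0	// Fetch the token byte
	mr	r4
	mr	r5
	li	15
	and	r4	// r4 = match length (low nibble)
	li	4
	shr	r5	// r5 = literal length (high nibble)
	cond	EQ	// No literals: go straight to the match
	  li	IMW1(PCREL(.lenOffset-1))
	  li	IMW0(PCREL(.lenOffset))
	  add	r7

	li	IMW0(PCREL(.readLen))
	add	r7	// Call .readLen to extend the literal length if needed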

.litCopy:
	ldbinc	r0
	stbinc	r1
	li	1
	sub	r5
	cond	NEQ
	  li	IMW0(PCREL(.litCopy))
	  add	r7

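	mt	r2
	cmp	r0	// Compare the source pointer with the end of the packed data
	cond	SGT	// Data remaining: a match follows the literals
	  li	IMW1(PCREL(.lenOffset-1))
	  li	IMW0(PCREL(.lenOffset))
	  add	r7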
			
.over:
	ldinc	r6
	mr	r7

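.readLen:
	stdec	r6	// Save the return address
	li	15
	cmp	r5	// Only a length nibble of 15 has extension bytes
	cond	NEQ
	  li	IMW0(PCREL(.readEnd))
	  add	r7

.readLoop:
	ldbinc	r0
	mr	r3
	add	r5	// Add the extension byte to the length
	li	IMW1(255)
	li	IMW0(255)
	xor	r3
	cond	EQ	// A byte of 255 means another one follows
	  li	IMW0(PCREL(.readLoop))
	  add	r7

.readEnd:
	ldinc	r6	// Pop the return address...
	mr	r7	// ...and jump back to the caller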

compressed:
	.incbin "compressed.lz4"
decompressed:
	.fill	550,1,-1  // Reserve space for the decompressed data
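
For anyone who doesn't read EightThirtyTwo assembly, here is a rough Python model of what the lz4_depack routine above appears to do. It's a sketch rather than a translation: the function names are my own invention, it assumes a well-formed headerless LZ4 block, and it does no validation of offsets or lengths. The structure follows the listing, though: split the token into two nibbles, copy any literals, stop if the packed data is exhausted, otherwise read a 16-bit little-endian offset and copy the match a byte at a time (so overlapping matches work).

```python
def read_extended_length(src, pos, length):
    """Add length-extension bytes to `length`; a byte below 255 ends the run."""
    while True:
        byte = src[pos]
        pos += 1
        length += byte
        if byte != 255:
            return length, pos


def lz4_block_decompress(src):
    """Decompress one headerless LZ4 block (well-formed input assumed)."""
    out = bytearray()
    pos = 0
    end = len(src)
    while True:
        token = src[pos]
        pos += 1
        literal_len = token >> 4   # high nibble
        match_len = token & 15     # low nibble

        if literal_len:
            if literal_len == 15:
                literal_len, pos = read_extended_length(src, pos, literal_len)
            out += src[pos:pos + literal_len]
            pos += literal_len
            if pos >= end:
                break              # the final sequence holds literals only

        # 16-bit little-endian offset, counted back from the write position
        offset = src[pos] | (src[pos + 1] << 8)
        pos += 2
        if match_len == 15:
            match_len, pos = read_extended_length(src, pos, match_len)
        match_len += 4             # matches are at least four bytes long

        copy_from = len(out) - offset
        for _ in range(match_len): # byte by byte, so overlaps repeat correctly
            out.append(out[copy_from])
            copy_from += 1
    return bytes(out)
```

A hand-built block such as bytes([0x35]) + b"abc" + bytes([3, 0, 0x50]) + b"hello" exercises both paths: three literals, a nine-byte overlapping match at offset 3, then a final run of five literals.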

I was pleasantly surprised by how comfortable this instruction set is to use – obviously it has its quirks, but I didn’t feel as though I was fighting against its limitations too much. But given that my main reason for exploring this is to evaluate code density, how does it stack up against 68k and MIPS? The answer is “surprisingly well”! The 68k original was 74 bytes, the MIPS transliteration was 204 bytes, and the EightThirtyTwo version is a mere 72 bytes, beating even the 68k original!
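
To put those figures side by side, a few lines of Python will express each size relative to the 68k original. The byte counts are simply the ones quoted above; the dictionary and function names are made up for this illustration.

```python
# Byte counts as quoted in the post; nothing here is measured independently.
sizes_in_bytes = {"68k": 74, "MIPS": 204, "EightThirtyTwo": 72}


def relative_to(baseline, sizes):
    """Return each size as a multiple of the baseline entry."""
    base = sizes[baseline]
    return {name: size / base for name, size in sizes.items()}


ratios = relative_to("68k", sizes_in_bytes)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x the 68k size")
```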

Retro Ramblings

Musings on FPGA and Retro Computing
