{"id":1294,"date":"2019-09-26T22:07:37","date_gmt":"2019-09-26T22:07:37","guid":{"rendered":"http:\/\/retroramblings.net\/?p=1294"},"modified":"2019-09-26T22:07:37","modified_gmt":"2019-09-26T22:07:37","slug":"some-real-life-code","status":"publish","type":"post","link":"http:\/\/retroramblings.net\/?p=1294","title":{"rendered":"Some real-life code&#8230;"},"content":{"rendered":"\n<p><strong>The EightThirtyTwo ISA &#8211; Part 7 &#8211; 2019-09-26<\/strong><\/p>\n\n\n\n<p>Last time I said I would show some real-life code for this new ISA.  I&#8217;ve already shown some very simple &#8220;Hello World&#8221; code, and I&#8217;m now in the slow, careful process of implementing a CPU to run this &#8211; and exploring the wonderful world of HDL simulation in the process!  But in the meantime, by far the best way to get a feel for how well an instruction set works is to write some actual code for it, and see which issues and limitations you bump into.<\/p>\n\n\n\n<!--more-->\n\n\n\n<p>One of the projects I&#8217;ve been playing with recently involved the <a href=\"https:\/\/github.com\/robinsonb5\/f32c\">f32c<\/a> MIPS-compatible CPU core, which is rather nice, and very fast when running from block RAM.  (I&#8217;ve forked it just to fix a minor initialisation bug which is only relevant if you want the initial PC to be something other than zero.)  
With the 32-bit MIPS instruction set, however, this core has rather poor code density, and I found myself needing more block RAM than I wanted just for the boot code.<\/p>\n\n\n\n<p>I was able to bring some SD-card boot code down to under 4KB using LZ4 compression, and thus needed some lightweight LZ4 decompression code &#8211; and I happened upon Arnaud Carr\u00e9&#8217;s <a href=\"https:\/\/github.com\/arnaud-carre\/lz4-68k\">lz4-68k<\/a> project on GitHub, which includes some 68k assembly decompressors, the smallest of which is a mere 74 bytes long.<\/p>\n\n\n\n<p>Since I&#8217;m familiar with 68k assembly, and can fumble my way through MIPS assembly too, I transliterated this routine into MIPS, and while it works, it&#8217;s some 204 bytes in size thanks to the 32-bit code words.  Someone with a better grasp of MIPS assembly may well be able to make it smaller, of course.<\/p>\n\n\n\n<p>Anyhow, having transliterated the routine once already, it seemed like an ideal program to translate into EightThirtyTwo assembly, so that&#8217;s what I&#8217;ve done!<\/p>\n\n\n\n<p>This program decompresses some (headerless) LZ4 data and outputs it to a UART &#8211; or rather the shell window, when using 832Sim (based on my earlier ZPUSim project).  
The code looks like this (assembled using the GNU assembler and a bunch of #defines in an include file, since as mentioned in a previous post, I&#8217;m too lazy to write an assembler&#8230;):<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><br>\n#include \"assembler.pp\"\n\n\nstart:\n\t\/\/ Setup the stack\n\tli\tIMW2(0x1000)\n\tli\tIMW1(0x1000)\n\tli\tIMW0(0x1000)\n\tmr\tr6\n\n\t\/\/ Setup source and destination pointers\n\tli\tIMW1(compressed-start)\n\tli\tIMW0(compressed-start)\n\tmr\tr0\n\n\tli\tIMW1(decompressed-start)\n\tli\tIMW0(decompressed-start)\n\tmr\tr1\n\tmr\tr2\n\n\t\/\/ Call the depack routine\n\n\tli\tIMW0(PCREL(lz4_depack))\n\tadd\tr7\n\n\tli\t0\n\tstbinc\tr1\t\/\/ Write a zero-termination.\n\n\t\/\/ Now write the depacked buffer to UART.\n\n\tli\tIMW1(decompressed-start)\n\tli\tIMW0(decompressed-start)\n\tmr\tr1\n\n\tli\tIMW1(0xffffffc0)\t\/\/ UART register\n\tli\tIMW0(0xffffffc0)\n\tmr\tr0\n\n.txwait:\n\tli\tIMW1(0x100)\t\/\/ TX Ready flag\n\tli\tIMW0(0x100)\n\tmr\tr2\n\tld\tr0\n\tand\tr2\n\tcond\tEQ\n\t  li\tIMW0(PCREL(.txwait))\n\t  add\tr7\n\n\tldbinc\tr1\n\tcond\tNEQ\n\t  st\tr0\n\t  li\tIMW0(PCREL(.txwait))\n\t  add\tr7\n\n\tcond\tNEX\t\/\/ terminate simulation\n\n\n\/\/\tr0 packed buffer\n\/\/\tr1 destination pointer\n\/\/\tr2 packed buffer end\n\nlz4_depack:\n\tstdec\tr6\n\tli\tPCREL(.tokenLoop)\n\tadd\tr7\n\t\t\t\n.lenOffset:\n\tldbinc\tr0\n\tmr\tr3\n\tli\t8\n\tror\tr3\n\tldbinc\tr0\n\tor\tr3\n\tli\t24\n\tror\tr3\n\n\tmt\tr4\n\tmr\tr5\n\n\tmt\tr1\n\tmr\tr4\n\tmt\tr3\n\tsub\tr4\n\n\tli\tIMW1(PCREL(.readLen-1))\n\tli\tIMW0(PCREL(.readLen))\n\tadd\tr7\n\n\tli\t4\n\tadd\tr5\n.copy:\n\tldbinc\tr4\n\tstbinc\tr1\n\tli\t1\n\tsub\tr5\n\tcond\tNEQ\n\t  li\tIMW0(PCREL(.copy))\n\t  add\tr7\n\t\t\t\n.tokenLoop:\t\n\tldbinc\tr0\n\tmr\tr4\n\tmr\tr5\n\tli\t15\n\tand\tr4\n\tli\t4\n\tshr\tr5\n\tcond\tEQ\n\t  li\tIMW1(PCREL(.lenOffset-1))\n\t  li\tIMW0(PCREL(.lenOffset))\n\t  
add\tr7\n\n\tli\tIMW0(PCREL(.readLen))\n\tadd\tr7\n\n.litCopy:\n\tldbinc\tr0\n\tstbinc\tr1\n\tli\t1\n\tsub\tr5\n\tcond\tNEQ\n\t  li\tIMW0(PCREL(.litCopy))\n\t  add\tr7\n\n\tmt\tr2\n\tcmp\tr0\n\tcond\tSGT\n\t  li\tIMW1(PCREL(.lenOffset-1))\n\t  li\tIMW0(PCREL(.lenOffset))\n\t  add\tr7\n\t\t\t\n.over:\n\tldinc\tr6\n\tmr\tr7\n\n.readLen:\n\tstdec\tr6\n\tli\t15\n\tcmp\tr5\n\tcond\tNEQ\n\t  li\tIMW0(PCREL(.readEnd))\n\t  add\tr7\n\n.readLoop:\n\tldbinc\tr0\n\tmr\tr3\n\tadd\tr5\n\tli\tIMW1(255)\n\tli\tIMW0(255)\n\txor\tr3\n\tcond\tEQ\n\t  li\tIMW0(PCREL(.readLoop))\n\t  add\tr7\n\n.readEnd:\n\tldinc\tr6\n\tmr\tr7\n\ncompressed:\n\t.incbin \"compressed.lz4\"\ndecompressed:\n\t.fill\t550,1,-1  \/\/ Reserve space for the decompressed data\n\n<\/pre>\n\n\n\n<p>I was pleasantly surprised by how comfortable this instruction set is to use &#8211; obviously it has its quirks, but I didn&#8217;t feel as though I was fighting against its limitations too much.  But given that my main reason for exploring this is to evaluate code density, how does it stack up against 68k and MIPS?  The answer is &#8220;surprisingly well&#8221;!  The 68k original was 74 bytes, the MIPS transliteration was 204 bytes, and the EightThirtyTwo version is a mere 72 bytes, beating even the 68k original!<br><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The EightThirtyTwo ISA &#8211; Part 7 &#8211; 2019-09-26 Last time I said I would show some real-life code for this new ISA. 
I&#8217;ve already shown some very simple &#8220;Hello World&#8221; code, and I&#8217;m now in the slow, careful process of &hellip; <a href=\"http:\/\/retroramblings.net\/?p=1294\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4,8],"tags":[],"class_list":["post-1294","post","type-post","status-publish","format-standard","hentry","category-fpga","category-hardware"],"_links":{"self":[{"href":"http:\/\/retroramblings.net\/index.php?rest_route=\/wp\/v2\/posts\/1294","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/retroramblings.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/retroramblings.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/retroramblings.net\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/retroramblings.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1294"}],"version-history":[{"count":1,"href":"http:\/\/retroramblings.net\/index.php?rest_route=\/wp\/v2\/posts\/1294\/revisions"}],"predecessor-version":[{"id":1295,"href":"http:\/\/retroramblings.net\/index.php?rest_route=\/wp\/v2\/posts\/1294\/revisions\/1295"}],"wp:attachment":[{"href":"http:\/\/retroramblings.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1294"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/retroramblings.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1294"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/retroramblings.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1294"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}