The EightThirtyTwo ISA – Part 22 – 2020-05-04
What better way to celebrate Star Wars Day than with a multi-CPU code density shoot-out?! (Well, OK, most people can probably name a dozen better ways without even trying – but this is how I’m choosing to spend it!)
I was curious to know just how well the code density of EightThirtyTwo code generated by the VBCC backend stacks up against other architectures, so I compiled a fixed codebase, namely the OSD control module from one of the many Minimig variants, targeting 832, m68k, MIPS, OpenRISC, RISC-V, ARM, ZPU, and even i386 and x86-64. The results make for interesting reading…
Firstly, I should point out that while the 832 code is compiled with VBCC, I used GCC for all the other target architectures. In all cases I used the -Os optimisation setting and the -ffunction-sections and -fdata-sections flags when compiling, and the -Wl,--gc-sections and -Wl,--relax flags when linking. This ensures that no ‘dead’ code is incorporated into the final binaries, and that references are optimised to their smallest possible form at link time. These settings best mimic what the 832 assembler and linker do.
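As a rough illustration of how those flags fit together, here is a small Python sketch that assembles the compile and link command lines for a single target. The compiler name, the source file names and the helper `build_commands` are hypothetical; only the flags themselves (and the `-march=rv32imac` example, which is the RISC-V compressed setting mentioned in the comments) come from the post.

```python
import shlex

# Compile-time flags from the post: optimise for size, and place every
# function and data object in its own section so unused ones can be dropped.
COMPILE_FLAGS = ["-Os", "-ffunction-sections", "-fdata-sections"]
# Link-time flags, passed through to the linker: discard unreferenced
# sections and relax references to their shortest form where possible.
LINK_FLAGS = ["-Wl,--gc-sections", "-Wl,--relax"]


def build_commands(cc, arch_flags, sources, output):
    """Return (compile_cmds, link_cmd) as argument lists.

    `cc` and `arch_flags` are hypothetical placeholders for whichever
    cross-compiler and architecture options a given target needs.
    """
    objects = [src.rsplit(".", 1)[0] + ".o" for src in sources]
    compile_cmds = [
        [cc, *arch_flags, *COMPILE_FLAGS, "-c", src, "-o", obj]
        for src, obj in zip(sources, objects)
    ]
    link_cmd = [cc, *arch_flags, *LINK_FLAGS, *objects, "-o", output]
    return compile_cmds, link_cmd


if __name__ == "__main__":
    # Hypothetical RISC-V compressed build; the compiler and file names are
    # assumptions, only -march=rv32imac is taken from the post.
    compiles, link = build_commands(
        "riscv32-unknown-elf-gcc", ["-march=rv32imac"],
        ["osd.c", "main.c"], "firmware.elf",
    )
    for cmd in compiles + [link]:
        print(shlex.join(cmd))
```

Keeping the per-section compile flags and the linker's section garbage collection together matters: either one on its own does little to remove dead code.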
The code in question is a bunch of C code which bangs hardware registers, handles reading from the SD card and interfaces with Minimig’s on-screen menu system. It takes very little porting from one CPU to another, so while I haven’t actually run the compiled code on the various CPUs, I have no reason to believe that it wouldn’t work, or that it is in any way incomplete.
It’s entirely possible that compiler settings could be tuned to improve the results for any of these architectures – I haven’t attempted to do that. Where an architecture has various different versions available, I’ve simply used the one that best suits whichever soft CPU I would choose if I were to use that architecture.
The sizes, in bytes, produced for the various architectures were as follows (in descending order of size):
- OpenRISC – 81376
- MIPS (f32c) – 71356
- RISC-V – 69936
- ZPU – 68868
- ARM – 67952
- X86-64 – 66112
- m68k (68000) – 65760
- i386 – 64080
- 832 – 63599
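To put those numbers in proportion, the following sketch expresses each binary’s size as a percentage difference from the 832 build. The sizes are exactly the ones listed above; the helper name `relative_to` is just for illustration.

```python
# Binary sizes in bytes, as listed above (uncompressed encodings).
SIZES = {
    "OpenRISC": 81376,
    "MIPS (f32c)": 71356,
    "RISC-V": 69936,
    "ZPU": 68868,
    "ARM": 67952,
    "X86-64": 66112,
    "m68k (68000)": 65760,
    "i386": 64080,
    "832": 63599,
}


def relative_to(baseline, sizes):
    """Percentage by which each binary is larger (+) or smaller (-) than `baseline`."""
    base = sizes[baseline]
    return {name: 100.0 * (size - base) / base for name, size in sizes.items()}


if __name__ == "__main__":
    # Print largest first, matching the order of the list above.
    for name, pct in sorted(relative_to("832", SIZES).items(), key=lambda kv: -kv[1]):
        print(f"{name:14s} {pct:+6.1f}%")
```

Run as-is, it shows OpenRISC coming out roughly 28% larger than 832, while i386 is within 1% of it.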
So, that’s a pretty clear victory for EightThirtyTwo, yes?
Well… yes and no. Given that only one CPU on the list has a smaller logic footprint (ZPU), I’m pretty pleased with that result – and I’m actually pleasantly surprised to be beating i386 here, given that there’s been 30-odd years’ experience in optimising compilers for that architecture. However, what I haven’t taken into account yet are compressed architectures:
MIPS, RISC-V and ARM all have subsets of their instruction sets which can be expressed in 16-bit rather than 32-bit words – and the file sizes when built for those architectures are impressively small:
- RISC-V compressed – 57780
- MIPS16 – 54192
- ARM Thumb – 51436
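The saving each compressed encoding buys can be worked out the same way. This sketch assumes each compressed build is fairly compared against its uncompressed counterpart from the first list, which may not hold exactly for MIPS16, since the plain MIPS figure is for the f32c variant.

```python
# (uncompressed, compressed) sizes in bytes, taken from the two lists above.
PAIRS = {
    "RISC-V -> RISC-V compressed": (69936, 57780),
    "MIPS (f32c) -> MIPS16": (71356, 54192),
    "ARM -> ARM Thumb": (67952, 51436),
}
SIZE_832 = 63599


def saving(before, after):
    """Percentage reduction in size going from `before` to `after` bytes."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    for name, (full, compressed) in PAIRS.items():
        print(f"{name:28s} {saving(full, compressed):5.1f}% smaller")
    # How much smaller the best compressed build is than the 832 build.
    thumb = PAIRS["ARM -> ARM Thumb"][1]
    print(f"ARM Thumb vs 832: {saving(SIZE_832, thumb):.1f}% smaller")
```

That works out to about 17% for RISC-V, about 24% for both MIPS16 and Thumb, and leaves Thumb roughly 19% smaller than the 832 build.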
So ARM Thumb is clearly the winner here, by a significant margin! Nonetheless, I don’t have the option of using Thumb code in my FPGA projects – nor even MIPS16. I could, if I wished, use RISC-V compressed (with picorv32), and the other soft CPUs I could easily deploy are m68k (TG68), ARM (Amber), OpenRISC (or1200 or mor1kx), MIPS (f32c), ZPU (ZPUFlex) or – of course – 832.
Very interesting project, but especially the above tests!
Some questions & requests, if you want to answer:
1) VBCC supports some of the ISAs that you tested: why didn’t you use it instead of GCC? Could you repeat the test with VBCC, so we can get some further information on both VBCC’s code quality and how it compares with GCC?
2) Could you report the exact ISA variant used for RISC-V (G?), ARM (v7?), RISC-V compressed (GC?) and ARM Thumb (Thumb-2)?
3) It would be nice to have the number of generated instructions, since that is an important KPI, and from it you could also calculate the average instruction length (which is always 1 byte in your case, for 832 :D).
4) Similarly, the total number of dynamic (executed) instructions is also an important KPI. Closely related to this is the number of dynamic instruction bytes (which, in your case with 832, matches the total number of dynamic instructions).
5) Do you plan to use a standard test suite? Embench is getting more attention in the embedded world, so it could be of interest in your area.
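The two KPIs suggested in points 3 and 4 reduce to simple ratios. The sketch below assumes you already have the byte length of every instruction (for the static metric) and an execution trace of program-counter values (for the dynamic one); the post does not say that any of the toolchains produce these, so both inputs are hypothetical.

```python
def average_instruction_length(lengths):
    """Static metric: total code bytes divided by the number of instructions.

    `lengths` holds the byte length of every instruction in the binary
    (hypothetical input, e.g. gathered from a disassembly listing).
    """
    return sum(lengths) / len(lengths)


def dynamic_metrics(trace, length_at):
    """Dynamic metrics from an execution trace.

    `trace` is a sequence of program-counter values, one per executed
    instruction; `length_at` maps each address to its instruction length in
    bytes. Both are hypothetical inputs (e.g. from a simulator).

    Returns (executed instruction count, instruction bytes fetched,
    average bytes per executed instruction).
    """
    count = len(trace)
    fetched = sum(length_at[pc] for pc in trace)
    return count, fetched, fetched / count


if __name__ == "__main__":
    # A tiny loop in which every instruction is one byte long, as on 832,
    # so the bytes fetched equal the number of instructions executed.
    lengths_832 = {0: 1, 1: 1, 2: 1}
    print(dynamic_metrics([0, 1, 2, 1, 2, 1, 2], lengths_832))  # (7, 7, 1.0)
```

For an encoding that mixes 16-bit and 32-bit instructions, such as Thumb-2 or RISC-V compressed, the same functions give an average somewhere between 2 and 4 bytes.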
Thanks for the interest! In response:
1) I used GCC simply because the 68k version of the firmware was already compiled with that, and I’d already ported it to ZPU, also using GCC, so translating the codebase to other GCC targets was the easiest option. (My goal wasn’t really to do anything rigorously scientific – it was more a smoke test to make sure 832 wasn’t going to create code with even worse density than OpenRISC.) Repeating the tests with VBCC is a great idea – I’ll do that at some point.
2) I’m not familiar enough with ARM or RISC-V to be sure which variants I was using, but the compiler flags were -march=rv32ima for RISC-V, -march=rv32imac for RISC-V compressed, and -mthumb for ARM Thumb. For regular ARM I didn’t specify an ISA, so it’s whatever GCC 9.2.1 defaults to.
3) Yes, indeed, that’s a good idea, but I’m not sure how to obtain that information. (832’s linker tends to mix rodata and code in the binary, since that makes the data fairly efficient to access – but it makes counting the number of code bytes tricky, so I need to make it output some more detailed statistics.)
4) Again, yes, interesting information, and again I’m not sure how to measure it.
5) That would be good. Embench is new to me – I will explore that, thanks!
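For the GCC targets, one possible way to get the static instruction count is to parse the disassembly printed by GNU `objdump -d`, which lists each instruction’s address, encoded bytes and mnemonic. This is a rough sketch under that assumption: data interleaved with code (such as ARM literal pools, or the rodata that 832’s linker mixes in) will be counted as if it were code, and it does not help for 832 itself unless its tools can emit a comparable listing.

```python
import re

# Matches a GNU objdump -d instruction line, for example
#   "   10074:\t00a00513          \tli\ta0,10"
# Group 1 is the encoded-bytes field; group 2 (optional) is the mnemonic.
INSN_LINE = re.compile(r"^\s*[0-9a-fA-F]+:\t([^\t]*)(?:\t(.*))?$")


def count_instructions(listing):
    """Return (instruction_count, code_bytes) for an objdump -d listing.

    A line with bytes but no mnemonic is treated as a continuation of the
    previous instruction (objdump wraps long x86 encodings this way), so its
    bytes are added without bumping the instruction count.
    """
    count = 0
    code_bytes = 0
    for line in listing.splitlines():
        match = INSN_LINE.match(line)
        if not match:
            continue  # section headers, symbol labels, blank lines
        hex_field, mnemonic = match.groups()
        # Bytes may be printed as words ("00a00513"), halfwords ("f04f 0000")
        # or single bytes ("48 b8"); there are two hex digits per byte in every case.
        code_bytes += len("".join(hex_field.split())) // 2
        if mnemonic:
            count += 1
    return count, code_bytes
```

Feeding it the output of, say, `arm-none-eabi-objdump -d` for the firmware ELF, and dividing the byte total by the instruction count, gives the average instruction length.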
I have to thank you, because I discovered many things which are useful to me, especially VBCC (I knew of it, but I wasn’t aware of how simple it is to add a new backend). 🙂
Some replies and feedback on a few points.
2) ARM defaults to ARMv7 with a VFPv3 FPU, which is the most widespread 32-bit ISA. Your RISC-V configurations are mostly the common ones, except that you didn’t use the FPU and a few memory control extensions.
3) I don’t know if VBCC has an option to compute the number of generated instructions; it isn’t a usual option for a compiler. Maybe VBCC or VASM could be hacked or improved to add this useful metric.
4) This requires some performance counter registers to be used when running the benchmark. Some profiling tools do the job, but I don’t think you have anything like that for your 832.
It looks like 68k and x86 incur some penalty with GCC’s default settings: https://amigaworld.net/modules/newbb/viewtopic.php?topic_id=44169&start=80&post_id=854850&order=0&viewmode=flat&pid=854810&forum=17#854801
Could you please recompile your application for them with -fomit-frame-pointer?
Oh, that’s an interesting point – I hadn’t considered that. I don’t have the machine and test rig that I used for these tests any more but I will repeat the experiment at some point with the latest VBCC backend and the latest version of the Minimig OSD firmware.