References and a design flaw

The EightThirtyTwo ISA – Part 19 – 2020-03-08

In my experiments a few days ago I noticed an odd problem with the EightThirtyTwo toolchain – namely that the following construct worked just fine:

char string[]="Hello, world!";

but the following didn’t:

char *string="Hello, world!";

Looking at the code generated for the second construct, I see that it is of the form:

_string:
.ref _characters
_characters:
.byte <char1>....

Looks reasonable enough – but after some investigation I realised that the problem arises because references are initially given a size of zero, and are not assigned their actual size until link time, when the smallest size required to represent the reference becomes known. The problem occurs only when there is nothing other than references between two labels; when that happens, the two labels have the same cursor position within the code buffer, so the linker can’t tell that the references belong between them! The label “_characters” is thus assigned the same address as the label “_string”, even though there should be four bytes between them after linking.
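To make the ambiguity concrete, here is a minimal sketch in C. It is a simplified model and not the actual linker code: the struct layouts, field names and the label_address() helper are all invented for illustration. Labels and references live in separate lists and each records only a cursor position, so when everything sits at cursor 0 neither tie-breaking rule puts the reference between the two labels.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified records; not the real toolchain's structures. */
struct label { const char *name;   int cursor; };
struct ref   { const char *target; int cursor; int size; };

/* Final address of a label: its cursor position plus the size of every
   reference judged to precede it.  'inclusive' selects whether a reference
   at the *same* cursor position counts as preceding the label. */
static int label_address(const struct label *l, const struct ref *refs,
                         size_t nrefs, int inclusive)
{
    assert(l != NULL);
    int addr = l->cursor;
    for (size_t i = 0; i < nrefs; ++i) {
        if (refs[i].cursor < l->cursor ||
            (inclusive && refs[i].cursor == l->cursor))
            addr += refs[i].size;
    }
    return addr;
}
```

With “_string”, the reference and “_characters” all at cursor 0 and a four-byte reference, the exclusive rule places both labels at address 0 and the inclusive rule places both at address 4. The cursor alone simply doesn’t carry enough information to separate them.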

My first attempt at fixing this failed dismally. My idea was to insert a single placeholder byte into the code stream for each reference, and step over this placeholder at link time. It soon became clear that this would just lead to an unholy tangle of code, if I could get it working at all – so another approach was needed.

Despite the fact that I’m using the same “struct symbol” for symbols and references, each section previously contained a separate list of each, with code from each being merged at link time (this was probably the least elegant and readable part of the linker). I decided instead to place both symbols and references in a single list, which guarantees that they’ll remain in the correct order. The flags within the symbol structure were already sufficient to determine whether any particular “struct symbol” was in fact a symbol or a reference. Making this change turned out to be significantly easier than I thought, and made the code somewhat simpler too – always an indication that you’re on the right track!
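The single-list idea can be sketched roughly as follows. Again this is only an illustration under assumptions: the entry struct, the ENTRY_IS_REFERENCE flag and assign_addresses() are hypothetical names, not the real “struct symbol” or linker routines. The point is that list order, rather than cursor position, decides what comes before what.

```c
#include <assert.h>
#include <stddef.h>

#define ENTRY_IS_REFERENCE 0x01u   /* hypothetical flag bit */

/* Hypothetical stand-in for the shared symbol/reference structure. */
struct entry {
    const char *name;
    unsigned    flags;
    int         cursor;   /* position in the code buffer */
    int         size;     /* references only: bytes needed once resolved */
    int         address;  /* filled in by assign_addresses() */
};

/* Walk the list in order.  Each entry's address is its cursor position plus
   the bytes contributed by references that appear earlier in the list.
   Returns the total number of bytes the references added. */
static int assign_addresses(struct entry *list, size_t n)
{
    assert(list != NULL || n == 0);
    int grown = 0;
    for (size_t i = 0; i < n; ++i) {
        list[i].address = list[i].cursor + grown;
        if (list[i].flags & ENTRY_IS_REFERENCE)
            grown += list[i].size;
    }
    return grown;
}
```

Feeding it the three entries from the example (the label “_string”, a four-byte reference to “_characters”, then the label “_characters”, all at cursor 0) leaves “_string” at address 0 and “_characters” at address 4, which is the four-byte gap the original scheme lost.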

Because this means the object format has changed, I’ve bumped the signature word from “832\01” to “832\02”, and the linker will now reject object files produced by previous versions. I’ve also taken the opportunity to widen the flags field of symbols from 8 to 16 bits. I don’t yet need the extra 8 bits, but the first eight are all used, so now is a good time to add some headroom.
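A version check along these lines might look like the sketch below. It rests on two assumptions that the text above doesn’t spell out: that “\02” denotes a single byte with value 2 (as a C octal escape would), and that the four signature bytes sit at the very start of the object file. The names OBJ_SIGNATURE and signature_ok() are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed layout: '8', '3', '2' followed by a one-byte format version. */
static const unsigned char OBJ_SIGNATURE[4] = { '8', '3', '2', 0x02 };

/* Returns non-zero only if the header starts with the current signature,
   so files written with the older version byte are rejected. */
static int signature_ok(const unsigned char *header, size_t len)
{
    assert(header != NULL);
    return len >= sizeof OBJ_SIGNATURE &&
           memcmp(header, OBJ_SIGNATURE, sizeof OBJ_SIGNATURE) == 0;
}
```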
