Threading Revisited

The EightThirtyTwo ISA – part 18 – 2020-02-19

Having added partial results forwarding I figured I’d better check that I hadn’t broken dual-thread mode in the process. (Spoiler alert: I had!)

My previous dual-thread experiments had been pretty simple, with the second thread just waiting for a signal from an interrupt handler and printing “Tick” or “Tock” to the terminal. This proved that the second thread was basically working but didn’t tell me anything about performance, so I decided to adapt the Dhrystone demo to run two instances simultaneously.

There are certain concurrency issues that must be taken care of here. The first thing thread 1’s “premain” function does is to clear the BSS (uninitialised data) section; thread 2 can’t be allowed to start until this process is complete, otherwise it could start making use of areas that haven’t been cleared yet, and find its work being wiped out.

The elegant solution to this would be to make thread 2 sleep with “cond NEX”, and have thread 1 wake it up somehow (a new “wake” instruction, probably – I still have a tiny bit of encoding space left) – but for now thread 2 simply spins, reading a flag over and over until thread 1 sets it.

Likewise, once thread 2 has completed the benchmark it must wait until thread 1 has finished reporting results before reporting its own. This is also done by busy-waiting on a flag, but could ultimately be done more elegantly by putting the second thread to sleep.
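As a rough sketch of the two busy-wait handshakes described above (this is not the actual startup code; every name here, including fake_bss and the two flags, is made up for illustration):

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

/* Hypothetical stand-in for the linker-defined BSS region. */
static unsigned char fake_bss[64];

/* Handshake states. Both are non-zero so that a typical toolchain places the
   flags in .data rather than in the BSS that is about to be cleared. That is
   an assumption about the toolchain, not something the post states. */
enum { NOT_YET = 1, DONE = 2 };

static atomic_int bss_state    = NOT_YET;  /* set by thread 1 after clearing BSS  */
static atomic_int report_state = NOT_YET;  /* set by thread 1 after it has reported */

/* Thread 1: clear BSS first, and only then release thread 2. */
void thread1_premain(void)
{
    memset(fake_bss, 0, sizeof fake_bss);
    atomic_store_explicit(&bss_state, DONE, memory_order_release);
}

/* Thread 1: call once its own results have been printed. */
void thread1_report_done(void)
{
    atomic_store_explicit(&report_state, DONE, memory_order_release);
}

/* Non-blocking polls: one iteration of each spin loop. */
int bss_is_clear(void)
{
    return atomic_load_explicit(&bss_state, memory_order_acquire) == DONE;
}

int thread1_has_reported(void)
{
    return atomic_load_explicit(&report_state, memory_order_acquire) == DONE;
}

/* Thread 2: spin until thread 1 has finished clearing BSS. */
void thread2_premain(void)
{
    while (!bss_is_clear()) { /* busy-wait */ }
    for (size_t i = 0; i < sizeof fake_bss; ++i)
        assert(fake_bss[i] == 0);  /* nothing of ours can be wiped now */
}

/* Thread 2: spin until thread 1 has finished reporting its results. */
void thread2_wait_to_report(void)
{
    while (!thread1_has_reported()) { /* busy-wait */ }
}
```

The sketch uses C11 atomics so that it is also correct on a hosted multi-core machine; on a small bare-metal target a plain volatile flag would probably be the nearest equivalent. The ordering is the important part: each flag is written only after the work it guards is complete.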

Needless to say, the dual-thread test didn’t work!

I finally tracked down two CPU bugs which were causing problems. The first has been present for a long time: I was always feeding thread 1’s “sign” modifier into the ALU, no matter which thread was running! This causes very subtle errors – where everything seems to work fine until a comparison suddenly gives a wrong result. Easily fixed, though.
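To show why this kind of bug stays hidden for so long, here is a small software model of the selection mistake. It is only a sketch in C, not the CPU’s actual HDL; sign_model, compare_buggy and compare_fixed are invented names, and index 0 stands for thread 1, index 1 for thread 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_THREADS 2

/* Per-thread "sign" modifier: true means the next compare is signed. */
typedef struct {
    bool sign[NUM_THREADS];
} sign_model;

/* ALU "less than": signed or unsigned depending on the modifier it is fed.
   The cast assumes the usual two's-complement conversion. */
bool alu_less_than(uint32_t a, uint32_t b, bool sign)
{
    return sign ? ((int32_t)a < (int32_t)b) : (a < b);
}

/* Buggy selection: always feeds the first thread's modifier to the ALU. */
bool compare_buggy(const sign_model *cpu, int running, uint32_t a, uint32_t b)
{
    (void)running;  /* ignored, which is the bug */
    return alu_less_than(a, b, cpu->sign[0]);
}

/* Fixed selection: feeds the modifier of whichever thread is running. */
bool compare_fixed(const sign_model *cpu, int running, uint32_t a, uint32_t b)
{
    return alu_less_than(a, b, cpu->sign[running]);
}

void sign_modifier_demo(void)
{
    /* First thread wants unsigned compares, second has asked for a signed one. */
    sign_model cpu = { .sign = { false, true } };
    uint32_t a = 0xFFFFFFFFu, b = 1u;  /* a is -1 when viewed as signed */

    assert(compare_fixed(&cpu, 1, a, b));   /* signed: -1 < 1 */
    assert(!compare_buggy(&cpu, 1, a, b));  /* unsigned: 0xFFFFFFFF < 1 is false */

    /* When both modifiers happen to agree, the bug is invisible. */
    cpu.sign[0] = true;
    assert(compare_buggy(&cpu, 1, a, b) == compare_fixed(&cpu, 1, a, b));
}
```

The wrong result only appears when the two threads’ modifiers differ at the moment of a comparison, and even then only for operands whose signed and unsigned orderings disagree.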

The second bug was related to the recent result-forwarding work. I realised when I implemented it that I had to make sure I didn’t accidentally forward results between threads; for this reason I disallowed thread switching while a matched pair of instructions were being executed. It turns out this isn’t sufficient, however – I hadn’t considered what happens when a thread sleeps with “cond NEX”. In that situation, control immediately switches to the other thread, and again, the CPU can accidentally forward a result from the wrong thread. To mitigate this I simply disable forwarding momentarily when a thread enters the sleep state.
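A very simplified model of that failure and its fix might look like the following. This is a guess at the general shape of the problem rather than a description of the real pipeline: the single forwarding latch, the one-register-per-thread layout and all names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical forwarding latch: holds the previous instruction's result. */
typedef struct {
    bool     valid;
    uint32_t value;
} forward_latch;

typedef struct {
    forward_latch fwd;
    int           running;      /* 0 or 1: which hardware thread is executing */
    uint32_t      reg[2];       /* one illustrative register per thread */
    bool          fix_enabled;  /* drop the forwarded result when a thread sleeps */
} fwd_model;

/* Write a result to the running thread's register and offer it for forwarding. */
void exec_write(fwd_model *cpu, uint32_t value)
{
    cpu->reg[cpu->running] = value;
    cpu->fwd.valid = true;
    cpu->fwd.value = value;
}

/* Read an operand: use the forwarded result if one is offered, else the register. */
uint32_t exec_read(fwd_model *cpu)
{
    uint32_t v = cpu->fwd.valid ? cpu->fwd.value : cpu->reg[cpu->running];
    cpu->fwd.valid = false;  /* a forwarded result is consumed once */
    return v;
}

/* The running thread sleeps; control passes immediately to the other thread. */
void exec_sleep(fwd_model *cpu)
{
    if (cpu->fix_enabled)
        cpu->fwd.valid = false;  /* the fix: no forwarding across the switch */
    cpu->running ^= 1;
}

/* First thread writes 111 and sleeps; second thread then reads its own register. */
uint32_t run_scenario(bool fix_enabled)
{
    fwd_model cpu = { .running = 0, .reg = { 0, 222 }, .fix_enabled = fix_enabled };
    exec_write(&cpu, 111);
    exec_sleep(&cpu);
    return exec_read(&cpu);
}

void forwarding_demo(void)
{
    assert(run_scenario(false) == 111);  /* bug: picks up the other thread's result */
    assert(run_scenario(true)  == 222);  /* fixed: reads its own register */
}
```

In this model, blocking thread switches during a matched pair would not help, because the sleep itself is what hands control to the other thread while a forwarded result is still pending.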

With those subtleties taken care of, I can run two instances of the Dhrystone demo simultaneously. So what’s the performance like?

Firstly, I was expecting the two threads to be comparable but still noticeably different, since thread 1 has priority – but when I ran the dual Dhrystone test under simulation I saw this:

DDhhrryyssttoonnee  BBeenncchhmmaarrkk,,  VVeerrssiioonn  22..11  ((LLaanngguuaaggee::  CC))  ((21nsdt  tthhrreeaadd))

PPrrooggrraamm ccoommppiilleedd wwiitthhoouutt ''rreeggiisstteerr'' aattttrriibbuuttee

EExxeeccuuttiioonn ssttaarrttss,, 44 rruunnss tthhrroouugghh DDhhrryyssttoonnee

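So the two threads are running pretty much in lockstep: each character appears twice because both threads print the same text and their output interleaves almost perfectly. That’s encouraging.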

Running on real hardware at 133MHz, I see:

(Thread 1)
Microseconds for one run through Dhrystone: 55
Dhrystones per Second: 17946
VAX MIPS rating * 1000 = 10211
...
(Thread 2)
Microseconds for one run through Dhrystone: 55
Dhrystones per Second: 17959
VAX MIPS rating * 1000 = 10219

Very nearly the same speed for both threads. Each is significantly slower than when the CPU’s running in single-threaded mode, but added together, the total throughput is around 20% higher.
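The arithmetic behind that comparison can be spelled out in a few lines. The two per-thread rates are the figures reported above; the single-threaded rate is not quoted in this post, so the function below only back-calculates a rough estimate from the “around 20%” figure. The function names are invented for the example.

```c
#include <assert.h>

/* Dhrystones per second reported above for the dual-thread run at 133MHz. */
enum { THREAD1_DPS = 17946, THREAD2_DPS = 17959 };

/* Combined throughput of the two threads, in Dhrystones per second. */
long combined_dps(void)
{
    return (long)THREAD1_DPS + THREAD2_DPS;
}

/* Single-threaded rate implied by a given percentage gain. With
   gain_percent = 20 (the approximate figure from the text) the result is
   an estimate only, not a measurement. */
long implied_single_thread_dps(long combined, long gain_percent)
{
    return combined * 100 / (100 + gain_percent);
}

void throughput_demo(void)
{
    long total = combined_dps();
    assert(total == 35905);

    /* 3590500 / 120, truncated by integer division. */
    assert(implied_single_thread_dps(total, 20) == 29920);
}
```

On that estimate each thread runs at roughly 60% of the single-threaded speed, which is how two individually slower threads still add up to more total work.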

It will be interesting to see what difference, if any, running from SDRAM makes to this improvement. In theory, since the CPU will spend more time waiting on memory, it will have more opportunity to do useful work in that downtime – we’ll see what actually happens…
