Part 2 – a simple test core
To demonstrate how the control module is built, we need a core to which we can add it. In the interests of keeping the project as simple as possible and avoiding needless distractions, I’ve started a new project for this purpose, which can be found on GitHub at https://github.com/robinsonb5/CtrlModuleTutorial
I shall tag this at key points, and at the time of writing there are two tags in place.
To play with this, check out a local copy of the core, like so:
> git clone https://github.com/robinsonb5/CtrlModuleTutorial.git
> cd CtrlModuleTutorial
> git submodule init
> git submodule update
> git checkout <tag name>
The first tag, called “StartingPoint”, contains a VGA test pattern generator for the DE1 board, which has four slightly different test patterns selectable by the DE1’s switches. In the coming parts I shall show how to eliminate the switches and replace them with an On-Screen Display.
Part 1 – an overview, and some details of the On-Screen Display
Both the OneChipMSX and PC Engine cores on this site make use of the ZPUFlex processor to provide a bootstrap, control and OSD module. Let’s take a closer look at this control module:
The control module needs to provide the following services:
- Load a ROM from SD card (holding the host core off the SD card during the process, if necessary)
- Provide an On-Screen Display, toggled by the F12 key. The on-screen display must be generated in a form that can be merged with the host core’s video output.
- Prevent keystrokes reaching the host core while the OSD is displayed
- Allow various options to be set, and the settings to be read by the host core
- Perform any high-level peripheral translation (keyboard-based gamepad emulation for the PC Engine core, mouse emulation for the OneChipMSX)
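As a rough illustration of the second and third services in the list above, here is a toy Python model of the key handling. It is only a sketch, not the control module's actual code: the class and function names are hypothetical, and swallowing the F12 keypress itself is an assumption.

```python
F12 = "F12"  # key name chosen for illustration only


class OsdKeyFilter:
    """Toy model: F12 toggles the OSD; other keys reach the host only while it is hidden."""

    def __init__(self):
        self.osd_visible = False

    def handle_key(self, key):
        """Return the key if it should be passed on to the host core, else None."""
        if key == F12:
            self.osd_visible = not self.osd_visible
            return None  # assumption: the toggle key itself is never forwarded
        if self.osd_visible:
            return None  # OSD is showing, so the host core must not see this key
        return key


def forwarded(keys):
    """Run a sequence of key names through a fresh filter and list what the host sees."""
    key_filter = OsdKeyFilter()
    results = (key_filter.handle_key(k) for k in keys)
    return [k for k in results if k is not None]
```

For example, `forwarded(["A", "F12", "B", "F12", "C"])` drops the "B" typed while the OSD is open.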
NEC PC Engine!
Some months ago Gregory Estrade (AKA Torlus) created a PC Engine FPGA core which is available on GitHub – however his original version was targeted at the DE1 dev board, making use of the Flash memory and switches. I’ve spent my very limited coding time over the last few weeks adding a control module, very similar to the one I added to the OneChipMSX core, and porting the core to the Chameleon and MIST boards.
The project now has a page here.
I’ve been intrigued by SymbOS for a while, and being able to play with it was the main reason I wanted to try and port the CPCTrex core to a more current FPGA platform at some point (a task that will be tackled sometime between “one of these days” and “the heat death of the universe”, mainly because of the need for some kind of CompactFlash-to-SD bridge component.) However, SymbOS runs on MSX as well (and could in theory be ported to anything Z80-based provided it has enough RAM), so now I have the OneChipMSX core at my fingertips, I can finally give SymbOS a whirl.
There was just one thing missing – a mouse!
Part 2: the first milestone.
Today I have successfully booted OneChipMSX on the Turbo Chameleon 64!
Part 1: This should be easy, right?
Some months back I was introduced to OneChipMSX by a keen user of the Turbo Chameleon 64 and various other FPGA boards, in the hope that I’d be able to port the project to the Chameleon.
At the time it was a more complicated proposition than I was capable of tackling, but I’ve learned a great deal since then, so recently decided to look again.
… can dance on the head of a pin?
I have no idea – but I do know how many ZPUFlex processors will fit in a DE1 dev board: 25! [Edit: I’ve since managed to squeeze in another one – so 26!]
OK, I’m stretching things a little to claim that 25 fully-featured ZPUFlex CPUs will fit on the DE1, because these have a really tiny ROM and very minimal supporting logic, so Quartus can optimise the hell out of the design – but it still seemed like a fun demo.
Source, as always, can be found on GitHub.
[EDIT: The M4K memory blocks on the DE1’s Cyclone II support 32-bit data width, but not in full dual-port mode, where the data width is limited to 16-bit. This means that no matter how small the ZPU’s ROM/StackRAM, it will occupy a minimum of 2 M4Ks. If it weren’t for this fact, I think the DE1 could take 3 further ZPUs. As it is, it’s possible to add just one more instance of the ZPU, and the code on GitHub now reflects this. The record for the maximum number of soft CPUs in a single project on the DE1 board is now 26. Unless you know different!]
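The "minimum of 2 M4Ks" reasoning can be checked with a simplified model. This sketch assumes 4096 data bits per block (parity bits ignored) and takes the 16-bit full dual-port width limit from the note above; the function name is hypothetical, and Quartus's real packing rules are more involved.

```python
import math

M4K_DATA_BITS = 4096      # assumption: usable data bits per M4K, parity ignored
DUAL_PORT_MAX_WIDTH = 16  # full dual-port width limit quoted in the note above


def min_m4k_blocks(data_width, words):
    """Estimate how many M4K blocks a full dual-port RAM of this shape needs."""
    by_width = math.ceil(data_width / DUAL_PORT_MAX_WIDTH)       # blocks placed side by side
    by_capacity = math.ceil(data_width * words / M4K_DATA_BITS)  # blocks needed for storage
    return max(by_width, by_capacity)
```

Under these assumptions even a tiny 32-bit memory, say 64 words, comes out at two blocks, because the width limit dominates the capacity requirement.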
I wrote once before about the HelloWorld example in my ZPUDemos repository. This demo used the ZPU processor and a minimal UART to send the archetypal “Hello World!” message to the serial port, and used just 684 logic elements to do so.
I’ve revisited the project, and added a new demo to ZPUDemos, which shows how the ZPU’s size can be reduced even further.
By bringing a copy of the zpu_config.vhd file into the local project directory and modifying some of the constants within, we can reduce the size of the CPU’s internal address registers. This saves quite some logic, since unused address bits can’t be optimised out automatically: addresses can be written to and read back from the Stack RAM.
Since the HelloWorld ROM is between 1 and 2KB in size, we need 11 address bits (10 downto 0) for it, plus one extra bit to provide some IO space, so we set maxAddrBitIncIO to 11. Since we’re using the de-facto ZPU convention of allowing the MSB to designate IO space, any operations that don’t involve IO can use a max bit of 10 – thus we leave maxAddrBit set to maxAddrBitIncIO-1.
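The same calculation can be written as a small Python helper for other ROM sizes. This is only a sketch of the arithmetic described above: the function name is hypothetical, and it assumes a single IO bit directly above the highest memory address bit.

```python
def zpu_addr_bits(rom_bytes):
    """Return (maxAddrBitIncIO, maxAddrBit) for a ROM of rom_bytes bytes."""
    if rom_bytes < 1:
        raise ValueError("rom_bytes must be positive")
    # Address bits needed to cover the ROM: anything from 1025 to 2048 bytes
    # needs 11 bits, i.e. bit indices 10 downto 0.
    mem_bits = max(1, (rom_bytes - 1).bit_length())
    # The IO select bit sits directly above the highest memory bit, so its
    # index equals the number of memory bits.
    max_addr_bit_inc_io = mem_bits
    # Operations that never touch IO can stop one bit lower.
    max_addr_bit = max_addr_bit_inc_io - 1
    return max_addr_bit_inc_io, max_addr_bit
```

Any ROM size between 1 and 2KB, for example `zpu_addr_bits(1500)`, yields the pair (11, 10) used in the text.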
This change brings the size of the ZPU itself from 549 to 462 logic elements. This, combined with adjusting the peripheral code to deal with the narrower address space and disabling the UART RX, brings the entire HelloWorld project from 692 logic elements (there have been tweaks and bug fixes since the previous result of 684 was achieved) down to 546!
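To put those figures in proportion, a quick calculation of the savings (the helper name is hypothetical; the logic element counts are the ones quoted above):

```python
def saving(before, after):
    """Return (logic elements saved, percentage saved rounded to one decimal place)."""
    saved = before - after
    return saved, round(100.0 * saved / before, 1)


cpu_saving = saving(549, 462)      # the ZPU on its own
project_saving = saving(692, 546)  # the whole HelloWorld project
```

That works out to 87 logic elements (about 15.8%) for the CPU alone, and 146 logic elements (about 21.1%) for the project as a whole.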
Besides logic area, there is another benefit to trimming unused address logic: simplifying the logic – especially of the 32-bit adder which increments the program counter – relaxes the CPU’s timing requirements somewhat, and there’s a noticeable improvement in fmax as the address space narrows.
When I was debugging the ZPUFlex CPU core, I found myself using the ever-useful SignalTap to trace what was going on inside the CPU. One technique I wanted to use was to follow the program flow, and compare it against a simulated run through the program, thus spotting CPU bugs where the two diverged. To do this I needed a ZPU simulator.
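The comparison step itself is simple to sketch. The Python below assumes both the SignalTap capture and the simulator run have already been reduced to lists of program counter values; the function name and that trace format are assumptions made for illustration, not part of the actual simulator.

```python
def first_divergence(hw_trace, sim_trace):
    """Return (index, hw_pc, sim_pc) at the first mismatch, or None if the traces agree.

    Only the overlapping part of the two traces is compared, since a hardware
    capture is usually much shorter than a full simulated run.
    """
    for index, (hw_pc, sim_pc) in enumerate(zip(hw_trace, sim_trace)):
        if hw_pc != sim_pc:
            return index, hw_pc, sim_pc
    return None
```

The instruction executed just before the reported index is then the first place to look for a CPU bug.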
Part 2: DMA
In my last post I’d hooked up a TFT screen to an FPGA dev board and got some simple driver software running on the ZPUFlex CPU core. While it worked, having the CPU spoonfeed a frame of video data byte-by-byte over an SPI link isn’t ideal, so I implemented a DMA process to handle the hard work.