Part 3 – Hello World!
This time round I’ve added the On-screen Display component, and the firmware verifies that it’s working correctly by way of the archetypal “Hello World!” message!
I’ve also added project files for the MIST board, and will add support for a Xilinx-based board in the near future.
The source tree to accompany this part is tagged in the git repo as Step2.
The OSD component itself provides a few hardware registers that can be accessed from software, along with a 512-byte character buffer.
The VHDL interface looks like this:
library ieee;
use ieee.std_logic_1164.all;

entity OnScreenDisplay is
port (
    reset_n : in std_logic;
    clk : in std_logic;
    hsync_n : in std_logic; -- Sync inputs from the main core, used to time the
    vsync_n : in std_logic; -- window and pixel signals and position the OSD.
    -- OSD outputs, to be merged with the host core's video
    enabled : out std_logic;
    pixel : out std_logic;
    window : out std_logic;
    -- CPU-side interface to the registers and character RAM
    addr : in std_logic_vector(8 downto 0);
    data_in : in std_logic_vector(15 downto 0);
    data_out : out std_logic_vector(15 downto 0);
    reg_wr : in std_logic;
    char_wr : in std_logic;
    char_q : out std_logic_vector(7 downto 0)
);
end entity;
The readable registers are implemented using simple combinational logic and will thus respond within a single clock, so we don’t bother with any kind of req / ack mechanism here, in the interests of keeping things simple.
Address and data from the CPU are placed on addr and data_in. Bringing reg_wr high triggers a write to a register, and bringing char_wr high triggers a write to the character RAM.
data_out and char_q will output data from registers and character RAM, respectively, based on the addr input. This is a constant connection – no req signal is needed. If reading from the registers triggered some kind of action then we’d need a more complete req/ack mechanism here, but since reads are completely passive we don’t need to worry about it in this case.
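To give a feel for the software side, here is a minimal C sketch of writing a string into the character buffer. It is an illustration rather than the firmware from the repo: the function name osd_puts and the 16-row by 32-column layout are my assumptions (the text above only fixes the size at 512 bytes, which matches the 9-bit addr), and the buffer is passed in as a pointer because the address at which the character RAM is mapped isn't given here.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed geometry: only the 512-byte size comes from the text above.
   16 rows of 32 characters is a guess made for this illustration. */
#define OSD_COLS  32
#define OSD_ROWS  16
#define OSD_CHARS (OSD_ROWS * OSD_COLS)   /* 512 = 2^9, one entry per addr value */

/* Write a string into the character buffer starting at (row, col).
   Writing stops at the end of the string or the end of the buffer.
   Returns the number of characters written.
   On real hardware 'chars' would point at wherever the character RAM is
   mapped; each store would then become a write with char_wr high. */
size_t osd_puts(volatile uint8_t *chars, unsigned row, unsigned col, const char *s)
{
    size_t pos;
    size_t n = 0;

    if (row >= OSD_ROWS || col >= OSD_COLS)
        return 0;                          /* start position outside the buffer */

    pos = (size_t)row * OSD_COLS + col;
    while (s[n] != '\0' && pos < OSD_CHARS)
        chars[pos++] = (uint8_t)s[n++];

    return n;
}
```

Called with a plain 512-byte array as a stand-in for the character RAM, osd_puts(buf, 1, 2, "Hello World!") fills offsets 34 to 45 and leaves everything else untouched.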
Part 2 – a simple test core
To demonstrate how the control module is built, we need a core to which we can add the control module. In the interests of keeping the project as simple as possible and avoiding needless distractions, I’ve started a new project for this purpose, which can be found on github at https://github.com/robinsonb5/CtrlModuleTutorial
I shall tag this at key points, and at the time of writing there are two tags in place.
To play with this, check out a local copy of the core, like so:
> git clone https://github.com/robinsonb5/CtrlModuleTutorial.git
> cd CtrlModuleTutorial
> git submodule init
> git submodule update
> git checkout <tag name>
The first tag, called “StartingPoint”, contains a VGA test pattern generator for the DE1 board, which has four slightly different test patterns selectable by the DE1’s switches. In the coming parts I shall show how to eliminate the switches and replace them with an On Screen Display.
Part 1 – an overview, and some details of the On-Screen Display
Both the OneChipMSX and PC Engine cores on this site make use of the ZPUFlex processor to provide a bootstrap, control and OSD module. Let’s take a closer look at this control module:
The control module needs to provide the following services:
- Load a ROM from SD card (holding the host core off the SD card during the process, if necessary)
- Provide an On-Screen Display, toggled by the F12 key. The on-screen display must be generated in a form that can be merged with the host core’s video output.
- Prevent keystrokes reaching the host core while the OSD is displayed
- Allow various options to be set, and the settings to be read by the host core
- Perform any high-level peripheral translation (keyboard-based gamepad emulation for the PC Engine core, mouse emulation for the OneChipMSX)
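The F12 toggle and the keystroke blocking from the list above can be sketched as a tiny state machine in C. This is a hedged illustration, not the control module's actual code: ctrl_state, ctrl_handle_key and the KEY_F12 value are hypothetical names and numbers, and swallowing the F12 press itself rather than forwarding it is my assumption.

```c
#include <stdbool.h>

/* Hypothetical key code for illustration; not a real PS/2 scancode. */
enum { KEY_F12 = 0x100 };

typedef struct {
    bool osd_visible;   /* true while the OSD is being shown */
} ctrl_state;

/* Process one key press.
   Returns true if the key should be passed on to the host core. */
bool ctrl_handle_key(ctrl_state *st, int key)
{
    if (key == KEY_F12) {
        st->osd_visible = !st->osd_visible;   /* toggle the OSD */
        return false;                         /* assumed: F12 never reaches the host */
    }
    return !st->osd_visible;                  /* block other keys while the OSD is up */
}
```

Starting with the OSD hidden, an ordinary key is forwarded, F12 shows the OSD, the same ordinary key is then blocked, and a second F12 restores normal behaviour.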
NEC PC Engine!
Some months ago Gregory Estrade (AKA Torlus) created a PC Engine FPGA core which is available on GitHub – however his original version was targeted at the DE1 dev board, making use of the Flash memory and switches. I’ve spent my very limited coding time over the last few weeks adding a control module, very similar to the one I added to the OneChipMSX core, and porting the core to the Chameleon and MIST boards.
The project now has a page here.
I’ve been intrigued by SymbOS for a while, and being able to play with it was the main reason I wanted to try and port the CPCTrex core to a more current FPGA platform at some point (a task that will be tackled sometime between “one of these days” and “the heat death of the universe”, mainly because of the need for some kind of CompactFlash-to-SD bridge component.) However, SymbOS runs on MSX as well (and could in theory be ported to anything Z80-based provided it has enough RAM), so now I have the OneChipMSX core at my fingertips, I can finally give SymbOS a whirl.
There was just one thing missing – a mouse!
A few weeks ago I pushed the PSoC Creator project for my Sega-6-button-to-CD32 converter project, but anyone wishing to build a converter would need to know the pin mapping. This is defined within the Creator project, but figuring out which pins on the DB9s need to be wired to which terminals on the board would be pretty tedious – so here’s a wiring diagram for the benefit of anyone who wants to have a go.
(Please note, the board shown here is the CY8CKIT-049-42XX – the identical-looking 41XX won’t work because the chip lacks the programmable logic the project uses to create the shift register.)
Part 2: the first milestone.
Today I have successfully booted OneChipMSX on the Turbo Chameleon 64!
Part 1: This should be easy, right?
Some months back I was introduced to OneChipMSX by a keen user of the Turbo Chameleon 64 and various other FPGA boards, in the hope that I’d be able to port the project to the Chameleon.
At the time it was a more complicated proposition than I was capable of tackling, but I’ve learned a great deal since then, so recently decided to look again.
… can dance on the head of a pin?
I have no idea – but I do know how many ZPUFlex processors will fit in a DE1 dev board: 25! [Edit: I’ve since managed to squeeze in another one – so 26!]
OK, I’m stretching things a little to claim that 25 fully-featured ZPUFlex CPUs will fit on the DE1, because these have a really tiny ROM and very minimal supporting logic, so Quartus can optimise the hell out of the design – but it still seemed like a fun demo.
Source, as always, can be found on GitHub.
[EDIT: The M4K memory blocks on the DE1’s Cyclone II support 32-bit data width, but not in full dual-port mode, where the data width is limited to 16-bit. This means that no matter how small the ZPU’s ROM/StackRAM, it will occupy a minimum of 2 M4Ks. If it weren’t for this fact, I think the DE1 could take 3 further ZPUs. As it is, it’s possible to add just one more instance of the ZPU, and the code on github now reflects this. The record for the maximum number of soft CPUs in a single project on the DE1 board is now 26. Unless you know different!]
I wrote once before about the HelloWorld example in my ZPUDemos repository. This demo used the ZPU processor and a minimal UART to send the archetypal “Hello World!” message to the serial port, and used just 684 logic elements to do so.
I’ve revisited the project, and added a new demo to ZPUDemos, which shows how the ZPU’s size can be reduced even further.
By bringing a copy of the zpu_config.vhd file into the local project directory and modifying some of the constants within, we can reduce the size of the CPU’s internal address registers. This saves a fair amount of logic, since unused address bits can’t be automatically optimised out, thanks to the fact that addresses can be written to and read back from the Stack RAM.
Since the HelloWorld ROM is between 1 and 2KB in size we need to use 11 address bits (10 downto 0), and allow one extra bit to produce some IO space, so we set maxAddrBitIncIO to 11. Because we’re using the de-facto ZPU convention of letting the MSB designate IO space, any operations that don’t involve IO can use a max bit of 10 – thus we leave maxAddrBit set to maxAddrBitIncIO-1.
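The arithmetic behind those two constants can be checked with a few lines of C. This is only a worked example of the reasoning above: the helper names addr_bits_for, max_addr_bit_inc_io and max_addr_bit are mine, and the real values live as constants in zpu_config.vhd.

```c
#include <stdint.h>

/* Smallest number of address bits n such that 2^n >= size_bytes. */
unsigned addr_bits_for(uint32_t size_bytes)
{
    unsigned bits = 0;
    while (bits < 32 && ((uint32_t)1 << bits) < size_bytes)
        bits++;
    return bits;
}

/* With n bits addressing memory (n-1 downto 0), bit n is the extra MSB
   that selects IO space, so the highest bit including IO is n. */
unsigned max_addr_bit_inc_io(uint32_t rom_bytes)
{
    return addr_bits_for(rom_bytes);
}

/* Operations that never touch IO only need the bits below the IO bit. */
unsigned max_addr_bit(uint32_t rom_bytes)
{
    return max_addr_bit_inc_io(rom_bytes) - 1;
}
```

For any ROM larger than 1KB and no larger than 2KB this gives 11 memory address bits, so max_addr_bit_inc_io returns 11 and max_addr_bit returns 10, matching the values chosen above.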
This change brings the size of the ZPU itself from 549 to 462 logic elements. Combined with adjusting the peripheral code to deal with the narrower address space and disabling the UART RX, this brings the entire HelloWorld project from 692 logic elements (there have been tweaks and bug fixes since the previous result of 684 was achieved) down to 546!
Besides logic area, there is another benefit to trimming unused address logic: simplifying the logic – especially of the 32-bit adder which increments the program counter – relaxes the CPU’s timing requirements somewhat, and there’s a noticeable improvement in fmax as the address space narrows.