Obscure obsolete media du jour

Having found that clip of the Pat Metheny Group on YouTube a couple of weeks ago, I wanted to get hold of the DVD from which it was ripped. Unfortunately it seems to be really hard to track down – I can find it easily enough on VHS, and also on Laserdisc, but not on DVD. So I acquired it by… ahem… “other means”. Since I’ve never actually handled a Laserdisc, though, I couldn’t resist the urge to buy a copy from a Stateside seller on eBay.

It arrived yesterday.

[Images: sleeve, disc]

It was wrapped in newspaper, and I can honestly say this is a headline I never thought I’d see:

[Image: Nuns]

Some linker-script magic

In my last post I mentioned that I had to employ some ugly hacks in the boot firmware for my ZPU project, to make sure certain structures ended up in SDRAM rather than the initial Boot ROM.

To illustrate the problem let’s look at a minimal test program:

short inconvenience;

int main(int argc,char **argv)
{
    inconvenience=0x0123;
    return(0);
}

This little program declares a 16-bit global variable, and then writes to it. The assembly output produced by

zpu-elf-gcc -Os -S bsstest.c

is as follows:

    .file    "bsstest.c"
.text
    .globl    main
    .type    main, @function
main:
    im 291
    nop
    im inconvenience
    storeh
    im 0
    nop
    im _memreg+0
    store
    poppc
    .size    main, .-main
    .comm    inconvenience,2,4
    .ident    "GCC: (GNU) 3.4.2"

Note the storeh instruction halfway down. That’s the source of my problem. I’ve implemented storeh in hardware for SDRAM, but not for the BlockRAM-based boot code, and I’d really like to avoid doing the latter if possible, because doing a 16-bit write to a 32-bit wide RAM is going to be messy and eat up logic elements. The boot code is also rather on the large side, so it would be nice to avoid storing uninitialised data in there at all if possible.
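To give a feel for why a 16-bit write to a 32-bit wide RAM is awkward, here is a minimal sketch in C of the read-modify-write sequence involved. It is only a software model, not the firmware or the hardware: the function name store_halfword is hypothetical, and the sketch assumes big-endian byte order and a halfword-aligned address.

```c
#include <stdint.h>

/* Model of a 16-bit store into a memory that can only be written 32 bits
   at a time. Assumptions for this sketch: big-endian byte order, and a
   byte address that is a multiple of 2. */
void store_halfword(uint32_t *ram, uint32_t byte_addr, uint16_t value)
{
    uint32_t word_index = byte_addr >> 2;

    /* In a big-endian word, the halfword at offset 0 is the upper half. */
    unsigned shift = (byte_addr & 2u) ? 0u : 16u;
    uint32_t mask = (uint32_t)0xFFFFu << shift;

    uint32_t old = ram[word_index];                  /* read            */
    ram[word_index] = (old & ~mask)                  /* keep other half */
                    | ((uint32_t)value << shift);    /* merge and write */
}
```

In logic, the equivalent is either an extra read cycle plus a merge stage, or per-halfword write enables on the RAM, which is presumably where the extra logic elements would go.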

A System-on-Chip in 2300 logic elements!

My experiments with the “small” variant of the ZPU processor have resulted so far in a reasonably functional and tiny System-on-Chip. Supported so far are:

  • ZPU processor with GCC toolchain support
  • SDRAM controller with cache
  • VGA video output, 640x480x16-bit (Dithered on DE1 board)
  • Millisecond counter
  • SD card access
  • UART
  • HEX display (DE1 only)

All within a mere 2300 logic elements.

More Follin Fandom

It doesn’t seem to matter which platform’s sound chip Tim Follin composed for: he always seemed to find some way of squeezing more out of it than anyone previously thought possible. Here are just a few examples, found on YouTube…

Something Completely Different

This isn’t remotely tech-related, but it’s definitely Retro.  I found this on YouTube recently and have to say I love it!

Quite apart from the musicianship, look out for Pat Metheny displaying his usual facial contortions, and Lyle Mays looking for all the world like the carny from Despicable Me!

Sadly the DVD this was ripped from (More Travels) isn’t easy to get hold of now.

More ZPU experiments

Since my last post I’ve been continuing to work on the ZPU small core, adding hardware implementations of the optional instructions while trying to keep the core as small as possible.

Certain instructions are closely related enough that once one is implemented the others come almost for free. Sub, Eq, Neq, Lessthan and Lessthanorequal come under that category.
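The reason these come almost for free is that all of them can be read off a single subtraction. The C sketch below models that idea; it is an illustration rather than the core's actual logic, and the names alu_result and alu_compare are hypothetical. Operand order and signedness in the real instructions are not modelled here.

```c
#include <stdint.h>
#include <stdbool.h>

/* Everything that can be derived from one subtraction. */
typedef struct {
    uint32_t sub;              /* low 32 bits of a - b */
    bool     eq;
    bool     neq;
    bool     lessthan;
    bool     lessthanorequal;
} alu_result;

alu_result alu_compare(int32_t a, int32_t b)
{
    /* Widen to 64 bits so the signed difference cannot overflow; in
       hardware this corresponds to a subtractor one bit wider than the
       operands. */
    int64_t diff = (int64_t)a - (int64_t)b;

    alu_result r;
    r.sub             = (uint32_t)diff;       /* result of Sub            */
    r.eq              = (diff == 0);          /* all result bits are zero */
    r.neq             = !r.eq;
    r.lessthan        = (diff < 0);           /* sign of the wide result  */
    r.lessthanorequal = r.lessthan || r.eq;
    return r;
}
```

Once the subtractor exists, each comparison only costs a little extra decoding of its output.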

Speed / size tradeoffs

I’ve been playing some more with the small version of the ZPU core, and have successfully integrated it into a cut-down version of my previous MiniSOC project. The “official” small core only supports BlockRAM access, with external access reserved for IO. I’ve reversed this so that only the stack is in the CPU core’s internal BlockRAM, and program data comes from external RAM.

In addition, I’ve added optional hardware implementations of a few of the emulated instructions, namely mult (the Cyclone II has hardware multipliers, so why not use them?), eq, eqbranch and neqbranch. I think I can add the comparison instructions without bloating the core too much, as well.

By way of a benchmark, I’ve written a simple framebuffer test which writes a pattern of ascending longwords into the framebuffer. With the optional instructions disabled, this achieves about 1.9 frames per second, and the CPU takes up 621 logic elements.
With eq and eqbranch/neqbranch in hardware, the frame rate goes up to about 5.25, and the core takes 781 logic elements.
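For reference, a test of this kind boils down to a loop like the one sketched below. This is not the actual test code from the project: the function name fill_ascending and its parameters are hypothetical, and the caller is assumed to supply the framebuffer address and size.

```c
#include <stdint.h>
#include <stddef.h>

/* Write ascending 32-bit values into a framebuffer-like region.
   'fb' is volatile so that every store really reaches memory. */
void fill_ascending(volatile uint32_t *fb, size_t longwords, uint32_t start)
{
    for (size_t i = 0; i < longwords; ++i)
        fb[i] = start + (uint32_t)i;
}
```

A loop like this performs a comparison and a conditional branch for every longword written, so it is plausible that emulated comparison and branch instructions dominate its running time.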

There’s lots still to do – reading from SDRAM is untested, and writes are currently always 32-bit – but the project is available here (currently DE1 toplevel only) for anyone who might be interested.

ZPUFramebufferTest

A Tiny CPU

There are various FPGA projects which could benefit from the existence of a really small CPU core to handle things like loading ROMs from SD card. The Minimig project either uses an external microcontroller (for the original Minimig and now also the MIST board), or puts a second fully-fledged CPU into the FPGA itself. This is either a second instance of the TG68, or in the case of Chaos’s DE1 port, an OpenRisc CPU.

The only problem with this approach is that the second CPU takes up valuable resources – the OpenRisc CPU is smaller than the TG68, but still takes up over 2000 logic elements, so there’s definitely a need for a really small CPU core.

Caching and Bus Snooping

When I first added the Turbo ChipRAM feature to the Minimig core, there were surprisingly few unpleasant side-effects. However, one side-effect did become apparent when I added the Two-way CPU cache.

On a real Amiga, it’s possible for the CPU’s bus and the chipset’s bus to operate independently, so if Fast RAM is available, the CPU can conduct a Fast RAM operation at the same time as the chipset is reading from or writing to Chip RAM. On the TG68-based Minimig variants, there’s only one type of RAM available – SDRAM – and no matter how they’re handled, Chip RAM and Fast RAM accesses ultimately all end up in the same RAM chip. Chip RAM accesses are coordinated by the Minimig’s chipset emulation, while Fast RAM accesses use a different port in the SDRAM controller and bypass the Minimig chipset emulation entirely, which is much faster. This is the key to my “Turbo ChipRAM” option, which basically allows Chip RAM to be accessed through the SDRAM controller’s Fast RAM port.

The problem is that the Fast RAM port is cached, so if the chipset writes to a piece of Chip RAM which happens to be in the cache, the cached data becomes stale, and the next time the CPU reads that address it receives the stale data, not the newly-written data.

The solution to this is to perform Bus Snooping. The CPU cache needs to monitor the SDRAM controller’s Chip RAM port, watching for writes, and any time it sees a write to an address that’s in cache, it must either update or flush the cached data.

I’ve taken the easier but slower option here, simply marking the data as invalid, forcing it to be re-read from SDRAM next time the CPU wants to access it. There’s scope to improve performance by updating the cached data and leaving it marked as valid. The cost for this would simply be the extra logic elements needed to store the written data temporarily and write it to the cache.
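The stale-data problem and the invalidate-on-snoop fix can be modelled in a few lines of C. This is only a behavioural sketch, not the actual cache source: it uses a direct-mapped, word-addressed cache rather than a two-way one, and every name in it (cache_model, cpu_read, chipset_write, CACHE_LINES) is hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

#define CACHE_LINES 16u   /* arbitrary size chosen for the sketch */

typedef struct {
    bool     valid[CACHE_LINES];
    uint32_t tag[CACHE_LINES];
    uint32_t data[CACHE_LINES];
} cache_model;

/* Word-addressed, direct-mapped: low bits pick the line, the rest is the tag. */
static unsigned line_of(uint32_t addr) { return addr % CACHE_LINES; }
static uint32_t tag_of(uint32_t addr)  { return addr / CACHE_LINES; }

/* CPU read through the cached port: refill from SDRAM on a miss. */
uint32_t cpu_read(cache_model *c, const uint32_t *sdram, uint32_t addr)
{
    unsigned i = line_of(addr);
    if (!(c->valid[i] && c->tag[i] == tag_of(addr))) {
        c->data[i]  = sdram[addr];
        c->tag[i]   = tag_of(addr);
        c->valid[i] = true;
    }
    return c->data[i];
}

/* Chipset write through the other port. With 'snoop' set, the cache
   watches the write and invalidates a matching line. */
void chipset_write(cache_model *c, uint32_t *sdram,
                   uint32_t addr, uint32_t value, bool snoop)
{
    sdram[addr] = value;
    if (snoop) {
        unsigned i = line_of(addr);
        if (c->valid[i] && c->tag[i] == tag_of(addr))
            c->valid[i] = false;   /* force a re-read from SDRAM next time */
    }
}
```

With snooping disabled, a cpu_read that follows a chipset_write to a cached address still returns the old value; with it enabled, the line is invalidated and the next read fetches the new data. The faster alternative would replace the invalidation with a write of the new value into c->data[i].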

Source for the Two-way cache with Bus Snooping can be found here – and I’ll try to release a new version of the Chameleon core in the near future with this change included.