Part 2: DMA
In my last post I hooked up a TFT screen to an FPGA dev board and got some simple driver software running on the ZPUFlex CPU core. While it worked, having the CPU spoon-feed a frame of video data byte by byte over an SPI link isn’t ideal, so I implemented a DMA process to handle the hard work.
Having sent the “write” commands to the display, the CPU now writes to two new registers: first the framebuffer address, then the number of 16-bit words to be transferred. Writing these registers triggers the DMA process.
When the DMA process has completed it raises an interrupt, and the interrupt handler kicks off the next frame transfer.
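A minimal sketch of the software side might look like the following. The register layout, all of the names, the 320x240 frame size and the idea that writing the count is what starts the transfer are assumptions for illustration, not taken from the project itself. The registers are modelled as a plain struct so that the snippet also builds on a desktop machine; on the real hardware they would be memory-mapped.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register block. The post only says there are two new
   registers: framebuffer address, then word count. */
typedef struct {
    volatile uint32_t dma_addr;   /* start address of the framebuffer */
    volatile uint32_t dma_count;  /* number of 16-bit words to transfer;
                                     assumed to start the DMA when written */
} tft_regs_t;

static tft_regs_t tft_regs;       /* host-side stand-in for the real registers */
static uint32_t frames_started;   /* how many transfers have been kicked off */

#define FB_WIDTH  320u            /* assumed frame size, not stated in the post */
#define FB_HEIGHT 240u

static uint16_t framebuffer[FB_WIDTH * FB_HEIGHT];

/* Kick off one frame transfer. The display's "write" commands are assumed
   to have been sent over SPI already. */
static void tft_dma_start(const uint16_t *fb, uint32_t words)
{
    assert(words > 0);
    /* Truncates the pointer on a 64-bit host; on a 32-bit soft core the
       address fits as-is. */
    tft_regs.dma_addr  = (uint32_t)(uintptr_t)fb;
    tft_regs.dma_count = words;
    frames_started++;
}

/* Called when the DMA-complete interrupt fires: queue the next frame.
   Any per-frame command setup the display needs would go before this. */
static void tft_dma_interrupt(void)
{
    tft_dma_start(framebuffer, FB_WIDTH * FB_HEIGHT);
}
```

The point of the arrangement is that the CPU only touches the display once per frame, in the interrupt handler, and is free to do other work while the transfer runs.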
Without having made any special effort to optimise the SPI transfer, the project currently transfers 19 frames per second over a 50MHz SPI link.
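To put a frame rate like that in context, it helps to compare it with the theoretical ceiling of the link. The sketch below computes that ceiling for a given SPI clock and frame size. The function names are made up for illustration, and the frame dimensions are left as parameters because the panel resolution isn't stated in this post.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on frames per second for a given SPI clock, ignoring
   command overhead and any idle gaps between words. */
static double spi_max_fps(double spi_hz, uint32_t width, uint32_t height,
                          uint32_t bits_per_pixel)
{
    assert(width > 0 && height > 0 && bits_per_pixel > 0);
    double bits_per_frame = (double)width * (double)height * (double)bits_per_pixel;
    return spi_hz / bits_per_frame;
}

/* Fraction of the link's theoretical capacity actually being used. */
static double spi_link_utilisation(double measured_fps, double max_fps)
{
    assert(max_fps > 0.0);
    return measured_fps / max_fps;
}
```

As a purely hypothetical example, a 320x240 panel at 16 bits per pixel needs 1,228,800 bits per frame, so a 50MHz link could carry at most about 40.7 frames per second, and 19 frames per second would be a little under half of that. A larger panel would put the same 19 frames per second much closer to the limit.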
Test patterns are all very well, but I wanted the display to show something more interesting, so I added software to load an image file from the SD card. In the process I shook out a couple of bugs in the CPU core and supporting libraries, most notably in the byte-swap routine used when reading partition tables from the SD card.
What I hadn’t realised is that GCC short-circuits certain shift operations by using a loadb instruction. This fails when it’s used to access part of a local variable, because I haven’t implemented loadb from stack RAM. To fix this, I’ve simply made the ZPU drop back to emulation if loadb (or loadh) is attempted from stack RAM, while still allowing the hardware versions of the instructions for external RAM.
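For illustration, here is roughly the kind of code involved. This is a sketch with made-up names, not the project's actual routine. Partition table fields are stored little-endian on disk, so on a big-endian CPU they need swapping after being read. When the value being swapped lives in memory, extracting one byte with a shift and mask is equivalent to loading that byte directly, which is presumably why the compiler emits a byte load instead of the shift.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the byte order of a 32-bit value using shifts and masks.
   Each term isolates a single byte of x; if x is held in memory, a
   compiler is free to fetch that byte with a byte load rather than
   shifting the whole word. */
static uint32_t swap32(uint32_t x)
{
    return ((x >> 24) & 0x000000ffu) |
           ((x >>  8) & 0x0000ff00u) |
           ((x <<  8) & 0x00ff0000u) |
           ((x << 24) & 0xff000000u);
}

/* Alternative that sidesteps the swap: assemble a little-endian 32-bit
   field from a byte buffer, whatever the host's byte order. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}

/* Quick sanity check that can be run on a desktop machine before
   suspecting the CPU core. */
static void swap32_selfcheck(void)
{
    assert(swap32(0x12345678u) == 0x78563412u);
    assert(swap32(swap32(0xdeadbeefu)) == 0xdeadbeefu);
}
```

Running a check like this on both the host and the soft core is a cheap way to tell a library bug from a CPU bug: if the same source passes on the host but fails on the target, the instruction implementation is the likely culprit.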