I recently bought a Tang Nano 20K FPGA board after the Nanomig project piqued my interest. Nanomig is the work of Till Harbaum (creator of the MiST project more than a decade ago), and combines this tiny FPGA board with a supporting microcontroller to create what may currently be the tiniest Amiga recreation in existence.
[If you’re looking for the tl;dr – the Gowin GW2AR-18 FPGA on the Tang Nano 20K features a built-in JTAG primitive called GW_JTAG, and the two user IR Scan codes are 0x42 and 0x43.]
Nanomig provides a basic Amiga with 2 megabytes of Chip RAM and the ECS chipset – based on the source of the MiSTer Amiga core – so naturally I wanted to extend it to try and squeeze the most out of this tiny chip.
The Tang Nano 20K has been available for a while now, and it has some interesting features: the Gowin GW2AR-18 FPGA is fairly standard entry-level fare, with quite limited clocking and internal BRAMs which don’t have byte enables. But the most interesting feature of this particular device is that it has 8 megabytes of 32-bit wide SDRAM built into the same chip.
The first step with any new device is to explore the development and debugging facilities. The Gowin IDE is similar in layout and flow to Lattice Diamond, and in fact feels very much like a modernised and decluttered version of Diamond. It’s also surprisingly fast – with simple projects it will have gone from zero to bitstream in less time than it takes Quartus or Vivado to haul their respective lumbering hulks to their feet and start synthesizing.
For debugging there is a tool called “GAO” – Gowin Analyzer and Oscilloscope – and this is where things start to break down a little. The Tang Nano has a BL616 microcontroller onboard which provides both a JTAG interface to the FPGA and a UART over USB. A suitable udev rule is supplied which configures this microcontroller appropriately for JTAG access when connected to a Linux PC, but unfortunately in that configuration the UART doesn’t work!
Luckily openFPGALoader works nicely for loading bitstreams even without the Gowin-supplied udev rule – and likewise OpenOCD can talk to the chip just fine.
One facility I’ve used extensively on Altera/Intel FPGAs is the virtual JTAG subsystem, and I’ve used it to implement a debug interface for my EightThirtyTwo CPU. But the virtual JTAG / system-level-debugging stuff can be bypassed and it’s possible to use a JTAG primitive directly, in which case the FPGA uses a pair of reserved IR codes (0x00c and 0x00e on the MAX10, for example) to make your design accessible to the PC via JTAG.
Similarly, both Xilinx and Lattice devices supply a JTAG primitive, and again they use a pair of reserved IR codes (0x02 and 0x03 in the case of Spartan 3E and 0x32 and 0x38 in the case of Lattice ECP5) for bridging a user design to the PC.
I’ve successfully used all of these with the EightThirtyTwo debug interface in the past, so naturally I wanted to do the same with the Gowin device.
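To make the "reserved IR code" idea concrete, here is a minimal Python sketch of the TMS/TDI sequence a host would clock out to load one of those codes into the instruction register. The IR codes are the ones quoted above; the IR lengths (10, 6 and 8 bits) are my assumptions from the vendors' documentation rather than anything stated here, and the names `USER_IR` and `ir_scan` are made up for illustration.

```python
# User IR codes quoted in the text. The IR lengths are assumptions
# (not from the text) and should be checked against each datasheet.
USER_IR = {
    "MAX10": (10, (0x00C, 0x00E)),
    "Spartan-3E": (6, (0x02, 0x03)),
    "ECP5": (8, (0x32, 0x38)),
}


def ir_scan(code, ir_len):
    """Return a list of (tms, tdi) pairs, one per TCK rising edge.

    The sequence starts and ends in Run-Test/Idle and shifts `code`
    into the instruction register LSB first.
    """
    # Run-Test/Idle -> Select-DR -> Select-IR -> Capture-IR -> Shift-IR
    seq = [(1, 0), (1, 0), (0, 0), (0, 0)]
    for i in range(ir_len):
        bit = (code >> i) & 1
        last = i == ir_len - 1
        # TMS goes high on the final bit to leave Shift-IR (Exit1-IR).
        seq.append((1 if last else 0, bit))
    # Exit1-IR -> Update-IR -> Run-Test/Idle
    seq += [(1, 0), (0, 0)]
    return seq


if __name__ == "__main__":
    ir_len, (user1, user2) = USER_IR["ECP5"]
    for tms, tdi in ir_scan(user1, ir_len):
        print(tms, tdi)
```

Once the IR holds one of the user codes, subsequent DR scans are routed to the user design instead of the FPGA's own registers, which is all the bridging amounts to.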
Unfortunately the Gowin devices aren’t all that well documented yet. Further, since web forums are dying and a lot of technical discussion now takes place in closed enclaves such as Discord, there’s very little Googlable information on Gowin devices and JTAG.
I went poking around in the files generated when adding a GAO instance to a project, and found a reference to GW_JTAG – that looks promising…
module GW_JTAG (
    tck_pad_i,
    tms_pad_i,
    tdi_pad_i,
    tdo_pad_o,
    tck_o,                 //DRCK_IN
    tdi_o,                 //TDI_IN
    test_logic_reset_o,    //RESET_IN
    run_test_idle_er1_o,
    run_test_idle_er2_o,
    shift_dr_capture_dr_o, //SHIFT_IN|CAPTURE_IN
    pause_dr_o,
    update_dr_o,           //UPDATE_IN
    enable_er1_o,          //SEL_IN
    enable_er2_o,          //SEL_IN
    tdo_er1_i,             //TDO_OUT
    tdo_er2_i              //TDO_OUT
)/* synthesis syn_black_box */;
So the pad signals are clearly for the incoming JTAG signals, and the others match pretty closely what the other vendors’ JTAG primitives provide, so it’s looking good so far.
When using the inbuilt SDRAM the pads aren’t declared in the constraints file – instead you simply use “magic” pin names in the toplevel and the toolchain automatically connects them up. On a hunch I tried declaring the *_pad_* signals in the toplevel and connecting them to a GW_JTAG instance – and it worked – but if you’re going to import and route the JTAG signals from the toplevel then you don’t really need the GW_JTAG primitive at all – you can just implement your own JTAG state machine. (The inbuilt one would still be active, though – so you can’t just use any random IR codes, unless you configure the JTAG pins to be regular IO, at which point you’d have to power-cycle to reconfigure the FPGA.)
Next I tried simply leaving the *_pad_* pins unconnected, and it still worked – so the toolchain clearly hooks them up implicitly.
The one thing I couldn’t manage was to get the GW_JTAG primitive instantiated in VHDL – it seems to be a Verilog-only thing. I suspect this is because (a) I had trouble figuring out how to do the /* synthesis syn_black_box */ thing correctly in VHDL, and (b) leaving inputs unconnected is illegal in VHDL; the component declaration would have to assign a default value to the pad inputs, at which point they’re no longer floating for the toolchain to hook up appropriately. For that reason, even though most of my JTAG debug adventures have happened in VHDL, I use a small Verilog shim around the GW_JTAG primitive.
The next difficulty was to determine the appropriate reserved IR codes for user applications. (In the good old days Google would have been able to help with this.) After I’d found the answer for myself, I discovered that it’s already in the OpenOCD codebase – and with hindsight that’s the first place I should have looked!
In the end I found them by making a small core that monitors JTAG activity and dumps IR codes to a secondary UART on a spare pin (remember I said the UART and GAO can’t be used simultaneously?) then ran GAO – at which point I discovered that the first of the two numbers I was seeking was 0x42. From there it was easy to find the second one, which is 0x43.
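The monitoring core itself is HDL, but the logic is easy to model in software. The following is only a Python sketch of the idea, not the actual core: it follows the standard IEEE 1149.1 TAP state machine from TMS, collects TDI bits while in Shift-IR, and records the value at Update-IR. The 8-bit IR length is an assumption on my part, and `NEXT` and `sniff_ir` are hypothetical names.

```python
# Next TAP state for each state, indexed by TMS (0, 1), per IEEE 1149.1.
NEXT = {
    "TLR": ("RTI", "TLR"),
    "RTI": ("RTI", "SEL_DR"),
    "SEL_DR": ("CAP_DR", "SEL_IR"),
    "CAP_DR": ("SH_DR", "EX1_DR"),
    "SH_DR": ("SH_DR", "EX1_DR"),
    "EX1_DR": ("PAUSE_DR", "UPD_DR"),
    "PAUSE_DR": ("PAUSE_DR", "EX2_DR"),
    "EX2_DR": ("SH_DR", "UPD_DR"),
    "UPD_DR": ("RTI", "SEL_DR"),
    "SEL_IR": ("CAP_IR", "TLR"),
    "CAP_IR": ("SH_IR", "EX1_IR"),
    "SH_IR": ("SH_IR", "EX1_IR"),
    "EX1_IR": ("PAUSE_IR", "UPD_IR"),
    "PAUSE_IR": ("PAUSE_IR", "EX2_IR"),
    "EX2_IR": ("SH_IR", "UPD_IR"),
    "UPD_IR": ("RTI", "SEL_DR"),
}


def sniff_ir(samples, ir_len=8):
    """Return every IR value loaded in a stream of (tms, tdi) samples.

    Each sample is the pin state at one TCK rising edge. The tracker
    starts in Test-Logic-Reset; ir_len=8 is an assumed IR length.
    """
    state = "TLR"
    shreg = 0
    seen = []
    for tms, tdi in samples:
        if state == "CAP_IR":
            shreg = 0  # start each scan from a clean register
        elif state == "SH_IR":
            # Bits arrive LSB first: shift right, insert at the top.
            shreg = (shreg >> 1) | (tdi << (ir_len - 1))
        state = NEXT[state][tms]
        if state == "UPD_IR":
            seen.append(shreg)
    return seen
```

In hardware the equivalent of `seen.append(...)` would be handing the captured byte to the UART transmitter, so each IR scan issued by GAO shows up as one value on the serial port.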
So to cut a long story short, I’m now able to run the EightThirtyTwo CPU and talk to the CPU via its debug interface, on Intel, Xilinx, Lattice and now Gowin FPGAs.
I also blogged some time ago about implementing a poor-man’s-SignalTap on the ECP5 FPGA, and I now have this working on the Tang Nano 20K, too.