Part 13: Timing closure at last!
In the TG68MiniSOC project, I took my lead from the way the CPU was implemented in the Minimig project and clocked everything from a single high-speed clock.
The TG68 itself can’t run at much more than 32MHz, and it’s only possible to clock it faster than this because it has a “clkena” signal, which allows it to be halted while the combinational logic inside catches up. While this approach works (mostly), it causes issues for the FPGA synthesis software, which reports severe timing violations when combinational paths are more than a single cycle long. It’s possible to inform the software that these long paths are OK and anticipated in the design by way of “multicycles”, but finding and specifying them all, without including anything that shouldn’t be included, is long-winded, tedious and error-prone. This issue is the primary cause of the build-to-build stability problems that have dogged several TG68-based projects.
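For the curious, the throttling itself is trivial – something along these lines (a minimal sketch in Verilog; the module name, port names and the divide-by-four ratio are illustrative, not lifted from the TG68MiniSOC sources):

```verilog
// Minimal sketch of a clkena-style throttle: the CPU sees a high-speed
// clock (e.g. 128MHz) but is only enabled on every fourth cycle, giving
// the long combinational paths inside three extra cycles to settle.
// Names and the divide ratio are illustrative only.
module clkena_throttle (
    input  wire clk,        // high-speed system clock
    input  wire reset_n,    // active-low reset
    output wire cpu_clkena  // asserted one cycle in four
);
    reg [1:0] count;

    always @(posedge clk or negedge reset_n) begin
        if (!reset_n)
            count <= 2'd0;
        else
            count <= count + 2'd1;
    end

    assign cpu_clkena = (count == 2'd0);
endmodule
```

The catch is that the tools don’t know that paths sampled only when cpu_clkena is high have several cycles to settle – each such path has to be declared by hand in the constraints, and it’s exactly that bookkeeping which goes wrong from build to build.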
Today, however, I’ve been trying a different approach, with split clocks. This version of the project uses a 100MHz clock for the SDRAM controller and VGA controller, and a 25MHz clock (generated within the same PLL as the 100MHz clock, and thus aligned with it) for the CPU, the main state machine and the peripherals. Dealing with data transfer between these two clock domains turned out to be much easier than I’d anticipated: the main trick seems to be simply making sure that all signals from the CPU are registered before being handed over to anything running at the faster clock rate.
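The hand-over itself needs very little logic. As a rough sketch (with hypothetical port names and widths – this isn’t the actual project code):

```verilog
// Minimal sketch of the slow-to-fast hand-over. Because the 25MHz and
// 100MHz clocks come from the same PLL and are phase-aligned, a signal
// registered on the 25MHz edge is stable for four 100MHz cycles, so the
// fast domain can safely re-register it. Signal names and widths are
// hypothetical, not taken from the project sources.
module cpu_to_fast_bridge (
    input  wire        clk_slow,    // 25MHz: CPU, state machine, peripherals
    input  wire        clk_fast,    // 100MHz: SDRAM and VGA controllers
    input  wire [23:0] cpu_addr,    // combinational outputs from the CPU
    input  wire [15:0] cpu_data,
    input  wire        cpu_req,
    output reg  [23:0] sdram_addr,  // registered copies seen by the fast side
    output reg  [15:0] sdram_data,
    output reg         sdram_req
);
    reg [23:0] addr_r;
    reg [15:0] data_r;
    reg        req_r;

    // First register everything in the slow domain, so the fast domain
    // never samples the CPU's long combinational paths directly.
    always @(posedge clk_slow) begin
        addr_r <= cpu_addr;
        data_r <= cpu_data;
        req_r  <= cpu_req;
    end

    // Then re-register in the fast domain.
    always @(posedge clk_fast) begin
        sdram_addr <= addr_r;
        sdram_data <= data_r;
        sdram_req  <= req_r;
    end
endmodule
```

Because the two clocks come from the same PLL, TimeQuest treats them as related and analyses the crossing paths automatically, with a worst-case requirement of one 100MHz period – and a registered signal with no logic between the two flip-flops meets that easily.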
The big news is that this approach allows the project to achieve timing closure! When building for the venerable DE1, TimeQuest no longer reports any timing violations. I have yet to run any benchmarks on this version of the project, so I don’t yet know what sort of speed penalty it will suffer, but the increased tidiness of the main state machine, and the added reliability that should come from the tools being able to understand the project’s structure, should make any performance penalty well worthwhile.
As always, full source can be found on my GitHub page. I tag the repository at key points, and this post refers to the code tagged as SplitClock_20140405.
Hi 😉
Did you have a chance to run any benchmarks yet? I’m really interested in how much the reduced clock will influence the CPU’s performance.
Also, did you notice any decrease in implementation size with the slower clock?
Yes, I have – and it seems the slower clock has only a tiny performance impact! I get approximately 4.8 Dhrystone MIPS with the CPU clocked at 100MHz with wait states, and 4.7 with it clocked at 25MHz.
It doesn’t decrease the resource usage at all, and since I now register a few signals that used to be directly routed into the SDRAM controller, the supporting logic is very slightly larger.
Ages ago I unsuccessfully tried the same. I didn’t succeed within a few hours and assumed that TobiFlexx must have had his reasons for running the core at 128MHz.
Your writing finally made me give it another try. And guess what? It works 🙂
The main difference is that I can run it with clkena permanently enabled with a 32MHz clock. Before, at 128MHz with clkena enabled every 4th cycle, it was never really stable, and even with an enable every 5th cycle I still had massive instabilities with most builds.
Now it runs stably at 32MHz. In the current Atari ST based setup, memory then becomes a bottleneck, so your SDRAM chapter is also very welcome. Currently I give my CPU every second 8MHz Atari system cycle and do a full 64-bit burst there, resulting in 32MBytes/s, which is half the speed a 32MHz TG68 can cope with. But together with 4k instruction and data caches, my setup gives me 8-9 times the performance of the stock 8MHz 68000-based Atari ST.
Excellent – glad to hear it’s working in the MIST core. I hope one day to get the Minimig core working on the same basis.
Hi,
I had the same issue with the TG68.
clk_ena did not work well, even with timing constraints.
Finally, I made my own 68000 softcore – the J68 – which is designed to work at the SDRAM speed.
A 100 MHz J68 is equivalent to a 33 MHz 68000, so you should get ~4 MIPS with it.
I do know that by doing the instruction fetch/decode in parallel with the execute stage, the performance would be almost doubled.
For the moment, I have been too lazy to do it :-).
The big advantage of the J68 is its size: 2000 LEs instead of 3500-4000 LEs.
Best regards,
Frederic
Hi, Alastair, Rok, Frederic,
OK, this might be quite a long message, but I’ll try to keep it short-ish…
If you remember, Martin mentioned on the Minimig forum that he was having lots of problems getting the Minimig core to run on his DE1…
http://www.minimig.net/viewtopic.php?f=9&t=609
Well, it was me who recommended that he buy the DE1, so I feel a bit bad about that now. :p
I’ve spent literally weeks trying to get the newer versions of the core to compile and run reliably on my own DE1.
Only some of the pre-compiled .sof files, like those from “minimig-de1-rel6-b2-testing2”, actually run. If I try to compile anything from the sources, it generally won’t run.
I’m looking to carry on where Alastair left off with the AGA stuff, but I would really rather use the newest core with 020 support etc.
I even made an attempt to implement the HAM8 mode, and expand the RGB colour output to 24-bit in Denise and Amber…
http://pastebin.com/2d38zg1e
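In case anyone wants to sanity-check my understanding, the hold-and-modify rule itself boils down to something like this (just a minimal sketch, not code from Denise – the palette lookup and the pixel pipeline are simplified away):

```verilog
// Minimal sketch of the HAM6 hold-and-modify rule, operating on 4-bit
// colour components. HAM8 works the same way, except the modify value
// is 6 bits wide and replaces the six MSBs of an 8-bit component.
// Not taken from Denise; names and interfaces are illustrative only.
module ham6_decode (
    input  wire        clk,
    input  wire        pixel_en,     // one pulse per lores pixel
    input  wire [5:0]  pixel,        // 6-bit HAM pixel value
    input  wire [11:0] palette_rgb,  // palette[pixel[3:0]], looked up externally
    output reg  [3:0]  r, g, b
);
    always @(posedge clk) begin
        if (pixel_en) begin
            case (pixel[5:4])
                2'b00: {r, g, b} <= palette_rgb; // set all from palette
                2'b01: b <= pixel[3:0];          // modify blue,  hold r/g
                2'b10: r <= pixel[3:0];          // modify red,   hold g/b
                2'b11: g <= pixel[3:0];          // modify green, hold r/b
            endcase
        end
    end
endmodule
```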
I was also having problems with running Mike Stirling’s ZX Spectrum and BBC cores – they would generally compile fine, but then start acting strangely when run on the board itself. Most of the time they just corrupted the display or froze.
So, I plugged in a lot of the stuff mentioned on the MM forum, and that seems to have got them running reliably again (I also put input/output delay constraints into the SDC file)…
http://pastebin.com/3sT45VaU
I can compile Alastair’s “colortable” Minimig source just fine on Quartus 13.0 SP1, though. It seems to run perfectly (once I added most of the above parameters to the .QSF file), but using a new core with the OR1200 controller would probably be a good idea.
I can’t get ANY of the OR1200-based cores to boot at all – they do random things with the LEDs, and usually don’t send anything at all to the serial port (I’ve tried all sorts of switch settings, checking through the code, etc.).
I also have to run the OpenRISC dev image under Oracle VM just to compile the latest FW files and make sure I’m using the most recent ctrl_boot and de1_boot.bin.
It’s very confusing though, as there are so many different versions now.
I believe Alastair’s “colortable” core is booting from the MENUE.SYS file atm?
I sometimes got the FAT16 Error when trying the newer cores, but it got a bit further after I freshly formatted a different 1GB card, then copied over DE1_BOOT.BIN first and KICK.ROM second.
Most of the time, only green LED5, or both LED6+LED5, would stay on or flash.
The biggest problem is not getting any proper serial output at all with any of the OR1200-based cores, so I can’t tell what errors they’re giving.
I originally had the 90ns Spansion Flash on my board, non-EDBLL ISSI SRAM, and the ISSI SDRAM. So I went as far as replacing the SRAM with a new ISSI one (due to the issues with the ZX Speccy / Beeb cores, as they only use SRAM IIRC).
I also replaced the Spansion Flash chip with the 70ns version as on the original boards, just in case it was causing issues with cores that use Flash.
I’m convinced now that this must be down to changes on the newer boards and slight timing differences. The EDBLL SRAM type used on some new boards was probably just compounding the issue and causing more problems.
I’m desperate to get a newer source to compile and run reliably, so we can keep to the same branch with the newest TG68K etc.
I’m not an expert coder, but I really want to try adding the AGA stuff, even if it takes many months. I know the fpgaArcade guys have had AGA working for years, but I don’t think they will be releasing it publicly?
I can sort of read VHDL OK, but I’m much happier working with Verilog, so I’ve even converted the TG68K core to Verilog. 🙂
It compiles OK with the Verilog core, but I’m not sure if it’s properly working yet, because this was on the newest core from chaos (which I still can’t get to boot).
Please help. 😉
Could somebody please have a quick look at my attempt at implementing the HAM8 mode? I think I’ve got the basics of how HAM6 and HAM8 work now, but I’ve screwed up HAM6 somewhere (HAM6 pics in Deluxe Paint are slightly messed up now).
Regards,
Ash.
P.S. Frederic – are you the author of the J68 core?
If so, nice work on helping Grégory with the Atari Jaguar core, it’s a great achievement (as is the Minimig ofc). 🙂
Hi,
I hope everyone else sees your post – I’m not sure whether everyone will be notified, or just me!
One long-term goal of mine – though I don’t know when it’ll bubble up to the top of my “To Do” list – is to use the split clock technique in the Minimig core, and hopefully then achieve timing closure. This should solve most of the build-to-build stability issues, and make hacking on the core much more enjoyable.
I have to agree; I’m not holding my breath waiting for a source release from the FPGAArcade guys – it may well happen eventually, but in their shoes who wouldn’t want to hold on to the first-mover advantage for as long as possible?
Hi, Alastair,
That’s OK, I’m just glad at least you saw it so far. 😉
I did try to post on the MM forum, but apparently my account hasn’t been activated yet? I think it’s been like that for months tbh.
Yep, I do think almost all of these issues are down to the timing – I see the TimeQuest reports are pretty bad for some of these cores, and maybe it’s just marginal on the older DE1 boards?
As I say, when I’m compiling your colortable example with Quartus 13.0, it runs perfectly, so it’s unlikely to be a serious issue with the board or chips as such.
I’ve tried to learn about timing constraints and setting the global clocks specifically, but as you say, it’s probably down to the way they are routing the clocks through too many blocks / registers etc.
What do you think of my HAM8 attempt? Pretty bad isn’t it? lol
It looks like it’s delaying (or not delaying) some signals somewhere, or I’ve still not routed all the bits properly.
But, the usual graphics modes appear fine when routing through the full 24-bit RGB and padding the lower bits etc.
I had some trouble figuring out the filtering stuff in Amber at first due to the way it adds extra bits etc.
But, this seems to be working OK so far (excluding the HAM6/8 stuff in Denise)…
pastebin.com/Eexcc8A0
Hope it’s OK to post code snippets here and there btw?
I wouldn’t want to violate any GNU stuff.
And here’s my latest Denise source. I have no idea if the extra bitplane and register stuff I’ve added works at all…
http://pastebin.com/0DG31eJB
If you fancy giving it a quick try, you’ll just need to hook up the full 8-bit per colour through to the top level etc.
Like you said in your blog post about the colour table stuff, I’ve not hooked up the extra resistors on my DE1 either, so I haven’t yet seen the smooth slope of the CopperTest. :p
I’m just trying to get my head around all the code atm, and laying the groundwork for the full AGA stuff.
I’m mainly worried about how much work it will be to route the full 32-bit bus from the 68K. I understand that the 020 mode on the newer sources still works with a 16-bit data bus?
A big leap for the AGA stuff will just be to get some games / demos to run at all, even if the graphics are messed up. That way, I / we can stick SignalTap or something on there, then slowly start adding the extra logic.
Anywho, I have many questions about the code still.
I know you and the other guys are likely quite busy, but can I maybe contact you via e-mail instead?
True about the fpgaArcade guys too – I’ve been watching the progress of that project for going on 6 years now I think, and I realized early on that they were looking to use the AGA port as a big selling point.
Would be great if they made the source public ofc, but I’m not expecting it any time soon either. So, we’ll have to make our own. 😀
Regards,
Ash.
Oh, and hi to Till as well. 🙂
Apologies, I thought I’d left somebody out.
Ash.