{"id":1184,"date":"2018-03-25T18:26:41","date_gmt":"2018-03-25T18:26:41","guid":{"rendered":"http:\/\/retroramblings.net\/?p=1184"},"modified":"2018-03-25T20:32:36","modified_gmt":"2018-03-25T20:32:36","slug":"improving-the-megadrive-genesis-core-2","status":"publish","type":"post","link":"http:\/\/retroramblings.net\/?p=1184","title":{"rendered":"Improving the Megadrive \/ Genesis core"},"content":{"rendered":"<p><strong>Part 2: Saving cycles<\/strong><br \/>\n<strong>2018-03-25<\/strong><\/p>\n<p>In the first part of this series I covered some basic tidy-ups to the code to make it easier to maintain.\u00a0 Now I&#8217;ll look at how we can speed things up.<br \/>\n<!--more--><br \/>\nThe first avenue for speeding things up lies in this code:<\/p>\n<pre>\tprocess(clk)\r\n\tbegin\r\n\t\tif rising_edge(clk) then\r\n\t\t\tsd_data &lt;= (others =&gt; 'Z');\r\n\t\t\tif sd_data_ena = '1' then\r\n\t\t\t\tsd_data &lt;= sd_data_reg;\r\n\t\t\tend if;\r\n\t\t\tsd_addr &lt;= sd_addr_reg;\r\n\t\t\tsd_ras_n &lt;= sd_ras_n_reg;\r\n\t\t\tsd_cas_n &lt;= sd_cas_n_reg;\r\n\t\t\tsd_we_n &lt;= sd_we_n_reg;\r\n\t\t\tsd_ba_0 &lt;= sd_ba_0_reg;\r\n\t\t\tsd_ba_1 &lt;= sd_ba_1_reg;\r\n\t\t\tsd_ldqm &lt;= sd_ldqm_reg;\r\n\t\t\tsd_udqm &lt;= sd_udqm_reg;\r\n\t\tend if;\r\n\tend process;\r\n<\/pre>\n<p>This code copies internal versions of the SDRAM control signals to the SDRAM chip, on a clock edge. This is fine, except all the signals in question are already assigned on a clock edge within the state machine, so we&#8217;re effectively delaying them all by an extra clock. If we were running the SDRAM chip at a very high speed, and if these signals were derived from complicated combinational logic chains then doing this would be worth doing for stability, but in the Megadrive core it shouldn&#8217;t be necessary, so we&#8217;ll just comment out the &#8220;if rising_edge(clk) then&#8221; and its associated &#8220;end if;&#8221;. 
The only other thing we have to do is reduce the number of clocks the state machine waits after asserting _CAS, since the data will now arrive from the SDRAM chip one clock sooner.<\/p>\n<p>So that&#8217;s the low-hanging fruit &#8211; where else can we improve throughput?<\/p>\n<p>The SDRAM chip is being clocked at a relatively low rate of 108MHz in this core, so a little over 9ns per cycle. Looking at the datasheets for the SDRAM chips used in the MIST and Chameleon 64 boards, the Active to Read delay and Precharge to Read delays are 18ns or less, which means two clocks will be sufficient, while the SDRAM controller is currently allowing three.\u00a0 Three clocks would be necessary if the chip were being clocked faster, and are also necessary for the chip used on the DE2 board, which has a minimum time of 20ns for both those parameters (two clocks at 108MHz cover only about 18.5ns) &#8211; so in the interests of trimming as much time as possible, I&#8217;ve made the delays configurable on a per-board basis.<\/p>\n<p>The next avenue for improving performance is to exploit bank interleaving.\u00a0 The SDRAM chips are split into four distinct banks which are able to operate more-or-less independently, so if requests arrive on different ports simultaneously but need to access different banks, it&#8217;s possible to do some setup for the next access in advance, precharging the previously open row and opening the next row on a bank while a read to another bank is in progress.<\/p>\n<p>I&#8217;ve added a signal called preselectBank, which the state machine sets high any time it knows the bus will be clear for the next cycle, and can thus accept a setup command for the next bank.\u00a0 Setting up the next bank is as easy as this:<\/p>\n<pre>\tif preselectBankPause \/= 0 then\r\n\t\tpreselectBankPause &lt;= preselectBankPause - 1;\r\n\tend if;\r\n\t\t\t\r\n\tif preselectBank='1' and preselectBankPause=0 then\r\n\t\tif nextRamState \/= RAM_IDLE and\r\n\t\t\t\t(currentBank \/= nextRamBank or ramAlmostDone='1') then\r\n\t\t\t-- Do we 
need to close a row first?\r\n\t\t\tif\tbanks(to_integer(nextRamBank)).rowopen='1' and\r\n\t\t\t\tbanks(to_integer(nextRamBank)).row \/= nextRamRow then\r\n\t\t\t\t-- Wrong row active in bank, do precharge to close the row\r\n\t\t\t\tsd_we_n_reg &lt;= '0';\r\n\t\t\t\tsd_ras_n_reg &lt;= '0';\t\t\t\t\r\n\t\t\t\tsd_ba_0_reg &lt;= nextRamBank(0);\r\n\t\t\t\tsd_ba_1_reg &lt;= nextRamBank(1);\r\n\t\t\t\tbanks(to_integer(nextRamBank)).rowopen &lt;= '0';\r\n\t\t\t\t-- Ensure a gap of at least one clock between preselection commands\r\n\t\t\t\tpreselectBankPause&lt;=prechargeTiming-1;\r\n\t\t\telsif banks(to_integer(nextRamBank)).rowopen='0' then\r\n\t\t\t\t-- Open the next row\r\n\t\t\t\tsd_addr_reg &lt;= nextRamRow;\r\n\t\t\t\tsd_ras_n_reg &lt;= '0';\r\n\t\t\t\tsd_ba_0_reg &lt;= nextRamBank(0);\r\n\t\t\t\tsd_ba_1_reg &lt;= nextRamBank(1);\r\n\t\t\t\tbanks(to_integer(nextRamBank)).row &lt;= nextRamRow;\r\n\t\t\t\tbanks(to_integer(nextRamBank)).rowopen &lt;= '1';\r\n\t\t\t\t-- Ensure a gap of at least one clock between this and the next command\r\n\t\t\t\tpreselectBankPause&lt;=rasCasTiming-1;\r\n\t\t\tend if;\r\n\t\tend if;\r\n\tend if;\r\n<\/pre>\n<p>This, combined with a couple of other timing tweaks, like dispatching precharge and active commands directly from the idle state, instead of from the first state in the read process, takes the response time about halfway towards where it needs to be to solve the glitching problems.<\/p>\n<p>Once these relatively easy tweaks are done, the only way to improve throughput further is to introduce some caching.\u00a0 There are a number of ways this could be done, but the easiest for me was just to integrate the two-way cache from my TG68MiniSOC and ZPUDemos projects.\u00a0 The only difficulty here is that the cache is set up for 8-word bursts, and the SDRAM controller is currently running in 4-word burst mode, so I&#8217;ve changed the controller to use 8-word bursts, but to terminate the bursts early for any port other than 
the VRAM port.\u00a0 I&#8217;m not yet certain whether performance would be better if the core used 4-word bursts and cachelines instead, but for now it works pretty well.<\/p>\n<p>The core is still not completely glitch-free, but it&#8217;s to the point where all important in-game elements are visible, making some games playable that really weren&#8217;t beforehand.\u00a0 Once I&#8217;ve tested the core a little more on both platforms, I&#8217;ll make binaries available for download.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Part 2: Saving cycles 2018-03-25 In the first part of this series I covered some basic tidy-ups to the code to make it easier to maintain.\u00a0 Now I&#8217;ll look at how we can speed things up.<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4,8],"tags":[],"class_list":["post-1184","post","type-post","status-publish","format-standard","hentry","category-fpga","category-hardware"],"_links":{"self":[{"href":"http:\/\/retroramblings.net\/index.php?rest_route=\/wp\/v2\/posts\/1184","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/retroramblings.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/retroramblings.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/retroramblings.net\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/retroramblings.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1184"}],"version-history":[{"count":3,"href":"http:\/\/retroramblings.net\/index.php?rest_route=\/wp\/v2\/posts\/1184\/revisions"}],"predecessor-version":[{"id":1188,"href":"http:\/\/retroramblings.net\/index.php?rest_route=\/wp\/v2\/posts\/1184\/revisions\/1188"}],"wp:attachment":[{"href":"http:\/\/retroramblings.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1184"}],"wp:term":
[{"taxonomy":"category","embeddable":true,"href":"http:\/\/retroramblings.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1184"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/retroramblings.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1184"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}