{"id":1178,"date":"2018-03-19T23:38:19","date_gmt":"2018-03-19T23:38:19","guid":{"rendered":"http:\/\/retroramblings.net\/?p=1178"},"modified":"2018-03-19T23:38:19","modified_gmt":"2018-03-19T23:38:19","slug":"improving-the-megadrive-genesis-core","status":"publish","type":"post","link":"http:\/\/retroramblings.net\/?p=1178","title":{"rendered":"Improving the Megadrive \/ Genesis core"},"content":{"rendered":"<p><strong>2018-03-19<\/strong><\/p>\n<p>The Megadrive\/Genesis core has been plagued from the start with graphical issues that result from the SDRAM controller not responding quickly enough.\u00a0 Over the last few days I&#8217;ve finally put some time into understanding the SDRAM controller used by the project and into improving its throughput.<br \/>\n<!--more--><br \/>\nThe existing SDRAM controller was based on the reference implementation supplied by Peter Wendrich for the Chameleon 64, with ports added to better match the needs of the Megadrive core &#8211; but there was quite a lot of dead code not used in this project, making it tricky to work on.\u00a0 So the first thing I did was to refactor the controller, removing dead code and unused ports, and tidying up the code in general.<\/p>\n<p>We have four access ports from the core, one of which is only used during bootup, so it can be ignored when considering performance.<\/p>\n<p>The ports are as follows:<\/p>\n<ul>\n<li>One 16-bit wide write-only port for writing a ROM image into memory.<\/li>\n<li>One 64-bit wide read-only port for reading instructions from a ROM image.\u00a0 It is 64 bits wide so as to take advantage of burst reads.\u00a0 When running program code, most accesses will be sequential, so reading 64 bits (four 16-bit words) in a single burst, and thereby saving three lots of command setup, is a big win.<\/li>\n<li>One 16-bit wide read\/write port from the CPU, for reading and writing data.<\/li>\n<li>One 16-bit wide read\/write port from the chipset.\u00a0 This port taking too long to respond is what&#8217;s causing the 
graphical issues with the Megadrive core.<\/li>\n<\/ul>\n<p>My game plan for improving throughput was to make use of bank interleaving.\u00a0 The SDRAM we&#8217;re working with has four independent banks, so while we&#8217;re waiting for data to arrive from one bank it&#8217;s perfectly possible to prepare the next bank for reading in advance.\u00a0 This requires the data to be efficiently distributed between banks, which was not the case here.\u00a0 The bits of the incoming address were mapped in order, from MSB to LSB, to the SDRAM&#8217;s bank bits, then the row bits and finally the column bits.\u00a0 The Chameleon&#8217;s SDRAM has 2 bank bits, 13 row bits and 9 column bits, 24 bits in total, addressing 16 million-odd 16-bit words.<\/p>\n<p>That makes 32 megabytes of RAM, split into 8 megabytes per bank, so with this mapping the vast majority of games will end up in a single bank, and since the 68000&#8217;s address space is only 16 megabytes in size, the top two banks will never be accessed.<\/p>\n<p>To fix this, I adjusted the address mapping so that the bank bits fall between the row and column bits.\u00a0 This means that the address space is striped across all four banks in 1-kilobyte chunks (one full row of 512 16-bit columns), massively improving the chance of concurrent accesses hitting different banks.<\/p>\n<p>In order to improve the readability of the code, I&#8217;ve gathered each port&#8217;s signals into a record (the closest thing in VHDL to a structure in C) and separated the port address mapping from the priority encoding and command dispatching, like so:<\/p>\n<pre>\ttype ramPort_record is record\r\n\t\tramport : ramPorts;\r\n\t\tbank : unsigned(1 downto 0);\r\n\t\trow : row_t;\r\n\t\tcol : col_t;\r\n\t\tudqm : std_logic;\r\n\t\tldqm : std_logic;\r\n\t\tpending : std_logic;\r\n\t\tburst : std_logic;\r\n\t\twr : std_logic;\r\n\tend record;\r\n\ttype ramPort_records is array(3 downto 0) of ramPort_record;\r\n\tsignal ramPort_rec : ramPort_records;\r\n\r\n-- \r\n\r\n-- 
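Address split used by the ports below, assuming colAddrBits=9 and rowAddrBits=13 (the Chameleon SDRAM geometry):\r\n--   bit 0             : unused (byte lanes are selected via ldqm\/udqm)\r\n--   bits 9 downto 1   : column (512 16-bit words = 1 KB)\r\n--   bits 11 downto 10 : bank, so consecutive 1 KB blocks rotate through banks 0 to 3\r\n--   bits 24 downto 12 : row\r\n-- e.g. byte address 0x000400 -&gt; bank 1, row 0, column 0\r\n--      byte address 0x001000 -&gt; bank 0, row 1, column 0\r\n-- The ROM read port forces column bits 1..0 to 00, so its 4-word bursts start 8-byte aligned.\r\n-- 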
-----------------------------------------------------------------------\r\n-- Create row, column, bank and pending signals for each port\r\n\r\n\t-- ROM Write port\r\n\r\n\tramPort_rec(0).pending&lt;='1' when (romwr_req \/= romwr_ackReg) and (currentPort \/= PORT_ROMWR) else '0';\r\n\tramPort_rec(0).bank&lt;=romwr_a((colAddrBits+2) downto (colAddrBits+1));\r\n\tramPort_rec(0).row&lt;=romwr_a((colAddrBits+rowAddrBits+2) downto (colAddrBits+3));\r\n\tramPort_rec(0).col&lt;=romwr_a(colAddrBits downto 1);\r\n\tramPort_rec(0).wr&lt;=romwr_we;\t\t\r\n\tramPort_rec(0).burst&lt;='0';\r\n\tramPort_rec(0).ldqm&lt;='0';\r\n\tramPort_rec(0).udqm&lt;='0';\r\n\tramPort_rec(0).ramport&lt;=PORT_ROMWR;\r\n\t\t\r\n\t-- ROM Read port\r\n\t\t\r\n\tramPort_rec(1).pending&lt;='1' when (romrd_req \/= romrd_ackReg) and (currentPort \/= PORT_ROMRD) else '0';\r\n\tramPort_rec(1).bank&lt;=romrd_a((colAddrBits+2) downto (colAddrBits+1));\r\n\tramPort_rec(1).row&lt;=romrd_a((colAddrBits+rowAddrBits+2) downto (colAddrBits+3));\r\n\tramPort_rec(1).col&lt;=romrd_a(colAddrBits downto 3)&amp;\"00\";\r\n\tramPort_rec(1).wr&lt;='0';\r\n\tramPort_rec(1).burst&lt;='1';\r\n\tramPort_rec(1).ldqm&lt;='0';\r\n\tramPort_rec(1).udqm&lt;='0';\r\n\tramPort_rec(1).ramport&lt;=PORT_ROMRD;\r\n\r\n\t\r\n\t-- 68K RAM port\r\n\r\n\tramPort_rec(2).pending&lt;='1' when (ram68k_req \/= ram68k_ackReg) and (currentPort \/= PORT_RAM68K) else '0';\r\n\tramPort_rec(2).bank&lt;=ram68k_a((colAddrBits+2) downto (colAddrBits+1));\r\n\tramPort_rec(2).row&lt;=ram68k_a((colAddrBits+rowAddrBits+2) downto (colAddrBits+3));\r\n\tramPort_rec(2).col&lt;=ram68k_a(colAddrBits downto 1);\r\n\tramPort_rec(2).wr&lt;=ram68k_we;\r\n\tramPort_rec(2).burst&lt;='0';\r\n\tramPort_rec(2).ldqm&lt;=ram68k_l_n;\r\n\tramPort_rec(2).udqm&lt;=ram68k_u_n;\r\n\tramPort_rec(2).ramport&lt;=PORT_RAM68K;\r\n\r\n\t\t\r\n\t-- VRAM port\r\n\r\n\tramPort_rec(3).pending&lt;='1' when (vram_req \/= vram_ackReg) and (currentPort \/= PORT_VRAM) else 
'0';\r\n\tramPort_rec(3).bank&lt;=vram_a((colAddrBits+2) downto (colAddrBits+1));\r\n\tramPort_rec(3).row&lt;=vram_a((colAddrBits+rowAddrBits+2) downto (colAddrBits+3));\r\n\tramPort_rec(3).col&lt;=vram_a(colAddrBits downto 1);\r\n\tramPort_rec(3).wr&lt;=vram_we;\r\n\tramPort_rec(3).burst&lt;='0';\r\n\tramPort_rec(3).ldqm&lt;=vram_l_n;\r\n\tramPort_rec(3).udqm&lt;=vram_u_n;\r\n\tramPort_rec(3).ramport&lt;=PORT_VRAM;\r\n\r\n<\/pre>\n<p>Since this is all just combinational logic and signal routing, it should have minimal impact on the controller&#8217;s size, if any at all.<\/p>\n<p>Priority encoding is now as simple as this:<\/p>\n<pre>process(ramPort_rec) -- Combinational: re-evaluated whenever any port record changes, not on clk\r\nbegin\t\r\n\tramPort_pri&lt;=0; -- Default value set to avoid a latch being created.\r\n\tramPort_req&lt;='0';\r\n\tfor i in 0 to 3 loop\r\n\t\tif ramPort_rec(i).pending='1' then\r\n\t\t\tramPort_pri&lt;=i;\r\n\t\t\tramPort_req&lt;='1';\r\n\t\tend if;\r\n\tend loop;\r\nend process;\r\n<\/pre>\n<p>This gives us a single signal, ramPort_req, which is high when one or more ports require service, and sets ramPort_pri to the highest-numbered pending port. 
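For example, if the ROM read port (1) and the VRAM port (3) both have requests pending, the loop assigns ramPort_pri&lt;=1 on iteration 1 and then ramPort_pri&lt;=3 on iteration 3; since only the last assignment to a signal within a single run of a process takes effect, ramPort_pri ends up as 3 and the VRAM request is dispatched first. 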
Finally, we use the result of the priority encoding to multiplex the ports, like so:<\/p>\n<pre>process(ramPort_req, ramPort_pri, ramPort_rec) -- Combinational, so sensitive to its inputs rather than clk\r\nbegin\r\n\tnextRamState &lt;= RAM_IDLE;\r\n\tnextRamPort &lt;= PORT_NONE;\r\n\tnextRamBank &lt;= \"00\";\r\n\tnextRamRow &lt;= ( others =&gt; '0');\r\n\tnextRamCol &lt;= ( others =&gt; '0');\r\n\tnextLdqm &lt;= '0';\r\n\tnextUdqm &lt;= '0';\r\n\tnextBurst &lt;= '0';\r\n\r\n\tif ramPort_req='1' then\r\n\t\tnextRamState &lt;= RAM_READ_1;\r\n\t\tif ramPort_rec(ramPort_pri).wr = '1' then\r\n\t\t\tnextRamState &lt;= RAM_WRITE_1;\r\n\t\t\tnextLdqm &lt;= ramPort_rec(ramPort_pri).ldqm;\r\n\t\t\tnextUdqm &lt;= ramPort_rec(ramPort_pri).udqm;\r\n\t\tend if;\t\t\t\t\r\n\t\tnextBurst &lt;= ramPort_rec(ramPort_pri).burst;\r\n\t\tnextRamPort &lt;= ramPort_rec(ramPort_pri).ramport;\r\n\t\tnextRamBank &lt;= ramPort_rec(ramPort_pri).bank;\r\n\t\tnextRamRow &lt;= ramPort_rec(ramPort_pri).row;\r\n\t\tnextRamCol &lt;= ramPort_rec(ramPort_pri).col;\r\n\tend if;\r\n\r\nend process;\r\n<\/pre>\n<p>Separating the three distinct aspects of what was previously a monolithic code section should make it easier to work on, especially if I end up adding extra ports. The next* signals (nextRamState, nextRamBank and so on) are used by a state machine to perform the actual fetching, and it&#8217;s in this state machine that I was able to save some cycles and improve throughput. 
This I shall describe in detail next time.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>2018-03-19 The Megadrive\/Genesis core has been plagued from the start with graphical issues that result from the SDRAM controller not responding quickly enough.\u00a0 Over the last few days I&#8217;ve finally put some time into understanding the SDRAM controller used by &hellip; <a href=\"http:\/\/retroramblings.net\/?p=1178\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4,8],"tags":[],"class_list":["post-1178","post","type-post","status-publish","format-standard","hentry","category-fpga","category-hardware"],"_links":{"self":[{"href":"http:\/\/retroramblings.net\/index.php?rest_route=\/wp\/v2\/posts\/1178","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/retroramblings.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/retroramblings.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/retroramblings.net\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/retroramblings.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1178"}],"version-history":[{"count":5,"href":"http:\/\/retroramblings.net\/index.php?rest_route=\/wp\/v2\/posts\/1178\/revisions"}],"predecessor-version":[{"id":1183,"href":"http:\/\/retroramblings.net\/index.php?rest_route=\/wp\/v2\/posts\/1178\/revisions\/1183"}],"wp:attachment":[{"href":"http:\/\/retroramblings.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1178"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/retroramblings.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1178"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/retroramblings.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftag
s&post=1178"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}