Simulation vs Synthesis

On 11/30/2015 6:32 PM, Simon wrote:
Just to follow up, it definitely is because it's being optimised away. If I
add a port which links to a byte of the stack register space, and link it
to the top-level test bench...

module cpu_6502
(
...
output reg [`NW:0] stackff
);

////////////////////////////////////////////////////////////////////////////
// Set up the stack as a register array
////////////////////////////////////////////////////////////////////////////
reg [`NW:0] stack[0:255]; // Stack-page

always @ (posedge(clk))
stackff <= stack[255];


... the report tells me that bits [31:8] of 'newSPData' are optimised away, but bits [7:0]
are not.

Aside: The report is also saying: "INFO: [Synth 8-5545] ROM "stack_reg[255]" won't be
mapped to RAM because address size (32) is larger than maximum supported(25)"

Am I misunderstanding this, or is my declaration wrong ? I'm trying to
declare 256 8-bit (`NW is defined to be 7) registers to represent a single
page (the 6502 uses page-1 as a stack, so its stack pointer is only 8-bits in size).
256 bytes ought to fit into a 4K-byte block-ram...

Cheers
Simon.

I am not clear what you are trying to do with the stack here. Do you
have a relatively complete CPU implemented?

I would expect something like:
module CPU6502
(
output wire [ 7:0] data_out,
output wire [15:0] address_out,
input wire [ 7:0] data_in,

output wire write_enable,
output wire read_enable,

input wire irq_in,
input wire nmi_in,
input wire clk,
input wire rstn
);

I would expect that you would have an 8 bit stack pointer that would get
muxed onto the address bus, possibly with offsets from the instruction
stream. The newSP value would go into the stack pointer when you are
updating the stack.
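In 6502 terms that muxing is simple, because the stack always lives in page 1:
the 8-bit stack pointer just gets 8'h01 prepended to form the 16-bit address.
A sketch (signal names invented, not from Simon's design):

reg  [7:0]  sp;          // 8-bit 6502 stack pointer
wire [15:0] stack_addr;

assign stack_addr = {8'h01, sp};   // stack accesses always land at 16'h01xx

// push writes at SP then decrements; pull increments SP then reads
always @(posedge clk)
    if (push_stack)
        sp <= sp - 8'h01;
    else if (pull_stack)
        sp <= sp + 8'h01;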

RAM would get hung on the address and data buses with block decode logic
to decode the upper address bits into a chip select for the RAM and
peripherals. Since FPGAs don't do tri-state buses, there will be a read
data-in mux to select the data source from the addressed bus target for
reads.

sort of like:
module mcu
(
output uart_txd,
input uart_rxd,
input clk,
input rstn
);

wire [15:0] address;
wire [ 7:0] data_out;
wire [ 7:0] ram_data, rom_data, uart_data;
reg [ 7:0] data_in;
wire ram_block_sel, rom_block_sel, uart_block_sel;
wire write_enable, read_enable;
wire irq;

CPU6502 cpu
(
.data_out (data_out),
.data_in (data_in),
.address_out (address),
.write_enable (write_enable),
.read_enable (read_enable),
.irq_in (irq),
.nmi_in (1'b0),
.clk (clk),
.rstn (rstn)
);

RAM_1Kx8 ram
(
.address_in (address[9:0]),
.data_in (data_out),
.data_out (ram_data),
.write_enable (write_enable),
.chip_sel (ram_block_sel)
);


ROM_1Kx8 rom
(
.address_in (address[9:0]),
.data_out (rom_data),
.read_enable (read_enable),
.chip_sel (rom_block_sel)
);

UART uart
(
.txd (uart_txd),
.rxd (uart_rxd),
.reg_select (address[1:0]), // 2 bits of address
.data_in (data_out),
.data_out (uart_data),
.write_enable (write_enable),
.read_enable (read_enable),
.irq_out (irq),
.clk (clk),
.rstn (rstn)
);

// address block decode
assign rom_block_sel = address [15:13] == 3'b111; // top address
assign ram_block_sel = address [15:13] == 3'b000; // bottom address
assign uart_block_sel = address [15:13] == 3'b001;

// read data path mux
always @( * )
begin
case (address[15:13])
3'b000: data_in = ram_data;
3'b001: data_in = uart_data;
3'b111: data_in = rom_data;
default: data_in = 8'h0;
endcase
end

endmodule
 
On Mon, 30 Nov 2015 11:02:14 -0800, Simon wrote:

Thanks for all the replies, guys :)

On Sunday, November 29, 2015 at 11:51:03 PM UTC-8, rickman wrote:

Usually logic is removed because the result is not used anywhere. You
can design and simulate a design only to see the synthesizer remove the
entire thing if it has no outputs that make it to an I/O pin.

So where are the outputs of your register used? Do they actually
connect?

Actually, this may be it. I had tried to counter this by exporting the
databus (both input and output) in the top-level test-bench module, but
thinking about it, the registers it's removing are from code that
exercises the BRK instruction, which only affects the stack-pointer and
program-counter, both of which are internal to the CPU in the design as
it stands, and the BRK instruction is currently the only thing to
manipulate the stack pointer (I'm going alphabetically through the
instruction list, and I've only got as far as EOR :)

At this stage, it's probably OK to simply trust synthesis until the
design is largely complete. If your simulation tests are thorough
enough, that's what matters.

You can mess with a temporary framework of attributes to preserve
signals, but IMO it's wasted time and effort, especially since what's
"preserve"d through synthesis can still be trimmed by the mapper, so you
might have to push the rope a little harder.
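For reference, the sort of attribute framework meant here, in Vivado-style
syntax (check your own tool's documentation; as noted, downstream passes can
still trim what synthesis preserved):

(* dont_touch = "true" *) reg [7:0] stack [0:255];
(* keep = "true" *)       wire [7:0] stack_top_dbg;

assign stack_top_dbg = stack[255];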

You can possibly stub out blocks (containing some dummy observable, like
an OR gate) and fill them in later.
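A minimal version of that dummy-observable idea: reduce whatever you want kept
down to one bit and route it to a real pin, so nothing upstream is left unread.
Port and signal names here are made up:

output wire debug_alive;   // temporary top-level pin, removed once the design is complete

assign debug_alive = (|newSPData) ^ (|stack[255]) ^ (^numSPBytes);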

But I'd probably press ahead with proving the design in simulation until
there was enough to be worth synthesis.

-- Brian
 
Sorry, I didn't really explain the 32-bit newSPData register, did I ? In my defence, my 3-year old was clamouring for his evening meal, and his mother was busy :)

What I'd been trying to do was split up the code into separate areas by module, so generally speaking:

- there's a module ("decode.v") which takes in raw opcodes and outputs the instruction type, and the addressing mode of the opcode (one of {accumulator, Immediate, relative, absolute, zero-page, absolute-indexed-x, absolute-indexed-y, zero-page-indexed-x, zero-page-indexed-y, indirect, indirect-x, indirect-y})

- there's a module ("execute.v") that handles doing the actual work of each opcode, placing the results in intermediate registers (output ports of the module)

- and there's an overall harness-it-all-together module ("cpu_6502" in 6502.v) which instantiates the above

The stack and page-zero are special (as mentioned before) for speed reasons, and I don't know of a way to share an array of registers between modules. For zero-page this isn't an issue, there's only one byte to write, and it can be passed back as 'storeValue' with an 'action' of {UPDATE_A, UPDATE_X, UPDATE_Y} and that byte will be placed in the correct processor register based on the action.

For the stack, though, I need to pass back (so far) up to 3 bytes of data. The BRK instruction simulates an interrupt, pushing (in order) {PC high-byte, PC low-byte, Processor-status-flags} onto the stack, then reading the 2 bytes at the interrupt vector {16'hFFFE,16'hFFFF}, and setting the contents of those two bytes into PC.
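For reference, the conventional one-byte-per-cycle view of that BRK sequence
looks roughly like the fragment below (state and signal names invented; this is
not how the newSPData scheme described next does it):

BRK_PUSH_PCH: begin stack[sp] <= pc[15:8];         sp <= sp - 8'h01; end
BRK_PUSH_PCL: begin stack[sp] <= pc[7:0];          sp <= sp - 8'h01; end
BRK_PUSH_P:   begin stack[sp] <= p_flags | 8'h10;  sp <= sp - 8'h01; end  // B flag set in the pushed copy
// ...then read the vector low byte from 16'hFFFE and high byte from 16'hFFFF into PC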

The 32-bit (I went for 4 bytes not 3. If it needs to be changed, I can do so later) 'newSPData' register, combined with the 2-bit count 'numSPBytes' is how I implemented passing back the bytes from "execute" to the overall module to update the array of registers that constitute my stack. The "execute" module can pass back up to 4 bytes, and the overall module ("cpu_6502") that actually contains the stack register array will do the right thing, based on 'numSPBytes' if the 'action' contains the bit 'UPDATE_SP'. This is all done in the `EXECUTE stage of the overall module in 6502.v
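A guess at what the receiving end of that hand-off looks like inside "cpu_6502"
(the widths, the byte order, and the numSPBytes encoding are all assumptions;
the fourth byte is omitted):

// push up to three bytes of newSPData in one clock, first byte at the
// current SP, then move SP down by the count
always @(posedge clk)
    if (action & UPDATE_SP) begin
        if (numSPBytes >= 2'd1) stack[SP]        <= newSPData[ 7: 0];
        if (numSPBytes >= 2'd2) stack[SP - 8'd1] <= newSPData[15: 8];
        if (numSPBytes >= 2'd3) stack[SP - 8'd2] <= newSPData[23:16];
        SP <= SP - {6'b0, numSPBytes};
    end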

The addition I made last night was to expose the 255th byte of the stack as an external top-level port (the stack grows downwards, so this is the first byte of the stack) and synthesise. The lower 8 bits of the 'newSPData' register are those that would be inserted into the first position on the stack, and indeed those lower 8 bits were not optimised away. From this, I conclude it is the stack being optimised away that is the root cause of my warning messages.

'stack' was otherwise totally internal to the "cpu_6502" module, and although it had a writer (the `EXECUTE stage can write up to 3 bytes to it), there is currently no reader for those registers. I'm up to 'EOR' (the 6502 uses EOR for what the rest of the world calls XOR), and the first instruction to implement reading the stack is 'PLA'. I might try jumping ahead to implementing that instruction rather than going strictly alphabetically (I didn't want to miss one :)

Hope that clears things up a little.

Cheers
Simon
 
Thanks :)

As I mentioned just above, I might jump ahead and implement PLA (which will force a *read* of the stack values rather than just the current writes) and see if that has an effect.

The simulation tests at the moment are me going through (for every opcode) (for every addressing mode) ...

- Check the decoding
- Check the timing (varies based on addressing mode)
- Check the results

.... in the simulator using the waveforms. It is, however, getting to the point where writing a formal test of each of the above would start to become beneficial. I want to make sure that any later additions don't affect any previous results. It's effort to do so, and my time is limited, but it will actually save time in the long run.
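A minimal self-checking skeleton for one of those checks, as a sketch (module
and port names invented, stimulus omitted):

module tb_eor_immediate;
    reg        clk = 1'b0;
    reg  [7:0] a_expected = 8'hF0;    // $0F EOR $FF
    wire [7:0] a_observed;            // hypothetical debug port on the CPU

    always #5 clk = ~clk;

    // cpu_6502 dut (.clk(clk), /* ... drive reset, feed the opcode stream, expose A ... */);

    initial begin
        #200;  // wait the documented cycle count for the opcode
        if (a_observed !== a_expected)
            $display("FAIL: EOR #$FF gave %h, expected %h", a_observed, a_expected);
        else
            $display("PASS: EOR #$FF");
        $finish;
    end
endmodule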

Cheers
Simon
 
On Tuesday, December 1, 2015 at 7:10:40 AM UTC-8, Tom Gardner wrote:
On 01/12/15 14:58, Simon wrote:
I want to make sure that any later additions don't affect any previous results. It's effort to do so, and my time is limited, but it will actually save time in the long run.

That's what suites of test benches are for. The software
world has triumphantly reinvented the concept and called
them "unit tests".

It is normal to have a hierarchy of test suites. Some
can be run frequently because they are a fast "sanity
check" that just tests simple externally observable
behaviour of whatever unit is being tested. Some tests
are run at major points in the design because they test
the internal operation in detail, and hence are slow.

[grin] I'm well aware what unit tests are for, I've written a *lot* of them in my day job over the last few decades, although admittedly not in verilog :) The problem is not the lack of knowledge (for once), it's the will to sit down and do something that doesn't seemingly advance the project... It's a lot more fun to write code than to write code that tests code...

As I said though, it is getting to the point (in all honesty, it's way past the point) where manual checking of things like this is no longer viable. Unit tests feature in my future ...

Cheers
Simon.
 
On 01/12/15 01:32, Simon wrote:
> ... the report tells me that bits [31:8] of 'newSPData' are optimised away, but bits [7:0] are not.

Is this because the 6502's stack pointer is only 8 bits long?
It can only address 256 bytes of RAM, so bits 8-31 /cannot/ be
used.

From http://www.dwheeler.com/6502/oneelkruns/asm1step.html

Stack Pointer
-------------

When the microprocessor executes a JSR (Jump to SubRoutine)
instruction it needs to know where to return when finished. The 6502
keeps this information in low memory from $0100 to $01FF and uses the
stack pointer as an offset. The stack grows down from $01FF and makes
it possible to nest subroutines up to 128 levels deep. Not a problem
in most cases.
 
On 01/12/15 14:58, Simon wrote:
> I want to make sure that any later additions don't affect any previous results. It's effort to do so, and my time is limited, but it will actually save time in the long run.

That's what suites of test benches are for. The software
world has triumphantly reinvented the concept and called
them "unit tests".

It is normal to have a hierarchy of test suites. Some
can be run frequently because they are a fast "sanity
check" that just tests simple externally observable
behaviour of whatever unit is being tested. Some tests
are run at major points in the design because they test
the internal operation in detail, and hence are slow.
 
On 01/12/15 16:07, Simon wrote:
On Tuesday, December 1, 2015 at 7:10:40 AM UTC-8, Tom Gardner wrote:
On 01/12/15 14:58, Simon wrote:
I want to make sure that any later additions don't affect any previous results. It's effort to do so, and my time is limited, but it will actually save time in the long run.

That's what suites of test benches are for. The software
world has triumphantly reinvented the concept and called
them "unit tests".

It is normal to have a hierarchy of test suites. Some
can be run frequently because they are a fast "sanity
check" that just tests simple externally observable
behaviour of whatever unit is being tested. Some tests
are run at major points in the design because they test
the internal operation in detail, and hence are slow.

[grin] I'm well aware what unit tests are for, I've written a *lot* of them in my day job over the last few decades, although admittedly not in verilog :) The problem is not the lack of knowledge (for once), it's the will to sit down and do something that doesn't seemingly advance the project... It's a lot more fun to write code than to write code that tests code...

As I said though, it is getting to the point (in all honesty, it's way past the point) where manual checking of things like this is no longer viable. Unit tests feature in my future ...

:)

Ah, but are you ready for the softies' next dogma, "TDD"?
Take a good thing, unit tests, and confidently state
that they are necessary /and sufficient/ for a good product.

None of this BUFD (big[1] up-front design) nonsense. Write
a test, create something that passes the test, and move on
to the next test. Never mind the quality/completeness of
the tests, if you get a green light after running the
tests then /by definition/ it works.

Yup, ignorant youngsters are taught that and believe it :(


[1] in typo veritas: I first wrote "bug" :)
 
Simon <google@gornall.net> wrote:
Sorry, I didn't really explain the 32-bit newSPData register, did I ?
In my defence, my 3-year old was clamouring for his evening meal,
and his mother was busy :)

What I'd been trying to do was split up the code into separate areas
by module, so generally speaking:

(snip)
- there's a module ("execute.v") that handles doing the actual work of
each opcode, placing the results in intermediate registers
(output ports of the module)

(snip)

Early in the processing the synthesis tools flatten the netlist.

That is, all the modules go away; what remains is just a big collection of
gates, like one big module. We find it easier to think about logic one module
at a time, but that doesn't seem to be any easier for the computer.

Not long after that, duplicate logic, including duplicate registers,
is detected. If you have two registers in different modules
with the same inputs and clocks, one will be removed. (The same happens
within a module, but there it is more obvious to us.)

Later, any logic where the output doesn't go anywhere is removed,
recursively. Also, any logic that has a constant output is removed,
and replaced by the constant.
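A toy example of those last two steps (duplicate merging, then recursive
dead-logic removal):

reg  [7:0] live, shadow, unused;
wire [7:0] data_out = live;   // only 'live' is ever read

always @(posedge clk) live   <= data_in;
always @(posedge clk) shadow <= data_in;          // same D and clock as 'live': merged into it
always @(posedge clk) unused <= data_in ^ 8'hA5;  // never read: register and XOR trimmed away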

You might find: https://www.coursera.org/course/vlsicad interesting.

(Even though it has ended, it looks like it will still let you sign up.)

-- glen
 
Simon <google@gornall.net> wrote:
Thanks :)

As I mentioned just above, I might jump ahead and implement PLA
(which will force a *read* of the stack values rather than just the
current writes) and see if that has an effect.

As I said before, one reason to turn off the optimization is to see
how big it will be when it isn't optimized out.

It is sometimes useful to know early how big an FPGA is needed.

But for actual use, you might just as well let it optimize away.

-- glen
 
Simon <google@gornall.net> wrote:

(snip)


I'm well aware what unit tests are for, I've written a *lot*
of them in my day job over the last few decades, although admittedly
not in verilog :) The problem is not the lack of knowledge (for once),
it's the will to sit down and do something that doesn't seemingly
advance the project... It's a lot more fun to write code than to write
code that tests code...

I believe it is one of Brooks' laws of software engineering
(applies here, even though it isn't software):

"Writing the code takes the first 90% of the time,
debugging takes the second 90%."

https://en.wikipedia.org/wiki/The_Mythical_Man-Month

-- glen
 
So this evening I implemented the PLA instruction, which reads from the stack (at the current location of the stack pointer) and stores the value there into A. Synthesis took about 3x as long, and at the end of it there's a whole bunch of Info messages about how it wasn't storing the stack in a block ram for this reason or that.

Looking at the registers, I jumped from ~260 to ~520, so it looks as though the variably-indexed (via SP) set of stack registers were incorporated into the design again :) Phew!

I guess I'll just get on with it and implement more instructions - I was just afraid that as the design got larger, it would be harder to debug. Looks like it might have been easier :)

Thanks again for all the help everyone, especially the verilog examples Bob :)

Simon
 
On 11/30/2015 11:15 PM, Simon wrote:
My solution to the 2-cycle instructions was to declare 2 pages-worth
of registers: page-0, (which is special for the 6502, with special
opcodes that take less time to run if they access there) and the
stack (which is page-1). The 6502 has an 8-bit stack-pointer, that
it always prepends 01h to (to form 16'h01xx), providing a 256-deep stack.
The use of a register array for both these pages significantly helps
when I only have 2 clocks to play with. Obviously when the CPU wants
to store or read values, I need to determine if it's page-0 or page-1
and redirect accordingly, but that's not a high price to pay.
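The redirect mentioned there might look something like this sketch (write side
only; 'eff_addr', 'write_strobe' and 'write_data' are invented names):

always @(posedge clk)
    if (write_strobe) begin
        if      (eff_addr[15:8] == 8'h00) zp   [eff_addr[7:0]] <= write_data;  // page 0
        else if (eff_addr[15:8] == 8'h01) stack[eff_addr[7:0]] <= write_data;  // page 1 (stack)
        // anything else goes out over the external bus instead
    end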

If I understand correctly, the root of the problem you are describing is
that you are trying to use an array of registers as RAM, and it is
optimizing out big chunks or all of it. Trying to build a synthesizable
array of addressable registers is a pain in the butt in Verilog. There
is probably a way to do it with genvars or maybe a for loop, but in the
past I have just brute forced it. Using genvars seems like the a
promising path, but the only exposure to them that I have had is
debugging cases where Xilinx ISE (v14) would not handle them as expected.

The brute force might look like:

module reg_ram
(
input wire [1:0] address,
input wire [7:0] write_data,
input wire write_en,
input wire clk,
input wire rstn,
output reg [7:0] read_data
);

reg [7:0] cell0, cell1, cell2, cell3;

always @(posedge clk or negedge rstn)
if (~rstn)
cell0 <= 8'h0;
else
if (write_en & (address == 2'h0))
cell0 <= write_data;

always @(posedge clk or negedge rstn)
if (~rstn)
cell1 <= 8'h0;
else
if (write_en & (address == 2'h1))
cell1 <= write_data;

always @(posedge clk or negedge rstn)
if (~rstn)
cell2 <= 8'h0;
else
if (write_en & (address == 2'h2))
cell2 <= write_data;

always @(posedge clk or negedge rstn)
if (~rstn)
cell3 <= 8'h0;
else
if (write_en & (address == 2'h3))
cell3 <= write_data;

always @( * )
case (address)
2'h0: read_data = cell0;
2'h1: read_data = cell1;
2'h2: read_data = cell2;
2'h3: read_data = cell3;
endcase
endmodule

As rude as this looks, most of the other structures that I can think of
result in something that looks like a huge barrel shifter and are larger
to implement.
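For what it's worth, the genvar route mentioned above can express the same
thing more compactly; a sketch that should behave like the brute-force module:

module reg_ram_gen
(
    input wire [1:0] address,
    input wire [7:0] write_data,
    input wire       write_en,
    input wire       clk,
    input wire       rstn,
    output wire [7:0] read_data
);

    reg [7:0] cell [0:3];

    genvar i;
    generate
        for (i = 0; i < 4; i = i + 1) begin : g_cell
            always @(posedge clk or negedge rstn)
                if (~rstn)
                    cell[i] <= 8'h0;
                else if (write_en && (address == i))
                    cell[i] <= write_data;
        end
    endgenerate

    assign read_data = cell[address];

endmodule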
 
On 12/1/2015 11:49 PM, Simon wrote:
So this evening I implemented the PLA instruction, which reads from the stack (at the current location of the stack pointer) and stores the value there into A. Synthesis took about 3x as long, and at the end of it there's a whole bunch of Info messages about how it wasn't storing the stack in a block ram for this reason or that.

Looking at the registers, I jumped from ~260 to ~520, so it looks as though the variably-indexed (via SP) set of stack registers were incorporated into the design again :) Phew!

I guess I'll just get on with it and implement more instructions - I was just afraid that as the design got larger, it would be harder to debug. Looks like it might have been easier :)

Thanks again for all the help everyone, especially the verilog examples Bob :)

You do have an issue if a block RAM is not being used. The code I've
seen looks like you are writing from a functional perspective rather
than structural. I would suggest you write a module for a block RAM
using example code provided by your chip manufacturer. Then incorporate
that RAM module into your code as appropriate.

Block RAM must have a register delay in the RAM itself. There are other
restrictions as well, the details depending on the vendor. If you code
the module by the provider's example you should get a block RAM. This
should also help you see the limitations of how you can use that RAM.

I have had similar problems coding adders when I was trying to use the
carry out. One small issue with how I was using the adder resulted in
a second adder being used to generate the carry out.
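The usual fix for that, for reference, is to widen the result by one bit so
the carry comes out of the same adder:

wire [7:0] a, b;
wire [8:0] sum_c = {1'b0, a} + {1'b0, b};
wire [7:0] sum   = sum_c[7:0];
wire       cout  = sum_c[8];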

--

Rick
 
On 12/1/2015 8:55 PM, BobH wrote:
On 11/30/2015 5:34 PM, rickman wrote:
On 11/30/2015 6:44 PM, BobH wrote:
A mistake that I have made is to mis-spell the wire connection, and then
there is no user for the outputs. The easiest way to check that is to
inspect the simulation at the inputs to the next stage that uses the
data and make sure that they are wiggling as you expect and not showing
undefined as they would for an undriven wire. The second easiest way to
check that is to eyeball the naming for this problem.

If you make a spelling error, won't that be flagged because that signal
hasn't been declared?

Often the auto-wire "feature" will generate a replacement. If you go
through the logs, it is noted, and usually the auto-wire will be a
single-bit-wide signal instead of a bus, so it shows up that way too.

That is why VHDL has strong typing, errors like this are made *very* clear.

--

Rick
 
On 11/30/2015 5:34 PM, rickman wrote:
On 11/30/2015 6:44 PM, BobH wrote:
A mistake that I have made is to mis-spell the wire connection, and then
there is no user for the outputs. The easiest way to check that is to
inspect the simulation at the inputs to the next stage that uses the
data and make sure that they are wiggling as you expect and not showing
undefined as they would for an undriven wire. The second easiest way to
check that is to eyeball the naming for this problem.

If you make a spelling error, won't that be flagged because that signal
hasn't been declared?
Often the auto-wire "feature" will generate a replacement. If you go
through the logs, it is noted, and usually the auto-wire will be a
single-bit-wide signal instead of a bus, so it shows up that way too.
 
On Tuesday, December 1, 2015 at 9:02:26 PM UTC-8, rickman wrote:
You do have an issue if a block RAM is not being used. The code I've
seen looks like you are writing from a functional perspective rather
than structural. I would suggest you write a module for a block RAM
using example code provided by your chip manufacturer. Then incorporate
that RAM module into your code as appropriate.

But I don't want a block-ram. I don't want to pay the penalty of a clock-cycle for access to the values. I want a block of 256 registers, which I can access with as-close-to-zero time cost as possible. Block-rams are great, but in this case I really want just a whole bunch of registers.

I'm conscious that something is screwy. I don't understand why an array of registers declared as...

////////////////////////////////////////////////////////////////////////////
// Set up zero-page as register-based for speed reasons
////////////////////////////////////////////////////////////////////////////
reg [`NW:0] zp[0:255]; // Zero-page

.... should exhibit a whole bunch of warnings along the lines of

INFO: [Synth 8-5545] ROM "zp_reg[255]" won't be mapped to RAM because address size (32) is larger than maximum supported(25)"

Um, que ? Address size == 32 ? Even if you treat it as a 1-bit array, that's only 11 bits of address (8 * 256 = 2048) to access any given bit. Hmm, now there's a thought. I wonder if declaring:

reg [2047:0] zp;

... and doing the bit-selections might be a way to do it. No array, just a freaking huge register. I wonder how efficient it is at ganging up LUTs to make a combined single register...
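The bit selections that idea needs are easiest with an indexed part-select;
a quick sketch (untested, 'addr' is the 8-bit page offset):

reg  [2047:0] zp;                                // 256 x 8 bits, flattened

wire [7:0] zp_read = zp[{addr, 3'b000} +: 8];    // {addr, 3'b000} == addr * 8

always @(posedge clk)
    if (zp_write_en)
        zp[{addr, 3'b000} +: 8] <= zp_write_data;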

I actually might try implementing a module along the lines of BobH's code above - rather than just declaring the register array, and see how that works out. At the moment I'm busy writing unit tests :)

Cheers
Simon
 
In article <n3lkei021un@news3.nntpjunkie.com>,
BobH <wanderingmetalhead.nospam.please@yahoo.com> wrote:
The brute force might look like:

module reg_ram
(
input wire [1:0] address,
input wire [7:0] write_data,
input wire write_en,
input wire clk,
input wire rstn,
output reg [7:0] read_data
);

reg [7:0] cell0, cell1, cell2, cell3;

always @(posedge clk or negedge rstn)
if (~rstn)
cell0 <= 8'h0;
else
if (write_en & (address == 2'h0))
cell0 <= write_data;

always @(posedge clk or negedge rstn)
if (~rstn)
cell1 <= 8'h0;
else
if (write_en & (address == 2'h1))
cell1 <= write_data;

snip
case (address)
2'h0: read_data = cell0;
2'h1: read_data = cell1;
2'h2: read_data = cell2;
2'h3: read_data = cell3;
endcase
endmodule

As rude as this looks, most of the other structures that I can think of
result in something that looks like a huge barrel shifter and are larger
to implement.
snip

Huh. I missed what led up to this, but explicitly coding up each case
like this is entirely unnecessary in Verilog.

reg [ 7 : 0] cell [ 3 : 0];
always @( posedge clk ) // NO ASYNC RESET - messes up optimization - no reset at all is actually preferred
if( write_en )
cell[ address ] <= write_data;

always @*
read_data = cell[ address ];

Done. If resets are needed then it won't map to block RAM.
Xilinx has examples in their docs for how to successfully infer block RAM.

Regards,

Mark
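For reference, the vendor templates Mark refers to boil down to giving the RAM
a registered (synchronous) read; roughly this shape (from memory, so check the
Xilinx/Vivado synthesis guide for the exact template):

module bram_256x8
(
    input  wire       clk,
    input  wire       we,
    input  wire [7:0] addr,
    input  wire [7:0] din,
    output reg  [7:0] dout
);

    reg [7:0] mem [0:255];

    always @(posedge clk) begin
        if (we)
            mem[addr] <= din;
        dout <= mem[addr];   // registered read is what lets it map to block RAM
    end

endmodule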
 
On Wednesday, December 2, 2015 at 8:28:10 AM UTC-8, rickman wrote:
I actually might try implementing a module along the lines of BobH's
code above - rather than just declaring the register array, and see
how that works out. At the moment I'm busy writing unit tests :)

Now I am lost again. Why are you trying to change the code that is
giving you 256 registers? The only RAM in FPGAs these days is
synchronous RAM. If you don't want the address register delay then your
only choice is to use fabric FFs.

Maybe I'm reading/understanding it incorrectly - it looks to me that there's an always @ (posedge(clk)) dependency for writes - but I'm relatively fine with that - I won't need the data until the next clock anyway if I'm writing, because that's how the 6502 worked.

For reads, it looked to me as though it used always @ (*), and I (perhaps incorrectly) thought that would get me the results on the module's data bus as soon as the 'address' lines changed.

As for why to change it, I don't like it when I don't understand the error/info messages the tool is giving me. Given my (relatively limited) understanding of what the synthesis tool is actually *doing* under the hood, it probably means I'm not getting what I actually want, or if I am, it's in some highly-inefficient manner. Your comment about inferring extra adders unnecessarily is pretty relevant I feel :)

It does tie me to a single write/read per clock, whereas I could set N registers per clock (and thus "push" 3 elements onto the stack for the BRK instruction in a single clock for example), but I'm actually ok with that too, I think. The 6502 only had 1 databus, so *it* took multiple clocks to do multiple writes as well.

It's entirely possible my understanding of the module is flawed. I'm happy to be corrected :)

Cheers
Simon
 
On 12/2/2015 11:08 AM, Simon wrote:
On Tuesday, December 1, 2015 at 9:02:26 PM UTC-8, rickman wrote:

You do have an issue if a block RAM is not being used. The code
I've seen looks like you are writing from a functional perspective
rather than structural. I would suggest you write a module for a
block RAM using example code provided by your chip manufacturer.
Then incorporate that RAM module into your code as appropriate.

But I don't want a block-ram. I don't want to pay the penalty of a
clock-cycle for access to the values. I want a block of 256
registers, which I can access with as-close-to-zero time cost as
possible. Block-rams are great, but in this case I really want just
a whole bunch of registers.

Ok, I understand better now.


I'm conscious that something is screwy. I don't understand why an
array of registers declared as...

////////////////////////////////////////////////////////////////////////////
// Set up zero-page as register-based for speed reasons
////////////////////////////////////////////////////////////////////////////
reg [`NW:0] zp[0:255]; // Zero-page

... should exhibit a whole bunch of warnings along the lines of

INFO: [Synth 8-5545] ROM "zp_reg[255]" won't be mapped to RAM because
address size (32) is larger than maximum supported(25)"

Um, que ? Address size == 32 ? Even if you treat it as a 1-bit array,
that's only 11 bits of address (8 * 256 = 2048) to access any given
bit. Hmm, now there's a thought. I wonder if declaring:

reg [2047:0] zp;

.. and doing the bit-selections might be a way to do it. No array,
just a freaking huge register. I wonder how efficient it is at
ganging up LUTs to make a combined single register...

I actually might try implementing a module along the lines of BobH's
code above - rather than just declaring the register array, and see
how that works out. At the moment I'm busy writing unit tests :)

Now I am lost again. Why are you trying to change the code that is
giving you 256 registers? The only RAM in FPGAs these days is
synchronous RAM. If you don't want the address register delay then your
only choice is to use fabric FFs.

--

Rick
 
