ROM implementation in VHDL (not LUT based FPGA)

Guest
Hi,

I tried to get an area and speed estimate for a ROM inside my design.
The code should be FPGA independent, so I can't use a LUT.

I need 9000 words of 20 bits each, so I expected a function with a
14-bit input and a 20-bit output. Each output bit should depend on the 14
inputs, so I expected something like a 14-to-1 function consisting
of about 5 gates with 4 inputs each, plus some buffering per bit, resulting
in ~100-200 complex gates.

To get the estimate I used a constant declaration of an array of
std_logic_vectors and a simple array index by an integer (range 0 to 8999)
address.

CONSTANT LUT: rom_t := (
"01000000001000110110",
....
);
Value<=LUT(address);

I'm very surprised that this code leads to a very large netlist. Does
anyone have an idea why it gets so large? Is my first estimate
wrong, or are synthesis tools (I tried Synplify; Synopsys is still running
after 18 h) so bad at reducing pure logic? Or is my quick and dirty
code that bad?

bye Thomas
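
For readers who want to reproduce the experiment, here is a minimal sketch of the kind of description the post outlines: a constant array of std_logic_vectors indexed by an integer address. The entity name `rom_sketch`, the port names, the 8-word depth, and every table word except the first are assumptions made for illustration; the post only shows the first word and elides the rest of the 9000 entries.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity and port names; only the constant-array idea and
-- the first table word come from the post.
entity rom_sketch is
  port (
    address : in  integer range 0 to 7;  -- the post uses range 0 to 8999
    value   : out std_logic_vector(19 downto 0)
  );
end entity rom_sketch;

architecture rtl of rom_sketch is
  type rom_t is array (0 to 7) of std_logic_vector(19 downto 0);

  -- Placeholder contents: words 1 and 2 and the zero fill are invented.
  constant LUT : rom_t := (
    0      => "01000000001000110110",
    1      => "00000000000000000001",
    2      => "00000000000000000010",
    others => (others => '0')
  );
begin
  -- Unclocked lookup, as in the post. Many block RAMs have synchronous
  -- reads, so a tool may only infer them from a clocked version of this
  -- assignment; check the synthesis tool documentation.
  value <= LUT(address);
end architecture rtl;
```

With a table this small the synthesized logic stays tiny, so it says nothing about the 9000-word case; it only pins down the coding style being discussed.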
 
usenet_10@stanka-web.de wrote:
> Hi,
>
> I tried to get an area and speed estimate for a ROM inside my design.
> The code should be FPGA independent, so I can't use a LUT.
>
> CONSTANT LUT: rom_t := (
> "01000000001000110110",
> ....
> );
> Value<=LUT(address);
>
> I'm very surprised that this code leads to a very large netlist.

If the device has no block RAM, a large netlist should be expected.
If the device has block RAM, there may be a size mismatch.
I use constant arrays to infer block ROMs, so it is possible.
Read your synthesis and device docs.

-- Mike Treseler
 
Hi,

Mike Treseler wrote:

> usenet_10@stanka-web.de wrote:
>> Hi,
>>
>> I tried to get an area and speed estimate for a ROM inside my design.
>> The code should be FPGA independent, so I can't use a LUT.
>>
>> CONSTANT LUT: rom_t := (
>> "01000000001000110110",
>> ....
>> );
>> Value<=LUT(address);
>>
>> I'm very surprised that this code leads to a very large netlist.
>
> If the device has no block RAM, a large netlist should be expected.
> If the device has block RAM, there may be a size mismatch.
> I use constant arrays to infer block ROMs, so it is possible.
> Read your synthesis and device docs.

The device does not have sufficient RAM, and the result should be
independent of RAM to allow easy transfer to ASIC technology.

I still can't find an error in my thesis that there should be a way to
minimize this table into a very small function, since each output depends
only on 14 inputs. If your library supports 16 primitive gates with 2
inputs each, covering every possible way to generate an output from
2 inputs, you would need a tree of 13 gates to produce the necessary
output for one bit of any given table.

The fanout of each of the 14 inputs would be at most 20, to feed 20
independent trees.
So I could get by with something like 3 buffers per input (for a
max fanout of 10) and 20 times 13 gates, resulting in 272 very primitive
gates. This should be possible in nearly every technology. The delay
should be around 6 gate delays, which is OK for synthesis without timing
constraints (and should do in most designs).

Instead, Synplify creates something using more than 4k buffers and a lot
of gates, and Synopsys has no result after more than 48 hours but
also reports high-fanout gates.

So I'm still trying to figure out how to change my code to get good
results (or waiting for someone to prove a flaw in my thesis *g*).

bye Thomas
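
A rough counting check of the 13-gate tree idea, added here as a hedged aside rather than as anything a poster wrote: it compares how many different output columns a 9000-word table can hold with how many distinct circuits a tree of 13 two-input gates can form. The assumptions are that each gate is one of the 16 two-input functions, that the tree has 14 leaves, and that any of the 14 address bits may drive any leaf; $C_{13} = 742900$ is the Catalan number counting the binary tree shapes with 14 leaves.

```latex
\begin{align*}
\text{possible columns of a 9000-word table} &= 2^{9000} \\
\text{trees of 13 two-input gates}
  &\le \underbrace{16^{13}}_{\text{gate types}}
   \cdot \underbrace{C_{13}}_{\text{tree shapes}}
   \cdot \underbrace{14^{14}}_{\text{leaf inputs}} \\
  &< 2^{52} \cdot 2^{20} \cdot 2^{54} = 2^{126} \ll 2^{9000}
\end{align*}
```

Under these assumptions only a vanishing fraction of tables can be realized by such a tree, so a table with arbitrary contents generally needs far more than 13 gates per output bit. A particular table may still reduce well if its contents are highly regular.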
 
Thomas,
Your estimates seem to be based on a simple decoder. For
a ROM, there will be more logic. Without logic reduction,
and assuming that a bit is set 10% of the time, this would
be 900x your estimate. Synthesis will reduce this, but
at the cost of time: it is a big reduction problem.

In the past the big factor on synthesis time for a problem
like this was system RAM. Make sure you measure your RAM
in Gbytes.

Cheers,
Jim

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jim Lewis
Director of Training mailto:Jim@SynthWorks.com
SynthWorks Design Inc. http://www.SynthWorks.com
1-503-590-4787

Expert VHDL Training for Hardware Design and Verification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
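
One possible reading of Jim's 900x figure, offered as an assumption and not as his own derivation: in an unreduced sum-of-products form, every address whose bit is '1' contributes one product term (a 14-input AND of the address bits), and the output bit is the OR of all those terms.

```latex
\begin{align*}
\text{product terms per output bit} &\approx 9000 \times 0.10 = 900 \\
\text{product terms for all 20 outputs} &\approx 20 \times 900 = 18\,000
\end{align*}
```

That is roughly 900 decoder-sized terms per bit where the original estimate assumed one, before the synthesis tool shares or minimizes anything.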
 
