Estimating ROM gate count in ASIC

K

Kevin Neilson

Guest
I've searched for this but to no avail. I'd like a function f(D,W), where D=depth and W=width, which provides an estimate of the gate count of a lookup ROM implemented in ASIC gates.

Yes, I know it's dependent on the contents. However, if half the bits are ones and the contents are randomly distributed, a formula should be pretty accurate.

It's easy for me to figure out an upper limit. A basic ROM is an AND-OR array. The D address decoders (comprising ANDs/NOTs) can be shared amongst the W columns. Each of the W columns would require D/2-1 OR gates if half the ROM bits in each column are 1.

What I don't know is how many gates can be eliminated by sharing terms. As W increases, term sharing should go up. Again, I'm looking for a *formula*.
 
On Thursday, December 6, 2018 at 4:02:25 PM UTC-7, Kevin Neilson wrote:
I've searched for this but to no avail. I'd like a function f(D,W), where D=depth and W=width, which provides an estimate of the gate count of a lookup ROM implemented in ASIC gates.

Yes, I know it's dependent on the contents. However, if half the bits are ones and the contents are randomly distributed, a formula should be pretty accurate.

It's easy for me to figure out an upper limit. A basic ROM is an AND-OR array. The D address decoders (comprising ANDs/NOTs) can be shared amongst the W columns. Each of the W columns would require D/2-1 OR gates if half the ROM bits in each column are 1.

What I don't know is how many gates can be eliminated by sharing terms. As W increases, term sharing should go up. Again, I'm looking for a *formula*.

I meant each column would require D/4-1 ORs. Approximately.
 
On Thursday, December 6, 2018 at 4:05:29 PM UTC-7, Kevin Neilson wrote:
On Thursday, December 6, 2018 at 4:02:25 PM UTC-7, Kevin Neilson wrote:
I've searched for this but to no avail. I'd like a function f(D,W), where D=depth and W=width, which provides an estimate of the gate count of a lookup ROM implemented in ASIC gates.

Yes, I know it's dependent on the contents. However, if half the bits are ones and the contents are randomly distributed, a formula should be pretty accurate.

It's easy for me to figure out an upper limit. A basic ROM is an AND-OR array. The D address decoders (comprising ANDs/NOTs) can be shared amongst the W columns. Each of the W columns would require D/2-1 OR gates if half the ROM bits in each column are 1.

What I don't know is how many gates can be eliminated by sharing terms. As W increases, term sharing should go up. Again, I'm looking for a *formula*.

I meant each column would require D/4-1 ORs. Approximately.

I was right the first time. D/2-1
 
Am Freitag, 7. Dezember 2018 00:16:38 UTC+1 schrieb Kevin Neilson:
On Thursday, December 6, 2018 at 4:05:29 PM UTC-7, Kevin Neilson wrote:
On Thursday, December 6, 2018 at 4:02:25 PM UTC-7, Kevin Neilson wrote:
I've searched for this but to no avail. I'd like a function f(D,W), where D=depth and W=width, which provides an estimate of the gate count of a lookup ROM implemented in ASIC gates.

Yes, I know it's dependent on the contents. However, if half the bits are ones and the contents are randomly distributed, a formula should be pretty accurate.

Not so easy as you think as the content of the ROM has a very strong influence on the result.

Assume a simple ROM content with 50% '0' and 50% '1'

Bit 3210
-------------
Adr 000: 0000
Adr 001: 1010
Adr 010: 1100
Adr 011: 1110
Adr 100: 1000
Adr 101: 1010
Adr 110: 1100
Adr 111: 1111

This is
RomData(0)= Adr(0) AND Adr(1) AND Adr(3)
RomData(1)= Adr(0)
RomData(2)= Adr(1)
RomData(3)= Adr(0) OR Adr(1) OR ADR(3)

Now you can very easy scramble the content slightly to be no longer able to build that simple terms.

In worst case you end up with a full DNF for each Bit.

Each bit of datawidth would in worst case need a term in DNF with a number of clauses equal to the number of bit ='1' and each clause in DNF having addresswidth number of variables.

For a ROM of 10 bit address with equal distributed '1' and '0' this means 512 clauses of 10 variables in DNF is your upper limit per databit.

This would be 512 AND gates with 10 inputs and one OR gate with 512 inputs.

As no ASCI technology has an OR gate with 512 inputs or an AND gate with 10 inputs, you need to build this using a gate tree => 5 AND3 per clause and ~300 OR3 for the or-tree. This means something like 2.9k Gates with three inputs per bit instead of your assumed Depth/2-1 gates.

As you see it might still be possible to have a regularity in the formula that can reduce this to a simple gate per data bit, but in general random ROM data tend to have only low possibilities of term reduction and for larger depth of RAM you cannot see on first glance if any reduction is possible at all.

bye Thomas
 
Not so easy as you think as the content of the ROM has a very strong influence on the result.

Assume a simple ROM content with 50% '0' and 50% '1'

Bit 3210
-------------
Adr 000: 0000
Adr 001: 1010
Adr 010: 1100
Adr 011: 1110
Adr 100: 1000
Adr 101: 1010
Adr 110: 1100
Adr 111: 1111

Maybe I should have specified that depth D is large. A formula will not be accurate for tiny ROMs but should be as D increases.
 
On Thursday, December 6, 2018 at 6:02:25 PM UTC-5, Kevin Neilson wrote:
I've searched for this but to no avail. I'd like a function f(D,W), where D=depth and W=width, which provides an estimate of the gate count of a lookup ROM implemented in ASIC gates.

Yes, I know it's dependent on the contents. However, if half the bits are ones and the contents are randomly distributed, a formula should be pretty accurate.

It's easy for me to figure out an upper limit. A basic ROM is an AND-OR array. The D address decoders (comprising ANDs/NOTs) can be shared amongst the W columns. Each of the W columns would require D/2-1 OR gates if half the ROM bits in each column are 1.

What I don't know is how many gates can be eliminated by sharing terms. As W increases, term sharing should go up. Again, I'm looking for a *formula*.

That would be pretty easy. Consider the costs of a D wide multiplexer with 1 or 0 on each input. That would be an upper bound in any case.

I believe my text book of many years ago used one of the input variables in either true or inverted form combined with 1s and 0s as choices for inputs which simplified the mux by one address input.

Rick C.

Tesla referral code - https://ts.la/richard11209
Get 6 months of free supercharging
 
On Saturday, December 22, 2018 at 12:14:43 PM UTC-7, gnuarm.del...@gmail.com wrote:
On Thursday, December 6, 2018 at 6:02:25 PM UTC-5, Kevin Neilson wrote:
I've searched for this but to no avail. I'd like a function f(D,W), where D=depth and W=width, which provides an estimate of the gate count of a lookup ROM implemented in ASIC gates.

Yes, I know it's dependent on the contents. However, if half the bits are ones and the contents are randomly distributed, a formula should be pretty accurate.

It's easy for me to figure out an upper limit. A basic ROM is an AND-OR array. The D address decoders (comprising ANDs/NOTs) can be shared amongst the W columns. Each of the W columns would require D/2-1 OR gates if half the ROM bits in each column are 1.

What I don't know is how many gates can be eliminated by sharing terms. As W increases, term sharing should go up. Again, I'm looking for a *formula*.

That would be pretty easy. Consider the costs of a D wide multiplexer with 1 or 0 on each input. That would be an upper bound in any case.

I believe my text book of many years ago used one of the input variables in either true or inverted form combined with 1s and 0s as choices for inputs which simplified the mux by one address input.

Rick C.

Tesla referral code - https://ts.la/richard11209
Get 6 months of free supercharging

Thanks, but I am looking for an accurate estimate.
 
On Friday, December 28, 2018 at 2:49:37 PM UTC-5, Kevin Neilson wrote:
On Saturday, December 22, 2018 at 12:14:43 PM UTC-7, gnuarm.del...@gmail.com wrote:
On Thursday, December 6, 2018 at 6:02:25 PM UTC-5, Kevin Neilson wrote:
I've searched for this but to no avail. I'd like a function f(D,W), where D=depth and W=width, which provides an estimate of the gate count of a lookup ROM implemented in ASIC gates.

Yes, I know it's dependent on the contents. However, if half the bits are ones and the contents are randomly distributed, a formula should be pretty accurate.

It's easy for me to figure out an upper limit. A basic ROM is an AND-OR array. The D address decoders (comprising ANDs/NOTs) can be shared amongst the W columns. Each of the W columns would require D/2-1 OR gates if half the ROM bits in each column are 1.

What I don't know is how many gates can be eliminated by sharing terms. As W increases, term sharing should go up. Again, I'm looking for a *formula*.

That would be pretty easy. Consider the costs of a D wide multiplexer with 1 or 0 on each input. That would be an upper bound in any case.

I believe my text book of many years ago used one of the input variables in either true or inverted form combined with 1s and 0s as choices for inputs which simplified the mux by one address input.

Rick C.

Tesla referral code - https://ts.la/richard11209
Get 6 months of free supercharging

Thanks, but I am looking for an accurate estimate.

I'm confused. Do you want an accurate measurement or an estimate?

Rick C.

- Get 6 months of free supercharging
- Tesla referral code - https://ts.la/richard11209
 
On Saturday, December 29, 2018 at 9:53:20 AM UTC-7, gnuarm.del...@gmail.com wrote:
On Friday, December 28, 2018 at 2:49:37 PM UTC-5, Kevin Neilson wrote:
On Saturday, December 22, 2018 at 12:14:43 PM UTC-7, gnuarm.del...@gmail.com wrote:
On Thursday, December 6, 2018 at 6:02:25 PM UTC-5, Kevin Neilson wrote:
I've searched for this but to no avail. I'd like a function f(D,W), where D=depth and W=width, which provides an estimate of the gate count of a lookup ROM implemented in ASIC gates.

Yes, I know it's dependent on the contents. However, if half the bits are ones and the contents are randomly distributed, a formula should be pretty accurate.

It's easy for me to figure out an upper limit. A basic ROM is an AND-OR array. The D address decoders (comprising ANDs/NOTs) can be shared amongst the W columns. Each of the W columns would require D/2-1 OR gates if half the ROM bits in each column are 1.

What I don't know is how many gates can be eliminated by sharing terms. As W increases, term sharing should go up. Again, I'm looking for a *formula*.

That would be pretty easy. Consider the costs of a D wide multiplexer with 1 or 0 on each input. That would be an upper bound in any case.

I believe my text book of many years ago used one of the input variables in either true or inverted form combined with 1s and 0s as choices for inputs which simplified the mux by one address input.

Rick C.

Tesla referral code - https://ts.la/richard11209
Get 6 months of free supercharging

Thanks, but I am looking for an accurate estimate.

I'm confused. Do you want an accurate measurement or an estimate?

Rick C.

- Get 6 months of free supercharging
- Tesla referral code - https://ts.la/richard11209

Both. An estimate will be very close for large D,W. If I roll a die 1e6 times, estimating there will be 5e5 heads is pretty accurate. Estimating there will be less than or equal to the upper bound of 1e6 heads is correct but not helpful.
 
On Sunday, December 30, 2018 at 2:52:43 PM UTC-5, Kevin Neilson wrote:
On Saturday, December 29, 2018 at 9:53:20 AM UTC-7, gnuarm.del...@gmail.com wrote:
On Friday, December 28, 2018 at 2:49:37 PM UTC-5, Kevin Neilson wrote:
On Saturday, December 22, 2018 at 12:14:43 PM UTC-7, gnuarm.del...@gmail.com wrote:
On Thursday, December 6, 2018 at 6:02:25 PM UTC-5, Kevin Neilson wrote:
I've searched for this but to no avail. I'd like a function f(D,W), where D=depth and W=width, which provides an estimate of the gate count of a lookup ROM implemented in ASIC gates.

Yes, I know it's dependent on the contents. However, if half the bits are ones and the contents are randomly distributed, a formula should be pretty accurate.

It's easy for me to figure out an upper limit. A basic ROM is an AND-OR array. The D address decoders (comprising ANDs/NOTs) can be shared amongst the W columns. Each of the W columns would require D/2-1 OR gates if half the ROM bits in each column are 1.

What I don't know is how many gates can be eliminated by sharing terms. As W increases, term sharing should go up. Again, I'm looking for a *formula*.

That would be pretty easy. Consider the costs of a D wide multiplexer with 1 or 0 on each input. That would be an upper bound in any case.

I believe my text book of many years ago used one of the input variables in either true or inverted form combined with 1s and 0s as choices for inputs which simplified the mux by one address input.

Rick C.

Tesla referral code - https://ts.la/richard11209
Get 6 months of free supercharging

Thanks, but I am looking for an accurate estimate.

I'm confused. Do you want an accurate measurement or an estimate?

Rick C.

- Get 6 months of free supercharging
- Tesla referral code - https://ts.la/richard11209

Both. An estimate will be very close for large D,W. If I roll a die 1e6 times, estimating there will be 5e5 heads is pretty accurate. Estimating there will be less than or equal to the upper bound of 1e6 heads is correct but not helpful.

Good thing you aren't rolling a die.

How much do you think this upper bound will vary from your estimate? Have you tried any tests? What is the result of your "estimate" under-estimating?

I think if I were doing this I'd try to get a handle on the expected results and how much they might vary before I start looking for an equation to "estimate" the number. The one thing you didn't do with the die example is to figure out how much you need to adjust your estimate to get a bound that will include 99.9xxx% of your cases or whatever value you need. Do you know that equation?


Rick C.

+ Get 6 months of free supercharging
+ Tesla referral code - https://ts.la/richard11209
 
On Sunday, December 30, 2018 at 2:37:46 PM UTC-7, gnuarm.del...@gmail.com wrote:
On Sunday, December 30, 2018 at 2:52:43 PM UTC-5, Kevin Neilson wrote:
On Saturday, December 29, 2018 at 9:53:20 AM UTC-7, gnuarm.del...@gmail..com wrote:
On Friday, December 28, 2018 at 2:49:37 PM UTC-5, Kevin Neilson wrote:
On Saturday, December 22, 2018 at 12:14:43 PM UTC-7, gnuarm.del...@gmail.com wrote:
On Thursday, December 6, 2018 at 6:02:25 PM UTC-5, Kevin Neilson wrote:
I've searched for this but to no avail. I'd like a function f(D,W), where D=depth and W=width, which provides an estimate of the gate count of a lookup ROM implemented in ASIC gates.

Yes, I know it's dependent on the contents. However, if half the bits are ones and the contents are randomly distributed, a formula should be pretty accurate.

It's easy for me to figure out an upper limit. A basic ROM is an AND-OR array. The D address decoders (comprising ANDs/NOTs) can be shared amongst the W columns. Each of the W columns would require D/2-1 OR gates if half the ROM bits in each column are 1.

What I don't know is how many gates can be eliminated by sharing terms. As W increases, term sharing should go up. Again, I'm looking for a *formula*.

That would be pretty easy. Consider the costs of a D wide multiplexer with 1 or 0 on each input. That would be an upper bound in any case.

I believe my text book of many years ago used one of the input variables in either true or inverted form combined with 1s and 0s as choices for inputs which simplified the mux by one address input.

Rick C.

Tesla referral code - https://ts.la/richard11209
Get 6 months of free supercharging

Thanks, but I am looking for an accurate estimate.

I'm confused. Do you want an accurate measurement or an estimate?

Rick C.

- Get 6 months of free supercharging
- Tesla referral code - https://ts.la/richard11209

Both. An estimate will be very close for large D,W. If I roll a die 1e6 times, estimating there will be 5e5 heads is pretty accurate. Estimating there will be less than or equal to the upper bound of 1e6 heads is correct but not helpful.

Good thing you aren't rolling a die.

How much do you think this upper bound will vary from your estimate? Have you tried any tests? What is the result of your "estimate" under-estimating?

I think if I were doing this I'd try to get a handle on the expected results and how much they might vary before I start looking for an equation to "estimate" the number. The one thing you didn't do with the die example is to figure out how much you need to adjust your estimate to get a bound that will include 99.9xxx% of your cases or whatever value you need. Do you know that equation?


Rick C.

+ Get 6 months of free supercharging
+ Tesla referral code - https://ts.la/richard11209

Sorry, my response could've been more nicely expressed. I originally though that ASIC guys must have some formula for this but perhaps not--maybe they just run it through the synthesizer and check. I'm writing ASIC code but don't have direct access to the synthesizer. If I did I could maybe plot a few points and fit a curve to it. I only have access to FPGA synthesizers and of course they just implement ROMs in LUTs and the LUT count is proportional to the number of bits in the ROM and there is no logic sharing as the ROM size increases.

I did think about the die/coin example and how to find how good the estimate is. For example, say you flip a coin 1000 times. My expected number of heads is is 500. Say you want to know how many runs will have between 450 and 550 heads. You can use the Poisson CDF. In Matlab/Octave:

octave:32> n_flips=1000; exp_heads = n_flips*0.5; poisscdf(550,exp_heads)-poisscdf(450,exp_heads)
ans = 0.97469

So I'll be in that range 97.5% of the time.
 
On Tuesday, January 1, 2019 at 6:55:57 PM UTC-5, Kevin Neilson wrote:
On Sunday, December 30, 2018 at 2:37:46 PM UTC-7, gnuarm.del...@gmail.com wrote:
On Sunday, December 30, 2018 at 2:52:43 PM UTC-5, Kevin Neilson wrote:
On Saturday, December 29, 2018 at 9:53:20 AM UTC-7, gnuarm.del...@gmail.com wrote:
On Friday, December 28, 2018 at 2:49:37 PM UTC-5, Kevin Neilson wrote:
On Saturday, December 22, 2018 at 12:14:43 PM UTC-7, gnuarm.del....@gmail.com wrote:
On Thursday, December 6, 2018 at 6:02:25 PM UTC-5, Kevin Neilson wrote:
I've searched for this but to no avail. I'd like a function f(D,W), where D=depth and W=width, which provides an estimate of the gate count of a lookup ROM implemented in ASIC gates.

Yes, I know it's dependent on the contents. However, if half the bits are ones and the contents are randomly distributed, a formula should be pretty accurate.

It's easy for me to figure out an upper limit. A basic ROM is an AND-OR array. The D address decoders (comprising ANDs/NOTs) can be shared amongst the W columns. Each of the W columns would require D/2-1 OR gates if half the ROM bits in each column are 1.

What I don't know is how many gates can be eliminated by sharing terms. As W increases, term sharing should go up. Again, I'm looking for a *formula*.

That would be pretty easy. Consider the costs of a D wide multiplexer with 1 or 0 on each input. That would be an upper bound in any case.

I believe my text book of many years ago used one of the input variables in either true or inverted form combined with 1s and 0s as choices for inputs which simplified the mux by one address input.

Rick C.

Tesla referral code - https://ts.la/richard11209
Get 6 months of free supercharging

Thanks, but I am looking for an accurate estimate.

I'm confused. Do you want an accurate measurement or an estimate?

Rick C.

- Get 6 months of free supercharging
- Tesla referral code - https://ts.la/richard11209

Both. An estimate will be very close for large D,W. If I roll a die 1e6 times, estimating there will be 5e5 heads is pretty accurate. Estimating there will be less than or equal to the upper bound of 1e6 heads is correct but not helpful.

Good thing you aren't rolling a die.

How much do you think this upper bound will vary from your estimate? Have you tried any tests? What is the result of your "estimate" under-estimating?

I think if I were doing this I'd try to get a handle on the expected results and how much they might vary before I start looking for an equation to "estimate" the number. The one thing you didn't do with the die example is to figure out how much you need to adjust your estimate to get a bound that will include 99.9xxx% of your cases or whatever value you need. Do you know that equation?


Rick C.

+ Get 6 months of free supercharging
+ Tesla referral code - https://ts.la/richard11209

Sorry, my response could've been more nicely expressed. I originally though that ASIC guys must have some formula for this but perhaps not--maybe they just run it through the synthesizer and check. I'm writing ASIC code but don't have direct access to the synthesizer. If I did I could maybe plot a few points and fit a curve to it. I only have access to FPGA synthesizers and of course they just implement ROMs in LUTs and the LUT count is proportional to the number of bits in the ROM and there is no logic sharing as the ROM size increases.

I did think about the die/coin example and how to find how good the estimate is. For example, say you flip a coin 1000 times. My expected number of heads is is 500. Say you want to know how many runs will have between 450 and 550 heads. You can use the Poisson CDF. In Matlab/Octave:

octave:32> n_flips=1000; exp_heads = n_flips*0.5; poisscdf(550,exp_heads)-poisscdf(450,exp_heads)
ans = 0.97469

So I'll be in that range 97.5% of the time.

My point is that unless you can come up with a number like this for your estimate, how much good will it do you? Even if you are right 99% of the time, when designing a chip is that of much value?

I would think knowing the upper bound is a very useful thing indeed when planning an ASIC. But I don't have any more info on calculating the estimate you say you need, so I can't help you.

Rick C.

-- Get 6 months of free supercharging
-- Tesla referral code - https://ts.la/richard11209
 
Am Mittwoch, 2. Januar 2019 00:55:57 UTC+1 schrieb Kevin Neilson:

> So I'll be in that range 97.5% of the time.

I think you did not understand my posting. There is no way of estimating the synthesis result of a ROM right by simple formula.
Unless you build a model of the synthesis tool itself with this formula.

A change of 1 dataword in a ROM with 1024 words could significant change the synthesis result by more than 1%.

There is a simple upper bound and a simple lower bound but real ROMs are always in between.
Simple upper bound is calculated by building DNF for ROM in given technology and lower bound is 1 gate per bit (tie0 or tie1) for stand alone synthesis.

Take for example DES encryption algorithm S-Box. These are lookup tables (ROM) designed to be not easy reduced by synthesis tools. You could synthesis one of this S-Box 10 times with different seeds and would not get 2 identical results for the same S-Box.

If you have no access to ASIC synthesis tool than download free FPGA synthesis tool and test the results for FPGA synthesis. This gives at least a feeling how synthesis tools deal with your ROM.

bye Thomas
 

Welcome to EDABoard.com

Sponsor

Back
Top