How best do I implement routing boxes in RTL?

news reader

Guest
In my design I have 256 3-bit registers, and on every access I need to read or
write 16 of them (data_o0, data_o1, ..., data_o15).
The read/write addresses are not totally random.

For example, if I arrange the registers into a 16x16 matrix, data_o0 only
accesses the zeroth row or column. data_o1 may access 20 of the registers
rather than all 256, data_o2 may access 30 of them, etc.

If I code it so that every output can read from all 256 registers, the final
logic will be overkill and highly redundant.

If I use case statements to list each of the scenarios, the RTL code may end
up at 500 kilobytes.
Will Design Compiler synthesize a 500 KB design efficiently? Will NCVerilog
compile and simulate it efficiently?

Are there any neater techniques to attack this problem?
 
Hi "news reader", my humble pearls in between.

news reader wrote:

> In the design I have 256 3-bit registers, every time I need to read or
> write 16 of them (data_o0, 1, ...15).
> The read/write address is not totally random.

It seems that you have an algorithm that accesses the values in a
deterministic pattern, and therefore you think you can implement it with
logic only.

I assume you are modeling an algorithm for a special matrix operation.

> For example, assuming that I arrange the register into a 16X16 matrix,
> data_o0 accesses among the zeros row or column. data_o1 may access from 20 of the
> registers, but not 256, data_o2 may access from 30 of the variables, etc.

These numbers do not give us much information. Which elements does each
data_ox (x = 1, 2, ...) access, and in which distribution?

> If I code such that every output reads from the 256 registers, the final
> logic will be overkill and highly redundant.

You think that the access pattern can be implemented with pure logic.
Therefore you have tried to model your logic to cover every case, or you
want to do so.

> If I use case statements to list each of the scenarios, the RTL code may end
> up 500 kilobyte.

This is reasonable then.

> Will design compiler synthesize a 500KB design efficiently?

What does "efficiently" mean for you? Speed or minimum logic?
If minimum logic, then please share with us the algorithm you are
trying to implement.

> Will NCVerilog compile and simulate it efficiently?

NCVerilog does not care about the logic implementation. It simulates the
behaviour of the system, no matter how the objects are linked.

> Are there any neater techniques to attack this problem?

Since you have not given much data, I think you can implement this
with a RAM.
Why don't you use a RAM? Then you can use the RAM addresses to
model your matrix: you generate the addresses of the matrix positions
that your algorithm accesses.

Utku.
 
"Utku Özcan" <utku.ozcan@gmail.com> wrote in message
news:1173384869.194349.20140@q40g2000cwq.googlegroups.com...
> Hi "news reader", my humble pearls in between.
>
> news reader wrote:
>
>> In the design I have 256 3-bit registers, every time I need to read or
>> write 16 of them (data_o0, 1, ...15).
>> The read/write address is not totally random.
>
> It seems that you have an algorithm that accesses the values in a
> deterministic pattern, and therefore you think you can implement it with
> logic only.
>
> I assume you are modeling an algorithm for a special matrix operation.

It's not a matrix operation, but the memory access is intensive and each r/w
must complete in a single clock cycle, so registers are used instead of memory.


>> For example, assuming that I arrange the register into a 16X16 matrix,
>> data_o0 accesses among the zeros row or column. data_o1 may access from
>> 20 of the registers, but not 256, data_o2 may access from 30 of the
>> variables, etc.
>
> These numbers do not give us much information. Which elements does each
> data_ox (x = 1, 2, ...) access, and in which distribution?

In each clock cycle, 16 addresses are generated and 16 data values are
read or written. However, each of the 16 values is only read from or written
to n of the 256 addresses (0<n<255).


>> If I code such that every output reads from the 256 registers, the final
>> logic will be overkill and highly redundant.
>
> You think that the access pattern can be implemented with pure logic.
> Therefore you have tried to model your logic to cover every case, or you
> want to do so.
>
>> If I use case statements to list each of the scenarios, the RTL code may
>> end up 500 kilobyte.
>
> This is reasonable then.

With the case-statement approach I use 32 case statements, and each case
statement has fewer than 256 choices. Some have only 20 or 30 choices, etc.


>> Will design compiler synthesize a 500KB design efficiently?
>
> What does "efficiently" mean for you? Speed or minimum logic?
> If minimum logic, then please share with us the algorithm you are
> trying to implement.
>
>> Will NCVerilog compile and simulate it efficiently?
>
> NCVerilog does not care about the logic implementation. It simulates the
> behaviour of the system, no matter how the objects are linked.

For example, for the read operation:
--------------------- implementation A------------------
input [7:0] addr_i0, addr_i1, ...addr_i15;
output [2:0] dat_o0, dat_o1, ...dat_o15;

reg [2:0] mymemory[0:255]; // Main memory

dat_o0 <= mymemory[addr_i0];
dat_o1 <= mymemory[addr_i1];
.....
dat_o15 <= mymemory[addr_i15];
--------------------- End A------------------
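
As a point of comparison, implementation A can be written as one complete module with a loop over the 16 ports. This is only a sketch: the module name `full_read_ports`, the clock `clk`, and the flattened buses `addr_flat_i` and `dat_flat_o` are assumptions for illustration, and the write logic is left out.

```verilog
// Sketch of implementation A: every port can read any of the 256 registers.
// Module name, clock, and flattened port names are hypothetical.
module full_read_ports (
    input  wire         clk,
    input  wire [127:0] addr_flat_i,  // 16 x 8-bit addresses, port p at [8*p +: 8]
    output reg  [47:0]  dat_flat_o    // 16 x 3-bit outputs,   port p at [3*p +: 3]
);
    reg [2:0] mymemory [0:255];       // main register file (write logic omitted)

    integer p;
    always @(posedge clk)
        for (p = 0; p < 16; p = p + 1)
            dat_flat_o[3*p +: 3] <= mymemory[addr_flat_i[8*p +: 8]];
endmodule
```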

--------------------- implementation B------------------

case (addr_i0) // I can calculate these options through simulations.
8'd0 : dat_o0 <= mymemory[0 ];
8'd5 : dat_o0 <= mymemory[5 ];
8'd54 : dat_o0 <= mymemory[54 ];
8'd122: dat_o0 <= mymemory[122];
8'd125: dat_o0 <= mymemory[125];
....
8'd166: dat_o0 <= mymemory[166];
8'd233: dat_o0 <= mymemory[233];
default: dat_o0 <= mymemory[0 ];
endcase



case (addr_i1)
8'd0 : dat_o1 <= mymemory[0 ];
8'd7 : dat_o1 <= mymemory[7 ];
8'd9 : dat_o1 <= mymemory[9 ];
8'd13 : dat_o1 <= mymemory[13 ];
8'd25 : dat_o1 <= mymemory[25 ];
8'd57 : dat_o1 <= mymemory[57 ];
8'd124: dat_o1 <= mymemory[124];
....
8'd133: dat_o1 <= mymemory[133];
8'd155: dat_o1 <= mymemory[155];
8'd227: dat_o1 <= mymemory[227];
default: dat_o1 <= mymemory[0 ];
endcase

....
case (addr_i15)
....
--------------------- End B------------------

In terms of hardware implementation, is it certain that implementation B
saves hardware compared to A? Will such a large chunk of RTL code cause
DC or NCVerilog to choke?
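
One way to express implementation B without hand-listing every case is to give each port a constant mask of the addresses it can reach. The following is a minimal sketch, not a verified design: the module name `restricted_read_port`, the `ALLOWED` parameter, and the flattened `regs_flat_i` bus are hypothetical names, and the actual mask values would have to come from the simulations mentioned above.

```verilog
// Sketch of one read port of implementation B.
// ALLOWED is a 256-bit constant mask: bit k set means register k is reachable.
// All names here are hypothetical, chosen for illustration only.
module restricted_read_port #(
    parameter [255:0] ALLOWED = {256{1'b1}}   // default: all registers reachable
) (
    input  wire [7:0]   addr_i,
    input  wire [767:0] regs_flat_i,  // 256 x 3-bit registers, register k at [3*k +: 3]
    output reg  [2:0]   dat_o
);
    wire [255:0] allowed = ALLOWED;   // constant copy of the mask for indexing

    integer k;
    always @* begin
        dat_o = regs_flat_i[2:0];     // default mirrors implementation B: register 0
        for (k = 0; k < 256; k = k + 1)
            if (allowed[k] && addr_i == k)
                dat_o = regs_flat_i[3*k +: 3];
    end
endmodule
```

Because `ALLOWED` is a constant, a synthesis tool would normally unroll the loop and discard the branches whose mask bit is 0, so a port with 20 reachable registers should end up with roughly a 20-input mux rather than a 256-input one. The RTL also stays at a few lines per port instead of hundreds of case items.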



>> Are there any neater techniques to attack this problem?
>
> Since you have not given much data, I think you can implement this
> with a RAM.
> Why don't you use a RAM? Then you can use the RAM addresses to
> model your matrix: you generate the addresses of the matrix positions
> that your algorithm accesses.

I used registers instead of RAM because of the memory throughput required.



 
