How to make this circuit operate at 400 MHz in 0.15um proc

Krist Neot

The source code is not written nicely, but it works in Design Compiler.
What I need is to make sure it can operate at 400 MHz or faster. Pipelining
is allowed.

BTW, does that mean my CRC has a processing rate of 12.8 Gbps?

Thanks




module test_crc (clk_in, rst_n_in, data_in, crc_en_in, init_crc_in,
crc_out);

parameter Tp = 1;

input clk_in;
input rst_n_in;
input [31:0] data_in;
input crc_en_in;
input init_crc_in;

output [31:0] crc_out;
wire [31:0] crc_out;

reg [31:0] crc;
reg [31:0] data_mem;

always @ (posedge clk_in or negedge rst_n_in)
begin
if (! rst_n_in)
crc <= #Tp 32'hffffffff;
else begin
if (init_crc_in)
crc <= data_in;
else begin
crc[00] <= crc_en_in & ((crc[0] & data_mem[0]) ^(crc[1] & data_mem[1])
^(crc[2] & data_mem[2]) ^(crc[3] & data_mem[3]) ^(crc[4] & data_mem[4])
^(crc[6] & data_mem[6]) ^(crc[7] & data_mem[7]) ^(crc[8] & data_mem[8])
^(crc[16] & data_mem[16]) ^(crc[20] & data_mem[20]) ^(crc[22] &
data_mem[22]) ^(crc[23] & data_mem[23]) ^(crc[26] & data_mem[26]));
crc[01] <= crc_en_in & ((crc[0] & data_mem[0]) ^(crc[4] & data_mem[4])
^(crc[6] & data_mem[6]) ^(crc[7] & data_mem[7]) ^(crc[10] & data_mem[10])
^(crc[21] & data_mem[21]) ^(crc[22] & data_mem[22]) ^(crc[23] &
data_mem[23]) ^(crc[24] & data_mem[24]) ^(crc[25] & data_mem[25]) ^(crc[27]
& data_mem[27]) ^(crc[28] & data_mem[28]) ^(crc[29] & data_mem[29]));
crc[02] <= crc_en_in & ((crc[5] & data_mem[5]) ^(crc[6] & data_mem[6])
^(crc[7] & data_mem[7]) ^(crc[8] & data_mem[8]) ^(crc[9] & data_mem[9])
^(crc[11] & data_mem[11]) ^(crc[12] & data_mem[12]) ^(crc[13] &
data_mem[13]) ^(crc[21] & data_mem[21]) ^(crc[25] & data_mem[25]) ^(crc[27]
& data_mem[27]) ^(crc[28] & data_mem[28]) ^(crc[31] & data_mem[31]));
crc[03] <= crc_en_in & ((crc[5] & data_mem[5]) ^(crc[9] & data_mem[9])
^(crc[11] & data_mem[11]) ^(crc[12] & data_mem[12]) ^(crc[15] &
data_mem[15]) ^(crc[24] & data_mem[24]) ^(crc[26] & data_mem[26]) ^(crc[27]
& data_mem[27]) ^(crc[28] & data_mem[28]) ^(crc[29] & data_mem[29])
^(crc[30] & data_mem[30]));
crc[04] <= crc_en_in & ((crc[8] & data_mem[8]) ^(crc[10] & data_mem[10])
^(crc[11] & data_mem[11]) ^(crc[12] & data_mem[12]) ^(crc[13] &
data_mem[13]) ^(crc[14] & data_mem[14]) ^(crc[16] & data_mem[16]) ^(crc[17]
& data_mem[17]) ^(crc[18] & data_mem[18]) ^(crc[26] & data_mem[26])
^(crc[30] & data_mem[30]));
crc[05] <= crc_en_in & ((crc[0] & data_mem[0]) ^(crc[1] & data_mem[1])
^(crc[2] & data_mem[2]) ^(crc[10] & data_mem[10]) ^(crc[14] & data_mem[14])
^(crc[16] & data_mem[16]) ^(crc[17] & data_mem[17]) ^(crc[20] &
data_mem[20]) ^(crc[29] & data_mem[29]) ^(crc[31] & data_mem[31]));
crc[06] <= crc_en_in & ((crc[0] & data_mem[0]) ^(crc[1] & data_mem[1])
^(crc[4] & data_mem[4]) ^(crc[13] & data_mem[13]) ^(crc[15] & data_mem[15])
^(crc[16] & data_mem[16]) ^(crc[17] & data_mem[17]) ^(crc[18] &
data_mem[18]) ^(crc[19] & data_mem[19]) ^(crc[21] & data_mem[21]) ^(crc[22]
& data_mem[22]) ^(crc[23] & data_mem[23]) ^(crc[31] & data_mem[31]));
crc[07] <= crc_en_in & ((crc[0] & data_mem[0]) ^(crc[1] & data_mem[1])
^(crc[2] & data_mem[2]) ^(crc[3] & data_mem[3]) ^(crc[5] & data_mem[5])
^(crc[6] & data_mem[6]) ^(crc[7] & data_mem[7]) ^(crc[15] & data_mem[15])
^(crc[19] & data_mem[19]) ^(crc[21] & data_mem[21]) ^(crc[22] &
data_mem[22]) ^(crc[25] & data_mem[25]));
crc[08] <= crc_en_in & ((crc[3] & data_mem[3]) ^(crc[5] & data_mem[5])
^(crc[6] & data_mem[6]) ^(crc[9] & data_mem[9]) ^(crc[18] & data_mem[18])
^(crc[20] & data_mem[20]) ^(crc[21] & data_mem[21]) ^(crc[22] &
data_mem[22]) ^(crc[23] & data_mem[23]) ^(crc[24] & data_mem[24]) ^(crc[26]
& data_mem[26]) ^(crc[27] & data_mem[27]) ^(crc[28] & data_mem[28]));
crc[09] <= crc_en_in & ((crc[2] & data_mem[2]) ^(crc[4] & data_mem[4])
^(crc[5] & data_mem[5]) ^(crc[6] & data_mem[6]) ^(crc[7] & data_mem[7])
^(crc[8] & data_mem[8]) ^(crc[10] & data_mem[10]) ^(crc[11] & data_mem[11])
^(crc[12] & data_mem[12]) ^(crc[20] & data_mem[20]) ^(crc[24] &
data_mem[24]) ^(crc[26] & data_mem[26]) ^(crc[27] & data_mem[27]) ^(crc[30]
& data_mem[30]));
crc[10] <= crc_en_in & ((crc[4] & data_mem[4]) ^(crc[8] & data_mem[8])
^(crc[10] & data_mem[10]) ^(crc[11] & data_mem[11]) ^(crc[14] &
data_mem[14]) ^(crc[20] & data_mem[20]) ^(crc[23] & data_mem[23]) ^(crc[25]
& data_mem[25]) ^(crc[26] & data_mem[26]) ^(crc[27] & data_mem[27])
^(crc[28] & data_mem[28]) ^(crc[29] & data_mem[29]) ^(crc[31] &
data_mem[31]));
crc[11] <= crc_en_in & ((crc[4] & data_mem[4]) ^(crc[7] & data_mem[7])
^(crc[9] & data_mem[9]) ^(crc[10] & data_mem[10]) ^(crc[11] & data_mem[11])
^(crc[12] & data_mem[12]) ^(crc[13] & data_mem[13]) ^(crc[15] &
data_mem[15]) ^(crc[16] & data_mem[16]) ^(crc[17] & data_mem[17]) ^(crc[25]
& data_mem[25]) ^(crc[29] & data_mem[29]) ^(crc[31] & data_mem[31]));
crc[12] <= crc_en_in & ((crc[0] & data_mem[0]) ^(crc[1] & data_mem[1])
^(crc[9] & data_mem[9]) ^(crc[13] & data_mem[13]) ^(crc[15] & data_mem[15])
^(crc[16] & data_mem[16]) ^(crc[19] & data_mem[19]) ^(crc[24] &
data_mem[24]) ^(crc[26] & data_mem[26]) ^(crc[27] & data_mem[27]));
crc[13] <= crc_en_in & ((crc[0] & data_mem[0]) ^(crc[3] & data_mem[3])
^(crc[8] & data_mem[8]) ^(crc[10] & data_mem[10]) ^(crc[11] & data_mem[11])
^(crc[17] & data_mem[17]) ^(crc[18] & data_mem[18]) ^(crc[20] &
data_mem[20]) ^(crc[21] & data_mem[21]) ^(crc[22] & data_mem[22]) ^(crc[24]
& data_mem[24]) ^(crc[28] & data_mem[28]) ^(crc[31] & data_mem[31]));
crc[14] <= crc_en_in & ((crc[1] & data_mem[1]) ^(crc[2] & data_mem[2])
^(crc[4] & data_mem[4]) ^(crc[5] & data_mem[5]) ^(crc[6] & data_mem[6])
^(crc[8] & data_mem[8]) ^(crc[12] & data_mem[12]) ^(crc[15] & data_mem[15])
^(crc[20] & data_mem[20]) ^(crc[21] & data_mem[21]) ^(crc[25] &
data_mem[25]) ^(crc[27] & data_mem[27]) ^(crc[28] & data_mem[28]));
crc[15] <= crc_en_in & ((crc[4] & data_mem[4]) ^(crc[5] & data_mem[5])
^(crc[9] & data_mem[9]) ^(crc[11] & data_mem[11]) ^(crc[12] & data_mem[12])
^(crc[18] & data_mem[18]) ^(crc[19] & data_mem[19]) ^(crc[21] &
data_mem[21]) ^(crc[22] & data_mem[22]) ^(crc[23] & data_mem[23]) ^(crc[25]
& data_mem[25]) ^(crc[29] & data_mem[29]));
crc[16] <= crc_en_in & ((crc[2] & data_mem[2]) ^(crc[3] & data_mem[3])
^(crc[5] & data_mem[5]) ^(crc[6] & data_mem[6]) ^(crc[7] & data_mem[7])
^(crc[9] & data_mem[9]) ^(crc[13] & data_mem[13]) ^(crc[16] & data_mem[16])
^(crc[21] & data_mem[21]) ^(crc[22] & data_mem[22]) ^(crc[28] &
data_mem[28]) ^(crc[30] & data_mem[30]));
crc[17] <= crc_en_in & ((crc[0] & data_mem[0]) ^(crc[5] & data_mem[5])
^(crc[6] & data_mem[6]) ^(crc[12] & data_mem[12]) ^(crc[14] & data_mem[14])
^(crc[16] & data_mem[16]) ^(crc[17] & data_mem[17]) ^(crc[23] &
data_mem[23]) ^(crc[24] & data_mem[24]) ^(crc[26] & data_mem[26]) ^(crc[27]
& data_mem[27]) ^(crc[28] & data_mem[28]) ^(crc[30] & data_mem[30]));
crc[18] <= crc_en_in & ((crc[0] & data_mem[0]) ^(crc[1] & data_mem[1])
^(crc[7] & data_mem[7]) ^(crc[8] & data_mem[8]) ^(crc[10] & data_mem[10])
^(crc[11] & data_mem[11]) ^(crc[12] & data_mem[12]) ^(crc[14] &
data_mem[14]) ^(crc[18] & data_mem[18]) ^(crc[21] & data_mem[21]) ^(crc[26]
& data_mem[26]) ^(crc[27] & data_mem[27]));
crc[19] <= crc_en_in & ((crc[2] & data_mem[2]) ^(crc[5] & data_mem[5])
^(crc[10] & data_mem[10]) ^(crc[11] & data_mem[11]) ^(crc[16] &
data_mem[16]) ^(crc[18] & data_mem[18]) ^(crc[20] & data_mem[20]) ^(crc[21]
& data_mem[21]) ^(crc[23] & data_mem[23]) ^(crc[24] & data_mem[24])
^(crc[28] & data_mem[28]) ^(crc[29] & data_mem[29]) ^(crc[31] &
data_mem[31]));
crc[20] <= crc_en_in & ((crc[0] & data_mem[0]) ^(crc[2] & data_mem[2])
^(crc[4] & data_mem[4]) ^(crc[5] & data_mem[5]) ^(crc[7] & data_mem[7])
^(crc[8] & data_mem[8]) ^(crc[12] & data_mem[12]) ^(crc[13] & data_mem[13])
^(crc[15] & data_mem[15]) ^(crc[17] & data_mem[17]) ^(crc[19] &
data_mem[19]) ^(crc[20] & data_mem[20]) ^(crc[22] & data_mem[22]) ^(crc[31]
& data_mem[31]));
crc[21] <= crc_en_in & ((crc[1] & data_mem[1]) ^(crc[3] & data_mem[3])
^(crc[4] & data_mem[4]) ^(crc[6] & data_mem[6]) ^(crc[15] & data_mem[15])
^(crc[20] & data_mem[20]) ^(crc[22] & data_mem[22]) ^(crc[24] &
data_mem[24]) ^(crc[25] & data_mem[25]) ^(crc[27] & data_mem[27]) ^(crc[29]
& data_mem[29]));
crc[22] <= crc_en_in & ((crc[4] & data_mem[4]) ^(crc[6] & data_mem[6])
^(crc[8] & data_mem[8]) ^(crc[9] & data_mem[9]) ^(crc[11] & data_mem[11])
^(crc[13] & data_mem[13]) ^(crc[17] & data_mem[17]) ^(crc[18] &
data_mem[18]) ^(crc[22] & data_mem[22]) ^(crc[25] & data_mem[25]) ^(crc[26]
& data_mem[26]) ^(crc[30] & data_mem[30]));
crc[23] <= crc_en_in & ((crc[1] & data_mem[1]) ^(crc[2] & data_mem[2])
^(crc[6] & data_mem[6]) ^(crc[9] & data_mem[9]) ^(crc[10] & data_mem[10])
^(crc[14] & data_mem[14]) ^(crc[25] & data_mem[25]) ^(crc[27] &
data_mem[27]) ^(crc[29] & data_mem[29]) ^(crc[30] & data_mem[30]));
crc[24] <= crc_en_in & ((crc[9] & data_mem[9]) ^(crc[11] & data_mem[11])
^(crc[13] & data_mem[13]) ^(crc[14] & data_mem[14]) ^(crc[16] &
data_mem[16]) ^(crc[18] & data_mem[18]) ^(crc[22] & data_mem[22]) ^(crc[23]
& data_mem[23]) ^(crc[27] & data_mem[27]) ^(crc[30] & data_mem[30])
^(crc[31] & data_mem[31]));
crc[25] <= crc_en_in & ((crc[0] & data_mem[0]) ^(crc[2] & data_mem[2])
^(crc[6] & data_mem[6]) ^(crc[7] & data_mem[7]) ^(crc[11] & data_mem[11])
^(crc[14] & data_mem[14]) ^(crc[15] & data_mem[15]) ^(crc[19] &
data_mem[19]) ^(crc[30] & data_mem[30]));
crc[26] <= crc_en_in & ((crc[3] & data_mem[3]) ^(crc[14] & data_mem[14])
^(crc[16] & data_mem[16]) ^(crc[18] & data_mem[18]) ^(crc[19] &
data_mem[19]) ^(crc[21] & data_mem[21]) ^(crc[23] & data_mem[23]) ^(crc[27]
& data_mem[27]) ^(crc[28] & data_mem[28]));
crc[27] <= crc_en_in & ((crc[0] & data_mem[0]) ^(crc[2] & data_mem[2])
^(crc[3] & data_mem[3]) ^(crc[5] & data_mem[5]) ^(crc[7] & data_mem[7])
^(crc[11] & data_mem[11]) ^(crc[12] & data_mem[12]) ^(crc[16] &
data_mem[16]) ^(crc[19] & data_mem[19]) ^(crc[20] & data_mem[20]) ^(crc[24]
& data_mem[24]));
crc[28] <= crc_en_in & ((crc[0] & data_mem[0]) ^(crc[3] & data_mem[3])
^(crc[4] & data_mem[4]) ^(crc[8] & data_mem[8]) ^(crc[16] & data_mem[16])
^(crc[19] & data_mem[19]) ^(crc[21] & data_mem[21]) ^(crc[23] &
data_mem[23]) ^(crc[24] & data_mem[24]) ^(crc[26] & data_mem[26]) ^(crc[28]
& data_mem[28]));
crc[29] <= crc_en_in & ((crc[0] & data_mem[0]) ^(crc[3] & data_mem[3])
^(crc[5] & data_mem[5]) ^(crc[7] & data_mem[7]) ^(crc[8] & data_mem[8])
^(crc[10] & data_mem[10]) ^(crc[12] & data_mem[12]) ^(crc[16] &
data_mem[16]) ^(crc[17] & data_mem[17]) ^(crc[21] & data_mem[21]) ^(crc[24]
& data_mem[24]) ^(crc[25] & data_mem[25]) ^(crc[29] & data_mem[29]));
crc[30] <= crc_en_in & ((crc[0] & data_mem[0]) ^(crc[1] & data_mem[1])
^(crc[5] & data_mem[5]) ^(crc[8] & data_mem[8]) ^(crc[9] & data_mem[9])
^(crc[13] & data_mem[13]) ^(crc[16] & data_mem[16]) ^(crc[17] &
data_mem[17]) ^(crc[20] & data_mem[20]) ^(crc[22] & data_mem[22]) ^(crc[24]
& data_mem[24]) ^(crc[25] & data_mem[25]) ^(crc[27] & data_mem[27])
^(crc[29] & data_mem[29]));
crc[31] <= crc_en_in & ((crc[0] & data_mem[0]) ^(crc[1] & data_mem[1])
^(crc[4] & data_mem[4]) ^(crc[6] & data_mem[6]) ^(crc[8] & data_mem[8])
^(crc[9] & data_mem[9]) ^(crc[11] & data_mem[11]) ^(crc[13] & data_mem[13])
^(crc[17] & data_mem[17]) ^(crc[18] & data_mem[18]) ^(crc[22] &
data_mem[22]) ^(crc[25] & data_mem[25]) ^(crc[26] & data_mem[26]) ^(crc[30]
& data_mem[30]));
end
end
end

always @ (posedge clk_in or negedge rst_n_in)
begin
if (! rst_n_in)
data_mem <= #Tp 32'hffffffff;
else
data_mem <= #Tp data_in;
end

assign crc_out = crc;

endmodule
 
Ignoring the first layer of AND logic, you have a worst case of 14 terms to
XOR together.

Pipelining may be possible, but I wouldn't expect it to improve the results
if the data is valid on every clock: crc feeds back to itself in one clock,
which makes pipelining a heck of a task.

The bigger question may be whether you have multi-input XOR primitives
available that are faster than the cascaded 2-input XORs. You have 4 levels
of logic to implement the 14-wide XOR in a tree. What speed can you get
from this basic structure?
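
As a rough check on the level count above: a balanced tree of k-input gates that reduces n terms to one needs about ceil(log_k n) levels. The sketch below is in Python rather than Verilog, purely to do the arithmetic; `xor_tree_levels` is a hypothetical helper, not part of the posted design.

```python
def xor_tree_levels(n_terms, fan_in=2):
    """Levels in a balanced tree of fan_in-input gates reducing n_terms to one."""
    levels = 0
    remaining = n_terms
    while remaining > 1:
        remaining = -(-remaining // fan_in)  # ceiling division: gates needed at this level
        levels += 1
    return levels

print(xor_tree_levels(14))             # cascaded 2-input XORs
print(xor_tree_levels(14, fan_in=4))   # assumes a 4-input XOR cell is available
```

With 2-input XORs the 14 terms come out at 4 levels. If the library offered a true 4-input XOR cell (an assumption), the same reduction would take 2 levels of those cells.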

Unfortunately there isn't a great amount of room for speedup. A 4-input XOR
"built from scratch" could be implemented in an AND/OR cascade with 8
4-input ANDs and an 8-input OR like the old PLDs. Two levels of these
elements would provide a 16-wide XOR function with 40 4-input ANDs and 5
8-input ORs. There are still 4 levels of logic but you might get the
primitive timing of the AND/OR gates (or equivalent NAND/NAND structures) to
beat out the "simple" XORs.
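
The gate counts above can be reproduced by enumeration. A sum-of-products XOR needs one AND term per odd-parity input combination, and a 16-wide XOR built from 4-input blocks needs four blocks in the first level plus one to combine them. A minimal Python sketch (names such as `odd_parity_minterms` are hypothetical, and input inverters are assumed to be free):

```python
from itertools import product

def odd_parity_minterms(n_inputs):
    """Input combinations for which an n-input XOR outputs 1."""
    return [bits for bits in product((0, 1), repeat=n_inputs)
            if sum(bits) % 2 == 1]

def sop_xor(bits, minterms):
    """AND/OR form: the output is 1 when the inputs match any one minterm."""
    return int(any(tuple(bits) == term for term in minterms))

MINTERMS4 = odd_parity_minterms(4)

# Check the AND/OR form against a plain XOR for all 16 input combinations.
for bits in product((0, 1), repeat=4):
    assert sop_xor(bits, MINTERMS4) == bits[0] ^ bits[1] ^ bits[2] ^ bits[3]

xor4_blocks = 16 // 4 + 1                  # four first-level blocks plus one combiner
and_gates = xor4_blocks * len(MINTERMS4)   # one 4-input AND per minterm
or_gates = xor4_blocks                     # one 8-input OR per block

print(len(MINTERMS4), and_gates, or_gates)
```

This gives 8 AND terms per 4-input XOR, and 40 ANDs plus 5 ORs for the 16-wide function, in line with the figures quoted above.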

Good luck.


"Krist Neot" <Krist_Neot@hotmail.com> wrote in message
news:4294147b@news.starhub.net.sg...
The source code is not written nicely, but it works in Design Compiler.
What I need is to make sure it can operate at 400 MHz or faster. Pipelining
is allowed.

BTW, does that mean my CRC has a processing rate of 12.8 Gbps?

Thanks
<code removed>
 
With this amount of cascaded logic, it's nearly impossible to get it
running at 400 MHz. As you said, split it into stages and pipeline the
results. Then you might achieve 400 MHz, but unfortunately not your
hoped-for throughput of 12.8 Gbps.
 
Yeah, I am thinking about pipelining. How do I calculate the throughput?
Is it not 400 MHz * 32 bits = 12.8 Gbps?

Thank you.
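
One way to look at the throughput question: the raw figure is clock rate times bits accepted per clock, and it only holds if a new 32-bit word really is accepted on every clock. A minimal Python sketch; `throughput_gbps` and its `clocks_per_word` parameter are hypothetical names used for illustration.

```python
def throughput_gbps(clock_hz, bits_per_word, clocks_per_word=1):
    """Sustained rate in Gbps when one word is accepted every clocks_per_word clocks."""
    return clock_hz * bits_per_word / clocks_per_word / 1e9

# One 32-bit word on every 400 MHz clock.
print(throughput_gbps(400e6, 32))

# Same clock, but a new word accepted only every second clock, e.g. if the
# CRC feedback path were split over two pipeline stages (an assumed scenario).
print(throughput_gbps(400e6, 32, clocks_per_word=2))
```

The first call gives 12.8 Gbps, matching the 400 MHz * 32 bit figure. The second shows how the rate halves if the CRC loop can only take a word every other clock.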




"Neo" <zingafriend@yahoo.com> wrote in message
news:1117016731.137725.315700@o13g2000cwo.googlegroups.com...
With this amount of cascaded logic, it's nearly impossible to get it
running at 400 MHz. As you said, split it into stages and pipeline the
results. Then you might achieve 400 MHz, but unfortunately not your
hoped-for throughput of 12.8 Gbps.
 
Assuming you can't achieve the speed any other way, one thing you can
usually do is simply use wider data so that you can run slower. For
example, instead of running 32-bit data at 400 MHz, you can run 64-bit
data at 200 MHz, 128-bit data at 100 MHz, and so on.

The equations get a bit bigger, but the longer clock period more than
compensates for the extra logic.
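
The width-for-frequency trade above keeps the product of data width and clock rate constant. A small Python sketch of the arithmetic, assuming the 12.8 Gbps target from earlier in the thread (`clock_for_width` is a hypothetical helper):

```python
TARGET_BPS = 12_800_000_000  # 32 bits * 400 MHz

def clock_for_width(width_bits, target_bps=TARGET_BPS):
    """Return (clock in MHz, clock period in ns) needed at a given data width."""
    clock_hz = target_bps / width_bits
    return clock_hz / 1e6, 1e9 / clock_hz

for width in (32, 64, 128):
    mhz, period_ns = clock_for_width(width)
    print(f"{width:3d} bits: {mhz:.0f} MHz, {period_ns:.1f} ns per clock")
```

Each doubling of the width doubles the time available per clock (2.5 ns, 5 ns, 10 ns), which is the slack the larger equations have to fit into.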

Have fun.
 

Welcome to EDABoard.com
