Online tool that generates parallel CRC and Scrambler

Jake7

I've built a website - http://OutputLogic.com - with online tools
that generate Verilog code for a parallel CRC or scrambler, given the data
width and polynomial coefficients.

Also, there are short posts that describe an efficient parallel CRC/
Scrambler generation algorithm for Verilog or VHDL that I've used.

-evgeni
 
On May 18, 10:14 am, Jake7 <evgen...@gmail.com> wrote:
> I've built a website - http://OutputLogic.com - with online tools
> that generate Verilog code for parallel CRC and Scrambler given data
> width and polynomial coefficients.
>
> Also, there are short posts that describe an efficient parallel CRC/
> Scrambler generation algorithm for Verilog or VHDL that I've used.
>
> -evgeni

Thanks for these tools!

How about making it a bit more interesting and adding byte enables?

Cheers,
rudi
 
Evgeni,

Ok neat. But why not just code the algorithm in straight verilog or
VHDL, instead of C that generates verilog? The C-generated verilog code
is unmanageable.

CRC and LFSR algorithms are at the top of the list for implementing in
hardware (i.e. HDLs) rather than C. C implementations are messy.

You don't need to calculate "one-bit per clock" - rather one-bit per
ITERATION. Who says each iteration must be a clock tick? Just
implement the procedural code for the logic update of one bit and
stick a 'for' loop around it for 'n' bits. Boom, done. Let
the synthesis tool optimize, and produce the big XOR trees.

The core of the verilog code that supports any polynomial
(width, taps), and any data size could consist of less than
10 lines of code.

--Mark
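
Editor's illustration: the loop-based approach described in the post above can be sketched roughly as follows. This is not code from the thread. The module name `crc_parallel`, its ports, and the parameters `CRC_W`, `DATA_W`, `POLY` and `INIT` are all hypothetical, and the sketch assumes that data bits are consumed starting at bit 0 and that `CRC_W` is at least 2. The default polynomial is the CRC-32 one used by Ethernet.

```verilog
// Minimal sketch of a parameterized parallel CRC (hypothetical names).
// One serial update per loop ITERATION; DATA_W iterations per clock.
module crc_parallel #(
    parameter               CRC_W  = 32,             // CRC register width (>= 2)
    parameter               DATA_W = 8,              // data bits consumed per clock
    parameter [CRC_W-1:0]   POLY   = 32'h04C11DB7,   // taps, x^CRC_W term implied
    parameter [CRC_W-1:0]   INIT   = {CRC_W{1'b1}}   // value loaded on clear
) (
    input  wire              clk,
    input  wire              clear,
    input  wire              enable,
    input  wire [DATA_W-1:0] data,
    output reg  [CRC_W-1:0]  crc
);
    integer         i;
    reg [CRC_W-1:0] crc_tmp;   // temporary, blocking assignments only
    reg             fb;        // feedback bit for the current iteration

    always @(posedge clk) begin
        if (enable) begin
            crc_tmp = crc;
            for (i = 0; i < DATA_W; i = i + 1) begin
                // Serial step: shift left, XOR in the taps when feedback is 1
                fb      = crc_tmp[CRC_W-1] ^ data[i];
                crc_tmp = {crc_tmp[CRC_W-2:0], 1'b0} ^ ({CRC_W{fb}} & POLY);
            end
            crc <= crc_tmp;    // only the final value is registered
        end else if (clear) begin
            crc <= INIT;
        end
    end
endmodule
```

The loop is unrolled at elaboration time, so the synthesis tool sees one combinational XOR network per CRC bit followed by a register. How well that network is optimized depends on the tool, which is the point debated further down the thread.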
 
In comp.arch.fpga Mark <mark@cacurry.net> wrote:
(snip)

< CRC and LFSR algorithms are at the top of the list for implementing in
< hardware (i.e. HDLs) rather than C. C implementations are messy.

(snip)

< The core of the verilog code that supports any polynomial
< (width, taps), and any data size could consist of less than
< 10 lines of code.

Note that Xilinx FPGAs can do 16 bits of LFSR in one SRL16,
which takes up very little space. You could easily generate
many of them, also wider than 16 bits.

-- glen
 
Mark wrote:

> Ok neat. But why not just code the algorithm in straight verilog or
> VHDL, instead of C that generates verilog?
> ....
> You don't need to calculate "one-bit per clock" - rather one-bit per
> ITERATION. Who says each iteration must be a clock tick? Just
> implement the procedural code for the logic update of one bit and
> stick a 'for' loop around it for 'n' bits. Boom, done. Let
> the synthesis tool optimize, and produce the big XOR trees.

I agree, but not everyone is a language wonk.
This is straightforward in vhdl, and has been
covered repeatedly in the vhdl newsgroup.
If you have done it in verilog,
let's see the code.

> The C-generated verilog code
> is unmanageable.

http://www.easics.be/webtools/crctool
is a similar generator that has been around for years.
It produces the same "unmanageable" code that
many have nonetheless managed to
paste in and use successfully.


-- Mike Treseler
 
On 2009-05-21, Mike Treseler <mtreseler@gmail.com> wrote:
> I agree, but not everyone is a language wonk.
> This is straightforward in vhdl, and has been
> covered repeatedly in the vhdl newsgroup.
> If you have done it in verilog,
> let's see the code.

It is straightforward in Verilog as well. This is taken from an Ethernet
CRC32 module I wrote a long time ago:

// Ethernet's CRC32
always @(posedge clk125_i) begin
    if (crc_enable) begin
        crc_tmp = crc;

        // Implement it as a for loop of the bit-serial version
        // and hope that the synthesizer optimizes it well (at least ISE does)
        for (i = 0; i < 8; i = i + 1) begin
            fb = crc_tmp[31];
            crc_tmp[31] = crc_tmp[30];
            crc_tmp[30] = crc_tmp[29];
            crc_tmp[29] = crc_tmp[28];
            // ... and so on...
            crc_tmp[2] = next_txd_o ^ fb ^ crc_tmp[1]; // x^2
            crc_tmp[1] = next_txd_o ^ fb ^ crc_tmp[0]; // x^1
            crc_tmp[0] = next_txd_o ^ fb;              // 1
        end // for (i = 0; i < 8; i = i + 1)

        crc <= crc_tmp;
    end else if (crc_clear) begin
        crc <= 32'hffffffff;
    end
end


/Andreas
 
Andreas Ehliar wrote:

> It is straightforward in Verilog as well.

I agree, other than filling in
that ... and so on... part.

Thanks for the posting.
You have cracked the code and discovered verilog variables.
I won't tell anyone ;)

> This is taken from an Ethernet
> CRC32 module I wrote a long time ago:

I did it in vhdl with a compile-time
constant table and a serial crc_shift function,
overloaded for the parallel case, something like:

begin
    crc_v := crc; -- starting value
    for i in data'range loop -- call serial shift below for each bit
                             -- left to right
        crc_v := crc_shift(data(i), crc_v, crc_type);
    end loop;
    return crc_v;
end function crc_shift;


-- Mike Treseler

__________________________________________________
constant crc_table : crc_table_t := (
    ppp32 => -- ethernet, hdlc, ppp, AAL5, fddi,
    (
        crc_len   => 32,
        poly_vec  =>
            (26|23|22|16|12|11|10|8|7|5|4|2|1|0 => '1', others => '0'),
        crc_init  => (others => '1'),
        remainder => x"c704_dd7b"
    ), ...
_____________________________________________________
-- Base serial shifter, all of the other crc_shifts end up here
-- This gets called n times for the parallel versions above
function crc_shift -- Serial in, unsigned return
    ( data     : in std_ulogic; -- input bit
      crc      : in unsigned;   -- crc starting value
      crc_type : in crc_t
    )
    return unsigned is
    variable crc_v : unsigned(crc'range); -- CRC register
    constant reg_len : natural :=
        crc_table(crc_type).crc_len; -- look up length
    subtype crc_vec is unsigned(reg_len-1 downto 0);
    -- chop table poly to length
    constant mask : crc_vec :=
        crc_table(crc_type).poly_vec(crc_vec'range);
begin
    crc_v := crc sll 1; -- shift it
    if (crc(crc'left) xor data) = '1' then
        -- maybe invert mask bits
        crc_v := crc_v xor mask;
    end if;
    return unsigned(crc_v);
    -- returns whole register each shift
end function crc_shift;
 
Mark <mark@cacurry.net> writes:

> Ok neat. But why not just code the algorithm in straight verilog or
> VHDL, instead of C that generates verilog? The C-generated verilog code
> is unmanageable.

I think this is historical, from the time when synthesis tools did not
handle loops very well. It was very easy to do the expansion
symbolically in a language like Common Lisp and generate HDL code.
I've done this many times in the past, and I disagree that the
generated code is unmanageable. You simply stick it in a module and
instantiate it.


Petter
--
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
 
Andreas Ehliar <ehliar-nospam@isy.liu.se> writes:

> It is straightforward in Verilog as well. This is taken from an Ethernet
> CRC32 module I wrote a long time ago:
> <snipped code>

But that's not any poly at any length...


Petter
 
Andreas Ehliar wrote:
> On 2009-05-21, Mike Treseler <mtreseler@gmail.com> wrote:
>> I agree, but not everyone is a language wonk.
>> This is straightforward in vhdl, and has been
>> covered repeatedly in the vhdl newsgroup.
>> If you have done it in verilog,
>> let's see the code.
>
> It is straightforward in Verilog as well.

I agree. But consider what many Verilog designers have learned:

* Think Hardware.
* Don't mix blocking and non-blocking assignments.

If I "think hardware" on an example such as yours, I easily get
confused. To find such an elegant solution, I need to understand
what HDLs and synthesis tools can do.

Likewise, if I can't mix blocking and non-blocking assignments in
a clocked always block, I can't write code like yours.

In summary, unless Verilog RTL designers are prepared to discard
what they are learning from all kinds of papers and trainers,
they won't come up with such elegant solutions. No matter
how straightforward we might find them.

Jan

--
Jan Decaluwe - Resources bvba - http://www.jandecaluwe.com
Python as a HDL: http://www.myhdl.org
VHDL development, the modern way: http://www.sigasi.com
Analog design automation: http://www.mephisto-da.com
World-class digital design: http://www.easics.com
 
On May 27, 12:37 am, Petter Gustad <newsmailco...@gustad.com> wrote:
> Andreas Ehliar <ehliar-nos...@isy.liu.se> writes:
>> It is straightforward in Verilog as well. This is taken from an Ethernet
>> CRC32 module I wrote a long time ago:
>>
>> snipped code
>
> But that's not any poly at any length...

Andreas's code snippet shows the point fairly well. It's
not hard to expand his to have "polynomial" as an input to
the module as well, and the data length and polynomial length
as parameters.

It's much easier to read his and understand what's happening -
for me at least. Without comments, one can puzzle out the
polynomial, and other CRC parameters fairly easily from the code.
Try that with the machine generated code.

I've inherited machine-generated code like this a few times in the past.
Comments are great, but they often get out of date, incomplete, etc.
On the next design, we need to match the CRC in software for some reason.
"What's the polynomial?" the SW engineer asks. "I dunno," the HW guy
replies. "So-and-so ran some tool on the net, and out popped this code..."

--Mark
 
Mark wrote:
> On May 27, 12:37 am, Petter Gustad <newsmailco...@gustad.com> wrote:
>> Andreas Ehliar <ehliar-nos...@isy.liu.se> writes:
>>> It is straightforward in Verilog as well. This is taken from an Ethernet
>>> CRC32 module I wrote a long time ago:
>>> snipped code
>>
>> But that's not any poly at any length...
>
> Andreas's code snippet shows the point fairly well. It's
> not hard to expand his to have "polynomial" as an input to
> the module as well, and the data length and polynomial length
> as parameters.
>
> It's much easier to read his and understand what's happening -
> for me at least. Without comments, one can puzzle out the
> polynomial, and other CRC parameters fairly easily from the code.
> Try that with the machine generated code.

Automatically generated comments should be just fine.
However, there's no need to pit the two approaches against each other
like that.

If synthesis tools were ideal, there would be no need for a
tool such as Easics' CRC Tool. It was developed in the mid 90s
because we found that the for-loop approach caused large synthesis
run-times and inefficient results in cases like:
* wide polynomials
* wide data widths
* CRC embedded in a large FSM

What CRC Tool actually does is a dedicated XOR-based optimization.
The synthesis improvements were quite dramatic.

Whether this is still the case, and for which synthesis tools,
I don't know. I can imagine that some synthesis tools contain
specific XOR-based optimization engines by now, which would
possibly remove the need for CRC Tool.

It all comes down to understanding the capabilities of
your synthesis tool.

Jan

 
On May 20, 1:46 pm, Mark <m...@cacurry.net> wrote:
> Evgeni,
>
> Ok neat. But why not just code the algorithm in straight verilog or
> VHDL, instead of C that generates verilog? The C-generated verilog code
> is unmanageable.
>
> CRC and LFSR algorithms are at the top of the list for implementing in
> hardware (i.e. HDLs) rather than C. C implementations are messy.
>
> You don't need to calculate "one-bit per clock" - rather one-bit per
> ITERATION. Who says each iteration must be a clock tick? Just
> implement the procedural code for the logic update of one bit and
> stick a 'for' loop around it for 'n' bits. Boom, done. Let
> the synthesis tool optimize, and produce the big XOR trees.
>
> The core of the verilog code that supports any polynomial
> (width, taps), and any data size could consist of less than
> 10 lines of code.
>
> --Mark

Mark,

Actually, I've been using this algorithm coded in Verilog and running
it from a simulator. Because the online version is a web-based tool
running on a server, it's coded in Perl. I also have versions that are
coded in C and JavaScript. It all depends on how and where it's being used.

Evgeni
 
On May 20, 5:43 pm, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
> In comp.arch.fpga Mark <m...@cacurry.net> wrote:
> (snip)
>
>> CRC and LFSR algorithms are at the top of the list for implementing in
>> hardware (i.e. HDLs) rather than C. C implementations are messy.
>
> (snip)
>
>> The core of the verilog code that supports any polynomial
>> (width, taps), and any data size could consist of less than
>> 10 lines of code.
>
> Note that Xilinx FPGAs can do 16 bits of LFSR in one SRL16,
> which takes up very little space. You could easily generate
> many of them, also wider than 16 bits.
>
> -- glen

Glen,

That's true. I didn't want the tools to generate FPGA-specific code.

Evgeni
 
In comp.arch.fpga Jake7 <evgenist@gmail.com> wrote:
(snip, I wrote)

<> Note that Xilinx FPGAs can do 16 bits of LFSR in one SRL16,
<> which takes up very little space. You could easily generate
<> many of them, also wider than 16 bits.

< That's true. I didn't want the tools to generate FPGA-specific code.

It seems that ISE is good at finding shift registers.
I don't know exactly what ISE finds or doesn't find, but it
seems that the difference is efficient use of space.

-- glen
 
On May 29, 12:14 am, glen herrmannsfeldt <g...@ugcs.caltech.edu>
wrote:
> In comp.arch.fpga Jake7 <evgen...@gmail.com> wrote:
> (snip, I wrote)
>
>>> Note that Xilinx FPGAs can do 16 bits of LFSR in one SRL16,
>>> which takes up very little space. You could easily generate
>>> many of them, also wider than 16 bits.
>
>> That's true. I didn't want the tools to generate FPGA-specific code.
>
> It seems that ISE is good at finding shift registers.
> I don't know exactly what ISE find or doesn't find, but it
> seems that the difference is efficient use of space.
>
> -- glen

Glen,

I've observed that Xilinx ISE infers shift registers (SRL16 or SRL32)
if the FFs don't have a reset, as in this code:

always @(posedge clk) begin
    reg_q2 <= reg_q1;
    reg_q3 <= reg_q2;
end

So if your CRC/Scrambler/LFSR code doesn't have a reset, it's going to
be synthesized in the most compact way. The downside is that it takes
more effort to reset such a circuit, for example by shifting 0s through
the SRLs.

Evgeni
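
Editor's illustration: the reset-free coding style described in the post above can be sketched as a generic delay line. The module name `delay_line`, its ports, and the `DEPTH` parameter are hypothetical and do not come from the thread; `DEPTH` is assumed to be at least 2. Whether a given tool and device family actually maps this onto SRL primitives depends on the tool and its settings, so the synthesis report is the place to confirm it.

```verilog
// Minimal sketch of a shift register written without a reset
// (hypothetical names). With no reset term on the flip-flops, a
// synthesis tool is free to map the chain onto shift-register
// primitives where the target device has them.
module delay_line #(
    parameter DEPTH = 16          // number of stages, assumed >= 2
) (
    input  wire clk,
    input  wire en,               // shift enable
    input  wire d,                // serial input
    output wire q                 // serial output, DEPTH clocks later
);
    reg [DEPTH-1:0] sr;

    // No reset branch here on purpose
    always @(posedge clk)
        if (en)
            sr <= {sr[DEPTH-2:0], d};

    assign q = sr[DEPTH-1];
endmodule
```

To clear such a register, hold `d` low with `en` high for `DEPTH` clock cycles, which is the "shift 0 through" approach mentioned above.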
 
On Tuesday, May 26, 2009 6:25:01 AM UTC-4, Andreas Ehliar wrote:
> On 2009-05-21, Mike Treseler <mtreseler@gmail.com> wrote:
>> I agree, but not everyone is a language wonk.
>> This is straightforward in vhdl, and has been
>> covered repeatedly in the vhdl newsgroup.
>> If you have done it in verilog,
>> let's see the code.
>
> It is straight forward in Verilog as well. This is taken from an Ethernet
> CRC32 module I wrote a long time ago:
>
> // Ethernet's CRC32
> always @(posedge clk125_i) begin
>     if(crc_enable) begin
>         crc_tmp = crc;
>
>         // Implement it as a for loop of the bit serial version
>         // and hope that the synthesizer optimize it well (at least ISE does)
>         for(i = 0; i < 8; i = i + 1) begin
>             fb = crc_tmp[31];
>             crc_tmp[31] = crc_tmp[30];
>             crc_tmp[30] = crc_tmp[29];
>             crc_tmp[29] = crc_tmp[28];
>             // ... and so on...
>             crc_tmp[2] = next_txd_o ^ fb ^ crc_tmp[1]; // x^2
>             crc_tmp[1] = next_txd_o ^ fb ^ crc_tmp[0]; // x^1
>             crc_tmp[0] = next_txd_o ^ fb; // 1
>         end // for (i = 0; i < 8; i = i + 1)
>
>         crc <= crc_tmp;
>     end else if(crc_clear) begin
>         crc <= 32'hffffffff;
>     end
> end
>
>
> /Andreas


Hi,

This is a very useful example for me, but I cannot make out what 'next_txd_o' is.
Could anyone complete the above example for me? My attempt is listed below:

`timescale 1 ns / 1 ns
module assign_deassign ();
    // Ethernet's CRC32
    reg clk125_i,crc,crc_enable;
    wire i;
    wire fb[31:0];
    reg crc_tmp[31:0];
    reg crc_clear;
    always @(posedge clk125_i) begin
        if(crc_enable) begin
            crc_tmp = crc;

            // Implement it as a for loop of the bit serial version
            // and hope that the synthesizer optimize it well (at least ISE does)
            for(i = 0; i < 8; i = i + 1) begin
                fb = crc_tmp[31];
                crc_tmp[31] = crc_tmp[30];
                crc_tmp[30] = crc_tmp[29];
                crc_tmp[29] = crc_tmp[28];
                // ... and so on...
                crc_tmp[2] = next_txd_o ^ fb ^ crc_tmp[1]; // x^2
                crc_tmp[1] = next_txd_o ^ fb ^ crc_tmp[0]; // x^1
                crc_tmp[0] = next_txd_o ^ fb; // 1
            end // for (i = 0; i < 8; i = i + 1)

            crc <= crc_tmp;
        end else if(crc_clear) begin
            crc <= 32'hffffffff;
        end
    end
endmodule


Thanks
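
Editor's illustration: one possible way to complete the example for simulation is sketched below, under stated assumptions. The thread never says what `next_txd_o` is; this sketch assumes it is the next 8-bit data byte and that the loop consumes `next_txd_o[i]`, least significant bit first, whereas the posted snippet uses the name without an index. The elided tap lines are not reproduced; the taps are instead applied through a mask `POLY` holding the CRC-32 polynomial 0x04C11DB7. The module name `crc32_byte_tb`, the `fcs` function and the stimulus are hypothetical. The declarations differ from the attempt above: `crc` and `crc_tmp` are 32-bit vectors, `fb` is a single bit, and `i` is an `integer` so that it can be assigned inside the loop.

```verilog
`timescale 1 ns / 1 ns
// Simulation-only sketch (hypothetical names, see assumptions above)
module crc32_byte_tb;
    reg         clk125_i   = 1'b0;
    reg         crc_enable = 1'b0;
    reg         crc_clear  = 1'b0;
    reg  [7:0]  next_txd_o = 8'h00;  // ASSUMED: next data byte
    reg  [31:0] crc;                 // CRC register
    reg  [31:0] crc_tmp;             // temporary, blocking assignments
    reg         fb;                  // feedback bit
    integer     i, k;

    localparam [31:0] POLY = 32'h04C11DB7;  // CRC-32 taps

    always #4 clk125_i = ~clk125_i;         // 8 ns period, 125 MHz

    always @(posedge clk125_i) begin
        if (crc_enable) begin
            crc_tmp = crc;
            for (i = 0; i < 8; i = i + 1) begin
                fb      = crc_tmp[31] ^ next_txd_o[i];  // LSB first
                crc_tmp = {crc_tmp[30:0], 1'b0} ^ ({32{fb}} & POLY);
            end
            crc <= crc_tmp;
        end else if (crc_clear) begin
            crc <= 32'hffffffff;
        end
    end

    // Conventional CRC-32 result: complement the register and
    // reverse its bit order.
    function [31:0] fcs(input [31:0] r);
        integer j;
        begin
            for (j = 0; j < 32; j = j + 1)
                fcs[j] = ~r[31-j];
        end
    endfunction

    initial begin
        @(negedge clk125_i) crc_clear = 1'b1;
        @(negedge clk125_i) crc_clear = 1'b0;
        crc_enable = 1'b1;
        for (k = 0; k < 9; k = k + 1) begin
            next_txd_o = 8'h31 + k;          // ASCII "1" .. "9"
            @(negedge clk125_i);
        end
        crc_enable = 1'b0;
        // 32'hCBF43926 is the standard CRC-32 check value for "123456789"
        if (fcs(crc) === 32'hCBF43926)
            $display("PASS: %h", fcs(crc));
        else
            $display("FAIL: %h", fcs(crc));
        $finish;
    end
endmodule
```

Under those assumptions the register update matches the bit-serial structure of the quoted code, and the final comparison is a quick way to confirm that the bit ordering and polynomial line up with a software CRC-32.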
 
