How to handle a data packet while calculating CRC.

Y

yogesh tripathi

Guest
Hi,

I'm trying to process a Ethernet type package. Suppose if i have detected SFD and now have a <1600Byte data.

I'm extracting different package element(ds_addr,src_addr,etc) concatenating them in a long shift register and at same time passing it to a fifo to buffer and calculating crc32 which will take some clock cycles(xoring and shifting). Now if calculated CRC matched what is received, pass data to nxt stage else rst fifo.


Is there a better technique for it?

Thank-You in advance.
 
On 2018-03-12 12:02, yogesh tripathi wrote:
Hi,

I'm trying to process a Ethernet type package. Suppose if i have detected SFD and now have a <1600Byte data.

I'm extracting different package element(ds_addr,src_addr,etc) concatenating them in a long shift register and at same time passing it to a fifo to buffer and calculating crc32 which will take some clock cycles(xoring and shifting). Now if calculated CRC matched what is received, pass data to nxt stage else rst fifo.


Is there a better technique for it?

Thank-You in advance.

Hi

Calculate CRC on-the-fly together with incoming data.

Adam
 
On Monday, March 12, 2018 at 6:22:39 PM UTC+5:30, Adam GĂłrski wrote:
On 2018-03-12 12:02, yogesh tripathi wrote:
Hi,

I'm trying to process a Ethernet type package. Suppose if i have detected SFD and now have a <1600Byte data.

I'm extracting different package element(ds_addr,src_addr,etc) concatenating them in a long shift register and at same time passing it to a fifo to buffer and calculating crc32 which will take some clock cycles(xoring and shifting). Now if calculated CRC matched what is received, pass data to nxt stage else rst fifo.


Is there a better technique for it?

Thank-You in advance.


Hi

Calculate CRC on-the-fly together with incoming data.

Adam

Hi Adam,

"Calculate CRC on-the-fly together with incoming data." , can you elaborate it a bit more.
I'm getting a 8bit data in one clock cycle from the decoder. Now for crc i need serial shift register.
 
Implement this transition function as your 8-bit data is coming in on each cyle:
https://en.wikipedia.org/wiki/Cyclic_redundancy_check#CRC-32_algorithm

Or you could have a 2x clock for the inner loop to fit the table access, or compute everything one byte per clock cycle by pipelining the accesses such as:
crc32_now = crc32_prev[23:0] XOR output_from_sram_one_cycle_delay
lookup_sram_index_next = crc32_now[7:0] XOR data_from_packet
crc32_prev = crc32_now

Then make sure you handle the reset conditions on packet start, end of packet, etc.

On Tuesday, March 13, 2018 at 12:05:15 AM UTC-4, yogesh tripathi wrote:
On Monday, March 12, 2018 at 6:22:39 PM UTC+5:30, Adam GĂłrski wrote:
On 2018-03-12 12:02, yogesh tripathi wrote:
Hi,

I'm trying to process a Ethernet type package. Suppose if i have detected SFD and now have a <1600Byte data.

I'm extracting different package element(ds_addr,src_addr,etc) concatenating them in a long shift register and at same time passing it to a fifo to buffer and calculating crc32 which will take some clock cycles(xoring and shifting). Now if calculated CRC matched what is received, pass data to nxt stage else rst fifo.


Is there a better technique for it?

Thank-You in advance.


Hi

Calculate CRC on-the-fly together with incoming data.

Adam

Hi Adam,

"Calculate CRC on-the-fly together with incoming data." , can you elaborate it a bit more.
I'm getting a 8bit data in one clock cycle from the decoder. Now for crc i need serial shift register.
 
On Mon, 12 Mar 2018 21:05:09 -0700 (PDT), yogesh tripathi
<yogitripathi47@gmail.com> wrote:
I'm getting a 8bit data in one clock cycle from the decoder. Now
for crc i =
need serial shift register.
You have some options:
1. Use a precalculated table (rainbow table) for CRC. With 8 bit the
size of the table is usually acceptable.
2. Use the 8x clock for the CRC calculating shift register (not
recommended here).
3. Use a pipelined CRC calculation: The calc module is 8 times in
parallel and you get the final result with eight clocks latency.

Bart
 
In article <almarsoft.7981620793307616371@news.eternal-september.org>,
Bart Fox <bartfox@gmx.net> wrote:
On Mon, 12 Mar 2018 21:05:09 -0700 (PDT), yogesh tripathi
yogitripathi47@gmail.com> wrote:
I'm getting a 8bit data in one clock cycle from the decoder. Now
for crc i =
need serial shift register.
You have some options:
1. Use a precalculated table (rainbow table) for CRC. With 8 bit the
size of the table is usually acceptable.
2. Use the 8x clock for the CRC calculating shift register (not
recommended here).
3. Use a pipelined CRC calculation: The calc module is 8 times in
parallel and you get the final result with eight clocks latency.

Table methods are very likely NOT the correct solution for FPGA
implementations. Those methods are tuned for SW solutions.

CRCs in hardware are actually quite easy / low resources. Shifting
through the 8 bits in one clock cycle will probably work just fine.
There's online websites that have "CRC" calculators which can
do some of this work for you. But I find just coding up the
algorithm in verilog / VHDL to be more compact and clear.

If that doesn't work - i.e. you're not hitting you're desired clock
frequencies - then Bart's suggestion of pipelining the calc is valid.
That can be a little trickier, but still doable. I don't think you'll
need to go this far, unless you're trying to hit some high clock rates.

Regards,

Mark
 
On 13/03/2018 04:05, yogesh tripathi wrote:
On Monday, March 12, 2018 at 6:22:39 PM UTC+5:30, Adam GĂłrski wrote:
On 2018-03-12 12:02, yogesh tripathi wrote:
Hi,

I'm trying to process a Ethernet type package. Suppose if i have detected SFD and now have a <1600Byte data.

I'm extracting different package element(ds_addr,src_addr,etc) concatenating them in a long shift register and at same time passing it to a fifo to buffer and calculating crc32 which will take some clock cycles(xoring and shifting). Now if calculated CRC matched what is received, pass data to nxt stage else rst fifo.


Is there a better technique for it?

Thank-You in advance.


Hi

Calculate CRC on-the-fly together with incoming data.

Adam

Hi Adam,

"Calculate CRC on-the-fly together with incoming data." , can you elaborate it a bit more.
I'm getting a 8bit data in one clock cycle from the decoder. Now for crc i need serial shift register.

Not necessarily, have a look at:
http://www.easics.com/webtools/crctool

I've used this, albeit a few years ago and the website has changed since.

If I recall you have to invert and swap the bit order to get the correct
CRC. I used a simulator to check the permutations until I got it right.

If you have 8bit data you can generate the CRC on the fly at the same rate.


--
Mike Perkins
Video Solutions Ltd
www.videosolutions.ltd.uk
 
On Wednesday, March 14, 2018 at 1:18:40 AM UTC+5:30, Mike Perkins wrote:
On 13/03/2018 04:05, yogesh tripathi wrote:
On Monday, March 12, 2018 at 6:22:39 PM UTC+5:30, Adam GĂłrski wrote:
On 2018-03-12 12:02, yogesh tripathi wrote:
Hi,

I'm trying to process a Ethernet type package. Suppose if i have detected SFD and now have a <1600Byte data.

I'm extracting different package element(ds_addr,src_addr,etc) concatenating them in a long shift register and at same time passing it to a fifo to buffer and calculating crc32 which will take some clock cycles(xoring and shifting). Now if calculated CRC matched what is received, pass data to nxt stage else rst fifo.


Is there a better technique for it?

Thank-You in advance.


Hi

Calculate CRC on-the-fly together with incoming data.

Adam

Hi Adam,

"Calculate CRC on-the-fly together with incoming data." , can you elaborate it a bit more.
I'm getting a 8bit data in one clock cycle from the decoder. Now for crc i need serial shift register.

Not necessarily, have a look at:
http://www.easics.com/webtools/crctool

I've used this, albeit a few years ago and the website has changed since.

If I recall you have to invert and swap the bit order to get the correct
CRC. I used a simulator to check the permutations until I got it right.

If you have 8bit data you can generate the CRC on the fly at the same rate.


--
Mike Perkins
Video Solutions Ltd
www.videosolutions.ltd.uk

Thank-You Mike.
The link gives a provide a HDL package which is basically a Parallel LFSR. You know any related text how to generate a custom LFSR for these CRC ,just for better understanding.
 
Hi,

I'm trying to process a Ethernet type package. Suppose if i have detected SFD and now have a <1600Byte data.

I'm extracting different package element(ds_addr,src_addr,etc) concatenating them in a long shift register and at same time passing it to a fifo to buffer and calculating crc32 which will take some clock cycles(xoring and shifting). Now if calculated CRC matched what is received, pass data to nxt stage else rst fifo.


Is there a better technique for it?

Thank-You in advance.


Hi

Calculate CRC on-the-fly together with incoming data.

Adam

Hi Adam,

"Calculate CRC on-the-fly together with incoming data." , can you elaborate it a bit more.
I'm getting a 8bit data in one clock cycle from the decoder. Now for crc i need serial shift register.

So look for CRC implementation able to process 8bits ( byte ) in single
clock and store data in same time to fifo and to CRC unit.

Hint: Online CRC VHDL generator. There is many.

Best regards

Adam GĂłrski
 
On 15/03/2018 11:28, yogesh tripathi wrote:
On Wednesday, March 14, 2018 at 1:18:40 AM UTC+5:30, Mike Perkins
wrote:
On 13/03/2018 04:05, yogesh tripathi wrote:
On Monday, March 12, 2018 at 6:22:39 PM UTC+5:30, Adam GĂłrski
wrote:
On 2018-03-12 12:02, yogesh tripathi wrote:
Hi,

I'm trying to process a Ethernet type package. Suppose if i
have detected SFD and now have a <1600Byte data.

I'm extracting different package
element(ds_addr,src_addr,etc) concatenating them in a long
shift register and at same time passing it to a fifo to
buffer and calculating crc32 which will take some clock
cycles(xoring and shifting). Now if calculated CRC matched
what is received, pass data to nxt stage else rst fifo.


Is there a better technique for it?

Thank-You in advance.


Hi

Calculate CRC on-the-fly together with incoming data.

Adam

Hi Adam,

"Calculate CRC on-the-fly together with incoming data." , can you
elaborate it a bit more. I'm getting a 8bit data in one clock
cycle from the decoder. Now for crc i need serial shift
register.

Not necessarily, have a look at:
http://www.easics.com/webtools/crctool

I've used this, albeit a few years ago and the website has changed
since.

If I recall you have to invert and swap the bit order to get the
correct CRC. I used a simulator to check the permutations until I
got it right.

If you have 8bit data you can generate the CRC on the fly at the
same rate.


-- Mike Perkins Video Solutions Ltd www.videosolutions.ltd.uk

Thank-You Mike. The link gives a provide a HDL package which is
basically a Parallel LFSR. You know any related text how to generate
a custom LFSR for these CRC ,just for better understanding.

I looked into this a long while ago and gave up!

A LFSR is a simple concept in it's own right but I didn't have time to
decipher a Parallel LFSR. I made the choice of using the result rather
than spend time trying to understand something I would only use once in
a long while.

I'm sure there are proof and theorems on the 'net somewhere!


--
Mike Perkins
Video Solutions Ltd
www.videosolutions.ltd.uk
 
On Tue, 13 Mar 2018 19:28:17 -0000 (UTC), gtwrek@sonic.net (gtwrek)
wrote:
Table methods are very likely NOT the correct solution for FPGA
implementations. Those methods are tuned for SW solutions.

Sorry for respeak.Of course is the usage of a lookup-table a valid
design technic for FPGAs.
Maybe not in the range of megabytes, but e.g. for 8 bit input and 8
bit output it's very acceptable.
One example: a low fidelity DDS usually use a small ROM with a sine
table...

Bart Fox
 
In article <almarsoft.7623430815815314146@news.eternal-september.org>,
Bart Fox <bartfox@gmx.net> wrote:
On Tue, 13 Mar 2018 19:28:17 -0000 (UTC), gtwrek@sonic.net (gtwrek)
wrote:
Table methods are very likely NOT the correct solution for FPGA
implementations. Those methods are tuned for SW solutions.

Sorry for respeak.Of course is the usage of a lookup-table a valid
design technic for FPGAs.
Maybe not in the range of megabytes, but e.g. for 8 bit input and 8
bit output it's very acceptable.
One example: a low fidelity DDS usually use a small ROM with a sine
table...

Table lookups, in general, are a very valid tool for some hardware
solutions. Just not for CRCs. A "brute force" table lookup for an
8-bit input, 8-bit CRC would require 2**16 entries by 8 bits. So a
64 KByte table. Not efficient. So one looks up many of the
"Software" table generated techniques to reduce the requirements.
Problem is those techniques are tuned to optimizes SW, not HW.

.... All to replace something less than a 100 LUT6s, and 8 FFs.

CRCs in hardware are very efficient just coded brute force.

Regards,

Mark
 
Den fredag den 16. marts 2018 kl. 01.29.54 UTC+1 skrev gtwrek:
In article <almarsoft.7623430815815314146@news.eternal-september.org>,
Bart Fox <bartfox@gmx.net> wrote:
On Tue, 13 Mar 2018 19:28:17 -0000 (UTC), gtwrek@sonic.net (gtwrek)
wrote:
Table methods are very likely NOT the correct solution for FPGA
implementations. Those methods are tuned for SW solutions.

Sorry for respeak.Of course is the usage of a lookup-table a valid
design technic for FPGAs.
Maybe not in the range of megabytes, but e.g. for 8 bit input and 8
bit output it's very acceptable.
One example: a low fidelity DDS usually use a small ROM with a sine
table...

Table lookups, in general, are a very valid tool for some hardware
solutions. Just not for CRCs. A "brute force" table lookup for an
8-bit input, 8-bit CRC would require 2**16 entries by 8 bits. So a
64 KByte table. Not efficient. So one looks up many of the
"Software" table generated techniques to reduce the requirements.
Problem is those techniques are tuned to optimizes SW, not HW.

... All to replace something less than a 100 LUT6s, and 8 FFs.

CRCs in hardware are very efficient just coded brute force.

http://www.sunshine2k.de/articles/coding/crc/understanding_crc.html#ch6

doesn't look too bad
 
In article <b12f68ce-d635-4556-8fdc-ece405559d98@googlegroups.com>,
<lasselangwadtchristensen@gmail.com> wrote:
Den fredag den 16. marts 2018 kl. 01.29.54 UTC+1 skrev gtwrek:
In article <almarsoft.7623430815815314146@news.eternal-september.org>,
Bart Fox <bartfox@gmx.net> wrote:
On Tue, 13 Mar 2018 19:28:17 -0000 (UTC), gtwrek@sonic.net (gtwrek)
wrote:
Table methods are very likely NOT the correct solution for FPGA
implementations. Those methods are tuned for SW solutions.

Sorry for respeak.Of course is the usage of a lookup-table a valid
design technic for FPGAs.
Maybe not in the range of megabytes, but e.g. for 8 bit input and 8
bit output it's very acceptable.
One example: a low fidelity DDS usually use a small ROM with a sine
table...

Table lookups, in general, are a very valid tool for some hardware
solutions. Just not for CRCs. A "brute force" table lookup for an
8-bit input, 8-bit CRC would require 2**16 entries by 8 bits. So a
64 KByte table. Not efficient. So one looks up many of the
"Software" table generated techniques to reduce the requirements.
Problem is those techniques are tuned to optimizes SW, not HW.

... All to replace something less than a 100 LUT6s, and 8 FFs.

CRCs in hardware are very efficient just coded brute force.


http://www.sunshine2k.de/articles/coding/crc/understanding_crc.html#ch6

doesn't look too bad

My favorite reference is:
http://www.ross.net/crc/download/crc_v3.txt

Hardware engineers should STOP reading after section 8. That's all you
need to know as a hardware designer. Everything after section 8 is
dedicated to software optimizations where one doesn't have free access
to any bit - like we do in hardware.

Regards,

Mark
 
On 16/03/18 01:29, gtwrek wrote:
In article <almarsoft.7623430815815314146@news.eternal-september.org>,
Bart Fox <bartfox@gmx.net> wrote:
On Tue, 13 Mar 2018 19:28:17 -0000 (UTC), gtwrek@sonic.net (gtwrek)
wrote:
Table methods are very likely NOT the correct solution for FPGA
implementations. Those methods are tuned for SW solutions.

Sorry for respeak.Of course is the usage of a lookup-table a valid
design technic for FPGAs.
Maybe not in the range of megabytes, but e.g. for 8 bit input and 8
bit output it's very acceptable.
One example: a low fidelity DDS usually use a small ROM with a sine
table...

Table lookups, in general, are a very valid tool for some hardware
solutions. Just not for CRCs. A "brute force" table lookup for an
8-bit input, 8-bit CRC would require 2**16 entries by 8 bits. So a
64 KByte table. Not efficient.

No, it is 8-bit in (the address), 32-bit out for a 32-bit CRC - 1024 bytes.

So one looks up many of the
"Software" table generated techniques to reduce the requirements.
Problem is those techniques are tuned to optimizes SW, not HW.

... All to replace something less than a 100 LUT6s, and 8 FFs.

CRCs in hardware are very efficient just coded brute force.

CRC's in hardware, using a shift register, are very efficient if your
data is coming in a bit at a time. If you have data in memory or
arriving in a wider path, table lookup can be very useful. You can
choose your sizes to give a balance between speed and size. For
example, you could use 4-bit in, 32-bit out tables and run 4 bits at a
time. (Such a solution can be pipelined for greater throughput.)
 
On 2018-03-15 18:35, Adam GĂłrski wrote:
Hi,

I'm trying to process a Ethernet type package. Suppose if i have
detected SFD and now have a  <1600Byte  data.

I'm extracting different package element(ds_addr,src_addr,etc)
concatenating them in a long shift register and at same time passing
it to a fifo to buffer and calculating crc32 which will take some
clock cycles(xoring and shifting). Now if calculated CRC matched
what is received, pass data to nxt stage else rst fifo.

Is there a better technique for it?

Thank-You in advance.


Hi

Calculate CRC on-the-fly together with incoming data.

Adam

Hi Adam,

  "Calculate CRC on-the-fly together with incoming data." , can you
elaborate it a bit more.
I'm getting a 8bit data in one clock cycle from the decoder. Now for
crc i need serial shift register.


So look for CRC implementation able to process 8bits ( byte ) in single
clock and store data in same time to fifo and to CRC unit.

Hint: Online CRC VHDL generator. There is many.

Best regards

Adam GĂłrski
Example:

Ethernet CRC generator code is ( http://www.easics.com/webtools/crctool ):

--------------------------------------------------------------------------------
-- Copyright (C) 1999-2008 Easics NV.
-- This source file may be used and distributed without restriction
-- provided that this copyright statement is not removed from the file
-- and that any derivative work contains the original copyright notice
-- and the associated disclaimer.
--
-- THIS SOURCE FILE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS
-- OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
-- WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR A PARTICULAR PURPOSE.
--
-- Purpose : synthesizable CRC function
-- * polynomial: x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10
+ x^8 + x^7 + x^5 + x^4 + x^2 + x^1 + 1
-- * data width: 8
--
-- Info : tools@easics.be
-- http://www.easics.com
--------------------------------------------------------------------------------
library ieee;
use ieee.std_logic_1164.all;

package PCK_CRC32_D8 is
-- polynomial: x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10
+ x^8 + x^7 + x^5 + x^4 + x^2 + x^1 + 1
-- data width: 8
-- convention: the first serial bit is D[7]
function nextCRC32_D8
(Data: std_logic_vector(7 downto 0);
crc: std_logic_vector(31 downto 0))
return std_logic_vector;
end PCK_CRC32_D8;


package body PCK_CRC32_D8 is

-- polynomial: x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10
+ x^8 + x^7 + x^5 + x^4 + x^2 + x^1 + 1
-- data width: 8
-- convention: the first serial bit is D[7]
function nextCRC32_D8
(Data: std_logic_vector(7 downto 0);
crc: std_logic_vector(31 downto 0))
return std_logic_vector is

variable d: std_logic_vector(7 downto 0);
variable c: std_logic_vector(31 downto 0);
variable newcrc: std_logic_vector(31 downto 0);

begin
d := Data;
c := crc;

newcrc(0) := d(6) xor d(0) xor c(24) xor c(30);
newcrc(1) := d(7) xor d(6) xor d(1) xor d(0) xor c(24) xor c(25)
xor c(30) xor c(31);
newcrc(2) := d(7) xor d(6) xor d(2) xor d(1) xor d(0) xor c(24) xor
c(25) xor c(26) xor c(30) xor c(31);
newcrc(3) := d(7) xor d(3) xor d(2) xor d(1) xor c(25) xor c(26)
xor c(27) xor c(31);
newcrc(4) := d(6) xor d(4) xor d(3) xor d(2) xor d(0) xor c(24) xor
c(26) xor c(27) xor c(28) xor c(30);
newcrc(5) := d(7) xor d(6) xor d(5) xor d(4) xor d(3) xor d(1) xor
d(0) xor c(24) xor c(25) xor c(27) xor c(28) xor c(29) xor c(30) xor c(31);
newcrc(6) := d(7) xor d(6) xor d(5) xor d(4) xor d(2) xor d(1) xor
c(25) xor c(26) xor c(28) xor c(29) xor c(30) xor c(31);
newcrc(7) := d(7) xor d(5) xor d(3) xor d(2) xor d(0) xor c(24) xor
c(26) xor c(27) xor c(29) xor c(31);
newcrc(8) := d(4) xor d(3) xor d(1) xor d(0) xor c(0) xor c(24) xor
c(25) xor c(27) xor c(28);
newcrc(9) := d(5) xor d(4) xor d(2) xor d(1) xor c(1) xor c(25) xor
c(26) xor c(28) xor c(29);
newcrc(10) := d(5) xor d(3) xor d(2) xor d(0) xor c(2) xor c(24)
xor c(26) xor c(27) xor c(29);
newcrc(11) := d(4) xor d(3) xor d(1) xor d(0) xor c(3) xor c(24)
xor c(25) xor c(27) xor c(28);
newcrc(12) := d(6) xor d(5) xor d(4) xor d(2) xor d(1) xor d(0) xor
c(4) xor c(24) xor c(25) xor c(26) xor c(28) xor c(29) xor c(30);
newcrc(13) := d(7) xor d(6) xor d(5) xor d(3) xor d(2) xor d(1) xor
c(5) xor c(25) xor c(26) xor c(27) xor c(29) xor c(30) xor c(31);
newcrc(14) := d(7) xor d(6) xor d(4) xor d(3) xor d(2) xor c(6) xor
c(26) xor c(27) xor c(28) xor c(30) xor c(31);
newcrc(15) := d(7) xor d(5) xor d(4) xor d(3) xor c(7) xor c(27)
xor c(28) xor c(29) xor c(31);
newcrc(16) := d(5) xor d(4) xor d(0) xor c(8) xor c(24) xor c(28)
xor c(29);
newcrc(17) := d(6) xor d(5) xor d(1) xor c(9) xor c(25) xor c(29)
xor c(30);
newcrc(18) := d(7) xor d(6) xor d(2) xor c(10) xor c(26) xor c(30)
xor c(31);
newcrc(19) := d(7) xor d(3) xor c(11) xor c(27) xor c(31);
newcrc(20) := d(4) xor c(12) xor c(28);
newcrc(21) := d(5) xor c(13) xor c(29);
newcrc(22) := d(0) xor c(14) xor c(24);
newcrc(23) := d(6) xor d(1) xor d(0) xor c(15) xor c(24) xor c(25)
xor c(30);
newcrc(24) := d(7) xor d(2) xor d(1) xor c(16) xor c(25) xor c(26)
xor c(31);
newcrc(25) := d(3) xor d(2) xor c(17) xor c(26) xor c(27);
newcrc(26) := d(6) xor d(4) xor d(3) xor d(0) xor c(18) xor c(24)
xor c(27) xor c(28) xor c(30);
newcrc(27) := d(7) xor d(5) xor d(4) xor d(1) xor c(19) xor c(25)
xor c(28) xor c(29) xor c(31);
newcrc(28) := d(6) xor d(5) xor d(2) xor c(20) xor c(26) xor c(29)
xor c(30);
newcrc(29) := d(7) xor d(6) xor d(3) xor c(21) xor c(27) xor c(30)
xor c(31);
newcrc(30) := d(7) xor d(4) xor c(22) xor c(28) xor c(31);
newcrc(31) := d(5) xor c(23) xor c(29);
return newcrc;
end nextCRC32_D8;

end PCK_CRC32_D8;
 
In article <p8fufb$8es$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:
On 16/03/18 01:29, gtwrek wrote:
In article <almarsoft.7623430815815314146@news.eternal-september.org>,
Bart Fox <bartfox@gmx.net> wrote:
On Tue, 13 Mar 2018 19:28:17 -0000 (UTC), gtwrek@sonic.net (gtwrek)
wrote:
Table methods are very likely NOT the correct solution for FPGA
implementations. Those methods are tuned for SW solutions.

Sorry for respeak.Of course is the usage of a lookup-table a valid
design technic for FPGAs.
Maybe not in the range of megabytes, but e.g. for 8 bit input and 8
bit output it's very acceptable.
One example: a low fidelity DDS usually use a small ROM with a sine
table...

Table lookups, in general, are a very valid tool for some hardware
solutions. Just not for CRCs. A "brute force" table lookup for an
8-bit input, 8-bit CRC would require 2**16 entries by 8 bits. So a
64 KByte table. Not efficient.

No, it is 8-bit in (the address), 32-bit out for a 32-bit CRC - 1024 bytes.

So the "brute force" table is even larger.

So one looks up many of the
"Software" table generated techniques to reduce the requirements.
Problem is those techniques are tuned to optimizes SW, not HW.

... All to replace something less than a 100 LUT6s, and 8 FFs.

CRCs in hardware are very efficient just coded brute force.


CRC's in hardware, using a shift register, are very efficient if your
data is coming in a bit at a time. If you have data in memory or
arriving in a wider path, table lookup can be very useful. You can

I don't agree. Those table methods described in all those papers you
find are tuned for software solutions, not hardware. It's quite easy
to handle one-bit, or N-bits at a time in hardware.

Code up a simple amount of logic to shift one bit at a time - in whatever
language.

next_crc = shift_crc( current_crc, new_data_bit_in );

The shift_crc function is very simple to implement in verilog/vhdl.

Now for N bits at a time, stick a 'for' loop around that function
call... Done. The for loop (as always in hardware) describes parallel
hardware, not sequential operation. And it works to do exactly what is
desired. You get a clear description of what's happening in shockingly
few lines of code. It reproduces what all those online CRC generators
spit out as a glob of random XOR code....

If that's not fast enough, then look at pipelining it. Table methods
are hardly ever the right solution for CRCs in hardware.

Burning even one block memory for a CRC calculation in hardware just seems
absurdly excessive to me. But maybe your designs have a lot of spare
block memories, (and few LUTs available).

Regards,

Mark
 
On 16/03/18 15:52, gtwrek wrote:
In article <p8fufb$8es$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:
On 16/03/18 01:29, gtwrek wrote:
In article <almarsoft.7623430815815314146@news.eternal-september.org>,
Bart Fox <bartfox@gmx.net> wrote:
On Tue, 13 Mar 2018 19:28:17 -0000 (UTC), gtwrek@sonic.net (gtwrek)
wrote:
Table methods are very likely NOT the correct solution for FPGA
implementations. Those methods are tuned for SW solutions.

Sorry for respeak.Of course is the usage of a lookup-table a valid
design technic for FPGAs.
Maybe not in the range of megabytes, but e.g. for 8 bit input and 8
bit output it's very acceptable.
One example: a low fidelity DDS usually use a small ROM with a sine
table...

Table lookups, in general, are a very valid tool for some hardware
solutions. Just not for CRCs. A "brute force" table lookup for an
8-bit input, 8-bit CRC would require 2**16 entries by 8 bits. So a
64 KByte table. Not efficient.

No, it is 8-bit in (the address), 32-bit out for a 32-bit CRC - 1024 bytes.

So the "brute force" table is even larger.

No, they are not.

Let's take the simple case of 8-bit CRC, with data coming in 8-bit
lumps. This is the C code for it:

static const uint8_t crcTable[256] = {
0x00, 0x8d, 0x97, 0x1a, 0xa3, 0x2e, 0x34, 0xb9,
// ...
}

uint8_t calcCrc(uint8_t crc, const uint8_t * p, size_t n)
{
while (n--) {
crc = crcTable[crc ^ *p++];
}
return crc;
}

In hardware terms, that means you take your incoming 8-bit data, xor it
with your 8-bit current crc, then use that as the address for the lookup
in your 256-entry (8-bit address, 8-bit data) table. The value from the
table is the new crc to use for the next round of 8-bit data.

The table is 256 bytes - 2K bits, if you prefer. Not 2^16.

I suspect that what you are missing in your understanding here is that
you do not need a table that combines the current crc and the incoming
byte independently - /that/ would need a 2^16 entry table. You take
advantage of the way the CRC is defined to combine the parts with xor
(using plain logic) first.

If you have a wider CRC and data coming in as 8-bit lumps, you need more
than one 256-entry table and you combine the steps with xor's. So for a
32-bit CRC, you can use 4 256-entry, 8-bit wide lookup tables. The four
lookups and the xors between them can easily be pipelined.

You can reduce the depth of the pipelines by having bigger tables - 2^16
entry tables will half the pipeline depth, but is unlikely to be worth
it due to the size of the tables. You can also use smaller tables and
more steps, but 8-bit lookups often work well. It is also possible to
take advantage of wider tables if your incoming data is in wider batches.

So one looks up many of the
"Software" table generated techniques to reduce the requirements.
Problem is those techniques are tuned to optimizes SW, not HW.

... All to replace something less than a 100 LUT6s, and 8 FFs.

CRCs in hardware are very efficient just coded brute force.


CRC's in hardware, using a shift register, are very efficient if your
data is coming in a bit at a time. If you have data in memory or
arriving in a wider path, table lookup can be very useful. You can

I don't agree. Those table methods described in all those papers you
find are tuned for software solutions, not hardware. It's quite easy
to handle one-bit, or N-bits at a time in hardware.

I haven't looked at the particular papers here. One-bit CRC hardware
is, as you say, quite easy to make and it is fine if your data is coming
in 1 bit at a time. But it is much slower if the data is coming in
parallel bunches as is often the case for high-speed serial lines. If
you have 1 GBit Ethernet, it is far easier to handle data that is 8-bit
wide at 125 MHz than data that is 1-bit wide at 1 GHz.

Code up a simple amount of logic to shift one bit at a time - in whatever
language.

next_crc = shift_crc( current_crc, new_data_bit_in );

The shift_crc function is very simple to implement in verilog/vhdl.

Now for N bits at a time, stick a 'for' loop around that function
call... Done. The for loop (as always in hardware) describes parallel
hardware, not sequential operation. And it works to do exactly what is
desired. You get a clear description of what's happening in shockingly
few lines of code. It reproduces what all those online CRC generators
spit out as a glob of random XOR code....

If that's not fast enough, then look at pipelining it. Table methods
are hardly ever the right solution for CRCs in hardware.

Burning even one block memory for a CRC calculation in hardware just seems
absurdly excessive to me. But maybe your designs have a lot of spare
block memories, (and few LUTs available).

Regards,

Mark
 
If that's not fast enough, then look at pipelining it. Table methods are
hardly ever the right solution for CRCs in hardware.

In FPGA, almost all methods are table methods.
But not always the same ones as software.
 
On 20/03/18 16:27, mac wrote:
If that's not fast enough, then look at pipelining it. Table methods are
hardly ever the right solution for CRCs in hardware.

In FPGA, almost all methods are table methods.
But not always the same ones as software.

(Your quoting is jumbled. You posted a follow-up to my post, but quoted
from gtwrek's post. And you failed to give proper attributions. Please
try to get this right - it is not hard, it is common courtesy, and it
makes newsgroup threads much easier to follow. Thanks.)
 

Welcome to EDABoard.com

Sponsor

Back
Top