Bitstream compression

Rob Gaddi

Hey all--

So my experience with the native bitstream compression algorithms
provided by the FPGA vendors has been that they don't actually achieve
all that much compression. This makes sense given that the expansion
has to be accomplished in the FPGA hardware, and so can't be all that
aggressive.

I generally find myself programming FPGAs from microcontrollers, and so
I've got clock cycles and RAM to throw at the problem to try to do
better. We've had some limited luck implementing really stupid RLE,
just compressing the runs of 0x00 and 0xFF and calling it a day. But I
was wondering whether anyone's found a better, more universal approach.
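
For a sense of scale, the expansion side of that scheme is only a few
lines of C. A sketch (the format here is hypothetical: 0x00 or 0xFF is
followed by a count byte, anything else is a literal; emit() stands in
for whatever pushes a byte at the FPGA config port):

#include <stddef.h>
#include <stdint.h>

extern void emit(uint8_t b);   /* push one byte to the FPGA */

void rle_expand(const uint8_t *in, size_t len)
{
    size_t i = 0;
    while (i < len) {
        uint8_t b = in[i++];
        if ((b == 0x00 || b == 0xFF) && i < len) {
            uint8_t count = in[i++];    /* run length, 1..255 */
            while (count--)
                emit(b);
        } else {
            emit(b);                    /* literal byte */
        }
    }
}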

The ideal would be a compression/expansion library with fairly good
compression ratios, where the expansion side could operate without
malloc and on-the-fly on a stream of data rather than requiring large
buffers. The compression side could use phenomenal amounts of compute
power and RAM, since it would be happening on the PC. But the goal
would be to be able to decompress on something like an LPC1754 (32 kB RAM total).

Anyone have any thoughts?

--
Rob Gaddi, Highland Technology
Email address is currently out of order
 
Rob Gaddi wrote:

Hey all--

So my experience with the native bitstream compression algorithms
provided by the FPGA vendors has been that they don't actually achieve
all that much compression.

The ideal would be a compression/expansion library with fairly good
compression ratios, where the expansion side could operate without
malloc and on-the-fly on a stream of data rather than requiring large
buffers.
It won't generally be possible to beat the vendor-provided basic compression
by more than a factor of ~1.5 or so. A gain of 1.5x wouldn't
really improve anything.

The compression side could use phenomenal amounts of compute
power and RAM, since it would be happening on the PC. But the goal
would be to be able to decompress on something like an LPC1754 (32 kB RAM total).

Anyone have any thoughts?
Just try 7zip on the FPGA binaries and see for yourself.


Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
 
On 7/28/2011 5:15 PM, Vladimir Vassilevsky wrote:
Rob Gaddi wrote:

Hey all--

So my experience with the native bitstream compression algorithms
provided by the FPGA vendors has been that they don't actually achieve
all that much compression.

The ideal would be a compression/expansion library with fairly good
compression ratios, where the expansion side could operate without
malloc and on-the-fly on a stream of data rather than requiring large
buffers.

It won't generally be possible to beat the vendor-provided basic compression
by more than a factor of ~1.5 or so. A gain of 1.5x wouldn't
really improve anything.

The compression side could use phenomenal amounts of compute power and
RAM, since it would be happening on the PC. But the goal would be to be
able to decompress on something like an LPC1754 (32 kB RAM total).

Anyone have any thoughts?

Just try 7zip on the FPGA binaries and see for yourself.


Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
Just did, on an FPGA with an admittedly small fill ratio. The
uncompressed bitstream for an XC3S200 is 1,047,616 bits. Using Xilinx
bitstream compression gets it down to 814,336 bits, or about 100kB.
7-Zip knocked it down to 16kB.

Another design uses a decent amount of an XC6S45. Native size is
11,875,104 bits (~1.5MB). Bitstream compression gives me 1.35MB. 7-Zip
gives me 395kB.

I've got a project coming up around an Altera Arria II, with 30Mb of
configuration. If I could get a 3:1, 4:1 compression ratio it would be
a pretty radical savings on flash size.

--
Rob Gaddi, Highland Technology
Email address is currently out of order
 
I can't really see why you are trying to do this. If it's just for fun
then great, but flash isn't that expensive and you are only talking
about 30Mb, not 30Gb.

Jon

 
On 29/07/2011 09:20, maxascent wrote:
I can't really see why you are trying to do this. If it's just for fun
then great, but flash isn't that expensive and you are only talking
about 30Mb, not 30Gb.

Jon

Many embedded processors have a limited amount of Flash memory, so it
would be an advantage to efficiently compress a bitstream to save a
component and IO.

--
Mike Perkins
Video Solutions Ltd
www.videosolutions.ltd.uk
 
On Thu, 28 Jul 2011 17:40:25 -0700, Rob Gaddi wrote:

On 7/28/2011 5:15 PM, Vladimir Vassilevsky wrote:


Rob Gaddi wrote:

Hey all--

So my experience with the native bitstream compression algorithms
provided by the FPGA vendors has been that they don't actually achieve
all that much compression.

The ideal would be a compression/expansion library with fairly good
compression ratios, where the expansion side could operate without
malloc and on-the-fly on a stream of data rather than requiring large
buffers.

It won't generally be possible to beat the vendor-provided basic
compression by more than a factor of ~1.5 or so. A gain of 1.5x
wouldn't really improve anything.

The compression side could use phenomenal amounts of compute power and
RAM, since it would be happening on the PC. But the goal would be to be
able to decompress on something like an LPC1754 (32 kB RAM total).

Anyone have any thoughts?

Just try 7zip on the FPGA binaries and see for yourself.


Vladimir Vassilevsky DSP and Mixed Signal Design Consultant
http://www.abvolt.com

Just did, on an FPGA with an admittedly small fill ratio. The
uncompressed bitstream for an XC3S200 is 1,047,616 bits. Using Xilinx
bitstream compression gets it down to 814,336 bits, or about 100kB.
7-Zip knocked it down to 16kB.

Another design uses a decent amount of an XC6S45. Native size is
11,875,104 bits (~1.5MB). Bitstream compression gives me 1.35MB. 7-Zip
gives me 395kB.

I've got a project coming up around an Altera Arria II, with 30Mb of
configuration. If I could get a 3:1, 4:1 compression ratio it would be
a pretty radical savings on flash size.

A design done for a client, using Virtex 6, with gzip -9 for
compression. Figures are from memory and are approximate.

Uncompressed: 7.5MBytes
compressed: a few 10s of kB (empty FPGA)
compressed: 500kBytes (mostly empty FPGA)
compressed: 2MBytes (mostly full FPGA)
compressed: 7.5MBytes (with bitstream encryption)

Note: there's no point using compression with encryption, as the
encrypted images don't compress.

I used gzip as the decompressor is open source and fairly "light".
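
For what it's worth, the decode loop with zlib looks roughly like this
(a sketch, assuming zlib is available on the target; note that inflate
keeps its 32 KB window plus state, call it 40 KB of RAM, so check that
it fits before committing to a small part):

#include <stdint.h>
#include <string.h>
#include <zlib.h>

extern void emit(uint8_t b);   /* push one byte to the config port */

int gunzip_to_fpga(const uint8_t *src, unsigned src_len)
{
    z_stream s;
    uint8_t out[256];
    int rc;

    memset(&s, 0, sizeof s);
    if (inflateInit2(&s, 15 + 16) != Z_OK)   /* 15+16 = gzip wrapper */
        return -1;
    s.next_in  = (Bytef *)src;
    s.avail_in = src_len;
    do {
        s.next_out  = out;
        s.avail_out = sizeof out;
        rc = inflate(&s, Z_NO_FLUSH);
        if (rc != Z_OK && rc != Z_STREAM_END)
            break;
        for (unsigned i = 0; i < sizeof out - s.avail_out; i++)
            emit(out[i]);
    } while (rc != Z_STREAM_END);
    inflateEnd(&s);
    return rc == Z_STREAM_END ? 0 : -1;
}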

The Xilinx built-in compression doesn't do a good job because it merely
joins identical frames. The chance of getting identical frames is low
for any design that uses a reasonable amount of the fabric. (If you're
not using most of the fabric, you could be using a smaller, cheaper
device.)
OTOH, if you are using bitstream encryption, the built-in compression is
the only compression you can use and it will be better than nothing.

Regards,
Allan
 
On 29/07/2011 09:20, maxascent wrote:
I can't really see why you are trying to do this. If it's just for fun
then great, but flash isn't that expensive and you are only talking
about 30Mb, not 30Gb.

Jon


Many embedded processors have a limited amount of Flash memory, so it
would be an advantage to efficiently compress a bitstream to save a
component and IO.

--
Mike Perkins
Video Solutions Ltd
www.videosolutions.ltd.uk
Well, you can get serial flash devices that use a small amount of IO and
have large capacities. I don't think you would use the internal flash of a
processor to store a bitstream unless it was very small. I just can't see
the point of going to all the trouble to do this unless you could shrink
the bitstream to almost nothing.

Jon

 
From memory, Altera has a much better "native" compression than
Xilinx, with typical compression ratios of about 50%.

Regards,

Thomas

www.entner-electronics.com
 
Rob Gaddi <rgaddi@technologyhighland.com> wrote:

Hey all--

So my experience with the native bitstream compression algorithms
provided by the FPGA vendors has been that they don't actually achieve
all that much compression. This makes sense given that the expansion
has to be accomplished in the FPGA hardware, and so can't be all that
aggressive.

I generally find myself programming FPGAs from microcontrollers, and so
I've got clock cycles and RAM to throw at the problem to try to do
better. We've had some limited luck implementing really stupid RLE,
just compressing the runs of 0x00 and 0xFF and calling it a day. But I
was wondering whether anyone's found a better, more universal approach.
The problem is that Huffman-based algorithms don't gain that much. I
recall someone had a lot of success using an RLE-like scheme on
nibbles (4 bits) instead of whole bytes.
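
The decode side stays tiny with nibbles, too. A rough sketch (the
token format is invented for illustration, not the one from that
project: each byte packs a 4-bit repeat count with a 4-bit data
nibble, count 0 meaning 16; emit() is the byte sink):

#include <stddef.h>
#include <stdint.h>

extern void emit(uint8_t b);

void nibble_rle_expand(const uint8_t *in, size_t len)
{
    unsigned acc = 0, have = 0;            /* nibble accumulator */
    for (size_t i = 0; i < len; i++) {
        unsigned count  = in[i] >> 4;
        unsigned nibble = in[i] & 0x0F;
        if (count == 0)
            count = 16;
        while (count--) {
            acc = (acc << 4) | nibble;
            if (++have == 2) {             /* two nibbles = one byte */
                emit((uint8_t)acc);
                acc = have = 0;
            }
        }
    }
}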

Look here:
http://www.sulimma.de/prak/ws0001/projekte/ralph/Projekt/index.htm
--
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
nico@nctdevpuntnl (punt=.)
--------------------------------------------------------------
 
Rob Gaddi wrote:

On 7/28/2011 5:15 PM, Vladimir Vassilevsky wrote:
Rob Gaddi wrote:

So my experience with the native bitstream compression algorithms
provided by the FPGA vendors has been that they don't actually achieve
all that much compression.

It won't generally be possible to beat the vendor-provided basic compression
by more than a factor of ~1.5 or so. A gain of 1.5x wouldn't
really improve anything.

Native size is
11,875,104 bits (~1.5MB). Bitstream compression gives me 1.35MB. 7-Zip
gives me 395kB.
Interesting.

I tried to compress JBCs from a heavily used Altera Cyclone and got
only about 1.5x compression.

As for the compression algorithm, something like a bit-oriented LZSS or
LZW would be easy to implement. The decompression part is trivial. If you
don't care about speed, the compression part is very simple, too. I
don't know if the further sophistication of the algorithm would do much
of a difference.
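
For illustration, the trivial side of LZSS looks like this (a sketch;
the window size and field widths are arbitrary picks, not any standard
format, and the encoder must only emit references into data already
written; emit() is the byte sink):

#include <stddef.h>
#include <stdint.h>

#define WIN_BITS 11                    /* 2 KB window */
#define LEN_BITS 4                     /* match length 3..18 */
#define WIN_SIZE (1u << WIN_BITS)

extern void emit(uint8_t b);

static const uint8_t *g_in;
static size_t g_len, g_pos;
static unsigned g_bits, g_cnt;

static unsigned get_bits(unsigned n)   /* MSB-first bit reader */
{
    while (g_cnt < n) {
        g_bits = (g_bits << 8) | (g_pos < g_len ? g_in[g_pos++] : 0);
        g_cnt += 8;
    }
    g_cnt -= n;
    return (g_bits >> g_cnt) & ((1u << n) - 1);
}

void lzss_expand(const uint8_t *in, size_t len, size_t out_len)
{
    static uint8_t win[WIN_SIZE];
    size_t wpos = 0, produced = 0;

    g_in = in; g_len = len; g_pos = 0; g_bits = g_cnt = 0;
    while (produced < out_len) {
        if (get_bits(1)) {                        /* literal */
            uint8_t b = (uint8_t)get_bits(8);
            emit(b);
            win[wpos++ & (WIN_SIZE - 1)] = b;
            produced++;
        } else {                                  /* (distance, length) */
            unsigned dist = get_bits(WIN_BITS) + 1;
            unsigned n    = get_bits(LEN_BITS) + 3;
            while (n-- && produced < out_len) {
                uint8_t b = win[(wpos - dist) & (WIN_SIZE - 1)];
                emit(b);
                win[wpos++ & (WIN_SIZE - 1)] = b;
                produced++;
            }
        }
    }
}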

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
 
[ NB: I've added comp.compression to the mix ]

Jason wrote:

Rob Gaddi wrote:

Just did, on an FPGA with an admittedly small fill ratio. The
uncompressed bitstream for an XC3S200 is 1,047,616 bits. Using Xilinx
bitstream compression gets it down to 814,336 bits, or about 100kB.
7-Zip knocked it down to 16kB.

Another design uses a decent amount of an XC6S45. Native size is
11,875,104 bits (~1.5MB). Bitstream compression gives me 1.35MB. 7-Zip
gives me 395kB.

I've got a project coming up around an Altera Arria II, with 30Mb of
configuration. If I could get a 3:1, 4:1 compression ratio it would be
a pretty radical savings on flash size.

The algorithm that 7-Zip uses internally for .7z file compression is
LZMA:

http://en.wikipedia.org/wiki/Lzma

One characteristic of LZMA is that its decoder is much simpler than
the encoder, making it well-suited for your application. You would be
most interested in the open-source LZMA SDK:

http://www.7-zip.org/sdk.html

They provide an ANSI C implementation of a decoder that you can port
to your platform. Also, there is a reference application for an
encoder (I believe also written in C). I have used this in the past to
compress firmware for embedded systems with good success. I use the
encoder as a post-build step to compress the firmware image into an
LZMA stream (note that the compression is not done with 7-Zip, as the
.7z format is a full-blown archive; the reference encoder just gives
you a stream of compressed data, which is what you want). The
resulting file is then decompressed on the embedded target at firmware
update time. The decoder source code is most amenable to porting to 32-
bit architectures; I have implemented it on the LPC2138 ARM7 device
(with the same 32 KB of RAM as your part) as well as a few AVR32UC3
parts.

A couple other things: I originally did this ~4 years ago with a much
older version of the SDK; it's possible that things have changed since
then, but it should still be worth a look. LZMA provides good
compression ratios with a decoder that in my experience runs well on
embedded platforms. Secondly, you do have to be careful with the
parameters you use at the encoder if you want to bound the amount of
memory required at the decoder. More specifically, you need to be
careful which "dictionary size" you use for the encoder. As you might
expect, a larger dictionary gives you better compression ratios, but
the target running the decoder will require at least that much memory
(e.g. an 8 KB dictionary size will require at least 8 KB of memory for
the decoder).
Lempel-Ziv-Oberhumer (LZO) might also be worth investigating.

http://en.wikipedia.org/wiki/Lempel–Ziv–Oberhumer

The LZO library implements a number of algorithms with the
following characteristics:
* Compression is comparable in speed to deflate compression.
* Very fast decompression
* Requires an additional buffer during compression (of size 8 kB or 64 kB, depending on compression level).
* Requires no additional memory for decompression other than the source and destination buffers.
* Allows the user to adjust the balance between compression quality and compression speed, without affecting the speed of decompression.
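
A one-shot decode with miniLZO (the small standalone subset of the
library) is only a few lines; sketch below. The caveat for this thread
is that plain LZO decompresses buffer to buffer, so the whole output
must fit in RAM unless you compress the bitstream as independent
chunks:

#include "minilzo.h"

/* Returns decompressed length, or -1 on error. */
int lzo_expand(const unsigned char *src, lzo_uint src_len,
               unsigned char *dst, lzo_uint dst_cap)
{
    lzo_uint out_len = dst_cap;   /* in: capacity, out: actual size */

    if (lzo_init() != LZO_E_OK)
        return -1;
    /* _safe variant checks bounds; wrkmem is unused for decompression */
    if (lzo1x_decompress_safe(src, src_len, dst, &out_len, NULL) != LZO_E_OK)
        return -1;
    return (int)out_len;
}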

Regards.
 
On Jul 28, 8:40 pm, Rob Gaddi <rga...@technologyhighland.com> wrote:
On 7/28/2011 5:15 PM, Vladimir Vassilevsky wrote:

Rob Gaddi wrote:

Hey all--

So my experience with the native bitstream compression algorithms
provided by the FPGA vendors has been that they don't actually achieve
all that much compression.

The ideal would be a compression/expansion library with fairly good
compression ratios, where the expansion side could operate without
malloc and on-the-fly on a stream of data rather than requiring large
buffers.

It won't generally be possible to beat the vendor-provided basic compression
by more than a factor of ~1.5 or so. A gain of 1.5x wouldn't
really improve anything.

The compression side could use phenomenal amounts of compute power and
RAM, since it would be happening on the PC. But the goal would be to be
able to decompress on something like an LPC1754 (32 kB RAM total).

Anyone have any thoughts?

Just try 7zip on the FPGA binaries and see for yourself.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com

Just did, on an FPGA with an admittedly small fill ratio.  The
uncompressed bitstream for an XC3S200 is 1,047,616 bits.  Using Xilinx
bitstream compression gets it down to 814,336 bits, or about 100kB.
7-Zip knocked it down to 16kB.

Another design uses a decent amount of an XC6S45.  Native size is
11,875,104 bits (~1.5MB).  Bitstream compression gives me 1.35MB.  7-Zip
gives me 395kB.

I've got a project coming up around an Altera Arria II, with 30Mb of
configuration.  If I could get a 3:1, 4:1 compression ratio it would be
a pretty radical savings on flash size.

--
Rob Gaddi, Highland Technology
Email address is currently out of order
The algorithm that 7-Zip uses internally for .7z file compression is
LZMA:

http://en.wikipedia.org/wiki/Lzma

One characteristic of LZMA is that its decoder is much simpler than
the encoder, making it well-suited for your application. You would be
most interested in the open-source LZMA SDK:

http://www.7-zip.org/sdk.html

They provide an ANSI C implementation of a decoder that you can port
to your platform. Also, there is a reference application for an
encoder (I believe also written in C). I have used this in the past to
compress firmware for embedded systems with good success. I use the
encoder as a post-build step to compress the firmware image into an
LZMA stream (note that the compression is not done with 7-Zip, as the
.7z format is a full-blown archive; the reference encoder just gives
you a stream of compressed data, which is what you want). The
resulting file is then decompressed on the embedded target at firmware
update time. The decoder source code is most amenable to porting to 32-
bit architectures; I have implemented it on the LPC2138 ARM7 device
(with the same 32 KB of RAM as your part) as well as a few AVR32UC3
parts.

A couple other things: I originally did this ~4 years ago with a much
older version of the SDK; it's possible that things have changed since
then, but it should still be worth a look. LZMA provides good
compression ratios with a decoder that in my experience runs well on
embedded platforms. Secondly, you do have to be careful with the
parameters you use at the encoder if you want to bound the amount of
memory required at the decoder. More specifically, you need to be
careful which "dictionary size" you use for the encoder. As you might
expect, a larger dictionary gives you better compression ratios, but
the target running the decoder will require at least that much memory
(e.g. an 8 KB dictionary size will require at least 8 KB of memory for
the decoder).
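
For reference, the one-call decode looks roughly like this. This is
from memory of the C API of that era, so treat the names and signatures
as assumptions and check LzmaDec.h in your SDK copy; the reference
encoder prepends 5 property bytes plus an 8-byte uncompressed length,
and the dictionary is allocated through the ISzAlloc callbacks, which
is where the dictionary-size caveat above bites:

#include <stdlib.h>
#include "LzmaDec.h"

static void *SzAlloc(void *p, size_t size) { (void)p; return malloc(size); }
static void SzFree(void *p, void *addr)    { (void)p; free(addr); }
static ISzAlloc g_alloc = { SzAlloc, SzFree };

/* Returns decompressed length, or -1 on error. */
int lzma_expand(const unsigned char *src, size_t src_len,
                unsigned char *dst, size_t dst_cap)
{
    ELzmaStatus status;
    SizeT in_len  = src_len - LZMA_PROPS_SIZE - 8;   /* skip header */
    SizeT out_len = dst_cap;

    SRes r = LzmaDecode(dst, &out_len,
                        src + LZMA_PROPS_SIZE + 8, &in_len,
                        src, LZMA_PROPS_SIZE,        /* coder properties */
                        LZMA_FINISH_END, &status, &g_alloc);
    return (r == SZ_OK) ? (int)out_len : -1;
}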

Jason
 
Not all applications necessarily configure FPGAs from flash. Many apps
do it over the network, including initial configuration and subsequent
partial reconfigurations. So it's beneficial to have a good bitstream
compression scheme to reduce the amount of traffic.

Thanks,
Evgeni
http://outputlogic.com
 
On 7/29/2011 7:36 AM, Noob wrote:
Jason wrote:

Rob Gaddi wrote:

Just did, on an FPGA with an admittedly small fill ratio. The
uncompressed bitstream for an XC3S200 is 1,047,616 bits. Using Xilinx
bitstream compression gets it down to 814,336 bits, or about 100kB.
7-Zip knocked it down to 16kB.

Another design uses a decent amount of an XC6S45. Native size is
11,875,104 bits (~1.5MB). Bitstream compression gives me 1.35MB. 7-Zip
gives me 395kB.

I've got a project coming up around an Altera Arria II, with 30Mb of
configuration. If I could get a 3:1, 4:1 compression ratio it would be
a pretty radical savings on flash size.

The algorithm that 7-Zip uses internally for .7z file compression is
LZMA:

http://en.wikipedia.org/wiki/Lzma

One characteristic of LZMA is that its decoder is much simpler than
the encoder, making it well-suited for your application. You would be
most interested in the open-source LZMA SDK:

http://www.7-zip.org/sdk.html

They provide an ANSI C implementation of a decoder that you can port
to your platform. Also, there is a reference application for an
encoder (I believe also written in C). I have used this in the past to
compress firmware for embedded systems with good success. I use the
encoder as a post-build step to compress the firmware image into an
LZMA stream (note that the compression is not done with 7-Zip, as the
.7z format is a full-blown archive; the reference encoder just gives
you a stream of compressed data, which is what you want). The
resulting file is then decompressed on the embedded target at firmware
update time. The decoder source code is most amenable to porting to 32-
bit architectures; I have implemented it on the LPC2138 ARM7 device
(with the same 32 KB of RAM as your part) as well as a few AVR32UC3
parts.

A couple other things: I originally did this ~4 years ago with a much
older version of the SDK; it's possible that things have changed since
then, but it should still be worth a look. LZMA provides good
compression ratios with a decoder that in my experience runs well on
embedded platforms. Secondly, you do have to be careful with the
parameters you use at the encoder if you want to bound the amount of
memory required at the decoder. More specifically, you need to be
careful which "dictionary size" you use for the encoder. As you might
expect, a larger dictionary gives you better compression ratios, but
the target running the decoder will require at least that much memory
(e.g. an 8 KB dictionary size will require at least 8 KB of memory for
the decoder).

Lempel-Ziv-Oberhumer (LZO) might also be worth investigating.

http://en.wikipedia.org/wiki/Lempel–Ziv–Oberhumer

The LZO library implements a number of algorithms with the
following characteristics:
* Compression is comparable in speed to deflate compression.
* Very fast decompression
* Requires an additional buffer during compression (of size 8 kB or 64 kB, depending on compression level).
* Requires no additional memory for decompression other than the source and destination buffers.
* Allows the user to adjust the balance between compression quality and compression speed, without affecting the speed of decompression.

Regards.
Got the following in an email from Magnus Eriksson, who pled lack of
Usenet access:
--------------

A bunch of years ago I read about something called "pucrunch", for
compressing programs for use on the C64, or in modern parlance perhaps
"a CPU and memory restricted target system" -- it has a 1 MHz CPU with
few registers, and 64K RAM in total.

Pucrunch uses LZ77 and run-length encoding, with some tweaks; that is
what the author decided was the right compromise between compression
ratio and memory usage. And it is a "cruncher", which means that the
end result is a self-decompressing binary, one that unpacks a program to
RAM, and the decompressing code has to stay out of its way as far as
possible -- I believe the generic 6502 decompression routine takes up
only _354 bytes_ of code, if I'm reading the code comments correctly.

The catch is that you'll have to write your own decompression routine
(but that should hardly come as a surprise). There is pseudocode in the
article, and 6502 and Z80 code linked. The compressor is just one file
of C, so it should be easy to test the compression ratio first, at
least.

It will almost certainly be worse than 7-zip, but just how well it does
on a bitstream might be interesting to see.


You can find it here:

Article:
An Optimizing Hybrid LZ77 RLE Data Compression Program, aka
Improving Compression Ratio for Low-Resource Decompression

http://www.cs.tut.fi/~albert/Dev/pucrunch/

some related things in the parent directory too:
http://www.cs.tut.fi/~albert/Dev/


Hope that might be of some help, or inspiration to further
experimentation.


Take care,
Magnus
 
On 29/07/2011 13:25, maxascent wrote:
On 29/07/2011 09:20, maxascent wrote:
I can't really see why you are trying to do this. If it's just for fun
then great, but flash isn't that expensive and you are only talking
about 30Mb, not 30Gb.

Jon


Many embedded processors have a limited amount of Flash memory, so it
would be an advantage to efficiently compress a bitstream to save a
component and IO.


Well, you can get serial flash devices that use a small amount of IO and
have large capacities. I don't think you would use the internal flash of a
processor to store a bitstream unless it was very small. I just can't see
the point of going to all the trouble to do this unless you could shrink
the bitstream to almost nothing.
It depends. It so happens that for one product I use an LPC2378, which
has enough flash storage for a Spartan XC3S400A bit file. That saves an
extra device, as well as allowing an easy upgrade path.

It's very much down to personal preference and to reducing costs.

--
Mike Perkins
Video Solutions Ltd
www.videosolutions.ltd.uk
 
On Fri, 29 Jul 2011 09:21:45 -0500, Vladimir Vassilevsky
<nospam@nowhere.com> wrote:

Rob Gaddi wrote:

On 7/28/2011 5:15 PM, Vladimir Vassilevsky wrote:
Rob Gaddi wrote:

So my experience with the native bitstream compression algorithms
provided by the FPGA vendors has been that they don't actually achieve
all that much compression.

It won't generally be possible to beat the vendor-provided basic compression
by more than a factor of ~1.5 or so. A gain of 1.5x wouldn't
really improve anything.

Native size is
11,875,104 bits (~1.5MB). Bitstream compression gives me 1.35MB. 7-Zip
gives me 395kB.

Interesting.

I tried to compress JBCs from a heavily used Altera Cyclone and got
only about 1.5x compression.

As for the compression algorithm, something like a bit-oriented LZSS or
LZW would be easy to implement. The decompression part is trivial. If you
don't care about speed, the compression part is very simple, too. I
don't know if the further sophistication of the algorithm would do much
of a difference.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
I've done a very simple byte-oriented RLE thing on Xilinx chips and
got config files that were 20 to 40% as big as the original binaries.
Even a pretty-full chip has long runs of 0x00 and 0xFF bytes in the
bitstream. The uP code to decompress this and bang the FPGA is pretty
simple. In serial bit-bang mode, decompressing and configuring is
faster than uncompressed, because spitting out a long string of
identical bits can be done as a fast special case... just wiggle the
config clock!
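
Something like this, with the pin macros standing in for whatever your
GPIO layer provides (a sketch of the special case, not my actual code):

#include <stdint.h>

#define SET_DIN(v)   /* drive the FPGA DIN pin to v */
#define CCLK_PULSE() /* drive CCLK high, then low */

static void shift_byte(uint8_t b)
{
    for (int i = 7; i >= 0; i--) {   /* MSB first */
        SET_DIN((b >> i) & 1);
        CCLK_PULSE();
    }
}

static void shift_run(uint8_t b, uint32_t count)
{
    if (b == 0x00 || b == 0xFF) {
        SET_DIN(b & 1);              /* data line is constant... */
        count *= 8;                  /* ...so just clock 8 * count times */
        while (count--)
            CCLK_PULSE();
    } else {
        while (count--)
            shift_byte(b);
    }
}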

John
 
