Best FPGA for algorithmic acceleration

Jordan Fix

Hello,

I'm looking at different options to use FPGAs as coprocessors for
algorithmic acceleration. Between the Xilinx Virtex 6 (LXT or SXT), or
the Altera Stratix IV (360, 530 or 820), what would be my best option?
The Xilinx Spartan 6 may also be a possibility.

Thanks,
Jordan
 
Jordan Fix <jfix71@gmail.com> wrote:

I'm looking at different options to use FPGAs as coprocessors for
algorithmic acceleration. Between the Xilinx Virtex 6 (LXT or SXT), or
the Altera Stratix IV (360, 530 or 820), what would be my best option?
The Xilinx Spartan 6 may also be a possibility.
There are a number of different families from each company, many of
which will work well for algorithmic acceleration.

If there is a reason to use one over the other, it will likely
depend on details of the problem at hand.

For many such designs, the product of the number of CLBs and
the clock speed that you can run it at is most important, in
addition to the cost per chip.
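
As a rough illustration of that figure of merit, here is a minimal Python sketch. The part names and every number in it are hypothetical placeholders, not datasheet values; the clock should be whatever your own design actually closes timing at on each part.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    clbs: int         # logic blocks available (placeholder value)
    fmax_mhz: float   # clock your design reaches on this part (placeholder)
    cost_usd: float   # price per chip (placeholder)


def merit(c: Candidate) -> float:
    """CLBs x clock speed: a crude proxy for raw throughput."""
    return c.clbs * c.fmax_mhz


def merit_per_dollar(c: Candidate) -> float:
    """The same proxy, normalized by the cost per chip."""
    return merit(c) / c.cost_usd


# Two made-up parts, only to show how the comparison works.
candidates = [
    Candidate("part_a", clbs=10_000, fmax_mhz=200.0, cost_usd=500.0),
    Candidate("part_b", clbs=40_000, fmax_mhz=250.0, cost_usd=4_000.0),
]
best = max(candidates, key=merit_per_dollar)
print(best.name, merit_per_dollar(best))
```

With these placeholder numbers the bigger part has more raw throughput, but the smaller one wins per dollar, which is why cost belongs in the comparison.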

-- glen
 
Jordan Fix wrote:
I'm looking at different options to use FPGAs as coprocessors for
algorithmic acceleration. Between the Xilinx Virtex 6 (LXT or SXT), or
the Altera Stratix IV (360, 530 or 820), what would be my best option?
The Xilinx Spartan 6 may also be a possibility.
What algorithm do you want to implement with it? If you don't need many
parallel calculations or hard real-time behavior, a fast PC with a GPU and
something like CUDA or OpenCL is usually more cost-effective and much
easier to program.

--
Frank Buss, http://www.frank-buss.de
electronics and more: http://www.youtube.com/user/frankbuss
 
On Apr 9, 12:13 am, Jordan Fix <jfi...@gmail.com> wrote:
Hello,

I'm looking at different options to use FPGAs as coprocessors for
algorithmic acceleration. Between the Xilinx Virtex 6 (LXT or SXT), or
the Altera Stratix IV (360, 530 or 820), what would be my best option?
The Xilinx Spartan 6 may also be a possibility.

Thanks,
Jordan
Things to consider when choosing an FPGA for algorithm acceleration,
presumably to offload a CPU: what types of communication and memory
interfaces would be required to transfer data to and from the CPU or
system? How well does each candidate support those interfaces?

Also, what types of internal memory (multi-port, different read/write
data widths, ECC protected, etc), and how much, would be required to
support the algorithm? How well would each candidate support that?
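
To show why the interface question matters, here is a back-of-envelope sketch in Python. It assumes a simple model in which transfer and compute do not overlap; the function name and all the numbers are hypothetical, chosen only for illustration.

```python
def effective_speedup(cpu_time_s, kernel_speedup, bytes_in, bytes_out,
                      link_bytes_per_s):
    """Speedup seen by the host once data transfer is included.

    Assumes transfers and FPGA compute happen one after the other
    (no overlap), which is pessimistic for a streaming design.
    """
    transfer_s = (bytes_in + bytes_out) / link_bytes_per_s
    compute_s = cpu_time_s / kernel_speedup
    return cpu_time_s / (transfer_s + compute_s)


# Hypothetical job: 1 s on the CPU, 100x faster kernel on the FPGA,
# 1 GB of input, negligible output.
slow_link = effective_speedup(1.0, 100.0, 1e9, 0.0, 1e9)    # ~1 GB/s link
fast_link = effective_speedup(1.0, 100.0, 1e9, 0.0, 1e10)   # ~10 GB/s link
print(slow_link, fast_link)
```

In this made-up case the slow link wipes out the whole 100x kernel speedup, while the faster link keeps roughly 9x of it. The interface can matter as much as the fabric.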

Andy
 
I think it has already been said that it depends on what you are doing,
but I will make some general comments.

Some applications that manipulate large data sets, such as video, need
lots of memory, so easy access to DDR memory might be a point to look
for. As an example, Spartan-6 has a hardened controller, which is good,
but only if the 2 or 4 potential 16-bit interfaces offer enough
bandwidth and size. Other FPGAs can offer bigger and faster DDR2/3
interfaces that are harder to implement.

More expensive FPGAs, e.g. Virtex and Stratix, tend to offer more
internal SRAM and DSP blocks, so this may be a reason to go that way.
I will counter that by saying an array approach like our Merrick3/4/6
boards might be a lower-power, cheaper alternative.

You might find that you will need a higher performance PCIe interface
to handle your data flow into a host PC. Here the more expensive FPGAs
tend to be better but there are other ways that might be worth
consideration.

These are all general statements, and the real way to do this is to look
at the system design level. If you want more specific comments, contact
me through the Enterpoint contact page http://enterpoint.co.uk/about/ and
I will be happy to discuss this in more detail.

John Adair
Enterpoint Ltd.

 
Frank Buss <fb@frank-buss.de> wrote:

Jordan Fix wrote:
I'm looking at different options to use FPGAs as coprocessors for
algorithmic acceleration. Between the Xilinx Virtex 6 (LXT or SXT), or
the Altera Stratix IV (360, 530 or 820), what would be my best option?
The Xilinx Spartan 6 may also be a possibility.

What algorithm do you want to implement with it? If you don't need many
parallel calculation or hard realtime, usually a fast PC with a GPU and
something like CUDA or OpenCL is more cost-effective, and much easier to
program.
I think this is the best suggestion so far. Some manufacturers even
offer GPU cards without connectors for a monitor. The computational
power of a GPU is huge! It will be very hard to beat with an FPGA.

--
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
nico@nctdevpuntnl (punt=.)
--------------------------------------------------------------
 
Nico Coesel <nico@puntnl.niks> wrote:

(snip, someone wrote)
What algorithm do you want to implement with it? If you don't need many
parallel calculation or hard realtime, usually a fast PC with a GPU and
something like CUDA or OpenCL is more cost-effective, and much easier to
program.

I think this is the best suggestion so far. Some manufacturers even
offer GPU cards without connectors for a monitor. The computational
power of a GPU is huge! It will be very hard to beat with an FPGA.
In some cases, it is easy to beat with an FPGA.

There are some dynamic programming algorithms that need many
eight bit add/subtract/compares. I can fit hundreds of cells,
each with about five such operations, in a Spartan 3E.

Multiply and divide are much harder in an FPGA, as is floating point,
but small fixed point add/subtract is easy and fast.
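
To make the "cell" idea concrete, here is a minimal Python sketch. The post does not name the algorithm, so edit distance is used here purely as an assumed example of a dynamic programming recurrence built from small adds and compares; the function names and the 8-bit saturation are illustrative choices.

```python
def sat8(x: int) -> int:
    """Clamp to an unsigned 8-bit range, as a narrow hardware adder might."""
    return min(x, 255)


def cell(diag: int, up: int, left: int, a: str, b: str) -> int:
    """One DP cell: one character compare, three small adds, one 3-way min."""
    substitute = diag + (0 if a == b else 1)
    return sat8(min(substitute, up + 1, left + 1))


def edit_distance(s: str, t: str) -> int:
    """Software reference: the same cell evaluated row by row."""
    prev = [sat8(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, 1):
        cur = [sat8(i)]
        for j, b in enumerate(t, 1):
            cur.append(cell(prev[j - 1], prev[j], cur[j - 1], a, b))
        prev = cur
    return prev[-1]


print(edit_distance("kitten", "sitting"))
```

In an FPGA each cell would be a small block of logic, and hundreds of them would update on every clock instead of one at a time as in this loop.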

-- glen
 
glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote:

Nico Coesel <nico@puntnl.niks> wrote:

(snip, someone wrote)
What algorithm do you want to implement with it? If you don't need many
parallel calculation or hard realtime, usually a fast PC with a GPU and
something like CUDA or OpenCL is more cost-effective, and much easier to
program.

I think this is the best suggestion so far. Some manufacturers even
offer GPU cards without connectors for a monitor. The computational
power of a GPU is huge! It will be very hard to beat with an FPGA.

In some cases, it is easy to beat with an FPGA.

There are some dynamic programming algorithms that need many
eight bit add/subtract/compares. I can fit hundreds of cells,
each with about five such operations, in a Spartan 3E.
A single GPU offers about 250Gflops of computational power. Maybe you
don't need the floating point but even then it might be faster than a
Spartan 3E doing fixed point operations. And don't forget the data has
to be fetched and stored somewhere. Another piece of cake for a GPU.
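
For anyone who wants to put the two figures side by side, here is a back-of-envelope sketch in Python. Only the 250 Gflops figure and "about five operations per cell" come from this thread; the cell count of 300 and the 100 MHz clock are assumed placeholders, and 8-bit integer operations are not directly comparable to floating-point operations.

```python
import math


def fpga_ops_per_sec(cells: int, ops_per_cell: int, clock_hz: int) -> int:
    """Peak small-integer operations per second if every cell fires each clock."""
    return cells * ops_per_cell * clock_hz


GPU_FLOPS = 250e9               # figure quoted in the thread
ASSUMED_CELLS = 300             # placeholder for "hundreds of cells"
OPS_PER_CELL = 5                # "about five such operations"
ASSUMED_CLOCK_HZ = 100_000_000  # placeholder clock, not a measured value

fpga = fpga_ops_per_sec(ASSUMED_CELLS, OPS_PER_CELL, ASSUMED_CLOCK_HZ)
chips_to_match = math.ceil(GPU_FLOPS / fpga)
print(fpga, chips_to_match)
```

Under these assumptions one chip lands in the same order of magnitude as the GPU figure, so the answer depends on the real cell count, the real clock, and how well the algorithm maps to each device.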

--
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
nico@nctdevpuntnl (punt=.)
--------------------------------------------------------------
 
Nico Coesel <nico@puntnl.niks> wrote:

(snip, I wrote)
In some cases, it is easy to beat with an FPGA.

There are some dynamic programming algorithms that need many
eight bit add/subtract/compares. I can fit hundreds of cells,
each with about five such operations, in a Spartan 3E.

A single GPU offers about 250Gflops of computational power. Maybe you
don't need the floating point but even then it might be faster than a
Spartan 3E doing fixed point operations. And don't forget the data has
to be fetched and stored somewhere. Another piece of cake for a GPU.
For a linear systolic array, it is pretty easy to get the data in,
which goes at a fairly slow rate. Coming out depends on the actual
data, and can be high or low. Also, a linear systolic array can
be extended by adding more chips fairly easily. You do need to
power and cool them, but otherwise it is a linear array of as
many chips as you can afford.
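
Here is a small Python simulation of that kind of linear array, as a sketch only. It again assumes edit distance as the example recurrence (the post does not name one), and the function names are made up for illustration. Each cell holds one query character; the stream enters at one character per clock and each cell talks only to its left neighbor.

```python
def cell(diag, up, left, a, b):
    """One DP cell: compare, three adds, one 3-way min."""
    return min(diag + (a != b), up + 1, left + 1)


def systolic_edit_distance(stream, query):
    """Return (distance, clocks) for a simulated linear systolic array."""
    m, n = len(stream), len(query)
    if m == 0 or n == 0:
        return max(m, n), 0
    out = [c + 1 for c in range(n)]   # latest output of each cell (row 0)
    prev = list(out)                  # output before that; set before first use
    clocks = m + n - 1                # fill time plus drain time
    for k in range(clocks):
        new_out, new_prev = list(out), list(prev)
        for c in range(n):
            i = k - c                 # stream character reaching cell c now
            if 0 <= i < m:
                row = i + 1
                left = out[c - 1] if c > 0 else row
                diag = prev[c - 1] if c > 0 else row - 1
                new_out[c] = cell(diag, out[c], left, stream[i], query[c])
                new_prev[c] = out[c]
        out, prev = new_out, new_prev   # all cells update on the same clock edge
    return out[-1], clocks


print(systolic_edit_distance("kitten", "sitting"))
```

The input rate stays at one character per clock no matter how many cells there are, while the work done per clock grows with the number of cells. A longer query just means more cells, which is why the array can be extended across more chips.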

-- glen
 
On Apr 9, 11:38 pm, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
Nico Coesel <n...@puntnl.niks> wrote:

(snip, I wrote)

In some cases, it is easy to beat with an FPGA.
There are some dynamic programming algorithms that need many
eight bit add/subtract/compares. I can fit hundreds of cells,
each with about five such operations, in a Spartan 3E.
A single GPU offers about 250Gflops of computational power. Maybe you
don't need the floating point but even then it might be faster than a
Spartan 3E doing fixed point operations. And don't forget the data has
to be fetched and stored somewhere. Another piece of cake for a GPU.

For a linear systolic array, it is pretty easy to get the data in,
which goes at a fairly slow rate. Coming out depends on the actual
data, and can be high or low. Also, a linear systolic array can
be extended by adding more chips fairly easily. You do need to
power and cool them, but otherwise it is a linear array of as
many chips as you can afford.

-- glen
The reduced development effort of GPU vs FPGA should not be
understated. It's essentially software design vs. hardware design;
this comes with all the advantages (development cycles, portability,
and tightly-integrated/mature-ish tools are the big ones).

Last I heard, the conventional wisdom here was that GPUs can get you
10x with a few weeks of effort for most problems. FPGAs may get you
100x for some specific problems, but at 10x the development effort
(even more if you're not targeting COTS).

Ultimately, they are both just means to an end, but it's usually
better to let the problem dictate the solution, as opposed to the
other way around.
 
