fft size in fpga

P

PJ

Guest
Hello,

I am implementing a 128 point real Radix-2 fft, data and coefficient
widths are 16 bit.
I am synthezising it for use in an FPGA. However, it is taking a very
long time to synthesize. (approx 3 days using Leonardo on a 2 GHz
machine with 512 MByte RAM) I am using a 20K1000 Altera FPGA. The ram
required by the fft will be internal to the FPGA

Will this design take up all the space on the device. From past
experience, can someone give me an indication of what area of the
device the fft will occupy.

Surely if it takes up most of the device, then it will be too big, as
I have other features to implement in the FPGA also !!

Thank you
PJ
 
Several of the textbooks on FFTs such as Smith & Smith "Handbook of realtime FFTs"
describe higher radix FFTs. I believe the Xilinx FFT core data sheet goes into a fair
amount of detail on a radix 4 kernel. The easiest way to do hardware sharing is to use
one kernel and run the data through it in multiple passes. You'll need to change the
twiddle factors (which are really a phase rotation) and the data ordering. The twiddles
can be handled by changing the input to a multiplier using a table. The data reordering
is doable by permuting the bits of your address counter.

DA is really a technique for doing multiply-accumulate, not just a multiply. There are
many ways to do the multiplications, for example if speed is not an issue you can use a
scaling accumulator for a fairly compact multiplier. See the page about multiplication on
my website. Flip-flop count is not the whole story, you also have to account for the
combinatorial logic between flip-flops. You could do a totally combinatorial multiplier
with no flip=flops, for example (although it would be quite slow if it is reasonably
sized)

PJ wrote:

Hi Ray/Andyman,

Thanks for replying.

Ray ....
Is there a good explanation on radix-4 ffts ? If I can mod the code
quickly, I will certainly do it. As regarding shared hardware, I
thought about that when implementing my design, but I really couldn't
figure out a way to do it. Are you talking about reducing the number
of stages ? Did you use RAM to provide the hardware sharing ?

I have let the synthesis tools generate the multiplier. I just used
the '*' function. This is one area I think I could make good space
saving. I have seen your articles about multipliers on your website.
Would a DA type multiplier save me much space ?

What is the most efficient multiplier in terms of flop count, and is
there VHDL code or verilog code available. I'll gladly write it
myself, if I can get some pointers. I can trade speed for gate count.
My problem now is the lack of time I have to get this project
completed.

Andyman

I have implemented RAM in my design, and thankfully it synthesis to
RAM fine. However in saying that...that is for very small designs.
This certainly might be a problem. I am using leonardo spectrum, and
will give synplify a try next.
If you found any solutions...I would be glad to hear.

Thank you both for your time and your replies.

Good luck
PJ

Ray Andraka <ray@andraka.com> wrote in message news:<3F664CBB.86459E77@andraka.com>...
It very heavily depends on your implementation. The fact that it is
radix2 and
not radix 4 or higher tells me already that your implementation is not a
very
efficient one. A radix 4 kernel is very little added complexity over a
radix 2
kernel and cuts the processing time and area considerable. Depending on
your
speed requirements and your design prowess, you can certainly fit a 128
point
real-only FFT in a much smaller part than a 1M gate part. As your speed
requirements decrease, you can take advantage of iterative or shared
hardware
to reduce the gate count considerably. I did a 4096 point design in a
Xilinx
XCV1000 that does the complex transform in 68 us (the floorplan and a
brief
description are on the gallery page on my website). That one only
occupies
about 40% of the FPGA and includes some floating point stuff as well as
windowing multipliers and some other goodies. It takes intimate
knowledge
of the FPGA structure and of algorithms to achieve that level of density
(the customer called the design "a work of art"), but even a novice can
achieve a density/performance level of half that design with some
carefully
thought out design.

PJ wrote:

Hello,

I am implementing a 128 point real Radix-2 fft, data and coefficient
widths are 16 bit.
I am synthezising it for use in an FPGA. However, it is taking a very
long time to synthesize. (approx 3 days using Leonardo on a 2 GHz
machine with 512 MByte RAM) I am using a 20K1000 Altera FPGA. The ram
required by the fft will be internal to the FPGA

Will this design take up all the space on the device. From past
experience, can someone give me an indication of what area of the
device the fft will occupy.

Surely if it takes up most of the device, then it will be too big, as
I have other features to implement in the FPGA also !!

Thank you
PJ

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759
--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759
 

Welcome to EDABoard.com

Sponsor

Back
Top