All-real FFT for FPGA

Tim Wescott

So, there are algorithms out there to perform an FFT on real data, that
save (I think) roughly 2x the calculations of FFTs for complex data.

I did a quick search, but didn't find any that are made specifically for
FPGAs. Was my search too quick, or are there no IP sources to do this?

It would seem like a slam-dunk for Xilinx and Intel/Altera to include
these algorithms in their FFT libraries.

--

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

I'm looking for work -- see my website!
 
On Sunday, February 12, 2017 at 12:05:25 PM UTC-6, Tim Wescott wrote:
So, there are algorithms out there to perform an FFT on real data, that
save (I think) roughly 2x the calculations of FFTs for complex data.

I did a quick search, but didn't find any that are made specifically for
FPGAs. Was my search too quick, or are there no IP sources to do this?

It would seem like a slam-dunk for Xilinx and Intel/Altera to include
these algorithms in their FFT libraries.

--

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

I'm looking for work -- see my website!

It's been a long time, as I remember:
The Hartley transform will work.
Shuffling the data before and after a half size complex FFT will work.
And you can use one of them to check the other.
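For anyone rusty on the Hartley route: the DHT of a real sequence is
just Re(FFT) minus Im(FFT), and the complex spectrum can be recovered
from the DHT by its even/odd split. A throwaway numpy check (the direct
O(N^2) dht() below is purely illustrative, not how you'd build it):

import numpy as np

def dht(x):
    # Direct O(N^2) discrete Hartley transform, for checking only.
    N = len(x)
    k, n = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    return (np.cos(2*np.pi*k*n/N) + np.sin(2*np.pi*k*n/N)) @ x

x = np.random.randn(8)
X = np.fft.fft(x)
H = dht(x)
print(np.allclose(H, X.real - X.imag))             # DHT from the FFT
Hr = H[(-np.arange(8)) % 8]                        # H[N-k]
print(np.allclose(X, (H + Hr)/2 - 1j*(H - Hr)/2))  # FFT from the DHT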
 
On 2/12/2017 1:05 PM, Tim Wescott wrote:

So, there are algorithms out there to perform an FFT on real data, that
save (I think) roughly 2x the calculations of FFTs for complex data.

I did a quick search, but didn't find any that are made specifically for
FPGAs. Was my search too quick, or are there no IP sources to do this?

It would seem like a slam-dunk for Xilinx and Intel/Altera to include
these algorithms in their FFT libraries.

Protip: the synthesizer should trim out the unneeded logic, so
you don't need an optimized library macro.

Steve, always helpful
 
On 2/12/2017 1:05 PM, Tim Wescott wrote:
So, there are algorithms out there to perform an FFT on real data, that
save (I think) roughly 2x the calculations of FFTs for complex data.

I did a quick search, but didn't find any that are made specifically for
FPGAs. Was my search too quick, or are there no IP sources to do this?

It would seem like a slam-dunk for Xilinx and Intel/Altera to include
these algorithms in their FFT libraries.

I thought I replied to this, but my computer has been crashing a bit so
maybe I lost that one.

An FFT is inherently a complex function. Real data can be processed by
taking advantage of some of the symmetries in the FFT. I don't recall
the details as it has been over a decade since I worked with this, but
you can fold half of the real data into the imaginary portion, run a
size N/2 FFT and then unfold the results. I believe you have to run a
final N sized complex pass on these results to get your final answer. I
do recall it saved a *lot* when performing FFTs, nearly 50%.
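For the record, the fold/unfold being recalled here looks roughly like
this in numpy; a sketch of the standard pack-into-complex trick
(checked against np.fft.rfft, not anyone's FPGA implementation):

import numpy as np

def rfft_via_half_complex(x):
    # Length-N real FFT from one length-N/2 complex FFT plus an
    # unfold pass; returns the N//2 + 1 nonnegative-frequency bins,
    # same layout as np.fft.rfft.  N must be even.
    x = np.asarray(x, dtype=float)
    N = x.size
    M = N // 2
    z = x[0::2] + 1j * x[1::2]   # fold odd samples into imaginary part
    Z = np.fft.fft(z)            # one complex FFT of half the size
    k = np.arange(M + 1)
    Zk = Z[k % M]                # periodic extension, Z[M] -> Z[0]
    Zc = np.conj(Z[(-k) % M])    # conj(Z[-k])
    Xe = 0.5 * (Zk + Zc)         # spectrum of even-index samples
    Xo = -0.5j * (Zk - Zc)       # spectrum of odd-index samples
    return Xe + np.exp(-2j * np.pi * k / N) * Xo   # final unfold pass

x = np.random.randn(1024)
print(np.allclose(rfft_via_half_complex(x), np.fft.rfft(x)))  # True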

--

Rick C
 
On Sun, 12 Feb 2017 20:32:59 -0500, rickman wrote:

On 2/12/2017 1:05 PM, Tim Wescott wrote:
So, there are algorithms out there to perform an FFT on real data, that
save (I think) roughly 2x the calculations of FFTs for complex data.

I did a quick search, but didn't find any that are made specifically
for FPGAs. Was my search too quick, or are there no IP sources to do
this?

It would seem like a slam-dunk for Xilinx and Intel/Altera to include
these algorithms in their FFT libraries.

I thought I replied to this, but my computer has been crashing a bit so
maybe I lost that one.

An FFT is inherently a complex function. Real data can be processed by
taking advantage of some of the symmetries in the FFT. I don't recall
the details as it has been over a decade since I worked with this, but
you can fold half of the real data into the imaginary portion, run a
size N/2 FFT and then unfold the results. I believe you have to run a
final N sized complex pass on these results to get your final answer. I
do recall it saved a *lot* when performing FFTs, nearly 50%.

My understanding is that there were some software packages that baked
that into the algorithm, for what savings I don't know. I was wondering
if it was done for FFTs as well.

--

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

I'm looking for work -- see my website!
 
On 2/13/2017 12:03 AM, Tim Wescott wrote:
On Sun, 12 Feb 2017 20:32:59 -0500, rickman wrote:

On 2/12/2017 1:05 PM, Tim Wescott wrote:
So, there are algorithms out there to perform an FFT on real data, that
save (I think) roughly 2x the calculations of FFTs for complex data.

I did a quick search, but didn't find any that are made specifically
for FPGAs. Was my search too quick, or are there no IP sources to do
this?

It would seem like a slam-dunk for Xilinx and Intel/Altera to include
these algorithms in their FFT libraries.

I thought I replied to this, but my computer has been crashing a bit so
maybe I lost that one.

An FFT is inherently a complex function. Real data can be processed by
taking advantage of some of the symmetries in the FFT. I don't recall
the details as it has been over a decade since I worked with this, but
you can fold half of the real data into the imaginary portion, run a
size N/2 FFT and then unfold the results. I believe you have to run a
final N sized complex pass on these results to get your final answer. I
do recall it saved a *lot* when performing FFTs, nearly 50%.

My understanding is that there were some software packages that baked
that into the algorithm, for what savings I don't know. I was wondering
if it was done for FFTs as well.

I'm not sure what you mean. When you say "baking" it into the
algorithm, it would do pretty much what I described. That *is* the
algorithm. I haven't heard of any other optimizations. The savings is
in time/size. Essentially it does an N/2 size FFT with an extra pass,
so instead of N*log(N) steps it takes (N+1)*log(N/2) steps.

--

Rick C
 
On 2/13/2017 1:48 AM, rickman wrote:
On 2/13/2017 12:03 AM, Tim Wescott wrote:
On Sun, 12 Feb 2017 20:32:59 -0500, rickman wrote:

On 2/12/2017 1:05 PM, Tim Wescott wrote:
So, there are algorithms out there to perform an FFT on real data, that
save (I think) roughly 2x the calculations of FFTs for complex data.

I did a quick search, but didn't find any that are made specifically
for FPGAs. Was my search too quick, or are there no IP sources to do
this?

It would seem like a slam-dunk for Xilinx and Intel/Altera to include
these algorithms in their FFT libraries.

I thought I replied to this, but my computer has been crashing a bit so
maybe I lost that one.

An FFT is inherently a complex function. Real data can be processed by
taking advantage of some of the symmetries in the FFT. I don't recall
the details as it has been over a decade since I worked with this, but
you can fold half of the real data into the imaginary portion, run a
size N/2 FFT and then unfold the results. I believe you have to run a
final N sized complex pass on these results to get your final answer. I
do recall it saved a *lot* when performing FFTs, nearly 50%.

My understanding is that there were some software packages that baked
that into the algorithm, for what savings I don't know. I was wondering
if it was done for FFTs as well.

I'm not sure what you mean. When you say "baking" it into the
algorithm, it would do pretty much what I described. That *is* the
algorithm. I haven't heard of any other optimizations. The savings is
in time/size. Essentially it does an N/2 size FFT with an extra pass,
so instead of N*log(N) steps it takes (N+1)*log(N/2) steps.

Oops, I wrote that wrong. The optimized result would be order N/2 *
(log(N)+1). Just to be clear (or less clear depending on how you read
this), there are actually N/2 butterflies in each pass of the FFT. I
didn't show that since the constant 1/2 applies to both the standard FFT
and the optimized FFT. The point is that the number of butterflies is
cut in half on each pass of the FFT for the optimized approach.
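To put numbers on that, a back-of-the-envelope count under the usual
radix-2 convention (N/2 butterflies per pass, log2(N) passes, and the
unfold counted as roughly N/2 extra operations) lands near the "nearly
50%" figure:

from math import log2

def butterflies_full(N):
    # Plain radix-2 complex FFT: N/2 butterflies/pass, log2(N) passes.
    return (N // 2) * int(log2(N))

def butterflies_packed(N):
    # Half-size complex FFT on packed real data, plus ~N/2 unfold ops.
    M = N // 2
    return (M // 2) * int(log2(M)) + M

for N in (256, 1024, 4096):
    f, p = butterflies_full(N), butterflies_packed(N)
    print(N, f, p, f"{1 - p/f:.0%} saved")   # about 44-46% saved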

--

Rick C
 
On 2/13/2017 12:24 AM, Steve Pope wrote:
On 2/12/2017 1:05 PM, Tim Wescott wrote:

So, there are algorithms out there to perform an FFT on real data, that
save (I think) roughly 2x the calculations of FFTs for complex data.

I did a quick search, but didn't find any that are made specifically for
FPGAs. Was my search too quick, or are there no IP sources to do this?

It would seem like a slam-dunk for Xilinx and Intel/Altera to include
these algorithms in their FFT libraries.

Protip: the synthesizer should trim out the unneeded logic, so
you don't need an optimized library macro.

I don't think the synthesizer is capable of getting the same savings.
The optimizer would see the zero imaginary inputs and optimize only the
first pass of the FFT. Every pass would still be N/2 butterflies, while
the optimized approach would use half that many at the expense of an
extra pass. This is a big savings that the synthesis tools aren't likely
to figure out unless they recognize you are performing an FFT.

Someone refresh my memory. If you do an FFT with zeros in the imaginary
part of the inputs, the output has a symmetry that can be used to
process two real streams at once. I can't recall how it works exactly,
but that symmetry is the basis for separating the results of the two
halves of the original sequence before completing the last pass. One
portion is pulled out because of the even symmetry and the other portion
is pulled out because of the odd symmetry.

I found this page that appears to explain it, but I haven't taken the
time to dig into the math. I think I'd have to start all over again,
it's just been too long.
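The symmetry in question is Hermitian symmetry: for real input,
X[N-k] = conj(X[k]). Pack stream a into the real part and stream b into
the imaginary part, and the even/odd split pulls both spectra back out.
A quick numpy sketch (illustrative only):

import numpy as np

N = 16
a, b = np.random.randn(N), np.random.randn(N)   # two real streams
Z = np.fft.fft(a + 1j * b)                      # one complex FFT
Zc = np.conj(Z[(-np.arange(N)) % N])            # conj(Z[-k])
A = 0.5 * (Z + Zc)           # even-symmetric part -> spectrum of a
B = -0.5j * (Z - Zc)         # odd-symmetric part  -> spectrum of b
print(np.allclose(A, np.fft.fft(a)), np.allclose(B, np.fft.fft(b)))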

--

Rick C
 
On Sunday, February 12, 2017 at 11:05:25 AM UTC-7, Tim Wescott wrote:
So, there are algorithms out there to perform an FFT on real data, that
save (I think) roughly 2x the calculations of FFTs for complex data.

I could be mistaken, but doesn't the DCT, which is used for video compression, operate only on real data? It seems like you could find a DCT core designed for JPEG.
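The DCT is indeed real-in/real-out, and it can be run on top of a plain
FFT. A hedged numpy sketch of the DCT-II via one same-length complex
FFT (Makhoul's reordering; the function names are made up for
illustration, and it's checked against the direct O(N^2) definition):

import numpy as np

def dct2_via_fft(x):
    # Unnormalized DCT-II (the usual 2*sum(...) convention) from one
    # complex FFT: even-index samples, then odd-index samples reversed.
    N = len(x)
    v = np.concatenate([x[0::2], x[1::2][::-1]])
    V = np.fft.fft(v)
    k = np.arange(N)
    return 2.0 * np.real(np.exp(-1j * np.pi * k / (2 * N)) * V)

def dct2_direct(x):
    # Direct definition, for checking only.
    N = len(x)
    n = np.arange(N)
    k = n[:, None]
    return 2.0 * (x * np.cos(np.pi * (2*n + 1) * k / (2*N))).sum(axis=1)

x = np.random.randn(16)                              # real input
print(np.allclose(dct2_via_fft(x), dct2_direct(x)))  # True: real in, real out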
 
On Mon, 13 Feb 2017 01:48:49 -0500, rickman wrote:

On 2/13/2017 12:03 AM, Tim Wescott wrote:
On Sun, 12 Feb 2017 20:32:59 -0500, rickman wrote:

On 2/12/2017 1:05 PM, Tim Wescott wrote:
So, there are algorithms out there to perform an FFT on real data,
that save (I think) roughly 2x the calculations of FFTs for complex
data.

I did a quick search, but didn't find any that are made specifically
for FPGAs. Was my search too quick, or are there no IP sources to do
this?

It would seem like a slam-dunk for Xilinx and Intel/Altera to include
these algorithms in their FFT libraries.

I thought I replied to this, but my computer has been crashing a bit
so maybe I lost that one.

An FFT is inherently a complex function. Real data can be processed
by taking advantage of some of the symmetries in the FFT. I don't
recall the details as it has been over a decade since I worked with
this, but you can fold half of the real data into the imaginary
portion, run a size N/2 FFT and then unfold the results. I believe
you have to run a final N sized complex pass on these results to get
your final answer. I do recall it saved a *lot* when performing FFTs,
nearly 50%.

My understanding is that there were some software packages that baked
that into the algorithm, for what savings I don't know. I was
wondering if it was done for FFTs as well.

I'm not sure what you mean. When you say "baking" it into the
algorithm, it would do pretty much what I described. That *is* the
algorithm. I haven't heard of any other optimizations. The savings is
in time/size. Essentially it does an N/2 size FFT with an extra pass,
so instead of N*log(N) steps it takes (N+1)*log(N/2) steps.

There's a distinct symmetry to the Fourier transform that you could use
at each step of the way instead of doing the whole thing and fixing it up
at the end.

I don't know if it would save steps, but it would certainly be easier on
someone who just wants to apply an algorithm.

--

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

I'm looking for work -- see my website!
 
On Sun, 12 Feb 2017 12:05:17 -0600, Tim Wescott
<seemywebsite@myfooter.really> wrote:

So, there are algorithms out there to perform an FFT on real data, that
save (I think) roughly 2x the calculations of FFTs for complex data.

I did a quick search, but didn't find any that are made specifically for
FPGAs. Was my search too quick, or are there no IP sources to do this?

It would seem like a slam-dunk for Xilinx and Intel/Altera to include
these algorithms in their FFT libraries.

As has been mentioned, there are a number of algorithms that exploit
symmetries to prune the computations down near minimal, but I don't
know of any canned FPGA libraries that do it. I don't know of any
canned FPGA libraries that include an FHT (Fast Hartley Transform)
either, which would also serve essentially the same purpose.

As Steve suggested, you could just force the imaginary part to static
zeroes and let the synthesizer optimizations clean it up, but it
probably wouldn't be quite as good as using a well-designed RFFT if
one was available.

All that said, many of the existing FPGA FFT libraries are pretty
efficient, so it may be worth just brute-forcing it and seeing whether
it is good enough.



 
On 2/13/2017 2:34 AM, rickman wrote:
On 2/13/2017 12:24 AM, Steve Pope wrote:
On 2/12/2017 1:05 PM, Tim Wescott wrote:

So, there are algorithms out there to perform an FFT on real data, that
save (I think) roughly 2x the calculations of FFTs for complex data.

I did a quick search, but didn't find any that are made specifically for
FPGAs. Was my search too quick, or are there no IP sources to do this?

It would seem like a slam-dunk for Xilinx and Intel/Altera to include
these algorithms in their FFT libraries.

Protip: the synthesizer should trim out the unneeded logic, so
you don't need an optimized library macro.

I don't think the synthesizer is capable of getting the same savings.
The optimizer would see the zero imaginary inputs and optimize only the
first pass of the FFT. Every pass would still be N/2 butterflies, while
the optimized approach would use half that many at the expense of an
extra pass. This is a big savings that the synthesis tools aren't likely
to figure out unless they recognize you are performing an FFT.

Someone refresh my memory. If you do an FFT with zeros in the imaginary
part of the inputs, the output has a symmetry that can be used to
process two real streams at once. I can't recall how it works exactly,
but that symmetry is the basis for separating the results of the two
halves of the original sequence before completing the last pass. One
portion is pulled out because of the even symmetry and the other portion
is pulled out because of the odd symmetry.

I found this page that appears to explain it, but I haven't taken the
time to dig into the math. I think I'd have to start all over again,
it's just been too long.

Oops, forgot the link.

http://www.engineeringproductivitytools.com/stuff/T0001/PT10.HTM

--

Rick C
 
On 2/13/2017 12:56 PM, Tim Wescott wrote:
On Mon, 13 Feb 2017 01:48:49 -0500, rickman wrote:

On 2/13/2017 12:03 AM, Tim Wescott wrote:
On Sun, 12 Feb 2017 20:32:59 -0500, rickman wrote:

On 2/12/2017 1:05 PM, Tim Wescott wrote:
So, there are algorithms out there to perform an FFT on real data,
that save (I think) roughly 2x the calculations of FFTs for complex
data.

I did a quick search, but didn't find any that are made specifically
for FPGAs. Was my search too quick, or are there no IP sources to do
this?

It would seem like a slam-dunk for Xilinx and Intel/Altera to include
these algorithms in their FFT libraries.

I thought I replied to this, but my computer has been crashing a bit
so maybe I lost that one.

An FFT is inherently a complex function. Real data can be processed
by taking advantage of some of the symmetries in the FFT. I don't
recall the details as it has been over a decade since I worked with
this, but you can fold half of the real data into the imaginary
portion, run a size N/2 FFT and then unfold the results. I believe
you have to run a final N sized complex pass on these results to get
your final answer. I do recall it saved a *lot* when performing FFTs,
nearly 50%.

My understanding is that there were some software packages that baked
that into the algorithm, for what savings I don't know. I was
wondering if it was done for FFTs as well.

I'm not sure what you mean. When you say "baking" it into the
algorithm, it would do pretty much what I described. That *is* the
algorithm. I haven't heard of any other optimizations. The savings is
in time/size. Essentially it does an N/2 size FFT with an extra pass,
so instead of N*log(N) steps it takes (N+1)*log(N/2) steps.

There's a distinct symmetry to the Fourier transform that you could use
at each step of the way instead of doing the whole thing and fixing it up
at the end.

I don't know if it would save steps, but it would certainly be easier on
someone who just wants to apply an algorithm.

Are you referring to the nature of the FFT on real signals (i.e. data
sets that have zeros for the imaginary data)? That would only help with
the first pass of butterflies. Once you perform those, the imaginary
part is not zero anymore.

That's why you get very little optimization by plugging in the zero data
and letting the tools work it out.

On the other hand, the vendor's FFT logic generator should support real
data inputs. It is a common enough need.

--

Rick C
 
On 2/14/2017 1:56 AM, Christian Gollwitzer wrote:
Am 14.02.17 um 03:39 schrieb Tim Wescott:
The Xilinx logic generators also seem to only support 2^n vector sizes.
I don't know how convoluted things get, but I know that with a general-
purpose processor, a prime-value-sized FFT is nearly as fast as a 2^n
one. Don't ask me how -- it's just a box with magic inside for me at
this point.

:) It is very "convoluted" :))))) Sorry for the pun. Prime-sized FFTs
(of long length*) are computed using the Bluestein algorithm. Bluestein
rewrites the FFT of any size as a convolution. This convolution is then
computed using a padded FFT of longer size...

A typical FFT library first tries to factor the length into prime
numbers, and contains optimized FFTs for small prime factors. If that
fails, it uses the Bluestein algorithm.

I'm totally ignorant of FPGAs, but it's beyond my imagination
how to express such a "convoluted" thing as gates.

People think sequential code is easier because they started with that
and it is now second nature to them. I remember some of the
misconceptions students would have because they don't get that the
computer is a sequential beast. HDLs are written to support parallel
processes as second nature because that is the way hardware works. You
describe each piece and it runs 100% of the time. Things only happen
when the inputs change, but the hardware is sitting there waiting for
that to happen.

In reality it is *easier* to express most things in HDL, I think. But if
you only think in sequential logic, then just write your HDL as a
process. The code inside a process is purely sequential.

--

Rick C
 
On 2/14/2017 1:52 AM, Steve Pope wrote:
Tim Wescott <seemywebsite@myfooter.really> wrote:

On Mon, 13 Feb 2017 22:36:41 -0500, rickman wrote:

On 2/13/2017 9:39 PM, Tim Wescott wrote:

Are you referring to the nature of the FFT on real signals (i.e. data
sets that have zeros for the imaginary data)? That would only help
with the first pass of butterflies. Once you perform those, the
imaginary part is not zero anymore.

Yes, but the imaginary and real parts are still symmetrical around
f = 0, so you should neither have to compute nor use them.

That's why you get very little optimization by plugging in the zero
data and letting the tools work it out.

Yes, unless the optimizers get _really_ smart.

We are talking two different things. The HDL synthesis optimizers are
optimizing *logic*. They don't know diddly about what you are using the
logic for.

Well, yes, but these computers are getting smarter all the time. When
you tell Synopsys to synthesize and it says "your fly is unzipped",
you'll know that the optimizers are too damned smart.

If for example you compute (a * b) and also compute (a * -b),
the synthesizer is smart enough to know there are not two full
multipliers needed.

It's not clear to me that it would not optimize past the first butterfly.
They can also optimize across pipeline stages and also modify
the number of bits of state and their logic values within a pipeline
register.

What other optimizations do you see?


But, to know for sure, try running it and see what happens.
(And you want to use Synopsys or the equivalent, and probably not
Xilinx/Altera native synthesizers...)

If faced with this design I would certainly give it a shot and
see if the synthesis is good enough, rather than assuming it isn't
and embarking on a major bit of perhaps unneeded design work.

Ok, who will bell the cat? I'm not in the mood for designing and then
analyzing FFTs at the moment.

--

Rick C
 
Am 14.02.17 um 03:39 schrieb Tim Wescott:
The Xilinx logic generators also seem to only support 2^n vector sizes.
I don't know how convoluted things get, but I know that with a general-
purpose processor, a prime-value-sized FFT is nearly as fast as a 2^n
one. Don't ask me how -- it's just a box with magic inside for me at
this point.

:) It is very "convoluted" :))))) Sorry for the pun. Prime-sized FFTs
(of long length*) are computed using the Bluestein algorithm. Bluestein
rewrites the FFT of any size as a convolution. This convolution is then
computed using a padded FFT of longer size...

A typical FFT library first tries to factor the length into prime
numbers, and contains optimized FFTs for small prime factors. If that
fails, it uses the Bluestein algorithm.
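For the curious, a compact numpy sketch of Bluestein's idea: writing
nk = (n^2 + k^2 - (k-n)^2)/2 turns the DFT of any length into a
chirp-modulated convolution, which a zero-padded power-of-two FFT can
evaluate (illustrative, not production code):

import numpy as np

def bluestein_fft(x):
    # DFT of arbitrary length N as a circular convolution.
    x = np.asarray(x, dtype=complex)
    N = x.size
    n = np.arange(N)
    w = np.exp(-1j * np.pi * n * n / N)          # the chirp
    L = 1 << int(np.ceil(np.log2(2 * N - 1)))    # padded 2^m length
    a = np.zeros(L, dtype=complex)
    a[:N] = x * w
    b = np.zeros(L, dtype=complex)
    b[:N] = np.conj(w)
    b[L - N + 1:] = np.conj(w[1:])[::-1]   # wrap chirp for circular conv
    conv = np.fft.ifft(np.fft.fft(a) * np.fft.fft(b))
    return w * conv[:N]

x = np.random.randn(13) + 1j * np.random.randn(13)   # prime length
print(np.allclose(bluestein_fft(x), np.fft.fft(x)))  # True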

I'm totally ignorant of FPGAs, but it's beyond my imagination
how to express such a "convoluted" thing as gates.

Christian


* there is also the Winograd transform which works for a handful of
selected prime numbers, e.g. 11 and 13, but not in the general case
 
On Mon, 13 Feb 2017 16:08:00 -0500, rickman wrote:

On 2/13/2017 12:56 PM, Tim Wescott wrote:
On Mon, 13 Feb 2017 01:48:49 -0500, rickman wrote:

On 2/13/2017 12:03 AM, Tim Wescott wrote:
On Sun, 12 Feb 2017 20:32:59 -0500, rickman wrote:

On 2/12/2017 1:05 PM, Tim Wescott wrote:
So, there are algorithms out there to perform an FFT on real data,
that save (I think) roughly 2x the calculations of FFTs for complex
data.

I did a quick search, but didn't find any that are made
specifically for FPGAs. Was my search too quick, or are there no
IP sources to do this?

It would seem like a slam-dunk for Xilinx and Intel/Altera to
include these algorithms in their FFT libraries.

I thought I replied to this, but my computer has been crashing a bit
so maybe I lost that one.

An FFT is inherently a complex function. Real data can be processed
by taking advantage of some of the symmetries in the FFT. I don't
recall the details as it has been over a decade since I worked with
this, but you can fold half of the real data into the imaginary
portion, run a size N/2 FFT and then unfold the results. I believe
you have to run a final N sized complex pass on these results to get
your final answer. I do recall it saved a *lot* when performing FFTs,
nearly 50%.

My understanding is that there were some software packages that baked
that into the algorithm, for what savings I don't know. I was
wondering if it was done for FFTs as well.

I'm not sure what you mean. When you say "baking" it into the
algorithm, it would do pretty much what I described. That *is* the
algorithm. I haven't heard of any other optimizations. The savings
is in time/size. Essentially it does an N/2 size FFT with an extra
pass, so instead of N*log(N) steps it takes (N+1)*log(N/2) steps.

There's a distinct symmetry to the Fourier transform that you could use
at each step of the way instead of doing the whole thing and fixing it
up at the end.

I don't know if it would save steps, but it would certainly be easier
on someone who just wants to apply an algorithm.

Are you referring to the nature of the FFT on real signals (i.e. data
sets that have zeros for the imaginary data)? That would only help with
the first pass of butterflies. Once you perform those, the imaginary
part is not zero anymore.

Yes, but the imaginary and real parts are still symmetrical around f = 0,
so you should neither have to compute nor use them.
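That symmetry is quick to demonstrate: for real input,
X[N-k] = conj(X[k]), so only N/2 + 1 bins carry information. A short
numpy check:

import numpy as np

x = np.random.randn(32)                        # purely real input
X = np.fft.fft(x)
k = np.arange(1, 16)
print(np.allclose(X[32 - k], np.conj(X[k])))   # True: Hermitian symmetry
print(np.fft.rfft(x).shape)                    # (17,): only N//2 + 1 bins kept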

That's why you get very little optimization by plugging in the zero data
and letting the tools work it out.

Yes, unless the optimizers get _really_ smart.

On the other hand, the vendor's FFT logic generator should support real
data inputs. It is a common enough need.

I agree, but when I looked I didn't see it.

The Xilinx logic generators also seem to only support 2^n vector sizes.
I don't know how convoluted things get, but I know that with a general-
purpose processor, a prime-value-sized FFT is nearly as fast as a 2^n
one. Don't ask me how -- it's just a box with magic inside for me at
this point.

--
Tim Wescott
Control systems, embedded software and circuit design
I'm looking for work! See my website if you're interested
http://www.wescottdesign.com
 
On Mon, 13 Feb 2017 22:36:41 -0500, rickman wrote:

On 2/13/2017 9:39 PM, Tim Wescott wrote:
On Mon, 13 Feb 2017 16:08:00 -0500, rickman wrote:

On 2/13/2017 12:56 PM, Tim Wescott wrote:
On Mon, 13 Feb 2017 01:48:49 -0500, rickman wrote:

On 2/13/2017 12:03 AM, Tim Wescott wrote:
On Sun, 12 Feb 2017 20:32:59 -0500, rickman wrote:

On 2/12/2017 1:05 PM, Tim Wescott wrote:
So, there are algorithms out there to perform an FFT on real
data, that save (I think) roughly 2x the calculations of FFTs for
complex data.

I did a quick search, but didn't find any that are made
specifically for FPGAs. Was my search too quick, or are there no
IP sources to do this?

It would seem like a slam-dunk for Xilinx and Intel/Altera to
include these algorithms in their FFT libraries.

I thought I replied to this, but my computer has been crashing a
bit so maybe I lost that one.

An FFT is inherently a complex function. Real data can be
processed by taking advantage of some of the symmetries in the
FFT. I don't recall the details as it has been over a decade
since I worked with this, but you can fold half of the real data
into the imaginary portion, run a size N/2 FFT and then unfold the
results. I believe you have to run a final N sized complex pass
on these results to get your final answer. I do recall it saved a
*lot* when performing FFTs, nearly 50%.

My understanding is that there were some software packages that
baked that into the algorithm, for what savings I don't know. I
was wondering if it was done for FFTs as well.

I'm not sure what you mean. When you say "baking" it into the
algorithm, it would do pretty much what I described. That *is* the
algorithm. I haven't heard of any other optimizations. The savings
is in time/size. Essentially it does an N/2 size FFT with an extra
pass, so instead of N*log(N) steps it takes (N+1)*log(N/2) steps.

There's a distinct symmetry to the Fourier transform that you could
use at each step of the way instead of doing the whole thing and
fixing it up at the end.

I don't know if it would save steps, but it would certainly be easier
on someone who just wants to apply an algorithm.

Are you referring to the nature of the FFT on real signals (i.e. data
sets that have zeros for the imaginary data)? That would only help
with the first pass of butterflies. Once you perform those, the
imaginary part is not zero anymore.

Yes, but the imaginary and real parts are still symmetrical around
f = 0, so you should neither have to compute nor use them.

That's why you get very little optimization by plugging in the zero
data and letting the tools work it out.

Yes, unless the optimizers get _really_ smart.

We are talking two different things. The HDL synthesis optimizers are
optimizing *logic*. They don't know diddly about what you are using the
logic for.

Well, yes, but these computers are getting smarter all the time. When
you tell Synopsys to synthesize and it says "your fly is unzipped",
you'll know that the optimizers are too damned smart.

--

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

I'm looking for work -- see my website!
 
On 2/13/2017 9:39 PM, Tim Wescott wrote:
On Mon, 13 Feb 2017 16:08:00 -0500, rickman wrote:

On 2/13/2017 12:56 PM, Tim Wescott wrote:
On Mon, 13 Feb 2017 01:48:49 -0500, rickman wrote:

On 2/13/2017 12:03 AM, Tim Wescott wrote:
On Sun, 12 Feb 2017 20:32:59 -0500, rickman wrote:

On 2/12/2017 1:05 PM, Tim Wescott wrote:
So, there are algorithms out there to perform an FFT on real data,
that save (I think) roughly 2x the calculations of FFTs for complex
data.

I did a quick search, but didn't find any that are made
specifically for FPGAs. Was my search too quick, or are there no
IP sources to do this?

It would seem like a slam-dunk for Xilinx and Intel/Altera to
include these algorithms in their FFT libraries.

I thought I replied to this, but my computer has been crashing a bit
so maybe I lost that one.

An FFT is inherently a complex function. Real data can be processed
by taking advantage of some of the symmetries in the FFT. I don't
recall the details as it has been over a decade since I worked with
this, but you can fold half of the real data into the imaginary
portion, run a size N/2 FFT and then unfold the results. I believe
you have to run a final N sized complex pass on these results to get
your final answer. I do recall it saved a *lot* when performing FFTs,
nearly 50%.

My understanding is that there were some software packages that baked
that into the algorithm, for what savings I don't know. I was
wondering if it was done for FFTs as well.

I'm not sure what you mean. When you say "baking" it into the
algorithm, it would do pretty much what I described. That *is* the
algorithm. I haven't heard of any other optimizations. The savings
is in time/size. Essentially it does an N/2 size FFT with an extra
pass, so instead of N*log(N) steps it takes (N+1)*log(N/2) steps.

There's a distinct symmetry to the Fourier transform that you could use
at each step of the way instead of doing the whole thing and fixing it
up at the end.

I don't know if it would save steps, but it would certainly be easier
on someone who just wants to apply an algorithm.

Are you referring to the nature of the FFT on real signals (i.e. data
sets that have zeros for the imaginary data)? That would only help with
the first pass of butterflies. Once you perform those, the imaginary
part is not zero anymore.

Yes, but the imaginary and real parts are still symmetrical around f = 0,
so you should neither have to compute nor use them.

That's why you get very little optimization by plugging in the zero data
and letting the tools work it out.

Yes, unless the optimizers get _really_ smart.

We are talking two different things. The HDL synthesis optimizers are
optimizing *logic*. They don't know diddly about what you are using the
logic for.


On the other hand, the vendor's FFT logic generator should support real
data inputs. It is a common enough need.

I agree, but when I looked I didn't see it.

The Xilinx logic generators also seem to only support 2^n vector sizes.
I don't know how convoluted things get, but I know that with a general-
purpose processor, a prime-value-sized FFT is nearly as fast as a 2^n
one. Don't ask me how -- it's just a box with magic inside for me at
this point.

When I first learned FFTs, I did it by looking at the equations in terms
of the twiddle factors each input was multiplied by as it contributed to
the final value. It works out to be the same calculation as the DFT.
It is taking advantage of the cyclic nature of the twiddle factors to
avoid redundant calculations. The same basis provides the various
symmetries.
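That equivalence is easy to sanity-check: the full twiddle-factor
matrix applied directly (the O(N^2) DFT) produces exactly the numbers
the FFT does:

import numpy as np

N = 8
n = np.arange(N)
W = np.exp(-2j * np.pi * np.outer(n, n) / N)   # twiddle-factor matrix
x = np.random.randn(N)
print(np.allclose(W @ x, np.fft.fft(x)))       # True: same calculation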

I can't say how prime-sized data sets work out. I'm very surprised they
even exist. I can't picture how the butterflies would work.

--

Rick C
 
Tim Wescott <seemywebsite@myfooter.really> wrote:

On Mon, 13 Feb 2017 22:36:41 -0500, rickman wrote:

On 2/13/2017 9:39 PM, Tim Wescott wrote:

Are you referring to the nature of the FFT on real signals (i.e. data
sets that have zeros for the imaginary data)? That would only help
with the first pass of butterflies. Once you perform those, the
imaginary part is not zero anymore.

Yes, but the imaginary and real parts are still symmetrical around
f = 0, so you should neither have to compute nor use them.

That's why you get very little optimization by plugging in the zero
data and letting the tools work it out.

Yes, unless the optimizers get _really_ smart.

We are talking two different things. The HDL synthesis optimizers are
optimizing *logic*. They don't know diddly about what you are using the
logic for.

Well, yes, but these computers are getting smarter all the time. When
you tell Synopsys to synthesize and it says "your fly is unzipped",
you'll know that the optimizers are too damned smart.

If for example you compute (a * b) and also compute (a * -b),
the synthesizer is smart enough to know there are not two full
multipliers needed.

It's not clear to me that it would not optimize past the first butterfly.
They can also optimize across pipeline stages and also modify
the number of bits of state and their logic values within a pipeline
register.

But, to know for sure, try running it and see what happens.
(And you want to use Synopsys or the equivalent, and probably not
Xilinx/Altera native synthesizers...)

If faced with this design I would certainly give it a shot and
see if the synthesis is good enough, rather than assuming it isn't
and embarking on a major bit of perhaps unneeded design work.



Steve
 
