Cyclone V decimation

Piotr Wyderski

Hi,

the input signal is 14 bits signed@750ksps. I would like to decimate it
by a modest factor of ~3000. What would be the best way of doing it on a
Cyclone V, resource-wise? My usual approach would be a cascade of CIC
decimators followed by a FIR corrector, but since there are the DSP
blocks, I don't feel it to be the "right" (albeit correct) approach. I'm
new to the V family and lack the proper intuitions, so could someone
more versed suggest a good direction?

In fact, there will be 12 such channels, all going in sync,
so maybe considerable resource sharing can be achieved?

Best regards, Piotr
 
On Saturday, February 23, 2019 at 2:32:04 AM UTC-5, Piotr Wyderski wrote:
> Hi,
>
> the input signal is 14 bits signed@750ksps. I would like to decimate it
> by a modest factor of ~3000. What would be the best way of doing it on a
> Cyclone V, resource-wise? My usual approach would be a cascade of CIC
> decimators followed by a FIR corrector, but since there are the DSP
> blocks, I don't feel it to be the "right" (albeit correct) approach. I'm
> new to the V family and lack the proper intuitions, so could someone
> more versed suggest a good direction?
>
> In fact, there will be 12 such channels, all going in sync,
> so maybe considerable resource sharing can be achieved?
>
> Best regards, Piotr

To determine the "right" approach, you need to define "right" in some engineering terms. So what aspects of the design and implementation are important to your goals?

Rick C.
 
gnuarm.deletethisbit@gmail.com wrote:

> To determine the "right" approach, you need to define "right" in some engineering terms. So what aspects of the design and implementation are important to your goals?

Minimisation of resource usage, or in other words, a decimation
technique that maps best onto the underlying primitives. I believe
those 200+ DSP (multiply-accumulate) blocks are good for something...

Best regards, Piotr
 
On Saturday, February 23, 2019 at 6:17:28 PM UTC+2, Piotr Wyderski wrote:
> gnuarm.deletethisbit@gmail.com wrote:
>
> > To determine the "right" approach, you need to define "right" in some engineering terms. So what aspects of the design and implementation are important to your goals?
>
> Minimisation of resource usage, or in other words, a decimation
> technique that maps best onto the underlying primitives. I believe
> those 200+ DSP (multiply-accumulate) blocks are good for something...
>
> Best regards, Piotr

If all you want is minimization of resource usage, then just use a CIC.

Something else makes sense only if you want a very flat passband and a very sharp transition between passband and stopband.

The problem with using a generic FIR for decimation is not computation, which for your requirements would be minimal, but storage, both for the coefficients and for the delay line. Decimation by 3000 would need something like 15K coefficients for a good filter shape, or twice as many for a very good shape. Coefficient storage could be cut in half thanks to the filter's symmetry, but I am not aware of a similar trick for the delay line. So overall you would need just 1 DSP block, but 40 to 80 M10K blocks.
Of course, you can always trade simplicity for storage by building your decimation chain as a cascade, probably sizing each stage so that its delay line fits in 1 M10K block. Then the whole chain would take 3 stages and only 6 M10K blocks, and the filter shape could still be excellent. Or maybe even 2 M10K blocks, if you are ready to complicate the control machine a little more by placing all the delay lines in one common M10K and doing the same for the coefficients. But is it worth the increased complexity? I am not sure.
And then there is a variant in the middle: a cascade of 2 stages instead of 3. Then each delay line and each set of FIR taps will fit in an M10K, but the two delay lines together would not fit in one. So, with a bit of control acrobatics, you could fit the whole cascade in 3 M10K blocks. Still, do it only if you care about the shape of the filter; don't do it for resources alone.
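
For anyone who wants to redo the storage arithmetic for the single-stage case, here is a rough sketch in Python. The 14-bit sample width comes from the original post; the 18-bit coefficient width and the 10,240 bits per M10K are assumptions of this sketch, and real M10K aspect ratios would pack somewhat less efficiently.

```python
import math

M10K_BITS = 10 * 1024  # assumed usable bits per M10K block


def fir_memory_blocks(taps, sample_bits=14, coeff_bits=18):
    """Rough M10K count for a single-stage symmetric decimating FIR."""
    delay_line_bits = taps * sample_bits                  # one stored sample per tap
    coeff_store_bits = math.ceil(taps / 2) * coeff_bits   # symmetry halves the table
    delay_blocks = math.ceil(delay_line_bits / M10K_BITS)
    coeff_blocks = math.ceil(coeff_store_bits / M10K_BITS)
    return delay_blocks, coeff_blocks


for taps in (15_000, 30_000):
    d, c = fir_memory_blocks(taps)
    print(f"{taps} taps: {d} delay-line + {c} coefficient blocks = {d + c}")
```

Under these assumptions the totals come out at 35 and 69 blocks, which is the same ballpark as the 40 to 80 mentioned above.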
 
already5chosen@yahoo.com wrote:

> If all you want is minimization of resource usage then just do CIC.
> Something else makes sense only if you want very flat pass band and
> very sharp transition between pass band and stop band.

There is very little to no energy in the upper part of the band. The
high ADC speed is there for other reasons. Therefore, CIC will be more
than enough, at least in the first stages of the cascade. I don't know
yet if it would be sufficient for the final stage, but this is a detail
that can be tweaked in a later phase.

So I have a licensing type of a question: can I instantiate DSP blocks
in Quartus Lite? I know the DSP builder is an extra paid tool, but I
don't need it -- a purely Verilog instantiation would be sufficient.
This block appears to have a decent accumulator, so it could relieve the
ALMs otherwise needed by the register-hungry CIC.

Thank you!

Best regards, Piotr
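
On the "register-hungry" point: the usual sizing rule for a CIC (Hogenauer's bit-growth formula) gives the worst-case register width, before any pruning, as the input width plus N*log2(R*M) bits. A minimal sketch follows; the stage counts and decimation factors in the loop are illustrative assumptions, not figures settled in this thread.

```python
import math


def cic_register_bits(input_bits, stages, decimation, diff_delay=1):
    """Worst-case CIC register width: B_in + ceil(N * log2(R * M))."""
    growth = math.ceil(stages * math.log2(decimation * diff_delay))
    return input_bits + growth


# 14-bit input as in the original post; N and R below are example values only.
for stages, decimation in ((4, 4), (4, 3000)):
    bits = cic_register_bits(14, stages, decimation)
    print(f"N={stages}, R={decimation}: {bits}-bit registers")
```

The point of the comparison is that a single CIC doing the whole factor needs far wider registers than the first stage of a cascade, which is one argument for splitting the decimation.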
 
First of all, since your sample rates are pretty low, I'd see if it's possible to use a DSP chip instead of an FPGA. Everything is easier in software.

Everything depends on your specs, which you have not stated. Namely: what is the attenuation of the stopband, and what is the slope between the passband and the stopband? You say there is not much in the upper frequencies, so this makes it sound like your filtering requirements are very low. If there is nothing much at all up there, you don't even need to filter. Just decimate. Take every nth sample.

The point of the CIC is to reduce the need for multipliers, but you have plenty of multipliers and low sample rates. The CIC has big sidelobes. It might be better to do a cascade of FIRs each with low numbers of taps.
 
On Saturday, February 23, 2019 at 11:17:28 AM UTC-5, Piotr Wyderski wrote:
> gnuarm.deletethisbit@gmail.com wrote:
>
> > To determine the "right" approach, you need to define "right" in some engineering terms. So what aspects of the design and implementation are important to your goals?
>
> Minimisation of resource usage, or in other words, a decimation
> technique that maps best onto the underlying primitives. I believe
> those 200+ DSP (multiply-accumulate) blocks are good for something...
>
> Best regards, Piotr

Is that your only criterion? Along with the 200+ DSP blocks I would expect the chip has many thousands of LUTs and FFs. Why focus on DSP block usage?

I don't see a problem with using the CIC decimators if they otherwise work the way you want. A CIC filter has sharp nulls at particular points but doesn't do so much elsewhere, while being very logic and energy efficient. They are typically finished by a relatively short FIR, so the aggregate delay is not so large. Doing it all in a single filter would create a much longer delay, no?

Other than the power usage of a large decimating FIR filter, I can't think of other trade offs.

Rick C.
 
gnuarm.deletethisbit@gmail.com wrote:

> Is that your only criterion?

Well, basically, yes, it is the only degree of freedom. In other words:
I can design any filtering structure that satisfies my requirements from
the signal processing point of view, but not all structures are equally
welcome by the FPGA, let alone an FPGA with DSP slices. Hence my question.

I've already done it with a multistage CIC alone, but the hardware
was much simpler and the CIC approach was the only viable one.

> Along with the 200+ DSP blocks I would expect the chip has many
> thousands of LUTs and FFs. Why focus on DSP block usage?

One reason is to learn them; another is the ability to use a smaller chip.
A DSP block is composed of two multipliers and an accumulator. The
accumulator is what a CIC needs. There will be plenty of other functions
occupying those FFs.

Best regards, Piotr
 
On Sunday, February 24, 2019 at 1:23:21 AM UTC-5, Piotr Wyderski wrote:
> gnuarm.deletethisbit@gmail.com wrote:
>
> > Is that your only criterion?
>
> Well, basically, yes, it is the only degree of freedom. In other words:
> I can design any filtering structure that satisfies my requirements from
> the signal processing point of view, but not all structures are equally
> welcome by the FPGA, let alone an FPGA with DSP slices. Hence my question.
>
> I've already done it with a multistage CIC alone, but the hardware
> was much simpler and the CIC approach was the only viable one.
>
> > Along with the 200+ DSP blocks I would expect the chip has many
> > thousands of LUTs and FFs. Why focus on DSP block usage?
>
> One reason is to learn them; another is the ability to use a smaller chip.
> A DSP block is composed of two multipliers and an accumulator. The
> accumulator is what a CIC needs. There will be plenty of other functions
> occupying those FFs.

You haven't given us much to go on. As some have pointed out, you can do the decimation in multiple stages and use smaller FIR filters at each point, or use one ginormous FIR filter. In both cases a polyphase organization will reduce the number of calculations needed. Or you can use the CIC filter as a front end. I don't know any of the details, so I have no way of calculating the resource usage.

I think it is pretty obvious what the trade-offs are. Squeeze here and this toothpaste comes out there. Squeeze there and other toothpaste comes out somewhere else.

To know where to squeeze and how hard, the numbers are important.

Rick C.
 
already5chosen@yahoo.com wrote:

> If all you want is minimization of resource usage then just do CIC.

As an afterthought: given the number of channels, their relatively slow
speed and the requirement of lockstep processing, perhaps a bit-serial
CIC would be a good idea?

Other parts of the design can benefit greatly from massive application
of this approach and it would be a powerful cerebral decalcifier. I think
it is worth doing even if just to learn it makes no sense.

Thank you all for your help!

Best regards, Piotr
 
On Monday, February 25, 2019 at 2:36:33 AM UTC-5, Piotr Wyderski wrote:
> already5chosen@yahoo.com wrote:
>
> > If all you want is minimization of resource usage then just do CIC.
>
> As an afterthought: given the number of channels, their relatively slow
> speed and the requirement of lockstep processing, perhaps a bit-serial
> CIC would be a good idea?
>
> Other parts of the design can benefit greatly from massive application
> of this approach and it would be a powerful cerebral decalcifier. I think
> it is worth doing even if just to learn it makes no sense.
>
> Thank you all for your help!

When I have looked at performing bit serial calculations I've found it to not be a large savings of logic and often using more FFs. If you use some form of RAM, either distributed or block, the FF savings can be good. I suppose the Xilinx LUT shift registers come in handy for this. I think they are still the only ones doing that.

I suppose once you get your head wrapped around the bit serial thing, it can be easy to do. It can make it a bit harder to extend the precision at each stage, since that means the bit count changes, and so does the timing.

Rick C.
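
To make the timing remark concrete, here is a small behavioural model of a bit-serial adder in Python: one full adder plus a carry flip-flop, fed LSB first, taking one clock per bit. It is only a sketch of the general technique (the function name and word widths are made up for illustration), but it shows why widening a stage by a few bits also lengthens its cycle count.

```python
def bit_serial_add(a, b, width):
    """Add two width-bit words LSB first, one bit per 'clock', modulo 2**width."""
    carry = 0    # models the single carry flip-flop
    result = 0
    for i in range(width):           # one loop iteration per clock cycle
        abit = (a >> i) & 1
        bbit = (b >> i) & 1
        sum_bit = abit ^ bbit ^ carry
        carry = (abit & bbit) | (carry & (abit ^ bbit))
        result |= sum_bit << i
    return result                    # final carry is dropped: wrap-around arithmetic


print(bit_serial_add(5, 7, 8))
print(bit_serial_add(200, 100, 8))   # wraps modulo 256, as a CIC integrator is allowed to
```

The wrap-around on overflow is what a CIC integrator relies on anyway, provided the registers are wide enough for the final result.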
 
On 2/22/19 11:31 PM, Piotr Wyderski wrote:
> Hi,
>
> the input signal is 14 bits signed@750ksps. I would like to decimate it
> by a modest factor of ~3000. What would be the best way of doing it on a
> Cyclone V, resource-wise? My usual approach would be a cascade of CIC
> decimators followed by a FIR corrector, but since there are the DSP
> blocks, I don't feel it to be the "right" (albeit correct) approach. I'm
> new to the V family and lack the proper intuitions, so could someone
> more versed suggest a good direction?
>
> In fact, there will be 12 such channels, all going in sync,
> so maybe considerable resource sharing can be achieved?
>
>     Best regards, Piotr

This may be a better question over at comp.dsp.

That said, and given what you've said in other responses, your best
answer may be to use a polyphase decimating FIR filter. In effect,
you'd use a 12000 tap FIR filter, but only 4 taps of it at a time.

Understanding Digital Signal Processing (Lyons, 2011) has a good enough
treatment of the subject for a general purpose DSP book. Multirate
Digital Signal Processing (Crochiere and Rabiner, 1983) has an excellent
and extremely rigorous treatment of the subject, but is out of print and
a far more specialized book.

--
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order. See above to fix.
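
A small Python sketch of the "only 4 taps at a time" idea, scaled down so it runs instantly: 12 taps and decimation by 3 instead of 12000 taps and decimation by 3000. The function names, coefficients and test signal are made up for illustration. Each input sample is multiplied into only the few output accumulators it contributes to, and the result is checked against evaluating the full convolution sum at the kept output instants.

```python
def decimate_direct(x, h, R):
    """Reference: evaluate the full convolution sum at every R-th instant only."""
    out = []
    for m in range((len(x) - 1) // R + 1):
        n = m * R
        out.append(sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0))
    return out


def decimate_streaming(x, h, R):
    """Each input sample updates only the ~len(h)/R outputs it contributes to."""
    acc = {}        # partial sums for outputs still being accumulated
    out = []
    max_macs = 0    # worst-case multiply-accumulates per input sample
    for n, sample in enumerate(x):
        first = -(-n // R)                # ceil(n / R)
        last = (n + len(h) - 1) // R
        for m in range(first, last + 1):
            acc[m] = acc.get(m, 0) + h[m * R - n] * sample
        max_macs = max(max_macs, last - first + 1)
        if n % R == 0:                    # output n/R has now seen all its inputs
            out.append(acc.pop(n // R))
    return out, max_macs


R = 3
h = list(range(1, 13))                        # 12 integer "taps", i.e. 4 per phase
x = [(7 * i) % 11 - 5 for i in range(30)]     # arbitrary integer test signal
y, macs = decimate_streaming(x, h, R)
print(y == decimate_direct(x, h, R), macs)
```

With 12000 taps and R = 3000 the same loop would do 4 multiply-accumulates per input sample, which is why a single time-shared multiplier is plausible at these sample rates; the storage for coefficients and partial state is the part that does not shrink.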
 
On Saturday, February 23, 2019 at 02:32:04 UTC-5, Piotr Wyderski wrote:
> Hi,
>
> the input signal is 14 bits signed@750ksps. I would like to decimate it
> by a modest factor of ~3000. What would be the best way of doing it on a
> Cyclone V, resource-wise? My usual approach would be a cascade of CIC
> decimators followed by a FIR corrector, but since there are the DSP
> blocks, I don't feel it to be the "right" (albeit correct) approach. I'm
> new to the V family and lack the proper intuitions, so could someone
> more versed suggest a good direction?
>
> In fact, there will be 12 such channels, all going in sync,
> so maybe considerable resource sharing can be achieved?
>
> Best regards, Piotr

You could also use halfband FIR filters; they are really efficient. Again, I really recommend Rick Lyons' DSP book; it is a really good book and not too mathy. Basically, a 16-tap halfband filter will only use 4 multipliers instead of 16.

Assuming you decimate by 2048, i.e. 2^11, you would need about 44 multipliers. Furthermore, you can time-multiplex and reuse the multipliers, so you could probably get by with one hardware multiplier per stage, for a total of 11 multipliers.
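
To illustrate where the "4 multipliers" comes from, here is a hedged Python sketch that builds a Hamming-windowed halfband filter and counts the multiplies left after skipping the zero taps and folding symmetric pairs. It uses 15 taps rather than 16, since a symmetric halfband with every other tap zero has an odd length; the window choice and function names are assumptions of this sketch, not a recommended design.

```python
import math


def halfband_taps(num_taps=15):
    """Hamming-windowed ideal halfband low-pass; num_taps should be 4k + 3."""
    half = num_taps // 2
    taps = []
    for n in range(-half, half + 1):
        ideal = 0.5 if n == 0 else math.sin(math.pi * n / 2) / (math.pi * n)
        window = 0.54 + 0.46 * math.cos(math.pi * n / half)
        taps.append(ideal * window)
    return taps


def multipliers_needed(taps, eps=1e-12):
    """Skip zero taps, fold symmetric pairs, treat the 0.5 centre tap as a shift."""
    half = len(taps) // 2
    return sum(1 for k in range(half) if abs(taps[k]) > eps)


per_stage = multipliers_needed(halfband_taps(15))
stages = 11                      # 2**11 = 2048, as in the post above
print(per_stage, per_stage * stages)
```

Every even-offset tap other than the centre comes out as zero (up to floating-point noise), which is what makes the structure cheap.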
 
On Monday, February 25, 2019 at 22:38:02 UTC+1, Benjamin Couillard wrote:
> On Saturday, February 23, 2019 at 02:32:04 UTC-5, Piotr Wyderski wrote:
> > Hi,
> >
> > the input signal is 14 bits signed@750ksps. I would like to decimate it
> > by a modest factor of ~3000. What would be the best way of doing it on a
> > Cyclone V, resource-wise? My usual approach would be a cascade of CIC
> > decimators followed by a FIR corrector, but since there are the DSP
> > blocks, I don't feel it to be the "right" (albeit correct) approach. I'm
> > new to the V family and lack the proper intuitions, so could someone
> > more versed suggest a good direction?
> >
> > In fact, there will be 12 such channels, all going in sync,
> > so maybe considerable resource sharing can be achieved?
> >
> > Best regards, Piotr
>
> You could also use halfband FIR filters; they are really efficient. Again, I really recommend Rick Lyons' DSP book; it is a really good book and not too mathy. Basically, a 16-tap halfband filter will only use 4 multipliers instead of 16.
>
> Assuming you decimate by 2048, i.e. 2^11, you would need about 44 multipliers. Furthermore, you can time-multiplex and reuse the multipliers, so you could probably get by with one hardware multiplier per stage, for a total of 11 multipliers.

With each stage running at half the rate of the previous one, it should be
possible to stagger the calculations so that you only need slightly less
than twice the throughput of the first stage:

1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1....
-2---2---2---2---2---2---2---2---....
---3-------3-------3-------3-----....
-------4---------------4---------....
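
The staggering pattern in the diagram can be generated mechanically. In the Python sketch below (function names are illustrative), the stage served in time slot t is one plus the number of trailing zero bits of t + 1, so no two stages ever ask for the shared multiplier in the same slot.

```python
def stage_for_slot(t):
    """Stage served in time slot t: 1 + number of trailing zero bits of t + 1."""
    v = t + 1
    stage = 1
    while v % 2 == 0:
        v //= 2
        stage += 1
    return stage


def schedule_rows(num_stages, num_slots):
    """One text row per stage, marking the slots in which that stage runs."""
    rows = []
    for s in range(1, num_stages + 1):
        rows.append("".join(str(s) if stage_for_slot(t) == s else "-"
                            for t in range(num_slots)))
    return rows


for row in schedule_rows(4, 33):
    print(row)
```

Slots that would belong to a stage beyond the last one simply stay idle, which is where the "slightly less than twice" comes from.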
 
gnuarm.deletethisbit@gmail.com wrote:

> When I have looked at performing bit serial calculations I've found it to not be a large savings of logic and often using more FFs.

You are right, several initial attempts indicate that the savings are
minor if I apply time multiplexing carefully. It was a refreshing
experience, though, so no time wasted.

The large decimation factor implies the final bandwidth is narrow, so
even a very modest 4-stage decimating by 4 CIC filter has about 100dB
of attenuation around the +/-20kHz DC image frequencies. There will be
considerable aliasing above that, but I'm going to filter it out anyway
later, so why bother. The subsequent filters will work at a much lower
data rate, so I can bump up their order or even change their topology to
something other than a CIC.

Lesson learned: narrow-band CIC attenuation doesn't depend on the filter
order considerably. Obvious when you think about it, but for some reason
it wasn't.

OK, I have my answer, thank you all for your contribution!

Best regards, Piotr
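
For readers who want to check figures like these for their own band edges, here is a minimal Python sketch of the textbook CIC magnitude response, |sin(pi*R*f/fs) / (R*sin(pi*f/fs))|^N with a differential delay of 1. The 750 ksps rate, 4 stages and decimation by 4 come from the post above; the list of offsets is an assumption, and how much attenuation you read off depends strongly on how far from the null you place the band edge, so the printed values need not match the round figure quoted.

```python
import math


def cic_attenuation_db(freq_hz, fs_hz=750e3, decimation=4, stages=4):
    """Attenuation (positive dB, relative to DC) of an N-stage CIC at the input rate."""
    x = math.pi * freq_hz / fs_hz
    den = decimation * math.sin(x)
    if abs(den) < 1e-15:
        return 0.0                   # DC or a multiple of fs: normalized gain is 1
    ratio = abs(math.sin(decimation * x) / den)
    return -20.0 * stages * math.log10(max(ratio, 1e-300))


image_centre = 750e3 / 4             # first band that aliases onto DC
for offset in (20e3, 10e3, 2e3):     # example distances from the null, in Hz
    att = cic_attenuation_db(image_centre - offset)
    print(f"{offset / 1e3:.0f} kHz below the null: {att:.1f} dB")
```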
 
