DDS

maxascent

What is the best way to implement a multi-channel DDS? I need a DDS that
has 8 channels that are time-multiplexed. I am using a Spartan 6.

Thanks

 
On Monday, March 2, 2015 at 16:03:14 UTC+1, maxascent wrote:
What is the best way to implement a multi-channel DDS? I need a DDS that
has 8 channels that are time-multiplexed. I am using a Spartan 6.

what's wrong with this?

http://www.xilinx.com/support/documentation/ip_documentation/dds_compiler/v6_0/pg141-dds-compiler.pdf


-Lasse
 
On 3/2/2015 10:03 AM, maxascent wrote:
What is the best way to implement a multi-channel DDS? I need a DDS that
has 8 channels that are time-multiplexed. I am using a Spartan 6.

A DDS circuit is not as simple as some would think and it is not as
complex as others would lead you to believe. What you need to be aware
of is that they can produce spurs if not carefully designed.

That said, I'm not sure what a "multichannel" DDS is. If you mean 8 DDS
circuits, then ok, that is clear enough. Are you looking for some way
to share the circuitry? The circuitry is not overly complex - it
usually consists of a counter or adder to set the phase and a sine look
up table to convert the phase to a sine value for the output.

If you don't wish to fully duplicate this circuit and your speed
requirements are such that you can multiplex the logic, you only need to
duplicate the phase step size register and the phase accumulator and add
some circuitry to multiplex them through the adder and look up table.
This is easy to do in an FPGA by using LUTs as an 8-register bank.

If you have a phase offset register, that needs to be replicated and
muxed as well.
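
To make the sharing concrete, here is a rough behavioral sketch in Python
(not RTL; the 32-bit accumulator, 1024-entry table and 12-bit output are
just placeholder numbers) of one adder and one sine table serving eight
channels round-robin:

import math

ACC_BITS = 32          # phase accumulator width (assumption)
LUT_ADDR_BITS = 10     # 1024-entry sine table (assumption)
OUT_BITS = 12          # DAC width (assumption)

# Shared sine lookup table (a block RAM / ROM in the FPGA)
SINE_LUT = [round((2**(OUT_BITS - 1) - 1) *
                  math.sin(2 * math.pi * i / 2**LUT_ADDR_BITS))
            for i in range(2**LUT_ADDR_BITS)]

class MultiChannelDDS:
    def __init__(self, tuning_words):
        # Only this per-channel state is replicated: an 8-entry
        # register bank (distributed LUT RAM in the FPGA).
        self.tuning_words = list(tuning_words)
        self.accumulators = [0] * len(tuning_words)

    def step(self, ch):
        """One time slot: advance channel ch and return its sample."""
        # Shared adder: phase accumulator += phase step (tuning word)
        self.accumulators[ch] = (self.accumulators[ch] +
                                 self.tuning_words[ch]) % 2**ACC_BITS
        # Shared LUT: top bits of the phase address the sine table
        addr = self.accumulators[ch] >> (ACC_BITS - LUT_ADDR_BITS)
        return SINE_LUT[addr]

# Example: 8 channels; tuning word = f_out / f_update * 2**ACC_BITS
dds = MultiChannelDDS([(k + 1) << 24 for k in range(8)])
samples = [[dds.step(ch) for ch in range(8)] for _ in range(4)]

In the FPGA the two per-channel lists are the 8-deep LUT register banks;
the adder, the sine table and the output path are shared.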

If you need low spurs and high resolution for your DAC, you can use some
approximations to the sine value using the trig identities

sin(A+B) = sin A cos B + cos A sin B
or
cos(A+B) = cos A cos B − sin A sin B

Either one will do once we make the following approximations...

The main one is based on A being the MSBs and B being the LSBs. So A
will be a coarse angle over the full 90° and B will always be a very
small angle. The main approximation is that cos B will be very close to
1 so that you replace it with 1. Then the first term in each equation
will just be one trig lookup for the coarse value of sin A or cos A.

The second term can be looked up using the same first table for A and a
second table for B which has the fine values, then multiply to get the
product. Or you can make another approximation. Since the value of sin
B is very small, the second term is going to be very small. This means
the error from truncating the A and B inputs to the second term will
also be small. So instead of a full size trig table for B you can use a
single table with a truncated A and truncated B input, with the second
term as the output, saving the multiplier. This will give you larger
spurs than using the full look up tables and performing the multiply,
but will save some hardware. In either case the spurs will be *much*
smaller than if you simply use the truncated term sin A or cos A.
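
A quick numerical sketch of that split (Python, with arbitrary example
widths: a 14-bit phase word cut into 8 coarse and 6 fine bits) shows the
payoff - the worst-case error drops from about the size of the fine phase
step to about its square:

import math

PHASE_BITS = 14                 # total phase word width (assumption)
A_BITS = 8                      # coarse bits -> 256-entry table
B_BITS = PHASE_BITS - A_BITS    # fine bits

def exact(p):
    return math.sin(2 * math.pi * p / 2**PHASE_BITS)

def coarse_only(p):
    a = p >> B_BITS                                    # truncate to A
    return math.sin(2 * math.pi * a / 2**A_BITS)

def coarse_plus_fine(p):
    a = p >> B_BITS
    b = p & (2**B_BITS - 1)
    sin_a = math.sin(2 * math.pi * a / 2**A_BITS)      # coarse table
    cos_a = math.cos(2 * math.pi * a / 2**A_BITS)      # same table, offset
    sin_b = math.sin(2 * math.pi * b / 2**PHASE_BITS)  # small-angle table
    # sin(A+B) ~= sin A * 1 + cos A * sin B   (cos B replaced by 1)
    return sin_a + cos_a * sin_b

err1 = max(abs(exact(p) - coarse_only(p)) for p in range(2**PHASE_BITS))
err2 = max(abs(exact(p) - coarse_plus_fine(p)) for p in range(2**PHASE_BITS))
print(f"coarse only     : max error {err1:.2e}")
print(f"with correction : max error {err2:.2e}")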

--

Rick
 
One possibility is to implement the waveform as polynomial / spline.

The Horner scheme on Spartan 6 works nicely with four cycles pipeline
delay. In other words, you can run four independent channels using the same
multiplier and one port of a dual-port RAM for coefficients. The second RAM
port can serve a second multiplier => 8 channels.

A "mainstream" DDS would be my first choice - don't fix it if it ain'
broken. The above might work, depending on your application's needs, an
would be fairly compact.

You can find example Verilog code for a four-variable pipelined polynomial
interpolator here, at the bottom of the post ("Pipelined RTL
implementation")

http://www.dsprelated.com/showarticle/594.php

There is a matlab script included to calculate the fixed point
coefficients, e.g. edit the "chirp example"
y = cos(2*pi*x.^2*5);
to a plain sine wave.
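
Not the code from that link, but a small Python sketch of the general idea:
one coefficient set per waveform segment stored in a table, evaluated with
the Horner recurrence (the multiply-add chain that the pipelined multiplier
implements). The segment count, the order and np.polyfit as a stand-in
coefficient generator are assumptions for illustration only:

import numpy as np

SEGMENTS = 64     # coefficient sets stored in block RAM (assumption)
ORDER = 4         # polynomial order per segment (assumption)

# Fit one polynomial per segment of the sine period, local variable t in [0, 1)
coeff_table = []
for s in range(SEGMENTS):
    t = np.linspace(0, 1, 33)
    y = np.sin(2 * np.pi * (s + t) / SEGMENTS)
    coeff_table.append(np.polyfit(t, y, ORDER))   # highest power first

def dds_sample(phase):
    """phase in [0, 1) -> sine via coefficient lookup + Horner evaluation."""
    seg = int(phase * SEGMENTS) % SEGMENTS
    t = phase * SEGMENTS - seg                    # fractional position
    acc = 0.0
    for c in coeff_table[seg]:                    # Horner: acc = acc*t + c
        acc = acc * t + c                         # one multiplier, reused
    return acc

phases = np.linspace(0, 1, 10000, endpoint=False)
err = max(abs(dds_sample(p) - np.sin(2 * np.pi * p)) for p in phases)
print(f"max approximation error: {err:.2e}")

In hardware the loop body is the single shared multiplier, and the
coefficients would come from the fixed-point Matlab script instead of
np.polyfit.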


 
On 3/5/2015 4:43 PM, mnentwig wrote:
One possibility is to implement the waveform as polynomial / spline.

The Horner scheme on Spartan 6 works nicely with four cycles pipeline
delay. In other words, you can run four independent channels using the same
multiplier and one port of a dual-port RAM for coefficients. The second RAM
port can serve a second multiplier => 8 channels.

A "mainstream" DDS would be my first choice - don't fix it if it ain't
broken. The above might work, depending on your application's needs, and
would be fairly compact.

The problem with the mainstream DDS is that for any app where spurs
close to the carrier are a problem, it *is* broken. That's why I
suggested the calculations to extend the precision of the LUT method.
Of course they are only needed if the phase noise is a problem.


You can find example Verilog code for a four-variable pipelined polynomial
interpolator here, at the bottom of the post ("Pipelined RTL
implementation")

http://www.dsprelated.com/showarticle/594.php

There is a matlab script included to calculate the fixed point
coefficients, e.g. edit the "chirp example"
y = cos(2*pi*x.^2*5);
to a plain sine wave.

I don't follow your notation. What does the period after the 'x' mean?

What sort of phase noise does your polynomial generate? Are there spurs
close to the carrier? Many apps need spurs to be -120 dB or so from the
carrier. For some apps the spurs need to be either that low to start
with or far enough in frequency from the carrier that they can be
filtered to that level.

--

Rick
 
Hi Rick,

I don't have any hard data on the signal quality as I used this for
audio-frequency (modeling a Vox Continental electronic organ with something
like 96 independent oscillators). Spurs "should" be an implementation
issue, but then most things are...

Pocket calculators use polynomials for function approximation, so the
method itself doesn't worry me. Increasing polynomial order is usually
quite effective, compared to increasing lookup table resolution.
It might help to use a wider multiplier and more (e.g. 8-stage) pipelining.
With 18 bit arithmetic in my example implementation, the total SNR over
the whole bandwidth can't exceed 18*6+1.7 ~ 110 dB and the implementation
is probably 10 dB worse than that (e.g. 1 LSB error would be 6 dB loss).

What makes the method attractive is that multi-channel operation can
exploit the pipelining, which is needed anyway to manage the critical path
in the Horner scheme calculation.

y = cos(2*pi*x.^2*5);
I don't follow your notation. What does the period after the 'x' mean?

This refers to the matlab script that calculates the polynomial
coefficients (link from the blog article). It's Matlab notation for "square
every vector entry individually". This example creates a chirp waveform
with linearly increasing frequency. To create any other waveform, e.g. a
plain sinewave, put it here into the script.
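
The equivalent in Python/numpy, purely to illustrate the notation (not part
of the actual coefficient script):

import numpy as np

x = np.linspace(0, 1, 1000)            # stand-in for the script's sample grid
chirp = np.cos(2 * np.pi * x**2 * 5)   # Matlab's x.^2 == elementwise square
sine  = np.cos(2 * np.pi * x * 5)      # replace x.^2 with x for a plain sine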

Spur performance, I didn't analyze this.
Intuitively, I don't see anything that couldn't be "cleaned up" - if
nothing else helps, dither the phase accumulator before it goes into the
polynomial, lose some wideband SNR but get rid of discrete spurs.

In other words: I have used this for heavily multi-channel tone generation,
but not to communications-quality requirements. I don't see any hard
obstacles, but the proof is left to the reader.

 
On 3/6/2015 4:40 AM, mnentwig wrote:
Hi Rick,

I don't have any hard data on the signal quality as I used this for
audio-frequency (modeling a Vox Continental electronic organ with something
like 96 independent oscillators). Spurs "should" be an implementation
issue, but then most things are...

Of course, but the method used imposes costs for any given requirement
for spurs and that is the issue. How complex does the logic need to be
for a given quality of signal and in particular, how the specifics of
that quality level affect a given application.


Pocket calculators use polynomials for function approximation, so the
method itself doesn't worry me. Increasing polynomial order is usually
quite effective, compared to increasing lookup table resolution.
It might help to use a wider multiplier and more (e.g. 8-stage) pipelining.
With 18 bit arithmetic in my example implementation, the total SNR over
the whole bandwidth can't exceed 18*6+1.7 ~ 110 dB and the implementation
is probably 10 dB worse than that (e.g. 1 LSB error would be 6 dB loss).

That depends on the number of taps in the polynomial, which I assume you
equate to the number of stages in your pipeline. In that case the
number of stages is the number of multipliers, e.g. the cost in terms of
logic.


What makes the method attractive is that multi-channel operation can
exploit the pipelining, which is needed anyway to manage the critical path
in the Horner scheme calculation.

Utilizing pipelining is a separate issue really. Nearly any method can
do that, even a table lookup.


y = cos(2*pi*x.^2*5);
I don't follow your notation. What does the period after the 'x' mean?

This refers to the matlab script that calculates the polynomial
coefficients (link from the blog article). It's Matlab notation for "square
every vector entry individually". This example creates a chirp waveform
with linearly increasing frequency. To create any other waveform, e.g. a
plain sinewave, put it here into the script.

I can't say I follow the notation.


Spur performance, I didn't analyze this.
Intuitively, I don't see anything that couldn't be "cleaned up" - if
nothing else helps, dither the phase accumulator before it goes into the
polynomial, lose some wideband SNR but get rid of discrete spurs.

In other words: I have used this for heavily multi-channel tone generation,
but not to communications-quality requirements. I don't see any hard
obstacles, but the proof is left to the reader.

Ok, thanks.

--

Rick
 
Hi,

>> Of course, but the method used imposes costs for any given requirement
for spurs and that is the issue. How complex does the logic need to be for
a given quality of signal and in particular, how the specifics of that
quality level affect a given application.

well, the answer could use some better requirements and a couple of days'
working time :)

>> That depends on the number of taps in the polynomial, which I assume you
equate to the number of stages in your pipeline. In that case the
number of stages is the number of multipliers, e.g. the cost in terms of
logic.

Not necessarily. My example implementation (previous link) maps four
channels and all polynomial coefficients to a single multiplier. As audio
frequency example, clock it at 100 MHz for a 96 kHz sample rate => 1000
cycles per sample. Use 10 cycles per waveform (e.g. 5th order polynomial
plus some overhead) and I can generate 100 independent waveforms using a
single multiplier.
This can be very compact even with fully independent coefficients, because
the overhead is fairly cheap, when address selection in a block ram muxes
most of the wide signals.
Using one multiplier per polynomial term would be also possible for higher
output rate.
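
As a back-of-the-envelope sketch of that budget in Python (same round
numbers as above):

# Cycle budget for time-multiplexing waveforms onto one multiplier
clock_hz       = 100e6      # FPGA clock
sample_rate_hz = 96e3       # audio-rate output
cycles_per_sample   = clock_hz / sample_rate_hz     # ~1000 cycles available
cycles_per_waveform = 10    # e.g. 5th-order polynomial plus overhead
channels = int(cycles_per_sample // cycles_per_waveform)
print(f"{cycles_per_sample:.0f} cycles/sample -> about {channels} waveforms per multiplier")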

y = cos(2*pi*x.^2*5);
I can't say I follow the notation.

Never mind that. It's just a line from the Matlab example (link) where I
can put the function, for which I want the fixed-point polynomial
coefficients.
It's fairly straightforward in the context of the Matlab script.

>> Utilizing pipelining is a separate issue really. Nearly any method can
do that, even a table lookup.
Well, yes, true. But for the polynomial, the critical path is comparatively
long (i.e. four multiplications in series, each using four cycles delay) so
the pipelining makes a big difference.

Cheers

Markus


 
On 3/8/2015 5:28 AM, mnentwig wrote:
Hi,

Of course, but the method used imposes costs for any given requirement
for spurs and that is the issue. How complex does the logic need to be for
a given quality of signal and in particular, how the specifics of that
quality level affect a given application.

well, the answer could use some better requirements and a couple of days'
working time :)

That depends on the number of taps in the polynomial, which I assume you
equate to the number of stages in your pipeline. In that case the
number of stages is the number of multipliers, e.g. the cost in terms of
logic.

Not necessarily. My example implementation (previous link) maps four
channels and all polynomial coefficients to a single multiplier. As audio
frequency example, clock it at 100 MHz for a 96 kHz sample rate => 1000
cycles per sample. Use 10 cycles per waveform (e.g. 5th order polynomial
plus some overhead) and I can generate 100 independent waveforms using a
single multiplier.

Yes, here your cost is time rather than logic. That is the tradeoff.
If you were doing faster calculations like radar, for example, you would
not have the option of multiplexing the hardware. You would need to
burn more logic.


This can be very compact even with fully independent coefficients, because
the overhead is fairly cheap, when address selection in a block ram muxes
most of the wide signals.
Using one multiplier per polynomial term would be also possible for higher
output rate.

The tradeoff remains. You have to do more work to get a higher
precision result with a polynomial, not just in terms of the order of
the polynomial, but increased resolution in the multiplies as well.


y = cos(2*pi*x.^2*5);
I can't say I follow the notation.

Never mind that. It's just a line from the Matlab example (link) where I
can put the function, for which I want the fixed-point polynomial
coefficients.
It's fairly straightforward in the context of the Matlab script.

Why not just explain your equation in terms I can understand? Are you
trying to obfuscate it?


Utilizing pipelining is a separate issue really. Nearly any method can
do that, even a table lookup.
Well, yes, true. But for the polynomial, the critical path is comparatively
long (i.e. four multiplications in series, each using four cycles delay) so
the pipelining makes a big difference.

You just gave an example where you had time to perform 1000 multiplies
for each result and so used no pipelining. Still, the point remains
that pipelining has nothing to do with resolution really.

--

Rick
 
Hi Rick,

when the critical path of an operation is x cycles long, you have a choice
that includes the options of
a) utilizing your hardware 1 cycle out of x cycles, effectively wasting
(x-1) cycles
and
b) pipelining x independent operations and utilizing the hardware x cycles
out of x.

For higher-order polynomial interpolation, x is relatively high (say, 20
cycles), that's where pipelining comes in in the context of this thread
(multi-channel DDS).
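
A toy cycle count in Python illustrates the difference (the 4-cycle
multiplier latency and 5 Horner steps per sample are assumptions taken from
the earlier example; five dependent 4-cycle multiplies give roughly the
20-cycle critical path):

PIPELINE_DEPTH  = 4   # multiplier latency in cycles (assumption)
CHANNELS        = 4   # independent channels interleaved, >= pipeline depth
OPS_PER_CHANNEL = 5   # dependent multiply-adds per sample (Horner steps)

# (a) one channel at a time: every Horner step waits for the previous result
sequential = CHANNELS * OPS_PER_CHANNEL * PIPELINE_DEPTH

# (b) round-robin interleaving: consecutive issues belong to different
# channels, so a new operation starts every cycle; drain the pipe once.
interleaved = CHANNELS * OPS_PER_CHANNEL + (PIPELINE_DEPTH - 1)

print(f"sequential : {sequential} cycles for {CHANNELS} samples")
print(f"interleaved: {interleaved} cycles for {CHANNELS} samples")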

With regard to the equation, if you can't disentangle it, please start a
new thread. It is an example describing an arbitrary chirp function, which
is off-topic for this discussion, other than pointing out the line where to
put your desired function into my matlab script.


 
On 3/14/2015 5:39 PM, mnentwig wrote:
Hi Rick,

when the critical path of an operation is x cycles long, you have a choice
that includes the options of
a) utilizing your hardware 1 cycle out of x cycles, effectively wasting
(x-1) cycles
and
b) pipelining x independent operations and utilizing the hardware x cycles
out of x.

Where did these cycles come from? Logic takes some amount of time to
process. I can make my clock cycles match my logic if I choose. I'm
not sure where you are going with this. I believe we all understand
pipelining.


For higher-order polynomial interpolation, x is relatively high (say, 20
cycles), that's where pipelining comes in in the context of this thread
(multi-channel DDS).

But that depends on many things such as the relative timing of your
clock and your logic. You seem to be supposing that each calculation in
your algorithm requires a register, a clock cycle and a pipeline stage.
The logic *can* be linear without registers. It depends on the
application.


With regard to the equation, if you can't disentangle it, please start a
new thread. It is an example describing an arbitrary chirp function, which
is off-topic for this discussion, other than pointing out the line where to
put your desired function into my matlab script.

I think I asked you to explain your script rather than my learning
Matlab. Any chance of using a more conventional notation?

--

Rick
 
