Correlator of a big antenna array on FPGA

ste3191

Guest
Hi, I have a serious problem with the architecture of a correlator for a
planar antenna array (16 x 16).
Theoretically I can't implement the normal expression sum(X*X^H) because I
would obtain a covariance matrix of 256 x 256. Then I could
implement the spatial smoothing technique, which takes an average of
overlapped subarrays, with the advantage of a smaller covariance
matrix. This is right but it is a slow technique!! I need an efficient and fast
method to compute the covariance matrix on an FPGA, with as few
multipliers as possible. In fact, for a 16 x 16 covariance matrix I need about
6000 multipliers! So I have looked at correlators based on hard-limiting
(sign + xor + counter) at this link

https://www.google.it/url?sa=t&rct=j&q=&esrc=s&source=web&cd=3&ved=0CDYQFjACahUKEwjwsKjT257IAhVlgXIKHQKFCWw&url=http%3A%2F%2Fhandle.dtic.mil%2F100.2%2FADA337434&usg=AFQjCNG5QUylZORV9KFHYizyu1QJZSBM5A&bvm=bv.103627116,d.d2s&cad=rja

but I don't know if this technique is right; in Simulink its results are very
different from those of the normal correlator.
Can someone help me?

Thanks
--------------------------------------
Posted through http://www.FPGARelated.com
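
A minimal sketch of the hard-limiting correlator asked about above, for illustration only. It assumes real-valued, zero-mean, jointly Gaussian inputs (the array data in the question would be complex), and all function names are made up for this example. It compares the normal multiply-accumulate estimate with the sign + xor + counter estimate. The raw 1-bit result is not expected to equal the normal correlation coefficient: for Gaussian inputs the two are related by the arcsine law (Van Vleck correction), rho = sin(pi/2 * r_1bit). The 1-bit correlator also discards amplitude, so it estimates correlation coefficients rather than covariances.

```python
import math
import random


def correlated_pairs(rho, n, seed=0):
    """Return n pairs (x, y) of unit-variance Gaussians with correlation rho."""
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho * rho)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + scale * rng.gauss(0.0, 1.0)
        pairs.append((x, y))
    return pairs


def normal_correlator(pairs):
    """Multiply-accumulate estimate of the correlation coefficient."""
    sxy = sum(x * y for x, y in pairs)
    sxx = sum(x * x for x, _ in pairs)
    syy = sum(y * y for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


def one_bit_correlator(pairs):
    """Sign + xor + counter: count sign disagreements, no multipliers."""
    disagreements = 0
    for x, y in pairs:
        sign_x = 1 if x < 0 else 0  # hard limiter keeps only the sign bit
        sign_y = 1 if y < 0 else 0
        disagreements += sign_x ^ sign_y  # xor output feeds the counter
    # Map the count to [-1, 1]: +1 if signs always agree, -1 if they never do.
    return 1.0 - 2.0 * disagreements / len(pairs)


def van_vleck(r_one_bit):
    """Arcsine-law correction, valid for zero-mean Gaussian inputs."""
    return math.sin(0.5 * math.pi * r_one_bit)


if __name__ == "__main__":
    pairs = correlated_pairs(rho=0.6, n=50000, seed=1)
    raw = one_bit_correlator(pairs)
    print("normal   :", normal_correlator(pairs))
    print("1-bit raw:", raw)
    print("corrected:", van_vleck(raw))
```

If a model compares the raw counter output directly with the normal correlator, a mismatch of this kind is expected. Whether that explains the difference seen in Simulink cannot be told from the thread.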
 
On 9/30/2015 8:13 AM, ste3191 wrote:
> Hi, I have a serious problem with the architecture of a correlator for a
> planar antenna array (16 x 16).
> Theoretically I can't implement the normal expression sum(X*X^H) because I
> would obtain a covariance matrix of 256 x 256. Then I could
> implement the spatial smoothing technique, which takes an average of
> overlapped subarrays, with the advantage of a smaller covariance
> matrix. This is right but it is a slow technique!! I need an efficient and fast
> method to compute the covariance matrix on an FPGA, with as few
> multipliers as possible. In fact, for a 16 x 16 covariance matrix I need about
> 6000 multipliers! So I have looked at correlators based on hard-limiting
> (sign + xor + counter) at this link
>
> https://www.google.it/url?sa=t&rct=j&q=&esrc=s&source=web&cd=3&ved=0CDYQFjACahUKEwjwsKjT257IAhVlgXIKHQKFCWw&url=http%3A%2F%2Fhandle.dtic.mil%2F100.2%2FADA337434&usg=AFQjCNG5QUylZORV9KFHYizyu1QJZSBM5A&bvm=bv.103627116,d.d2s&cad=rja
>
> but I don't know if this technique is right; in Simulink its results are very
> different from those of the normal correlator.
> Can someone help me?

Even though your solution will be implemented in an FPGA, I'm not sure
the FPGA group is the best place to ask this question since it is about
the algorithm more than the FPGA implementation. I am cross posting to
the DSP group to see if anyone there has experience with it.

That said, you don't say what your data rate and processing rates are.
How often do you need to run this calculation? If it is slow enough you
can use the same multipliers for many computations to produce one result.
Or will this be run on every data sample at a high rate?

--

Rick
 
Yes, the sampling rate is higher than 80 MSPS and I can't share resources.
I posted it on the DSP forum but nobody has answered yet.
 
On 10/1/2015 3:15 PM, ste3191 wrote:
> Yes, the sampling rate is higher than 80 MSPS and I can't share resources.
> I posted it on the DSP forum but nobody has answered yet.

Yes, I saw that. Looks like you beat me to it. lol

I don't know where else to seek advice. Maybe talk to the FPGA vendors?
I know they have various expertise in applications. Is this something
you will end up building? If so, and it uses a lot of resources, you
should be able to get some application support.

You know, 80 MHz is not so fast for multiplies or adds. The multiplier
block in most newer FPGAs will run at 100's of MHz. So you certainly
should be able to multiplex the multiplier unit by 4x or more. But that
really doesn't solve your problem if you want to do it on a single chip.
I haven't looked at the high end, but I'm pretty sure they don't put
1500 multipliers on a chip. But it may put you in the ballpark where
you can do this with a small handful of large FPGAs. Very pricey though.

--

Rick
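
The multiplexing arithmetic in the post above can be sketched as follows. This is only a rough illustration: the 6000-multiplier figure comes from the original post, 80 MSPS is the lower bound the poster gave, and the 320 MHz multiplier clock is a hypothetical value chosen to give the 4x reuse mentioned. The function name is made up for this example.

```python
import math


def multipliers_needed(parallel_multipliers, sample_rate_hz, multiplier_clock_hz):
    """Physical multipliers needed when each one is time-shared.

    reuse is how many products one multiplier can finish per input sample.
    Returns (physical_multipliers, reuse).
    """
    reuse = multiplier_clock_hz // sample_rate_hz
    if reuse < 1:
        raise ValueError("multiplier clock must be at least the sample rate")
    return math.ceil(parallel_multipliers / reuse), reuse


# 6000 products per sample, 80 MSPS input, hypothetical 320 MHz multiplier clock.
count, reuse = multipliers_needed(6000, 80_000_000, 320_000_000)
print(reuse, count)  # 4 1500
```

A higher sample rate lowers the reuse factor, so the 1500 figure is optimistic if the real rate is well above 80 MSPS.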
 
rickman wrote:
> Yes, I saw that. Looks like you beat me to it. lol
>
> I don't know where else to seek advice. Maybe talk to the FPGA vendors?
> I know they have various expertise in applications. Is this something
> you will end up building? If so, and it uses a lot of resources, you
> should be able to get some application support.
>
> You know, 80 MHz is not so fast for multiplies or adds. The multiplier
> block in most newer FPGAs will run at 100's of MHz. So you certainly
> should be able to multiplex the multiplier unit by 4x or more. But that
> really doesn't solve your problem if you want to do it on a single chip.
> I haven't looked at the high end, but I'm pretty sure they don't put
> 1500 multipliers on a chip. But it may put you in the ballpark where
> you can do this with a small handful of large FPGAs. Very pricey though.

Actually you can get up to 1,920 DSP slices on a Kintex-7 and
considerably more on the Virtex-7 and Virtex UltraScale devices;
however, a "multiplier" may eat more than one DSP slice depending
on the number of bits you want. On the other hand they are supposed
to run at 500 MHz in these parts.

--
Gabor
 
On 10/1/2015 4:14 PM, GaborSzakacs wrote:
> Actually you can get up to 1,920 DSP slices on a Kintex-7 and
> considerably more on the Virtex-7 and Virtex UltraScale devices;
> however, a "multiplier" may eat more than one DSP slice depending
> on the number of bits you want. On the other hand they are supposed
> to run at 500 MHz in these parts.

Are those the $1000 chips? I worked for a test equipment company once
and they used a $1500 chip in a product that sold for over $100k. They
initially only used about 20% of the part so they could add more stuff
as upgrades. Lots of margin in a $100k product just like there's lots
of margin in a $1500 chip.

--

Rick
 
rickman wrote:
> Are those the $1000 chips? I worked for a test equipment company once
> and they used a $1500 chip in a product that sold for over $100k. They
> initially only used about 20% of the part so they could add more stuff
> as upgrades. Lots of margin in a $100k product just like there's lots
> of margin in a $1500 chip.

The list price for the XC7K410T, which has 1,540 DSP slices, starts at
about $1,300. A DSP slice includes a 25 x 18 bit signed multiplier.
The list price (you can see it at Digikey) for the largest Kintex-7 is
around $3,000. Virtex-7 is more expensive. I'm not suggesting this as
a solution unless there's no other way, including using several devices,
which often saves money over using the largest available ones. On the
other hand you suggested that you can't get 1,500 multipliers in an
FPGA, and I was just pointing out that in fact you can get that many and
even more if you have the money to pay for it. If you can figure out
how to partition the design into say 3 or 4 pieces, you can use an
XC7K160T with 600 DSP units starting at about $210 each. This seems
to be the sweet spot (for now) in price per DSP in that series. An
Artix XC7A200T is in the same price range with a bit more logic and
740 DSP slices, but the fabric is a bit slower in that series.

My guess is that Altera has a range of parts with similar multiplier
counts, since they generally compete head to head with Xilinx and at
this point the Xilinx 7-series is old news.

--
Gabor
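
The price-per-DSP comparison above can be worked through as a small sketch. The slice counts and approximate 2015 list prices are the ones quoted in the post; the 1500-multiplier requirement is the estimate from earlier in the thread. It assumes one multiplier per DSP slice and ignores inter-device I/O, which a real partition would have to budget for. The helper names are made up for this example.

```python
import math

# (DSP slices, approximate list price in USD) as quoted in the post above.
PARTS = {
    "XC7K410T": (1540, 1300),
    "XC7K160T": (600, 210),
}


def price_per_slice(name):
    """List price divided by the number of DSP slices."""
    slices, price = PARTS[name]
    return price / slices


def devices_for(required_slices, name):
    """Number of devices of one type needed, and their total list price."""
    slices, price = PARTS[name]
    count = math.ceil(required_slices / slices)
    return count, count * price


for name in PARTS:
    print(name, round(price_per_slice(name), 2), devices_for(1500, name))
```

Under these assumptions the XC7K160T works out to about $0.35 per slice against about $0.84 for the XC7K410T, and three XC7K160T devices cover 1500 slices for about $630.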
 
One Virtex-7 model has more than 3000 multipliers running at 600 MHz, but the
problem isn't the price; it is how to compute or estimate the large matrix
efficiently.
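
One generic way to reduce the work, not raised in the thread: the accumulated matrix sum(X*X^H) is Hermitian, so only the diagonal and the upper triangle need to be computed, and the lower triangle is its complex conjugate. A rough count follows, assuming 4 real multiplies per complex product, 2 per diagonal term, one new snapshot per clock, and no time sharing. The function name is made up for this example.

```python
def real_multiplies(n, use_symmetry=True):
    """Real multiplies to update an n x n covariance with one complex snapshot.

    Assumes 4 real multiplies per complex product and 2 for each |x_i|^2.
    """
    if not use_symmetry:
        return 4 * n * n
    off_diagonal = n * (n - 1) // 2  # entries strictly above the diagonal
    return 4 * off_diagonal + 2 * n


for n in (16, 256):
    print(n, real_multiplies(n, use_symmetry=False), real_multiplies(n))
```

Symmetry halves the count but does not change its order: the 256 x 256 case still needs on the order of 10^5 real multiplies per snapshot. These counts do not reproduce the "about 6000" figure in the original post, whose derivation is not given in the thread.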
 
On 10/2/2015 9:20 AM, GaborSzakacs wrote:
> The list price for the XC7K410T, which has 1,540 DSP slices, starts at
> about $1,300. A DSP slice includes a 25 x 18 bit signed multiplier.
> The list price (you can see it at Digikey) for the largest Kintex-7 is
> around $3,000. Virtex-7 is more expensive. I'm not suggesting this as
> a solution unless there's no other way, including using several devices,
> which often saves money over using the largest available ones. On the
> other hand you suggested that you can't get 1,500 multipliers in an
> FPGA, and I was just pointing out that in fact you can get that many and
> even more if you have the money to pay for it. If you can figure out
> how to partition the design into say 3 or 4 pieces, you can use an
> XC7K160T with 600 DSP units starting at about $210 each. This seems
> to be the sweet spot (for now) in price per DSP in that series. An
> Artix XC7A200T is in the same price range with a bit more logic and
> 740 DSP slices, but the fabric is a bit slower in that series.
>
> My guess is that Altera has a range of parts with similar multiplier
> counts, since they generally compete head to head with Xilinx and at
> this point the Xilinx 7-series is old news.

Yes, thank you for bringing me up to date. I tend to work at the lower
end where you are happy if the parts *have* multipliers. lol

--

Rick
 
I've had to work at the low end, where the part is always full and I have to fake multiplication with lookup tables. Now I'm at the other end, where the volumes are low and the customer doesn't care about FPGA price so the parts are huge. They must cost a fortune. I still waste a lot of time on PAR issues, but it's wonderful having more gates, DSPs, and blockRAMs than I could ever need.
 
On 10/3/2015 3:42 PM, Kevin Neilson wrote:
> I've had to work at the low end, where the part is always full and I have to fake multiplication with lookup tables. Now I'm at the other end, where the volumes are low and the customer doesn't care about FPGA price so the parts are huge. They must cost a fortune. I still waste a lot of time on PAR issues, but it's wonderful having more gates, DSPs, and blockRAMs than I could ever need.

Personally I enjoy the challenge of fitting tight designs. To me trying
to get a part to meet timing is not as much fun as getting a part to fit
the device. I find timing analysis to be very tedious as you get
literally hundreds of failed path reports from what are basically the
same endpoints, just many variations. This makes it hard to see the
next longer path that is also failing. Reminds me of debugging a
program one mistake at a time in the old days when my first pass would
have many bugs... and the new days too sometimes. lol

Fitting can have very interesting tradeoffs. Often they are algorithmic
and require learning new ways of calculating results. I find that very
interesting.

--

Rick
 
