FIR filter running out of FPGA memory in stratix ep1s60

Wilhelm Klink

Guest
I've got an FIR design that runs out of FPGA memory in an EP1S60 when
I set the data width to 24 bits (the design fits with a data width of
16 bits). However, only 13% of the total memory is used. I assume the
problem is that I have lots of small memories, and they cannot share
the same memory blocks (M512, M4K, M-RAM). Can anyone who has
experienced this problem share their strategies for dealing with it?
 
After fitting, look at the Fitter report (Resource Section --> Fitter Resource
Usage Summary).
There you can see both the total memory bits used and the number of complete
M4K memory blocks used.

Rgds
 
You'll need to provide more details as to how you set up the memory as
well as the filter. If the sample rate is one clock per sample, then it
is not really appropriate to use the memory because you are using only one
location per memory (and wasting the rest).

What is the ratio of your data rate to the clock?
How many taps is your filter?
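
One way to read the point about clocks per sample, as a rough sketch only: if a memory can be accessed once per clock, the number of locations it can usefully serve per sample period is the clock-to-sample-rate ratio. The function names and the example rates below are hypothetical, and the single-access-per-clock model is an assumption, not something stated in the thread.

```python
def locations_per_memory(clock_hz, sample_hz):
    """Locations one memory can serve per sample period.

    Assumes (hypothetically) one memory access per clock cycle.
    """
    if sample_hz <= 0 or clock_hz < sample_hz:
        raise ValueError("clock rate must be at least the sample rate")
    return clock_hz // sample_hz


def memories_for_taps(taps, clock_hz, sample_hz):
    """Memories needed to hold `taps` delay elements under that model."""
    per_memory = locations_per_memory(clock_hz, sample_hz)
    return -(-taps // per_memory)  # ceiling division


# Hypothetical numbers: a 64-tap filter clocked at 100 MHz.
print(memories_for_taps(64, 100_000_000, 100_000_000))  # one clock per sample
print(memories_for_taps(64, 100_000_000, 12_500_000))   # eight clocks per sample
```

With one clock per sample every tap needs its own memory, each using a single location; with eight clocks per sample the same taps pack into an eighth as many memories.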


--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759
 
After viewing the fitter RAM summary details I can say the following:

The filter is a cascaded polyphase FIR. There are three stages, and
the first stage decimates by 20, so it has 20 polyphase arms. Each
polyphase arm comprises a distributed arithmetic unit. The samples
are distributed as parallel data to each polyphase arm, and then
serialised.

My current interface at the input of each polyphase arm
uses 3 large registers of size nis*data_width, where nis = number of
interleaved streams = 8. With data_width = 32 each register is 256
bits, so each arm needs 3 x 256 = 768 bits, and multiplying this by
the number of polyphase arms across all cascaded stages gives a very
large number. Because 1 FF = 1 LE, this takes up heaps of LEs, so I
decided to implement these registers in memory.

In my data_width = 16 implementation (which fits in the
device) these registers are 128 bits in size, and each constitutes a
depth-1, width-128 memory. Clearly depth-1 memories will result in poor
use of memory resources. The maximum widths are M512 = 18, M4K = 36 and
M-RAM = 144, so I'd expect each register to require 4 x M4Ks, or 8 x
M512s. Surprisingly, according to the fitter RAM summary, one of the
worst offending 128-bit registers used 54 x M4Ks and 8 x M512s.

Regardless of this problem I see that it was a BAD idea to fully
parallelise the data in the input interface of each polyphase arm
(seemed the easiest way at the time though).
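
The register arithmetic above can be checked with a short sketch. The block widths are the ones quoted in the post; the per-block capacities (576, 4608 and 589824 bits, parity included) are not from the post and are an assumption to be verified against the Stratix handbook. The function names are hypothetical, and `blocks_needed` ignores depth, which is only reasonable for shallow memories such as these depth-1 registers.

```python
from math import ceil

# Maximum data width per block type, as quoted in the post.
MAX_WIDTH = {"M512": 18, "M4K": 36, "M-RAM": 144}

# Bits per block including parity. ASSUMPTION: not given in the post.
BLOCK_BITS = {"M512": 576, "M4K": 4608, "M-RAM": 589824}


def register_bits(nis, data_width):
    """Size of one interface register: interleaved streams x data width."""
    return nis * data_width


def blocks_needed(width, block_type):
    """Blocks needed to cover `width` bits in parallel (depth ignored)."""
    return ceil(width / MAX_WIDTH[block_type])


def utilisation(width, depth, block_type):
    """Fraction of the occupied blocks' bits that actually hold data."""
    used_blocks = blocks_needed(width, block_type)
    return (width * depth) / (used_blocks * BLOCK_BITS[block_type])


width = register_bits(nis=8, data_width=16)  # the 128-bit register
for block_type in MAX_WIDTH:
    n = blocks_needed(width, block_type)
    u = utilisation(width, 1, block_type)
    print(f"{block_type}: {n} block(s), {u:.2%} of their bits used")
```

Under these assumptions a depth-1, 128-bit register occupies 4 M4Ks or 8 M512s while using well under 5% of the bits in those blocks, which is consistent with running out of blocks at a low overall bit utilisation.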

 
Correction: I realise that the 54 x M4Ks and 8 x M512s memory usage
must be due to sharing of memories.

 
