Altera's altsyncram MAXIMUM_DEPTH

Peter Sommerfeld · Nov 17, 2003

What does this generic mean?

I am wondering if I am missing out on a possible memory optimization.

Altera's docs are decidedly vague and a search on their website brings up nothing.

-- Pete

Manfred Mücke · Nov 17, 2003

Hi Peter,

I am wondering if I am missing out on a possible memory optimization.
Yes, you are.

Quartus allocates memory depth-first; a 512x8bit memory therefore uses two
M4Ks in 512x4 mode. If your memory width and depth are powers of two,
allocation order doesn't matter except for some speed details. But a 700x8bit
memory is much better allocated by width than by depth (because only 3 M4Ks
are needed for the former compared to 4 for the latter). (see
http://www.altera.com/support/kdb/rd03292002_9305.html for further details)
MAXIMUM_DEPTH should help you to force Quartus not to waste this additional
memory block.

Unfortunately it doesn't work. Not even the way Altera thinks it should
work. I had a long (and somewhat bizarre) service request, the last entry
of which was the following:

-- Altera wrote
This is to let you know that a software problem request has been filed in
order to reflect this issue. I will let you know as soon as the software
group gets back to me with any information or when a resolution is made.
-- Altera wrote much more, but [snip]

This was written on the 25th of August, and the service request was closed
without further comment. About one month ago I posted an additional request
asking for the current state of the problem request and did not receive
any answer. Either Altera doesn't care or they don't want to state that
this is an issue at present before they are able to ship the new Quartus
4.0 (hopefully fixing this and a lot of other things). Who knows?
If anyone in the group thinks he can help on this topic or has further
details, I would be thankful to hear about it, as Quartus wastes a lot of my
memory and this has to change!

I have to say that life with Altera mySupport is a very mixed experience for
me. Answers are generally quick and friendly (which is already a lot) but
generally only helpful when problems are simple. Whenever the problem gets
more complex or there is a bug, things get very slow (or even stop).

Regards, Manfred

BTW: "Release notes for Service Pack 2 will be released on Friday, October
24, 2003." (seen on
https://www.altera.com/support/software/download/service_packs/quartus/dnl-qii30sp2.jsp
on the 17th of November)

======= Service Request Detail (reordered for your convenience)
Request #: 10363308 Status: Closed Date Opened (PDT): 8/19/03 9:03 AM
Date Closed (PDT): 9/4/03 6:52 PM Inquiry Type: Product Question

Device Family: CYCLONE Device:
Title: FIFO implementation size

Description: I have created a 1300-word by 8-bit FIFO (sfifo). The
implementation of this needs 16384 memory bits. Why?

The FIFO size should result in about 1300x8=10400 memory bits. As the
block size of the embedded RAM in Cyclone is 4096 bits, which can be
organized as 512x8, I expect Quartus to use three M4Ks, resulting in
4096*3=12288 bits. Obviously it uses a fourth block. Why?

Regards, Manfred
------ 8/19/03 3:17 PM
To Customer
Hello Manfred,

This is to let you know that I am currently looking into this. I will let
you know as soon as I am able to verify the problem as you have described
and come to a resolution.

------ 8/19/03 4:20 PM
To Customer
Hello Manfred,

Since 1300 is larger than 1k, it'll use 2kx2 mode for best performance. To
get the x8 mode you'll need 4 M4Ks. Click Custom on page 6 of 8 of the
MegaWizard; you then get an option to set the maximum depth. If you set it
to 512, it'll use that mode and should only need 3 M4Ks.

For more information on this, you may refer to the following link:

http://www.altera.com/support/kdb/rd03292002_9305.html

------ 8/20/03 12:36 AM
From Customer
Hello Marlon,

thanks for your quick and helpful reply. Now the behaviour of Quartus is
clear to me.
Unfortunately setting the parameter max. block depth to 512 in the
Megawizard Plug-In Manager as you proposed does not result in a smaller
memory consumption. I have attached the packed project for your
convenience.
Setting this parameter adds the following line in the scfifo instantiation
code: maximum_depth => 512,
however this parameter is not described in the Quartus II help page for the
scfifo-Megafunction. Why?

Regards, Manfred

------ 8/20/03 9:47 AM
To Customer
Hello Manfred,

The MAXIMUM_DEPTH parameter is an internal parameter so there won't be any
information on this in the Quartus II Help or Megawizard.

------ 8/20/03 11:26 PM
From Customer
Hello Marlon,

again: Unfortunately setting the parameter max. block depth to 512 in the
Megawizard Plug-In Manager as you proposed does NOT result in a smaller
memory consumption. Why? Please check with the attached project file.

Regards, Manfred
------ 8/21/03 5:08 PM
To Customer
Hello Manfred,

Sorry for the inconvenience, but actually, in order to get the x8 mode
you'll need 4 M4Ks.

------ 8/21/03 11:49 PM
From Customer
Hello Marlon,

could you please specify why it is not possible to implement a 1300x8 FIFO
in 3 M4K Blocks as this information is the opposite of both your first
advice and the mentioned support database page
(http://www.altera.com/support/kdb/rd03292002_9305.html).
What exactly is the parameter maximal block depth for then?

Regards, Manfred

------ 8/25/03 6:50 PM
To Customer
Hello Manfred,

This is to let you know that a software problem request has been filed in
order to reflect this issue. I will let you know as soon as the software
group gets back to me with any information or when a resolution is made.

Subroto Datta · Nov 17, 2003

MAXIMUM_DEPTH controls the underlying RAM block size that will be used
to construct the user's altsyncram memory. By default, the altsyncram
megafunction will round up the memory depth to the next power-of-2,
and use that as a RAM block size. For example, if you ask for a
3K-word memory, altsyncram will normally construct it from 4K RAM
blocks, because this gives the best performance. If you are running
short of RAM blocks, you could specify MAXIMUM_DEPTH=1024 for this
example, and the altsyncram megafunction will construct the 3K memory
from 1K-word RAM blocks, which might potentially use 1/4 fewer RAM
blocks. The penalty for doing this is that the 3K-word memory
constructed from 1K-word RAM blocks will need LEs to mux and de-mux
the data, and will also run slower as a result.

In summary, MAXIMUM_DEPTH is a control to increase memory efficiency
for non-power-of-2 memory depths, but at a cost of lower memory
performance, and a few LEs to stitch the smaller RAM blocks together.
MAXIMUM_DEPTH can only take power-of-2 values, with 32 being the
smallest meaningful value, since it corresponds to the shallowest M512
memory block configuration.

- Subroto Datta
Altera Corp.

Peter Sommerfeld · Nov 18, 2003

Hi Manfred, Subroto:

Thank you very much for your in-depth replies. I'm happy to see that
MAXIMUM_DEPTH does what I was hoping it does, because I need many RAMs
with non-power-of-2 sizes, and I'm feeling a little too lazy to
write my own muxing logic.

Manfred, I compiled a design that had one depth-first and one
width-first RAM block, each being 1,089 x 32 bits. The depth-first
used 16 M4Ks as 4096x2, and the width-first used 9 M4Ks as 128x32,
so the functionality appears to be working for me. Perhaps certain
memory configurations work properly with MAXIMUM_DEPTH, while others
(i.e. yours) do not?

As expected the critical path was in the width-first logic, but was
still 220 MHz+.
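
For anyone who wants to see the whole trade-off for a memory like this, here is a small sweep over possible slice depths. It is a back-of-the-envelope model only, assuming 4096 data bits per M4K and power-of-2 slice depths from 4096 down to 128; the helper name blocks_per_slice_depth is hypothetical.

```python
import math


def blocks_per_slice_depth(depth: int, width: int, block_bits: int = 4096) -> dict[int, int]:
    """Map each candidate slice depth to the number of blocks it would need."""
    result = {}
    slice_depth = block_bits
    while slice_depth >= 128:  # assumption: 128 x 32 is the shallowest M4K mode
        slice_width = block_bits // slice_depth
        rows = math.ceil(depth / slice_depth)
        cols = math.ceil(width / slice_width)
        result[slice_depth] = rows * cols
        slice_depth //= 2
    return result


sweep = blocks_per_slice_depth(1089, 32)
for slice_depth, count in sweep.items():
    print(f"{slice_depth:5d} deep slices: {count:2d} M4Ks")
```

In this model the 16-block result corresponds to 2048x2 slices and the 9-block result to 128x32 slices, so it agrees with the block counts reported above.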

I am using Quartus II 3.0 SP2. I found the release notes at
http://www.altera.com/literature/rn/rn_qts.pdf.

Thanks again,

-- Pete

Subroto Datta · Nov 19, 2003

Hi Manfred, Peter,

The MAXIMUM_DEPTH description that was posted in my previous reply
applies to the altsyncram megafunction, and indirectly to scfifo and
dcfifo megafunctions. The FIFO megafunctions do not support
non-power-of-2 depths, so the memory example I gave does not apply.
In Quartus II 4.0, the FIFO MegaWizard plug-in will not allow you to
enter non-power-of-2 depths.
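
This would explain the numbers in the service request, if one assumes that the FIFO megafunction rounds the requested depth up to the next power of two. The thread does not state the rounding explicitly, but it matches the 16384 memory bits reported for the 1300 x 8 FIFO. A minimal sketch of that arithmetic, with a made-up helper name and parity bits ignored:

```python
import math

M4K_DATA_BITS = 4096  # assumption: data bits per M4K, parity bits ignored


def fifo_footprint(requested_depth: int, width: int) -> tuple[int, int, int]:
    """Return (rounded_depth, memory_bits, minimum_m4k_blocks)."""
    # Assumed behaviour: depth is rounded up to the next power of two.
    rounded_depth = 1 << (requested_depth - 1).bit_length()
    memory_bits = rounded_depth * width
    return rounded_depth, memory_bits, math.ceil(memory_bits / M4K_DATA_BITS)


print(fifo_footprint(1300, 8))  # (2048, 16384, 4)
```

If the FIFO really is 2048 words deep internally, no slice depth can bring it below four M4Ks, which is consistent with MAXIMUM_DEPTH=512 making no difference in that case.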

The only reason for specifying a MAXIMUM_DEPTH parameter in a FIFO
megafunction in pre-4.0 versions of Quartus would be to enforce a
smaller RAM block size to give added freedom to the fitter.
MAXIMUM_DEPTH values of 128, 256, and 512 can fit in either M512
blocks or M4K blocks. A MAXIMUM_DEPTH value of 4096 can fit in either
an M4K block or an M-RAM.

Here's an example: I have a 2K word FIFO, and I don't care if it goes
into M4K blocks or M512 blocks. If I set MAXIMUM_DEPTH=512, the FIFO
will be constructed from 512-word RAM slices, which gives the fitter
the flexibility to place the FIFOs in either M512 blocks or M4K
blocks.
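
The fitter freedom described here can be illustrated with a small lookup. The mode tables below are written from memory of the Stratix TriMatrix documentation (non-parity modes only) and should be treated as an assumption to check against the device handbook; candidate_blocks is a hypothetical helper, not anything Quartus provides.

```python
# Slice depth -> data width for each block type (non-parity modes only).
# Assumption: values recalled from the Stratix documentation, not from this thread.
BLOCK_MODES = {
    "M512": {512: 1, 256: 2, 128: 4, 64: 8, 32: 16},
    "M4K": {4096: 1, 2048: 2, 1024: 4, 512: 8, 256: 16, 128: 32},
    "M-RAM": {65536: 8, 32768: 16, 16384: 32, 8192: 64, 4096: 128},
}


def candidate_blocks(maximum_depth: int) -> list[str]:
    """Block types that offer a mode with exactly this slice depth."""
    return [name for name, modes in BLOCK_MODES.items() if maximum_depth in modes]


for depth in (32, 128, 256, 512, 4096):
    print(depth, candidate_blocks(depth))
```

With these tables, depths of 128, 256 and 512 map to both M512 and M4K, and 4096 maps to both M4K and M-RAM, matching the values listed in the post.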

- Subroto Datta
Altera Corp.

Manfred Mücke · Nov 20, 2003

Hi Subroto,

The FIFO megafunctions do not support non-power-of-2 depths, so the
memory example I gave does not apply.

This is a very clean answer to a very long service request issue. It would
have saved me a lot of time to get the very same answer from Altera
mySupport. Instead they left me with a dangling service request and the
information that there is a potential bug in Quartus. Are you able to look
into that, or to share your knowledge with your support team? I would
appreciate getting an official answer from mySupport that really closes my
service request.
BTW: Why do you restrict FIFO depths to powers of two? That would allow
trading memory usage versus implementation speed (like with altsyncram).

Regards, Manfred

Mike Treseler · Nov 20, 2003

Manfred Mücke wrote:

BTW: Why do you restrict FIFO depths to powers of two? That would allow
trading memory usage versus implementation speed (like with altsyncram).

Probably because FIFO storage is based on a ram,
and ram comes in increments of one address bit.

As Subroto said, the extra space from altsyncram MAXIMUM_DEPTH
to the top could not be used as RAM in any case.

-- Mike Treseler

Manfred Mücke · Nov 21, 2003

BTW: Why do you restrict FIFO depths to powers of two? That would allow
trading memory usage versus implementation speed (like with altsyncram).

Probably because FIFO storage is based on a ram,
and ram comes in increments of one address bit.

True as long as the size of the memory/FIFO is smaller than the memory
blocks available in the device. A Cyclone, for example, uses M4K memory
blocks with 4096 bits each (as the name suggests). So for RAM/FIFOs <4096 bits
you will always pay with a full M4K (as long as they are implemented in
memory blocks), but for RAM/FIFOs >4096 bits the M4K block is the smallest
building unit, allowing you to implement a RAM/FIFO using 3*4096=12288 bits
from 3 M4K blocks (depending on the FIFO width). Because address decoding
is easier when aligning by depth, and to improve speed, it can make sense to
use more (four in our example) M4K blocks, wasting some memory, but it is by
no means a necessity.
This is a limitation which does not apply to RAM but only to FIFOs and will
be introduced in Quartus 4.0, as Subroto said. However, RAM and FIFOs are
both implemented in the very same memory blocks, so it's up to the
Wizard/Module Designer to allow or restrict the depth. Restricting FIFO
depths to powers of two is a choice, but as long as there is no special
FIFO-RAM block it is not a must. My question was why this limitation, which
restricts potential savings on memory bit consumption, will be introduced.

Regards, Manfred

H. Peter Anvin · Nov 23, 2003

Followup to: <opryznd7oggdoir8@news.inode.at>
By author: Manfred Mücke <manfred.getmuecke@ridgmxof.thisat>
In newsgroup: comp.arch.fpga

There is another issue, which is that the RAMs are actually 4608 bits,
not 4096. I have seen Quartus refuse to use those extra bits in
situations where it could have, because it prefers to organize by
depth, and there is apparently no way to work around this.

I would really like to see:

(a) support of non-power-of-two memory sizes;
(b) ability to optimize for RAM consumption at the expense of timing.

This in particular was an issue when I tried to create a 16384 x 9 bit
ROM, and yes, I needed all 9 bits...
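
To put numbers on the ROM case, here is a rough comparison of a depth-first and a width-first layout. It assumes the M4K offers a 4096 x 1 mode and a 512 x 9 mode (the latter using the parity bits), and that a depth-first allocation would pick 4096 x 1 slices; the post does not say how many blocks Quartus actually used, so this is illustration only.

```python
import math

M4K_TOTAL_BITS = 4608  # bits per M4K including parity, as stated in the post


def blocks(depth: int, width: int, slice_depth: int, slice_width: int) -> int:
    """Blocks needed when tiling depth x width with slice_depth x slice_width slices."""
    return math.ceil(depth / slice_depth) * math.ceil(width / slice_width)


rom_depth, rom_width = 16384, 9

depth_first = blocks(rom_depth, rom_width, 4096, 1)  # parity bits left unused
width_first = blocks(rom_depth, rom_width, 512, 9)   # parity bits used
ideal = math.ceil(rom_depth * rom_width / M4K_TOTAL_BITS)

print(depth_first, width_first, ideal)  # 36 32 32
```

Under these assumptions the width-first layout reaches the theoretical minimum of 32 blocks, while the depth-first layout needs 36.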

-hpa
--
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
If you send me mail in HTML format I will assume it's spam.
"Unix gives you enough rope to shoot yourself in the foot."
Architectures needed: ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64

Manfred Mücke · Dec 6, 2003

Hi Subroto,

I would like to renew my question: Why do you restrict FIFO depths to
powers of two? I can't see the need for that.

Regards, Manfred

Subroto Datta · Dec 7, 2003

Hi Manfred,
There is no real need to restrict it. There have been several requests
to relax this condition and we will get to it in a future release.

- Subroto Datta
Altera Corp.
Ben Twijnstra · Dec 9, 2003

Hi Manfred,

I would like to renew my question: Why do you restrict FIFO depths to
powers of two? I can't see the need for that.

The dual-clock FIFO internally uses a Gray counter, which is fairly trivial
to write for a power of two, plus the fact that counter rollover happens
with a single-bit transition as well.

The Gray counter greatly reduces the risk of the Other Side (the one in
the different clock domain) seeing inaccurate counter values: the count is
either the same, or only one bit has changed. For a normal counter, due to
variations in the delay path between the various counter bits, part of the
logic in the other clock domain might see a number of counter bits still
having the old value, and a number that has the new value, resulting in a
nonsense value. When there's a large difference between reader and writer
clock frequencies, there may be not a single bit transition, but at least
the number of transitions is minimized over time.

I haven't studied Gray counters deeply enough to see whether it's feasible,
or even possible to write a Gray counter generator algorithm that can
_efficiently_ do single-bit-transition counter rollover on an arbitrary
(though pre-computed) value. If this is possible without going into long
combinatorial chains (which would reduce operating frequency) it should
definitely be feasible to remove this power-of-two restriction.
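
The rollover problem can be checked with the standard binary-reflected Gray code. The sketch below is generic and says nothing about how the dcfifo megafunction is actually built; the function names are made up. It counts how many bits change between consecutive counter values, including the wrap from the last value back to zero.

```python
def to_gray(n: int) -> int:
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)


def bits_changed(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def worst_transition(modulus: int) -> int:
    """Largest number of bits that flip in one step of a Gray counter
    counting 0 .. modulus-1 and then wrapping back to 0."""
    return max(
        bits_changed(to_gray(i), to_gray((i + 1) % modulus))
        for i in range(modulus)
    )


print(worst_transition(16))    # power of two: every step, including rollover, flips 1 bit
print(worst_transition(12))    # truncated count: the rollover flips several bits
print(worst_transition(1300))
```

For a power-of-2 modulus every step flips exactly one bit. Simply truncating the same sequence at 12 or 1300 makes the wrap-around step flip several bits at once, which is the multi-bit sampling hazard described above.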

For the single-clock version - hey, why not?

Just my $.02

Ben Twijnstra
