RAM in Altera EABs and Xilinx Block Rams

rickman · Jun 12, 2004

I am using RAM in a processor design and I am having trouble
understanding exactly how best to use these functions for my design. I
will be using them to implement stacks, program memory and data memory.
Ideally the write function will look like an addressable register where
the address, data and enables are setup prior to the clock and the write
happens on the clock edge. The read should be async so that I can
provide an address and get data after a delay.

The Altera part is an EP1K50 where the EAB read can be async. The write
however is only shown as either fully async or fully registered. I
recall that I was warned when reading and writing the same address the
data out has a longer delay. But I can't seem to find a reference to
that. I am also unclear if I can use the write the way I want or if it
requires input registers.

The Xilinx part is an XC3S400 with dual port block rams. It seems like
the read path must be registered as well as the write path. I think I
could live with that if I could read the data that is being written (top
of stack) in the same clock cycle. But I believe the docs say that the
other port can either read the old data or is invalid. But then I may
be able to use a single port ram for a stack. The address would always
be pointing to the current TOS and as soon as a new value were pushed,
the next clock edge would read the new data as it is written to the new
address.

I don't want to pipeline anything in this design to keep it very
simple. Right now the design is pretty clean and the delay paths are
pretty short.

Can anyone clarify how these rams work without pipelining?

--

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design URL http://www.arius.com
4 King Ave 301-682-7772 Voice
Frederick, MD 21701-3110 301-682-7666 FAX

Rajeev · Jun 14, 2004

rickman <spamgoeshere4@yahoo.com> wrote in message news:<40CB4C4A.9FB9CD47@yahoo.com>...

I am using RAM in a processor design and I am having trouble
understanding exactly how best to use these functions for my design.

Rick,

I wish I had something more constructive to offer... I have a Stratix
design and I use read latency of 2 cycles everywhere (one for address in,
one for data out.) While one can eliminate the data output register it
adds enough ns that it's just not worth it.
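Rajeev's two-cycle figure counts one edge for the registered address and one for an extra register after the data output. The toy Python model below makes that count explicit. The class and function names are hypothetical, and it models clock edges only, not the nanoseconds he is trading against.

```python
class SyncReadRam:
    """Sketch of a block RAM read path: the address is always registered,
    and an extra register on the data output is optional (hypothetical names)."""

    def __init__(self, contents, output_register=False):
        self.mem = list(contents)
        self.output_register = output_register
        self.do = 0  # RAM data out, valid after the edge that captured the address
        self.q = 0   # optional data-out register, one edge behind self.do

    def clock(self, addr):
        """One rising edge; returns what the surrounding logic sees afterwards."""
        self.q = self.do          # output register samples the previous DO
        self.do = self.mem[addr]  # address register captures addr, RAM drives DO
        return self.q if self.output_register else self.do


def read_latency(output_register):
    """Count edges from presenting an address until its data is visible."""
    ram = SyncReadRam([0, 0, 0, 42], output_register)
    edges = 1
    seen = ram.clock(3)
    while seen != 42:
        seen = ram.clock(3)
        edges += 1
    return edges
```

With output_register=False the data is visible after the same edge that captured the address; the extra register buys timing slack at the price of a second edge.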

I can't help noticing the (huge?) disparity between the 1K50 and the
3S400, and am surprised that you're still using the ACEX parts. In that
vein, I'm carrying around the notion that _all_ newer FPGAs are or will
require registered ports... so why not bite the bullet and go synchronous?

<snip>

I don't want to pipeline anything in this design to keep it very
simple. Right now the design is pretty clean and the delay paths are
pretty short.

I'm also not sure from your post whether "pipelined" is synonymous with
"registered", i.e., whether you're trying to do something like one instruction per
clock cycle and/or you can't tolerate the 2 ticks latency.

Also, what's your desired clock speed?

Regards,
-rajeev-

roller · Jun 14, 2004

"rickman" <spamgoeshere4@yahoo.com> wrote in message
news:40CB4C4A.9FB9CD47@yahoo.com...

I am using RAM in a processor design and I am having trouble
understanding exactly how best to use these functions for my design. I
will be using them to implement stacks, program memory and data memory.
Ideally the write function will look like an addressable register where
the address, data and enables are setup prior to the clock and the write
happens on the clock edge. The read should be async so that I can
provide an address and get data after a delay.

The Altera part is an EP1K50 where the EAB read can be async. The write
however is only shown as either fully async or fully registered. I
recall that I was warned when reading and writing the same address the
data out has a longer delay. But I can't seem to find a reference to
that. I am also unclear if I can use the write the way I want or if it
requires input registers.

The Xilinx part is an XC3S400 with dual port block rams. It seems like
the read path must be registered as well as the write path. I think I
could live with that if I could read the data that is being written (top
of stack) in the same clock cycle. But I believe the docs say that the
other port can either read the old data or is invalid. But then I may
be able to use a single port ram for a stack. The address would always
be pointing to the current TOS and as soon as a new value were pushed,
the next clock edge would read the new data as it is written to the new
address.

I don't know exactly how the Spartan3 is related to the Spartan2, but it
might help you; check this out:

http://toolbox.xilinx.com/docsan/xilinx4/data/docs/lib/dsgnelpr5.html

It says that when you write data, one of the ports reads what you're
writing. From the Coregen options I'd guess that you can also set it up as
read-after-write (this one) or write-after-read (which would read the
previous contents, and then write).

I don't want to pipeline anything in this design to keep it very
simple. Right now the design is pretty clean and the delay paths are
pretty short.

Can anyone clarify how these rams work without pipelining?

Coregen asks you about that too, but the link I gave you doesn't mention
anything. Though, if I recall correctly, I also read (somewhere on the Xilinx
site) that the latency depends on the size of the RAM: bigger ones get 2
cycles of latency, but smaller ones can get 1 cycle, I think. (Sorry, I don't
have a link.)

Peter Alfke · Jun 14, 2004

Xilinx (Virtex2 or Spartan3) BlockRAM reading while writing:
Any write operation also performs a read, and outputs it on the Do output.
The user can choose: write before read (= output the data that is being
written), or read before write (= output the previous content that is now
being overwritten), or "no change" (keep the old data on the Do lines).

Peter Alfke
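Peter's three options can be sketched as a behavioral model of a single port. This is an illustrative Python sketch, not the Xilinx simulation model: the names BlockRamPort and clock() are made up, and the DO register is assumed to power up as 0.

```python
from enum import Enum


class WriteMode(Enum):
    WRITE_FIRST = "WRITE_FIRST"  # DO shows the data being written
    READ_FIRST = "READ_FIRST"    # DO shows the old contents being overwritten
    NO_CHANGE = "NO_CHANGE"      # DO keeps its previous value during a write


class BlockRamPort:
    """Behavioral sketch of one synchronous BlockRAM port (not a Xilinx model).

    Address, data-in and write enable are sampled at clock(); the DO output
    register is updated at that same edge. Between calls nothing changes.
    """

    def __init__(self, depth: int, mode: WriteMode = WriteMode.WRITE_FIRST):
        self.mem = [0] * depth
        self.mode = mode
        self.do = 0  # assumed power-up value of the output register

    def clock(self, addr: int, di: int = 0, we: bool = False) -> int:
        """Apply one active clock edge and return DO as seen after the edge."""
        old = self.mem[addr]
        if not we:
            self.do = old  # plain read: a single edge puts the data on DO
        else:
            self.mem[addr] = di
            if self.mode is WriteMode.WRITE_FIRST:
                self.do = di
            elif self.mode is WriteMode.READ_FIRST:
                self.do = old
            # NO_CHANGE: leave self.do untouched
        return self.do


if __name__ == "__main__":
    for mode in WriteMode:
        port = BlockRamPort(16, mode)
        port.clock(5)                 # read an unwritten location, DO = 0
        port.clock(3, di=7, we=True)  # first write to address 3
        print(mode.name, port.clock(3, di=9, we=True))
```

Running it, WRITE_FIRST reports 9 (the new data), READ_FIRST reports 7 (the value being overwritten), and NO_CHANGE still reports 0 from the earlier read. A plain read takes the same single edge in every mode.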


Symon · Jun 14, 2004

Hi Rick,
I can offer my experiences with Xilinx blockram. You're correct that both
the read and write are synchronous. There are three write options,
WRITE_FIRST, READ_FIRST and NO_CHANGE. Carefully (!) read about these in the
data sheet. I use WRITE_FIRST almost exclusively, where the "same clock edge
that writes the data input (DI) into the memory also transfers DI into the
output registers DO".
When I did my processor design, I also used one as a stack. Like your design
I didn't use pipelining. This was to keep the design small and simple. On
the BlockRAM I used one port for PUSHING/POPPING registers, and the other
for CALL/RETURN subroutine addresses. The catch with these blockrams is
that, if you read from one port whilst you're writing to the *same* address
on the other port, the read data is indeterminate. This makes sense if you
think about what the BlockRAM is doing. Check out 'Conflict Resolution' in
the user guide (I'm looking at ug012 for V2PRO). This means for me that I
can't do a POP instruction immediately after doing a CALL subroutine, and I
can't do a RETURN immediately after doing a PUSH. No problem to avoid this
in the code, of course. It's a weird thing to do anyway.
The ModelSIM simulator also warns if conflicts occur and, of course,
simulates the RAM accurately.
Good luck!
Cheers, Syms.

Peter Alfke · Jun 14, 2004

Here is the official Xilinx text (I just rewrote this for the new User
Guide).
Conflict Avoidance.
Virtex-2 BlockRAM is a true dual-port RAM where both ports can access any
memory location at any time. When accessing the SAME MEMORY LOCATION from
both ports, the user must, however, observe certain restrictions, specified
by the clock-to-clock set-up time window. See the following:

There are two fundamentally different situations:
The two ports either have a common clock ("Synchronous Clocking"), or the
clock frequency or phase is different for the two ports ("Asynchronous
Clocking").

Asynchronous Clocking is the more general case, where the active edges of
both clocks do not occur simultaneously:
There are no timing constraints when both ports perform a read operation on
the same location.
When one port performs a write operation, the other port must not read- or
write-access the same memory location by using a clock edge that falls
within the specified forbidden clock-to-clock set-up time window. (If this
restriction is ignored, a read operation might read unreliable data, perhaps
a mixture of old and new data in this location; a write operation might
result in wrong data stored in this location. There is, however, no risk of
physical damage to the device.)

Synchronous Clocking is the special case, where the active edges of both
port clocks occur simultaneously:
There are no timing constraints when both ports perform a read operation.
When one port performs a write operation, the other port must not write into
the same location, unless both ports write identical data.
When one port performs a write operation, the other port can reliably read
data from the same location if the write port is in READ_FIRST mode.
DATA_OUT will then reflect the previously stored data.

If the write port is in either WRITE_FIRST or in NO_CHANGE mode, then the
DATA_OUT on the read port would become invalid (unreliable). Obviously, the
read port's mode setting does not affect this.

June 2004 Peter Alfke (this text has not yet been posted on xilinx.com)
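The synchronous-clocking rules can be written down as a small checker. This is a hypothetical Python sketch of the rules as stated in the text above, with None standing in for "unreliable"; the real hardware raises no flag, it may simply deliver bad data.

```python
from enum import Enum
from typing import List, Optional


class WriteMode(Enum):
    WRITE_FIRST = 1
    READ_FIRST = 2
    NO_CHANGE = 3


def read_during_write(mem: List[int], write_mode: WriteMode,
                      wr_addr: int, wr_data: int, rd_addr: int) -> Optional[int]:
    """One common clock edge: port A writes, port B reads.

    Returns port B's data output after the edge, or None when the rules
    say the value is unreliable. Only the WRITE port's mode matters here;
    the read port's mode setting does not enter into it.
    """
    old = mem[rd_addr]
    mem[wr_addr] = wr_data
    if rd_addr != wr_addr:
        return old  # different locations: no interaction
    if write_mode is WriteMode.READ_FIRST:
        return old  # reliable: the previously stored data
    return None  # WRITE_FIRST or NO_CHANGE on the write port: unreliable


def write_write_conflict(addr_a: int, data_a: int,
                         addr_b: int, data_b: int) -> bool:
    """Both ports write on the same edge: a conflict unless the locations
    differ or the data is identical."""
    return addr_a == addr_b and data_a != data_b
```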


John_H · Jun 14, 2004

Quoting Peter's text: "When one port performs a write operation,
the other port must not write into the same location, unless both ports
write identical data."

For a configuration with one port dedicated to reads and one port
dedicated to writes, which I *believe* rickman is pursuing, a little trick
could be used: feed the write data to *both* ports and enable the write on
the normally read-only port when a RdAddr==WrAddr compare is true. This
increases the effective address setup time but gives the desired
WRITE_FIRST functionality without increasing the clock-to-out time.
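A behavioral sketch of the bypass, assuming a common clock, port A as the write port and port B as the read port in WRITE_FIRST mode. The class name and arguments are hypothetical; in hardware the compare would drive port B's write enable, which is where the extra address setup time comes from.

```python
class BypassedDualPortRam:
    """Port A writes, port B reads (WRITE_FIRST), both on one common clock."""

    def __init__(self, depth: int):
        self.mem = [0] * depth
        self.do_b = 0  # port B data output register

    def clock(self, wr_addr: int, wr_data: int, we: bool, rd_addr: int) -> int:
        """Apply one common clock edge and return port B's output."""
        # The trick: when the addresses match, port B also writes wr_data.
        # Both ports then write identical data, which the rules allow.
        we_b = we and rd_addr == wr_addr
        old_b = self.mem[rd_addr]
        if we:
            self.mem[wr_addr] = wr_data  # port A write
        # Port B in WRITE_FIRST mode outputs its own DI when it writes;
        # otherwise it is an ordinary synchronous read of a location
        # that port A did not touch on this edge.
        self.do_b = wr_data if we_b else old_b
        return self.do_b
```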

rickman · Jun 17, 2004

Rajeev wrote:

I wish I had something more constructive to offer... I have a Stratix
design and I use read latency of 2 cycles everywhere (one for address in,
one for data out.) While one can eliminate the data output register it
adds enough ns that it's just not worth it.

I can't help noticing the (huge?) disparity between the 1K50 and the
3S400, and am surprised that you're still using the ACEX parts. In that
vein, I'm carrying around the notion that _all_ newer FPGAs are or will
require registered ports... so why not bite the bullet and go synchronous?

In my design it adds a clock cycle delay to have a register on the data
out side of the RAM. So that slows things down a lot. I am using the
ACEX parts because I need the 5 volt tolerance that has been left behind
by the newer parts. For that function, they work very well.

I'm also not sure from your post whether "pipelined" is synonymous with
"registered", i.e., whether you're trying to do something like one instruction per
clock cycle and/or you can't tolerate the 2 ticks latency.

Yes, if you have more than one register in the fetch-decode-execute
cycle, then more than one clock cycle is needed and if you want to start
a new instruction on every clock (as I do) it would have to be
pipelined. Non-pipelined MCUs are *much* simpler and not necessarily
slower in the time to execute any given instruction. Pipelining only
lets you add more hardware to overlap execution of multiple
instructions. You also don't have to deal with throwing away prefetches
if you don't pipeline.

After looking at the structure of the Xilinx Spartan 3 block rams, I see
that I can't escape the output register. But seeing the mode where the
read is done post-write I realized that I can add a mux and an output
register which will always reflect the top of the stack without a read
delay! I am still not certain it will work ok in the Xilinx part, but
this works great in the Altera parts and it speeds up the cycle time a
lot. I can decode and execute the current instruction and fetch the
next instruction in no more than two levels of logic and one RAM delay
per clock cycle. I expect this to run at 60 to 80 MHz without too much
trouble. If I work on optimizing the placement and routing, I might
even get 100MHz out of this.

rickman · Jun 17, 2004

roller wrote:

I don't know exactly how the Spartan3 is related to the Spartan2, but it
might help you; check this out:

http://toolbox.xilinx.com/docsan/xilinx4/data/docs/lib/dsgnelpr5.html

It says that when you write data, one of the ports reads what you're
writing. From the Coregen options I'd guess that you can also set it up as
read-after-write (this one) or write-after-read (which would read the
previous contents, and then write).
previous contents, and then write)

Yes, I saw that. It gave me an idea of how I can deal with the read
delay in the Altera part. But I believe the Xilinx part still gives you
a two clock delay on reading the new data. I am using the RAM for
stacks among other things. So I can use a separate register to always
hold the top of stack. But if it pushes to the stack on one clock cycle
and on the next clock cycle pops, the data on the output of the Xilinx
RAM is still stale. I guess I can use the dual port and always have the
read one address below the write.

rickman · Jun 17, 2004

Peter Alfke wrote:

Xilinx (Virtex2 or Spartan3) BlockRAM reading while writing:
Any write operation also performs a read, and outputs it on the Do output.
The user can choose: write before read (= output the data that is being
written), or read before write (= output the previous content that is now
being overwritten), or "no change" (keep the old data on the Do lines).

But it still has a two cycle delay from writing to read data out,
right? So if I want the data that was just written on the next clock
cycle (like in a stack) I need to use an external register and use
separate read and write addresses. Correct?

rickman · Jun 17, 2004

Symon wrote:

Hi Rick,
I can offer my experiences with Xilinx blockram. You're correct that both
the read and write are synchronous. There are three write options,
WRITE_FIRST, READ_FIRST and NO_CHANGE. Carefully (!) read about these in the
data sheet. I use WRITE_FIRST almost exclusively, where the "same clock edge
that writes the data input (DI) into the memory also transfers DI into the
output registers DO".
When I did my processor design, I also used one as a stack. Like your design
I didn't use pipelining. This was to keep the design small and simple. On
the BlockRAM I used one port for PUSHING/POPPING registers, and the other
for CALL/RETURN subroutine addresses. The catch with these blockrams is
that, if you read from one port whilst you're writing to the *same* address
on the other port, the read data is indeterminate. This makes sense if you
think about what the BlockRAM is doing. Check out 'Conflict Resolution' in
the user guide (I'm looking at ug012 for V2PRO). This means for me that I
can't do a POP instruction immediately after doing a CALL subroutine, and I
can't do a RETURN immediately after doing a PUSH. No problem to avoid this
in the code, of course. It's a weird thing to do anyway.
The ModelSIM simulator also warns if conflicts occur and, of course,
simulates the RAM accurately.

Sounds like we are doing similar things. I am not trying to share one
ram for two stacks though. In the Altera part, I can have an async read
and a clocked write all with the same address (single port). So
whenever I write (push) the data is available on the read output in the
second half of the next clock cycle. To shorten the delay I am adding
a mux and a register to hold the top of stack when the stack is written
and to get the second to top on pops (new top). Since this is
registered, I don't have to worry about the cascaded delays on the
address setup and the RAM read times. On a return instruction it would
have two RAM delays (return stack and instruction memory) and some three
or four LUT delays (decode, mux).

But with the Xilinx part, the two clock cycle thing really gets in the
way of implementing one clock cycle stacks. You can't even do a push
followed by a pop which is not at all uncommon... "1 2 add"... two
pushes followed by a pop. I can do the same muxed register trick I do
with the Altera part, but I have to have two addresses and use two
ports, one for read and one for write.
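The muxed top-of-stack register described above can be sketched at cycle level in Python, here for the Altera case of one shared address, a clocked write and an asynchronous read. Assumptions not stated in the thread: the stack pointer addresses the current top, the RAM also keeps a copy of the top (matching the push/pop/write definitions Rick gives later), and the pointer simply wraps at the RAM depth. All names are hypothetical.

```python
class TosStack:
    """Single-port RAM (clocked write, asynchronous read, one shared address)
    with a separate top-of-stack register loaded through a 2:1 mux."""

    def __init__(self, depth: int):
        self.ram = [0] * depth
        self.sp = 0   # address of the current top of stack
        self.tos = 0  # the registered top of stack seen by the ALU

    def clock(self, op: str, data: int = 0) -> int:
        """One clock cycle for op in {"push", "pop", "write", "nop"}."""
        depth = len(self.ram)
        if op == "push":
            self.sp = (self.sp + 1) % depth  # incremented address set up first
            self.ram[self.sp] = data          # write happens on the edge
            self.tos = data                   # mux selects the incoming data
        elif op == "pop":
            self.sp = (self.sp - 1) % depth  # decremented address
            self.tos = self.ram[self.sp]      # mux selects the async read data
        elif op == "write":
            self.ram[self.sp] = data          # modify top without moving sp
            self.tos = data
        elif op != "nop":
            raise ValueError(f"unknown op {op!r}")
        return self.tos
```

A push followed immediately by a pop (the "1 2 add" case) returns the right value because the pop reads one location below the pointer through the asynchronous read path within the same cycle.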

rickman · Jun 17, 2004

John_H wrote:

Quoting Peter's text from below, "When one port performs a write operation,
the other port must not write into the same location, unless both ports
write identical data."

For a one-port dedicated read and one-port dedicated write configuration
that I *believe* rickman is pursuing, a little trick could be used: feed
the data to *both* write ports and enable the write to the normally read-only
port when a RdAddr==WrAddr compare is valid. This increases the effective
address setup time but gives the desired WRITE_FIRST functionality without
increasing the Clk-to-out time.

To implement a stack you don't normally need separate read and write
ports since you only do one thing at a time. The Xilinx block RAMs
can't do a read in less than two clock cycles which gets in the way of a
stack. So I would need to use a separate register to hold the top of
stack and refresh that on POPs from the RAM using a separate read port
with a separate address. In that case there is never the problem of
simultaneous reads and writes to the same address because you only ever
do one thing at a time.

I have not thought about my program or data memory. I may really be
hosed there and have to abandon the one-clock-cycle instruction idea. I
guess I could use a double-rate clock or something similar. I believe the
Spartan 3 block RAMs are fast enough that I likely won't have a speed
issue even with a 2x clock.

Peter Alfke · Jun 17, 2004

Just to clarify Rickman's "Two-clock-cycle thing":
Xilinx BlockRAMs need ONE clock to perform any operation, be it a read or a
write. As a bonus, the write operation also performs a read operation on the
same location, showing either the old or the new data (user option).
And this is all on one port. You can obviously use the other port
independently from the first.
The one thing you cannot do is an asynchronous read without a clock edge.

If anybody has any questions about Xilinx BlockRAMs, I am more than happy to
explain.
Peter Alfke, Xilinx Applications

rickman · Jun 21, 2004

Peter Alfke wrote:

Just to clarify Rickman's "Two-clock-cycle thing":
Xilinx BlockRAMs need ONE clock to perform any operation, be it a read or a
write. As a bonus, the write operation also performs a read operation on the
same location, showing either the old or the new data (user option).
And this is all on one port. You can obviously use the other port
independently from the first.
The one thing you cannot do is an asynchronous read without a clock edge.

If anybody has any questions about Xilinx BlockRAMs, I am more than happy to
explain.

Perhaps I didn't understand the documentation. I think I got mixed up
in the description of the read port latches. Sometimes I forget the
distinction between latches and registers.

First, let me say that I am designing a stack using a single block ram.
My understanding is that I can use the RAM as either a single port ram
with a single address bus, a write data bus and a read data bus or a
dual port ram with two independent interfaces like the single port
interface.

Using the single port interface it appears to me that the address and
control signals are registered. Looking at the timing diagram for the
WRITE_FIRST option, I see that the data output changes with one clock
delay. So can I consider the register to be on the input side (address,
control) with the read data output using no register? I believe that
will work for a stack. When data is being pushed, the incremented
address is set up and the write is clocked in, while the data output is
steady until the clock edge (old top of stack). Following the clock
edge, the data written will be presented on the output (new top of
stack). To pop the stack, the address is decremented and a read is done
with the new data available following the clock edge (new top of
stack). A write (pop and push) is done by not changing the address and
registering a new write with the read data changing after the clock
edge.

Will the single port WRITE_FIRST ram mode work this way?

I also need program and data memories and the register delay may
interfere with full speed operation on these. I might be able to clock
the data and instruction memory from "not clock" to allow the read data
to be available during the second half of the current clock cycle. This
may result in a slightly slower clock cycle, but it should be better than a
two clock cycle.

John_H · Jun 21, 2004

"rickman" <spamgoeshere4@yahoo.com> wrote in message
news:40D74C3A.E31766EC@yahoo.com...
[snip]

Using the single port interface it appears to me that the address and
control signals are registered. Looking at the timing diagram for the
WRITE_FIRST option, I see that the data output changes with one clock
delay. So can I consider the register to be on the input side (address,
control) with the read data output using no register? I believe that
will work for a stack. When data is being pushed, the incremented
address is set up and the write is clocked in, while the data output is
steady until the clock edge (old top of stack). Following the clock
edge, the data written will be presented on the output (new top of
stack). To pop the stack, the address is decremented and a read is done
with the new data available following the clock edge (new top of
stack). A write (pop and push) is done by not changing the address and
registering a new write with the read data changing after the clock
edge.

Will the single port WRITE_FIRST ram mode work this way?

[snip]

The "write (pop and push)" is a little confusing, you may need to elaborate
that for my own edification.

For WRITE_FIRST mode, when you push a value to the top of stack, that
value - the top of stack - will be sitting on the output after the one clock
edge, ready to be used *immediately* for a POP value in the new cycle. With
the POP command that uses the top of stack value which is waiting on the
read port, the address needs to be decremented such that the *next* cycle will
have the *new* top of stack value ready for a new POP command. If you have
a PUSH before the POP, the address is incremented for the write during the
PUSH cycle such that the clock edge will have the new top of stack ready for
a next-cycle POP. It's because the WRITE_FIRST makes the most-recently
written value available on the read port that the stack can work well.

It's the address that needs to be manipulated combinatorially before the
clock edge for the PUSH or POP to have the value ready for POP access
whenever the POP comes up. The setup and routing for the address is small
enough that the combinatorial delay before the BlockRAM still gives
excellent timing.
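John_H's scheme can be sketched as a single-port WRITE_FIRST stack whose DO output is the top of stack. This is an illustrative Python model with hypothetical names; it assumes the stack pointer addresses the current top and wraps at the RAM depth, and it does no overflow or underflow checking.

```python
class WriteFirstStack:
    """Stack on one BlockRAM port in WRITE_FIRST mode; DO is the top of stack."""

    def __init__(self, depth: int):
        self.mem = [0] * depth
        self.sp = 0  # address of the current top of stack
        self.do = 0  # port output register

    def _edge(self, addr: int, di: int = 0, we: bool = False) -> None:
        # One active clock edge on a WRITE_FIRST port.
        if we:
            self.mem[addr] = di
            self.do = di  # written data appears on DO at the same edge
        else:
            self.do = self.mem[addr]

    def push(self, value: int) -> int:
        # Address incremented combinatorially before the edge, then written.
        self.sp = (self.sp + 1) % len(self.mem)
        self._edge(self.sp, value, we=True)
        return self.do

    def pop(self) -> int:
        # Consume the top already waiting on DO; the same edge reads the
        # decremented address so the new top is ready for the next cycle.
        value = self.do
        self.sp = (self.sp - 1) % len(self.mem)
        self._edge(self.sp)
        return value

    def replace(self, value: int) -> int:
        # Modify the top of stack in place: same address, write enabled.
        self._edge(self.sp, value, we=True)
        return self.do
```

Each operation is one clock edge, and a pop returns the value already waiting on DO while the same edge fetches the new top, so back-to-back push/pop sequences need no extra cycle.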

rickman · Jun 22, 2004

John_H wrote:

The "write (pop and push)" is a little confusing, you may need to elaborate
that for my own edification.

Push - write to location at incremented stack pointer, update register
to new data.
Pop - read location at decremented stack pointer, update register to
data read.
Write - write to location at stack pointer, update register to new data.
write is used when an instruction modifies the top of stack without
popping.
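A behavioral Python sketch of these three operations follows. It assumes a simplified single-port WRITE_FIRST model in which all inputs are sampled by one call to clock(), after which DO shows the written data (on a write) or the stored data (on a read). The class and method names are hypothetical, and no timing is modeled:

```python
class WriteFirstPort:
    """Simplified single-port BlockRAM in WRITE_FIRST mode (behavior only)."""

    def __init__(self, depth: int):
        self.mem = [0] * depth
        self.do = 0  # data output, changes only on clock()

    def clock(self, addr: int, we: bool = False, di: int = 0) -> None:
        if we:
            self.mem[addr] = di   # the write happens on the edge...
        self.do = self.mem[addr]  # ...and the same location is read (new data)


class BramStack:
    """Stack whose top of stack is always the BlockRAM's DO output."""

    def __init__(self, depth: int = 512):
        self.depth = depth
        self.ram = WriteFirstPort(depth)
        self.sp = 0  # address of the current top of stack

    @property
    def tos(self) -> int:
        return self.ram.do

    def push(self, value: int) -> None:
        # Write to location at incremented stack pointer; DO becomes the new data.
        self.sp = (self.sp + 1) % self.depth
        self.ram.clock(self.sp, we=True, di=value)

    def pop(self) -> None:
        # Read location at decremented stack pointer; DO becomes the data read.
        self.sp = (self.sp - 1) % self.depth
        self.ram.clock(self.sp)

    def write(self, value: int) -> None:
        # Overwrite the top of stack without popping; DO becomes the new data.
        self.ram.clock(self.sp, we=True, di=value)
```

For example, pushing 10 and 20, then writing 25, leaves 25 on the output; a following pop leaves 10 on the output with the stack pointer back at 1.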


I understand about the address. I was not certain about the read
timing. The data sheet talks about output latches, but now I realize
they mean transparent latches and the registers are all on the input
side.
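The distinction drawn here (registers on the input side, the output simply held until the next edge) can be captured in a small structural sketch. This is a simplified model with hypothetical names, not a description of the actual silicon: the address is captured by an input register on the clock edge, and the array output is then held on DO until the next edge. Changing the address input between edges has no effect on DO, which is why there is no asynchronous read.

```python
class SyncReadPort:
    """Registered-input, held-output read port (behavioral sketch)."""

    def __init__(self, contents):
        self.mem = list(contents)
        self.addr_in = 0   # D input of the address register (combinational side)
        self.addr_reg = 0  # address register, updated only on the clock edge
        self.do = 0        # output, held between edges

    def set_address(self, addr: int) -> None:
        # Only the register's input changes; the array does not see it yet.
        self.addr_in = addr

    def clock(self) -> None:
        # Edge: capture the address, decode it, read the array, drive DO.
        self.addr_reg = self.addr_in
        self.do = self.mem[self.addr_reg]
```

With contents [10, 20, 30], DO reads 10 after the first edge, still reads 10 after set_address(2), and only changes to 30 on the next clock().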


Peter Alfke · Jun 22, 2004

If the BlockRAM explanation in the Xilinx data book is not clear, I consider
that my problem. Let me fix this here:

The BlockRAM is a synchronous device, nothing happens without a clock edge.
Let's look at just one port.

Read operation:
You have to apply the address and control inputs a (very short) set-up time
before the active (optional polarity) clock edge. (DI data input lines are
not used).
The active clock edge stores the information, decodes the address, reads the
data content at that location and puts it onto the DO output lines. There is
a very short set-up time, but a relatively long "clock-to-out" read time,
since it includes address decode and read and write strobes.

Write operation:
You have to apply the address, Data and control inputs a (very short) set-up
time before the active (optional polarity) clock edge.
The active clock edge stores the information, decodes the address, creates a
read and a write pulse, writes the DI data into the addressed location, and
also reads the data content at that location and puts it onto the DO output
lines.

The user has control of the relative timing of the write and read sequence:
Either WRITE_FIRST ("write before read"), forcing the written data onto the DO
outputs (of marginal interest),
or READ_FIRST ("read before write"), forcing the "old" data onto the DO
outputs and keeping them there until the next operation,
or NO_CHANGE, which doesn't change the DO outputs, so they maintain their data
until the next read operation.
These options are new to Virtex-II (and Spartan-3). Virtex and Spartan-II
always did write before read.
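The three write modes differ only in what DO shows after a write edge. Here is a minimal sketch of one port, assuming a plain Python list for the array. The function name is hypothetical; WRITE_FIRST, READ_FIRST and NO_CHANGE are the mode names used above.

```python
def port_clock(mem, do, addr, we, di, mode):
    """Apply one clock edge to a single port and return the new DO value.

    mem  -- list holding the RAM contents (modified in place on a write)
    do   -- DO value before the edge
    mode -- "WRITE_FIRST", "READ_FIRST" or "NO_CHANGE"
    """
    if mode not in ("WRITE_FIRST", "READ_FIRST", "NO_CHANGE"):
        raise ValueError(f"unknown mode: {mode}")
    if not we:
        return mem[addr]  # a plain read behaves the same in every mode
    old = mem[addr]
    mem[addr] = di
    if mode == "WRITE_FIRST":
        return di         # write before read: new data appears on DO
    if mode == "READ_FIRST":
        return old        # read before write: old data appears on DO
    return do             # NO_CHANGE: DO keeps its previous value
```

Writing 7 to address 1 of [1, 2, 3] while DO holds 99 yields DO = 7, 2 or 99 for WRITE_FIRST, READ_FIRST and NO_CHANGE respectively. In every case the array holds 7 at address 1 afterwards.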

Dual-Port operation:
The two ports are independent, except for special rules of validity when
one port writes into a location that the other port is reading from (I
posted the gory details a while ago).

In your case, you perform a synchronous write to the Top-of-Stack address,
while (for free) simultaneously also reading this new data on DO.
You then can pop the stack synchronously with the decremented address.

I hope this clarifies things.

Peter Alfke

From: rickman <spamgoeshere4@yahoo.com>
Reply-To: john@bluepal.net
Newsgroups: comp.arch.fpga
Date: Mon, 21 Jun 2004 16:59:38 -0400
Subject: Re: RAM in Altera EABs and Xilinx Block Rams

Peter Alfke wrote:

Just to clarify Rickman's "Two-clock-cycle thing":
Xilinx BlockRAMs need ONE clock to perform any operation, be it a read or a
write. As a bonus, the write operation also performs a read operation on the
same location, showing either the old or the new data (user option).
And this is all on one port. You can obviously use the other port
independently from the first.
The one thing you cannot do is an asynchronous read without a clock edge.

If anybody has any questions about Xilinx BlockRAMs, I am more than happy to
explain.

Perhaps I didn't understand the documentation. I think I got mixed up
in the description of the read port latches. Sometimes I forget the
distinction between latches and registers.

First, let me say that I am designing a stack using a single block ram.
My understanding is that I can use the RAM as either a single port ram
with a single address bus, a write data bus and a read data bus or a
dual port ram with two independent interfaces like the single port
interface.

Using the single port interface it appears to me that the address and
control signals are registered. Looking at the timing diagram for the
WRITE_FIRST option, I see that the data output changes with one clock
delay. So can I consider the register to be on the input side (address,
control) with the read data output using no register? I believe that
will work for a stack. When data is being pushed, the incremented
address is set up and the write is clocked in, while the data output is
steady until the clock edge (old top of stack). Following the clock
edge, the data written will be presented on the output (new top of
stack). To pop the stack, the address is decremented and a read is done
with the new data available following the clock edge (new top of
stack). A write (pop and push) is done by not changing the address and
registering a new write with the read data changing after the clock
edge.

Will the single port WRITE_FIRST ram mode work this way?

I also need program and data memories and the register delay may
interfere with full speed operation on these. I might be able to clock
the data and instruction memory from "not clock" to allow the read data
to be available during the second half of the current clock cycle. This
may result in a somewhat slower clock cycle, but it should be better than
a two-clock-cycle access.
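The timing budget for clocking the memories on the inverted clock can be sketched as follows. Assume the processor registers use the rising edge at t = 0 and the memory uses the falling edge at duty * T. Let t_front be the delay from the rising edge through the address logic to the BlockRAM setup, and t_back the BlockRAM clock-to-out plus the data-path logic and register setup. Both duty * T >= t_front and (1 - duty) * T >= t_back must then hold. The delay values below are made up for illustration and are not Spartan-3 figures.

```python
def min_period(t_front: float, t_back: float, duty: float = 0.5) -> float:
    """Minimum clock period when the memory is clocked on the falling edge.

    t_front -- rising edge -> address logic -> BlockRAM setup (ns)
    t_back  -- BlockRAM clock-to-out -> data logic -> register setup (ns)
    duty    -- fraction of the period before the falling edge
    """
    if not 0.0 < duty < 1.0:
        raise ValueError("duty must be between 0 and 1")
    return max(t_front / duty, t_back / (1.0 - duty))


# Hypothetical delays: 4 ns in front of the RAM, 6 ns behind it.
print(min_period(4.0, 6.0))       # 50% duty: the 6 ns half sets T = 12 ns
print(min_period(4.0, 6.0, 0.4))  # halves balanced: T is about 10 ns
```

With a 50% duty cycle the longer half dictates the period, so T_min = 2 * max(t_front, t_back). The scheme costs nothing extra over t_front + t_back only when the logic in front of and behind the memory is roughly balanced.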


rickman · Jun 22, 2004

I like to see diagrams of the functional elements to show how circuits
work... "a picture is worth a thousand words"...

The app note is more clear now that I see my mistake. But a block
diagram showing the input registers and the output *latch* might help to
make the circuit operation more clear. I don't recall if there is also
an optional output register, if so, that should be added to the
illustration as well. I seem to recall that the operation of the CLB
RAM in the 4000E series was illustrated very well in this regard. It
showed all the possible modes via registers, muxes and the write pulse
generator. Something like that would be useful if added to XAPP463.


Symon · Jun 22, 2004

Rick,
Try this instead (the POPs are different). Put the BRAM in WRITE_FIRST mode.

PUSH - Write to location at incremented stack pointer, new output is new
data.
POP - Read output data, decrement stack pointer so new output is new top
of stack
WRITE - Write new data to top of stack, read old top of stack.

Sounds ideal for a Xilinx BRAM to me; it all happens on a single clock edge.
The BRAM always presents the top of stack at its output, so it's available
right away. Anyway, that's what I did.
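The point here is that every operation can use the value already on DO during the current cycle, so a POP never waits for a read. Below is a cycle-by-cycle Python sketch, assuming a simplified WRITE_FIRST model in which DO updates only on the clock edge. The function name, the depth of 16 and the tuple format are hypothetical.

```python
def run_stack(ops, depth=16):
    """Simulate PUSH/POP/WRITE on a WRITE_FIRST BlockRAM, one op per clock.

    Returns (op, seen, do_after) per cycle: `seen` is the value on DO during
    the cycle (the top of stack the logic can use right away) and `do_after`
    is DO after the clock edge.
    """
    mem = [0] * depth
    sp = 0        # address of the current top of stack
    do = mem[sp]  # DO always presents the top of stack
    trace = []
    for op, value in ops:
        seen = do
        if op == "PUSH":     # write at incremented SP; new output is new data
            sp = (sp + 1) % depth
            mem[sp] = value
            do = value
        elif op == "POP":    # use `seen` now; decrement SP so DO shows new TOS
            sp = (sp - 1) % depth
            do = mem[sp]
        elif op == "WRITE":  # write new TOS; old TOS was `seen` this cycle
            mem[sp] = value
            do = value
        else:
            raise ValueError(f"unknown op: {op}")
        trace.append((op, seen, do))
    return trace


for row in run_stack([("PUSH", 1), ("PUSH", 2), ("POP", None),
                      ("WRITE", 9), ("POP", None)]):
    print(row)
```

In this trace the POP in the third cycle consumes 2 during that same cycle. The WRITE sees the old top of stack (1) on DO before replacing it with 9.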

cheers, Syms.
