sram

KJ wrote on 8/6/2017 1:33 PM:
On Sunday, August 6, 2017 at 12:40:25 PM UTC-4, rickman wrote:
KJ wrote on 8/6/2017 8:01 AM:
It's even easier than that to synchronously control a standard async SRAM. Simply connect WE to the clock and hold OE active all the time except for cycles where you want to write something new into the SRAM.

That would depend a *lot* on the details of the setup and hold times for the
async SRAM, no? You can do what you want with data for much of the clock
cycle, but the address has to meet setup and hold for the entire WE time.
That's typically more than half a clock cycle and makes it hard to use it on
every clock cycle.

Address (and data) setup and hold times are easily met. As a first order approximation, the setup time will be T/2-Tco(max). The address hold time will be Tco(min).

What is your source for the statement "That's typically more than half a clock cycle"? The ancient Cypress CY62256N lists both of these requirements (Tsa and Tha) as 0 ns [1].
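
A quick back-of-the-envelope of that first-order budget, as a Python sketch. The clock period and Tco values below are hypothetical illustrations, not taken from any FPGA datasheet; only the 0 ns Tsa/Tha figures come from the Cypress part cited above.

# First-order timing check for the "WE tied to the clock" scheme.
T = 20.0        # clock period, ns (50 MHz, assumed)
tco_max = 7.0   # worst-case FPGA clock-to-output delay, ns (assumed)
tco_min = 1.0   # best-case FPGA clock-to-output delay, ns (assumed)
tsa = 0.0       # SRAM address setup to WE, ns (CY62256N)
tha = 0.0       # SRAM address hold from WE, ns (CY62256N)

addr_setup = T / 2 - tco_max   # address stable before the active WE edge
addr_hold = tco_min            # address held after WE deasserts

print(f"setup margin: {addr_setup - tsa:+.1f} ns")
print(f"hold margin : {addr_hold - tha:+.1f} ns")
# Positive margins mean the first-order budget closes; a real design still
# has to check board skew and the WE pulse-width requirements.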

I'm talking about the time the address must remain stable. Your
calculations above show it is at a minimum T/2.

When running with fast SRAM it can be very hard to get this to work
properly. The devil is in the details of the chips.


> The technique works. You get single cycle read or write on 100% of the clock cycles, timing is met, period...and it worked 20+ years ago on product I designed [2].

Great! You were able to use it on one device at an unknown speed. What was
the clock period?

Did you supply the WE from the external clock (same as to the FPGA) or a
copy of the clock from inside the FPGA? In the former case, the total delay
of the signals through the chip can be a significant part of the setup
margin. In the latter case, it is hard to control the routing delays.

--

Rick C
 
On Sunday, August 6, 2017 at 7:40:40 PM UTC+2, KJ wrote:
On Sunday, August 6, 2017 at 1:30:46 PM UTC-4, lasselangwad...@gmail.com wrote:
On Sunday, August 6, 2017 at 6:40:25 PM UTC+2, rickman wrote:

and just using the clock gives you the headache of trying to control routing delays on data vs. WE

using the dual-edge output flip-flop makes it all much more controllable

Not true. There is nothing special that needs to be done to "control routing delays on data vs. WE". Do you have any basis for that statement?

to get your clock out to your WE pin you first have to get off the clock network and out to an IO; how are you going to guarantee that delay is the same as the data going from an output flop to an IO?

Using the method I described is absolutely the same as connecting up two 74X374 flip flops, nothing more, nothing less. How is that a 'headache'?

with a string of 374 you also have to make sure the delay on the clock is
controlled with regards to the data
 
On Sunday, August 6, 2017 at 2:08:09 PM UTC-4, lasselangwad...@gmail.com wrote:
On Sunday, August 6, 2017 at 7:40:40 PM UTC+2, KJ wrote:
Not true. There is nothing special that needs to be done to "control routing delays on data vs. WE". Do you have any basis for that statement?

to get your clock out to your WE pin you first have to get off the clock network and out to an IO; how are you going to guarantee that delay is the same as the data going from an output flop to an IO?

One does not need to "guarantee that delay is the same as the data going from an output flop to an io" in order to get a working design as you stated. Instead, one can design it such that the clock arriving at the flip flops that generate the address/data/control signals is simultaneous, within some applicable design tolerance, with the clock signal (aka WE) arriving at the SRAM.

In fact, since there are tolerances, if you design it such that the nominal data delay matches the nominal clock delay as you suggest you are essentially crossing your fingers hoping that you don't run across a 'fast' data path and a 'slow' clock path over full PVT range...either that or you are lengthening the data path on the PCBA to guarantee that it never beats the clock. Yes you can do that to get a guaranteed working design, but that would seem to be more of the 'headache' that you mentioned than my approach of just routing them on the shortest path as one would probably normally do anyway.
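
A minimal sketch of the race being described here: a hold check at the SRAM pins compares the fastest address/data path against the slowest WE path over PVT. All delay values below are made-up illustrations, not device specifications.

# Hold-race check when WE comes from the clock net and address/data from output flops.
data_delay = (2.0, 6.0)   # (min, max) clock-to-pin delay for address/data, ns (assumed)
we_delay = (3.5, 8.0)     # (min, max) clock-net-to-pin delay for WE, ns (assumed)
t_hold = 0.0              # SRAM address/data hold after WE deasserts, ns

hold_margin = data_delay[0] - (we_delay[1] + t_hold)
print(f"worst-case hold margin: {hold_margin:+.1f} ns")
# A negative result means a 'fast' data path can beat a 'slow' WE path,
# which is the finger-crossing scenario described above.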

Using the method I described is absolutely the same as connecting up two 74X374 flip flops, nothing more, nothing less. How is that a 'headache'?


with a string of 374 you also have to make sure the delay on the clock is
controlled with regards to the data

No. The delay of data relative to clock in a string of flip flops is not important at all if every flip flop receives the same rising edge. Getting multiple receivers to receive the same clock signal (to within some tolerance) is something that a designer does have control over. Relying on the control of skew between two or more signals, not so much.

This simultaneous receipt of the clock signal is essentially what goes on inside every FPGA. You can send any FF output to the input of any other FF on the device because they design the clock network to produce this simultaneous action. It's not because they added data routing delays.

Kevin Jennings
 
On Sunday, August 6, 2017 at 2:07:35 PM UTC-4, rickman wrote:
KJ wrote on 8/6/2017 1:33 PM:
What is your source for statement "That's typically more than half a clock cycle"? The ancient Cypress CY62256N lists both of these requirements (Tsa and Tha) as 0 ns [1].

I'm talking about the time the address must remain stable. Your
calculations above show it is at a minimum T/2.

My calculation is T/2-Tco(max). As long as Tco(max) <= T/2 then the design will work with anything compatible with the Cypress part that I previously referenced that requires 0 setup time. Tco(max) being less than one half of a clock cycle is not much of a hurdle. The SRAM access time will typically be greater.

When running with fast SRAM it can be very hard to get this to work
properly.

Speaking for myself I can say that no it was not hard at all, it worked right at the start. I'm not sure where you see the difficulty.

The devil is in the details of the chips.

And I've provided the details. More so than you.

Great!
Thanks!

You were able to use it on one device at an unknown speed.
You're making assumptions here that are incorrect.

What was the clock period?
I dunno, that was 20+ years ago but it was using the fastest available CMOS SRAMs of the mid to late 1990s. But the clock speed is not relevant, the technique is still valid. The biggest limiting factor is going to be the read/write speed of the async SRAM.

Kevin Jennings
 
On 8/6/17 3:42 PM, KJ wrote:
On Sunday, August 6, 2017 at 2:07:35 PM UTC-4, rickman wrote:
KJ wrote on 8/6/2017 1:33 PM:
What is your source for statement "That's typically more than half a clock cycle"? The ancient Cypress CY62256N lists both of these requirements (Tsa and Tha) as 0 ns [1].

I'm talking about the time the address must remain stable. Your
calculations above show it is at a minimum T/2.


My calculation is T/2-Tco(max). As long as Tco(max) <= T/2 then the design will work with anything compatible with the Cypress part that I previously referenced that requires 0 setup time. Tco(max) being less than one half of a clock cycle is not much of a hurdle. The SRAM access time will typically be greater.

When running with fast SRAM it can be very hard to get this to work
properly.

Speaking for myself I can say that no it was not hard at all, it worked right at the start. I'm not sure where you see the difficulty.

The devil is in the details of the chips.

And I've provided the details. More so than you.

Great!
Thanks!

You were able to use it on one device at an unknown speed.
You're making assumptions here that are incorrect.

What was the clock period?
I dunno, that was 20+ years ago but it was using the fastest available CMOS SRAMs of the mid to late 1990s. But the clock speed is not relevant, the technique is still valid. The biggest limiting factor is going to be the read/write speed of the async SRAM.

Kevin Jennings

I think, if I understand what you are proposing, one big issue is you
seem to be assuming that the clock that you are using as WE starts
external to the FPGA (or at least comes out and goes back in) so that
you know the clock rises before the data on the address bus can change.
From my experience, in very many cases, this is NOT true for an FPGA
design, but some slower clock comes in, and the highest speed clocks are
generated by PLLs in the FPGA.

A second really big issue is how do you do a read cycle if the write
comes ungated from the clock. The best I can figure is you are assuming
you can get a read done in 1/2 a clock cycle and just rewrite the data.
In most such rams WE overrides OE, and the Selects kill both read and
write. Unless you had a part with both a WE and WS (where WE could
disable the WS, but did not itself need to have the required setup/hold
to address) I can't see how you do reads with the clock anywhere close
to cycle time, and having a WOM (Write only Memory) isn't that useful here.
 
On Sunday, August 6, 2017 at 9:26:23 PM UTC+2, KJ wrote:
On Sunday, August 6, 2017 at 2:08:09 PM UTC-4, lasselangwad...@gmail.com wrote:
On Sunday, August 6, 2017 at 7:40:40 PM UTC+2, KJ wrote:
Not true. There is nothing special that needs to be done to "control routing delays on data vs. WE". Do you have any basis for that statement?

to get your clock out to your WE pin you first have to get off the clock network and out to an IO; how are you going to guarantee that delay is the same as the data going from an output flop to an IO?


One does not need to "guarantee that delay is the same as the data going from an output flop to an io" in order to get a working design as you stated. Instead, one can design it such that the clock arriving at the flip flops that generate the address/data/control signals is simultaneous, within some applicable design tolerance, with the clock signal (aka WE) arriving at the SRAM.

how are you going to control the delay from output FF to IO vs. the clock getting off the clock tree to the IO?

In fact, since there are tolerances, if you design it such that the nominal data delay matches the nominal clock delay as you suggest you are essentially crossing your fingers hoping that you don't run across a 'fast' data path and a 'slow' clock path over full PVT range...either that or you are lengthening the data path on the PCBA to guarantee that it never beats the clock. Yes you can do that to get a guaranteed working design, but that would seem to be more of the 'headache' that you mentioned than my approach of just routing them on the shortest path as one would probably normally do anyway.

using a DDR output, data and WE all have the same path to the IO and should thus track over PVT

using the clock directly is pretty much guaranteed to add more delay than the clock-to-out on the output FFs

Using the method I described is absolutely the same as connecting up two 74X374 flip flops, nothing more, nothing less. How is that a 'headache'?


with a string of 374 you also have to make sure the delay on the clock is
controlled with regards to the data

No. The delay of data relative to clock in a string of flip flops is not important at all if every flip flop receives the same rising edge. Getting multiple receivers to receive the same clock signal (to within some tolerance) is something that a designer does have control over. Relying on the control of skew between two or more signals, not so much.

This simultaneous receipt of the clock signal is essentially what goes on inside every FPGA. You can send any FF output to the input of any other FF on the device because they design the clock network to produce this simultaneous action. It's not because they added data routing delays.

FF out to FF in is safe by design; once you mix in a clock used as "data" you add an unknown delay
 
On Sunday, August 6, 2017 at 4:09:28 PM UTC-4, Richard Damon wrote:
On 8/6/17 3:42 PM, KJ wrote:
I think, if I understand what you are proposing, one big issue is you
seem to be assuming that the clock that you are using as WE starts
external to the FPGA (or at least comes out and goes back in) so that
you know the clock rises before the data on the address bus can change.
From my experience, in very many cases, this is NOT true for an FPGA
design, but some slower clock comes in, and the highest speed clocks are
generated by PLLs in the FPGA.

No, that was not my assumption. The clocking situation is no different from how one synchronizes the internal clock and the external clock for SDRAM or DDR. Even before there were DDR parts and DDR flops in FPGAs, there were single-clock SDRAMs and they had FPGA controllers. Clock synchronization between the FPGA and the SDRAM is required there as well and would use the same control technique.

In most such rams WE overrides OE, and the Selects kill both read and
write.

Well, looking around now, that does seem to be the case today, which sort of makes me wonder which SRAM I was using back then. At that time, CE and OE enabled the I/O drivers independent of WE. Writing to memory was sometimes (depending on the part) inhibited if OE was active. I don't believe I relied on any bus-hold circuit or any sort of other trickery like that. I will say that the design did work and was in production for several years without issue but, in any case, my solution does not seem applicable today. Interesting, good catch.

Kevin Jennings
 
On Sunday, August 6, 2017 at 4:09:31 PM UTC-4, lasselangwad...@gmail.com wrote:
how are you going to control the delay from output FF to IO vs. the clock getting off the clock tree to the IO?

By using the phase control of the PLL to adjust the clock leaving the chip relative to the clock internal to the chip. That can be done in a way to guarantee operation.

using a DDR output, data and WE all have the same path to the IO and should thus track over PVT

'Should' is an important word there...but practically speaking I agree that the chances of failure are probably 'slim'.

FF out to FF in is safe by design; once you mix in a clock used as "data" you add an unknown delay

Mixing in clock as data was not what I was doing. In any case, based on my reply to Richard Damon's post, my approach, while it worked back in the day, wouldn't work now.

Kevin
 
KJ wrote on 8/6/2017 3:42 PM:
On Sunday, August 6, 2017 at 2:07:35 PM UTC-4, rickman wrote:
KJ wrote on 8/6/2017 1:33 PM:
What is your source for statement "That's typically more than half a clock cycle"? The ancient Cypress CY62256N lists both of these requirements (Tsa and Tha) as 0 ns [1].

I'm talking about the time the address must remain stable. Your
calculations above show it is at a minimum T/2.


My calculation is T/2-Tco(max). As long as Tco(max) <= T/2 then the design will work with anything compatible with the Cypress part that I previously referenced that requires 0 setup time. Tco(max) being less than one half of a clock cycle is not much of a hurdle. The SRAM access time will typically be greater.

When running with fast SRAM it can be very hard to get this to work
properly.

Speaking for myself I can say that no it was not hard at all, it worked right at the start. I'm not sure where you see the difficulty.

The devil is in the details of the chips.

And I've provided the details. More so than you.

Great!
Thanks!

You were able to use it on one device at an unknown speed.
You're making assumptions here that are incorrect.

What was the clock period?
I dunno, that was 20+ years ago but it was using the fastest available CMOS SRAMs of the mid to late 1990s. But the clock speed is not relevant, the technique is still valid. The biggest limiting factor is going to be the read/write speed of the async SRAM.

Hmmm, looking at a current data sheet I don't see where you can gate the
write cycle with OE. WE, the byte enables and CE, but not OE.

--

Rick C
 
KJ wrote:
It's even easier than that to synchronously control a standard async SRAM.
Simply connect WE to the clock and hold OE active all the time except
for cycles where you want to write something new into the SRAM.

As has been explained to you in detail by several other posters, your method is not 'easier' with modern FPGA's and SRAMs.

The simplest way to get a high speed clock {gated or not} off the chip, coincident with other registered I/O signals, is to use the dual-edge IOB flip-flops as I suggested.

The DDR technique I mentioned would run synchronous single-cycle read or write cycles at 50 MHz on a Spartan-3 Starter kit with an (IIRC) 10 ns SRAM, 66 MHz if using a duty-cycle-skewed clock to meet the WE pulse width requirements.
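
A back-of-the-envelope look at the pulse-width point. The 8 ns minimum WE pulse width below is an assumed, typical figure for a 10 ns asynchronous SRAM, not taken from a specific datasheet.

# Why a duty-cycle-skewed clock helps at 66 MHz when WE is formed by the ODDR.
t_wp_min = 8.0   # minimum WE pulse width, ns (assumed)

for f_mhz, duty in [(50.0, 0.5), (66.0, 0.5), (66.0, 0.6)]:
    period = 1000.0 / f_mhz
    we_pulse = period * duty   # fraction of the period available for the WE pulse
    verdict = "meets" if we_pulse >= t_wp_min else "violates"
    print(f"{f_mhz:4.0f} MHz, {duty:.0%} duty: WE pulse {we_pulse:4.1f} ns "
          f"-> {verdict} the assumed {t_wp_min} ns minimum")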

Another advantage of the 'forwarding' method is that one can use the internal FPGA clock resources for clock multiply/divides etc. without needing to also manage the board-level low-skew clock distribution needed by your method.

-Brian
 
brimdavis@gmail.com wrote on 8/8/2017 8:37 PM:
KJ wrote:

It's even easier than that to synchronously control a standard async SRAM.
Simply connect WE to the clock and hold OE active all the time except
for cycles where you want to write something new into the SRAM.

As has been explained to you in detail by several other posters, your method is not 'easier' with modern FPGA's and SRAMs.

The simplest way to get a high speed clock {gated or not} off the chip, coincident with other registered I/O signals, is to use the dual-edge IOB flip-flops as I suggested.

The DDR technique I mentioned would run synchronous single-cycle read or write cycles at 50 MHz on a Spartan-3 Starter kit with an (IIRC) 10 ns SRAM, 66 MHz if using a duty-cycle-skewed clock to meet the WE pulse width requirements.

Another advantage of the 'forwarding' method is that one can use the internal FPGA clock resources for clock multiply/divides etc. without needing to also manage the board-level low-skew clock distribution needed by your method.

I can't say I follow what you are proposing. How do you get the clock out
of the FPGA with a defined time relationship to the signals clocked through
the IOB? Is this done with feedback from the output clock using the
internal clocking circuits?

--

Rick C
 
On Wed, 09 Aug 2017 22:33:40 -0400, rickman wrote:

brimdavis@gmail.com wrote on 8/8/2017 8:37 PM:
KJ wrote:

It's even easier than that to synchronously control a standard async
SRAM.
Simply connect WE to the clock and hold OE active all the time except
for cycles where you want to write something new into the SRAM.

As has been explained to you in detail by several other posters, your
method is not 'easier' with modern FPGA's and SRAMs.

The simplest way to get a high speed clock {gated or not} off the chip,
coincident with other registered I/O signals, is to use the dual-edge
IOB flip-flops as I suggested.

The DDR technique I mentioned would run synchronous single-cycle read
or write cycles at 50 MHz on a Spartan-3 Starter kit with an (IIRC) 10
ns SRAM, 66 MHz if using a duty-cycle-skewed clock to meet the WE pulse
width requirements.

Another advantage of the 'forwarding' method is that one can use the
internal FPGA clock resources for clock multiply/divides etc. without
needing to also manage the board-level low-skew clock distribution
needed by your method.

I can't say I follow what you are proposing. How do you get the clock
out of the FPGA with a defined time relationship to the signals clocked
through the IOB? Is this done with feedback from the output clock using
the internal clocking circuits?

About a decade back, mainstream FPGAs gained greatly expanded IOB
clocking abilities to support DDR RAM (and other interfaces such as
RGMII).
In particular, one can forward a clock out of an FPGA pin phase aligned
with data on other pins. You can also use one of the internal PLLs to
generate phase shifted clocks, and thus have a phase shift on the pins
between two data signals or between the clock and the data signals.

This can be done without needing feedback from the pins.


You should try reading a datasheet occasionally - they can be very
informative.
Just in case someone has blocked Google where you are: here's an example:
https://www.xilinx.com/support/documentation/user_guides/ug571-ultrascale-selectio.pdf

Allan
 
rickman wrote:
I can't say I follow what you are proposing. How do you get
the clock out of the FPGA with a defined time relationship
to the signals clocked through the IOB?

The links I gave in my original post explain the technique:
I posted some notes on this technique (for a Spartan-3) to the fpga-cpu group many years ago:
https://groups.yahoo.com/neo/groups/fpga-cpu/conversations/messages/2076
https://groups.yahoo.com/neo/groups/fpga-cpu/conversations/messages/2177

Allan Herriman wrote:
About a decade back, mainstream FPGAs gained greatly expanded IOB
clocking abilities to support DDR RAM (and other interfaces such as
RGMII).

Nearly twenty years now!

Xilinx parts had ODDR equivalents in Virtex-E using hard macros; then the actual ODDR primitive stuff appeared in Virtex-2.

-Brian
 
Allan Herriman wrote on 8/10/2017 2:02 AM:
On Wed, 09 Aug 2017 22:33:40 -0400, rickman wrote:

brimdavis@gmail.com wrote on 8/8/2017 8:37 PM:
KJ wrote:

It's even easier than that to synchronously control a standard async
SRAM.
Simply connect WE to the clock and hold OE active all the time except
for cycles where you want to write something new into the SRAM.

As has been explained to you in detail by several other posters, your
method is not 'easier' with modern FPGA's and SRAMs.

The simplest way to get a high speed clock {gated or not} off the chip,
coincident with other registered I/O signals, is to use the dual-edge
IOB flip-flops as I suggested.

The DDR technique I mentioned would run synchronous single-cycle read
or write cycles at 50 MHz on a Spartan-3 Starter kit with an (IIRC) 10
ns SRAM, 66 MHz if using a duty-cycle-skewed clock to meet the WE pulse
width requirements.

Another advantage of the 'forwarding' method is that one can use the
internal FPGA clock resources for clock multiply/divides etc. without
needing to also manage the board-level low-skew clock distribution
needed by your method.

I can't say I follow what you are proposing. How do you get the clock
out of the FPGA with a defined time relationship to the signals clocked
through the IOB? Is this done with feedback from the output clock using
the internal clocking circuits?


About a decade back, mainstream FPGAs gained greatly expanded IOB
clocking abilities to support DDR RAM (and other interfaces such as
RGMII).
In particular, one can forward a clock out of an FPGA pin phase aligned
with data on other pins. You can also use one of the internal PLLs to
generate phase shifted clocks, and thus have a phase shift on the pins
between two data signals or between the clock and the data signals.

This can be done without needing feedback from the pins.


You should try reading a datasheet occasionally - they can be very
informative.
Just in case someone has blocked Google where you are: here's an example:
https://www.xilinx.com/support/documentation/user_guides/ug571-ultrascale-selectio.pdf

Thank you for the link to the 356 page document. No, I have not researched
how every brand of FPGA implements DDR interfaces mostly because I have not
designed a DDR memory interface in an FPGA. I did look at the document and
didn't find info on how the timing delays through the IOB might be
synchronized with the output clock.

So how exactly does the tight alignment of a clock exiting a Xilinx FPGA
maintain alignment with data exiting the FPGA over time and differential
temperature? What will the timing relationship be and how tightly can it be
maintained?

Just waving your hands and saying things can be aligned doesn't explain how
it works. This is a discussion. If you aren't interested in discussing,
then please don't bother to reply.

--

Rick C
 
brimdavis@gmail.com wrote on 8/10/2017 7:46 PM:
rickman wrote:

I can't say I follow what you are proposing. How do you get
the clock out of the FPGA with a defined time relationship
to the signals clocked through the IOB?


The links I gave in my original post explain the technique:

I posted some notes on this technique (for a Spartan-3) to the fpga-cpu group many years ago:
https://groups.yahoo.com/neo/groups/fpga-cpu/conversations/messages/2076
https://groups.yahoo.com/neo/groups/fpga-cpu/conversations/messages/2177

I haven't used a Xilinx part in something like 15 years, so I don't
recall all the details. I don't follow how you achieve the timing margin
needed between the address, control and data signals, which are passing
through the IOB, and the WE pulse, which is being generated in the IOB DDR.
Even with a hold time requirement of 0 ns something has to be done to
prevent a race condition. Your posts seem to say you used different drive
strengths to use the trace capacitance to create different delays in signal
timing. If you can't use a data sheet to produce a timing analysis, it
would seem to be a fairly sketchy method that you can't count on to work
under all conditions. I suppose you could qualify the circuit over
temperature and voltage and then make some assumptions about process
variability, but as I say, sketchy.


--

Rick C
 
On 8/10/17 10:39 PM, rickman wrote:
Allan Herriman wrote on 8/10/2017 2:02 AM:
On Wed, 09 Aug 2017 22:33:40 -0400, rickman wrote:

brimdavis@gmail.com wrote on 8/8/2017 8:37 PM:
KJ wrote:

It's even easier than that to synchronously control a standard async
SRAM.
Simply connect WE to the clock and hold OE active all the time except
for cycles where you want to write something new into the SRAM.

As has been explained to you in detail by several other posters, your
method is not 'easier' with modern FPGA's and SRAMs.

The simplest way to get a high speed clock {gated or not} off the chip,
coincident with other registered I/O signals, is to use the dual-edge
IOB flip-flops as I suggested.

The DDR technique I mentioned would run synchronous single-cycle read
or write cycles at 50 MHz on a Spartan-3 Starter kit with an (IIRC) 10
ns SRAM, 66 MHz if using a duty-cycle-skewed clock to meet the WE pulse
width requirements.

Another advantage of the 'forwarding' method is that one can use the
internal FPGA clock resources for clock multiply/divides etc. without
needing to also manage the board-level low-skew clock distribution
needed by your method.

I can't say I follow what you are proposing. How do you get the clock
out of the FPGA with a defined time relationship to the signals clocked
through the IOB? Is this done with feedback from the output clock using
the internal clocking circuits?


About a decade back, mainstream FPGAs gained greatly expanded IOB
clocking abilities to support DDR RAM (and other interfaces such as
RGMII).
In particular, one can forward a clock out of an FPGA pin phase aligned
with data on other pins. You can also use one of the internal PLLs to
generate phase shifted clocks, and thus have a phase shift on the pins
between two data signals or between the clock and the data signals.

This can be done without needing feedback from the pins.


You should try reading a datasheet occasionally - they can be very
informative.
Just in case someone has blocked Google where you are: here's an example:
https://www.xilinx.com/support/documentation/user_guides/ug571-ultrascale-selectio.pdf

Thank you for the link to the 356 page document. No, I have not
researched how every brand of FPGA implements DDR interfaces mostly
because I have not designed a DDR memory interface in an FPGA. I did
look at the document and didn't find info on how the timing delays
through the IOB might be synchronized with the output clock.

So how exactly does the tight alignment of a clock exiting a Xilinx FPGA
maintain alignment with data exiting the FPGA over time and differential
temperature? What will the timing relationship be and how tightly can
it be maintained?

Just waving your hands and saying things can be aligned doesn't explain
how it works. This is a discussion. If you aren't interested in
discussing, then please don't bother to reply.

Thinking about it, YES, FPGAs normally have a few pins that can be
configured as dedicated clock drivers, and it will generally be
guaranteed that if those pins are driving out a global clock, then any
other pin with output clocked by that clock will change so as to have a
known hold time (over specified operating conditions). This being the
way to run a typical synchronous interface.

Since this method requires the WE signal to be the clock, you need to
find a part that has either a write mask signal, or perhaps is
multi-ported so this port could be dedicated to writes and another port
could be used to read what is needed (the original part for this thread
wouldn't be usable with this method).
 
Richard Damon wrote on 8/11/2017 12:09 AM:
On 8/10/17 10:39 PM, rickman wrote:
Allan Herriman wrote on 8/10/2017 2:02 AM:
On Wed, 09 Aug 2017 22:33:40 -0400, rickman wrote:

brimdavis@gmail.com wrote on 8/8/2017 8:37 PM:
KJ wrote:

It's even easier than that to synchronously control a standard async
SRAM.
Simply connect WE to the clock and hold OE active all the time except
for cycles where you want to write something new into the SRAM.

As has been explained to you in detail by several other posters, your
method is not 'easier' with modern FPGA's and SRAMs.

The simplest way to get a high speed clock {gated or not} off the chip,
coincident with other registered I/O signals, is to use the dual-edge
IOB flip-flops as I suggested.

The DDR technique I mentioned would run synchronous single-cycle read
or write cycles at 50 MHz on a Spartan-3 Starter kit with an (IIRC) 10
ns SRAM, 66 MHz if using a duty-cycle-skewed clock to meet the WE pulse
width requirements.

Another advantage of the 'forwarding' method is that one can use the
internal FPGA clock resources for clock multiply/divides etc. without
needing to also manage the board-level low-skew clock distribution
needed by your method.

I can't say I follow what you are proposing. How do you get the clock
out of the FPGA with a defined time relationship to the signals clocked
through the IOB? Is this done with feedback from the output clock using
the internal clocking circuits?


About a decade back, mainstream FPGAs gained greatly expanded IOB
clocking abilities to support DDR RAM (and other interfaces such as
RGMII).
In particular, one can forward a clock out of an FPGA pin phase aligned
with data on other pins. You can also use one of the internal PLLs to
generate phase shifted clocks, and thus have a phase shift on the pins
between two data signals or between the clock and the data signals.

This can be done without needing feedback from the pins.


You should try reading a datasheet occasionally - they can be very
informative.
Just in case someone has blocked Google where you are: here's an example:
https://www.xilinx.com/support/documentation/user_guides/ug571-ultrascale-selectio.pdf

Thank you for the link to the 356 page document. No, I have not
researched how every brand of FPGA implements DDR interfaces mostly
because I have not designed a DDR memory interface in an FPGA. I did look
at the document and didn't find info on how the timing delays through the
IOB might be synchronized with the output clock.

So how exactly does the tight alignment of a clock exiting a Xilinx FPGA
maintain alignment with data exiting the FPGA over time and differential
temperature? What will the timing relationship be and how tightly can it
be maintained?

Just waving your hands and saying things can be aligned doesn't explain
how it works. This is a discussion. If you aren't interested in
discussing, then please don't bother to reply.


Thinking about it, YES, FPGAs normally have a few pins that can be
configured as dedicated clock drivers, and it will generally be guaranteed
that if those pins are driving out a global clock, then any other pin with
output clocked by that clock will change so as to have a known hold time
(over specified operating conditions). This being the way to run a typical
synchronous interface.

Since this method requires the WE signal to be the clock, you need to find a
part that has either a write mask signal, or perhaps is multi-ported so this
port could be dedicated to writes and another port could be used to read
what is needed (the original part for this thread wouldn't be usable with
this method).

I'm not sure you read the full thread. The method for generating the WE
signal is to use the two DDR FFs to drive a one level during one half of the
clock and to drive the write signal during the other half of the clock. I
misspoke above when I called it a "clock". The *other* method involved
using the actual clock as WE and gating it with the OE signal which won't
work on all async RAMs.

So with the DDR method *all* of the signals will exit the chip with a
nominal zero timing delay relative to each other. This is literally the
edge of the async RAM spec. So you need to have some delays on the other
signals relative to the WE to allow for variation in timing of individual
outputs. It seems the method suggested is to drive the CS and WE signals
hard and lighten the drive on the other outputs.

This is a method that is not relying on any guaranteed spec from the FPGA
maker. This method uses trace capacitance to create delta t = delta v * c /
i to speed or slow the rising edge of the various outputs. This relies on
over compensating the FPGA spec by means that depend on details of the board
layout. It reminds me of the early days of generating timing signals for
DRAM with logic delays.
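
A rough feel for that delta-t relation. The load capacitance, drive currents and voltage swing below are assumed illustration values, not figures from any FPGA or board datasheet.

# delta_t = delta_v * C / I : slew-derived delay difference between hard- and light-driven outputs.
c_load = 10e-12        # trace plus input capacitance, farads (assumed)
delta_v = 1.5          # swing from drive level to the receiver threshold, volts (assumed)
drive_strong = 16e-3   # 'hard' drive on WE/CS, amps (assumed)
drive_weak = 4e-3      # lightened drive on address/data, amps (assumed)

t_strong = delta_v * c_load / drive_strong
t_weak = delta_v * c_load / drive_weak

print(f"edge delay, strong drive: {t_strong * 1e9:.2f} ns")
print(f"edge delay, weak drive  : {t_weak * 1e9:.2f} ns")
print(f"added hold margin       : {(t_weak - t_strong) * 1e9:.2f} ns")
# The margin scales with the board capacitance, which is the point being made:
# it depends on layout details rather than on a guaranteed FPGA specification.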

Yeah, you might get it to work, but the layout will need to be treated with
care and respect even more so than an impedance controlled trace. It will
need to be characterized over temperature and voltage and you will have to
design in enough margin to allow for process variations.

--

Rick C
 
On Thu, 10 Aug 2017 16:46:13 -0700, brimdavis wrote:

rickman wrote:

I can't say I follow what you are proposing. How do you get the clock
out of the FPGA with a defined time relationship to the signals clocked
through the IOB?


The links I gave in my original post explain the technique:

I posted some notes on this technique (for a Spartan-3) to the fpga-cpu group many years ago:
https://groups.yahoo.com/neo/groups/fpga-cpu/conversations/messages/2076
https://groups.yahoo.com/neo/groups/fpga-cpu/conversations/messages/2177


Allan Herriman wrote:

About a decade back, mainstream FPGAs gained greatly expanded IOB
clocking abilities to support DDR RAM (and other interfaces such as
RGMII).


Nearly twenty years now!

Xilinx parts had ODDR equivalents in Virtex-E using hard macros; then
the actual ODDR primitive stuff appeared in Virtex-2.

Nearly twenty years! Doesn't time fly when you're having fun.

Thinking back, the last time I connected an async SRAM to an FPGA was in
1997, using a Xilinx 5200 series device.

The 5200 was a low cost family, a bit like the XC4000 series, but with
even worse routing resources, and (keeping it on-topic for this thread)
NO IOB FF. Yes, that's right, to get repeatable IO timing, one had to LOC
a fabric FF near the pin and do manual routing from that FF to the pin.
(The manual routing could be saved as a string in a constraints file,
IIRC).

Still, I managed to meet all the SRAM timing requirements, but only by
using two clocks for each RAM read or write. The write strobe used a
negative edge triggered FF.


"And if you tell that to the young people today, they won't believe you"

Regards,
Allan
 
On Thu, 10 Aug 2017 22:39:39 -0400, rickman wrote:

Allan Herriman wrote on 8/10/2017 2:02 AM:
On Wed, 09 Aug 2017 22:33:40 -0400, rickman wrote:

brimdavis@gmail.com wrote on 8/8/2017 8:37 PM:
KJ wrote:

It's even easier than that to synchronously control a standard async
SRAM.
Simply connect WE to the clock and hold OE active all the time
except for cycles where you want to write something new into the
SRAM.

As has been explained to you in detail by several other posters, your
method is not 'easier' with modern FPGA's and SRAMs.

The simplest way to get a high speed clock {gated or not} off the
chip,
coincident with other registered I/O signals, is to use the dual-edge
IOB flip-flops as I suggested.

The DDR technique I mentioned would run synchronous single-cycle read
or write cycles at 50 MHz on a Spartan-3 Starter kit with an (IIRC)
10 ns SRAM, 66 MHz if using a duty-cycle-skewed clock to meet the WE
pulse width requirements.

Another advantage of the 'forwarding' method is that one can use the
internal FPGA clock resources for clock multiply/divides etc.
without needing to also manage the board-level low-skew clock
distribution needed by your method.

I can't say I follow what you are proposing. How do you get the clock
out of the FPGA with a defined time relationship to the signals
clocked through the IOB? Is this done with feedback from the output
clock using the internal clocking circuits?


About a decade back, mainstream FPGAs gained greatly expanded IOB
clocking abilities to support DDR RAM (and other interfaces such as
RGMII).
In particular, one can forward a clock out of an FPGA pin phase aligned
with data on other pins. You can also use one of the internal PLLs to
generate phase shifted clocks, and thus have a phase shift on the pins
between two data signals or between the clock and the data signals.

This can be done without needing feedback from the pins.


You should try reading a datasheet occasionally - they can be very
informative.
Just in case someone has blocked Google where you are: here's an
example:
https://www.xilinx.com/support/documentation/user_guides/ug571-ultrascale-selectio.pdf

Thank you for the link to the 356 page document. No, I have not
researched how every brand of FPGA implements DDR interfaces mostly
because I have not designed a DDR memory interface in an FPGA. I did
look at the document and didn't find info on how the timing delays
through the IOB might be synchronized with the output clock.

So how exactly does the tight alignment of a clock exiting a Xilinx FPGA
maintain alignment with data exiting the FPGA over time and differential
temperature? What will the timing relationship be and how tightly can
it be maintained?

Just waving your hands and saying things can be aligned doesn't explain
how it works. This is a discussion. If you aren't interested in
discussing, then please don't bother to reply.

As you say you've never done DDR I'll give a simple explanation here,
using Xilinx primitives as an example.

The clock forwarding is not the same as connecting an internal clock net
to an output pin. Instead, it is output through an ODDR, in exactly the
same way that the DDR output data is produced. (Except in this case,
instead of outputting two data phases, D1 and D2, it just outputs two
constants, '1' and '0' (or '0' and '1' if you want the opposite phase) to
produce a square wave.)

The clock-forwarding output and the data output ODDR blocks are all
clocked from the same clock on a low skew internal clock net. This will
typically have some tens of ps (to hundreds of ps, depending on the
particular clocking resource) skew. There will also be skew due to the
different trace lengths for each signal in the BGA interposer, but these
are known and can be compensated for in the PCB design.

Perhaps you want deliberate skew between the clock and data (e.g. for
RGMII) - there are two ways of doing that:
1. Use an ODELAY block on (a subset of) the outputs; the ODELAY sits
between the ODDR output and the input of the OBUF pin driver. The ODELAY
is calibrated by a reference clock, and thus is stable against PVT. It
has a delay programmable between ~0 and a few ns.
It has an accuracy of some tens of ps, and produces some tens of ps jitter
on the signal passing through it.

2. Use a PLL (or MMCM) to produce deliberately skewed system clocks
inside the FPGA. These will need separate clocking resources to get to
the IO blocks (leading to some hundreds of ps of additional, unknown
skew).
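
A rough skew budget for the forwarding scheme described above. The individual contributions are placeholders chosen within the ranges quoted (tens to hundreds of ps), not figures from UG571 or any speed file.

# Clock-to-data uncertainty budget for a forwarded (ODDR-generated) clock.
clock_net_skew = 0.10   # ODDR-to-ODDR clock distribution skew, ns (assumed)
package_skew = 0.05     # residual BGA interposer mismatch after PCB compensation, ns (assumed)
odelay_shift = 0.50     # deliberate offset dialed in with ODELAY, ns (assumed)
odelay_error = 0.03     # ODELAY accuracy plus jitter allowance, ns (assumed)

uncertainty = clock_net_skew + package_skew + odelay_error
print(f"intended clock-to-data offset: {odelay_shift:.2f} ns")
print(f"worst-case uncertainty       : +/-{uncertainty:.2f} ns")
print(f"guaranteed minimum offset    : {odelay_shift - uncertainty:.2f} ns")
# Against the 0 ns hold requirement quoted earlier in the thread, the
# guaranteed minimum offset is the number that has to stay positive.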

More details can be found in the user guide that I linked earlier.

Allan
 
rickman wrote:

> I haven't used a Xilinx part in something like 15 years.

Then maybe you shouldn't post comments like this:

This is a method that is not relying on any guaranteed spec
from the FPGA maker. This method uses trace capacitance to
create delta t = delta v * c /i to speed or slow the rising
edge of the various outputs.

Xilinx characterizes and publishes I/O buffer switching parameters vs. IOSTANDARD/SLEW/DRIVE settings; this information is both summarized in the datasheet and used in generating the timing reports, providing the base delay of the I/O buffer independent of any external capacitive loading [1].

The I/O drive values I used in my S3 testing provided an I/O buffer delay difference of about 1 ns (at the fast device corner) between WE and the address/data lines.

While these I/O pins will be slowed further by any board level loading, for any reasonable board layout it is improbable that this loading will somehow reverse the WE timing relationship and violate the zero-ns hold requirement.

My original 2004 posts clearly specified what was (timing at FPGA pins) and wasn't (board level signal integrity issues) covered in my example:
- board level timing hasn't been looked at (note that S3 timing reports don't include output buffer loading)

For purposes of a demo example design, I'm perfectly happy with an address/data hold of 10% of the SRAM minimum cycle time, given that the SRAM hold specification is zero ns.
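
Putting those numbers together as a small check. Only the roughly 1 ns buffer delay difference and the 10 ns SRAM cycle time come from the post above; the rest is arithmetic against the zero-hold specification quoted earlier in the thread.

# Hold margin at the FPGA pins from the I/O buffer delay difference alone.
sram_cycle_min = 10.0   # ns, minimum cycle time of the 10 ns SRAM on the S3 kit
buffer_delta = 1.0      # ns, WE buffer vs. address/data buffer delay at the fast corner
sram_hold_req = 0.0     # ns, zero-hold asynchronous SRAM requirement

hold_at_pins = buffer_delta
print(f"hold at FPGA pins: {hold_at_pins:.1f} ns "
      f"({hold_at_pins / sram_cycle_min:.0%} of the minimum cycle time)")
print(f"margin over the {sram_hold_req:.0f} ns requirement: {hold_at_pins - sram_hold_req:+.1f} ns")
# Board loading slows the lightly driven address/data lines further, so for a
# reasonable layout this figure is a floor rather than a best case.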

If a design needs more precise control, many of the newer parts have calibrated I/O delays (already mentioned by Allan) that can be used to produce known time delays; in the older S3 family, the easiest way to provide an adjustable time delay would be to use a DCM to phase shift the clock to the OFDDRRSE flip-flop primitive driving WE.


-Brian


[1] DS099 S3 data sheet v3.1
https://www.xilinx.com/support/documentation/data_sheets/ds099.pdf
page 83:
"
" The Output timing for all standards, as published in the speed files
" and the data sheet, is always based on a CL value of zero.
"
 
