Both transitions of CLOCK

Christiano · Aug 4, 2013

Section 6.9 (Bad Clocking) of Volnei Pedroni's book shows that the optimal code for a circuit that works on both transitions of the clock is the one below:

process(clk)
begin
  if(clk'event and clk='1') then
    x <= d;
  end if;
end process;

process(clk)
begin
  if(clk'event and clk='0') then
    y <= d;
  end if;
end process;

However, would it not be correct to write the form below?

process(clk)
begin
  if(clk'event and clk='1') then
    x <= d;
  end if;

  if(clk'event and clk='0') then
    y <= d;
  end if;
end process;

KJ · Aug 4, 2013

On Sunday, August 4, 2013 1:10:22 PM UTC-4, Christiano wrote:

However, would it not be correct to write the form below?

Yes, both forms are equivalent.

Kevin Jennings

Daniel Kho · Aug 4, 2013

I have been trying to get at least one synthesis tool vendor to support the following equivalent forms:

/* Alternate form 1. */
process(clk) is begin
if rising_edge(clk) then x<=d; end if;
end process;

process(clk) is begin
if falling_edge(clk) then y<=d; end if;
end process;

q<=x xor y;

OR

/* Alternate form 2. */
process(clk) is begin
if rising_edge(clk) or falling_edge(clk) then
q<=d;
end if;
end process;

You already mentioned 2 other equivalent forms (and confirmed by KJ). My "Alternate Form 1" is similar to the first code you posted (there's a paper by Ralf Hildebrandt that described this in detail). I believe there may be other forms as well. However AFAIK, only these forms (Alt Form 1 & your first code) are supported well by all major synthesis vendors. The rest may be supported by some but not supported by others. However, for simulation, all of these forms should work fine.

If you plan to synthesize your code, then please ask your synthesis vendors to support these alternative forms of dual-edge clocking.

-daniel

Daniel Kho · Aug 4, 2013

On Monday, 5 August 2013 04:41:31 UTC+8, Daniel Kho wrote:

I have been trying to get at least one synthesis tool vendor to support the following equivalent forms:

Sorry, I meant I've been trying to get them to support Alternate Form 2 - the first form I posted (Alternate Form 1) is already well supported. They should also support multiple clock statements within the same process (like the example you posted), but I did not have much luck convincing them the last time. Things might have changed since then - I haven't taken the time to see if the latest tools work, but my gut tells me the tools are pretty much the same as they were a few years ago (in this regard).

Christiano · Aug 5, 2013

On Sunday, August 4, 2013 5:48:00 PM UTC-3, Daniel Kho wrote:

On Monday, 5 August 2013 04:41:31 UTC+8, Daniel Kho wrote:

I have been trying to get at least one synthesis tool vendor to support the following equivalent forms:

Sorry, I meant I've been trying to get them to support Alternate Form 2 - the first form I posted (Alternate Form 1) is already well supported. They should also support multiple clock statements within the same process (like the example you posted), but I did not have much luck convincing them the last time. Things might have changed since then - I haven't taken the time to see if the latest tools work, but my gut tells me the tools are pretty much the same as they were a few years ago (in this regard).

Your Alternate form 2 has a problem that is described in Pedroni's book. Flip-flop technology usually only allows one edge.

KJ · Aug 5, 2013

On Sunday, August 4, 2013 7:44:41 PM UTC-4, Christiano wrote:

Your Alternate form 2 has a problem that is described in Pedroni's book.
The technology of the flip flop usually only allows one edge.

Not true. All of the descriptions so far are assigning to two different signals (x, y). Two signals clocked by different clocks (or different edges of a single clock) will always be implemented totally independent of each other. Both x and y will be synthesized as simple flip flops.

Kevin Jennings

rickman · Aug 5, 2013

On 8/4/2013 4:41 PM, Daniel Kho wrote:

I have been trying to get at least one synthesis tool vendor to support the following equivalent forms:

/* Alternate form 1. */
process(clk) is begin
if rising_edge(clk) then x<=d; end if;
end process;

process(clk) is begin
if falling_edge(clk) then y<=d; end if;
end process;

q<=x xor y;

There is a problem with this last statement. Using an XOR of x and y
does not make q the same as a clocked version of d which would seem to
be your goal as shown in the alternate form 2 below.

OR

/* Alternate form 2. */
process(clk) is begin
if rising_edge(clk) or falling_edge(clk) then
q<=d;
end if;
end process;

You already mentioned 2 other equivalent forms (and confirmed by KJ). My "Alternate Form 1" is similar to the first code you posted (there's a paper by Ralf Hildebrandt that described this in detail). I believe there may be other forms as well. However AFAIK, only these forms (Alt Form 1& your first code) are supported well by all major synthesis vendors. The rest may be supported by some but not supported by others. However, for simulation, all of these forms should work fine.

If you plan to synthesize your code, then please ask your synthesis vendors to support these alternative forms of dual-edge clocking.

The trouble is that there is nothing similar between these two forms.
Form 1 registers two signals from the same input signal, but at
different times. The XOR operation is a logical function. Form 2
registers the signal d at two edges of the clock to the signal q, but
this is not synthesizable directly in a single FF.

--

Rick
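
Editor's note: the point about the XOR can be checked in simulation. The testbench below is only a sketch; the entity name form1_xor_tb, the timing values and the choice to hold d at '1' are assumptions made for illustration, not something posted in the thread. It registers d on both edges as in Alternate Form 1 and shows that q = x xor y drops back to '0' after the falling edge even though d is still '1'.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical testbench name; not from the thread.
entity form1_xor_tb is
end entity form1_xor_tb;

architecture sim of form1_xor_tb is
  signal clk  : std_logic := '0';
  signal d    : std_logic := '1';  -- held constant at '1' for the whole run
  signal x, y : std_logic := '0';  -- initial values for simulation only
  signal q    : std_logic;
begin
  -- Alternate Form 1 as posted: two registers sampling the same d.
  process (clk) is begin
    if rising_edge(clk) then x <= d; end if;
  end process;

  process (clk) is begin
    if falling_edge(clk) then y <= d; end if;
  end process;

  q <= x xor y;

  stim : process is
  begin
    wait for 5 ns;
    clk <= '1';   -- rising edge: x becomes '1', y is still '0'
    wait for 1 ns;
    assert q = '1' report "unexpected q after rising edge" severity error;
    wait for 4 ns;
    clk <= '0';   -- falling edge: y becomes '1' as well
    wait for 1 ns;
    -- x and y now both hold '1', so q is '0' although d is still '1'.
    assert q = '0' report "unexpected q after falling edge" severity error;
    wait;
  end process stim;
end architecture sim;
```

With a constant input, q pulses high for half a clock period instead of following d, which is why this form is not a dual-edge register.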

Daniel Kho · Aug 5, 2013

Yes, the "problem" you are talking about really lies with the tools. If tools are smart enough to be able to infer dual-edge-triggered flip-flops, or pseudo-dual-edge-triggered flip-flops using single-edge-triggered ones, then we will have no problems using Alternate Form 2 (or any other alternate forms for that matter). In fact, the 2nd code you posted also has the same "problems" mentioned by Pedroni, but again this problem can be solved if tool vendors spend more time improving their synthesis engine.

If there are any of these forms that you like to use, then request them from your tool vendors.

Daniel Kho · Aug 5, 2013

The trouble is that there is nothing similar between these two forms.

Sure, but the first form _emulates_ dual clocking behaviour, though yes, it is being described as combinatorial logic. This is what is called pseudo-dual-edge triggering (as described by Ralf Hildebrandt). If synthesis tools can infer dual-triggered flip-flops from this (pseudo) description, that would be nice. (Of course, warn the user that it has optimized the design).

The second form assumes there are dual-triggered flip-flops available for use, and the synthesis tool is smart enough to be able to instantiate these resources.

Both descriptions are different, but have the same intention (dual-edge clocking).

-daniel

Allan Herriman · Aug 5, 2013

On Sun, 04 Aug 2013 11:14:58 -0700, KJ wrote:

On Sunday, August 4, 2013 1:10:22 PM UTC-4, Christiano wrote:
However, would it not be correct to write the form below?

Yes, both forms are equivalent.

Whilst both forms are equivalent from the point of view of the VHDL
language, I recently came across a problem in XST in which it would not
give the correct results if both edges were used in the same process.

It didn't even give a warning message. It just generated bad code.

Regards,
Allan

alb · Aug 5, 2013

On 05/08/2013 07:18, Daniel Kho wrote:

Sure, but the first form _emulates_ dual clocking behaviour, though
yes, it is being described as combinatorial logic. This is what is
called pseudo-dual-edge triggering (as described by Ralf
Hildebrandt).

I urge you to read the paper you mentioned again. The code you posted has
nothing to do with a pseudo-dual-edge flip-flop (pde_dff).

BTW, be aware that in pde_dff timing might be an issue due to the
asymmetries between the two paths.

Anyway you could write your own procedure to handle this and eventually
have something like:

pde_dff(clk, nrst, din, dout);

If a synthesis tool is smart enough you'll have it optimized,
otherwise you'll have the functional equivalent, enhancing your
portability. Consider also that a dual edge triggered clock has a
limited use case and it is possible that the cost of supporting such a
small niche is not worth the candle...

If synthesis tools can infer dual-triggered flip-flops
from this (pseudo) description, that would be nice. (Of course, warn
the user that it has optimized the design).

BTW the FM0 encoding use case presented in the article is rather naive.
You need to do phase lock or guarantee your tx and rx are synchronous,
so why bother?

Apart from DDR use, where else is a dual-edge ff nice to have?

Both descriptions are different, but have the same intention
(dual-edge clocking).

An XOR operation on the same signal clocked at different moments is not
the same as what RH presented.

rickman · Aug 5, 2013

On 8/5/2013 1:18 AM, Daniel Kho wrote:

The trouble is that there is nothing similar between these two forms.

Sure, but the first form _emulates_ dual clocking behaviour, though yes, it is being described as combinatorial logic. This is what is called pseudo-dual-edge triggering (as described by Ralf Hildebrandt). If synthesis tools can infer dual-triggered flip-flops from this (pseudo) description, that would be nice. (Of course, warn the user that it has optimized the design).

The second form assumes there are dual-triggered flip-flops available for use, and the synthesis tool is smart enough to be able to instantiate these resources.

Both descriptions are different, but have the same intention (dual-edge clocking).

I think there is something I am missing in this conversation. I don't
know what a dual-triggered flip-flop is. If you mean a single FF that
is triggered on both edges of the clock, then this does not exist to the
best of my knowledge. If you mean a double data rate circuit like used
in memory configurations - that does not combine the data into a single
data stream because there is no 2x clock to process the resulting data.

              +-----+       +-----+   !
Data --+----->|D   Q|-------| mux |   !   +-----+
       |      |     |       |     |-------|D   Q|---- Data'
       |   +--|>    |    +--|  s  |   !   |     |
       |   |  +-----+    |  +-----+   ! +-|>    |
       |   |             |     |      ! | +-----+
       |   +-------------(-----+      ! |
       |   |  +-----+    |            ! |
       +---(->|D   Q|----+            ! |
           |  |     |                 ! |
CLK--------+-o|>    |                 ! |
              +-----+                 ! |
2xCLK-----------------------------------+

The logic to the left of the ! line is effectively what your
dual-triggered flip-flop would accomplish. But since there is no 2x
clock to work with the output from that device, what use is it?

You seem to be focusing on synthesis of this dual-triggered flip-flop or
pseudo-dual-triggered flip-flop without a context. At least I'm not
following how they would be of any value.

What is normally done is to clock the two phases of data into two
separate streams which are then processed in parallel, much like using a
SERDES. The serial data is very high speed, it is shifted into a set of
registers and processed in the FPGA as parallel words.

--

Rick

Andy · Aug 5, 2013

On Sunday, August 4, 2013 3:41:31 PM UTC-5, Daniel Kho wrote:

I have been trying to get at least one synthesis tool vendor to support the following equivalent forms:

I think you meant:

Process (clk)
begin
if rising_edge(clk) then x <= y XOR d; end if;
if falling_edge(clk) then y <= x XOR d; end if;
end process;

q <= x XOR y;

Andy
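
Editor's note: a self-contained sketch of the XOR scheme above, for anyone who wants to try it in a simulator. The entity name pde_dff is borrowed from alb's post; the port names, the testbench, the bit pattern and the timing are assumptions for illustration. Two separate processes are used here instead of one, since that is the style reported earlier in the thread as well supported by synthesis tools; the behaviour is intended to be the same as the single-process version.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wrapper entity; port names are assumptions.
entity pde_dff is
  port (
    clk : in  std_logic;
    d   : in  std_logic;
    q   : out std_logic
  );
end entity pde_dff;

architecture rtl of pde_dff is
  -- Initial values are for simulation only; a real design would use a reset.
  signal x : std_logic := '0';
  signal y : std_logic := '0';
begin
  -- After a rising edge: x xor y = (y xor d) xor y = d.
  rise : process (clk) is
  begin
    if rising_edge(clk) then
      x <= y xor d;
    end if;
  end process rise;

  -- After a falling edge: x xor y = x xor (x xor d) = d.
  fall : process (clk) is
  begin
    if falling_edge(clk) then
      y <= x xor d;
    end if;
  end process fall;

  q <= x xor y;
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;

entity pde_dff_tb is
end entity pde_dff_tb;

architecture sim of pde_dff_tb is
  signal clk : std_logic := '0';
  signal d   : std_logic := '0';
  signal q   : std_logic;
begin
  dut : entity work.pde_dff port map (clk => clk, d => d, q => q);

  stim : process is
    -- Arbitrary test pattern, one bit per clock edge.
    constant pattern : std_logic_vector(0 to 5) := "101100";
  begin
    for i in pattern'range loop
      d <= pattern(i);
      wait for 5 ns;
      clk <= not clk;  -- alternates rising and falling edges
      wait for 1 ns;
      assert q = pattern(i) report "q did not follow d" severity error;
      wait for 4 ns;
    end loop;
    wait;
  end process stim;
end architecture sim;
```

Only one of x and y changes on any given edge, so q takes the value of d sampled at that edge, on both transitions of clk.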

rickman · Aug 5, 2013

On 8/5/2013 11:02 AM, Daniel Kho wrote:

I think you meant:

Process (clk)
begin
if rising_edge(clk) then x<= y XOR d; end if;
if falling_edge(clk) then y<= x XOR d; end if;
end process;

q<= x XOR y;

Andy

Andy, yes you got me.

regards, daniel

How is this useful though? You now have an internal signal which is the
same as the external signal based on a clock that is twice the rate of
the internal clock in your FPGA.

Is this just a thought experiment?

--

Rick

Daniel Kho · Aug 5, 2013

I think you meant:

Process (clk)
begin
if rising_edge(clk) then x <= y XOR d; end if;
if falling_edge(clk) then y <= x XOR d; end if;
end process;

q <= x XOR y;

Andy

Andy, yes you got me.

regards, daniel

Andy · Aug 6, 2013

On Monday, August 5, 2013 11:02:33 AM UTC-5, rickman wrote:

How is this useful though? You now have an internal signal
which is the same as the external signal based on a clock
that is twice the rate of the internal clock in your FPGA.
Is this just a thought experiment?

I have not found too many useful internal uses for DDR in general, or this implementation of DDR.

However, if you need to use a double-data-rate output from a low cost/power device that does not support DDR outputs or have an available PLL/DLL, this is one way you can safely (glitchlessly) do that.

If your device does support built-in DDR outputs, they are very useful for driving clock signals out (with the same timing as the data driven from the same global clock), and/or for a gated clock, etc.

Some architectures do not provide dedicated low-skew routing from global clock nets to output pins (e.g. through an output buffer), so using DDR output registers is a good way to generate multiple in-phase (or 180 out) clock outputs at very low skew, at the same rate as the global clock.

And it's not a bad thought experiment either.

Andy

rickman · Aug 9, 2013

On 8/6/2013 8:32 AM, Andy wrote:

On Monday, August 5, 2013 11:02:33 AM UTC-5, rickman wrote:
How is this useful though? You now have an internal signal
which is the same as the external signal based on a clock
that is twice the rate of the internal clock in your FPGA.
Is this just a thought experiment?

I have not found too many useful internal uses for DDR in general, or this implementation of DDR.

However, if you need to use a double-data-rate output from a low cost/power device that does not support DDR outputs or have an available PLL/DLL, this is one way you can safely (glitchlessly) do that.

I don't follow. The circuit we are discussing pulls in a signal that
changes data at twice the clock rate. It then combines the two data
streams (rising edge, falling edge data) into a single data stream which
should be the same as the corresponding external data stream... at TWICE
the clock rate of your internal clock. That does nothing for
interfacing to DDR signals. This circuit is for input, not output. In
addition there can be problems trying to generate a combinatorial signal
from FF outputs unless you can appropriately control the path delays.

If your device does support built-in DDR outputs, they are very useful for driving clock signals out (with the same timing as the data driven from the same global clock), and/or for a gated clock, etc.

I'm not following how you would use this circuit to generate a clock.
All it can do is to generate the same frequency clock as the system
clock in the chip. I think you are saying this clock would have the
same phasing as data which I suppose is true, but I'm not sure how that
would help. There is always going to be skew between the clock and
data. Normally one clock is an input to multiple chips on a board with
virtually no skew if you match the path lengths. Then the problem is
just a matter of meeting the chip to chip delay requirements. By
outputting the clock from the FPGA you have an unknown skew (to the best
of my knowledge). Is this parameter specified for FPGAs?

Some architectures do not provide dedicated low-skew routing from global clock nets to output pins (e.g. through an output buffer), so using DDR output registers is a good way to generate multiple in-phase (or 180 out) clock outputs at very low skew, at the same rate as the global clock.

I don't remember seeing the output skew specified for FPGAs. Do they do
that?

And it's not a bad thought experiment either.

I won't argue that.

--

Rick

Andy · Aug 12, 2013

Rick,

I was referring to Daniel Kho's description which was for a DDR type output, whether used internally or externally.

The worst case output skew is either reported in or can be deduced from the timing reports.

As an example, I had a project with an FPGA that needed to provide multiple low-skew clock output signals at the same frequency as an available clock signal.

The FPGA did not provide dedicated connections from the global clock net to any port on the IOB except for the clock input(s) of the IOB. If you wanted to drive the globally routed clock out on a pad, you had to use local routing near the IOB to get off the global clock net and into the out port of the IOB (so that the IOB would buffer the signal to the pad). This local routing is very unpredictable in between P&R runs if there are any changes to the design, and there is a significant delay in that routing.

To solve the problem, I configured the IOBs as DDR output registers, with '1' on the rising edge data input, and '0' on the falling edge data input (to replicate the clock on the output). Since the global clock net has a direct connection to the IOB clock ports, the skew is very low and controllable among multiple outputs driven the same way. In this example, we drove 7 identical clock output signals. All were routed point-to-point with matched lengths to control timing and signal integrity.

Granted, all the low skew features fly out the window if you have to emulate the DDR output register. But you can safely (glitchlessly) enable and disable a single clock output driven from an emulated DDR, since the enable/disable signals are inputs to the registers, and thus the output can only change as a result of the clock changing. The XOR gates are glitchless since the two inputs cannot change at the same time. And the design is STA-friendly.

Andy
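
Editor's note: the emulated DDR output with a glitchless enable described above might look roughly like the sketch below. All entity and port names (ddr_out_emul, gated_clk_out, d_rise, d_fall, en, clk_out) are hypothetical, and the code assumes en is synchronous to clk. It reuses the XOR scheme from earlier in the thread with separate data inputs for each edge; it is not the built-in IOB DDR register mentioned in the post.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical emulated DDR output register built from two ordinary flip-flops.
entity ddr_out_emul is
  port (
    clk    : in  std_logic;
    d_rise : in  std_logic;  -- value presented on q after a rising edge
    d_fall : in  std_logic;  -- value presented on q after a falling edge
    q      : out std_logic
  );
end entity ddr_out_emul;

architecture rtl of ddr_out_emul is
  -- Initial values are for simulation only.
  signal x : std_logic := '0';
  signal y : std_logic := '0';
begin
  process (clk) is begin
    if rising_edge(clk) then x <= y xor d_rise; end if;
  end process;

  process (clk) is begin
    if falling_edge(clk) then y <= x xor d_fall; end if;
  end process;

  -- Only one of x and y changes per clock edge, so the XOR has a single
  -- changing input at a time.
  q <= x xor y;
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical gated clock output: clk_out copies clk while en = '1'
-- and stays low while en = '0'.
entity gated_clk_out is
  port (
    clk     : in  std_logic;
    en      : in  std_logic;  -- assumed synchronous to clk
    clk_out : out std_logic
  );
end entity gated_clk_out;

architecture rtl of gated_clk_out is
begin
  fwd : entity work.ddr_out_emul
    port map (clk => clk, d_rise => en, d_fall => '0', q => clk_out);
end architecture rtl;
```

Because en is only sampled by a register, clk_out can change only in response to a clock edge, which is the property the post relies on. As the post notes, the low-skew benefit of a real DDR output register does not carry over to this emulation.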
