Separate high-speed rules for HDL?

Guest
Hi everyone. I'm trying to find out if, at high speeds, it is necessary to clock every other register using every other clock transition. For instance, clocking every other register in a shift register on the positive clock transition and the rest on the negative clock transition. This VHDL may help explain:

I know this works at lower speeds:

if (clk'event and clk='1') then
  D <= C;
  C <= B;
  B <= A;
  A <= input;
end if;

But I wonder if at higher speeds this sort of coding is required:

if (clk'event and clk='1') then
  D <= C;
  B <= A;
end if;
if (clk'event and clk='0') then
  C <= B;
  A <= input;
end if;

That way, in my second example, "B" for instance captures its data in the middle of "A's" data eye. Is this coding style required above some speed? If so, does anyone know how to find out what that speed is or just tell me a general approximation?
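A sketch of how that second example would presumably have to be structured for synthesis (my assumption, since most synthesis tools accept only one clock edge per process):

```vhdl
-- Sketch only: one process per clock edge, because most
-- synthesis tools reject two different edge tests in a
-- single process.
rise: process(clk)
begin
  if (clk'event and clk='1') then
    D <= C;
    B <= A;
  end if;
end process;

fall: process(clk)
begin
  if (clk'event and clk='0') then
    C <= B;
    A <= input;
  end if;
end process;
```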
 
sketyro@gmail.com wrote:

Hi everyone. I'm trying to find out if, at high speeds, it is
necessary to clock every other register using every other clock
transition. For instance, clocking every other register in a
shift register using the positive clock transition and the rest
use the negative clock transition. This VHDL may help explain:
How fast are you thinking about?

I have done FPGA designs that had at most two LUTs between FFs.
That, and optimal routing, leads to a fairly fast design.
I believe that using one clock edge works best in this case.

For most FPGA families, there is a well optimized clock tree to
minimize the clock skew. If you use different clock edges, there
must either be an inverter on some clock inputs, or two separate
clock trees. Either seems likely to add clock skew, and limit the
speed.

Also, the timing tools might have a harder time figuring out the
appropriate timing. It probably doesn't cause much of a problem,
but it is your job to get the clock timing right.

The only advantage I see is that the clock signal runs at a lower
frequency.

OK, in the olden days there were advantages. We now have nice,
well-designed master-slave flip-flops. Before TTL, as far as I know,
much logic was done using only latches. The Earle latch allows one
to generate efficient pipelines, merging two levels of logic with
the latch logic. Without the advantage of a master-slave FF,
using either two clock edges or, more usually, two separate clock
phases allows for nice pipelines.

If I remember correctly, the TMS9900 microprocessor uses a four-phase clock.
The 8088 and 8086 use a single clock input with 33% duty cycle,
and dynamic logic. (There is a minimum clock frequency of about
one or two MHz. It has been some time since I thought of the
exact value.) The 33% is optimal for the different path lengths
on the two clock edges.

But as far as I know, there are no advantages for current FPGA families.

Now, there are DDR DRAMs which clock on both edges. The FPGA logic
required to do that likely has FFs clocked on both edges. It might
be that for signals going into or out of the FPGA you can
do it faster using both edges.

-- glen
 
On Monday, July 29, 2013 11:01:28 PM UTC-4, ske...@gmail.com wrote:
Hi everyone. I'm trying to find out if, at high speeds, it is necessary to clock every
other register using every other clock transition. For instance, clocking every other
register in a shift register using the positive clock transition and the rest use
the negative clock transition.

I know this works at lower speeds:

if (clk'event and clk='1') then
  D <= C;
  C <= B;
  B <= A;
  A <= input;
end if;
It works at any speed.

But I wonder if at higher speeds this sort of coding is required:

if (clk'event and clk='1') then
  D <= C;
  B <= A;
end if;
if (clk'event and clk='0') then
  C <= B;
  A <= input;
end if;
About the only plausible situation where you would benefit here is if you can't double the clock frequency, either because it would exceed the device limits or because the other logic clocked by that same clock can't run that fast without a massive redesign. You'll still have to deal with trying to receive data with only half a clock period of setup time if those negative-edge-triggered flip-flop outputs fan out to anything other than your shift register (i.e. it's a small design niche where it may be useful).

That way, in my second example, "B" for instance captures its data in the middle of
"A's" data eye.
Being in the middle of a data eye that you can do nothing about doesn't help. You have to meet the setup and hold times of the flip-flops; there is no extra credit for placing the sampling clock edge in what you think may be the middle. Devices are designed to distribute free-running clocks that originate at an input pin or an internal PLL output with zero skew from the perspective of the designer.

Is this coding style required above some speed? If so, does anyone know how to find
out what that speed is or just tell me a general approximation?
The speed would be device-specific, since it would be the maximum clocking speed of that device, which you can find in the datasheet. However, that maximum speed is typically only applicable to a simple shift register; add any logic in between and the clock speed will drop.

Kevin Jennings
 
On Monday, July 29, 2013 8:01:28 PM UTC-7, ske...@gmail.com wrote:
But I wonder if at higher speeds this sort of coding is required:
if (clk'event and clk='1') then
  D <= C;
  B <= A;
end if;
if (clk'event and clk='0') then
  C <= B;
  A <= input;
end if;
This is overall not a very good idea. Even with 50% duty-cycle clocks, the path from A.Q to B.D has only T/2 available, so you are cutting the time available for B to register its input in half. To make a design run faster you need to increase the source-clock-edge to destination-clock-edge time, not decrease it as you are doing here.
Your options are to add multicycle paths or useful skew to increase the time available between clock edges. The former is difficult to constrain, and the latter is strictly a physical-design solution which doesn't apply to FPGAs.
 
On Tuesday, July 30, 2013 5:01:28 AM UTC+2, ske...@gmail.com wrote:
Hi everyone. I'm trying to find out if, at high speeds, it is necessary to clock every other register using every other clock transition. For instance, clocking every other register in a shift register using the positive clock transition and the rest use the negative clock transition. This VHDL may help explain:

I know this works at lower speeds:

if (clk'event and clk='1') then
  D <= C;
  C <= B;
  B <= A;
  A <= input;
end if;

But I wonder if at higher speeds this sort of coding is required:

if (clk'event and clk='1') then
  D <= C;
  B <= A;
end if;
if (clk'event and clk='0') then
  C <= B;
  A <= input;
end if;

That way, in my second example, "B" for instance captures its data in the middle of "A's" data eye. Is this coding style required above some speed? If so, does anyone know how to find out what that speed is or just tell me a general approximation?

No, you shouldn't do that; it doesn't gain anything, and if anything it'll make things slower.

In your example using both edges, the output of A only has half as much time to get to the input of B, so it will only be able to run half as fast.

FPGAs are generally designed so the clock arrives at all FFs at the same time, so all you need to check is whether each path takes less time than a clock cycle minus the setup time.

And the hold time on FFs is zero or less, so paths can be arbitrarily fast.

-Lasse
 
and the hold time on FFs is zero or less so that paths can be arbitrarily fast

-Lasse
This I did not know. I feel like it should have come to me before now. Oh well. Thanks everyone!
 
On Tuesday, July 30, 2013 6:25:03 PM UTC-5, lang...@fonz.dk wrote:
and the hold time on FFs are zero or less so that paths can be
arbitrarily fast
Zero (or less) hold time on FFs is not true for all FPGAs. It also does not account for finite skew between clock arrival at different FFs, even using a "global clock net".

As to the original problem that the alternate-edge clocking scheme is presumably trying to solve: there is a one-clock-cycle delay between A & C, but there are two t_setup and two t_clk2out times, since you are using an additional register on the opposite clock edge between A and C. You would be better off performing the operations for B and C combinatorially in series between A & C, without trying to use a register for B in between them.
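A sketch of that suggestion, with f_b and f_c as hypothetical placeholder functions standing in for whatever work the B and C stages would do:

```vhdl
-- Sketch only: f_b and f_c are hypothetical combinational
-- functions. A-to-C latency is still one clock cycle, but the
-- path now contains only one t_clk2out and one t_setup.
process(clk)
begin
  if rising_edge(clk) then
    A <= input;
    C <= f_c(f_b(A));
  end if;
end process;
```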

Andy
 
On Wednesday, July 31, 2013 2:39:03 PM UTC+2, jone...@comcast.net wrote:
On Tuesday, July 30, 2013 6:25:03 PM UTC-5, lang...@fonz.dk wrote:

and the hold time on FFs are zero or less so that paths can be
arbitrarily fast

Zero (or less) hold time on FFs is not true for all FPGAs. It also does not account for finite skew between clock arrival at different FFs, even using a "global clock net".
Then the tools had better hide that if you want to use it for anything.

If you don't have zero hold, you can't tell whether the path between two FFs might happen to be less than the required hold, and if you could, what would you do? Insert some dummy logic to add some delay?

And if you can't assume that the skew is effectively zero, how are you going to do a synchronous design?

-Lasse
 
langwadt@fonz.dk wrote:

(snip, someone wrote)
Zero (or less) hold time on FFs is not true for all FPGAs.
It also does not account for finite skew between clock
arrival at different FFs, even using a "global clock net".

then the tools better hide that if you want to use it for anything

if you don't have zero hold you can't tell if the path between
two FFs might happen to be less than the required hold and if you
could what would you do? insert some dummy logic to add some delay?
The FFs don't have to have zero hold time; they just have to
have a hold time less than the delay of the shortest route from the previous FF.

I remember in the TTL days, with zero hold time one could wire
from one output pin to an input, such as Qbar to D. That was
guaranteed to work.

In the case of FPGAs, though, you have the FPGA routing fabric
to go through. There will be a minimum-length route.

and if you can't assume the the skew is effectively zero how
are you going to do a synchronous design?
Well, again, if the clock skew plus hold time is less than the
delay of the minimum-length route, you won't notice it.

For some FPGA families and tools, one can hand route at least
some signals. If there was a possible route faster than skew
plus hold, the data sheet should tell you about it.

-- glen
 
On Wednesday, July 31, 2013 9:22:53 PM UTC+2, glen herrmannsfeldt wrote:
langwadt@fonz.dk wrote:

(snip, someone wrote)
Zero (or less) hold time on FFs is not true for all FPGAs.
It also does not account for finite skew between clock
arrival at different FFs, even using a "global clock net".

then the tools better hide that if you want to use it for anything

if you don't have zero hold you can't tell if the path between
two FFs might happen to be less than the required hold and if you
could what would you do? insert some dummy logic to add some delay?

The FF's don't have to have zero hold time, they just have to
have a hold time less than the shortest route between a previous FF.

I remember in the TTL days, with zero hold time one could wire
from one output pin to an input, such as Qbar to D. That was
guaranteed to work.

In the case of FPGAs, though, you have the FPGA routing fabric
to go through. There will be a minimum length route.

and if you can't assume the skew is effectively zero how
are you going to do a synchronous design?

Well, again, if the clock skew plus hold time is less than the
minimum length route, you won't notice it.

For some FPGA families and tools, one can hand route at least
some signals. If there was a possible route faster than skew
plus hold, the data sheet should tell you about it.

-- glen

What I mean isn't that hold and skew should literally have
to be zero, but that they should be small enough that you can
design as if they were and it is guaranteed to work.


-Lasse
 
On Wednesday, July 31, 2013 9:22:53 PM UTC+2, glen herrmannsfeldt wrote:
(snip)
what I mean isn't that hold and skew should literally have
to be zero but it should be so that you can design as if it
was and it is guaranteed to work


-Lasse
What the OP should do is a trial fully-synchronous design, run it all the
way through the tools, and see whether the Static Timing Analysis shows
that it is "fast enough". If not, start adding pipelining stages in the
areas that are causing a problem.

The major vendors' toolsets are all quite good at optimising for speed in
fully-synchronous datapath designs (although I have had various problems
with Virtex-5 parts in the past).


---------------------------------------
Posted through http://www.FPGARelated.com
 
On Wednesday, July 31, 2013 3:59:49 PM UTC-5, lang...@fonz.dk wrote:
what I mean isn't that hold and skew should literally have to be zero but it should be so that you can design as if it was and it is guaranteed to work -Lasse
In the case of Microsemi PA3E devices, their place & route tool works to solve any hold times for you, assuming you enable that setting. I have seen several hold time violations with that setting disabled, but not with the setting enabled.

Andy
 
On Thursday, August 1, 2013 9:11:43 PM UTC+2, jone...@comcast.net wrote:
On Wednesday, July 31, 2013 3:59:49 PM UTC-5, lang...@fonz.dk wrote:
what I mean isn't that hold and skew should literally have to be zero but it should be so that you can design as if it was and it is guaranteed to work -Lasse

In the case of Microsemi PA3E devices, their place & route tool works to solve any hold times for you, assuming you enable that setting. I have seen several hold time violations with that setting disabled, but not with the setting enabled.

Andy
How does that work? I mean, if you don't enable that option and get a hold violation, what can you do? You can't just start adding random logic hoping it
fixes the problem.


-Lasse
 
In article <26bc041f-8f5c-49af-afc6-0c47c124e5f9@googlegroups.com>,
<langwadt@fonz.dk> wrote:
On Thursday, August 1, 2013 9:11:43 PM UTC+2, jone...@comcast.net wrote:
On Wednesday, July 31, 2013 3:59:49 PM UTC-5, lang...@fonz.dk wrote:

what I mean isn't that hold and skew should literally have to be
zero but it should be so that
you can design as if it was and it is guaranteed to work -Lasse


In the case of Microsemi PA3E devices, their place & route tool works
to solve any hold times for you, assuming you enable that setting.
I have seen several hold time violations with that
setting disabled, but not with the setting enabled.

how does that work? I mean if you don't enable that option and
get a hold violation what can you do? can't just start adding
random logic hoping it fixes the problem
The tool solves hold times by just adding delay to the datapath.
Xilinx tools fix hold times too. Or am I misunderstanding your
question? It's on by default; I don't even know if you can turn
it off.

Regards,

Mark
 
Or by adjusting placement to create appropriate skew in the clock arrival times at the source and destination registers.

I'm not sure why they allow the option to be disabled...

Andy
 
Your second example will run at half the clock speed of the first example, because there is only half a clock period from output to input.
The data gets through two flip-flops per clock, so both examples get about the same throughput as a first estimate.
If you look at it in more detail, the second example will be slower because clock duty-cycle uncertainty now eats into your timing budget.

Also: use rising_edge(clk) and falling_edge(clk) for safer simulation and better readability compared to clk'event and clk='1'.
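The first example rewritten in that style would look like this sketch (same behavior on a clean clock; rising_edge also ignores metavalue transitions such as 'X' to '1' in simulation):

```vhdl
-- Single-edge shift register using rising_edge from
-- ieee.std_logic_1164.
process(clk)
begin
  if rising_edge(clk) then
    D <= C;
    C <= B;
    B <= A;
    A <= input;
  end if;
end process;
```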

Have fun,

Kolja

www.cronologic.de
 
