More synthesis myths?

Tricky

I just overheard the following (or thereabouts)

using the following template:
process(clk, en)
begin
  if en = '1' then
    if rising_edge(clk) then
      d <= b;
    end if;
  end if;
end process;

is better than the "normal" way

process(clk)
begin
  if rising_edge(clk) then
    if en = '1' then
      c <= a;
    end if;
  end if;
end process;

because the 2nd can produce latches where the clock is gated with
enable? Has this ever been the case? Running either through Quartus
produces the same (expected) thing - a D-type with enable.

Are there other legends out there that still influence design
today? Were they really a problem, and have they actually been fixed?
 
The normal way is the only way I code.

process(clk)
begin
  if rising_edge(clk) then
    if en = '1' then
      c <= a;
    end if;
  end if;
end process;

Historically some ASIC tools have/had a switch that would
allow the enable (en) to be transformed into a clock gate -
to be used for low power applications. Has anyone seen a
synthesis tool that will do this transformation without a
setting? I would consider this to be an error.

WRT the other coding template: that coding style was not
in the 1076.6-1999 RTL coding styles, but is in the 1076.6-2004
RTL coding styles. I would still be concerned that there may be
some tools (such as ASIC synthesis tools) that do not support it.
Furthermore, since the code is logically the same, I would be
concerned that any misbehavior with the "normal" coding template
would also occur with this coding template.
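
The "logically the same" point can be checked in simulation with a small self-checking testbench. This is only a sketch: the entity name tb_enable_styles, the signal names and the stimulus pattern are invented for illustration, and it says nothing about what any particular synthesis tool will build from either template.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Illustrative testbench; all names here are hypothetical.
entity tb_enable_styles is
end entity tb_enable_styles;

architecture sim of tb_enable_styles is
  signal clk  : std_logic := '0';
  signal en   : std_logic := '0';
  signal a    : std_logic := '0';
  signal c    : std_logic := '0';  -- output of the "normal" template
  signal d    : std_logic := '0';  -- output of the other template
  signal done : boolean   := false;
begin
  -- 10 ns clock that stops toggling once the stimulus is done.
  clk <= not clk after 5 ns when not done else '0';

  -- "Normal" template: enable tested inside the clocked branch.
  normal_style : process (clk)
  begin
    if rising_edge(clk) then
      if en = '1' then
        c <= a;
      end if;
    end if;
  end process normal_style;

  -- Other template: enable tested first, en in the sensitivity list.
  other_style : process (clk, en)
  begin
    if en = '1' then
      if rising_edge(clk) then
        d <= a;
      end if;
    end if;
  end process other_style;

  stimulus : process
  begin
    for i in 0 to 7 loop
      wait until falling_edge(clk);
      -- Drive a and en through all four combinations, twice.
      if i mod 2 = 1 then a <= '1'; else a <= '0'; end if;
      if (i / 2) mod 2 = 1 then en <= '1'; else en <= '0'; end if;
      wait until rising_edge(clk);
      wait for 1 ns;
      assert c = d
        report "templates disagree in simulation" severity error;
    end loop;
    done <= true;
    wait;
  end process stimulus;
end architecture sim;
```

The inputs only change on the falling edge, so the extra wake-ups of the second process on en events never coincide with a rising clock edge.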

Cheers,
Jim
SynthWorks
 
A third style is:

process (clk) is
begin
  if rising_edge(clk) and en = '1' then
    c <= a;
  end if;
end process;

This produces a clock enable and is the behavioral equivalent of
either style. The inclusion of en in the sensitivity list accomplishes
absolutely nothing, since nothing happens unless there was a rising edge
event on clk. I sure hope we don't get back into the old days when the
order of nested if-then statements indicated priority from an
implementation/timing POV.

Note it does not say "rising_edge(clk and en)", if that were even
permissible with rising_edge(), which would directly imply a gated
clock. There are some FPGA synthesis tools that will convert clock
enables into gated clocks, but only on devices that have "enabled
clock buffers" that are "safe". But you still have to set an option
for it to do that.
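
For contrast, this is roughly what an explicitly gated clock looks like in source. It is a sketch of the thing the enable templates do not describe, with invented names (gated_clock_example, gclk), and it is not behaviorally equivalent to them: a rising en while clk is high produces an extra rising edge on gclk.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Illustrative only; entity and signal names are hypothetical.
entity gated_clock_example is
  port (
    clk : in  std_logic;
    en  : in  std_logic;
    a   : in  std_logic;
    c   : out std_logic
  );
end entity gated_clock_example;

architecture rtl of gated_clock_example is
  signal gclk : std_logic;
begin
  -- Explicit gate in the clock path: any glitch or late change on en
  -- reaches the clock input of the flip-flop.
  gclk <= clk and en;

  process (gclk)
  begin
    if rising_edge(gclk) then
      c <= a;
    end if;
  end process;
end architecture rtl;
```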

The "other" form with clock and enable in the sensitivity list could
also be a drain on simulation performance with large systems, since such
processes cannot be merged with others that are either not
clock-enabled, enabled in another way, and/or enabled by other signals.

Andy
 
Tricky <Trickyhead@gmail.com> wrote:

> because the 2nd can produce latches where the clock is gated with

Looking at that, I'd guess that you meant 1st not 2nd.
 
On Mar 31, 5:34 pm, Tricky <Trickyh...@gmail.com> wrote:
> because the 2nd can produce latches where the clock is gated with
> enable? has this ever been the case? running either through quartus
> produces the same (expected) thing - a d-type with enable.

The second example is the way to write it if you want a regular DFF
without asynchronous reset and with a clock enable. Usually the clock
enable is synthesised with a feedback mux. However, today most tools
can implement it with a clock-gating latch instead (i.e.
the clock to the DFF is gated when en = '0' and the old value is
kept). I know that some tools do this by default (the FPGA tool we use
does this, and we usually turn it off to improve timing). For ASIC
synthesis, however, automatic clock gating is disabled by default. We
work in a low-power process, where automatic clock gating is a simple
and safe way to save power (for a minor penalty in timing), so we use
it.

The first version you gave I'm less certain about; it doesn't match
any of the default DFF or DLAT patterns I've seen. But I guess since
the en signal is in the sensitivity list and is tested before the clock, it
can be considered an asynchronous signal, so synthesis tools would not
try to do clock gate insertion on it, since the clock gating has to
be synchronous with the clock.
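
As a rough picture of the feedback-mux implementation mentioned above, the enable can be written out by hand. The names dff_en_mux, q_reg and q_next are invented for this sketch; a synthesis tool would normally infer this structure from the "normal" template without it being spelled out.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Illustrative only; entity and signal names are hypothetical.
entity dff_en_mux is
  port (
    clk : in  std_logic;
    en  : in  std_logic;
    d   : in  std_logic;
    q   : out std_logic
  );
end entity dff_en_mux;

architecture rtl of dff_en_mux is
  signal q_reg  : std_logic := '0';
  signal q_next : std_logic;
begin
  -- Feedback mux: take d when enabled, otherwise recirculate the old value.
  q_next <= d when en = '1' else q_reg;

  -- Plain D flip-flop on an ungated clock.
  process (clk)
  begin
    if rising_edge(clk) then
      q_reg <= q_next;
    end if;
  end process;

  q <= q_reg;
end architecture rtl;
```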
 
Tricky wrote:

> using the following template:
> process(clk, en)
> begin
>   if en = '1' then
>     if rising_edge(clk) then
>       d <= b;
>     end if;
>   end if;
> end process;

Why should 'en' be included in the sensitivity list in the first
template? It just does not make sense to me. Or does this fall under
the "or thereabouts"?

--
Paul
 
Marc Guardiani wrote:

> Paul wrote:
>> Why should 'en' be included in the sensitivity list in the first
>> template? It just does not make sense to me. Or does this fall
>> under the "or thereabouts"?
>
> As I understand processes, 'en' is in the sensitivity list because
> you want the process to "run", so to speak, whenever it changes.

Yes, that's correct. But why on earth would you like to run the
process when 'en' changes? Functionally it just adds nothing (as KJ
rightfully explained). The only thing that is added is obfuscation.

For a pure synchronous process, my favorite template is:

  process is
  begin
    wait until clk = '1'; -- or: wait until rising_edge(clk);

    if en = '1' then
      q <= d;
    end if;
  end process;

Major advantage (IMHO): at first glance you see this is a
synchronous process. No doubt possible. Also no long-winded
if/end-if needed, with an additional indentation level.

--
Paul Uiterlinden
 
Andy wrote:
> process (clk) is
>   variable count: natural range 0 to 2**n-1;
> begin
>   if rising_edge(clk) then
>     count := (count - 1) mod 2**n;
>   end if;
>   output <= count = 2; -- combinatorial decode
> end process;

Personally I would avoid these constructions:
1. You generate FFs from variables, which are often hard to find due to name
changing during synthesis.
2. Your logic path consists of FF, - operator, comparator; from a timing
perspective it's better to use the comparator directly on the FF output, and
adjust the expected value accordingly.
3. Your output is not a FF, which may also create timing problems.

Kind regards,

Pieter Hulshoff
 
Pieter Hulshoff wrote:

> Personally I would avoid these constructions:
> 1. You generate FFs from variables, which are often hard to find due to name
> changing during synthesis.

In ModelSim, I use an 'add wave' command for each process
to make the variables visible. Quartus uses the variable
names directly, when they represent flops rather than wires.

> 2. Your logic path consists of FF, - operator, comparator; from a timing
> perspective it's better to use the comparator directly on the FF output, and
> adjust the expected value accordingly.

In a simple example like this, you have a point.
In my processes, I may have 30 variables, and
these are mostly internal registers.

> 3. Your output is not a FF, which may also create timing problems.

The output is indeed a flip flop.
The signal assignment represents
the wire from Q to the output port.
Try it and see.

-- Mike Treseler
 
On Apr 5, 3:47 pm, Paul <pa...@sx4all.nl> wrote:
> For a pure synchronous process, my favorite template is:
>
>   process is
>   begin
>     wait until clk = '1'; -- or: wait until rising_edge(clk);
>
>     if en = '1' then
>       q <= d;
>     end if;
>   end process;
>
> Major advantage (IMHO): at first glance you see this is a
> synchronous process. No doubt possible. Also no long-winded
> if/end-if needed, with an additional indentation level.

Just to add more options to the mix (If you don't like additional
levels of if-then statements or indentation):

(I've not tried this, so I don't know if any synthesis tools will "get
it right" or not)

process is
begin
  wait until rising_edge(clk) and en = '1';
  q <= d;
end process;

Or its concurrent behavioral equivalent:

q <= d when rising_edge(clk) and en = '1';

About the sensitivity list issue: some simulators use an optimization
whereby multiple processes that share the same sensitivity list are
merged into one process in order to save setup/teardown overhead
associated with multiple processes. Adding an enable to the
sensitivity list would defeat this optimization in most cases. The
same is true for the concurrent statement's implied sensitivity list.

There is also a process template that uses variables for storage, with
assignments to signals after the end of the clocked if-then statement
to infer combinatorial logic outputs (no combo in->out paths). It may
not be recognized by all synthesis tools, but at least Quartus,
Synplify and Precision handle it.

An example would be (ignoring reset):

process (clk) is
  variable count: natural range 0 to 2**n-1;
begin
  if rising_edge(clk) then
    count := (count - 1) mod 2**n;
  end if;
  output <= count = 2; -- combinatorial decode
end process;

I'm not sure how you would/could do this with a wait statement.

Andy
 
Andy wrote:

> output <= count = 2; -- combinatorial decode

Pieter Hulshoff wrote:

> 3. Your output is not a FF, which may also create timing problems.

Sorry, I read what I expected,
"output <= count;"
not what he wrote.

I agree with you, that with rare exceptions,
process outputs should be registers.

-- Mike
 
Mike,

>> 2. Your logic path consists of FF, - operator, comparator; from a timing
>> perspective it's better to use the comparator directly on the FF
>> output, and adjust the expected value accordingly.
>
> In a simple example like this, you have a point.
> In my processes, I may have 30 variables, and
> these are mostly internal registers.

This second point had more to do with the logic generated by the compiler than
with the use of variables. Take for example:

WAIT UNTIL clk = '1';
counter := counter + 1;
IF counter = 5 THEN
  counter := 0;
END IF;

vs

WAIT UNTIL clk = '1';
IF counter = 4 THEN
  counter := 0;
END IF;
counter := counter + 1;

or

WAIT UNTIL clk = '1';
counter <= counter + 1;
IF counter = 4 THEN
  counter <= 0;
END IF;

The last 2 examples will usually synthesize into faster logic than the 1st,
since the first assumes a + followed by a compare while the last two do the
compare directly on the FF output.

Kind regards,

Pieter Hulshoff
 
On Apr 6, 9:45 am, Pieter Hulshoff <phuls...@xs4all.nl> wrote:
> Andy wrote:
>> process (clk) is
>>   variable count: natural range 0 to 2**n-1;
>> begin
>>   if rising_edge(clk) then
>>     count := (count - 1) mod 2**n;
>>   end if;
>>   output <= count = 2; -- combinatorial decode
>> end process;
>
> Personally I would avoid these constructions:
> 1. You generate FFs from variables, which are often hard to find due to name
> changing during synthesis.
> 2. Your logic path consists of FF, - operator, comparator; from a timing
> perspective it's better to use the comparator directly on the FF output, and
> adjust the expected value accordingly.
> 3. Your output is not a FF, which may also create timing problems.
>
> Kind regards,
>
> Pieter Hulshoff

#1: I've never had problems finding variable-inferred register names.
The hierarchical naming works the same for signals or variables,
there's just an additional level of hierarchy for the process with
variables. Use descriptive process names and you won't have any
problems.

#2: I think you misunderstood what happens with signal assignments
from variables. For instance, the initial example I gave, and this
one, are cycle-accurately identical to each other WRT the output
signal:

process (clk) is
  variable count: natural range 0 to 2**n-1;
begin
  if rising_edge(clk) then
    count := (count - 1) mod 2**n;
    output <= count = 2; -- registered decode of combo count
  end if;
end process;

The difference is where the register is implemented. In the initial
example, the register is after the decrement, splitting the decrement
and comparison. In this example, the register is after both the
decrement and the comparison, and is a separate register. The cycle
based timing for output in both is identical. Depending on where the
output is needed, the advantage generally lies with the former.
Naturally this is a trivial example which could easily be re-coded
behaviorally to compensate for an additional clock delay from a
registered output, but that is not the point. Re-coding for such
compensation often obfuscates the overall behavior that is intended.

When I specify two output signals, using the same expression, but one
within and one after the clocked clause, Synplify will recognize they
are functionally identical, and optimize the combinatorial output
version away. However, I've never seen it convert the combinatorial
output to a registered output unless such duplication was being
addressed, or register retiming was invoked.

Both simulate the same (WRT cycle-based timing on output), both behave
the same after synthesis.

#3: These examples are not intended as a verdict on the
appropriateness for all applications of combinatorial outputs from
synchronous processes, but rather an example of how to generate one
without introducing an additional process (implied or explicit).

I use signals only for inter-process communication. My processes tend
to be large and complex to minimize both the number of processes and
the signal-based communication between them, both of which contribute
to simulation efficiency. All intra-process communication uses
variables, whether the behavior implies a register or not. I prefer
not to focus on the explicit location of registers, but on the cyclic
behavior of the process, which is easier to read and debug from a
truly sequential description of variables than a pseudo-sequential
description of signals. Register re-timing optimizations change the
register/logic locations anyway, and usually do it better than I can
afford to. Just make sure you disable such optimizations (as well as
register replication, etc.) around synchronization boundaries (and
don't ask me how I know that!).

Andy
 
Pieter Hulshoff wrote:

> The last 2 examples will usually synthesize into faster logic than the 1st,
> since the first assumes a + followed by a compare while the last two do the
> compare directly on the FF output.

I can't benchmark synthesis
until there is an entity and port assignments.

A real design also needs a reset strategy.

Synthesis sometimes creates duplicate registers
at the front end that are taken out
during mapping.

Because of these complications, I stick with
a well-tested "known good" synchronous template
for my designs.

-- Mike Treseler
 
> The last 2 examples will usually synthesize into faster logic than the 1st,
> since the first assumes a + followed by a compare while the last two do the
> compare directly on the FF output.


Interesting example. While I agree with your conclusions above, there
is another contributing factor with the last 2 examples. With an
incrementer and a smart synthesis tool, the condition "counter = 4" is
the same as (converting to unsigned for notation only) "counter(2) = '1'".

Cheers,
Jim
 
"JimLewis" <jim@synthworks.com> wrote in message
news:7082d731-d77d-406d-b347-2774cd918d83@y33g2000prg.googlegroups.com...

> another contributing factor with the last 2 examples. With an
> incrementer and a smart synthesis tool, the condition "counter = 4" is
> the same as (converting to unsigned for notation only) "counter(2) = '1'"

Which is why it is usually better to code it as "counter >= 4". Then you
don't need to have as smart of a synthesis tool in order to reach the
conclusion that only bit 2 of the counter is needed.

KJ
 
I've always felt "safer" with '>=' or '<=' comparisons rather than '='
on counters, especially when dealing with non-modulo-2^n counters.

However, it should be noted that the three examples given do not
behave identically. Examples 1 and 3 count from 0 to 4 and repeat.
Example 2 counts from 1 to 4 and repeats!

Small, fast and wrong is still just wrong.
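
The difference can be checked with a small self-checking simulation. This is a sketch under assumptions: the three fragments are wrapped in processes with an initial value of 0 and a range of 0 to 7, and the names (tb_counter_styles, c1, c2, c3) are invented here.

```vhdl
-- Illustrative testbench; names, ranges and initial values are assumptions.
entity tb_counter_styles is
end entity tb_counter_styles;

architecture sim of tb_counter_styles is
  subtype count_t is natural range 0 to 7;
  signal clk        : bit     := '0';
  signal c1, c2, c3 : count_t := 0;
  signal done       : boolean := false;
begin
  -- 10 ns clock that stops toggling once the check is done.
  clk <= not clk after 5 ns when not done else '0';

  example1 : process
    variable counter : count_t := 0;
  begin
    wait until clk = '1';
    counter := counter + 1;
    if counter = 5 then
      counter := 0;
    end if;
    c1 <= counter;  -- copy the register value out for observation
  end process example1;

  example2 : process
    variable counter : count_t := 0;
  begin
    wait until clk = '1';
    if counter = 4 then
      counter := 0;
    end if;
    counter := counter + 1;
    c2 <= counter;  -- copy the register value out for observation
  end process example2;

  example3 : process
  begin
    wait until clk = '1';
    c3 <= c3 + 1;
    if c3 = 4 then
      c3 <= 0;
    end if;
  end process example3;

  checker : process
    variable seen_zero : boolean := false;
  begin
    for i in 1 to 12 loop
      wait until clk = '0';  -- sample half a period after each rising edge
      assert c1 = c3 report "examples 1 and 3 differ" severity error;
      assert c2 /= 0 report "example 2 reached 0" severity error;
      if c1 = 0 then
        seen_zero := true;
      end if;
    end loop;
    assert seen_zero report "example 1 never returned to 0" severity error;
    done <= true;
    wait;
  end process checker;
end architecture sim;
```

The checker starts sampling after the first clock, so the initial 0 of example 2 (before any clock edge) is deliberately not counted.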

Andy
 
> I've always felt "safer" with '>=' or '<=' comparisons rather than '='
> on counters, especially when dealing with non-modulo-2^n counters.
>
> However, it should be noted that the three examples given do not
> behave identically. Examples 1 and 3 count from 0 to 4 and repeat.
> Example 2 counts from 1 to 4 and repeats!
>
> Small, fast and wrong is still just wrong.

Except in special cases, I almost always load with a value (or
the value from a base register) and down count to zero and
detect zero by watching the carry bit.

Special cases would be like incrementing to a number with a
sparse number of 1's (like 4) as the only thing you need to
check are the 1's.
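
One possible reading of the load-and-count-down approach, as a sketch only: the counter gets one extra MSB, and the decrement from zero wraps to all ones, so that MSB acts as the borrow ("carry") flag. The entity name, ports and generic are invented here, and ieee.numeric_std is assumed.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Illustrative only; entity, ports and generic are hypothetical.
entity down_counter is
  generic (WIDTH : positive := 3);
  port (
    clk    : in  std_logic;
    load_v : in  unsigned(WIDTH - 1 downto 0);  -- reload value
    tick   : out std_logic                      -- high when the borrow bit is set
  );
end entity down_counter;

architecture rtl of down_counter is
  -- One extra MSB serves as the borrow flag.
  signal cnt : unsigned(WIDTH downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if cnt(WIDTH) = '1' then
        cnt <= '0' & load_v;  -- borrow seen: reload and clear the flag
      else
        cnt <= cnt - 1;       -- 0 - 1 wraps to all ones, setting the MSB
      end if;
    end if;
  end process;

  -- Zero detection needs only this one bit, not a full compare.
  tick <= cnt(WIDTH);
end architecture rtl;
```

With a constant load_v, tick in this sketch is high for one clock every load_v + 2 clocks, so the reload value has to be chosen with that offset in mind.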

Cheers,
Jim
 
On Apr 15, 4:11 am, "KJ" <kkjenni...@sbcglobal.net> wrote:
> "JimLewis" <j...@synthworks.com> wrote in message
> news:7082d731-d77d-406d-b347-2774cd918d83@y33g2000prg.googlegroups.com...
>
>> another contributing factor with the last 2 examples. With an
>> incrementer and a smart synthesis tool, the condition "counter = 4" is
>> the same as (converting to unsigned for notation only) "counter(2) = '1'"
>
> Which is why it is usually better to code it as "counter >= 4". Then you
> don't need to have as smart of a synthesis tool in order to reach the
> conclusion that only bit 2 of the counter is needed.
>
> KJ

KJ,
I don't think I agree. While >= 4 seems to produce similar results
to the bit comparison, what about >= 5? If I did my kmaps right -
yikes this is digging back some - decoding the general sense of >= 5
requires:
(Count(2) and Count(0)) or (Count(2) and Count(1))

OTOH, if I count up to 5 and think "=" rather than ">=", due to
properties of counters, I can decode just the bits that are 1
and the resulting logic is:
(Count(2) and Count(0)) = '1'.

For =5 or >=5 to do as well as decoding bits, you need a smart
compiler. OTOH, in a LUT based design, will I notice the
difference of 1 LUT pin? Probably not - unless I have a lot
of counters.
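
Both decodes can be checked exhaustively for a 3-bit count. This is a sketch with an invented testbench name (tb_decode_check), assuming a 3-bit unsigned counter that restarts when it reaches 5, so only the values 0 to 5 are reachable for the "=" case.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Illustrative testbench; the name and the 3-bit width are assumptions.
entity tb_decode_check is
end entity tb_decode_check;

architecture sim of tb_decode_check is
begin
  check : process
    variable count : unsigned(2 downto 0);
    variable ge5   : std_logic;
    variable eq5   : std_logic;
  begin
    for i in 0 to 7 loop
      count := to_unsigned(i, 3);

      -- General ">= 5" decode: must be right for every 3-bit value.
      ge5 := (count(2) and count(0)) or (count(2) and count(1));
      assert (ge5 = '1') = (i >= 5)
        report "ge5 decode wrong for " & integer'image(i) severity error;

      -- "Bits that are 1" decode: only checked for the reachable
      -- values 0 to 5, since it is also true for 7.
      eq5 := count(2) and count(0);
      if i <= 5 then
        assert (eq5 = '1') = (i = 5)
          report "eq5 decode wrong for " & integer'image(i) severity error;
      end if;
    end loop;
    wait;
  end process check;
end architecture sim;
```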

Cheers,
Jim
 
On Apr 15, 8:16 pm, JimLewis <j...@synthworks.com> wrote:
> On Apr 15, 4:11 am, "KJ" <kkjenni...@sbcglobal.net> wrote:
>> Which is why it is usually better to code it as "counter >= 4". Then you
>> don't need to have as smart of a synthesis tool in order to reach the
>> conclusion that only bit 2 of the counter is needed.
>
> KJ,
> I don't think I agree. While >= 4 seems to produce similar results
> to the bit comparison, what about >= 5?

I think I agree with KJ, but for different reasons. In many of my
designs, a counter typically indicates how long to remain in one state
of a FSM, or else is used to loop through a shortened (not a full
power of two) sequence, like pixel or line count in video. I've always
felt that decoding (n >= k) instead of (n=k) gives me more of a safety
net to get back more quickly to the restart state or get back into the
main loop in case I reach an unreachable state (defined for this
purpose as n > k). This might happen because I messed up some obscure
corner case in my multiple-interconnected-FSM control logic, in which
I freely admit I'm at fault. But it's been my experience that getting
your chassis hit with 20 kV from an ESD gun during product compliance
testing can do unusual things to your flops, which you should still
recover from ASAP. It seems to me that on the balance of probability,
if I include a "free" check (in the source code sense of free) for
unreachable states (e.g. NTSC_LINE_NUMBER >= 525 in preference to
NTSC_LINE_NUMBER=525) then I have a better chance of not getting stuck
forever when I go into the weeds.
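
A sketch of the ">=" habit applied to a line counter. The entity name, the 0 to 1023 range and the 0 to 524 line numbering are assumptions made for illustration, not details from the post.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Illustrative only; names, range and line numbering are hypothetical.
entity line_counter is
  port (
    clk     : in  std_logic;
    line_no : out natural range 0 to 1023
  );
end entity line_counter;

architecture rtl of line_counter is
  constant LAST_LINE : natural := 524;  -- assumption: 525 lines numbered 0 to 524
  signal n : natural range 0 to 1023 := 0;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if n >= LAST_LINE then  -- also catches any value above LAST_LINE
        n <= 0;
      else
        n <= n + 1;
      end if;
    end if;
  end process;

  line_no <= n;
end architecture rtl;
```

With ">=", any value from 524 up to 1023 sends the counter back to 0 on the next clock, so an upset flop costs at most one line period of recovery in this sketch.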

There are a lot of cases where writing stuff in the source (like a
redundant 'when others' case in an otherwise fully covered case stmt
decoding an enumerated state type) has zero semantic meaning in VHDL.
In these cases, one can argue back and forth, and inconclusively, that
a tool ought to or ought not to take extra steps to take the hint to
cover unreachable states, but there's no clear, LRM-traceable
justification for this. But comparing against a counter in a power-of-
two modulus bit vector seems to give a pretty clear mandate to the
synthesizer.

Clearly this design style trick is nowhere near a rigorous proof of
recovery (not like a proper CTL model checking run, by any means), but
it helps. My next statement will probably offend the hard-core gate
bangers, but here goes: I'm too old to care about optimizing the last
p-term out of one comparison -- I tend to be more concerned with
correctness, recoverability and reliability. I'd much rather it never
locks up and recovers quickly when I fuzz test it.

- Kenn
 
