Non-blocking versus blocking

On May 23, 7:38 pm, Patrick Maupin <pmau...@gmail.com> wrote:

Certainly, one of Cummings's major objectives is to ensure that
simulation and synthesis results match, and there is no requirement
for that in a testbench.  But in that vein, if I understand your
guideline about blocking assignments correctly, I don't want *that*
restriction for a testbench at all.
The guideline of not using blocking assignments for communication is
intended to guarantee, by construction, that simulation results of
exactly the same testbench, written in standard Verilog, match among
themselves: between different runs with other command line options,
simulator revisions and of course different simulators. Wouldn't you
agree that this goal makes at least as much sense?

When a testbench is generating a completely independent signal for
simulation purposes (for an async input), non-blocking assignments may
not be required at all.
Independent doesn't mean "will never happen simultaneously". Depending
on the case, the parameters, independent clock period values,
randomized delay values, it may happen. With a blocking assignment,
the race potential is there, but it may be rare and obscure and
therefore hard to debug.
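The race being described can be sketched in a few lines of plain Verilog (a minimal illustration; the module and signal names are mine, not from the thread):

```verilog
module race_sketch;
  reg clk  = 0;
  reg stim = 0;
  reg sampled;

  always #5 clk = ~clk;     // posedge at times 5, 15, 25, ...

  // "Independent" stimulus whose randomized/parameterized timing
  // happens to coincide with a clock edge at time 15.
  initial #15 stim = 1;     // blocking: 'stim' updates immediately

  // At time 15 the scheduler may legally run this block before or
  // after the initial block above, so 'sampled' may be 0 or 1.
  always @(posedge clk) sampled = stim;
endmodule
```

With `stim <= 1;` instead, the update would be deferred to the nonblocking-assignment region of the time step, and the sampler would be guaranteed to read the old value in every compliant simulator.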

I grant to you that if this is a one-shot signal, it's unlikely that
this will cause a problem. But the real danger I see is this. You keep
the guideline not to mix blocking and nonblocking assignments in the
same always block, but apparently you relax the guideline to restrict
blocking assignments to purely combinatorial blocks. But blocking
assignments are much more expressive than nonblocking ones. Therefore,
the temptation to start using them in ways that are truly unsafe must
be almost irresistible. After all, everything will probably seem to
work fine at first. No guideline of yours prevents this; you have to
rely on the competence and discipline of each individual testbench
engineer.

When a testbench is manipulating dependent signals, we code that like
the RTL -- the blocking and nonblocking assignments are in *different*
always blocks.  Honestly, this is not a terrible burden.
As you know, I don't think highly of this coding style even for
synthesis. But with synthesis there is at least the typical argument
that such code is "closer to hardware". Obviously for testbenches this
doesn't count: the only thing that matters is expressing functionality
in the clearest way. So I wonder where the idea to use this coding
style for testbenches also comes from.

If I understand your guideline correctly, it says any varying inputs
to my DUT need to be created from nonblocking assignments inside the
testbench.  I certainly don't need or want that guideline.
Let me start with apologies to Janick Bergeron, whose book "Writing
Testbenches" (2000 edition) I have been rereading on this occasion.

It's not my guideline, but his. I now realize that I read it a long
time ago, and then forgot about its origin. Obviously it made so much
sense that I started thinking it was my own idea :) (To my credit, I
added the synthesis component, pointing out that it's the only
guideline you need in that case also.)

In my view the coding style you describe advocates the use of blocking
assignments in exactly the wrong way:
- a lax attitude towards blocking assignments for communication,
resulting in a real danger of non-deterministic, non-portable Verilog
test benches that may fail in mysterious ways
- severe restrictions on using blocking assignments locally, making it
cumbersome to use plain old variable semantics even if such usage
would be totally safe

Jan
 
On May 24, 10:33 am, Jan Decaluwe <j...@jandecaluwe.com> wrote:
On May 23, 7:38 pm, Patrick Maupin <pmau...@gmail.com> wrote:

Certainly, one of Cummings's major objectives is to ensure that
simulation and synthesis results match, and there is no requirement
for that in a testbench.  But in that vein, if I understand your
guideline about blocking assignments correctly, I don't want *that*
restriction for a testbench at all.

The guideline of not using blocking assignments for communication is
intended to guarantee, by construction, that simulation results of
exactly the same testbench, written in standard Verilog, match among
themselves: between different runs with other command line options,
simulator revisions and of course different simulators. Wouldn't you
agree that this goal makes at least as much sense?
Nonblocking assignments will certainly guarantee this for signals
running off the same clock. For a signal generated from a clock it's
certainly a great idea, but I already told you that we already use
nonblocking assignments for those signals.

When a testbench is generating a completely independent signal for
simulation purposes (for an async input), non-blocking assignments may
not be required at all.

Independent doesn't mean "will never happen simultaneously". Depending
on the case, the parameters, independent clock period values,
randomized delay values, it may happen. With a blocking assignment,
the race potential is there, but it may be rare and obscure and
therefore hard to debug.
"Independent" == "Not synchronous" == "Synchronizer required inside
the DUT" == "Lots of testing to ensure that the clock crossing happens
OK" => "give me whatever race potentials you've got." Seriously.

I grant to you that if this is a one-shot signal, it's unlikely that
this will cause a problem.
Not a one-shot signal. An *independent* signal that has to be
synchronized by the RTL.

But the real danger I see is this. You keep
the guideline not to mix blocking and nonblocking assignments in the
same always block, but apparently you relax the guideline to restrict
blocking assignments to purely combinatorial blocks.
Yes, for stimulus.

But blocking
assignments are much more expressive than nonblocking ones. Therefore,
the temptation to start using them in ways that are truly unsafe must
be almost irresistible.
You say that like I work with little children or something.

After all, everything will probably seem to
work fine at first. No guideline of yours prevents this; you have to
rely on the competence and discipline of each individual testbench
engineer.
The guidelines *do* prevent this for a correct definition of
"independent". And trust me, an incompetent test engineer can screw
up no matter how many guidelines you give him.

When a testbench is manipulating dependent signals, we code that like
the RTL -- the blocking and nonblocking assignments are in *different*
always blocks.  Honestly, this is not a terrible burden.

As you know, I don't think highly of this coding style even for
synthesis. But with synthesis there is at least the typical argument
that such code is "closer to hardware". Obviously for testbenches this
doesn't count: the only thing that matters is expressing functionality
in the clearest way. So I wonder where the idea to use this coding
style for testbenches also comes from.
You spent a lot of time saying "here, use this one rule on your
testbenches and your RTL; then they're the same." Then when I say our
testbenches and code mostly *are* the same, you act like that's
silly. Bah!

If I understand your guideline correctly, it says any varying inputs
to my DUT need to be created from nonblocking assignments inside the
testbench.  I certainly don't need or want that guideline.

Let me start with apologies to Janick Bergeron, whose book "Writing
Testbenches" (2000 edition) I have been rereading on this occasion.

It's not my guideline, but his. I now realize that I read it a long
time ago, and then forgot about its origin. Obviously it made so much
sense that I started thinking it was my own idea :) (To my credit, I
added the synthesis component, pointing out that it's the only
guideline you need in that case also.)

In my view the coding style you describe advocates the use of blocking
assignments in exactly the wrong way:
- a lax attitude towards blocking assignments for communication,
No, a clear understanding of the distinction between independent
stimulus and other stuff. Look, in most cases, if I'm generating
independent stimulus for a port, I don't even need to explicitly
generate a clock for it in the testbench. So, if something's
unclocked, and a random sweep is made to make sure that source
frequency and jitter variations aren't problems, how *exactly* does a
nonblocking assignment help me?

resulting in a real danger of non-deterministic, non-portable Verilog
test benches that may fail in mysterious ways
But they're not and they don't. (We just migrated a bunch of chips
from Mentor to Cadence last year FWIW.)

- severe restrictions on using blocking assignments locally, making it
cumbersome to use plain old variable semantics even if such usage
would be totally safe
You say "severe restriction" but that's simply not true. It's just a
different coding style which makes it very easy to reason about how
things work.

Regards,
Pat
 
On Fri, 21 May 2010 01:13:21 -0700 (PDT), nemo wrote:

  A = #5 EXPR;  // (1) Blocking
  A <= #5 EXPR; // (2) Nonblocking

Line (1) first evaluates EXPR, then blocks execution for 5 time units,
then updates A, then moves on to executing the next statement.

Wow! I didn't know it worked like that. I don't think that is the
same in VHDL. I think the VHDL "after" delay is done without blocking
the execution of the sequential flow even for variables.
Sorry, I just noticed I left this thread dangling.

Unless something changed while I wasn't looking, you can't
do AFTER-style delayed assignment to VHDL variables at all.
It works only for signals. Jolly good thing too :)

One very interesting gnarly corner of all this is that
you cannot do _inertial_ delayed assignment to Verilog
variables. I think I pointed out that Verilog's NBA
A <= #5 EXPR;
is pretty close to VHDL's
A <= transport EXPR after 5 ns;
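The two delayed-assignment forms from the top of this subthread can be put side by side in a minimal, illustrative module:

```verilog
module delay_forms;
  reg a, b, c;
  initial begin
    a = #5 1'b1;   // blocking: evaluate 1'b1 now, *block* for 5 units,
                   // update 'a' at time 5, then continue
    b = 1'b1;      // therefore executes at time 5, not time 0
  end
  initial begin
    c <= #5 1'b1;  // nonblocking: evaluate now, schedule the update of
                   // 'c' for time 5, and continue without blocking
    $display("%0t: past the NBA already", $time);  // runs at time 0
  end
endmodule
```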

But what about a non-TRANSPORT delay in VHDL? That
inertial-type delay is available in Verilog [*] only
by using a continuous assign with a delay, or a wire
delay:

wire #3 W3; // net with 3 units delay
wire W5; // wire with no delay

assign #5 W5 = EXPR5; // *Driver* with 5-unit delay
// like VHDL concurrent W5 <= EXPR5 after 5 ns;

assign W3 = EXPR3; // no driver delay, just the net delay
// like VHDL concurrent W3 <= EXPR3 after 3 ns;
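To see the inertial behaviour in action: a pulse narrower than the delay is simply swallowed. A sketch, reusing the names from the snippet above:

```verilog
module inertial_sketch;
  reg  EXPR5 = 0;
  wire W5;
  assign #5 W5 = EXPR5;   // continuous assign, 5-unit inertial delay

  initial begin
    #10 EXPR5 = 1;        // 2-unit pulse: narrower than the 5-unit
    #2  EXPR5 = 0;        // delay, so it never appears on W5 at all
    #20 EXPR5 = 1;        // held long enough: W5 follows 5 units later
  end
endmodule
```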

[*] Wait until Cary R. puts his head above the parapet to
explain all the other weird and wonderful timing machinery
you can use in Verilog thanks to primitives and specify blocks.
It's complicated and I almost never use it, so I won't even
try to explain - I'd get it badly wrong. Gate-level sim
people rely heavily on it.
--
Jonathan Bromley
 
On May 24, 7:32 pm, Patrick Maupin <pmau...@gmail.com> wrote:
On May 24, 10:33 am, Jan Decaluwe <j...@jandecaluwe.com> wrote:

On May 23, 7:38 pm, Patrick Maupin <pmau...@gmail.com> wrote:

Certainly, one of Cummings's major objectives is to ensure that
simulation and synthesis results match, and there is no requirement
for that in a testbench.  But in that vein, if I understand your
guideline about blocking assignments correctly, I don't want *that*
restriction for a testbench at all.

The guideline of not using blocking assignments for communication is
intended to guarantee, by construction, that simulation results of
exactly the same testbench, written in standard Verilog, match among
themselves: between different runs with other command line options,
simulator revisions and of course different simulators. Wouldn't you
agree that this goal makes at least as much sense?

Nonblocking assignments will certainly guarantee this for signals
running off the same clock. For a signal generated from a clock it's
certainly a great idea, but I already told you that we already use
nonblocking assignments for those signals.
It might help if you stopped thinking about clocks. Verilog doesn't
know about them: all it sees are events that may happen
simultaneously.

When a testbench is generating a completely independent signal for
simulation purposes (for an async input), non-blocking assignments may
not be required at all.

Independent doesn't mean "will never happen simultaneously". Depending
on the case, the parameters, independent clock period values,
randomized delay values, it may happen. With a blocking assignment,
the race potential is there, but it may be rare and obscure and
therefore hard to debug.

"Independent" == "Not synchronous" == "Synchronizer required inside
the DUT" == "Lots of testing to ensure that the clock crossing happens
OK" => "give me whatever race potentials you've got."  Seriously.
Thanks for pointing that out.

As a thought experiment, I will now construct a standards-compliant
Verilog simulation of your testbench that, given the same inputs,
produces different results. I will wait until a blocking assignment of
a new value happens in the same timestep as the sampling in the DUT.
Given your independent timing, and thanks to your extensive testing,
this situation will certainly happen. At that point, I will reverse
the order of the two events within the simulator engine. The sampled
value will now behave differently. Hence, I have proven that your
testbench is nondeterministic.

But the real danger I see is this. You keep
the guideline not to mix blocking and nonblocking assignments in the
same always block, but apparently you relax the guideline to restrict
blocking assignments to purely combinatorial blocks.

Yes, for stimulus.

But blocking
assignments are much more expressive than nonblocking ones. Therefore,
the temptation to start using them in ways that are truly unsafe must
be almost irresistible.

You say that like I work with little children or something.
The fact is that it has happened and now the poison is in your
testbenches.

After all, everything will probably seem to
work fine at first. No guideline of yours prevents this; you have to
rely on the competence and discipline of each individual testbench
engineer.

The guidelines *do* prevent this for a correct definition of
"independent".  
I have proven otherwise.

When a testbench is manipulating dependent signals, we code that like
the RTL -- the blocking and nonblocking assignments are in *different*
always blocks.  Honestly, this is not a terrible burden.

As you know, I don't think highly of this coding style even for
synthesis. But with synthesis there is at least the typical argument
that such code is "closer to hardware". Obviously for testbenches this
doesn't count: the only thing that matters is expressing functionality
in the clearest way. So I wonder where the idea to use this coding
style for testbenches also comes from.

You spent a lot of time saying "here, use this one rule on your
testbenches and your RTL; then they're the same."  Then when I say our
testbenches and code mostly *are* the same, you act like that's
silly.  Bah!
In another post, I have warned explicitly about the danger that
Cummings's rule for blocking assignments, while safe in his specific
case, may encourage unsafe coding practices. You asked for a real-life
example of this danger, and I wasn't able to come up with a very good
one readily.

It now turns out that your own case provides an excellent example of
what I mean. In the same pass, it upgrades my concern from somewhat
speculative to very real. My case against Cummings's guidelines has
therefore become much stronger.

If I understand your guideline correctly, it says any varying inputs
to my DUT need to be created from nonblocking assignments inside the
testbench.  I certainly don't need or want that guideline.

Let me start with apologies to Janick Bergeron, whose book "Writing
Testbenches" (2000 edition) I have been rereading on this occasion.

It's not my guideline, but his. I now realize that I read it a long
time ago, and then forgot about its origin. Obviously it made so much
sense that I started thinking it was my own idea :) (To my credit, I
added the synthesis component, pointing out that it's the only
guideline you need in that case also.)

In my view the coding style you describe advocates the use of blocking
assignments in exactly the wrong way:
- a lax attitude towards blocking assignments for communication,

No, a clear understanding of the distinction between independent
stimulus and other stuff.  Look, in most cases, if I'm generating
independent stimulus for a port, I don't even need to explicitly
generate a clock for it in the testbench.
For a clear understanding, I repeat my suggestion to stop thinking
about clocks.

 So, if something's
unclocked, and a random sweep is made to make sure that source
frequency and jitter variations aren't problems, how *exactly* does a
nonblocking assignment help me?
With nonblocking assignments, any standards-compliant Verilog
simulator would, given the same inputs, always give you the same
results, guaranteed. Hence, your testbenches would now be
deterministic.
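Jan's point in code form (the names are illustrative): the same stimulus written with a nonblocking assignment is race-free by construction.

```verilog
module nba_stimulus_sketch;
  reg clk  = 0;
  reg stim = 0;
  reg sampled;

  always #5 clk = ~clk;     // posedge at times 5, 15, 25, ...

  // Nonblocking: the update of 'stim' is deferred to the NBA region
  // of time step 15, after all active-region sampling has run.
  initial #15 stim <= 1;

  // At time 15 this is guaranteed to read the old value (0) in every
  // standards-compliant simulator, regardless of event ordering.
  always @(posedge clk) sampled <= stim;
endmodule
```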

resulting in a real danger of non-deterministic, non-portable Verilog
test benches that may fail in mysterious ways

But they're not and they don't.  (We just migrated a bunch of chips
from Mentor to Cadence last year FWIW.)
I have proven that your test benches are nondeterministic. You have
strong evidence that this nondeterminism is not revealed between
Mentor and Cadence, which happens to be in line with my expectations.

Jan
 
On May 26, 4:19 am, Jan Decaluwe <j...@jandecaluwe.com> wrote:

It might help if you stopped thinking about clocks. Verilog doesn't
know about them: all it sees are events that may happen
simultaneously.
Sorry, one of the reasons I like Verilog is that it doesn't try to get
in the way of thinking about the clocks (when that's what I need to be
thinking about).

"Independent" == "Not synchronous" == "Synchronizer required inside
the DUT" == "Lots of testing to ensure that the clock crossing happens
OK" => "give me whatever race potentials you've got."  Seriously.

Thanks for pointing that out.

As a thought experiment, I will now construct a standards-compliant
Verilog simulation of your testbench that, given the same inputs,
produces different results. I will wait until a blocking assignment of
a new value happens in the same timestep as the sampling in the DUT.
Given your independent timing, and thanks to your extensive testing,
this situation will certainly happen. At that point, I will reverse
the order of the two events within the simulator engine. The sampled
value will now behave differently. Hence, I have proven that your
testbench is nondeterministic.
Yes, but you *completely missed* the point, EVEN THOUGH I CLEARLY
POINTED IT OUT, where *it doesn't matter at all*, because the
testbench is sweeping to prove in testing that it doesn't matter which
side of an internal clock an external transition happens on. Enough
data is taken that, at the margin, where a single sample could have
been in one timeslot or the next, IT REALLY DOESN'T MATTER.

The fact is that it has happened and now the poison is in your
testbenches.
There is no poison. The testbenches work. The chips work. The fact
that one simulator might order a few events differently than another
is of no consequence.

The guidelines *do* prevent this for a correct definition of
"independent".  

I have proven otherwise.
No, you have proven (which I already knew) that Verilog won't
guarantee whether signal A or B happens first, which is a problem with
a fragile DUT where precise ordering of one signal vs. another is
important. You took the time to prove this AFTER I went out of my way
to explain that we don't do that.

In another post, I have warned explicitly about the danger that
Cummings's rule for blocking assignments, while safe in his specific
case, may encourage unsafe coding practices. You asked for a real-life
example of this danger, and I wasn't able to come up with a very good
one readily.
That's because there is no good case.

It now turns out that your own case provides an excellent example of
what I mean. In the same pass, it upgrades my concern from somewhat
speculative to very real. My case against Cummings's guidelines has
therefore become much stronger.
That's because you don't understand, either deliberately or not.

For a clear understanding, I repeat my suggestion to stop thinking
about clocks.
And I just explained very carefully that I *absolutely don't* need to
think about clocks when generating stimulus, because the stimulus is
*independent* of any clock, and if it matters whether stimulus that
occurs at the same time as a device clock (within some epsilon) falls
on one side or the other of the clock, then we have a broken device.

I have proven that your test benches are nondeterministic.
Yes, at the margins, for cases where I already told you we are doing
huge sweeps of data. Guess what? In real life, the DUT needs to cope
with non-deterministic input. For the testbench, it is sufficient to
ensure that all possible cases are covered, not that all possible
cases are covered in exactly the same fashion on every possible
simulator.

You have
strong evidence that this nondeterminism is not revealed between
Mentor and Cadence, which happens to be in line with my expectations.
It wouldn't be revealed for *any* compliant simulator we used, because
the tests are constructed in a manner where *it doesn't matter*,
whether you believe that or not.

We have been doing this for *years* and have never had a broken chip
due to this thing you seem to think is the most important thing in the
world.

Regards,
Pat
 
Jonathan Bromley wrote:

Wait until Cary R. puts his head above the parapet to
explain all the other weird and wonderful timing machinery
you can use in Verilog thanks to primitives and specify blocks.
It's complicated and I almost never use it, so I won't even
try to explain - I'd get it badly wrong. Gate-level sim
people rely heavily on it.
I'm busy so I'll keep my head low! The quick answer is gate and UDP
primitives have inertial delays. Using a specify block (usually used to
model physical gate delays) you can control which pulse widths get
ignored (inertial delay), which get turned into an 'X' and which get
passed (transport delay). With this you can define that pulses less than
60% of the gate's delay are ignored, pulses greater than or equal to 60%
but less than 80% of the gate's delay generate an 'X', and any pulse 80%
of the gate's delay or greater will be passed. The default values for
these limits give an inertial delay equal to 100% of the gate's delay.
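Cary's 60%/80% example could be written roughly as below. This is a sketch under my own assumptions (a 10-unit path delay, limits of 6 and 8 units); see the IEEE 1364 PATHPULSE$ description for the full rules.

```verilog
module buf10 (input a, output y);
  assign y = a;                      // behavioral model of the path
  specify
    (a => y) = 10;                   // 10-unit module path delay
    // Pulses narrower than 6 units (60%) are rejected; pulses from
    // 6 up to 8 units (60-80%) propagate as 'X'; 8 and wider pass.
    specparam PATHPULSE$a$y = (6, 8);
  endspecify
endmodule
```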

Not quite as bad as Jonathan made it sound, but this is ignoring how
delays are actually defined in a specify block. Let's just say "Here be
Dragons" and I'm staying in the castle for now ;-).

If you are doing RTL design then you can safely ignore all this unless
you want to get fancy in your test code and even then my suggestion
would be ignore this unless you really need it. You can do many
wonderful things with the rest of Verilog if you are not constrained by
synthesis requirements.

Cary
 
On May 24, 4:29 am, Jonathan Bromley <s...@oxfordbromley.plus.com>
wrote:
So far, so clear.  Now things get a little more
difficult.  As you say, there is a value change
on 'a'.  Does that release the @*?  I would argue
that it does not, because 'a' already has its new
value at the moment when execution reaches @*.
There is no further value change on 'a'.
Your argument would be correct.

However, some simulators (I believe) compute
"value change" by looking at the value of a
variable before and after the execution of code
at a given point in time.  Such a simulator might
now see that 'a' has changed since the beginning
of the time-slot, and therefore might choose to
release the @* for a second time.  THIS DOES NOT
MATTER because your code describes proper
combinational logic and the second iteration,
if it occurs, will give exactly the same results
as before.
I am doubtful that there are any simulators that do this. If there
are, I would claim that they are not valid implementations of Verilog.

Depending on how I interpret your description, such a simulator might
be completely nonfunctional. Suppose that the simulator allowed an
event control to continue immediately, without blocking, if the event
expression had changed value earlier in the current time slice. Then
an always block with an event control would go into an infinite loop
and hang the simulation. It would wake up when the event it was
waiting for occurred. Then when it looped back to the event control
(assuming no other delaying statements in the block), it would find
that the event had occurred earlier in the current time slice, and
continue executing again. And again and again...

Perhaps you could find a way to adjust this mechanism so that it only
responded to a given event once. But it is much simpler to take the
LRM description at face value. It waits for the event. That means
the event must be one that occurs after it starts waiting.

So your earlier argument is correct. This speculation about
simulators that might do something else is probably just a red
herring.

Note that this is the distinction between the
wait(named_event.triggered) mechanism added in SystemVerilog and
@named_event. The "triggered" property (really more like a method)
does keep track of whether the named event was triggered earlier in
the current time slice. And if you wrote an always block whose only
delaying statement was that wait, it would go into an infinite loop as
soon as the event was triggered.
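In SystemVerilog terms, the distinction looks like this (an illustrative sketch, not code from the thread):

```verilog
// SystemVerilog
module trig_sketch;
  event e;
  int count = 0;

  initial #10 ->e;             // trigger e once, at time 10

  // @(e) waits for a *new* trigger: this body runs exactly once.
  always @(e) $display("@e woke at %0t", $time);

  // e.triggered stays true for the whole time slot in which e fired,
  // so with no other delaying statement this loop would never block
  // again at time 10 -- a zero-delay infinite loop:
  //
  //   always begin
  //     wait (e.triggered);
  //     count++;              // spins forever within time 10
  //   end
endmodule
```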
 
On Fri, 28 May 2010 17:00:30 -0700 (PDT), sharp@cadence.com wrote:


However, some simulators (I believe) compute
"value change" by looking at the value of a
variable before and after the execution of code
at a given point in time.
[...]
I am doubtful that there are any simulators that do this. If there
are, I would claim that they are not valid implementations of Verilog.
Oh dear. Of course you're right. On re-reading that paragraph
it's clear I was mostly writing garbage and I'm not entirely
sure what I was thinking about.

I suspect that I was working hard to point out that any
properly-formed description of combinational logic must
work correctly even if it unnecessarily executes again
after computing the correct stable output values.
And I tried, unsuccessfully, to find some
justification for why that might happen in the
code the OP presented. Whoops.

So your earlier argument is correct. This speculation about
simulators that might do something else is probably just a red
herring.
Red and putrescent, yes.

Note that this is the distinction between the
wait(named_event.triggered) mechanism added in SystemVerilog and
@named_event. The "triggered" property (really more like a method)
does keep track of whether the named event was triggered earlier in
the current time slice. And if you wrote an always block whose only
delaying statement was that wait, it would go into an infinite loop as
soon as the event was triggered.
Indeed so. event.triggered is extremely useful for some
kinds of testbench synchronisation, but wouldn't make sense
in this context.

Thanks for putting the record straight (as usual!).
--
Jonathan Bromley
 
