Is this a race?

Chris F Clark
I'm looking at some Verilog code (see the module below) that I am
certain represents a race. However, when we run the code with
ModelSim, it consistently resolves the race the "correct" way (that
is, the way the designer intended it). This makes me wonder if there
isn't some clause in the standard I have overlooked that specifies
the order of the two assignments (or their containing always blocks).
If anyone can point out a clause (in the standard) that would require
that the posedge block (and its non-blocking assign) must be executed
before the "level sensitive" block (and its blocking assign), I would
appreciate it. I would also appreciate knowing if others see the same
race I do. Results from other simulators would be interesting too.

BTW, by consistently, I mean ModelSim not only runs the code as
written the way the designer wants, but it also runs numerous
variations on the code in the desired order (i.e. reordering the
blocks, changing the first block to also be posedge on clk1,
inserting assigns in between the signals, and so forth). The closest
I can get to a model of ModelSim's ordering is that it appears to run
the first half (the value-capture/scheduling part) of non-blocking
assignments first, then blocking assigns. However, I find no support
for that ordering in the standard.

The issue, of course, is that our own home-grown simulator runs the
blocking assign first, then the non-blocking assign (and that makes
the d_input value race through the circuit and get to d_out in one
cycle rather than being delayed a cycle). If it is an ambiguous race,
I don't care; the designer has to fix it. If it is supposed to be
deterministic, I have to figure out what's wrong with our simulator.

module rtv_latch_timing_test ( clk1, d_input, d_out );

input clk1 ;
input d_input ;
output d_out ;

reg d_out ;
reg latchout ;
wire clk1 ;
wire d_input ;


always @ ( d_input or clk1 ) // Hi level sens Latch
begin
  if ( clk1 )
    latchout = ( d_input );
end

always @ ( posedge clk1 ) // Rising edge trig FF
begin
  d_out <= ( latchout );
end

endmodule

----------------------------------------------------------------

ModelSim runs the code almost as if it were written this way:

always @ ( d_input or clk1 ) // Hi level sens Latch
begin
  if ( clk1 )
    #0 latchout = ( d_input );
end

always @ ( posedge clk1 ) // Rising edge trig FF
begin
  d_out <= ( latchout );
end

----------------------------------------------------------------

I would like it to interpret the code closer to this way (and argue
that because it is a race either interpretation is valid):

always @ ( d_input or clk1 ) // Hi level sens Latch
begin
  if ( clk1 )
    latchout = ( d_input );
end

always @ ( posedge clk1 ) // Rising edge trig FF
begin
  #0 d_out <= ( latchout );
end

Thanks for any insights,
-Chris

*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)
------------------------------------------------------------------------------
 
The #0 does nothing for my understanding. Do you mean to represent an
actual delay? The issue may be as simple as the scheduling of blocking
versus non-blocking operators in different always blocks.

"Chris F Clark" <cfc@shell01.TheWorld.com> wrote in message
news:sddk69un2yy.fsf@shell01.TheWorld.com...
I'm looking at some Verilog code (see the module below) that I am
certain represents a race. However, when we run the code with
ModelSim, it consistently resolves the race the "correct" way (that
is, the way the designer intended it).
....
 
First, this is most certainly a race condition. If block A (latch) is
evaluated after block B (flop), you get different results than if
block A was evaluated before block B. That is the definition of a
race.

That said, some simulators optimize around flops, such that all
non-blocking-assigned, edge-triggered blocks are scheduled first. In
this scenario, any change in the ordering (as it appears in the
file), or extra wire assignments, as you have tried, will yield the
same type of behavior.

There is nowhere in the standard that dictates the order in which
these two blocks are triggered on the same event (change of clk1);
therefore, the simulator vendors are allowed to make certain
optimizations with regard to the way they order these blocks.

Another consideration is coding style. If you follow Cummings' "all
sequential logic should use non-blocking assignments" guideline, you
would not have a race condition in this code. Change the latch to be
NBA and you will always get the desired behavior.
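
As a sketch of that fix (only the assignment operator in the original
latch block changes):

always @ ( d_input or clk1 ) // Hi level sens Latch, now NBA
begin
  if ( clk1 )
    latchout <= ( d_input );
end

With the NBA, both blocks sample their right-hand sides in the active
region before either update occurs, so d_out always sees the old
latchout and the extra cycle of delay is guaranteed regardless of
evaluation order.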

I have seen problems in simulations where a badly coded (blocking
assign) hi-level sensitive latch, feeding a negative-edge-triggered
flop (non-blocking assign), will sometimes punch through, depending
on the order of evaluation. Some simulators will statically schedule
these events and always see a cycle delay; some simulators will
dynamically schedule these events, and sometimes will see the cycle
delay and sometimes will not. Either way, it's a race condition.

The best place in the standard that covers this stuff ( that I have
been able to find ) is IEEE 1364-2001 Sec. 5.2 Par. 4 :

Processes are sensitive to update events [ a clock change in your
case]. When an update event is executed, all processes that are
sensitive to that event [read: your two always blocks] are evaluated in
an arbitrary order.

I hope this clears it up.

-Art
 
I agree with Art. This is clearly a race and is nondeterministic
according to the LRM.
 
The #0 will cause the execution of the statement to be deferred, much
like an actual delay would. This is actually specified in the LRM,
though not completely correctly. The history is presumably something
like the following.

Verilog-XL apparently did not just throw a delay control of #0 away,
but handled it like any other delay control. It suspended the process
and re-scheduled it on the time queue for a time 0 away from now, i.e.
the current time queue. This deferred it to the end of the current
time queue (or perhaps put it on a new time queue to be executed
immediately after the current one, which has a similar effect). The
result was that a delay control of #0 would cause the current process
to be deferred until after all the other processes that were triggered
by the same inputs. Users started relying on the bad practice of
inserting #0 in their procedural code to try to resolve race conditions
(or if that wasn't sufficient, #0 #0 or #0 #0 #0).

So much legacy code relied on this that a guarantee was put into the
LRM to try to ensure that this bad code would continue to work. In the
section on deterministic and nondeterministic event ordering, they
essentially needed to specify that #0 would defer a process to the end
of the event queue. But since the specification of nondeterministic
event ordering allowed processing the events on the queue in any order,
that wouldn't have helped. So they specified that it would go onto a
separate "inactive" queue until the "active" queue was processed, and
then the "inactive" queue would become the "active" queue.

The problem with this way of specifying it is that it is too strong.
There was never really a separate "inactive" queue; it was just a way
of saying "after the currently scheduled events". If taken literally,
the way it is specified implies that #0 not only defers until after the
currently scheduled events, but also after any events scheduled later
for the current time, as long as those events don't go onto the
"inactive" queue also. For example, it implies that a #0 delay is
bigger than an arbitrary number of continuous assignment delays in
series, which was not true in Verilog-XL.
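
A hypothetical sketch of that implication: taken literally, the #0
below must wait until the whole zero-delay continuous-assignment
chain settles, because each stage of the chain is just another
evaluation/update round in the active region.

module zero_vs_chain;
  reg  clk = 0, a = 0, y;
  wire b = a;         // three continuous assignments in series,
  wire c = b;         // all resolving at the same simulation time
  wire d = c;

  always @ ( posedge clk ) begin
    a = 1'b1;
    #0 y = d;         // literal inactive-queue reading: y gets 1;
  end                 // Verilog-XL could legitimately yield 0 here

  initial begin
    #5 clk = 1;
    #1 $display("y = %b", y);
  end
endmodule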

Then the SystemVerilog committees got the idea that this "inactive"
queue was an actual separate queue that guaranteed this much stronger
ordering, and started building more elaborate things that assumed it.
Not good.
 
Art Stamness wrote:
First, this is most certainly a race condition. If block A (latch) is
....
The best place in the standard that covers this stuff ( that I have
been able to find ) is IEEE 1364-2001 Sec. 5.2 Par. 4 :

Processes are sensitive to update events [ a clock change in your
case]. When an update event is executed, all processes that are
sensitive to that event [read: your two always blocks] are evaluated in
an arbitrary order.

I hope this clears it up.

-Art
I agree with Art that it is a race condition.

I would also offer an explanation as to why all of the major commercial
simulators will resolve this race "as the designer intended".

always @ ( d_input or clk1 ) // Hi level sens Latch
begin
  if ( clk1 )
    latchout = ( d_input );
end

always @ ( posedge clk1 ) // Rising edge trig FF
begin
  d_out <= ( latchout );
end

When clk1 has a change (for our case let us say posedge), both always
blocks wake up in some arbitrary order; both these processes will be
on the active events queue. If the latch process runs first and
latchout has an update event, that event has to go on the active
events queue. The question is whether the latchout update event gets
added at the beginning of the active events queue or at the end of
the list; for historical reasons, all "Verilog-XL compatible"
simulators add events to the queue at the end of the list, and events
are consumed from the head of the list.

The list looked like -

[latch], [FF], [other stuff]

After the latch process is done it looks like -

[FF], [other stuff], [latchout update]

This results in the latchout update always happening after the FF
process.

There is a sizeable amount of similar code out there that happens to
work in simulation as the designer intended; since synthesis also
defines this code to work as the designer intended, there has been no
cause to clean this up :)

For good measure, note that always blocks within a module that are
sensitive to the same signal 'tend' to go on the event queue in
lexical order (testbench writers occasionally write code that depends
on this ordering). In your example, all that means is that the
active events queue has [latch] followed by [FF].

-regards,
Ramesh
 
I am very interested in the reasoning in the post by ramesh@tharas.com:
I would also offer an explanation as to why all of the major commercial
simulators will resolve this race "as the designer intended".
....
When clk1 has a change (for our case let us say posedge), both always
blocks wake up in some arbitrary order; both these processes will be
on the active events queue. If the latch process runs first and
latchout has an update event, that event has to go on the active
events queue.
So, are you saying that even though we don't have a non-blocking
assignment, the simulator is expected to schedule an update event to
change the value of latchout, rather than just updating the value of
latchout as part of the latch process and progressing? I can
understand that you need to schedule the effects of the update to
latchout to get its fanout run, but I never considered that you would
have to schedule the update itself (as separate from the process
representing the always block).

Is that really how XL worked? Are the update events (blocking
assignments) separate from the processes (like always blocks)?

I had envisioned a model where the always block was a linked list of
events, and once you started an always block, you just ran the events
on that linked list until something made you wait (e.g. an event
control). That's closer to the way our simulator works.

I'm more surprised by the contention that synthesizers would
"respect" that ordering. If I were writing a synthesizer, I would
attempt to match code within an always block to a device I have. That
would tend to clump all the always-block code together. I wouldn't
make any effort to avoid having the latchout output of the latch
reach the flip-flop before the flop had settled, thus sometimes
racing through (and sometimes not). Thus, the real hardware would
have a race, and maybe it would work and maybe it wouldn't, depending
on timing.

In fact, if I were a hardware designer, I would be a little pissed at
my simulator for not making that race obvious to me and resolving it
in a way that the hardware might not.

I'm in the process of discussing with my fellow in-house simulator
designers whether we want to call this particular race out as an
error.

-Chris
 
On 12 Apr 2006 11:36:48 -0700, "Art Stamness" <artstamness@gmail.com>
wrote:

First, this is most certainly a race condition. If block A (latch) is
evaluated after block B (flop), you get different results than if
block A was evaluated before block B. That is the definition of a
race.
Yes, this is absolutely a race, and the solution is to use the NBA
for all sequential logic. One reason the simulators behave this way
is that when this logic is moved to real hardware, it works as if the
latch is also edge sensitive, because its output can't change
instantaneously; it acts as a shift register with the first reg as a
latch. You can time this hardware nicely, and if the clock tree is
done correctly it always works as intended. The solution is to make
the latches use NBA. For most people, the reason to write Verilog is
to get a bunch of gates, so what the simulators do mimics what the
gates do.
 
On 12 Apr 2006 18:38:01 -0400, Chris F Clark
<cfc@shell01.TheWorld.com> wrote:

So, are you saying that even though we don't have a non-blocking
assignment, the simulator is expected to schedule an update event to
change the value of latchout, rather than just updating the value of
latchout as part of the latch process and progressing? I can
understand that you need to schedule the effects of the update to
latchout to get its fanout run, but I never considered that you would
have to schedule the update itself (as separate from the process
representing the always block).

I think reading 1364-2001 might help here. Section 5.6.3 paragraph two
says: "When the process is returned (or if it returns immediately if
no delay is specified), the process performs the assignment to the
left-hand side and enables any events based upon the update of the
left-hand side. The values at the time the process resumes are used to
determine the target(s). Execution may then continue with the
next sequential statement or with other active events."

In other words, you can continue executing the next sequential
statement or any other active event. But the issue is a little bit
more complicated: where in the event queue do you put the "any events
based upon the update of the left-hand side" ? If you add them to the
head of the active event queue the register will see it.

Also, Section 5.4 has a bit of pseudo-C code which shows that there
are update events and evaluation events. One possible interpretation
of all this is that XL first goes around and does all the evaluation
events, and when it generates an update event, it switches to another
process. This is perfectly acceptable per 1364, it seems.

I'm more surprised with the contention that synthesizers would
"respect" that ordering.
I don't think the synthesizer tries to "respect" anything here. Assume
zero-skew clock and non-zero clk->Q registers and latches. Then if the
synthesizer generates the "obvious" hardware, it will work without a
race.
 
I have to disagree with this description of the "reason".

This is a blocking assignment to a variable. A blocking assignment is
called that because the procedural process containing it is not allowed
to continue executing until the assignment is complete and the value
has been updated. So the only way you could schedule it to be updated
later is if you also suspended the always block and scheduled it to
continue after the "update event" on the variable.

In practice, no simulator would do that. It would be very inefficient.
Also, having the always block suspended somewhere other than waiting
at the event control means that you have made it insensitive to its
inputs during that time, which could cause unexpected behavior.
Instead, a simulator will treat an undelayed blocking assignment to a
variable pretty much like an assignment to a variable in any other
programming language: it will update the value and then execute the
next statement after the assignment. There is no scheduling of the
update for later.
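
In other words (a trivial sketch with hypothetical names), an
undelayed blocking assignment behaves like this:

always @ ( posedge clk ) begin
  tmp = d_input;   // value of tmp is updated immediately
  out1 = tmp;      // the next statement already sees the new value,
end                // just like assignment in any ordinary language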
 
Chris F Clark wrote:
I'm looking at some Verilog code (see the module below) that I am
certain represents a race. However, when we run the code with
ModelSim, it consistently resolves the race the "correct" way (that
is, the way the designer intended it).

BTW, by consistently, I mean ModelSim not only runs the code as
written the way the designer wants, but it also runs numerous
variations on the code in the desired order (i.e. reordering the
blocks, changing the first block to also be posedge on clk1,
inserting assigns in between the signals, and so forth). The closest
I can get to a model of ModelSim's ordering is that it appears to run
the first half (the value-capture/scheduling part) of non-blocking
assignments first, then blocking assigns. However, I find no support
for that ordering in the standard.
For what little it may be worth, my observation of the big three
simulators is that, at a high level, ModelSim has a "do what I mean"
mentality when your code could be legally interpreted multiple ways
or in multiple orders. NC generally seems to fall on this side as
well. VCS, on the other hand, seems to have a "do what simulates
fastest" mentality, especially with Radiant turned on, and this often
yields the opposite result of "do what I mean." The plus side to this
is that it forces you to clean up any ambiguous code.

For some reason, this topic has come up in my group a few times lately.

-cb
 
Is that really how XL worked? Are the update events (blocking
assignments) separate from the processes (like always blocks)?
The LRM allows for such behavior but, as s...@cadence.com points out,
this would be inefficient if implemented in a straightforward way.
[If this were somehow desired and required, one could contrive
reasonably efficient code that separated local updates within the
always block from the external update; it would be a strange pseudo
NBA.]

On further analysis, I find that my assertion that almost all
commercial simulators exhibit this external behavior is incorrect. In
one simulator, one can change the behavior by changing the lexical
order of the two always blocks. In another, one gets the behavior to
change with the order of always blocks if you create flop-to-flop
paths. This suggests it is a straightforward race with no guarantees;
different optimizations move the ordering around.

The ModelSim behavior you report might be an artifact of how the
wake-up list is sorted. Experiment with more than two stages and
different orderings; you will likely break the pipeline behavior you
observed.
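
A sketch of such an experiment (hypothetical module, extending the
original with a second flop stage to create a flop-to-flop path):

module rtv_latch_timing_test3 ( clk1, d_input, d_out );

input clk1 ;
input d_input ;
output d_out ;

reg d_out ;
reg d_mid ;
reg latchout ;

always @ ( posedge clk1 ) // second FF stage (flop-to-flop path)
begin
  d_out <= ( d_mid );
end

always @ ( d_input or clk1 ) // Hi level sens Latch
begin
  if ( clk1 )
    latchout = ( d_input );
end

always @ ( posedge clk1 ) // first FF stage
begin
  d_mid <= ( latchout );
end

endmodule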

I'm in the process of discussing with my fellow in-house simulator
designers whether we want to call this particular race out as an
error.
Do you intend to do this statically or dynamically, i.e., at
simulation time? A static check might generate a lot of noise.
 
I wrote:
I'm in the process of discussing with my fellow in-house simulator
designers whether we want to call this particular race out as an
error.
Ramesh asked:
Do you intend to do this statically or dynamically, i.e., at
simulation time? A static check might generate a lot of noise.
We need to do it as a static check, because we've already lost the
information by simulation time due to aspects of our implementation.
The noise issue is a valid one. However, the fact that in this case
our simulation strategy gives the "wrong" results means that it isn't
simply noise. Whether we can get an error that catches only the
problematic cases, and not too many questionable-but-okay cases,
remains to be seen.

-Chris
 
