Scheduling of Sequential UDP devices

  • Thread starter Stephen Williams
Stephen Williams

OK, this one has got me. The standard is leaving out something
important here. My situation is that I have a design that
uses flip-flops. The hand written Verilog uses always @() and
non-blocking assignments to implement synchronous logic. No
problem.

(Especially since I now have the stratified event queue handling
non-blocking assign events properly.)

But in order to test a post-layout version, I get from the
Xilinx tools flip-flop devices implemented using synchronous user
defined primitives. Unfortunately, the IEEE 1364-2001 LRM does
not say how sequential UDP primitives are scheduled.

I suspect that all the big name tools cause the output of the
*sequential* primitives to be scheduled exactly like non-blocking
assignments, and the output of *combinational* primitives to be
scheduled exactly like continuous assignments.

I need confirmation of this assumption. The implications are
pretty drastic.
--
Steve Williams "The woods are lovely, dark and deep.
steve at icarus.com But I have promises to keep,
http://www.icarus.com and lines to code before I sleep,
http://www.picturel.com And lines to code before I sleep."
 
Steven Sharp wrote:
Stephen Williams <spamtrap@icarus.com> wrote in message news:<bjbccf$b3i$1@sun-news.laserlink.net>...

OK, this one has got me. The standard is leaving out something
important here.

I suspect that all the big name tools cause the output of the
*sequential* primitives to be scheduled exactly like non-blocking
assignments, and the output of *combinational* primitives to be
scheduled exactly like continuous assignments.


This is not true in Verilog-XL or NC-Verilog. They do not schedule
sequential UDP outputs on the NBA queue. As far as I know, they
schedule them like normal gate primitives. This is not necessarily
exactly the same as continuous assignments, but falls into the same
category from the viewpoint of the standard. So the standard isn't
leaving anything out; there just aren't any special requirements for
the scheduling of UDPs.

I don't know why this doesn't lead to more zero-delay race conditions
in gate-level designs. In timing simulations, the delays prevent them.
But they don't seem to come up a lot in zero-delay simulations either.
There are some choices that simulators can make in evaluation order
that help minimize them.

It is common for folks to use "always" statements with a non-blocking
assignment to model a FF in RTL designs, and it is reasonable to
expect that a "primitive" model of the same device have similar
scheduling characteristics. Since the standard leaves the scheduling
implications of user defined primitives completely unspecified,
I guess I am free to schedule sequential outputs as non-blocking
assignments.

(If there are multiple clocks in the design, this makes a big
difference since multiple clock change events may exist in different
places in the active event queue. Register transfers really want
to be after all clock events.)

Incidentally, blocking assignments in sequential always blocks don't
necessarily result in race problems unless a simulator gets overly
aggressive in trying to optimize them. That can expose the races,
so now everyone has to use the more expensive nonblocking assignments,
resulting in an overall slowdown as a result of the attempt at
optimizations.

If there is a single clock, then everything is guaranteed to happen
after that one clock. It is not possible (in an RTL design) for there
to be any events scheduled in the clock time before the clock itself.
The clock edge puts all blocked threads and sequential primitives on
the active event queue en masse, and *all* propagation events thus
happen after clocked devices start and examine their inputs.


--
Steve Williams "The woods are lovely, dark and deep.
steve at icarus.com But I have promises to keep,
http://www.icarus.com and lines to code before I sleep,
http://www.picturel.com And lines to code before I sleep."
 
Steven Sharp wrote:
Stephen Williams <spamtrap@icarus.com> wrote in message news:<bjbccf$b3i$1@sun-news.laserlink.net>...
OK, this one has got me. The standard is leaving out something
important here.

I suspect that all the big name tools cause the output of the
*sequential* primitives to be scheduled exactly like non-blocking
assignments, and the output of *combinational* primitives to be
scheduled exactly like continuous assignments.

This is not true in Verilog-XL or NC-Verilog. They do not schedule
sequential UDP outputs on the NBA queue. As far as I know, they
schedule them like normal gate primitives. This is not necessarily
exactly the same as continuous assignments, but falls into the same
category from the viewpoint of the standard. So the standard isn't
leaving anything out; there just aren't any special requirements for
the scheduling of UDPs.

I don't know why this doesn't lead to more zero-delay race conditions
in gate-level designs. In timing simulations, the delays prevent them.
But they don't seem to come up a lot in zero-delay simulations either.
There are some choices that simulators can make in evaluation order
that help minimize them.

From earlier occasions and feedback I recall that the original
Verilog-XL did "something" special with ports that had the effect of
avoiding races. Unfortunately this "something" was never made explicit.

In the early years (~1990) we did Verilog design and synthesis with
blocking assignments *only*. (non-blocking was not supported by Synopsys
synthesis, and perhaps not even in Verilog, I don't know.) To make
this work, we had to encapsulate sequential always blocks in modules.
It worked because port communication was racefree.

Incidentally, blocking assignments in sequential always blocks don't
necessarily result in race problems unless a simulator gets overly
aggressive in trying to optimize them. That can expose the races,
so now everyone has to use the more expensive nonblocking assignments,
resulting in an overall slowdown as a result of the attempt at
optimizations.

There have always been races between sequential always blocks in the
same module using blocking assignments. The problems started when
some other simulators started to "optimize" the port behavior.
Suddenly previously racefree models now had races. Conversely,
from then on it was no longer possible to avoid races without
using non-blocking assignments. Progress!

Regards, Jan


--
Jan Decaluwe - Resources bvba - http://jandecaluwe.com
Losbergenlaan 16, B-3010 Leuven, Belgium
Bored with EDA the way it is? Check this:
http://jandecaluwe.com/Tools/MyHDL/Overview.html
 
Stephen Williams wrote:

If there is a single clock, then everything is guaranteed to happen
after that one clock. It is not possible (in an RTL design) for there
to be any events scheduled in the clock time before the clock itself.
The clock edge puts all blocked threads and sequential primitives on
the active event queue en masse, and *all* propagation events thus
happen after clocked devices start and examine their inputs.

I may be missing the point of this remark, but it seems to
suggest that single clock models are racefree by nature?
It is not relevant that everything happens after the clock -
if you use blocking assignments, you may get races.

Jan

--
Jan Decaluwe - Resources bvba - http://jandecaluwe.com
Losbergenlaan 16, B-3010 Leuven, Belgium
Bored with EDA the way it is? Check this:
http://jandecaluwe.com/Tools/MyHDL/Overview.html
 
Stephen Williams <spamtrap@icarus.com> wrote in message news:<bjjn3g$su0$1@sun-news.laserlink.net>...
Since the standard leaves the scheduling
implications of user defined primitives completely unspecified,
I guess I am free to schedule sequential outputs as non-blocking
assignments.

Actually, you aren't. UDP outputs are not nonblocking assignments,
so their output changes are not nonblocking assign update events.
The standard requires that nonblocking assign events wait until
after all active and inactive events. That includes UDP outputs.
You have to allow an arbitrary number of levels of ordinary
zero delays to be evaluated until everything is settled, before
the nonblocking assignments get done.

You can add an extra level of end-of-time lists for UDP outputs
that executes after other active events and before nonblocking
assigns, if you like. As long as all active events (including
UDP outputs) have been evaluated before the nonblocking assigns,
you have satisfied the requirements. The standard allows complete
freedom in the order of evaluation of the active events, so you
can add extra stratification to those if you like. That should
have the same basic effect you are seeking.
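
The extra stratification suggested above can be sketched as a toy model. This is only an illustration under stated assumptions, not how any real simulator is built: the class `TimeStep`, the list name `udp_out`, and the event callables in `demo` are all hypothetical. Events are plain Python callables; the only property being shown is that UDP output updates (and whatever they trigger) are drained before any nonblocking assign update runs.

```python
from collections import deque


class TimeStep:
    """Toy model of one simulation time step with stratified event lists."""

    def __init__(self):
        self.active = deque()    # ordinary evaluation and update events
        self.udp_out = deque()   # assumption: extra list for sequential UDP outputs
        self.inactive = deque()  # events delayed with #0
        self.nba = deque()       # nonblocking assign update events
        self.log = []

    def run(self):
        while self.active or self.udp_out or self.inactive or self.nba:
            if self.active:
                event = self.active.popleft()
            elif self.udp_out:
                # Still "active" from the standard's viewpoint, just ordered last.
                event = self.udp_out.popleft()
            elif self.inactive:
                self.active.extend(self.inactive)  # activate #0 events
                self.inactive.clear()
                continue
            else:
                self.active.extend(self.nba)       # activate NBA updates last
                self.nba.clear()
                continue
            event(self)


def demo():
    step = TimeStep()

    def gate(s):
        s.log.append("gate eval")

    def udp_update(s):
        s.log.append("udp output update")
        s.active.append(gate)          # fanout of the UDP output

    def nba_update(s):
        s.log.append("nba update")

    def clocked_udp(s):
        s.log.append("udp eval")
        s.udp_out.append(udp_update)   # deferred behind other active events

    def clocked_always(s):
        s.log.append("always eval")
        s.nba.append(nba_update)

    # The clock edge wakes both clocked devices.
    step.active.extend([clocked_udp, clocked_always])
    step.run()
    return step.log
```

In this model `demo()` logs both evaluations first, then the UDP output update and its gate fanout, and the NBA update last, which is the ordering requirement described above.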
 
Stephen Williams <spamtrap@icarus.com> wrote in message news:<bjjn3g$su0$1@sun-news.laserlink.net>...
If there is a single clock, then everything is guaranteed to happen
after that one clock. It is not possible (in an RTL design) for there
to be any events scheduled in the clock time before the clock itself.
The clock edge puts all blocked threads and sequential primitives on
the active event queue en masse, and *all* propagation events thus
happen after clocked devices start and examine their inputs.

If you are suggesting that this prevents races, then I'm afraid you
are incorrect. In particular, your last statement is incorrect.

Perhaps this is true of your implementation, but it is not guaranteed
by the standard. I am not a fan of section 5 of the LRM, but it is
what we have, so I will refer to its rules and terminology.

When the update event for the clock is processed, the processes
sensitive to the clock are added to the event queue as evaluation
events. In your scenario, they are the only active events in the
queue. One of them gets evaluated first, and if its output changes,
it adds an update event to the queue. At this point, the simulator
can choose to process any active event. That includes this new
update event as well as the existing evaluation events. If it
processes the update event first, then the data value changes
before the other evaluations start and examine their inputs. This
means that other evaluations may see the old data value or the new
one.

This may seem odd, but it happens all the time. Consider a blocking
assign to a reg/variable. Section 5 considers the value change to be
an update event. But the procedural code cannot continue until the
blocking assign is complete. So section 5 must consider this conceptually
as the process suspending, an update event being performed, and then
the process resuming. So this update event not only occurred before
any other evaluation events, but before the current evaluation completed.

And some simulators may process continuous assignments on the fanout
of a variable as part of updating the variable, propagating that value
change immediately. In this case it is just an attempt at optimization.
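
The race described above can be reproduced with a very small toy model. It is a sketch under assumptions, not a rendering of section 5: the function `clock_edge`, the signal names and the two-flip-flop chain d -> q1 -> q2 are invented for illustration. With immediate updates the result depends on which evaluation the simulator picks first; with deferred (nonblocking-style) updates it does not.

```python
def clock_edge(order, nonblocking):
    """Evaluate two clocked devices in the given order on one clock edge.

    Hypothetical shift chain: ff1 copies d to q1, ff2 copies q1 to q2.
    """
    sig = {"d": 1, "q1": 0, "q2": 0}
    procs = {
        "ff1": ("d", "q1"),
        "ff2": ("q1", "q2"),
    }
    deferred = []
    for name in order:
        src, dst = procs[name]
        value = sig[src]                   # evaluation event samples its input
        if nonblocking:
            deferred.append((dst, value))  # update waits for all evaluations
        else:
            sig[dst] = value               # update event processed right away
    for dst, value in deferred:
        sig[dst] = value
    return sig["q1"], sig["q2"]


# Immediate updates: the outcome depends on evaluation order.
print(clock_edge(["ff1", "ff2"], nonblocking=False))
print(clock_edge(["ff2", "ff1"], nonblocking=False))
# Deferred updates: both orders agree.
print(clock_edge(["ff1", "ff2"], nonblocking=True))
print(clock_edge(["ff2", "ff1"], nonblocking=True))
```

When ff1 is evaluated and updated first, ff2 sees the new q1 and the value shoots through both stages; any other choice gives the shift-register behavior the designer intended.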
 
Steven Sharp wrote:
Stephen Williams <spamtrap@icarus.com> wrote in message news:<bjjn3g$su0$1@sun-news.laserlink.net>...

If there is a single clock, then everything is guaranteed to happen
after that one clock. It is not possible (in an RTL design) for there
to be any events scheduled in the clock time before the clock itself.
The clock edge puts all blocked threads and sequential primitives on
the active event queue en masse, and *all* propagation events thus
happen after clocked devices start and examine their inputs.


If you are suggesting that this prevents races, then I'm afraid you
are incorrect. In particular, your last statement is incorrect.

When the update event for the clock is processed, the processes
sensitive to the clock are added to the event queue as evaluation
events. In your scenario, they are the only active events in the
queue. One of them gets evaluated first, and if its output changes,
it adds an update event to the queue. At this point, the simulator
can choose to process any active event. That includes this new
update event as well as the existing evaluation events. If it
processes the update event first, then the data value changes
before the other evaluations start and examine their inputs. This
means that other evaluations may see the old data value or the new
one.

You are absolutely right, it is *my implementation* that makes
this race free. I suspect this is also true in Cadence tools because
I remember long and painful discussions with people who compared
results. And there are cases where I had users writing to an
input of a gate, then expecting to be able to read the output
in the same thread. That took some convincing. I think I finally
concluded that the expectation is absurd because it leads to
logical contradictions.

I will need to reassess my scheduling of synchronous UDP outputs.
I may ask for an erratum to make it explicit so that
I can rest the case. I see your point, but I'm torn between that
and the user expectation that a UDP version of a register behave
the same as a behavioral model of a register.

It's all black magic, really. I don't write code, I cast spells.
Sometimes I feel like the sorcerer's apprentice.
--
Steve Williams "The woods are lovely, dark and deep.
steve at icarus.com But I have promises to keep,
http://www.icarus.com and lines to code before I sleep,
http://www.picturel.com And lines to code before I sleep."
 
Jan Decaluwe <jan@jandecaluwe.com> wrote in message news:<3F5D7BAA.D8C915E1@jandecaluwe.com>...
From earlier occasions and feedback I recall that the original
Verilog-XL did "something" special with ports that had the effect of
avoiding races. Unfortunately this "something" was never made explicit.

This may not have been done deliberately for this purpose. It may
just have been a natural consequence of the implementation.

A value passing through a port always comes out as a net. An output
port declared as a reg effectively gets a continuous assignment
inserted so that a net comes out. A reg is updated immediately as
part of a blocking assignment, but a net may just be scheduled to
be updated later. This may defer the updating of the net long
enough that all of the always blocks on the same clock are executed
before any of their output nets are updated.

This is basically the same way VHDL avoids such races. VHDL
requires that all pending processes finish executing before any
signals they scheduled are updated. This alternating of execution
and signal updates in "delta cycles" avoids races between processes
triggered by the same signal.
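
The alternation of execution and signal updates can be sketched roughly as follows. This is a simplified toy, not the full VHDL simulation cycle (no drivers, resolution or time advance), and the names `run_deltas`, `demo` and the signal set are assumptions made up for the example.

```python
def run_deltas(signals, processes, triggered):
    """Alternate process execution and signal updates until nothing changes.

    processes maps a name to (sensitivity_set, function). Each function reads
    `signals` and returns a dict of scheduled updates without applying them.
    """
    deltas = 0
    while triggered:
        scheduled = {}
        for name in triggered:
            _, fn = processes[name]
            scheduled.update(fn(signals))   # execution phase: old values only
        changed = {k for k, v in scheduled.items() if signals[k] != v}
        signals.update(scheduled)           # update phase: applied together
        triggered = [n for n, (sens, _) in processes.items() if sens & changed]
        deltas += 1
    return deltas


def demo(order):
    """`order` lists the processes woken by the clock edge, in execution order."""
    signals = {"d": 1, "q1": 0, "q2": 0, "y": 0}
    processes = {
        "ff1": ({"clk"}, lambda s: {"q1": s["d"]}),
        "ff2": ({"clk"}, lambda s: {"q2": s["q1"]}),
        "comb": ({"q1", "q2"}, lambda s: {"y": s["q1"] ^ s["q2"]}),
    }
    deltas = run_deltas(signals, processes, order)
    return deltas, signals["q1"], signals["q2"], signals["y"]
```

Calling `demo(["ff1", "ff2"])` and `demo(["ff2", "ff1"])` gives the same result in this model, because neither clocked process can observe the other's update within the same delta.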

I think Verilog-XL does immediately evaluate some continuous
assignments and update their outputs when a reg that they read
changes. Maybe it deliberately avoids doing this for ports, to
get the race avoidance you describe.

NC-Verilog was designed from the start as a mixed Verilog/VHDL
simulator. Since there is a single scheduler for both languages,
it follows the same rules as VHDL. This effectively gives
NC-Verilog much of the same race avoidance. Two always blocks
triggered by the same clock net and communicating via nets (which
includes ports) should not have races even with blocking assignments.
But the desire for speed or matching XL means that the VHDL rules
may not be strictly followed in all situations for Verilog.

There have always been races between sequential always blocks in the
same module using blocking assignments. The problems started when
some other simulators started to "optimize" the port behavior.
Suddenly previously racefree models now had races. Conversely,
from then on it was no longer possible to avoid races without
using non-blocking assignments. Progress!

Yep.

And since blocking assignments execute as much as twice as fast as
nonblocking ones, it is likely that the overall result was a slowdown
in simulation. Progress indeed.
 