Assignment synchronization disaster?

Andy Ross

I'm a long-time software guy with a stale-but-reasonably-robust
background in digital electronics, am in the process of teaching
myself Verilog, and have stumbled into a gotcha that I'm having
trouble understanding. Or rather: I think I *do* understand it (and
have fixed it in my code), but am having trouble understanding why the
scheduler rules permit this insanity in the first place. So please
read to the bottom -- I'm not asking for help with the circuit, I'm
trying to understand the language's behavior.

Here's the exposition. My circuit design (a USB 1.1 receiver, as it
happens) has an edge detector in it using a single synchronous
flip-flop. Stripped of all the other stuff, it looks basically like
this (I've been led to believe by most sources that this is more or
less the canonical way one writes such a circuit; if there are bugs
please tell me!):

module edge_detector(input clk, x, output edg);
    reg x0;
    assign edg = (x != x0);
    always @(posedge clk) x0 <= x;
endmodule


So that being done, I want to write a testbench, provide some
stimulus, and get out a waveform I can look at in (in this case)
gtkwave. So here's the code for that:

module test;
    reg clk=1, x=0;
    edge_detector ed(.clk(clk), .x(x));
    always #(0.5) clk = !clk;
    initial begin
        $dumpfile("edge_detector.vcd");
        $dumpvars(1, ed);
        ed.x0 = 0;
        repeat(100) #4 x = !x;
        $finish;
    end
endmodule

Basically: instantiate the module, clock it with a period of 1, and
every four clocks toggle the value of the input. Again, this is
pretty much standard stuff from everything I'm finding on the web and
in books. The desired output, of course, is that I see the "ed.x"
line toggle once every four cycles of "ed.clk", the "ed.x0" line
should be a 1-clock-delayed version of "ed.x", and that on every edge
of x I see "ed.edg" rise to a logic 1 and stay there until the next
rising edge of clk.

But that's not what happens in the simulator! Running exactly this
code under Icarus 0.9.1, I see the x and x0 lines move in lockstep,
and edg NEVER TRIGGERS AT ALL.

And this doesn't appear to be a bug in Icarus; as far as I can tell,
the scheduling rules allow it. For example: just before T=4 the x0
line is still zero, then the "x = !x" line fires first and toggles x
from 0 to 1 with a blocking assignment (this is the edge I want to
detect). Then the "clk = !clk" trigger fires and creates a rising
edge, which in turn triggers the synchronous block in the device to
save the current value of x (which is now one!) in the x0 flipflop.
So we exit the T=4 time step with x and x0 having *both* toggled, and
thus no edge is detected!

The "solution", of course, is to use a nonblocking assignment to x in
the testbench to prevent it from being seen prematurely by the
flip-flop, and (once I puzzled this out, of course) that's what I've
done. The core of the problem here seems basically to be that I have
testbench code running in the same time step as device code and racing
against each other.
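
A minimal sketch of that nonblocking fix, for reference. It assumes a `timescale directive so that the fractional clock delay is honored, an explicit edg wire, and a small checker process; the module name test_nba and the checker are additions for illustration and are not part of the original testbench.

```verilog
`timescale 1ns/1ps

// DUT exactly as posted above.
module edge_detector(input clk, x, output edg);
    reg x0;
    assign edg = (x != x0);
    always @(posedge clk) x0 <= x;
endmodule

// Hypothetical variant of the testbench; only the assignment to x changes.
module test_nba;
    reg clk = 1, x = 0;
    wire edg;
    edge_detector ed(.clk(clk), .x(x), .edg(edg));
    always #0.5 clk = !clk;

    initial begin
        ed.x0 = 0;
        // Nonblocking: the new value of x is applied in the NBA region,
        // after the posedge-clk block has already read the old value.
        repeat (100) #4 x <= !x;
        $finish;
    end

    // Illustrative checker: a quarter cycle after each change of x,
    // x0 should still hold the old value, so edg is expected to be 1.
    initial begin
        #1;
        forever @(x) begin
            #0.25;
            if (edg !== 1'b1)
                $display("missed edge at t=%0t", $realtime);
        end
    end
endmodule
```

With the blocking version the checker can report a missed edge, depending on how the simulator orders the two processes at T=4; with the nonblocking version the ordering within the time step should no longer matter.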

But that's insane! First off, no one writes test benches like that.
I've looked at a bunch of references at this point, and *everybody*
writes their Verilog test code in a traditional/imperative style using
blocking assignments. Doing otherwise would be a nightmare.

But more importantly, I don't see how this can ever be made to work
reliably. As far as I can tell, making a blocking assignment can only
be made "safe" from this kind of race condition if you can guarantee
either that no other code is sampling it (or any combinatorial signal
derived from it!) using a nonblocking assignment, or that it isn't
possible for your testbench code to occupy the same time step as
device code. You see the "don't mix assignment types" rule all the
time, but this problem appears to be much more serious: the converse
of the above is that you can't use a nonblocking assignment unless you
know that every reg variable referenced by the RHS can *never* be
assigned to with "=".

So we finally come to my questions: how is this not a *huge* booby
trap in Verilog testbench development? Am I just incredibly unlucky
that I discovered this so early? If not, what were the language
designers thinking when they decided to allow inevitable race
conditions without synchronization primitives? How do professionals
deal with the issue: use nonblocking assignments always? Shift the
time steps so that testbench code is never synchronous to any device
clocks? Are there well-known conventions people follow, or is this
just something that everyone needs to figure out on their own?

I was really enjoying learning Verilog until I hit this issue. Now I
find myself thinking through ways I can isolate it to just synthesis
code; I'm currently playing with Verilator's C++ environment, and am
seriously thinking about jumping ship. Basically: what am I missing?

Andy
 
On 2009-04-15, Andy Ross <andy.ross.or@gmail.com> wrote:
> read to the bottom -- I'm not asking for help with the circuit, I'm
> trying to understand the language's behavior.

Hi,

I remember having exactly the same kind of problems as you have when I
started to write some more advanced test-benches. (First of all, I should
say that I haven't actually had time to test any code that I have written
in this post, there are probably a few syntax errors lurking here and there,
but the ideas should be clear, if not, just say so and I'll try to clarify
them.)


> module edge_detector(input clk, x, output edg);
>     reg x0;
>     assign edg = (x != x0);
>     always @(posedge clk) x0 <= x;
> endmodule

Is this supposed to synchronize an external input that is not
synchronous with your clock? In that case you probably want
to avoid any combinational logic before the input is actually
synchronized.

As in:

module edge_detector_and_sync(input clk, x, output edg);
    reg x_sync, x0;
    assign edg = (x_sync != x0);
    always @(posedge clk) begin
        x_sync <= x;
        x0 <= x_sync;
    end
endmodule


> module test;
>     reg clk=1, x=0;
>     edge_detector ed(.clk(clk), .x(x));
>     always #(0.5) clk = !clk;
>     initial begin
>         $dumpfile("edge_detector.vcd");
>         $dumpvars(1, ed);
>         ed.x0 = 0;
>         repeat(100) #4 x = !x;
>         $finish;
>     end
> endmodule

What you are doing here is basically undefined as you have figured out.
What is worse, it would be quite undefined in hardware as well since you
would violate the setup requirements of the flip-flop in the edge_detector.
(If you ever need to write a test-bench capable of simulating a post
synthesis netlist or post place and route netlist you will certainly need
to care about setup/hold-time requirements of flip-flops.)

What I usually do in simple test-benches is to make sure that the signals
will not change exactly at clock edges. More advanced test-benches will
mostly look like a program, something like:

initial begin
    bus_write(0,95);
    bus_write(4,42);
    bus_write(8,98);
    // And so on
end

// Or possibly like this if the tasks are hidden in another module
// (which is probably a good idea for ease of reuse)
initial begin
    bus0.write(0,95);
    bus0.write(4,42);
    bus0.write(8,98);

    fork // Two accesses at more or less the same time to different buses
        bus1.write(400,50);
        bus0.write(200,10);
    join
end

And then the bus_write task will be written so as to synchronize all
signals with the bus signal. I hope this example can show you that writing
robust test-benches in Verilog doesn't have to be very tedious.
(See below for an example of how I would solve your problem.)
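
A minimal sketch of what such a task might look like. The bus signals (bus_we, bus_addr, bus_data), their widths and the two-cycle write protocol are assumptions made up for illustration; the post above does not show the real ones.

```verilog
`timescale 1ns/1ps

module bus_write_sketch;
    reg        clk = 0;
    reg        bus_we = 0;      // hypothetical write strobe
    reg [31:0] bus_addr = 0;    // hypothetical address lines
    reg [31:0] bus_data = 0;    // hypothetical data lines

    always #5 clk = ~clk;       // clock period of 10 time units

    // Drive one write cycle. Every signal change is made with a
    // nonblocking assignment after waiting for a rising clock edge, so a
    // DUT flip-flop clocked by that same edge still samples the old values.
    task bus_write(input [31:0] addr, input [31:0] data);
        begin
            @(posedge clk);
            bus_addr <= addr;
            bus_data <= data;
            bus_we   <= 1'b1;
            @(posedge clk);
            bus_we   <= 1'b0;
        end
    endtask

    // The calling code stays in the imperative style shown above.
    initial begin
        bus_write(0, 95);
        bus_write(4, 42);
        bus_write(8, 98);
        $finish;
    end
endmodule
```

The point of the design is that the synchronization lives in one place: the code that calls the task never has to think about clock edges.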



> The "solution", of course, is to use a nonblocking assignment to x in
> the testbench to prevent it from being seen prematurely by the
> flip-flop, and (once I puzzled this out, of course) that's what I've
> done. The core of the problem here seems basically to be that I have
> testbench code running in the same time step as device code and racing
> against each other.
>
> But that's insane! First off, no one writes test benches like that.
> I've looked at a bunch of references at this point, and *everybody*
> writes their Verilog test code in a traditional/imperative style using
> blocking assignments. Doing otherwise would be a nightmare.

It depends on what you are doing. One way is to synchronize your test-bench
to the clock signal. In that case you certainly need to use non-blocking
assignments:

initial begin
    repeat(100) begin
        repeat (4) @(posedge clk);
        x <= !x; // Possibly with a small delay if you want to be able
                 // to simulate a post place and route netlist
    end
end


Finally, if you are able to use SystemVerilog instead of Verilog in your
test-bench, you may want to investigate the clocking block construct in
SystemVerilog.

/Andreas
 
On Wed, 15 Apr 2009 16:51:37 -0700 (PDT), Andy Ross wrote:

> module edge_detector(input clk, x, output edg);
>     reg x0;
>     assign edg = (x != x0);
>     always @(posedge clk) x0 <= x;
> endmodule

Looks good, PROVIDED that the input signal 'x' is
already synchronized to the clock. If not, you
probably want two resynch flops first.

> So that being done, I want to write a testbench, provide some
> stimulus, and get out a waveform I can look at in (in this case)
> gtkwave. So here's the code for that:
>
> module test;
>     reg clk=1, x=0;
>     edge_detector ed(.clk(clk), .x(x));
>     always #(0.5) clk = !clk;

So the clock has rising edges at t=1.0, t=2.0, .....

>     initial begin
>         $dumpfile("edge_detector.vcd");
>         $dumpvars(1, ed);
>         ed.x0 = 0;
>         repeat(100) #4 x = !x;

and x toggles at t=4.0, t=8.0, ...

> The "solution", of course, is to use a nonblocking assignment to x in
> the testbench to prevent it from being seen prematurely by the
> flip-flop, and (once I puzzled this out, of course) that's what I've
> done. The core of the problem here seems basically to be that I have
> testbench code running in the same time step as device code and racing
> against each other.

Exactly. In effect, your x signal is provoking a hold-time
violation on the sampling flop in your DUT. All bets are off.

> But that's insane! First off, no one writes test benches like that.
> I've looked at a bunch of references at this point, and *everybody*
> writes their Verilog test code in a traditional/imperative style using
> blocking assignments. Doing otherwise would be a nightmare.

Yes, but they DO take care (if they are sane) to ensure that
all inputs to the DUT exhibit some nonzero setup and hold time,
and that all DUT outputs are sampled at some appropriate time
slightly away from the clock edge.

> Am I just incredibly unlucky
> that I discovered this so early?

No, you're incredibly LUCKY; now you won't get burned by it
on a nontrivial project where it's harder to track down!

There are various ways around this. Probably the simplest
to understand is...

- Set up DUT stimulus on the INACTIVE clock edge, not the
active edge. This gives half a cycle of setup and hold.
It also makes it rather easier to interpret wave displays,
because it is now blindingly obvious what input value will
be sampled by the DUT on each clock.
- Sample DUT output shortly before the ACTIVE clock edge,
so that you can be sure all the DUT outputs are settled
and stable (even if your DUT models its gate delays).
In practice this can be done with a simple @(posedge clk)
because your DUT will surely have nonblocking assignments
to any clocked outputs, and therefore the TB will sample
the current-state values of outputs (not the next-state)
reliably. But if you're a tad paranoid, which is no bad
thing, you could explicitly do the sampling a short time
before the clock edge.

Note that this sampling strategy also allows you to sample
the DUT stimulus at the same moment as the DUT outputs, and
you will reliably see the new stimulus and the old outputs.

SystemVerilog's "clocking block" allows you to encapsulate
all this business in a single declarative construct, but
if you don't have access to SV it's still fairly easy:

module testbench;

    ... declarations ...
    ... clock generator, DUT instance, etc ...

    initial begin : Stimulus_generator
        ... initialisations ...
        forever @(negedge clock) begin
            ... apply stimulus on inactive edge
            ... using blocking assignment
        end
    end

    initial begin : TB_sampling
        @(negedge clock)       // get to inactive clock edge
        #(CLOCK_PERIOD/4);     // get to just before next clock
        ... sample DUT inputs and outputs;
        ... outputs are those set up by DUT on previous clock,
        ... inputs are those that will be sampled by DUT on next clock
    end

endmodule

Note that this completely decouples stimulus generation
from output checking, which is usually reckoned to be
a Good Thing.
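
As a concrete illustration, here is a minimal sketch of the same skeleton applied to the edge detector from this thread. The module name test_negedge, the clock period of 10 time units and the $display sampling are assumptions chosen for the example.

```verilog
`timescale 1ns/1ps

// DUT as posted by the original poster.
module edge_detector(input clk, x, output edg);
    reg x0;
    assign edg = (x != x0);
    always @(posedge clk) x0 <= x;
endmodule

module test_negedge;
    reg clk = 0, x = 0;
    wire edg;
    edge_detector ed(.clk(clk), .x(x), .edg(edg));

    always #5 clk = ~clk;   // rising edges at 5, 15, 25, ...

    // Stimulus on the inactive (falling) edge: a plain blocking
    // assignment is safe because x changes half a cycle away from
    // the edge on which the DUT samples it.
    initial begin : stimulus_generator
        ed.x0 = 0;
        repeat (100) begin
            repeat (4) @(negedge clk);
            x = !x;
        end
        $finish;
    end

    // Sampling a quarter period before the next active edge.
    always @(negedge clk) begin : tb_sampling
        #2.5;
        $display("t=%0t x=%b x0=%b edg=%b", $realtime, x, ed.x0, edg);
    end
endmodule
```

Stimulus and sampling live in separate processes here, so either one can be changed without touching the other.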

> I was really enjoying learning Verilog until I hit this issue.

Ah, the pleasure and the pain ;-)

Postscript: Another fairly neat approach to this drive/sample
business is to use a virtual clock in the testbench, operating
a short time earlier than the real clock supplied to the DUT's
clock input. Here's a sketch:

//--------------------------------------------------------
module TB_with_virtual_clock;

    ... declarations left as an exercise ...

    initial begin : stoppable_clock_generator
        TB_clock = 1'b0; // prefer to avoid active edge at t=0
        while (stop !== 1'b1)
            wait (stop !== 1'b1)
                #(PERIOD/2)
                    TB_clock = ~TB_clock;
    end

    // Derive DUT clock from TB clock
    assign #(PERIOD/4) DUT_clock = TB_clock;

    initial begin : testbench
        ... initializations ...
        repeat (TEST_CYCLES) @(posedge TB_clock) begin
            ... sample previous clock's DUT inputs and outputs
            ... (YOU MUST DO THIS FIRST!!!!)
            ... drive DUT's new inputs
        end
        // Test activity now finished, stop the clock
        stop = 1'b1;
    end

    dut_module DUT( .clock(DUT_clock), ... );

endmodule
//--------------------------------------------------------

Now you're sampling the DUT's input and output signals
a short while before the DUT's clock, just as real
flipflops would do; and you're then driving DUT inputs
in such a way that they exhibit just a little setup time
before the next clock.
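
A minimal sketch of how the elided parts might be filled in for the edge detector discussed in this thread. The declarations, PERIOD = 8, the 40-cycle test length, the simplified clock loop and the "toggle x every fourth cycle" stimulus are all assumptions made for illustration, not part of the sketch above.

```verilog
`timescale 1ns/1ps

// DUT as posted by the original poster.
module edge_detector(input clk, x, output edg);
    reg x0;
    assign edg = (x != x0);
    always @(posedge clk) x0 <= x;
endmodule

module test_virtual_clock;
    localparam PERIOD = 8;      // assumed value, divisible by 4
    reg  TB_clock = 1'b0;
    reg  stop = 1'b0;
    reg  x = 1'b0;
    wire DUT_clock, edg;
    integer cycle;

    // Stoppable clock generator (simplified form of the loop above).
    initial begin : stoppable_clock_generator
        while (stop !== 1'b1)
            #(PERIOD/2) TB_clock = ~TB_clock;
    end

    // The DUT clock lags the testbench clock by a quarter period.
    assign #(PERIOD/4) DUT_clock = TB_clock;

    edge_detector ed(.clk(DUT_clock), .x(x), .edg(edg));

    initial begin : testbench
        ed.x0 = 0;
        for (cycle = 0; cycle < 40; cycle = cycle + 1) begin
            @(posedge TB_clock);
            // Sample first: these are the values left by the previous cycle.
            $display("t=%0t x=%b x0=%b edg=%b", $realtime, x, ed.x0, edg);
            // Then drive: x gets a quarter period of setup before DUT_clock.
            if (cycle % 4 == 3) x = !x;
        end
        stop = 1'b1;            // test finished, stop the clock
    end
endmodule
```

Once stop is set, the clock generator falls out of its loop and the simulation ends on its own when no events remain.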

HTH
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
jonathan.bromley@MYCOMPANY.com
http://www.MYCOMPANY.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.
 
On 2009-04-16, Andreas Ehliar <ehliar-nospam@isy.liu.se> wrote:
> would violate the setup requirements of the flip-flop in the edge_detector.
                    ^^^^^

Before someone else corrects me: I really meant hold requirements here.

/Andreas
 
I would have stated the issue differently. Your testbench found a
problem in your design. I don't do actual Vlog coding myself, so my
views should be taken with a grain of salt, but I do work with others
who do (and have done so for a while), and I listen to the rules they
follow.

One of the rules they enforce here is registration of signals. That
is, no signal enters or leaves a module without passing through a flop.
In your case, you were using the unregistered signal x. Since x
didn't go through a flop, you don't know when it changes relative to
the clock. Your testbench made it change at a time where it made your
design act in a way you don't expect. The recommended design change
of:

module edge_detector_and_sync(input clk, x, output edg);
    reg x_sync, x0;
    assign edg = (x_sync != x0);
    always @(posedge clk) begin
        x_sync <= x;
        x0 <= x_sync;
    end
endmodule

ameliorates that problem. Now, x_sync is registered and we are
comparing the results of two flops, one of which holds the value from
the previous clock cycle. So, now the edge detector detects whether the
value of x has changed from what it was in the previous clock cycle.

Note, this is a specific definition of what an edge means and it
doesn't handle "edges" caused by glitches or violations of the setup &
hold times of the registers, etc. However, if you want to deal with
those cases (which I don't think you are trying to do), then in my
opinion you are not doing the kind of design Vlog was intended to
model.

Even with this design, if your x changes too close to the clock, you
can see faulty behaviors, such as the edge being detected earlier or
later in some cases than in others, because there may still be a race
between x and clk. However, once x_sync gets values, the relationship
between x_sync and edg should be consistent.

Hope this helps,
Chris
 
