Non-Blocking versus blocking

Ali Karaali

Hello everybody,

I'm very new to Verilog, and I know this is a classic issue for
newbies, but I'm still confused. A very simple design is below:

module delay(input clk, output enable);
    reg[2:0] count;
    always@(posedge clk)
        count = count + 1;

    assign enable = (count == 7);
endmodule

module myTopModule(input clk, output out);
    wire tmp;
    always@(posedge clk)
        out <= tmp;

    delay d_i(.clk(clk), .enable(tmp));
endmodule

What is the problem with this design, if there is one? I am trying to
hold the output at 0 for 7 cycles and then at 1 for 1 cycle.
 
On May 19, 4:41 pm, Ali Karaali <ali...@gmail.com> wrote:
I'm very newbie at verilog and I know this is very classical issue for
newbies but I haven't got rid of confusing yet. Very simple design
below;

module delay(input clk, output enable);
    reg[2:0] count;
    always@(posedge clk)
        count = count + 1;

    assign enable = (count == 7);
endmodule

module myTopModule(input clk, output out);
    wire tmp;
    always@(posedge clk)
        out <= tmp;

    delay d_i(.clk(clk), .enable(tmp));
endmodule

What is the problem of that design or are there any?
(1)
Variable "count" is not reset, so it will start at 3'bX
and will remain stuck at 3'bX throughout (because X+1 = X).
In hardware it would start at *some* value, and you would
indeed get one pulse per 8 clocks. But simulation is
more pessimistic. You need a reset.

(2)
Yes, you were right in the title: you *must* use
nonblocking assignment to "count", because its
value is used outside the always block:

always @(posedge clk)
count <= count + 1;

(3)
Why two always blocks, a module instance and an assign?
Why not

module YourTopModule (
  input clk,
  input synchronous_reset,
  output reg out );

  reg [2:0] count;

  always @(posedge clk) begin
    if (synchronous_reset) begin
      count <= 0;
      out <= 0;
    end else begin
      count <= count + 1;
      out <= (count == 7);
    end
  end
endmodule

Astute observers will note that it would then be possible
to bury "count" in the always block and assign to it using
blocking assignment, but let's set that aside for a while
to avoid unnecessary confusion for the OP.
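
Points (1) to (3) can be checked in simulation. The following is only a sketch, not something posted in the thread: the module name pulse_gen, the testbench tb and the 10-unit clock period are assumptions made for illustration. It shows count sitting at X until the synchronous reset is applied, and then counts how often out is high.

```verilog
// Sketch only: pulse_gen and tb are illustrative names, not from the thread.
module pulse_gen (
  input  wire clk,
  input  wire synchronous_reset,
  output reg  out
);
  reg [2:0] count;
  always @(posedge clk) begin
    if (synchronous_reset) begin
      count <= 3'd0;
      out   <= 1'b0;
    end else begin
      count <= count + 3'd1;      // nonblocking: count is read outside this statement
      out   <= (count == 3'd7);   // uses the value of count from before the clock
    end
  end
endmodule

module tb;
  reg clk = 1'b0;
  reg synchronous_reset = 1'b0;
  wire out;
  integer ones = 0;
  integer i;

  pulse_gen dut (.clk(clk), .synchronous_reset(synchronous_reset), .out(out));

  always #5 clk = ~clk;           // assumed clock period of 10 time units

  initial begin
    // Phase 1: no reset yet, so count starts as 3'bX and X + 1 is still X.
    repeat (4) @(posedge clk);
    #1;
    if (dut.count === 3'bxxx) $display("count is still X before reset");
    else                      $display("unexpected: count = %b", dut.count);

    // Phase 2: hold the synchronous reset over one rising clock edge.
    synchronous_reset = 1'b1;
    @(posedge clk);
    #1 synchronous_reset = 1'b0;

    // Phase 3: sample out shortly after each of the next 16 rising edges.
    for (i = 0; i < 16; i = i + 1) begin
      @(posedge clk);
      #1 if (out === 1'b1) ones = ones + 1;
    end
    $display("out was high on %0d of 16 clocks", ones);
    $finish;
  end
endmodule
```

With one pulse per 8 clocks, the final message should report 2 of 16.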
--
Jonathan Bromley
 
On 19 May, 19:00, Jonathan Bromley <s...@oxfordbromley.plus.com>
wrote:
[snip]

(1)
Variable "count" is not reset, so it will start at 3'bX
and will remain stuck at 3'bX throughout (because X+1 = X).
In hardware it would start at *some* value, and you would
indeed get one pulse per 8 clocks.  But simulation is
more pessimistic.  You need a reset.
I know I need a reset; I just wanted to focus on the real problem,
but thanks.

(2)
Yes, you were right in the title: you *must* use
nonblocking assignment to "count", because its
value is used outside the always block:
So this is the rule? If a variable assigned in an always block is
used outside that block, I need a nonblocking assignment? Is that
also true for always@(*)? I ask because I used the code below in a
design as a lookup table:

always@(*)
    case(address)
        3      : real_addr = 2;
        6      : real_addr = 4;
        9      : real_addr = 6;
        default: real_addr = real_addr;
    endcase

and I used real_addr in another always block.


  always @(posedge clk)
     count <= count + 1;

(3)
Why two always blocks, a module instance and an assign?
Why not

module YourTopModule (
  input clk,
  input synchronous_reset,
  output reg out );

  reg [2:0] count;

  always @(posedge clk) begin
    if (synchronous_reset) begin
      count <= 0;
      out <= 0;
    end else begin
      count <= count + 1;
      out <= (count == 7);
    end
  end
endmodule

Astute observers will note that it would then be possible
to bury "count" in the always block and assign to it using
blocking assignment, but let's set that aside for a while
to avoid unnecessary confusion for the OP.
Actually, I wrote the code like this:

reg [2:0] count;
always@(posedge clk) begin
    count <= count + 1;
    case(count)
        0, 1, 2, 3, 4, 5, 6: out <= 0;
        7: out <= 1;
    endcase
end

and the instructor was unhappy with it too, because he said the design
had an unnecessary flop just before out. I suppose yours has one too?
--
Jonathan Bromley
 
On May 19, 8:41 am, Ali Karaali <ali...@gmail.com> wrote:
[snip]
You should take a look at papers by Cliff Cummings, especially
http://www.sunburst-design.com/papers/CummingsSNUG2000SJ_NBA.pdf
He has other good papers at his web site that are very informative.

Note that rules are made to be broken, but Cliff's suggestions and
insights will serve you well!

John Providenza
 
On May 19, 9:32 am, Ali Karaali <ali...@gmail.com> wrote:
[snip]
Note that your case statement will infer a latch, which is usually not
desired or expected and is likely to irritate your
professor/colleagues/tools:

always@(*)
    case(address)
        3      : real_addr = 2;
        6      : real_addr = 4;
        9      : real_addr = 6;
        default: real_addr = real_addr;
    endcase
The default statement says "if you don't get 3, 6, or 9, hold the
current value".
Since this is not clocked, "hold" equates to a latch.

John Providenza
 
On Wed, 19 May 2010 09:32:09 -0700 (PDT), Ali Karaali wrote:

I know I need a reset, I just wanted to interest just the real problem
but thanks.
OK. Sorry, you said you were new to this game so it
seemed like an obvious thing to pick up.

(2)
Yes, you were right in the title: you *must* use
nonblocking assignment to "count", because its
value is used outside the always block:

So this is the rule? If I use a left hand side variable from outside
of the always block, I need an nonblocking assignment?
Very close. I should have been more precise: that rule is
valid for CLOCKED always blocks.

Is it true for
the always@(*) because I used the code at below in a design as a
lookup table.

always@(*)
case(address)
3 : real_addr = 2;
6 : real_addr = 4;
9 : real_addr = 6;
default: real_addr = real_addr;
endcase

and I used the real_addr in another always block.
No. Combinational always (using always @* or some other
complete sensitivity list) does not require nonblocking assignment.
It's usually harmless, but (as Cliff Cummings discusses in the
papers that JohnP mentioned) there are some situations
where it can be less appropriate. Blocking assignment is
good in combinational always blocks because such blocks
will invariably re-trigger whenever necessary, so they
are sure to settle to the correct value.

On the other hand,
clocked always blocks trigger just once because of the
external clock condition - note that this clock does
NOT appear anywhere else in the code of the always
block - and it's easy to get read/write race conditions
if you don't use nonblocking assignment to variables
that are used outside the always block.

(3)
Why two always blocks, a module instance and an assign?
[snip]
Actually I wrote the code like that

reg [2:0] count;
always@(posedge clk) begin
count <= count + 1;
case(count)
0, 1, 2, 3, 4, 5, 6: out <= 0;
7: out <= 1;
endcase
end

and the instructor is also angry that because he said the design had
an unnecessary flop just before the out I suppose yours too?
Hmmm.. Your original code had an additional flop on "out"
(in the second clocked always block, in the top module)
so I didn't have any problem with putting one in myself :)

If you absolutely insist on not having a flop on "out" (and I can't
see why you should so insist; your instructor owes you that
explanation) then you must combinationally decode it off "count",
without an added flop. That is bad for other reasons. But if the
specification said "only three flops", then you have no choice.
For extra credit, work out how to use those three flops to make
an 8-state counter that can be decoded without glitches :)
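
For reference, the "only three flops" variant described above could look roughly like the sketch below. The module name pulse_comb is an assumption for illustration. Because out is decoded combinationally from a binary counter in which several bits can change on the same clock, it may glitch in real hardware; that is the "bad for other reasons" part, and the sketch does not attempt the extra-credit question.

```verilog
// Sketch only: pulse_comb is an illustrative name, not from the thread.
module pulse_comb (
  input  wire clk,
  input  wire synchronous_reset,
  output wire out
);
  reg [2:0] count;   // the only three flops in the design

  always @(posedge clk)
    if (synchronous_reset)
      count <= 3'd0;
    else
      count <= count + 3'd1;

  // Combinational decode with no extra flop on out.
  // out is high during the whole cycle in which count is 7.
  assign out = (count == 3'd7);
endmodule
```

Compared with the registered version, the pulse appears one clock earlier, since it no longer passes through a flop.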
--
Jonathan Bromley
 
On Wed, 19 May 2010 09:32:09 -0700 (PDT), Ali Karaali wrote:

An afterthought:

I used the code at below in a design as a lookup table.
I think JohnP may have mentioned this...
your design is flawed.

always@(*)
case(address)
3 : real_addr = 2;
6 : real_addr = 4;
9 : real_addr = 6;
default: real_addr = real_addr;
endcase
The "no change" assignment real_addr=real_addr
(or, equivalently, missing off the default clause completely)
implies a latch, which is not nice. It's certainly not a
lookup table. Did you perhaps intend

default: real_addr = address;

?? That would have been OK.

An always@* block (or any logically equivalent construct,
such as a continuous assignment) should always generate
a pure function of its inputs, so that it maps to combinational
logic.

In verification (testbench) code none of these rules
necessarily apply. They are there to ensure reliable
mapping between simulation behaviour and the final
results of logic synthesis.
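
A latch-free version of the lookup table might look like the sketch below. The 4-bit widths and the module name addr_lut are assumptions (the original post does not give the declarations), and the pass-through default is the one suggested above, which may or may not be what the OP intended.

```verilog
// Sketch only: widths and the name addr_lut are assumed for illustration.
module addr_lut (
  input  wire [3:0] address,
  output reg  [3:0] real_addr
);
  // Every path through the case assigns real_addr from inputs only,
  // so the block is a pure function of address and no latch is implied.
  always @* begin
    case (address)
      4'd3:    real_addr = 4'd2;
      4'd6:    real_addr = 4'd4;
      4'd9:    real_addr = 4'd6;
      default: real_addr = address;   // pass-through instead of "hold"
    endcase
  end
endmodule
```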
--
Jonathan Bromley
 
On Wed, 19 May 2010 14:55:44 -0700 (PDT), Ali Karaali wrote:

The confusion goes on. Actually, I'm a grad student, and after many
years of experience with imperative languages, this kind of language
is very hard to understand. Sorry for the "easy" questions; I have a
million of them... :)
OK! Some parts of Verilog are just like other imperative
languages, but some are quite different....

Going back to the first design, what is the problem? Why can't I use
blocking assignment in a clocked always block? You said the issue is
with values used outside the always block, so what happens if I do that?

I know that
assign enable = (count == 7);
and
always@(*)
enable = (count == 7);
are the same, aren't they? And execution of the always@(*) block will
begin when count changes. When will count change? When the blocking
assignment has finished.
Yes, you're right so far.

always@(posedge clk)
count = count + 1;

So what is wrong?
Nothing is wrong IN THAT EXAMPLE ALONE.

But, in the real world, such examples fit together
into bigger designs. That's where the trouble starts.

If I say
event_1 is
always@(posedge clk)
count = count + 1;
and event_2 is
always@(*)
enable = (count == 7);

is the order of the events guaranteed to be event_1, then event_2?
Yes (unless some other block also updates "count", but that
would be very bad for design).

But please consider this little design:

  always @(posedge clock)
    if (reset)
      Q0 = 0;
    else
      Q0 = ~Q1;

  always @(posedge clock)
    if (reset)
      Q1 = 0;
    else
      Q1 = Q0;

Each block, taken on its own, is a good description of
a flipflop with synchronous reset. Linking the two blocks
together, you should expect to get a 2-bit twisted ring
counter (count sequence 00-01-11-10-00), right?
No, wrong. The problem is that the SIMULATOR
must execute the two blocks sequentially at the
moment of @(posedge clock). If the first block
executes first, then Q1 will get the wrong value.
If the second block executes first, then Q0 will
get the wrong value. Simulation will not match
the results you get from synthesised hardware.

But when we use nonblocking assignment,
life is good:

  always @(posedge clock)
    if (reset)
      Q0 <= 0;
    else
      Q0 <= ~Q1;

  always @(posedge clock)
    if (reset)
      Q1 <= 0;
    else
      Q1 <= Q0;

Now, when @(posedge clock) happens, the
two blocks both execute - in some unknown
sequence - but that's OK because each one
uses the OLD value for the right-hand side of
the assignments. In other words, the value of
Q0 and Q1 as they were just before the clock.
The nonblocking assignment <= does NOT
immediately update the left-side target. Instead
it schedules the update to take place a short time
later.

What does "a short time" mean? It is what VHDL
would call a "delta cycle": we wait until ALL always
blocks have finished executing and are waiting for
their controlling event. Then, and NOT BEFORE,
we update all the left-side target variables with their
new scheduled values. Of course this may trigger
some other always blocks to execute, but that's OK
because NONE OF THE UPDATES HAVE CAUSED A
NEW CLOCK EVENT so the clocked always blocks
will NOT execute again.
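
The nonblocking version can be tried out with a small testbench. This is a sketch under assumptions: the module name twisted_ring_tb, the 10-unit clock period and the display format are made up for illustration. Swapping the four <= for = in the two clocked blocks gives the racy version, whose result depends on the order in which the simulator happens to run the blocks.

```verilog
// Sketch only: twisted_ring_tb is an illustrative name, not from the thread.
module twisted_ring_tb;
  reg clock = 1'b0;
  reg reset = 1'b1;
  reg Q0, Q1;
  integer i;

  always #5 clock = ~clock;       // assumed clock period of 10 time units

  // Both blocks read the values Q0 and Q1 had just before the clock edge;
  // the updates are applied only after every triggered block has run.
  always @(posedge clock)
    if (reset)
      Q0 <= 1'b0;
    else
      Q0 <= ~Q1;

  always @(posedge clock)
    if (reset)
      Q1 <= 1'b0;
    else
      Q1 <= Q0;

  initial begin
    @(posedge clock);             // one clock edge with reset asserted
    #1 reset = 1'b0;
    $display("after reset: Q1Q0 = %b%b", Q1, Q0);
    for (i = 0; i < 4; i = i + 1) begin
      @(posedge clock);
      #1 $display("clock %0d:     Q1Q0 = %b%b", i + 1, Q1, Q0);
    end
    $finish;
  end
endmodule
```

Printed as Q1 then Q0, the values should step through 00, 01, 11, 10 and back to 00, which is the twisted ring sequence given above.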

Please don't fight it. Follow the rule - the only one
that matters:
- Use blocking assignment (=) everywhere, EXCEPT:
- when assigning, in a clocked always block, to a
variable that will be used outside the block, use
nonblocking assignment (<=).

You will read other, less flexible rules - for example in
Cliff Cummings's papers. That's OK too. It is NOT
OK to use blocking assignment to clocked variables
that will be used in other always blocks.
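
As an illustration of the rule, here is a sketch of the earlier counter with count buried inside the clocked block. The module name pulse_every_8 and the block label counter are assumptions; the sketch relies on Verilog-2001 allowing a variable to be declared inside a named begin/end block, which most tools accept but is worth checking in yours.

```verilog
// Sketch only: pulse_every_8 and the label "counter" are illustrative names.
module pulse_every_8 (
  input  wire clk,
  input  wire synchronous_reset,
  output reg  out
);
  always @(posedge clk) begin : counter
    reg [2:0] count;              // local to this block, so blocking (=) is safe

    if (synchronous_reset) begin
      count = 3'd0;
      out  <= 1'b0;               // used outside the block: nonblocking
    end else begin
      out  <= (count == 3'd7);    // sees count before it is incremented
      count = count + 3'd1;       // takes effect immediately, but only locally
    end
  end
endmodule
```

Nothing outside the block reads count directly, so the order in which other clocked blocks run cannot affect it; only out crosses the block boundary, and it is assigned with <=.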

Enjoy!
--
Jonathan Bromley
 
On 19 May, 23:03, Jonathan Bromley <s...@oxfordbromley.plus.com>
wrote:
On Wed, 19 May 2010 09:32:09 -0700 (PDT), Ali Karaali wrote:
I know I need a reset, I just wanted to interest just the real problem
but thanks.

OK.  Sorry, you said you were new to this game so it
seemed like an obvious thing to pick up.

(2)
Yes, you were right in the title: you *must* use
nonblocking assignment to "count", because its
value is used outside the always block:

So this is the rule? If I use a left hand side variable from outside
of the always block, I need an nonblocking assignment?

Very close.  I should have been more precise: that rule is
valid for CLOCKED always blocks.
The confusion goes on. Actually, I'm a grad student, and after many
years of experience with imperative languages, this kind of language
is very hard to understand. Sorry for the "easy" questions; I have a
million of them... :)

I haven't read Cliff's papers yet, but I will as soon as possible.

Going back to the first design, what is the problem? Why can't I use
blocking assignment in a clocked always block? You said the issue is
with values used outside the always block, so what happens if I do that?

I know that
assign enable = (count == 7);
and
always@(*)
enable = (count == 7);
are the same, aren't they? And execution of the always@(*) block will
begin when count changes. When will count change? When the blocking
assignment has finished.

always@(posedge clk)
count = count + 1;

So what is wrong? I really don't understand, or more correctly, I can't.
Is there something wrong with the order in which the values of count and
enable change? If so, what, and why? :)

If I say
event_1 is
always@(posedge clk)
count = count + 1;
and event_2 is
always@(*)
enable = (count == 7);

is the order of the events guaranteed to be event_1, then event_2?




[snip]
 
On May 19, 6:33 pm, Jonathan Bromley <s...@oxfordbromley.plus.com>
wrote:
On Wed, 19 May 2010 14:55:44 -0700 (PDT), Ali Karaali wrote:
[snip]

Please don't fight it.  Follow the rule - the only one
that matters:
- Use blocking assignment (=) everywhere, EXCEPT:
- when assigning, in a clocked always block, to a
  variable that will be used outside the block, use
  nonblocking assignment (<=).

You will read other, less flexible rules - for example in
Cliff Cummings's papers.  That's OK too.  It is NOT
OK to use blocking assignment to clocked variables
that will be used in other always blocks.
I don't use Verilog much so this is a bit confusing to me. I have
always equated the blocking assignment to variable assignment in VHDL
and non-blocking assignment to signal assignment. Your statement that
blocking assignments for combinatorial logic in always blocks doesn't
map to VHDL. What is wrong with non-blocking assignments for
combinatorial logic in always blocks?

Rick
 
On May 19, 9:52 pm, rickman <gnu...@gmail.com> wrote:
On May 19, 6:33 pm, Jonathan Bromley <s...@oxfordbromley.plus.com
wrote:
Please don't fight it.  Follow the rule - the only one
that matters:
- Use blocking assignment (=) everywhere, EXCEPT:
- when assigning, in a clocked always block, to a
  variable that will be used outside the block, use
  nonblocking assignment (<=).

You will read other, less flexible rules - for example in
Cliff Cummings's papers.  That's OK too.  It is NOT
OK to use blocking assignment to clocked variables
that will be used in other always blocks.

I don't use Verilog much so this is a bit confusing to me.  I have
always equated the blocking assignment to variable assignment in VHDL
and non-blocking assignment to signal assignment.  Your statement that
blocking assignments for combinatorial logic in always blocks doesn't
map to VHDL.  What is wrong with non-blocking assignments for
combinatorial logic in always blocks?
I know that it can sometimes cause simulation slowdowns if you use
non-blocking assignments in combinatorial always blocks. Other than
that, I've never done it, so never thought deeply about it.

Regards,
Pat
 
[me]
- Use blocking assignment (=) everywhere, EXCEPT:
- when assigning, in a clocked always block, to a
  variable that will be used outside the block, use
  nonblocking assignment (<=).
[Rick]
I don't use Verilog much so this is a bit confusing to me.  I have
always equated the blocking assignment to variable assignment in VHDL
and non-blocking assignment to signal assignment.  Your statement that
blocking assignments for combinatorial logic in always blocks doesn't
map to VHDL.  What is wrong with non-blocking assignments for
combinatorial logic in always blocks?
There's nothing _wrong_ with it, and you're broadly right to
think of nonblocking assignment as similar to signal assignment
in VHDL. However, there are a few issues that change the scene.

First, the continuous assignment in Verilog

assign target = expression;

is superficially very much like a VHDL concurrent signal
assignment, but it has blocking assignment (no delta delay)
behaviour. So it's probably more consistent to make all your
other combinational logic have that same behaviour too.

The second issue is that blocking assignment in combinational
logic allows you to implement clock gating without a delta
delay. As I'm sure you are aware, the fact that every VHDL
signal assignment - even a trivial copy - costs a delta delay
can easily introduce strange behaviour in RTL (zero-delay)
simulation, if you do any manipulation at all of the clock.
Blocking assignment in Verilog combinational logic allows you
to avoid that problem in almost all practical cases.

There's a further tweak, too - what should you do about latches
(if you use them)? Most people would say that assignments to
latch variables like this...

always @*
if (Transparent)
Q = D;

should be done with blocking assignment because they are
essentially combinational.

Patrick is right that nonblocking assignment has a
performance cost in the simulator. It typically prevents
the simulator from merging combinational blocks into a
single sequential process. Whether that is a significant
issue in practice I really have no idea. My gut feeling
is that the difference would be insignificant in most
realistic simulations, but others may know different from
experience.
--
Jonathan Bromley
 
On May 20, 4:52 am, rickman <gnu...@gmail.com> wrote:

I don't use Verilog much so this is a bit confusing to me.  I have
always equated the blocking assignment to variable assignment in VHDL
and non-blocking assignment to signal assignment.  Your statement that
blocking assignments for combinatorial logic in always blocks doesn't
map to VHDL.  What is wrong with non-blocking assignments for
combinatorial logic in always blocks?
Absolutely nothing. In fact, using non-blocking assignments in the
VHDL way, for every signal that communicates with other blocks, is a
very good idea.

Cummings' guideline to use only blocking assignments for combinatorial
logic is problematic, because it creates an unnecessary exception that
encourages something that is inherently dangerous.

The fact that communication based on blocking assignments works for
combinatorial logic is a coincidence and actually not that trivial to
prove. It depends not only on the inherent nature of combinatorial
logic, but also on "sensible usage".

Cummings' guidelines are problematic in general because they
artificially discuss races in the context of synthesizable logic. But
Verilog, the language, doesn't care about synthesis. Races are races,
and there are plenty of race opportunities in high level models and
test benches also. Those cases need a working guideline too.

So here it is: Decaluwe's universal guideline for race-free HDL
assignments.
"""
Use non-blocking assignments (signals) for communication.
Use blocking assignments (variables) for local computation.
"""

In other words: keep those bl**king assignments local :)

It works for everything: for combinatorial logic, sequential logic,
test benches and high level models. Moreover, it works for both
Verilog and VHDL (and MyHDL). Finally, it lets you use blocking
assignments locally, including in sequential logic descriptions, a
feature dear to my heart.

Think about it: one simple guideline which is safer and more powerful,
and you can simply forget about Cummings' awkward guidelines and
reasoning style. Life can be simple :)
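
One possible reading of that guideline for combinational logic is sketched below. The module name add_xor, the signal names and the 4-bit widths are assumptions for illustration, and the local declaration relies on a named begin/end block as permitted by Verilog-2001.

```verilog
// Sketch only: add_xor and its ports are illustrative, not from the thread.
module add_xor (
  input  wire [3:0] a,
  input  wire [3:0] b,
  input  wire [3:0] c,
  output reg  [3:0] y
);
  always @* begin : local_compute
    reg [3:0] sum;     // local computation: a variable private to this block
    sum = a + b;       // blocking, so the next statement sees the new value
    y  <= sum ^ c;     // communication with other blocks: nonblocking
  end
endmodule
```

Whether the extra scheduling step on y matters is the point debated in the surrounding posts; the sketch only shows where each kind of assignment would go under this guideline.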

Jan
 
On May 20, 3:38 am, Jonathan Bromley <s...@oxfordbromley.plus.com>
wrote:
- Use blocking assignment (=) everywhere, EXCEPT:
- when assigning, in a clocked always block, to a
  variable that will be used outside the block, use
  nonblocking assignment (<=).

[Rick]

I don't use Verilog much so this is a bit confusing to me.  I have
always equated the blocking assignment to variable assignment in VHDL
and non-blocking assignment to signal assignment.  Your statement that
blocking assignments for combinatorial logic in always blocks doesn't
map to VHDL.  What is wrong with non-blocking assignments for
combinatorial logic in always blocks?

There's nothing _wrong_ with it, and you're broadly right to
think of nonblocking assignment as similar to signal assignment
in VHDL.  However, there are a few issues that change the scene.

First, the continuous assignment in Verilog

  assign target = expression;

is superficially very much like a VHDL concurrent signal
assignment, but it has blocking assignment (no delta delay)
behaviour.  So it's probably more consistent to make all your
other combinational logic have that same behaviour too.

The second issue is that blocking assignment in combinational
logic allows you to implement clock gating without a delta
delay.  As I'm sure you are aware, the fact that every VHDL
signal assignment - even a trivial copy - costs a delta delay
can easily introduce strange behaviour in RTL (zero-delay)
simulation, if you do any manipulation at all of the clock.
Blocking assignment in Verilog combinational logic allows you
to avoid that problem in almost all practical cases.
I was not aware that Verilog did not use delta delays in combinatorial
assignments. I would say in this case blocking assignments are a bad
thing. If I am gating my clock, then that introduces delays. If
simulating a delta delay causes issues in my simulation, then that
would match the real world, no? I suppose there are tools used to
assure that minimum hold times are maintained, which is where clock
skew would bite you.


There's a further tweak, too - what should you do about latches
(if you use them)?  Most people would say that assignments to
latch variables like this...

   always @*
     if (Transparent)
       Q = D;

should be done with blocking assignment because they are
essentially combinational.
But why would it matter? Convention is fine, but if it doesn't have a
purpose, then what's the... purpose?


Patrick is right that nonblocking assignment has a
performance cost in the simulator.  It typically prevents
the simulator from merging combinational blocks into a
single sequential process.  Whether that is a significant
issue in practice I really have no idea.  My gut feeling
is that the difference would be insignificant in most
realistic simulations, but others may know different from
experience.
How could combinatorial blocks be merged or even properly be evaluated
without delta delays? Help me understand.

assign A = B ^ C;

assign B = ~C;

If C changes, does A fire once or twice? Does it depend on the order
of evaluation of the two assignments? Does A ever have the value of
zero?

I'm not so worried about performance issues unless they are truly
significant. I'm much more concerned that the simulation matches the
real world as much as possible.

BTW, is there a reason why the non-delay assignment is called
"blocking"? I'm trying to come up with a way to remember the names
correctly. It seems to me the delayed assignment would be called
blocking...

Rick
 
Rick,

I was not aware that Verilog did not use delta delays in combinatorial
assignments.  I would say in this case blocking assignments are a bad
thing.  If I am gating my clock, then that introduces delays.  If
simulating a delta delay causes issues in my simulation, then that
would match the real world, no?  I suppose there are tools used to
assure that minimum hold times are maintained, which is where clock
skew would bite you.
Right - I think the idea is that clock gating would normally
be designed (after P&R) to avoid hold time trouble, but it's
tough to represent that in your RTL, so one possible way out
is to arrange that zero-delay combinational logic is "faster"
than zero-delay clock-to-output propagation of a FF. This will
automatically give RTL zero-delay sim that matches your finished
device with real delays in it.
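
To make that concrete, here is a minimal sketch of the situation, assuming a simple AND-style clock gate (not something you would necessarily build in silicon). The module and signal names (gated_clock_sketch, gclk, q1, q2) are made up for illustration and do not come from anyone's design in this thread.

```verilog
// Hypothetical example: all names here are invented for illustration.
module gated_clock_sketch (
  input      clk,
  input      en,
  input      d,
  output reg q2 );

  reg q1;
  reg gclk;

  // Zero-delay combinational gating using BLOCKING assignment.
  // gclk changes in the same scheduler pass as clk, i.e. before the
  // nonblocking flip-flop updates below have taken effect.
  always @*
    gclk = clk & en;

  // Flip-flop on the ungated clock.
  always @(posedge clk)
    q1 <= d;

  // Flip-flop on the gated clock. Because gclk was assigned with "=",
  // this block should wake up before q1 is updated and so sample the
  // OLD value of q1, like a real two-stage shift register.
  always @(posedge gclk)
    q2 <= q1;
endmodule
```

If the gate were instead written as gclk <= clk & en, the update of gclk would be scheduled alongside the update of q1, and the second flip-flop could sample the new q1. That is the simulated "clock skew" trouble being discussed.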

There's a further tweak, too - what should you do about latches

But why would it matter?  Convention is fine, but if it doesn't have a
purpose, then what's the... purpose?
Errrm, I don't really know... Cliff Cummings discusses this briefly
in the well-known publications, but I've never had any reason to
design latches in Verilog so I didn't give it much attention.
I'm pretty sure that the same arguments apply as for combinational
logic: nonblocking <= assignment will work just fine, but introduces
delta delays that might possibly cause trouble with clock skew;
blocking = assignment is also OK and may possibly help to avoid
the clock skew problem.

How could combinatorial blocks be merged or even properly be evaluated
without delta delays?  Help me understand.

assign A = B ^ C;
assign B = ~C;
That's a perfect example: the simulator can analyse the
dependencies among variables, and then collapse that to
(approximately!)

  always @(C) begin
    B = ~C;
    A = B ^ C;
  end

The transformation saves some swapping between processes,
and there's no visible difference to the user. It's
process swapping that dominates simulation performance
for RTL, and especially for gate-level sim, so this
kind of process merging is good. In VHDL it usually
can be done only for processes that have identical
sensitivity lists (in other words, clocked processes
triggered by the same clock edge).

If C changes, does A fire once or twice?  Does it depend on the order
of evaluation of the two assignments?  
I guess so. But I defy you to write any code that would reliably
allow you to tell the difference, and of course you would never
expect RTL code to model combinational glitching in a trustworthy
manner, so there's nothing wrong with that. The final settled
result will be the same in either case.

Does A ever have the value of zero?
It might, but it would have that value for "less than a delta
cycle" (in VHDL-speak) - the zero might be present on the
internal representation of A, but only within iterations around
the Active region of the Verilog scheduler (don't ask); and
simulators are NOT obliged to detect such glitches even if you
have some other block somewhere like this...

always @A $display("at time %0d, A=%b", $time, A);

Delta-delay glitches, such as can be obtained with
combinational logic in VHDL or nonblocking assignment
in Verilog, *would* be detected by that code.
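
A small self-contained testbench for trying this out, as a sketch only: it models the two assignments with nonblocking assignment and reuses the monitor line above. The module name and the stimulus timing are arbitrary choices for illustration.

```verilog
// Hypothetical testbench; module name and timings are arbitrary.
module delta_glitch_sketch;
  reg A, B, C;

  // Nonblocking versions of the two combinational assignments.
  // Each update is deferred, so A is first computed from the old B.
  always @* B <= ~C;
  always @* A <= B ^ C;

  // The same monitor as in the text above.
  always @A $display("at time %0d, A=%b", $time, A);

  initial begin
    #1  C = 0;    // let everything settle from X
    #10 C = 1;    // the change of interest
    #10 $finish;
  end
endmodule
```

When C rises, A is recomputed once with the stale value of B and again after B has updated, so the monitor should get the chance to report the intermediate value as well as the settled one. Swapping the "<=" for "=" in the two always blocks gives the case where a simulator is not obliged to show the glitch.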

I'm not so worried about performance issues unless they are truly
significant.  I'm much more concerned that the simulation matches the
real world as much as possible.
Fully agreed. I was merely responding to Patrick's point.
On the other hand, one of Verilog's flagship features is
its performance (matters very much to post-layout sim people)
and some of the rules about nondeterminacy are at least in
part designed to help get that performance by allowing
simulators to do various optimisations. Mismatches with
reality will always happen when you work in the grey areas,
and that's precisely what "coding guidelines" are designed
to help you avoid (as, of course, you know very well).

BTW, is there a reason why the non-delay assignment is called
"blocking"?  I'm trying to come up with a way to remember the names
correctly.  It seems to me the delayed assignment would be called
blocking...
"Blocking" in the sense of blocking the flow of procedural
execution until the update has taken effect. The meaning
is more obvious when you add intra-assignment delay (roughly
equivalent to VHDL "after" delay):

A = #5 EXPR; // (1) Blocking
A <= #5 EXPR; // (2) Nonblocking

Line (1) first evaluates EXPR, then blocks execution for 5 time units,
then updates A, then moves on to executing the next statement.

Line (2) evaluates EXPR, but then does NOT block; it simultaneously:
- sets up a scheduled update of A at a time 5 units in the future;
- moves on immediately to execution of the next statement.

The second form (2), nonblocking, is very similar to VHDL
A <= transport EXPR after 5 ns;

The first form (1), blocking, is effectively:
temp_variable = EXPR;
#5; // like WAIT FOR 5 NS;
A = temp_variable;

Anyway, that's the reason for the [non]blocking naming, I think.
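
For anyone who wants to watch the difference, here is a minimal sketch of the two forms in one initial block. The module and variable names are invented for illustration.

```verilog
// Hypothetical demo; names are invented for illustration.
module intra_delay_sketch;
  reg [3:0] a, b;

  initial begin
    a = 0;
    b = 0;

    // (1) Blocking: evaluate now, wait 5 time units, update a,
    //     and only then carry on to the next statement.
    a = #5 4'd1;
    $display("time %0d: after blocking assignment, a=%0d", $time, a);

    // (2) Nonblocking: evaluate now, schedule the update of b for
    //     5 time units from now, and carry on immediately.
    b <= #5 4'd2;
    $display("time %0d: after nonblocking assignment, b=%0d", $time, b);

    // Wait until after the scheduled update has happened.
    #6 $display("time %0d: later on, b=%0d", $time, b);
  end
endmodule
```

The first two messages appear at the same simulation time, because only the blocking form held up the flow of execution; b still shows its old value in the second message and its new value in the third.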

I don't think anyone can be blamed for wanting to steer clear
of all this stuff....

--
Jonathan Bromley
 
On May 20, 3:08 am, Jan Decaluwe <j...@jandecaluwe.com> wrote:

Cummings' guideline to use only blocking assignments for combinatorial
logic is problematic, because it creates an unnecessary exception that
encourages something that is inherently dangerous.
Can you show a real-world example of this danger?

The fact that communication based on blocking assignments works for
combinatorial logic is a coincidence and actually not that trivial to
prove. It depends not only on the inherent nature of combinatorial
logic, but also on "sensible usage".
Blocking assignments to "registers" inside non-clocked blocks and
continuous assignments to "wires" are essentially the same thing. I
don't see this as any kind of coincidence. If proof of a non-clocked
set of gates requires some sort of inherent propagation delay, that
might make the proof suspect, unless you can also prove that the
propagation delay is the correct magnitude.

Cummings' guidelines are problematic in general because they
artificially discuss races in the context of synthesizable logic. But
Verilog, the language, doesn't care about synthesis. Races are races,
and there are plenty of race opportunities in high level models and
test benches also. Those cases need a working guideline too.
In general, the bad effects of a race in your test bench will be that
the test fails. In general the bad effects of a post-synthesis race
are either, again, that the test fails (if you are lucky) or that the
silicon fails (if you are unlucky). So why is it a problem to explain
things in the context of synthesis?

So here it is: Decaluwe's universal guideline for race-free HDL
assignments.
"""
Use non-blocking assignments (signals) for communication.
Use blocking assignments (variables) for local computation.
"""
[ Rest of comment on this snipped. ]

Yes, this will work. But in practice, examination to ensure that
these guidelines have been followed can be more time-consuming than if
other guidelines are followed, and it may be possible, using just
these guidelines, to write code that may be more conceptually
difficult to understand than the code you write using other
guidelines. But that's (obviously) just my opinion.
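
For reference, the quoted guideline might look like this in a single clocked block. This is only a sketch of one reading of it; the module, the saturating adder and every name in it are invented for illustration and are not Jan's code.

```verilog
// Hypothetical example of "nonblocking for communication,
// blocking for local computation". All names are invented.
module guideline_sketch (
  input            clk,
  input      [7:0] a,
  input      [7:0] b,
  output reg [7:0] result );

  always @(posedge clk) begin : compute
    // Local variable: declared in the named block and never read
    // by any other process, so blocking assignment is safe here.
    reg [8:0] sum;

    sum = a + b;            // local computation
    if (sum[8])
      sum = 9'h0FF;         // saturate on overflow

    // Anything another process may read is updated with
    // nonblocking assignment.
    result <= sum[7:0];
  end
endmodule
```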

Regards,
Pat
 
On Thu, 20 May 2010 10:28:11 -0700 (PDT), Patrick Maupin wrote:

On May 20, 3:08 am, Jan Decaluwe <j...@jandecaluwe.com> wrote:

Cummings' guideline to use only blocking assignments for combinatorial
logic is problematic, because it creates an unnecessary exception that
encourages something that is inherently dangerous.

Can you show a real-world example of this danger?
Easily, if you will permit the choice of "real world" to
extend somewhat outside the very constrained bounds of
RTL-for-synthesis.

The fact that communication based on blocking assignments works for
combinatorial logic is a coincidence and actually not that trivial to
prove. It depends not only on the inherent nature of combinatorial
logic, but also on "sensible usage".
I suspect that Jan, like me, is unhappy about Verilog's
completely uncontrolled concurrency model.

Decades ago, software folk discovered that you need some
structures and disciplines to make concurrency work.
Without them, processes merrily trample on each other's
variables and all bets are off. There are many possible
ways to do it right, some prettier than others:

- evaluate/update disciplines (VHDL, SystemC, various other
discrete-event simulation languages)
- semaphores (readily available as OS primitives in many
environments)
- monitors (ditto) - VHDL protected types are a bit like this
- Ada-style client-server rendezvous
- CSP (Occam, Handel-C, Haste)

And then there's Verilog:
- anyone can mess with any shared variable at any time,
with an explicit caveat in the language that you can't
expect any kind of built-in control

So I would totally support Jan's position that combinational
logic, with its acyclic dependencies and stateless behaviour,
is a special case that just happens to work well in Verilog.
The late addition of nonblocking assignment to Verilog was,
I suggest, an admission that this uncontrolled concurrency
was indeed a very serious problem in other situations.
Clocked logic is just one such, but one that is likely to
be evident and problematic for hardware designers.

Blocking assignments to "registers" inside non-clocked blocks and
continuous assignments to "wires" are essentially the same thing.
In the synthesis mindset, yes. But only there. In the
semantics of Verilog as a simulation language (and that's
how it's defined) they are as different as chalk and cheese.
They happen to give the same results if you use them
to model zero-delay combinational logic. Of course
that is partly a reflection of Verilog's cleverly
focused design for its chosen applications.

don't see this as any kind of coincidence.
No more than the "coincidence" that concurrent signal
assignment and a combinational-style process in VHDL
are exactly equivalent by definition. And there they
both exhibit delta-delay behaviour. Once again:
combinational logic is a special case that can be
handled in a variety of ways. Some of those ways don't
work reliably for more general concurrency problems.

If proof of a non-clocked
set of gates requires some sort of inherent propagation delay, that
might make the proof suspect, unless you can also prove that the
propagation delay is the correct magnitude.
There is of course no such "proof"; indeed, the nature of
combinational logic (output must be a pure function of
inputs) makes it possible to prove just the opposite.
I don't think that in any way invalidates Jan's concerns,
except possibly from a myopic RTL-centric viewpoint.
--
Jonathan Bromley
 
On May 20, 3:34 pm, Jonathan Bromley <s...@oxfordbromley.plus.com>
wrote:
On Thu, 20 May 2010 10:28:11 -0700 (PDT), Patrick Maupin wrote:
On May 20, 3:08 am, Jan Decaluwe <j...@jandecaluwe.com> wrote:

Cummings' guideline to use only blocking assignments for combinatorial
logic is problematic, because it creates an unnecessary exception that
encourages something that is inherently dangerous.

Can you show a real-world example of this danger?

Easily, if you will permit the choice of "real world" to
extend somewhat outside the very constrained bounds of
RTL-for-synthesis.
OK, I'll bite. Show me...

[ Other stuff snipped..]

I suspect that Jan, like me, is unhappy about Verilog's
completely uncontrolled concurrency model.

Decades ago, software folk discovered that you need some
structures and disciplines to make concurrency work.  
Without them, processes merrily trample on each other's
variables and all bets are off.  There are many possible
ways to do it right, some prettier than others:

- evaluate/update disciplines (VHDL, SystemC, various other
  discrete-event simulation languages)
- semaphores (readily available as OS primitives in many
  environments)
- monitors (ditto) - VHDL protected types are a bit like this
- Ada-style client-server rendezvous
- CSP (Occam, Handel-C, Haste)
While the language constructs are nice, certainly it is possible to do
useful concurrency work in C. And semaphores, monitors, and mutexes
are used from C every day without a fundamental language rework. (As
an aside, while Erlang or Go may be the way of the future, replacing
Ada and Occam as the way of the future, and while the C/C++ standard
may lurch towards the future by digesting those really useful bits
from other languages, it is possible that the dominant system language
for a long time to come will be called C, just as the dominant
technical language is called English.)

And then there's Verilog:
- anyone can mess with any shared variable at any time,
  with an explicit caveat in the language that you can't
  expect any kind of built-in control
Au contraire. It may not be as great as some of the languages you
mentioned, but it's certainly much better than C. If you want to
ensure that another process can never see incoherent state between
variable X and variable Y, just make sure you update them at exactly
the same time. No semaphore, mutex, or monitor required. Also,
unlike C, (assuming you at least follow the rule of at most one module
per file) two different processes in different files cannot write to
the same "shared variable" in synthesizable code at all, and can't
very easily do it inadvertently in non-synthesizable code.

So I would totally support Jan's position that combinational
logic, with its acyclic dependencies and stateless behaviour,
is a special case that just happens to work well in Verilog.
Special case, perhaps. Coincidence? I think not :)

The late addition of nonblocking assignment to Verilog was,
I suggest, an admission that this uncontrolled concurrency
was indeed a very serious problem in other situations.
Clocked logic is just one such, but one that is likely to
be evident and problematic for hardware designers.

Blocking assignments to "registers" inside non-clocked blocks and
continuous assignments to "wires" are essentially the same thing.

In the synthesis mindset, yes.  But only there.  In the
semantics of Verilog as a simulation language (and that's
how it's defined) they are as different as chalk and cheese.
They happen to give the same results if you use them
to model zero-delay combinational logic.  Of course
that is partly a reflection of Verilog's cleverly
focused design for its chosen applications.
I do view the synthesis results as the end product, yes. Anything
that has to be done in the testbench is ancillary to that (and by the
way, Verilog's laissez-faire attitude to coding is a double-edged
sword that can, in some cases, make the test code very readable and
maintainable).

don't see this as any kind of coincidence.

No more than the "coincidence" that concurrent signal
assignment and a combinational-style process in VHDL
are exactly equivalent by definition. And there they
both exhibit delta-delay behaviour.  Once again:
combinational logic is a special case that can be
handled in a variety of ways.  Some of those ways don't
work reliably for more general concurrency problems.

If proof of a non-clocked
set of gates requires some sort of inherent propagation delay, that
might make the proof suspect, unless you can also prove that the
propagation delay is the correct magnitude.

There is of course no such "proof"; indeed, the nature of
combinational logic (output must be a pure function of
inputs) makes it possible to prove just the opposite.
Agreed.

I don't think that in any way invalidates Jan's concerns,
except possibly from a myopic RTL-centric viewpoint.
Guilty as charged. While the process of getting to correct
synthesizable logic is interesting, useful, and necessary, the actual
synthesizable logic itself is, if not more interesting, at least more
useful and necessary.

Regards,
Pat
 
Patrick,

I don't have time to write a complete reply just now,
but I can't let this piece of nonsense go unchallenged:

[me]
And then there's Verilog:
- anyone can mess with any shared variable at any time,
snip
[Pat]
If you want to
ensure that another process can never see incoherent state between
variable X and variable Y, just make sure you update them at exactly
the same time.
That is simply untrue:

  always @(posedge clock)
    X = some_expression;
  always @(posedge clock)
    Y = some_other_expression;
  always @(posedge clock) begin : observer
    if (some relationship between X and Y) ....;
  end

"observer" has absolutely no guarantee of coherence of X and Y.
So, you may say, let's update them together:

  always @(posedge clock) begin
    X = some_expression;
    Y = some_other_expression;
  end

Bzzzt! Doesn't work: Verilog does not guarantee atomic execution
of a zero-time sequential block, so "observer" still has no
guarantee of coherence. So we must resort to yet another
special case (and even this unusable mess isn't strictly
guaranteed to give atomic update, even though it will
most likely do so in practice):

  always @(posedge clock)
    {X,Y} = {some_expression, some_other_expression};

No semaphore, mutex, or monitor required.
You're kidding, right? In Verilog you make this
work correctly by wheeling out the evaluate/update
model made possible by nonblocking assignment.
In other environments you would use other exclusion
or synchronisation primitives. Either way, you need
some discipline.
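
As a sketch of that evaluate/update discipline, here is the same three-process arrangement written with nonblocking assignment. The module name, the input "in", the output "ok" and the particular expressions are all invented for illustration.

```verilog
// Hypothetical example; names and expressions are invented.
module coherent_update_sketch (
  input            clock,
  input      [7:0] in,
  output reg       ok );

  reg [7:0] X, Y;

  // Two independent writers. Each right-hand side is evaluated at
  // the clock edge, but X and Y are only updated afterwards.
  always @(posedge clock)
    X <= in;

  always @(posedge clock)
    Y <= ~in;

  // The observer runs at the same edge. Whatever order the three
  // processes execute in, it reads the X and Y left by the PREVIOUS
  // edge, so the pair it sees is always a matched pair.
  always @(posedge clock) begin : observer
    ok <= (Y == ~X);
  end
endmodule
```

With blocking assignments in the two writers, the observer could run between them and see a new X alongside an old Y.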

Verilog is _much_ worse than C in this regard because
it has concurrency constructs built in to the language,
but they are completely undisciplined. In C, to get
concurrency you must appeal to some library or toolkit;
if properly designed, that library will provide not
only the parallelism but also the synchronisation
primitives that you need, so you get a proper way to
do things as a single package.

I'll try to respond more thoughtfully to your other
points over the weekend.
--
Jonathan Bromley
 
On May 20, 10:15 am, Jonathan Bromley <s...@oxfordbromley.plus.com>
wrote:
Rick,

I was not aware that Verilog did not use delta delays in combinatorial
assignments.  I would say in this case blocking assignments are a bad
thing.  If I am gating my clock, then that introduces delays.  If
simulating a delta delay causes issues in my simulation, then that
would match the real world, no?  I suppose there are tools used to
assure that minimum hold times are maintained, which is where clock
skew would bite you.

Right - I think the idea is that clock gating would normally
be designed (after P&R) to avoid hold time trouble, but it's
tough to represent that in your RTL, so one possible way out
is to arrange that zero-delay combinational logic is "faster"
than zero-delay clock-to-output propagation of a FF.  This will
automatically give RTL zero-delay sim that matches your finished
device with real delays in it.
Personally I think this is a strange concept that a zero-delay sim
should be... I can't think of a word, "optimized" is not right,
"tweaked" seems biased... "adjusted" to allow a particular type of
design to simulate as if there were no delay in a logic element, but
that there is a delay in sequential elements. This can be handled
very easily by adding explicit delays to sequential elements. Then it
is very clear that an assumption is being made about relative delays.
I guess this shows that I never gate clocks. I can see how this would
be a real can of worms.
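
What I mean by an explicit delay, written in Verilog as far as I understand the syntax, would be something like the sketch below. The module name and the CLK_TO_Q parameter are made up for illustration.

```verilog
// Hypothetical flip-flop model; name and parameter are invented.
module dff_with_delay #(
  parameter CLK_TO_Q = 1 ) (   // assumed clock-to-output delay, in time units
  input      clk,
  input      d,
  output reg q );

  // d is sampled at the clock edge, but q changes CLK_TO_Q time
  // units later, so the assumption about relative delays is
  // visible in the source instead of hidden in the scheduler.
  always @(posedge clk)
    q <= #(CLK_TO_Q) d;
endmodule
```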

Has anyone here used clock gating in VHDL? How do you handle that?


How could combinatorial blocks be merged or even properly be evaluated
without delta delays?  Help me understand.

assign A = B ^ C;
assign B = ~C;

That's a perfect example: the simulator can analyse the
dependencies among variables, and then collapse that to
(approximately!)

  always @(C) begin
    B = ~C;
    A = B ^ C;
  end

The transformation saves some swapping between processes,
and there's no visible difference to the user.  
Except that it produces a glitch free output which is not realistic.
Of course, in the real world the implemented logic will be determined
by the tools so there may or may not be a glitch in the output. So I
guess this is similar to clock gating.


BTW, is there a reason why the non-delay assignment is called
"blocking"?  I'm trying to come up with a way to remember the names
correctly.  It seems to me the delayed assignment would be called
blocking...

"Blocking" in the sense of blocking the flow of procedural
execution until the update has taken effect.  The meaning
is more obvious when you add intra-assignment delay (roughly
equivalent to VHDL "after" delay):

  A = #5 EXPR;  // (1) Blocking
  A <= #5 EXPR; // (2) Nonblocking

Line (1) first evaluates EXPR, then blocks execution for 5 time units,
then updates A, then moves on to executing the next statement.
Wow! I didn't know it worked like that. I don't think that is the
same in VHDL. I think the VHDL "after" delay is done without blocking
the execution of the sequential flow even for variables.


Line (2) evaluates EXPR, but then does NOT block; it simultaneously:
- sets up a scheduled update of A at a time 5 units in the future;
- moves on immediately to execution of the next statement.

The second form (2), nonblocking, is very similar to VHDL
  A <= transport EXPR after 5 ns;

The first form (1), blocking, is effectively:
  temp_variable = EXPR;
  #5;  // like WAIT FOR 5 NS;
  A = temp_variable;

Anyway, that's the reason for the [non]blocking naming, I think.

I don't think anyone can be blamed for wanting to steer clear
of all this stuff....
Yeah, it will be interesting to learn.

Rick
 
