Use of #delay_value in synchronous processes?

JT · Oct 22, 2003

I've seen verilog code that uses #delay_value in synchronous processes
while doing the assign with a non blocking assignment:

(taken from a previous posting about grey counters)
always @(posedge clk or posedge global_async_reset)
    gray_ctr <= #1 global_async_reset ? 0 :
                   preload_en ? preload_val :
                   inc_en ? gray_plus_1 :
                   gray_ctr;

or

always @ (posedge fpga_sysclk_b or negedge reset_l)
begin
    if (!reset_l)
    begin
        mv_in <= `dly 'h0;
        mv_datai <= `dly 'h0;
    end
    else
    begin
        mv_datai <= `dly mv_dataix;
        mv_in <= `dly mv_datai;
    end
end

I have never encountered problems in not using delays in sequential
processes with non blocking assignments. Why do people use delays??
I'd understand the need for the delay if blocking assignments were
used, but not for non-blocking.

Any thoughts??

David Rogoff · Oct 22, 2003

jthibeault@yahoo.com (JT) writes:

> I've seen verilog code that uses #delay_value in synchronous processes
> while doing the assign with a non blocking assignment:
>
>     else
>     begin
>         mv_datai <= `dly mv_dataix;
>         mv_in <= `dly mv_datai;
>     end
> end
>
> I have never encountered problems in not using delays in sequential
> processes with non blocking assignments. Why do people use delays??
> I'd understand the need for the delay if blocking assignments were
> used, but not for non-blocking.
>
> Any thoughts??

There are a couple of reasons to do this (I hate it, personally).
First, if your RTL instantiates other modules that have delays
(gate-level netlists with timing, behavioural RAM models, etc), you
can sometimes get errors if your code doesn't have delays. Some
people also like them for viewing in waveform viewers to see inputs
vs. outputs of registers.

Using `dly, as in your example, is good since you can define dly to
nothing and get rid of them, if you want.

David

Jonathan Bromley · Oct 22, 2003

"JT" <jthibeault@yahoo.com> wrote in message
news:ea38511d.0310220816.6ebc3f38@posting.google.com...

> I've seen verilog code that uses #delay_value in synchronous processes
> while doing the assign with a non blocking assignment:
>
> (taken from a previous posting about grey counters)
> always @(posedge clk or posedge global_async_reset)
>     gray_ctr <= #1 <some_expression>;
> [...]
> I have never encountered problems in not using delays in sequential
> processes with non blocking assignments. Why do people use delays??
> I'd understand the need for the delay if blocking assignments were
> used, but not for non-blocking.

Two reasons that make at least some sense to me:

(1) It makes it a lot easier to read waveforms if your
synchronous logic has non-zero Tco;
(2) If you have any kind of clock gating or buffering,
and the clock logic is modelled using zero delay,
you can get spurious problems in zero-delay simulation
because a clock can be delayed by more iterations of
the simulation cycle than the synchronous output it's
supposed to be sampling. Adding non-zero Tco avoids
this problem **in zero-delay sims**. Once you have
a timing-backannotated sim, of course, everything
should work just fine because the real backannotated
Tco is presumably longer than the real backannotated
clock delay plus FF input hold time.

Despite these good reasons, it always looks to me like an
act of desperation (a.k.a. hack). I guess it's the kind
of thing you can have a jolly good religious war about.
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * Perl * Tcl/Tk * Verification * Project Services

Doulos Ltd. Church Hatch, 22 Market Place, Ringwood, Hampshire, BH24 1AW, UK
Tel: +44 (0)1425 471223 mail: jonathan.bromley@doulos.com
Fax: +44 (0)1425 471573 Web: http://www.doulos.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.

Archer · Oct 27, 2003

Hi Jonathan,

According to the benchmark presented at SNUG 2002, nonblocking assignment
with zero delay gives the highest performance compared with delayed and
blocking assignment. That is why we use zero-delay nonblocking assignments
throughout. But we are suffering from the problem you mentioned in item 2.

Please tell me how "adding non-zero Tco avoids this problem **in
zero-delay sims**", as you put it. What is Tco? All my simulations are
zero-delay, and I get hold-time violations when a signal goes to a FF
clocked by a clock divided from the source clock. My current solution
is to add #1 to solve the problem. Is there any way I can keep my
design clean, without #?

Archer,

Best Regards,

Archer Hsu
Waveplus Incorp.
archerh@waveplus.com.tw
www.waveplus.com.tw
(O) (03) 563-6025 ext : 119
(M) 0930-182-609

Jonathan Bromley · Oct 28, 2003

"Archer" <archer_hsu38@yahoo.com.tw> wrote in message
news:6beb1ab.0310271013.52114e52@posting.google.com...

>> Two reasons that make at least some sense to me:
>>
>> (1) It makes it a lot easier to read waveforms if your
>>     synchronous logic has non-zero Tco;
>> (2) If you have any kind of clock gating or buffering,
>>     and the clock logic is modelled using zero delay,
>>     you can get spurious problems in zero-delay simulation
>>     because a clock can be delayed by more iterations of
>>     the simulation cycle than the synchronous output it's
>>     supposed to be sampling. Adding non-zero Tco avoids
>>     this problem **in zero-delay sims**. Once you have
>>     a timing-backannotated sim, of course, everything
>>     should work just fine because the real backannotated
>>     Tco is presumably longer than the real backannotated
>>     clock delay plus FF input hold time.
>
> According to the benchmark presented at SNUG 2002, nonblocking assignment
> with zero delay gives the highest performance compared with delayed and
> blocking assignment. That is why we use zero-delay nonblocking assignments
> throughout.

No, that's complete crap. Anyone who adopts a coding style, simply
because some idiot benchmark claims it gives higher performance,
is asking for trouble. We (I?) use nonblocking assignment because
it correctly models zero-delay flip-flop behaviour. Blocking
assignment gives even higher performance, but it breaks any
non-trivial synchronous model, so we don't use it.

> But we are suffering from the problem you mentioned in item 2.
> Please tell me how "adding non-zero Tco avoids this problem **in
> zero-delay sims**", as you put it. What is Tco?

Clock-to-output delay of a flip-flop. I'm sorry I made an assumption
that everyone uses the same abbreviations! The flip-flop model
with a nonblocking # delay introduces clock-to-output delay:

always @(posedge clock)
    Q <= #Tco D;
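
As a rough illustration of that model (not part of the original post), the sketch below wraps it in a module with a tiny testbench. The module name dff_tco, the default Tco of 1 and the clock period are assumptions made for the example.

```verilog
`timescale 1ns/1ns

// Flip-flop with a modelled clock-to-output delay.
// dff_tco and the default Tco value are hypothetical, for illustration only.
module dff_tco #(parameter Tco = 1) (
    input  wire clock,
    input  wire D,
    output reg  Q
);
    // D is sampled at the clock edge; Q is updated Tco time units later.
    always @(posedge clock)
        Q <= #Tco D;
endmodule

module tb;
    reg  clock = 0;
    reg  D     = 0;
    wire Q;

    dff_tco #(.Tco(1)) u_ff (.clock(clock), .D(D), .Q(Q));

    always #5 clock = ~clock;      // rising edges at 5, 15, 25, ...

    initial begin
        $monitor("t=%0t clock=%b D=%b Q=%b", $time, clock, D, Q);
        #2  D = 1;                 // D is stable well before the edge at t=5
        #20 $finish;
    end
endmodule
```

With these assumed timings, Q should change at time 6, one time unit after the rising edge at time 5, so the clock edge and the output change are visibly separate in a waveform viewer.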

> All my simulations are zero-delay, and I get hold-time violations when
> a signal goes to a FF clocked by a clock divided from the source
> clock. My current solution is to add #1 to solve the problem.

Of course, there is a hint here. You *risk* getting exactly the
same problem in the real hardware, because of course the divided
clock suffers a flip-flop delay in just the same way as it does in
your simulation. Note, too, that your solution is not robust:
If you had applied #1 delays to ALL your flip-flops, including the
flip-flops that create the divided clock, then the simulation
would not work reliably.
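
A hedged sketch of why the fix is not robust. The module div_race, its two delay parameters and the 4-bit data counter are invented for this illustration; only the divided-clock structure comes from the discussion above.

```verilog
`timescale 1ns/1ns

// DIV_DLY  : intra-assignment delay on the clock-divider flip-flop
// DATA_DLY : intra-assignment delay on the data flip-flop
// (hypothetical names, chosen for this example)
module div_race #(parameter DIV_DLY = 0, DATA_DLY = 0) (
    input  wire       fast_clock,
    output reg  [3:0] captured
);
    reg       divided_clock = 0;
    reg [3:0] data          = 0;

    always @(posedge fast_clock) begin
        divided_clock <= #DIV_DLY  !divided_clock;
        data          <= #DATA_DLY data + 4'd1;
    end

    // Sink flip-flop in the divided_clock domain.
    always @(posedge divided_clock)
        captured <= data;
endmodule

module tb;
    reg        fast_clock = 0;
    wire [3:0] cap_none, cap_data_only, cap_all;

    div_race #(.DIV_DLY(0), .DATA_DLY(0)) u_none      (fast_clock, cap_none);
    div_race #(.DIV_DLY(0), .DATA_DLY(1)) u_data_only (fast_clock, cap_data_only);
    div_race #(.DIV_DLY(1), .DATA_DLY(1)) u_all       (fast_clock, cap_all);

    always #5 fast_clock = ~fast_clock;

    initial begin
        #60;
        $display("no delays          : captured=%0d", cap_none);
        $display("#1 on data only    : captured=%0d", cap_data_only);
        $display("#1 on all flops    : captured=%0d", cap_all);
        $finish;
    end
endmodule
```

In the "data only" instance the divided clock rises one time unit before data changes, so the sink flop captures the value from before the edge. In the other two instances divided_clock and data are updated in the same time step, so the sink flop typically sees the new data; the exact outcome there depends on the simulator's event ordering, which is the race.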

> Is there any way I can keep my design clean, without #?

I can think of three possible approaches...

a) Move to a fully synchronous design with clock enables, instead
of using a divided clock.

b) Add some delay (perhaps using wire delay) to the data signal
that goes from one clock domain to the other.

c) If you must use a divided clock, add a holding register between
the clock domains:

// fast_clock domain
always @(posedge fast_clock) begin

    // Clock divider
    divided_clock <= !divided_clock;

    // data generated in this clock domain
    // (this code may be very complicated, of course)
    data_to_transfer <= .......;

    // Holding register, enabled only at the fast clock
    // edge that gives a falling edge of divided_clock.
    // This guarantees that buffered_data is stable
    // around the rising edge of divided_clock.
    if (divided_clock)
        buffered_data <= data_to_transfer;

end // fast_clock domain

// divided_clock domain
always @(posedge divided_clock) begin
    Q <= buffered_data;
end // divided_clock domain

Note that it is always OK for the fast clock to sample
data created in the divided clock domain, because the
divided clock is later and will provide the necessary
hold time.
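
For comparison, approach (a) might look roughly like the sketch below. The module name ce_domain, the 8-bit width, the data_in port and the enable name slow_enable are assumptions for the example, not taken from the post.

```verilog
// Sketch of approach (a): everything runs on fast_clock, and the
// "slow" logic is gated by a clock enable instead of a divided clock.
// All names here are hypothetical.
module ce_domain (
    input  wire       fast_clock,
    input  wire [7:0] data_in,
    output reg  [7:0] Q
);
    reg       slow_enable      = 0;   // high on every second fast_clock edge
    reg [7:0] data_to_transfer = 0;

    always @(posedge fast_clock) begin
        // Enable generator replaces the clock divider.
        slow_enable <= !slow_enable;

        // Data generated at the full fast_clock rate.
        data_to_transfer <= data_in;

        // Former divided_clock domain: same clock, updates only
        // when the enable is high.
        if (slow_enable)
            Q <= data_to_transfer;
    end
endmodule
```

Because every flip-flop is clocked by the same signal, the nonblocking assignments alone keep the zero-delay simulation free of races, and no # delay is needed.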
--
Jonathan Bromley, Consultant

Bob Perlman · Oct 28, 2003

Hi -

On 22 Oct 2003 09:16:26 -0700, jthibeault@yahoo.com (JT) wrote:

> I've seen verilog code that uses #delay_value in synchronous processes
> while doing the assign with a non blocking assignment:
>
> (taken from a previous posting about grey counters)
> always @(posedge clk or posedge global_async_reset)
>     gray_ctr <= #1 global_async_reset ? 0 :
>                    preload_en ? preload_val :
>                    inc_en ? gray_plus_1 :
>                    gray_ctr;

This is code I posted earlier.

First, let me assure you that the #1 delay is *not* there to remove a
race in synchronous simulation, because there is no race to fix. The
right-hand side of a nonblocking assignment is evaluated and stored
away, then assigned to the left-hand side at the end of the timestep.
The upshot is that all flip-flops clocked from the same clock signal
update *after* their new value is evaluated, so there's no chance of
one flip-flop changing, then affecting the output of the next
flip-flop. Verilog gurus may have a more elegant way of describing
it, but that's essentially what's happening. No race.
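
A minimal sketch of that behaviour, using a two-stage shift register with no delays at all. The signal names and the testbench are assumptions made for this example.

```verilog
`timescale 1ns/1ns

module tb;
    reg clk = 0;
    reg d;
    reg q1 = 0, q2 = 0;

    // Two flip-flops on the same clock, in separate always blocks.
    // Each right-hand side is evaluated at the clock edge; the
    // left-hand sides are updated only afterwards, so q2 always
    // receives the old value of q1, whichever block runs first.
    always @(posedge clk) q1 <= d;
    always @(posedge clk) q2 <= q1;

    always #5 clk = ~clk;

    initial begin
        d = 1;
        @(posedge clk); #1;
        $display("after edge 1: q1=%b q2=%b", q1, q2);  // q1=1 q2=0
        @(posedge clk); #1;
        $display("after edge 2: q1=%b q2=%b", q1, q2);  // q1=1 q2=1
        $finish;
    end
endmodule
```

The 1 on d takes two clock edges to reach q2, as it would in hardware. With blocking assignments in the two blocks, the result after the first edge would depend on which block the simulator happened to run first.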

I add the delay to make the simulation results more pleasing to me in
the waveform viewer. With the #1, anything that would be expected to
change after a clock edge in the actual hardware appears to change
*after* the clock, not coincident to it. This is a convenience,
nothing more.

But I'm seriously reconsidering my use of such delays in future
projects, primarily because there's an increasing popular sentiment
that use of #delay in such instances is a sign of Verilog illiteracy.
(I've seen some pretty strong statements to this effect in previous
threads.) People will home in on the #1 because it's something they
read about in a newsgroup, and overlook the fact that the code (a) is
well-formatted, (b) is well-documented, and (c) works. Sometimes it's
easier to remove the offending item than to explain why it's not
offensive.

One other comment, for you folks who do FPGA synthesis: the
ternary-switch coding style I used above is a favorite of mine, and
works very well with Synplicity. But I found out recently that the
Altera Quartus Verilog compiler is totally bamboozled by it. So
beware.

Bob Perlman
Cambrian Design Works

Cliff Cummings · Nov 6, 2003

"Jonathan Bromley" <jonathan.bromley@doulos.com> wrote in message news:<bnlcc3$50d$1$830fa7a5@news.demon.co.uk>...

"Archer" <archer_hsu38@yahoo.com.tw> wrote in message
news:6beb1ab.0310271013.52114e52@posting.google.com...

>>> Two reasons that make at least some sense to me:
>>>
>>> (1) It makes it a lot easier to read waveforms if your
>>>     synchronous logic has non-zero Tco;
>>> (2) If you have any kind of clock gating or buffering,
>>>     and the clock logic is modelled using zero delay,
>>> ...
>>
>> According to the benchmark presented at SNUG 2002, nonblocking assignment
>> with zero delay gives the highest performance compared with delayed and
>> blocking assignment. That is why we use zero-delay nonblocking assignments
>> throughout.
>
> No, that's complete crap. Anyone who adopts a coding style, simply
> because some idiot benchmark claims it gives higher performance,
> is asking for trouble.

I guess I should confess to being the author of the "idiot benchmark!"
;-) ;-) ;-)

In case anyone is interested in the 53-page Boston SNUG 2002 paper
with benchmarks, it can be freely downloaded from:
www.sunburst-design.com/papers

I did a lot of talking with respected consultant Steve Golson and the
Technical Director of the Synopsys Technical Verification Group
(responsible for VCS) in putting together the paper and benchmarks.
For very high performance simulators like VCS, omitting the #1 delays
gave an 18% - 92% performance increase over the same models with the #1
delays. Lower performance simulators may show no improvement (no such
optimization has been included in those simulators).

For waveform viewing (early debugging), the #1 delays are nice, but
when you start going for the all-out regression speed as described by
Bening and Foster in their book Principles of Verifiable RTL Design,
being able to turn off the delays is very useful. In the paper I
suggest the following for those that like #1 delays:

// To enable <= #1 (NonBlocking Delays), simulate with the
// following command: +define+NBD
// Default is to simulate with the higher performance no-delay
`ifdef NBD
`define D #1
`else
`define D
`endif

Then use delay values of `D in the sequential always blocks.
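
A short usage sketch of the macro in a sequential block. The counter module and its port names are hypothetical; only the `D macro and the +define+NBD switch come from the text above.

```verilog
`ifdef NBD
  `define D #1
`else
  `define D
`endif

// Hypothetical 4-bit counter, used only to show where `D goes.
module counter (
    input  wire       clk,
    input  wire       rst_n,
    output reg  [3:0] count
);
    always @(posedge clk or negedge rst_n)
        if (!rst_n) count <= `D 4'd0;
        else        count <= `D count + 4'd1;
endmodule
```

Compiled with +define+NBD, each assignment expands to "count <= #1 ..."; without it, `D expands to nothing and the assignments carry no delay.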

If you could increase the performance of your Verilog regression runs
by up to 92% by making a small modification to your Verilog coding
style, wouldn't you be interested?

> We (I?) use nonblocking assignment because
> it correctly models zero-delay flip-flop behaviour. Blocking
> assignment gives even higher performance, but it breaks any
> non-trivial synchronous model, so we don't use it.

Agreed.

For most split-buffer clocks, as long as the clock buffers are modeled
using continuous assignments or blocking assignments, there will be no
race problems (more details in the paper).
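
The sketch below is one way to see the difference; it is an illustration written for this thread, not an excerpt from the paper. The second buffer, modelled with a nonblocking assignment, is included only as a contrasting assumption, and all signal names are invented.

```verilog
`timescale 1ns/1ns

module tb;
    reg  clk = 0;
    wire clk_buf_a;              // clock buffer as a continuous assignment
    reg  clk_buf_nb = 0;         // clock buffer as a nonblocking assignment
    reg  d;
    reg  q1 = 0;                 // launched on clk
    reg  q2_a = 0, q2_nb = 0;    // captured on the buffered clocks

    assign clk_buf_a = clk;
    always @(clk) clk_buf_nb <= clk;

    always @(posedge clk)        q1    <= d;
    always @(posedge clk_buf_a)  q2_a  <= q1;
    always @(posedge clk_buf_nb) q2_nb <= q1;

    always #5 clk = ~clk;

    initial begin
        d = 1;
        @(posedge clk); #1;
        // q2_a should still hold the old q1 (0) after one edge.
        // q2_nb typically shows the new q1 already, because the
        // buffered clock edge arrives together with the q1 update.
        $display("q1=%b q2_a=%b q2_nb=%b", q1, q2_a, q2_nb);
        $finish;
    end
endmodule
```

With the continuous assignment, the buffered clock edge is seen before any nonblocking update takes effect, so the capture flop samples the old q1. With the nonblocking buffer, the clock edge is itself a nonblocking update, which lands in the same step as the q1 update and sets up the race.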

Mixed RTL and gate-level simulations may require that delays be added
to nonblocking assignments (more details in the paper).

In a separate thread, Bob Perlman points out that some people
incorrectly believe that the "use of #delay in such instances is a
sign of Verilog illiteracy." Bob is correct. Whenever I see the #1
delays, I always ask why the engineer added the delays. If they
mention clk-to-q delays in a waveform viewer, they usually understand
what they are doing. If they mention that #1's fix nonblocking
assignments, I know I just found one of the illiterate souls that Bob
is referring to!

Hope this helps.

Regards - Cliff Cummings
----------------------------------------------------
Cliff Cummings - Sunburst Design, Inc.
Verilog On-Site Training Sale - 4-day Courses for $1,200/Student
cliffc@sunburst-design.com / www.sunburst-design.com
Expert Verilog, SystemVerilog, Synthesis and Verification Training

Jonathan Bromley · Nov 6, 2003

"Cliff Cummings" <cliffc@sunburst-design.com> wrote in
message news:7788812d.0311061421.436b95ff@posting.google.com...

> I guess I should confess to being the author of the "idiot benchmark!"
> ;-) ;-) ;-)

Hah! Once again, fingers hit keyboard without intervention
by superego... apologies for any offence when none was meant.

Benchmarks is benchmarks - "there is no arguing with facts
and experience" [(c) I. Newton]. Problem is, a benchmark
sometimes gives answers to a different question than the one
you thought you asked.

My complaint was not against benchmarks, but against using
performance guidelines as a high-priority driver for
coding technique. Fast code is useless if it's broken.

[...]

> If you could increase the performance of your Verilog regression runs
> by up to 92% by making a small modification to your Verilog coding
> style, wouldn't you be interested?

I'd sure be interested; but if I have to run the sim three
times to nail some silly race condition that got introduced
because I tried to improve performance, I ain't won much.
(In this case, of course, your recommendations are 100% safe
and the problem wouldn't arise. But I hope you can see what
I'm driving at).

Performance is a many-faceted issue. Many verification
engineers are using bolt-on tools and extensions to their
sims, interfacing through the PLI and thereby introducing
a measurable performance hit. Why? Because the new, maybe
slower tools give them hugely greater verification productivity
in terms of bugs found per hour of simulation time, even
though the sim itself is running a tad slower.

> For most split-buffer clocks, as long as the clock
> buffers are modeled using continuous assignments
> or blocking assignments, there will be no
> race problems (more details in the paper).

An interesting (and important!) example of a case where
you can program your way out of a problem in Verilog,
but there's no equivalent trick in VHDL.
--
Jonathan Bromley, Consultant

