How to raise a flag immediately /after/ an operation completes

James Harris

To carry out an operation such as addition and then raise a flag I was
thinking to write

always @(a1, a2) begin
  y = a1 + a2;
  flag = 1;
end

but HDL Programming Fundamentals (Nazeih M Botros) says of behavioural
descriptions: "For VHDL, the statements inside the process are
sequential. In Verilog, all statements are concurrent."

I thought that '=' was sequential and '<=' was a concurrent
assignment. So I am puzzled. I don't want to build in delays with
#<delay> as I don't quite see how they synthesize.

Is there a way (or a best way) to get the flag to raise as soon as the
addition is complete?

--
TIA,
James
 
On Fri, 14 Sep 2007 00:01:13 -0700, James Harris
<james.harris.1@googlemail.com> wrote:

To carry out an operation such as addition and then raise a flag I was
thinking to write

always @(a1, a2) begin
  y = a1 + a2;
  flag = 1;
end

but HDL Programming Fundamentals (Nazeih M Botros) says of behavioural
descriptions: "For VHDL, the statements inside the process are
sequential. In Verilog, all statements are concurrent."

That's utter nonsense. Jettison the book.

I thought that '=' was sequential and '<=' was a concurrent
assignment.

Not quite. = is "blocking assignment". It takes effect
immediately (from the simulator's point of view). <= is
"nonblocking assignment". It executes without delay,
execution proceeding to the next statement in the sequential
flow; but the variable update that it causes will not take
effect until later in the same simulation timeslot, after
all active processes have reached a timing control (@, # or wait).
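
As a minimal, simulation-only sketch of that difference (the module name
assign_demo and the signals a and b are invented for illustration, they
are not from the original question):

```verilog
// Simulation-only sketch; assign_demo, a and b are hypothetical names.
module assign_demo;
  reg [3:0] a, b;

  initial begin
    a = 4'd1;
    b = 4'd0;

    a = 4'd5;        // blocking: a is 5 as soon as this statement completes
    b <= a + 4'd1;   // nonblocking: right-hand side (5 + 1) is evaluated now,
                     // but the update of b is deferred
    $display("before timing control: a=%0d b=%0d", a, b);  // b is still 0

    #1;              // timing control: the deferred update has been applied
    $display("after timing control:  a=%0d b=%0d", a, b);  // b is now 6
  end
endmodule
```

The # delay here is only a testbench device to let the simulator advance;
it says nothing about what synthesized hardware would do.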

So I am puzzled. I don't want to build in delays with
#<delay> as I don't quite see how they synthesize.

That is completely sound; they *don't* synthesize - synth
tools ignore # delays.

Is there a way (or a best way) to get the flag to raise
as soon as the addition is complete?

From the point of view of the *simulation*, executing as
a piece of software, your original code is perfectly correct,
in the sense that if you wait until "flag" is asserted, you
can be sure that the addition and the updating of y have
already taken place.

HOWEVER, in synthesized logic, these two things *will*
happen in parallel because they will be implemented by
different pieces of logic. And, as you might expect,
the addition will be quite a bit slower than setting
a simple flag register. So if you use the rising edge
of "flag" to trigger some other logic that uses the
result of the addition, it may well get bad results -
you have a race condition. You need a different and
much more conservative approach.

Are you designing truly asynchronous logic, with no
clock? If so, you probably won't get much help here
because *all* mainstream synthesis design techniques
are based on the use of a clock to synchronize the
actions of various parts of the logic.

If, however, you are designing synchronous logic, you
can do this:

always @(posedge clk) // clocked process!
  if (its_time_to_do_an_addition) begin
    y <= a1 + a2; // note: NONBLOCKING assign in clocked logic
    flag <= 1;
  end else begin
    flag <= 0;
  end

Now, when the clock ticks and the "do an addition" condition
is true, register y will be updated with the sum and
register "flag" will get 1. These two things are both
guaranteed to happen AFTER the clock (not long after, but
definitely after); they are not guaranteed to happen in
any particular order in the synthesized logic, BUT the
remainder of the system is clocked logic too, and therefore
will not notice the flag asserted until the NEXT clock,
by which time the addition result will be well and truly
stable. So you can safely use "flag" as a clock-enable to
downstream logic to say "here is a valid result from the
addition".
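
A hedged sketch of such downstream logic, using "flag" as a clock enable.
The names clk, flag and y come from the example above; the module name
consumer, the 9-bit width and the register "captured" are assumptions
made only for illustration:

```verilog
// Sketch only: consumer, captured and the 9-bit width are hypothetical.
module consumer (
  input  wire       clk,
  input  wire       flag,     // "valid result" indication from the adder stage
  input  wire [8:0] y,        // registered sum from the adder stage
  output reg  [8:0] captured
);
  always @(posedge clk)
    if (flag)                 // clock enable: take y only when it is marked valid
      captured <= y;
endmodule
```

Because this block samples "flag" and y on the same clock edge, it sees the
values that were registered on the previous edge, so y has had a full clock
period to settle.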

If you want to use "flag" itself to clock the result -
maybe, for example, it is the write strobe to an
asynchronous memory device - then you will need to delay
"flag" by yet one further clock cycle, using some kind of
state machine or perhaps a simple shift register.
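
One way this could look, as a sketch rather than a finished design: a
one-stage shift register that produces a copy of "flag" one clock later.
The module name add_with_strobe, the 8-bit operand width and the output
flag_d are assumptions for illustration, and the sketch assumes y is not
rewritten in the cycle in which flag_d is used:

```verilog
// Sketch only: add_with_strobe, flag_d and the bit widths are hypothetical.
module add_with_strobe (
  input  wire       clk,
  input  wire       its_time_to_do_an_addition,
  input  wire [7:0] a1,
  input  wire [7:0] a2,
  output reg  [8:0] y,        // one extra bit keeps the carry out
  output reg        flag,
  output reg        flag_d    // flag delayed by one further clock cycle
);
  always @(posedge clk) begin
    if (its_time_to_do_an_addition) begin
      y    <= a1 + a2;
      flag <= 1'b1;
    end else begin
      flag <= 1'b0;
    end
    flag_d <= flag;           // nonblocking: samples the value flag had before this edge
  end
endmodule
```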

hth
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
jonathan.bromley@MYCOMPANY.com
http://www.MYCOMPANY.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.
 
On 14 Sep, 09:02, Jonathan Bromley <jonathan.brom...@MYCOMPANY.com>
wrote:
On Fri, 14 Sep 2007 00:01:13 -0700, James Harris
<james.harri...@googlemail.com> wrote:
To carry out an operation such as addition and then raise a flag I was
thinking to write

always @(a1, a2) begin
  y = a1 + a2;
  flag = 1;
end

but HDL Programming Fundamentals (Nazeih M Botros) says of behavioural
descriptions: "For VHDL, the statements inside the process are
sequential. In Verilog, all statements are concurrent."

That's utter nonsense. Jettison the book.

I thought that '=' was sequential and '<=' was a concurrent
assignment.

Not quite. = is "blocking assignment". It takes effect
immediately (from the simulator's point of view). <= is
"nonblocking assignment". It executes without delay,
execution proceeding to the next statement in the sequential
flow; but the variable update that it causes will not take
effect until later in the same simulation timeslot, after
all active processes have reached a timing control (@, # or wait).

OK. That's clear. Thanks.

So I am puzzled. I don't want to build in delays with
#<delay> as I don't quite see how they synthesize.

That is completely sound; they *don't* synthesize - synth
tools ignore # delays.

I'm surprised, to say the least, that simulation and synthesis are not
closer - to reduce development/test cycle costs. I expect that I could
build "components" and code their delays (if I pick a suitable
timescale and a given logic family). These components should then work
the same way under simulation as they will when synthesized but that
won't allow the design to be synthesized to ASIC or FPGA without
changes for the propagation delays inherent in each.

Is there a way (or a best way) to get the flag to raise
as soon as the addition is complete?

From the point of view of the *simulation*, executing as
a piece of software, your original code is perfectly correct,
in the sense that if you wait until "flag" is asserted, you
can be sure that the addition and the updating of y have
already taken place.

Yes, that is exactly the intention. Flag could be called "data ready"
or similar.

HOWEVER, in synthesized logic, these two things *will*
happen in parallel because they will be implemented by
different pieces of logic. And, as you might expect,
the addition will be quite a bit slower than setting
a simple flag register. So if you use the rising edge
of "flag" to trigger some other logic that uses the
result of the addition, it may well get bad results -
you have a race condition. You need a different and
much more conservative approach.

Are you designing truly asynchronous logic, with no
clock? If so, you probably won't get much help here
because *all* mainstream synthesis design techniques
are based on the use of a clock to synchronize the
actions of various parts of the logic.

Yes, this is intended to be asynchronous. That is the main point of
the design. Each component is to handshake its output to the next.
Each component is to assert its output ready flag as soon as its
computation is complete.

Given what you say this is suddenly looking to be a rather large
challenge.

I guess I could recode at gate level, perhaps for an addition
operation noting when carry stops propagating or suchlike (just an
example, haven't thought it through). For multiplication there is
probably a way to check for 'bits left to multiply by' and so on. But
apart from being a whole lot harder I'm not sure how something as
specific as gate logic would code into an FPGA which is what I wanted
to use for testing.

Phew! If anyone has any pointers on what path I should follow to do
this asynchronously, they would be appreciated. Thanks, Jonathan, for the
help so far.

....

If you want to use "flag" itself to clock the result -
maybe, for example, it is the write strobe to an
asynchronous memory device - then you will need to delay
"flag" by yet one further clock cycle, using some kind of
state machine or perhaps a simple shift register.

Understood. The trouble is this uses a clock again. The design is
meant to be clockless. Why couldn't I have started with something
easy???
 
On Fri, 14 Sep 2007 03:19:23 -0700,
James Harris <james.harris.1@googlemail.com> wrote:

I'm surprised, to say the least, that simulation and synthesis are not
closer - to reduce development/test cycle costs. I expect that I could
build "components" and code their delays (if I pick a suitable
timescale and a given logic family). These components should then work
the same way under simulation as they will when synthesized but that
won't allow the design to be synthesized to ASIC or FPGA without
changes for the propagation delays inherent in each.

It's not quite that simple. Once synthesis and place-and-route
is complete, back-end tools can predict fairly accurately what
the gate and net delays will be. These delays can then be
back-annotated into a gate-level netlist (synthesis
output, a bunch of gates connected together by nets,
all coded in Verilog that's not really meant for human
consumption) and, when this version of the design is
simulated, you'll see the real gate delays much as they
would be in the finished device. However, it is *very*
difficult - effectively impossible - for synthesis to
build logic that will have a certain requested delay;
it's even pretty hard for synthesis to build two
logic paths in a way that guarantees one to be slower
than the other, which I guess is what you need.
This is part of the reason why asynch design hasn't
yet hit the mainstream.

[...]
this is intended to be asynchronous. That is the main point of
the design. Each component is to handshake its output to the next.
Each component is to assert its output ready flag as soon as its
computation is complete.

This is exactly what asynchronous design aims to achieve.
There's a considerable academic literature on it - most of
which is opaque to me - but you're most likely to get a
grip on it by looking in two places:

1) Google for "Muller C-element", one of the key building
blocks of current asynchronous design styles;
2) have a look at what Handshake Solutions are doing -
they're a spin-off from Philips research, and they
have a tool and language for synthesis of asynchronous
logic. It's very interesting, and radically different
from any other HDL.
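
For orientation, a two-input Muller C-element drives its output to the
common input value when both inputs agree and otherwise holds its
previous output. A behavioural simulation model might be sketched as
below; the module and port names are illustrative only, and a real
asynchronous flow would more likely use a dedicated library cell or a
hand-built gate structure:

```verilog
// Behavioural sketch of a two-input Muller C-element (simulation model).
// c_element, a, b and c are hypothetical names chosen for illustration.
module c_element (
  input  wire a,
  input  wire b,
  output reg  c
);
  always @(a or b) begin
    if (a == b)   // inputs agree: output follows them
      c = a;
    // inputs differ: no assignment, so c holds its previous value
    // (c stays unknown in simulation until the inputs first agree)
  end
endmodule
```

A synthesis tool would normally treat this description as a latch, which
is one reason such elements tend to be hand-instantiated in practice.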

Given what you say this is suddenly looking to be a rather large
challenge.

Indeed :)

I guess I could recode at gate level, perhaps for an addition
operation noting when carry stops propagating or suchlike (just an
example, haven't thought it through). For multiplication there is
probably a way to check for 'bits left to multiply by' and so on. But
apart from being a whole lot harder I'm not sure how something as
specific as gate logic would code into an FPGA which is what I wanted
to use for testing.

Good insight. "Optimizations" and device-specific fitting by
synthesis tools and FPGA place-and-route are both likely to
subvert your efforts, unless you do some fairly wild stuff.

Why couldn't I have started with something easy???

Because you're looking for fun, challenge and the thrill
of the bleeding-edge? Nah, just kidding :)
--
Jonathan Bromley, Consultant

 
On 14 Sep, 12:06, Jonathan Bromley <jonathan.brom...@MYCOMPANY.com>
wrote:

....

Why couldn't I have started with something easy???

Because you're looking for fun, challenge and the thrill
of the bleeding-edge? Nah, just kidding :)

Haha - yep, and am bleeding profusely! I think I can get this to work
if only the place and route or suchlike does not optimise the data
path so much that the handshaking gets left behind.

So far it seems the handshaking itself takes significant time so on
the one hand it allows time for the signals to settle. On the other
hand the circuit would struggle to match Intel's or AMD's clock cycle
times. Hmm. I'd be happy with 100th their speed, based on the hardware
response times.

It seems odd to me (guess I haven't gone in to it far enough yet) that
simulation is not constrained to accurately reflect the hardware from
day one. Would have thought it would slash development costs if
simulation and practice were effectively one and the same all the way
through. Anyway....

I have been coding this purely as gate logic so far. BIG learning
experience, but fun too.

If anyone else is looking at this later he or she might be interested
in

http://www.na2.es/research.htm

where they promise some asynch fpga integration in the near future.

--
James
 
On Oct 11, 4:23 am, James Harris <james.harri...@googlemail.com>
wrote:
On 14 Sep, 12:06, Jonathan Bromley <jonathan.brom...@MYCOMPANY.com>
wrote:

...

Why couldn't I have started with something easy???

Because you're looking for fun, challenge and the thrill
of the bleeding-edge? Nah, just kidding :)

Haha - yep, and am bleeding profusely! I think I can get this to work
if only the place and route or suchlike does not optimise the data
path so much that the handshaking gets left behind.

So far it seems the handshaking itself takes significant time so on
the one hand it allows time for the signals to settle. On the other
hand the circuit would struggle to match Intel's or AMD's clock cycle
times. Hmm. I'd be happy with 100th their speed, based on the hardware
response times.

It seems odd to me (guess I haven't gone in to it far enough yet) that
simulation is not constrained to accurately reflect the hardware from
day one. Would have thought it would slash development costs if
simulation and practice were effectively one and the same all the way
through. Anyway....

You have to consider that Verilog (like other HDLs) was originally a
"behavioral modelling" language - that is, it was used to describe,
approximately, the behavior of the design. Verilog models weren't
actually meant to *be* the device, just a model you could play with
while other engineers were working on the actual design for that
device.

That is, each Verilog module is sort of like a black box which says
"this module does this and that" without regard to the actual
feasibility of the module.

On day 1, the HDL module was just equivalent to the specs of the
device. The actual module would be built by someone else using the
Verilog module as the specifications for the device. So no -
simulation and practice could never be equivalent, in much the same
way that an actual device rarely 100% reaches the specs (not without a
lot of work, and probably some modifications to the specs).

It was somewhat later that a subset of HDLs (called RTL) was defined
which could be accepted by a new tool, the "synthesis tool". The job
of the synthesis tool was to do the work of the human engineers who
were creating the actual design; in fact, even today additional
verification should really be done on the actual design to ensure that
it really is equivalent to your RTL behavior.

Currently, synthesis is still being developed. So while what you want
to do may be possible in the future, it isn't possible *yet*.

And no, current tools cannot handle asynch (as in unclocked) designs.
Asynch designs aren't very hard in themselves, unless you're doing a
complex asynch design. For complex synch designs, a lot of tools
exist (RTL, synthesis, etc.). For complex asynch designs, none exist
yet.

 
