Tool to help detect race conditions with async inputs?

Thomas Stanka

Hello,

Do you know of any tool that would help detect race conditions caused by
asynchronous inputs?

I had a design with asynchronous inputs. I inspected the RTL code to
ensure that the async inputs would only be used while they are stable
with respect to the specification. Unfortunately I missed a line where an
asynchronous input releases a synchronous reset. Synthesis generated
a race condition which led to a malfunction of the design. After
finding the problem it was very easy to see the failure in the
netlist. But it seems very hard to detect the problem without
knowing in advance that it could happen, because neither timing analysis
(maybe not properly done) nor equivalence checking nor gate-level
simulation showed a failure.

Are there tools that would help in such cases? I don't like the idea of
spending hours and days inspecting netlists for every asynchronous input I
use, just to ensure that this failure won't happen a second time.

I know that it would be best to avoid async inputs by inserting
registers, but I have some designs with hard area constraints and
other designs with timing constraints that don't permit registering
every input.

bye Thomas
 
> Do you know of any tool that would help detect race conditions caused by
> asynchronous inputs?
One clean approach is to run all external async inputs through
the standard 2-FF synchronizer. Then they are synchronous and
normal tools will work.

The key tool for that is a pair of eyeballs. Scan the source
code and make sure that the only place async inputs go is into
the synchronizers.
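As a rough sketch of automating that eyeball check on a netlist, here is a small Python function. The netlist model (a dict of `Cell` objects with `kind`, `inputs` and `output`), the pin names `D`/`CLK`, and the function names are all hypothetical; a real flow would first have to parse the synthesizer's netlist into this form. The rule it checks is an assumption based on the paragraph above: each async input drives exactly one flip-flop D pin, and that flip-flop drives exactly one second flip-flop on the same clock.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    kind: str     # "FF" for a clocked element, "COMB" for combinational logic
    inputs: dict  # pin name -> net name, e.g. {"D": "req_in", "CLK": "clk"}
    output: str   # name of the net this cell drives


def loads(netlist, net):
    """Return (cell_name, pin) for every cell input connected to `net`."""
    return [(name, pin) for name, cell in netlist.items()
            for pin, n in cell.inputs.items() if n == net]


def check_async_inputs(netlist, async_pins):
    """Return a list of violations; an empty list means every async pin
    goes straight into a two-flop synchronizer (hypothetical rule)."""
    errors = []
    for pin in async_pins:
        first = loads(netlist, pin)
        if len(first) != 1:
            errors.append(f"{pin}: drives {len(first)} loads, expected 1 synchronizer")
            continue
        name, port = first[0]
        ff1 = netlist[name]
        if ff1.kind != "FF" or port != "D":
            errors.append(f"{pin}: reaches {name}.{port}, not a synchronizer D input")
            continue
        second = loads(netlist, ff1.output)
        ok = (len(second) == 1 and second[0][1] == "D"
              and netlist[second[0][0]].kind == "FF"
              and netlist[second[0][0]].inputs.get("CLK") == ff1.inputs.get("CLK"))
        if not ok:
            errors.append(f"{pin}: {name} is not followed by one same-clock FF")
    return errors


# Hypothetical example: req_in is synchronized, rst_in goes straight into logic.
netlist = {
    "sync1": Cell("FF", {"D": "req_in", "CLK": "clk"}, "req_s1"),
    "sync2": Cell("FF", {"D": "req_s1", "CLK": "clk"}, "req_s2"),
    "lut1": Cell("COMB", {"I0": "req_s2", "I1": "rst_in"}, "n1"),
    "ff3": Cell("FF", {"D": "n1", "CLK": "clk"}, "q3"),
}
print(check_async_inputs(netlist, ["req_in", "rst_in"]))
# ['rst_in: reaches lut1.I1, not a synchronizer D input']
```

The asynchronous `rst_in` feeding logic directly is flagged, which is the kind of missed line described in the original question. Because the check counts loads on each pin, it would also flag a synchronizer flop that synthesis has replicated.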

--
The suespammers.org mail server is located in California. So are all my
other mailboxes. Please do not send unsolicited bulk e-mail or unsolicited
commercial e-mail to my suespammers.org address or any of my other addresses.
These are my opinions, not necessarily my employer's. I hate spam.
 
usenet_10@stanka-web.de (Thomas Stanka) wrote in message news:<ef424d2c.0405272250.ba032e@posting.google.com>...
> Hello,
>
> Do you know of any tool that would help detect race conditions caused by
> asynchronous inputs?
No. That would be a difficult problem even if you were
given all the gate and route delay ranges.

> I had a design with asynchronous inputs. I inspected the RTL code to
> ensure that the async inputs would only be used while they are stable
> with respect to the specification. Unfortunately I missed a line where an
> asynchronous input releases a synchronous reset. Synthesis generated
> a race condition which led to a malfunction of the design. After
> finding the problem it was very easy to see the failure in the
> netlist.
You found the source of one race condition.
There are no doubt others that will introduce themselves
over time, temperature, state and input variations.

> I don't like the idea of
> spending hours and days inspecting netlists for every asynchronous input I
> use, just to ensure that this failure won't happen a second time.
I don't either. Consider doing whatever is necessary to
synchronize all the inputs to the system clock.

> I know that it would be best to avoid async inputs by inserting
> registers,
That's it.

> but I have some designs with hard area constraints and
> other designs with timing constraints that don't permit registering
> every input.
That is an engineering problem.
There are always alternatives.

-- Mike Treseler
 
> I had a design with asynchronous inputs. I inspected the RTL code to
> ensure that the async inputs would only be used while they are stable
> with respect to the specification.
Thinking about this some more...

What does that actually mean? If a signal is asynchronous (relative
to some other clock/signal) how/when can it be stable?

 
Hal Murray wrote:

> What does that actually mean? If a signal is asynchronous (relative
> to some other clock/signal) how/when can it be stable?
Stable phase could mean the signal has
already been properly synchronized.

But a "stable" signal transition could also
occur exactly at the active clock edge.

--Mike Treseler
 
On 27 May 2004 23:50:12 -0700, usenet_10@stanka-web.de (Thomas Stanka)
wrote:

> Hello,
>
> Do you know of any tool that would help detect race conditions caused by
> asynchronous inputs?
I used to use a tool like this when I was at Agilent. It was written
in-house (Hi Mark!).

[snip]
> Are there tools that would help in such cases? I don't like the idea of
> spending hours and days inspecting netlists for every asynchronous input I
> use, just to ensure that this failure won't happen a second time.
Note that a synthesis tool will sometimes *create* races or glitches
(e.g. when a ff used in a synchroniser gets replicated due to fanout;
yes, this happened in real-world designs).
Such problems *cannot* be caught by inspecting the RTL source; the
only way is to look at the post-synth netlist. We ended up using the
post-PAR back-annotated VHDL netlist (although I guess Verilog would
do just as well).

If you try, you could probably write such a tool in a few days,
assuming you already know how to program in a text processing language
such as Perl. (I suppose you could use C if you must.)
It is a fairly simple matter to trace all signals in the netlist back
(via combinatorial logic) to either the output of something that is
clocked (which Mark called a synchronous element, e.g. ff, bram, SRL),
or to a pin.
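A minimal sketch of that trace in Python, assuming the netlist has already been parsed into a dict of cells. The `Cell` model, the `"FF"`/`"COMB"` kinds and `trace_sources` are hypothetical names for illustration, not the Agilent tool. It walks backwards from a net through combinational cells and stops at clocked elements and pins:

```python
from dataclasses import dataclass


@dataclass
class Cell:
    kind: str     # "FF" (clocked: ff, bram, SRL, ...) or "COMB" (combinational)
    inputs: dict  # pin name -> net name
    output: str   # net driven by this cell


def trace_sources(netlist, pins, net):
    """Return the set of sources feeding `net`: ("pin", name) or ("ff", cell).

    Combinational cells are walked through; clocked cells and top-level
    pins end the walk.
    """
    driver = {cell.output: name for name, cell in netlist.items()}
    sources, seen, stack = set(), set(), [net]
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        if n in pins:
            sources.add(("pin", n))
        elif n in driver:
            cell = netlist[driver[n]]
            if cell.kind == "FF":
                sources.add(("ff", driver[n]))
            else:
                stack.extend(cell.inputs.values())
        # Nets with no driver that are not pins (e.g. constants) are ignored.
    return sources


netlist = {
    "ff_a": Cell("FF", {"D": "a_in", "CLK": "clk"}, "a_q"),
    "lut1": Cell("COMB", {"I0": "a_q", "I1": "b_in"}, "n1"),
    "lut2": Cell("COMB", {"I0": "n1", "I1": "a_q"}, "n2"),
}
print(sorted(trace_sources(netlist, {"a_in", "b_in", "clk"}, "n2")))
# [('ff', 'ff_a'), ('pin', 'b_in')]
```

The `seen` set keeps the walk from looping forever on combinational feedback, and also avoids revisiting nets reached along more than one path.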

By the time all the feature creep had ended, the tool we used checked
for:

1. Any clock gating (i.e. if the clock input of any synchronous
element is driven by combinatorial logic).

2. A list of all clocks used. You'd be surprised how often extra
clocks turn up, particularly in code written by less experienced
engineers.

3. Glitches, which we defined as a synchronous element with data
input(s) that could be traced back to more than one source in another
clock domain (including pins).

4. Races, which we defined as a synchronous element which feeds more
than one synchronous element in a different clock domain.

[Note: I don't think this is quite the same as the classic definition
of glitch and race, but it was ok for our purposes.]

5. Any use of async set or reset. It would trace all of these back
to their ultimate source. (Ideally, this would just be a single pin
called "reset" or something similar.)
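The five checks can be layered on the same backward trace. The sketch below is a hypothetical reconstruction from the descriptions above, not the original tool. It assumes every clocked element is modeled as an `"FF"` cell with a `CLK` pin and optional `RST`/`SET` async pins, treats each top-level pin as its own asynchronous "domain", and (as an extension of check 4) also counts pins as race sources so that a replicated synchronizer shows up. The clock-gating check only looks at the cell directly driving each clock net.

```python
from collections import defaultdict
from dataclasses import dataclass

ASYNC_PINS = ("RST", "SET")  # assumed names of async set/reset pins


@dataclass
class Cell:
    kind: str     # "FF" (any clocked element) or "COMB" (combinational)
    inputs: dict  # pin name -> net name; FFs are assumed to have a "CLK" pin
    output: str   # net driven by this cell


def lint(netlist, pins):
    driver = {cell.output: name for name, cell in netlist.items()}

    def trace(net):
        """Sources of `net` through COMB logic: ("ff", cell) or ("pin", net)."""
        found, seen, stack = set(), set(), [net]
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            if n in pins:
                found.add(("pin", n))
            elif n in driver:
                cell = netlist[driver[n]]
                if cell.kind == "FF":
                    found.add(("ff", driver[n]))
                else:
                    stack.extend(cell.inputs.values())
        return found

    def domain(src):
        kind, name = src
        # Each top-level pin is treated as its own asynchronous "domain".
        return "async:" + name if kind == "pin" else netlist[name].inputs["CLK"]

    report = {"gated_clocks": [], "clocks": set(), "glitches": [],
              "races": [], "async_resets": {}}
    fanout = defaultdict(set)  # source -> FFs in another domain that it feeds
    for name, ff in netlist.items():
        if ff.kind != "FF":
            continue
        clk = ff.inputs["CLK"]
        report["clocks"].add(clk)                              # check 2
        if clk in driver and netlist[driver[clk]].kind == "COMB":
            report["gated_clocks"].append(name)                # check 1
        data = set()
        for port, net in ff.inputs.items():
            if port == "CLK":
                continue
            if port in ASYNC_PINS:                             # check 5
                report["async_resets"].setdefault(name, set()).update(trace(net))
            else:
                data |= trace(net)
        foreign = {s for s in data if domain(s) != clk}
        if len(foreign) > 1:
            report["glitches"].append(name)                    # check 3
        for s in foreign:
            fanout[s].add(name)
    report["races"] = sorted(s for s, dst in fanout.items() if len(dst) > 1)  # check 4
    return report


# Hypothetical netlist with one instance of each problem.
pins = {"clk", "en", "req_in", "b_in"}
netlist = {
    "s1": Cell("FF", {"D": "req_in", "CLK": "clk"}, "s1_q"),
    "s2": Cell("FF", {"D": "req_in", "CLK": "clk"}, "s2_q"),  # replicated synchronizer
    "gate": Cell("COMB", {"I0": "clk", "I1": "en"}, "gclk"),  # gated clock
    "ff3": Cell("FF", {"D": "s1_q", "CLK": "gclk"}, "q3"),
    "mix": Cell("COMB", {"I0": "req_in", "I1": "b_in"}, "m"),
    "ff4": Cell("FF", {"D": "m", "CLK": "clk", "RST": "req_in"}, "q4"),
}
for check, result in lint(netlist, pins).items():
    print(check, result)
```

On this example the report flags `ff3` as clock-gated, lists `clk` and `gclk` as clocks, flags `ff4` as a glitch (its D input mixes two pins), reports `req_in` as a race source (it feeds `s1`, `s2` and `ff4`), and traces the async reset of `ff4` back to the pin `req_in`.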


We had the problem of integrating large chunks of design written at
multiple sites, and this tool saved lots of time by finding problems
that couldn't be found in simulation and would only show up in the lab
intermittently (e.g. it crashes once every 500 boots). Indeed, it
found several problems before we even had an inkling a problem
existed!

The majority of our problems were due to cross-clock domain paths
inside a single FPGA, but the same issues could apply to signals
coming from pins.
Prior to the creation of this tool, I estimated about half the debug
time on some projects was due to improperly handled cross clock domain
signals. Many of the bugs were in "proven" legacy code that had been
"working fine" for years.
There weren't that many bugs; it's just that they took a long time to
find compared with straightforward functional bugs.

Regards,
Allan.
 
Hi,

mike_treseler@comcast.net (Mike Treseler) wrote
> usenet_10@stanka-web.de (Thomas Stanka) wrote
>> I had a design with asynchronous inputs. I inspected the RTL code to
>> ensure that the async inputs would only be used while they are stable
>> with respect to the specification. Unfortunately I missed a line where an
>> asynchronous input releases a synchronous reset. Synthesis generated
>> a race condition which led to a malfunction of the design. After
>> finding the problem it was very easy to see the failure in the
>> netlist.
>
> You found the source of one race condition.
> There are no doubt others that will introduce themselves
> over time, temperature, state and input variations.
:). Indeed there was a possible second race condition for a very
unusual input combination, but I made sure that there were no other
race conditions by inspecting every path from the asynchronous inputs
to registers, even over temperature and voltage. This job was very
nasty and seems very error-prone once there are more than about 10
paths to inspect. So I wonder whether tools already exist to help
with this job.

>> I don't like the idea of
>> spending hours and days inspecting netlists for every asynchronous input I
>> use, just to ensure that this failure won't happen a second time.
>
> I don't either. Consider doing whatever is necessary to
> synchronize all the inputs to the system clock.
Impossible for this design due to hard area constraints.

>> but I have some designs with hard area constraints and
>> other designs with timing constraints that don't permit registering
>> every input.
>
> That is an engineering problem.
> There are always alternatives.
Tell me who your employer is; it must be nice to have a job where an
FPGA designer can turn down projects with hard constraints
*g*.

Besides the hard area criteria, where a design has to fit into a given
FPGA with no possibility of moving to a bigger FPGA to place the
necessary FFs to synchronise every input, there are other designs with
timing criteria that don't allow synchronising by inserting two FFs
between two clock domains.
Whenever your design allows you no clock cycle to respond to
requests, you have to deal with asynchronicity by using other techniques
like handshaking (if possible).
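Handshaking is only mentioned in passing above. As a hedged illustration of why it tolerates unknown timing, here is a behavioral Python model of a generic four-phase (return-to-zero) req/ack handshake; `simulate` and the signal names are hypothetical, and the random scheduler simply stands in for the unknown relative timing of the two sides. It models the protocol only: in real clocked logic, req and ack are themselves asynchronous to the side receiving them and typically still pass through a synchronizer, and metastability is not modeled at all.

```python
import random


def simulate(words, seed=0):
    """Behavioral model of a four-phase (return-to-zero) req/ack handshake.

    A random scheduler decides which side moves at each step, standing in
    for unknown relative timing. Returns (words received, event log).
    """
    rng = random.Random(seed)
    req = ack = 0
    data = None
    pending, received, log = list(words), [], []
    while pending or req or ack:
        if rng.random() < 0.5:  # sender's turn
            if not req and not ack and pending:
                data = pending.pop(0)  # data only changes while req = ack = 0
                req = 1
                log.append("req+")
            elif req and ack:
                req = 0
                log.append("req-")
        else:  # receiver's turn
            if req and not ack:
                received.append(data)  # sampled only while req holds it stable
                ack = 1
                log.append("ack+")
            elif not req and ack:
                ack = 0
                log.append("ack-")
    return received, log


received, log = simulate([7, 8, 9], seed=42)
print(received)  # [7, 8, 9]
print(log[:4])   # ['req+', 'ack+', 'req-', 'ack-']
```

In every protocol state only one side is allowed to move, so the event order is the same for any scheduling, and the sender only changes `data` while both `req` and `ack` are low.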

bye Thomas
 
> I used to use a tool like this when I was at Agilent. It was written
> in-house (Hi Mark!).
[Big snip of feature list.]

This seems like a great candidate for an FPGA-related open-source project.

 
On Wed, 02 Jun 2004 02:13:35 -0500, hmurray@suespammers.org (Hal
Murray) wrote:

>> I used to use a tool like this when I was at Agilent. It was written
>> in-house (Hi Mark!).
>
> [Big snip of feature list.]
>
> This seems like a great candidate for an FPGA-related open-source project.
Agreed.

Allan.
 
Hello,

Allan Herriman <allan.herriman.hates.spam@ctam.com.au.invalid> wrote:
> By the time all the feature creep had ended, the tool we used checked
> for:
[snip]

Thanks for this list. I'll try to set up a script to help me check the
items on this list.
I will post it once the script is running stably.

> The majority of our problems were due to cross-clock domain paths
> inside a single FPGA, but the same issues could apply to signals
> coming from pins.
> Prior to the creation of this tool, I estimated about half the debug
> time on some projects was due to improperly handled cross clock domain
> signals. Many of the bugs were in "proven" legacy code that had been
> "working fine" for years.
> There weren't that many bugs; it's just that they took a long time to
> find compared with straightforward functional bugs.
I agree with you that errors caused by race conditions are very hard and
time-consuming to debug :). Especially if you can't find the source
of the problems by RTL inspection.


bye Thomas
 
usenet_10@stanka-web.de (Thomas Stanka) wrote in message news:<ef424d2c.0406020556.15ead43d@posting.google.com>...

> Thanks for this list. I'll try to set up a script to help me check the
> items on this list.
> I will post it once the script is running stably.
Yes, thanks Allan, for the excellent posting.
I like the idea of having a way to verify
the "known-good" designs that are not
well documented.

I would note that for *new* designs, all of
these defects can be prevented with
the right set of design rules.

-- Mike Treseler
 
