Xilinx XST and a State Machine - A Mystery

Here's another reason to NEVER NEVER NEVER feed asynchronous signals to
a state machine: Sometimes the back end tools in the default settings
will replicate flip flops to make routing easier. For instance if there
is a gray code counter bit that goes to many places, the router could
split it into two equivalent bits that go to half as many places each.

I have been burned by this before on an input deglitching flip flop: the
tool replaced it with two parallel deglitching flip flops, which is
obviously not much of a deglitcher.

There are tool directives to prevent this from happening, but I like to
do the two deglitchers in series trick so that the first deglitcher will
only have one load, so it can't be split.

-Jeff
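
A minimal sketch of the "two deglitchers in series" arrangement described above. The module and signal names (sync_2ff, async_in, meta, stable) are illustrative assumptions, not from the thread, and the register initializers assume an FPGA flow that honors them.

```verilog
// Two flip-flops in series on an asynchronous input.
// All names here are hypothetical; adapt them to the design at hand.
module sync_2ff (
    input  wire clk,
    input  wire async_in,   // asynchronous to clk
    output wire sync_out    // intended to be safe to fan out to many loads
);
    // The first flop drives exactly one load (the second flop), so a tool
    // that replicates high-fanout registers has no reason to split it.
    reg meta   = 1'b0;
    reg stable = 1'b0;

    always @(posedge clk) begin
        meta   <= async_in;  // may go metastable; read only by 'stable'
        stable <= meta;      // gets most of a clock period to settle
    end

    assign sync_out = stable;
endmodule
```

If the tool later replicates 'stable' for fanout, every copy still samples the same single 'meta' flop, so the copies should stay coherent. Tool directives that forbid replication can be layered on top; their exact names depend on the tool and are not shown here.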
 
Concise summary of this thread:

While it's possible to build a state machine that safely handles
asynchronous inputs, there's seldom any justification for it. Failure
to synchronize inputs invites trouble and violates decades of solid
design practice.
 
Back when we used PALs with only 4 or 8 registers (16R4 & 16R8 PALs),
separately synchronizing an input to a state machine was not always an
option from a resource point of view.

As has been said above, there are methods* of making a state machine
tolerant of asynchronous inputs. But these methods are exceedingly
difficult to verify and/or review, especially when compared to the use
of an explicit synchronizer. Furthermore, these methods are often
thwarted by optimizations that are commonplace in FPGA synthesis
tools. Finally, given the abundance of registers in FPGAs, there is
very rarely an acceptable excuse to not use an explicit synchronizer.

* the rules for these methods are:

1. All transitions of the state machine in response to an asynchronous
reset, including any registered outputs, must result in the
possibility of only one bit changing.

1A. A state may have conditional destinations of itself and an
adjacent (on a K-map) state

1B. A state may have two separate destination states that are adjacent
to each other, but then staying in the current state is not allowed.

1C. A state may change a single-bit registered output based on the
async input, but it cannot also transition out of that state at the
same time. An old trick to effectively "re-use" a synchronizing
register was to set a flag while in the state based on the async
input, then transition out of the state based on the flag a clock
later. The same flag could be used in multiple states, for multiple
different async inputs.

The hoops we jumped through to get things done in 8 flops or less...

Andy
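
As a rough illustration of rule 1A, here is a sketch of a four-state machine whose every arc changes a single state bit, with the asynchronous input only ever choosing between "stay" and an adjacent code. The module, the state names, and the signals async_go, rst and busy are all hypothetical. It only works if the synthesis tool keeps this exact encoding and does not replicate the state flops, which is the tool-dependent part the thread warns about.

```verilog
// Sketch only: a state machine meant to tolerate one asynchronous input
// by restricting every arc to a single-bit change (rule 1A).
module async_tolerant_fsm (
    input  wire clk,
    input  wire rst,        // assumed synchronous to clk
    input  wire async_go,   // asynchronous to clk
    output wire busy
);
    // Gray-ordered codes: 00 -> 01 -> 11 -> 10 -> 00
    localparam [1:0] IDLE = 2'b00,
                     RUN  = 2'b01,
                     DONE = 2'b11,
                     HOLD = 2'b10;

    reg [1:0] state = IDLE;

    always @(posedge clk) begin
        if (rst)
            state <= IDLE;
        else case (state)
            IDLE:    if (async_go)  state <= RUN;   // stay, or 00 -> 01
            RUN:                    state <= DONE;  // 01 -> 11, unconditional
            DONE:                   state <= HOLD;  // 11 -> 10, unconditional
            HOLD:    if (!async_go) state <= IDLE;  // stay, or 10 -> 00
            default:                state <= IDLE;
        endcase
    end

    assign busy = (state != IDLE);
endmodule
```

In any given state only one D input depends on async_go, so a late or early sample leaves the machine either in the current state or in the adjacent one. Metastability on that one flop is still possible and is not addressed by this sketch.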
 
On Sep 20, 7:28 pm, John McCaskill <jhmccask...@gmail.com> wrote:
On Sep 20, 9:04 pm, Darol Klawetter <darol.klawet...@l-3com.com> wrote:

On Sep 20, 4:31 pm, d_s_klein <d_s_kl...@yahoo.com> wrote:

On Sep 20, 2:01 pm, Darol Klawetter <darol.klawet...@l-3com.com> wrote:

On Sep 20, 3:49 pm, KJ <kkjenni...@sbcglobal.net> wrote:

I agree with your last paragraph. It's typically better to use a
couple of flops on an asynchronous control signal than trying to
handle it within the state machine (especially if the tool is going to
take encoding liberties).

The rule is that any time there is a logic path from an asynchronous
input to more than one device that samples the signal(s), you must
first synchronize the signal before feeding it into any logic path.

This condition occurs for any state machine with more than one flip
flop used to implement the state.

KJ

I agree that this is a rule that should be generally followed, but
there are exceptions. For example, a gray-coded state machine is
sometimes used to tolerate an asynchronous control input. If timing is
violated on any of the flops, the machine will stay at the current
state or transition to the next state.

This is not a rule that "should be generally followed".  It is a rule
that must ALWAYS be followed!

If a signal from a "foreign" clock domain is used in a conditional
inside a state machine, the state machine will fail (eventually).

This is NOT a metastability problem.  It is because the time to decode
the D inputs of the state bits is different for each D input, and if
the "foreign clock" signal is used in a conditional it will go to more
than one D input.  There is no way to ensure that the D inputs are all
valid at the same time, and when the clock edge arrives while one is
valid and the other is not, the state machine gets corrupted.  This
syndrome is independent of the coding method of the state variable.

Synchronizing all inputs to a state machine to the state machine's
clock will prevent that.

Not optional.  Sorry.

RK

RK,

You'll have to convince me that the rule "must ALWAYS" be followed,
independent of state encoding. It still seems to me that gray coding
could work - the machine either stays in the current state or
transitions to the next state because only one bit is transitioning.
If the input to this bit's register meets timing, then the machine
goes to the next state; if timing is not met, it either goes to the
next state or stays at the current one. The other bits are static.

Must always is too strong. Your Gray coding example is valid and deals
with the issue of coherency, since only one FF changes state based on
the asynchronous signal.  It does not address the issue of
metastability, but that is becoming much less of an issue because of
how fast the FFs have become.  There have been posts by either Austin
or Peter saying that they have trouble even making it happen so
that they can measure it.

What I would say is that unless you have a very good reason not to,
you want to just use a synchronizing circuit before you do anything
else with an asynchronous signal.  The cost is usually low, and the
consequences of doing it another way and getting it wrong are high.  I
have debugged this sort of problem before, and it is a painful
process.  FPGAs and their tools assume that you are using synchronous
techniques, and can work against you if you deviate from that path.
They can take your carefully coded Gray state machine and change it to
one hot if you don't have the proper constraints, or they might
replicate state registers if they have a high enough fan out and you
did not turn off that option.

Regards,

John McCaskill
www.FasterTechnology.com

If, and only if your gray code is implemented with a gray counter.

Otherwise, it's just another state coding, and the idea that only one
bit changes at a time is false.

RK.
 
Ah the good old PAL days, where you could use PALASM and get exactly
what you coded - no need to check the "technology schematic". Most of
my state machine code in those days was carefully coded so that the
state variables WERE the desired outputs of the machine, with possibly
one or two "nodes" when the outputs were not sufficient to define all
of the required states. Every signal was visible on an oscilloscope /
logic analyzer - no buried state. Even back then I ran into cases of
state logic failure from asynchronous inputs and usually managed to
solve it by making sure only one macrocell used the async input
directly. In FPGAs, where flip-flops are virtually free, it's a
no-brainer to just add one more to synchronize your inputs unless you
are very highly concerned about latency.

On the topic of metastability, you don't necessarily need to use
two flip-flops to reduce the metastability failure rate to near zero.
Generally the tools allow you to define a slack requirement for
the path from the output of the synchronizing flop to any flops
further down the pipe. If you can meet the slack requirement
without adding a pipeline stage, you save one flop and one
cycle of latency.

Regards,
Gabor
 
On 09/21/2010 10:16 AM, Darol Klawetter wrote:
Concise summary of this thread:

While it's possible to build a state machine that safely handles
asynchronous inputs, there's seldom any justification for it. Failure
to synchronize inputs invites trouble and violates decades of solid
design practice.

And, the best answer to this is to see if a different architecture that
makes the state machine immune to these asynch inputs fixes your problem.

Jon
 
Very good point, metastability on outputs driving non-causal inputs
can easily be handled with a little extra settling time (a couple of
extra ns can buy centuries of MTBF in many FPGAs). Only if the output
is driving a causal input (e.g. clock or async reset input) do you
need the second flop (what we call a metastable rejecter flop).

Andy
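
The "a couple of extra ns can buy centuries" remark follows from the usual first-order metastability model, sketched below. The symbols are not from the thread: t_r is the settling time (slack) available after the synchronizing flop, tau and T_0 are device-dependent constants taken from vendor characterization, f_clk is the sampling clock frequency, and f_data is the rate of transitions on the asynchronous input.

```latex
\[
\mathrm{MTBF}(t_r) = \frac{e^{t_r/\tau}}{T_0 \, f_{clk} \, f_{data}},
\qquad
\frac{\mathrm{MTBF}(t_r + \Delta t)}{\mathrm{MTBF}(t_r)} = e^{\Delta t/\tau}
\]
```

Because the extra slack sits in the exponent, each additional tau of settling time multiplies the MTBF by e; when tau is a small fraction of a nanosecond, a couple of nanoseconds amounts to many such factors.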
 
Actually, you can build an asynchronous state machine that will
respond to each input as it changes rather than wait for a clock. But
that would likely require synchronization on the outputs of the state
machine and may not meet all the requirements of the design if other
elements linked to it are synchronous.

Rick
 
On 21 Sep., 18:19, d_s_klein <d_s_kl...@yahoo.com> wrote:

If, and only if your gray code is implemented with a gray counter.

Wrong.

Otherwise, it's just another state coding, and the idea that only one
bit changes at a time is false.

Why do you insist on this?
There definitely are state machines that are more complex than a
counter but where there are no transitions that change more than one
bit. This is what the other posters are talking about.
Signal skew is not an issue in these designs.

To go to the extreme: There are complete microprocessors on the market
that work without a clock signal.

Asynchronous design has a couple of advantages, especially with regard
to power consumption. It has therefore been thoroughly explored by
academia and some companies for decades now. However, there are a lot
of pitfalls, so the general conclusion seems to be that the biggest
problems in IC design are correctness and designer productivity, and
both clearly are better for synchronous designs.

Kolja
 
Well, another point in this thread is that for FPGA design tools, a
synchronous design is presumed by the tools and therefore all the
necessary robustness required to check asynchronous state changes must
be done by hand or via some very expensive third-party tools. It's not
even clear that asynchronous design has a big power advantage in an
FPGA due to the structure of the fabric. Global clock routing has been
optimized and takes much less power per load than general routing, for
example.
 
On Sep 20, 9:05 am, John McCaskill <jhmccask...@gmail.com> wrote:
On Sep 20, 10:52 am, Darol Klawetter <darol.klawet...@l-3com.com> wrote:

I recently fixed a problem with one of my state machines, and I think
that the cause could be a bug in Xilinx XST. Below is the code that
produced the failure. The state-machine worked most of the time,
though occasionally the 'phaseRamWE' signal would be stuck low, even
though I could see 'phaseRamReadAdr' and 'phaseRamWriteAdr'
incrementing.  I fixed it by explicitly declaring the desired value of
'phaseRamWE' in every state. Notice that all states are defined so it
should recover from any conditions introduced by asynchronous inputs.

Have any of you seen similar behavior? Appears to be an XST bug to me.

snip

I only skimmed the code, but this bit jumped out at me:

            if (pipeFill == 1'b1)   // 'pipeFill' is asynchronous to 'clk'

If pipeFill is really asynchronous to the clock that is running your
state machine,  using it without synchronizing it first is going to
cause you problems.  Probably not because of metastability, but
because of coherency problems. If pipeFill is changing near the clock
edge, some FFs will see the new value, and some the old.  This can
cause weird and flaky behaviour.

So, could he have fixed the problem by not writing to both state and
phaseRamWE in S1? E.g.,

S1:
    begin
        if (pipeFill == 1'b1)
        begin
            // phaseRamWE <= 1'b1;
            state <= S1A; // was S2
        end
    end

S1A:
    begin
        phaseRamWE <= 1'b1;
        state <= S2;
    end

S2:
    ...

My understanding is that this would still be hazardous because there's
no guarantee that all of the bits composing the 'state' value will
toggle on the same clock edge. Even though there's only one "variable"
that changes in response to pipeFill, the same failure mode still
exists as long as the register itself is more than one bit wide.

Being relatively new to Verilog, it seems that the textbooks warn
extensively about metastability (which as noted below isn't as big a
problem as it used to be) but not much about coherency at all. I can
see how Darol became complacent, since it's easy to see the S1->S2
transition as an all-or-nothing event...

-- john, KE5FX
 
John,

The same failure mode would exist, as you suspected. As has been said
before, there are state machines that can tolerate asynchronous inputs
(I thought that my first version of this machine could do so, since I
thought that all states were defined and all states would provide a
path for recovery), but it's simpler, less error-prone, and tool-
independent to just synchronize all inputs.
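
For comparison, a sketch of the "just synchronize all inputs" approach applied to the S1 step. Only clk, pipeFill, phaseRamWE, S1 and S2 come from the thread. The module name, the pipeFill_meta / pipeFill_sync registers and the body of S2 are assumptions, since the original machine was snipped.

```verilog
// Sketch: the state machine sees only a synchronized copy of pipeFill.
module phase_ctrl_sync_sketch (
    input  wire clk,
    input  wire pipeFill,      // asynchronous to clk
    output reg  phaseRamWE
);
    localparam [1:0] S1 = 2'd0,
                     S2 = 2'd1;

    reg [1:0] state;
    reg       pipeFill_meta;   // hypothetical first synchronizer stage
    reg       pipeFill_sync;   // hypothetical second stage, used by the FSM

    initial begin
        state         = S1;
        phaseRamWE    = 1'b0;
        pipeFill_meta = 1'b0;
        pipeFill_sync = 1'b0;
    end

    // Explicit synchronizer, kept outside the state machine.
    always @(posedge clk) begin
        pipeFill_meta <= pipeFill;
        pipeFill_sync <= pipeFill_meta;
    end

    always @(posedge clk) begin
        case (state)
            S1: if (pipeFill_sync) begin
                    phaseRamWE <= 1'b1;  // both updates now depend on a
                    state      <= S2;    // signal that is synchronous to clk
                end
            S2: begin                    // placeholder body, not from the thread
                    phaseRamWE <= 1'b0;
                    state      <= S1;
                end
            default: begin
                    phaseRamWE <= 1'b0;
                    state      <= S1;
                end
        endcase
    end
endmodule
```

The cost is two flops and two clocks of latency on pipeFill; in exchange, phaseRamWE and every state bit should always agree on whether the transition happened, whatever encoding the tool picks.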
 
Not disregarding all the caveats already mentioned, but a better way
to do this would be to set phaseRamWE in S1, and then transition from
S1 to S2 based on phaseRamWE, which is synchronous (though
metastable), instead of pipeFill.

Andy
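
A sketch of what that suggestion might look like. Only clk, pipeFill, phaseRamWE, S1 and S2 are from the thread; the module name and the body of S2 are placeholders, because the real S2 was elided.

```verilog
// Sketch: phaseRamWE doubles as the synchronizing flag. Only that one
// flop samples pipeFill; the state bits look at the registered flag.
module embedded_sync_sketch (
    input  wire clk,
    input  wire pipeFill,      // asynchronous to clk
    output reg  phaseRamWE
);
    localparam [1:0] S1 = 2'd0,
                     S2 = 2'd1;

    reg [1:0] state;

    initial begin
        state      = S1;
        phaseRamWE = 1'b0;
    end

    always @(posedge clk) begin
        case (state)
            S1: begin
                    if (pipeFill)   phaseRamWE <= 1'b1;  // single flop sees the async input
                    if (phaseRamWE) state      <= S2;    // leave one clock after the flag sets
                end
            S2: begin                                    // placeholder body, not from the thread
                    phaseRamWE <= 1'b0;
                    state      <= S1;
                end
            default: begin
                    phaseRamWE <= 1'b0;
                    state      <= S1;
                end
        endcase
    end
endmodule
```

phaseRamWE can still go metastable, so the path from it to the state flops needs enough slack, and the scheme breaks if the tool replicates phaseRamWE for fanout, which is the hazard described at the top of the thread.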
 
I'm assuming that you're comparing what you outlined to that of
explicit synchronizers as had been pointed out...so by what measure
could you possibly consider what you listed as a 'better way'?
Different perhaps, but better? Some ways I see it to be worse are:

- Encourages use (and down the road misuse) of asynchronous signals in
a state machine. If the misuse creeps back in, the same flaky
behavior pops back up (hopefully while still in the lab, and not after
having been deployed).
- State machine must be re-written to add a state (or two) if one
changes their mind and wants to have the additional insurance of a
second (or third) synchronizer.
- While only an aesthetics thing, synchronizing signals to a clock has
nothing to do with the state machine logic and does not 'belong'
there. Synchronized inputs are simply a prerequisite for the inputs
to the state machine.

How do you see what you wrote being 'better'?

KJ
 
On Sep 27, 11:15 pm, KJ <kkjenni...@sbcglobal.net> wrote:
I'm assuming that you're comparing what you outlined to that of
explicit synchronizers as had been pointed out...
snip...
How do you see what you wrote being 'better'?

KJ
No, I was trying to say it was better than jmiles' suggestion. I think
I replied to the wrong message...

Explicit synchronization is just as effective, and in many ways
superior to the method I suggested (especially WRT reviewing the
design).

One functional advantage of the method I suggested (I'll call it
embedded synchronization), compared to explicit synchronization, is
that multiple asynchronous inputs can be synchronized with one
synchronous flag register, so long as no more than one of those
asynchronous inputs can change at a time (simultaneous changes can
cause glitches in LUT-based combinatorial logic).

Another functional advantage of embedded synchronization is that, if
you are re-using a synchronous output you already needed in the first
place (as was the case in my example), and that output needs to be
qualified with the state, then this method reduces the latency from
asynchronous input to synchronized, qualified output. Note that if
adequate timing slack is not available to allow potential
metastability to expire, then you will need an additional
synchronization stage (and additional latency). The qualification
could be combined in the meta-rejection stage by embedding that stage
only, while leaving the initial synchronization stage external to the
state machine.

Additional states are not required for additional stages of
synchronization. Simply add those register assignments to the existing
state (like a shift register), and only transition to the next state
when the last one is set.

Opinions vary on what "belongs" in a state machine. When operations
are simple, including them in a state machine is often easier to
follow than creating the status and control logic necessary to
interface with operations outside the state machine. In this case,
these are simple enough that there would be no extra status/control
specified, but you get the idea (e.g. embedding count logic in the
state machine, etc.) IMHO, if adding the logic in the state is simpler
and easier, then I do it. If it is long and tedious, and detracts from
the "flow" of the state machine, I don't. For instance, do you always
strip nested if statements out of state machines, and externally
generate the flags that will be used as "simple inputs" to the state
machine? I'm not saying either method is wrong or right for you or
anyone else, but I am saying which methods are preferable for me.

Andy
 
