Style of coding complex logic (particularly state machines)

rickman wrote:
KJ wrote:
"Eli Bendersky" <eliben@gmail.com> wrote in message
news:1156743235.737612.294510@i42g2000cwa.googlegroups.com...
KJ wrote:
Don't you run into fanout problems for that single flip-flop that
pushes the sync reset signal to all other FFs in the design, or does
the synthesis tool take care of this ? I tend to use async resets, but
my whole design is usually synchronized to the same clock so there are
no reset problems.

The fanout of the reset signal is the same regardless of whether you use
synchronous or asynchronous resets. In either case, the reset signal still
needs to be synchronized to the clock (see further down for more info) and
in both cases the reset signal itself must meet timing constraints. If the
reset signal doesn't meet timing constraints due to fanout (and the
synthesis tool didn't pick up on this and add the needed buffers
automatically) then most fitters for FPGAs give some method for limiting
fanout with some vendor specific attribute that can be added to the signal.

The fanout of an async reset in an FPGA is not an issue because the
signal is a dedicated net.
My point was that if timing is not met due to the large fanout, that
the typical fitter will allow for the fanout to be limited by the user
if necessary. But to directly answer the original question, 'no' I
haven't had reset signal fanout as a problem but if I did I know I
could fix it by limiting the fanout on the fitter side without having
to change the source code. But I also tend to reset only those things
that really need resetting which, by itself, cuts down on the fanout as
well.

The timing is an issue as all the FFs have
to be released in a way that does not allow counters and state machines
to run part of their FFs before the others. But this can be handled by
ways other than controlling the release of the reset. Typically these
circuits only require local synchronization which can be handled easily
by the normal enable in the circuit. For example most state machines
do nothing until an input arrives. So synchronization of the release
of the reset is not important if the inputs are not asserted. Of
course this is design dependent and you must be careful to analyze your
design with regard to the release of the reset.
I agree.

1. Forgetting (or not realizing) that the reset signal does in fact need to
be synchronized to the clock(s). Whether using async or sync resets in the
design, the timing of the trailing edge of reset must be synchronized to the
appropriate clock. Simply ask yourself, what happens when the reset signal
goes away just prior to the rising edge of the clock and violates the setup
time of a particular flip flop? The answer is that well...you can get
anything....and each flip flop that gets this signal can respond
differently.....and then what state do you think your 7 state,
one hot, state machine will be in after this clock? Quite possibly you
might find two hot states instead of just one.

That is what I addressed above. Whether the circuit will malfunction
depends on the circuit as well as the inputs present. It is often not
hard to assure that one or the other prevents the circuit from changing
any state while the reset is released.

But simply synchronizing the reset in the first place will do that as
well...two different approaches to the problem, each equally valid.

Since the dedicated global reset can not be synchronized to a clock of
even moderately high speed, you can provide local synchronous resets to
any logic that actually must be brought out of reset cleanly. I
typically use three FFs in a chain that are reset to zero and require
three clock cycles to clock a one through to the last FF.
Agreed, but one can also view these locally generated resets as simply
synchronized versions of the original reset. In fact, the synthesizer
would probably do just that seeing that you have (for example) 4 places
throughout the design where you've generated a local reset which is
simply the raw reset signal brought into a flip flop chain (I think
that's what you're describing). So it would take those four instances
and probably generate a single shift chain and local reset signal to
send to those 4 places. So all you've really done is written the
source code for the local reset 3 more times than is needed. Had you
taken the view that the reset input to those 4 places must be a
synchronized reset signal in the first place you probably would've
written the reset shift chain logic one time at a top level and
connected it up to those four inputs yourself and not written it on the
receiver side.
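
For illustration, a minimal sketch of such a shift-chain reset synchronizer (names and the active-high polarity are mine; rickman's version clocks a '1' through a chain reset to '0', which is the same idea with the opposite sense):

  -- Hypothetical local reset synchronizer: asynchronous assert, synchronous release.
  library ieee;
  use ieee.std_logic_1164.all;

  entity reset_sync is
    port (
      clk       : in  std_logic;
      async_rst : in  std_logic;   -- raw board-level reset, active high
      sync_rst  : out std_logic);  -- locally synchronized reset
  end entity;

  architecture rtl of reset_sync is
    signal chain : std_logic_vector(2 downto 0) := (others => '1');
  begin
    process (clk, async_rst)
    begin
      if async_rst = '1' then
        chain <= (others => '1');           -- assert immediately
      elsif rising_edge(clk) then
        chain <= chain(1 downto 0) & '0';   -- release propagates over three clocks
      end if;
    end process;

    sync_rst <= chain(2);
  end architecture;

Instantiate it once at the top level and fan sync_rst out to the logic that needs a clean release, rather than re-coding the chain in each receiver.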

4. Overuse of reset, without thinking about just which signals really need to be 'reset'. This is
somewhat related to #3 and is also a function of the designer. Some feel
that every blasted flip flop needs to be reset...with no reason that can be
traced back to the specification for what the board is supposed to do, it's
just something 'they always do'. Inside an FPGA this may not matter much
since we're implicitly trusting the FPGA vendors to distribute a noise free
signal that we can use for the async reset, but on a board this can lead to
distributing 'reset' to a whole bunch of devices...which just gives that
signal much more opportunity to pick up the noise mentioned in #3. If
you're lucky, the part that gets the real crappy, noisy reset signal is the
one where you look at the function and realize that no, nothing in here
'really' needs to get reset when the 'reset' signal comes in. At worst
though, you see that yes the reset is needed, and you may start band-aiding
stuff on to the board to get rid of the noise or filter it digitally inside
the device if you can, etc. Bottom line though is that if more (some?)
thought had been put in up front, the reset signal wouldn't have been
distributed with such wild abandon in the first place.

This is not a problem when you use the dedicated reset net.
I agree, but I was referring more to the reset signal distribution on a
board rather than inside an FPGA.

Even though there are FFs that do not need a reset, it does not hurt to put
the entire device in a known state every time.
OK, it doesn't 'hurt', but it doesn't 'help' either in the sense that
both approaches would meet the exact same requirements of the
functional specification for that part.

It is not hard to miss a FF that needs to be reset otherwise.
Inside the FPGA it doesn't matter since if you discover something that
you now realize needs to be reset you re-route and get a new file. Not
routing it to a part on a board and then discovering you need it is a
bit more of an issue. Resolving that issue by routing reset to every
part and then using it asynchronously is where problems have come up
when there are a lot of parts on the board.

Personally I think the noise issue is a red herring.
If it's a red herring then I can safely say that I have slain several
red herrings over my career...but actually not many of late....not
since a certain couple of designers moved on to greener pastures to be
brutally honest.

If you have noise
problems on the board, changing your reset to sync will not help in
general. You would be much better off designing a board so it does not
have noise problems.
Maybe. But remember the scenario when you're brought in to fix a
problem with an existing board that you trace back to some issue with
reset. In that situation, a programmable logic change is more likely to be
the more cost-effective solution.

KJ
 
KJ wrote:
rickman wrote:
The fanout of an async reset in an FPGA is not an issue because the
signal is a dedicated net.
My point was that if timing is not met due to the large fanout, that
the typical fitter will allow for the fanout to be limited by the user
if necessary. But to directly answer the original question, 'no' I
haven't had reset signal fanout as a problem but if I did I know I
could fix it by limiting the fanout on the fitter side without having
to change the source code. But I also tend to reset only those things
that really need resetting which, by itself, cuts down on the fanout as
well.
I don't know exactly what you mean by fanout. If a sync reset has to
go to 100 FFs, then there is nothing you can do to tell the fitter to
change that. The async reset is free, or actually already paid for, so
if it does the job why not use it?


That is what I addressed above. Whether the circuit will malfunction
depends on the circuit as well as the inputs present. It is often not
hard to assure that one or the other prevents the circuit from changing
any state while the reset is released.

But simply synchronizing the reset in the first place will do that as
well...two different approaches to the problem, each equally valid.
Both valid, but typically I find the async reset takes less effort and
resources. Only a small portion of my typical design has to be
controlled coming out of reset.


Agreed, but one can also view these locally generated resets as simply
synchronized versions of the original reset. In fact, the synthesizer
would probably do just that seeing that you have (for example) 4 places
throughout the design where you've generated a local reset which is
simply the raw reset signal brought into a flip flop chain (I think
that's what you're describing). So it would take those four instances
and probably generate a single shift chain and local reset signal to
send to those 4 places. So all you've really done is written the
source code for the local reset 3 more times than is needed. Had you
taken the view that the reset input to those 4 places must be a
synchronized reset signal in the first place you probably would've
written the reset shift chain logic one time at a top level and
connected it up to those four inputs yourself and not written it on the
receiver side.
Yes, that is exactly how I think of it, a local sync'd reset. Putting
it where it is needed is both very clear and saves resources. I never
use this in place of the async reset, but rather to supplement it for
synchronization. Much of the logic has to be reset, but very little of
it has to be synchronously released from reset.


This is not a problem when you use the dedicated reset net.
I agree, but I was referring more to the reset signal distribution on a
board rather than inside an FPGA.
I understand, but noise still can upset a sync reset. This is just not
a workable solution to noise.


Even though there are FFs that do not need a reset, it does not hurt to put
the entire device in a known state every time.
OK, it doesn't 'hurt', but it doesn't 'help' either in the sense that
both approaches would meet the exact same requirements of the
functional specification for that part.
I don't agree. By globally resetting the device, you have handled all
FFs so that if your requirement misses one, you don't find out about it
after the unit is in the field.


It is not hard to miss a FF that needs to be reset otherwise.
Inside the FPGA it doesn't matter since if you discover something that
you now realize needs to be reset you re-route and get a new file. Not
routing it to a part on a board and then discovering you need it is a
bit more of an issue. Resolving that issue by routing reset to every
part and then using it asynchronously is where problems have come up
when there are a lot of parts on the board.
The question is when do you find out about the missing reset? It is
easy for this sort of thing to slip totally through testing and only
show up in the users' hands.


Personally I think the noise issue is a red herring.
If it's a red herring then I can safely say that I have slain several
red herrings over my career...but actually not many of late....not
since a certain couple of designers moved on to greener pastures to be
brutally honest.
I assume you mean board designers who were not producing quiet boards?


If you have noise
problems on the board, changing your reset to sync will not help in
general. You would be much better off designing a board so it does not
have noise problems.
Maybe. But remember the scenario when you're brought in to fix a
problem with an existing board that you trace back to some issue with
reset. In that situation, a programmable logic change is more likely to be
the more cost-effective solution.
I am in a fairly long thread in comp.arch.embedded about how to design
boards so that you don't have SI and EMI issues. I think this sort of
problem should be dealt with before you make the board, not after it is
in the field. Too many engineers learn to cover their butts rather
than to produce good designs. I am tired of working that way and not
really knowing if my design will work before it is shipped. The one
universal rule I learned very early on is that you can not prove a
product works correctly by testing. It has to be designed to work
correctly by using design methods based on understanding what you are
doing. I have never seen a board noise issue that could be fixed by an
FPGA design change.
 
rickman wrote:
KJ wrote:
rickman wrote:
I don't know exactly what you mean by fanout. If a sync reset has to
go to 100 FFs, then there is nothing you can do to tell the fitter to
change that.
Yes you can. If for example, the timing analysis failed because of
reset then you can tell the fitter to limit fanout to say, 20. Then
what the fitter would do is replicate the flip flop that generates the
reset signal so that there are 5 of them and distribute those 5
(logically identical) resets to those 100 loads.
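
As a concrete (vendor-specific) illustration, here is what that might look like with a Synplify-style syn_maxfan attribute; the attribute name and the value of 20 are tool dependent and invented for the example, so treat this as a sketch:

  -- Sketch only: a registered reset with a fanout cap the fitter can act on.
  library ieee;
  use ieee.std_logic_1164.all;

  entity reset_reg is
    port (
      clk     : in  std_logic;
      rst_in  : in  std_logic;
      rst_out : out std_logic);
  end entity;

  architecture rtl of reset_reg is
    signal rst_q : std_logic := '1';

    attribute syn_maxfan : integer;
    attribute syn_maxfan of rst_q : signal is 20;  -- tool may replicate rst_q to meet this
  begin
    process (clk)
    begin
      if rising_edge(clk) then
        rst_q <= rst_in;   -- registered reset driving many loads
      end if;
    end process;

    rst_out <= rst_q;
  end architecture;

Most tools have an equivalent knob (check your synthesis manual); the point is only that the cap lives in an attribute or constraint, not in the source logic.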

We can debate the extra resource usage of those 4 extra flops or that
maybe there wouldn't have been 100 in the first place, but I think
we've both made our points already.

The async reset is free, or actually already paid for, so
if it does the job why not use it?
At what point do you want to find out that the answer to the question
"if it does the job..." turns out to be "No, it doesn't do the job"
because the designer of some hunk of code that you're integrating in
didn't pay as close attention to resets as they should have and that
the way that the code 'used' the reset, while implying it could be
asynchronous really was not the case and that it needed to be
synchronous after all? (Either that or 'fix' the errant hunk of code
of course).

But simply synchronizing the reset in the first place will do that as
well...two different approaches to the problem, each equally valid.

Both valid, but typically I find the async reset takes less effort and
resources. Only a small portion of my typical design has to be
controlled coming out of reset.
And unless that small portion is actually 'zero' you'll need some
synchronizer somewhere. In that case, I've found that the resource
difference is negligible or non-existent. I'll accept that you may
have seen differences and I don't want to get into the nitty gritty, but
I'll bet that those differences that you saw were pretty small as well.
If not, it would be interesting to know what you attribute the large
difference to.

As for effort, the only effort I see in either case is the coding which
is identical. It's just a question of where you physically put the "if
(Reset = '1) then"....or is there some other effort that you mean?

This is not a problem when you use the dedicated reset net.
I agree, but I was referring more to the reset signal distribution on a
board rather than inside an FPGA.

I understand, but noise still can upset a sync reset. This is just not
a workable solution to noise.
I'm not sure what solution you're referring to. All I'm saying is that
use of a synchronous reset is less susceptible to a noise issue than an
asynchronous one because it requires the noise to be somewhat
coincident with the clock in order for it to have any effect. On a
given board design though that coincidence will tend to either be near
0 or near 100%....but those near 0 ones don't need to be fixed because
they're not broken if used synchronously.

Even though there are FFs that do not need a reset, it does not hurt to put
the entire device in a known state every time.
OK, it doesn't 'hurt', but it doesn't 'help' either in the sense that
both approaches would meet the exact same requirements of the
functional specification for that part.

I don't agree. By globally resetting the device, you have handled all
FFs so that if your requirement misses one, you don't find out about it
after the unit is in the field.
Only if your requirement is that the flip flop be set to the state that
you happened to have coded for it and not the other state. In any
case, it's not sporting to say that one design approach is better
because it has a chance that it just happened to code correctly for a
missing requirement. Actual reset signal requirements are usually
pretty benign and in many cases NOT coding it as a matter of course
could lead one to finding this missing requirement earlier....during
simulation. The scenario I'm thinking of here is that OK, the
functional requirements have an as-yet-unidentified reset state. Based
on that I code the design and do not do anything to signal 'ABC' as a
result of reset. During simulation I find that I just can't get signal
'ABC' into the proper state (since it is an unknown at the end of
reset) and that I need to because the logic tree that it feeds into
requires 'ABC' to be in the proper state. In that situation the
simulation has immediately hit on to the missing functional requirement
and you can investigate, whereas coding to a specific value you have
the chance of getting it right or not and not finding out until product
is in the field. Starting with 'U' states in simulation and seeing
your system simulation model drive the 'U' out as a result of signals
other than 'reset' is a good indicator, I've found.
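
A tiny illustration of that scenario (names invented; abc, req and grant are assumed std_logic signals): because 'abc' is never reset it starts at 'U' in simulation, and the logic fed by it advertises the missing requirement immediately:

  process (clk)
  begin
    if rising_edge(clk) then
      if req = '1' then
        abc <= not abc;       -- 'U' stays 'U': the toggle never recovers on its own
      end if;
      grant <= req and abc;   -- downstream logic shows 'U'/'X', flagging the missing reset
    end if;
  end process;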

It is not hard to miss a FF that needs to be reset otherwise.
Inside the FPGA it doesn't matter since if you discover something that
you now realize needs to be reset you re-route and get a new file. Not
routing it to a part on a board and then discovering you need it is a
bit more of an issue. Resolving that issue by routing reset to every
part and then using it asynchronously is where problems have come up
when there are a lot of parts on the board.

The question is when do you find out about the missing reset? It is
easy for this sort of thing to slip totally through testing and only
show up in the users' hands.

Simulation and the 'U' value in the std_logic type are, I've found, the key
to getting all initialization issues properly identified really
early on, long before prototypes.

Personally I think the noise issue is a red herring.
If it's a red herring then I can safely say that I have slain several
red herrings over my career...but actually not many of late....not
since a certain couple of designers moved on to greener pastures to be
brutally honest.

I assume you mean board designers who were not producing quiet boards?

And that then had problems that needed to be fixed.

If you have noise
problems on the board, changing your reset to sync will not help in
general. You would be much better off designing a board so it does not
have noise problems.
Maybe. But remember the scenario when you're brought in to fix a
problem with an existing board that you trace back to some issue with
reset. In that situation, a programmable logic change is more likely to be
the more cost-effective solution.

I am in a fairly long thread in comp.arch.embedded about how to design
boards so that you don't have SI and EMI issues. I think this sort of
problem should be dealt with before you make the board, not after it is
in the field.
Totally agree. But being realistic here, if you DO have boards out in
the field with this problem, there is also the issue of what is the
cost effective way to fix the problem from the perspective of both your
company and your customer?

Too many engineers learn to cover their butts rather
than to produce good designs. I am tired of working that way and not
really knowing if my design will work before it is shipped. The one
universal rule I learned very early on is that you can not prove a
product works correctly by testing.
Agreed.

It has to be designed to work
correctly by using design methods based on understanding what you are
doing. I have never seen a board noise issue that could be fixed by an
FPGA design change.
Here's a hypothetical one (but not far from what I've seen) for you
then. You've got a 'blip' on reset where it gets above threshold and
that lasts for...maybe 1 ns at the receiver. You trace it down and
find out exactly what output switching condition is causing the blip to
happen. You can also characterize and analyze it to say that it will
never be able to couple and cause this blip to exist for more than 2
ns. On the receiver you have a clocked device that receives this reset
signal.

The 'proper' solution of course is to re-route the board to get the
reset away from the noise initiator, guard it appropriately,
etc.....the 'soft' design change is to change the code in the receiving
device to ignore resets that last for only two clocks or less (or
whatever works for you). Granted, the reset response of the device has
been degraded (by that clock cycle or two) but in many cases, that's OK
as well. You need to investigate it of course to validate but under
the right circumstances it would work just as flawlessly as the PCB
re-route.
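
A sketch of what that 'soft' fix might look like, with invented names and an assumed active-high reset; asserting the output only after three consecutive samples means a 1-2 ns blip can never get through, at the cost of a couple of clocks of added reset latency:

  -- Hypothetical reset deglitcher: ignore reset pulses of two clocks or less.
  library ieee;
  use ieee.std_logic_1164.all;

  entity reset_filter is
    port (
      clk         : in  std_logic;
      noisy_reset : in  std_logic;   -- reset pin that can carry short coupled glitches
      clean_reset : out std_logic);  -- asserts only after 3 consecutive samples high
  end entity;

  architecture rtl of reset_filter is
    signal shift : std_logic_vector(2 downto 0) := (others => '0');
  begin
    process (clk)
    begin
      if rising_edge(clk) then
        shift <= shift(1 downto 0) & noisy_reset;  -- also re-times the asynchronous pin
      end if;
    end process;

    -- Assert only when all three samples agree; a 1-2 ns blip never makes it through.
    clean_reset <= shift(0) and shift(1) and shift(2);
  end architecture;

As noted above, the reset response is degraded by those extra clocks, so this only works where that latency is acceptable.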

The point being that just because a solution does not tackle the root
cause does not necessarily imply that it is in any way less robust.
And I'll also accept that in some (possibly many) situations there may
be no 'soft' solution...if you'll also accept that in some (possibly
many) situations there really might be one.

Now, you've got "N" boards in the field. What is the 'best' solution,
not only from the perspective of your company (presumably the 'soft'
update is easier to distribute) but from your customers as well (who
would have down time to swap out the PCBA....oops, that board is in a
deep sea sensor? On the way to Mars? Inside average Joe user's PC?)

KJ
 
Eli Bendersky wrote:

I also try to avoid variables for another reason (in addition to the
ones you stated). Somehow, when variables are used I can't be 100% sure
if the resulting code is synthesizable, because it can turn out not to
be.
If you mean that a variable does not always infer a register I agree.
If you mean that synthesis does not always produce a netlist that
simulates the same as the code, I disagree.

Additionally, since I do use signals, variables create the mixup of
"update now" and "update later" statements which make the process more
difficult to understand. With signals only it's all "update later".
I agree, and this is exactly why
I do not declare any signals for synthesis.

-- Mike Treseler
 
KJ wrote:
...
The drawback of signals is that they take longer simulation time...wasted
time too. I'm trying to resurrect the test code that I had comparing
use of variables versus signals but I seem to remember about a 10% hit
for signals...
I would be interested in whether anyone has theories on why variables
would simulate faster than signals. And whether this behavior has been
seen on different simulators, or only Modelsim.
 
KJ wrote:
5. There was also a post either here or in comp.lang.vhdl in the past couple
months that talked about how using the generally listed template can result
in gated clocks getting synthesized when you have some signals that you want
reset, and other signals that you don't. Being in the same process and all,
the original poster found that gated clocks were being synthesized in order
to implement this logic. The correct form of the template (that rarely gets
used by anyone posting to either this group or the vhdl group) is of the
form
process(clk, reset)
begin
  if rising_edge(clk) then
    s1 <= Something;
    s2 <= Something_else;
  end if;
  if (reset = '1') then
    s1 <= '0';
    -- s2 does not need to be reset
  end if;
end process;

Again, the scenario here is that you have
- More than one signal being assigned in this process
- At least one of those signals is not supposed to change as a result of
reset (either this is by intent, or by unintentionally forgetting to put the
reset equation)

Depending on the synthesis tool, this could result in a gated clock getting
generated as the clock to signal 's2' in the above example.

KJ
KJ,

I may be the previous poster you are speaking of...

The standard template with "if reset then... elsif rising_edge(clk)
then ..." will not cause a gated clock, but rather a clock enabled
register, disabled during reset, for those signals not reset in the
reset clause. This is also independent of whether reset is coded as a
synchronous or an asynchronous input (because of the elsif). The
template you used above would allow the normal clocked statements to
execute, and then override those signals that are reset, leaving the
unreset ones to retain their normal clocked behavior, thus avoiding the
need to disable them during reset.
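
For reference, the 'standard' template being contrasted here looks like this (same placeholder signals as KJ's version above). Because s2 is only ever assigned inside the elsif branch, it must hold its value while reset is asserted, and that hold is what synthesis implements as the clock enable Andy describes:

  process (clk, reset)
  begin
    if reset = '1' then
      s1 <= '0';
      -- s2 intentionally left out of the reset clause
    elsif rising_edge(clk) then
      s1 <= Something;
      s2 <= Something_else;
    end if;
  end process;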

Other comments on this thread:

If one disables all retiming and other sequential optimizations, then
there is definite merit in a descriptive style that explicitly
describes combinatorial behavior separately from registered behavior
(i.e. combinatorial processes or concurrent statements separate from
clocked processes). But once retiming, etc. are enabled, all bets are
off. In those cases, I believe one is better off focusing on the
behavioral (temporal and logical) description and getting it right, and
not paying so much attention to specific gates and registers which will
not exist in the final result anyway. Since I enable retiming by
default, I use single, clocked processes by default as well.

One aspect that has not been touched upon is data scoping. One
convenient aspect of using variables is that their scope limits their
visibility to within the same process. The comment about "related"
functions being described in the same process is important in this
aspect. There is no need for unrelated functions to have visibility to
unrelated signals. Within "one big process" for the whole architecture,
scoping can be implemented with blocks, functions, or procedures
declared within the process to create islands of related functionality,
with limited visibility. I generally prefer to separate unrelated
functions to separate processes, but all my processes are clocked.

State variables are one such scoping application. I generally don't
want any process but the state machine process itself to have any idea
of what states I am using, and what they mean (the concept of
"information hiding" comes to mind). If I need something external to
happen when the state machine is in a certain state, I do it from
within the state machine process, either by handling it directly (e.g.
adding one to a count), or "exporting" a signal to indicate when it
should happen. The same effect can be accomplished with local signal
declarations inside a block statement that contains the combinatorial
next state process, the output process (if applicable), and the state
register process.

Andy
 
"Martin Gagnon" <martin@yanos.No.SpAm.org> wrote in message
news:slrneeu6qa.8rj.martin@parrot.videotron.ca...
txgen_state_machine_proc:
process(clk, reset_n)
begin
  if reset_n = '0' then
    prev_state_buf <= st_idle;
    cur_state_buf  <= st_idle;

  elsif rising_edge(clk) then
    prev_state_buf <= cur_state_buf;

    case cur_state_buf is
      when st_idle =>
        if sync = '1' then
          cur_state_buf <= st_gotsync;
        else
          cur_state_buf <= cur_state_buf;
        end if;
...
You have a lot of unnecessary "else" clauses in your state machine code.
All you need is this:

...
case cur_state_buf is
  when st_idle =>
    if sync = '1' then
      cur_state_buf <= st_gotsync;
    end if;
...
In other words, if cur_state_buf=st_idle and sync='0', for example,
cur_state_buf will keep its current value. You don't have to explicitly
reload it with the current value. Synthesis tools will automatically
insert the proper gating logic.

Charles Bailey
 
Variables simulate faster because there is no scheduling of a later
value update, as with signals (signal values do not actually update
until after the assigning process suspends). If the signal has
processes that are sensitive to it (i.e. separate combinatorial and
registered processes), then there is the process invocation overhead as
well.

Most modern simulators also merge all processes that are sensitive to
the same signal(s), to avoid the duplicate overhead of separate process
invocations. Combinatorial processes, because of their widely varying
sensitivity lists, foil this optimization.

By using only clocked processes with variables, one can write
synthesizable RTL that simulates at speeds approaching that of
cycle-based code on cycle accurate simulators.
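
A small illustration of that scheduling difference (invented names; assumes ieee.numeric_std and signals s_count and s_snapshot : unsigned(7 downto 0) in the architecture):

  process (clk)
    variable v_count : unsigned(7 downto 0) := (others => '0');
  begin
    if rising_edge(clk) then
      v_count    := v_count + 1;  -- updated in place, nothing is scheduled
      s_count    <= s_count + 1;  -- new value only visible after the process suspends
      s_snapshot <= v_count;      -- already sees the incremented variable on this same edge
    end if;
  end process;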

Andy


 
Andy wrote:

I may be the previous poster you are speaking of...

The standard template with "if reset then... elsif rising_edge(clk)
then ..." will not cause a gated clock, but rather a clock enabled
register, disabled during reset, for those signals not reset in the
reset clause. This is also independent of whether reset is coded as a
synchronous or an asynchronous input (because of the elsif).
Exactly. Synthesis will go through asynchronous contortions
to *prevent* a register from being reset.
This is why I reset all registers the same way
and why I don't touch my process template
between _begin_ and _end_.

But once retiming, etc. are enabled, all bets are
off. In those cases, I believe one is better off focusing on the
behavioral (temporal and logical) description and getting it right, and
not paying so much attention to specific gates and registers which will
not exist in the final result anyway.
Well said.

-- Mike Treseler
 
Andy wrote:
KJ wrote:
5. There was also a post either here or in comp.lang.vhdl in the past couple
months that talked about how using the generally listed template can result
in gated clocks
snip
KJ,

I may be the previous poster you are speaking of...

The standard template with "if reset then... elsif rising_edge(clk)
then ..." will not cause a gated clock, but rather a clock enabled
register, disabled during reset, for those signals not reset in the
reset clause.
snip
No, you weren't the one Andy although you and I did discuss resets on
that thread as well. The one I'm referring to is from June 15 in
comp.lang.vhdl called "alternate synchronous process template" started
by "Jens" (all that just in case the link below doesn't work)
http://groups.google.com/group/comp.lang.vhdl/browse_frm/thread/77006ae7297b6e86/?hl=en#

At the time, nobody seemed to dispute Jens's claim that the gated clock
could be created....I dunno, don't use them async resets ;)

Other comments on this thread:

If one disables all retiming and other sequential optimizations, then
there is definite merit in a descriptive style that explicitly
describes combinatorial behavior separately from registered behavior
(i.e. combinatorial processes or concurrent statements separate from
clocked processes).
I'm not sure what merit you see in that. I'm describing the
functionality of the entity. If there is some need for what amounts to
a combinatorial function of the current state I'll do it with a
concurrent statement whereas you and Mike T will do it with a variable.
In either case, we would be trying to implement the same function
whether optimizations were on or off.

But once retiming, etc. are enabled, all bets are
off. In those cases, I believe one is better off focusing on the
behavioral (temporal and logical) description and getting it right, and
not paying so much attention to specific gates and registers which will
not exist in the final result anyway.
The "focusing on the behavioral (temporal and logical) description..."
is what I'm focused on as well. I also couldn't care less about
"specific gates and registers which will
not exist in the final result anyway". I'm just trying to get the
function and timing to meet the goal, if it all gets mushed together in
the synthesis process that's fine...that's what I pay for the tool to
do....

Either that or I'm missing what your point is, I've been known to do
that.

One aspect that has not been touched upon is data scoping. One
convenient aspect of using variables is that their scope limits their
visibility to within the same process. The comment about "related"
functions being described in the same process is important in this
aspect. There is no need for unrelated functions to have visibility to
unrelated signals.
I'll agree but add that that is somewhat of a 'religious' statement.
If taken to the other extreme yes you have a huge mass of only global
signals (and I'm not advocating that) but if one breaks the problem
down into manageable sized entities you don't (or should I say, I
don't) tend to have hundreds of signals in the architecture either.
It's a manageable size, say from 0 to 2 dozen as a rough guess.

Within "one big process" for the whole architecture,
scoping can be implemented with blocks, functions, or procedures
declared within the process to create islands of related functionality,
This wouldn't address the issue I brought up about the use of
Modelsim's Dataflow window as a debug aid, but OK....my islands of
related functionality are the multiple processes and the concurrent
statements.

with limited visibility. I generally prefer to separate unrelated
functions to separate processes, but all my processes are clocked.
As do I.

State variables are one such scoping application. I generally don't
want any process but the state machine process itself to have any idea
of what states I am using, and what they mean (the concept of
"information hiding" comes to mind).
I would consider that to be a 'religion' thing. I wouldn't draw the
somewhat arbitrary boundary, I consider all of the logic implemented in
an entity to be closely enough related that they can at least talk
amongst themselves if it is helpful to get the overall function
implemented. Not really disagreeing with you, just saying that there
is no reason that relates back to the functional spec that would
justify this hiding so I wouldn't necessarily break them apart unless
the 'process fits on a screen' fuzzy rule starts kicking in.

If I need something external to
happen when the state machine is in a certain state, I do it from
within the state machine process, either by handling it directly (e.g.
adding one to a count), or "exporting" a signal to indicate when it
should happen.
And that tends to muddy the waters somewhat for someone following the
code since they can't perceive the interaction between the state
machine and the outputs all in one fell swoop the way they could if it was
put together (and it didn't violate the 'process fitting on a screen'
fuzzy rule).

Good points, I don't necessarily disagree with the idea of local
scoping and information hiding as a general guiding principle but it
can be taken as dogma too that results in hiding things from those who
have a need to know (i.e. those other statements, processes, etc. that
are all within the same entity/architecture).

If you view all of those statements and processes in an entity as being
part of the same 'team' doing their little bit to get the overall
function of the entity implemented none is really more important than
the other, they all live and die together. By that rather crude sports
analogy the idea of 'information hiding' should be taken with some
suspicion. And yes, I realize that VHDL has nothing to do with sports
just thought I'd toss out an unrelated analogy to break up the day.

But which approach one takes is definitely a function of just how 'big'
the function is being implemented. One with hundreds of signals would
be far worse than multiple processes with local variables all scoped
properly. But if you have hundreds of signals I'll bet you have
thousands of lines of code all within one architecture, and I'll bet it
would be a good candidate for some refactoring and breaking it down
into multiple subentities that could be understood individually instead
of only as some large collective.

KJ
 
Wow, I never even noticed he said "gated clock" in the OP of that other
thread. I have never seen that, just the clock-disabled registers
(which creates a problem when the reset asynchronously asserts, all
mine synchronously deassert anyway).

The synthesis tool is not just trying to keep those unreset registers
from resetting, it is keeping them from doing anything else while the
other registers are reset, which is exactly the way the code simulates,
because of the elsif. Avoiding the elsif by using a parallel if
statement (whether synchronous or asynchronous) at the end avoids the
clock-disables. The main place where I have run into this is when
inferring memories from arrays. The array cannot be reset, otherwise
you get a slew of registers. But if it is in a process with other
registers that do get reset, then that creates a problem, which is
solved by putting the reset clause in parallel, at the end.
Occasionally, resets cause optimization or routing problems when I'm
trying to squeeze the most performance from a design, and I'll remove
the reset from those registers as well if it is not needed. My general
preference is to reset everything though, and I generally use the
traditional form since it will give me a warning if something is not
reset.
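
A sketch of the memory case being described (names invented, numeric_std assumed): the array is written in the clocked section and deliberately left out of the reset clause, while the ordinary register still gets reset by the parallel 'if' at the end:

  -- Assumes: type ram_t is array (0 to 255) of std_logic_vector(7 downto 0);
  --          signal ram : ram_t;  signals we, reset, wr_seen : std_logic;
  --          signal addr : unsigned(7 downto 0);
  --          signals wr_data, rd_data : std_logic_vector(7 downto 0).
  process (clk, reset)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(addr)) <= wr_data;   -- the array itself is never reset
      end if;
      rd_data <= ram(to_integer(addr));     -- registered read
      wr_seen <= we;
    end if;

    -- Parallel reset clause at the end: resets only what needs it, and puts
    -- neither a clock enable nor a reset fanout on the RAM.
    if reset = '1' then
      wr_seen <= '0';
    end if;
  end process;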

I don't take data scoping to a religious level, but I do keep it in
mind, even below the architecture level.

When coding state machines and their outputs, I prefer to see
everything associated with one state in one place. If it is not there,
it does not have visibility of the state anyway, the way I code it.
That way if I change my mind about the organization or naming of the
states, the effects of such a change are limited to one place and one
place only. It is more for maintenance than anything else, to try to
limit the extent to which all those signals are interwoven, and
impossible to untangle. VHDL makes it relatively easy to see what all
the inputs are to a function, but finding all the places where a signal
goes is another matter. That's what the text search function is for...

As to when to isolate different processes in a separate
entity/architecture, that is a touchy-feely type of decision. I
usually know it when I see it, but trying to describe a set of rules
for it is much more difficult than just doing it. Because my coding
styles are generally more compact than those with separate processes
for combo and registered logic, I generally get more in an architecture
before it gets too big. So a lightweight scoping mechanism is useful to
deal with more complexity within a given architecture. Let's just say
it helps keep a borderline too-complex description from overflowing
into multiple entity/architectures.

I like your "what fits on a screen" standard for processes. That seems
to work well for me too. That could be extended to functions and
procedures too, although mine are not usually anywhere near that long,
and they are usually defined within the process anyway.

My point about merits of separate combinatorial and clocked processes
is that most proponents of that style like the fact that they can
easily visualize what is gates and what is registers. I try to
encourage them to lift their visual ceilings (and floors, to some
extent) and focus on behavior since, especially with retiming and other
sequential optimizations, their original description will have little
in common with the synthesis output, except for the behavior which is
often obscured by the separation of registers from gates in the first
place. The same argument applies to using variables for registers
and/or combinatorial logic.

Thanks for the ideas...

Andy


 
Mike Treseler wrote:
Eli Bendersky wrote:

I also try to avoid variables for another reason (in addition to the
ones you stated). Somehow, when variables are used I can't be 100% sure
if the resulting code is synthesizable, because it can turn out not to
be.

If you mean that a variable does not always infer a register I agree.
If you mean that synthesis does not always produce a netlist that
simulates the same as the code, I disagree.
Is all code using variables always synthesizable, and can you tell by a
single look how many clock cycles the update of all values takes? I'd
really appreciate a simple example or two.
Thanks in advance
 
In VHDL, a variable is a more "abstract" construct. Unlike a
signal, which is mapped to a wire or a wire with memory (i.e., a latch
or FF), a variable has no direct hardware counterpart; the
synthesized circuit depends on the context in which the variable is
used.

The variable in VHDL is "static", which means that its value will
be kept between the process invocations. This implies a variable may
need to keep its previous value. Thus, a variable infers a latch or an
FF if it is "used before assigned a value" in a process and infers
a combinational circuit if it is "assigned a value before used".
In this respect, a variable is usually synthesizable (a small sketch of both
cases follows the list below). I personally use variables in a very
restricted way:
- don't use variable to infer memory
- avoid self reference (e.g., n := n+1).
- use it as shorthand for a function.
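
A minimal sketch of those two inference rules (invented names; a, b, c, d, q1 and q2 are assumed std_logic signals in the enclosing architecture):

  process (clk)
    variable tmp  : std_logic;   -- assigned before used -> combinational shorthand
    variable prev : std_logic;   -- used before assigned -> infers a register
  begin
    if rising_edge(clk) then
      tmp := a and b;
      q1  <= tmp or c;   -- q1 is a register fed by (a and b) or c

      q2   <= prev;      -- reads last clock's value of 'prev'
      prev := d;         -- then updates it; 'prev' becomes a flip-flop
    end if;
  end process;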

Although I don't do it, this approach can even be used in a clocked
process to obtain combinational, unbuffered outputs (see my previous
post on a 1-process FSM example).

In synthesis, the problem is normally the abuse of sequential
statements, rather than the use of variables. I have seen people trying
to convert a C segment into a VHDL process (you can have variables, for
loop, while loop, if, case, and even break inside a process) and
expecting synthesis software to figure out everything.

My 2 cents.

Mike G
 
Martin Gagnon wrote:
<snip>
Hi.. I've read this pdf and it looks very interesting.. it shows how many
different types of state machine implementations there are, etc.. But the way I code
my state machine is different from all of them and I don't know if it's
good and I'm not sure to which one mine is equivalent.
I'd classify it with the 'one process' state machine folks since it
doesn't involve a combinatorial process to compute the next state
followed by a synchronous process to transform next state into current
state. That's usually the litmus test between the 'one process' versus
'two process' folks.


Here's one of my state machine examples.
snip
txgen_state_machine_proc:
process(clk, reset_n)
begin
  if reset_n = '0' then
    prev_state_buf <= st_idle;
    cur_state_buf  <= st_idle;

  elsif rising_edge(clk) then
    prev_state_buf <= cur_state_buf;

    case cur_state_buf is
      when st_idle =>
        if sync = '1' then
          cur_state_buf <= st_gotsync;
        else
          cur_state_buf <= cur_state_buf;
        end if;
snip
    end case;
  end if;
end process;
snip
what do you think about the way I do my state machine ?
Well since you asked...

1. Since the reset input into a state machine almost always needs to be
synchronized I would lean towards using a synchronous reset template
(i.e. take reset out of the sensitivity list and move the "if (reset_n
= '0')" inside the "if rising_edge(clk) then". I almost hesitate to
bring this up since it's already been debated on this thread and
others, so I'll leave it as that's my 2 cents on the reset.

2. The "else cur_state_buf <= cur_state_buf;" construct that shows up
on every case is redundant and makes the overall source code roughly
twice as big as it would otherwise need to be.

Now, it could be argued that adding the else branch makes it 'clearer'
about what state the state machine goes to next but the following
counterarguments could be made as well.
- Read up some more on VHDL (not necessarily you, but the reader that
you're trying to make it 'clearer' to). If left undefined, a signal
will retain its current state. Add a comment if you need to if you're
trying to guide the new guy that doesn't realize this, but don't double
the code size.
- Even most software languages are defined this way (i.e. if not
explicitly assigned a new value, every variable retains its current
state).

Some other things to keep in mind...
- Question yourself about which is 'clearer', the 50 line process with
the else branches or the 25 line process without?
- Since the synthesizer will produce the exact same output whether you
have the redundant else branches or not, ask yourself why you are
explicitly writing lines of code (which have some non-zero probability
of having an error) that will get chucked out the window on step #1 of
synthesis? Is that a good use of time?

Don't read this as harsh criticism, just read it as the 2 cents of
input that you asked for (and that nobody til now responded to that I
can see).

KJ
 
Thanks for your 2 cents.. I like to have that kind of feedback.. It's
true.. I specified the else to be explicit because I was not
absolutely sure what the synthesizer would output in that case. But I think you
are right.. I will probably cut some explicit code on my next
projects..

I'm more concerned about my single process state machine with my
prev_state and current_state with everything clocked.. As opposed to what
is suggested in the PDF..

Thanks for your answer..

--
Martin
 
Eli Bendersky wrote:

Is all code using variables always synthesizable, and can you tell by a
single look how many clock cycles the update of all values takes?
The variables are updated every clock
but that "update" may be to keep the
same value.

I'd
really appreciate a simple example or two.
The advantages of a variable logic description
*increase* with complexity, so a persuasive
yet simple example is a challenge.


My favorite simple example is the
"clock enabled counters" source here:

http://home.comcast.net/~mike_treseler/

The focus is on updating values for simulation
rather than recipes for gates and flops.
The procedure "update_regs"
only describes value updates
required for the slow, medium and fast counts.
Note that I read carry bits and immediately clear
them without worrying about what that means in gates or flops.

Note in the RTL schematic view (object) that synthesis does
just fine working out how the carries and enables
work and where registers are not needed.
Also note that a process-per-block description using this view
would be more complicated than the example source.

-- Mike Treseler
 
I'm more concerned about my single process state machine with my
prev_state and current_state with everything clocked.. As opposed to what
is suggested in the PDF..
The two process folks (those advocating the combinatorial 'next' state
process followed by the clocked process) and the one process folks
(those advocating what you've posted, plus possible use of concurrent
statements for combinatorial logic if required) all do agree on one
point:

Either method will produce identical functioning code that will
synthesize to the exact same output design.

With at least grudging agreement, the two process folks will also have
to agree that the one process approach requires fewer lines of source
code entry.

Take those two points as the two great truths in the 'great debate' and
draw your own conclusions.

All the other stuff about setting outputs a clock cycle earlier or
later, localizing references to signals, sim time using variables or
signals, etc. is just more interesting talk and tips but should have
no effect on which overall approach you adopt.

KJ
 
mikegurche@yahoo.com wrote:
In synthesis, the problem is normally the abuse of sequential
statements, rather than the use of variables. I have seen people trying
to convert a C segment into a VHDL process (you can have variables, for
loop, while loop, if, case, and even break inside a process) and
expecting synthesis software to figure out everything.
Personally I think most problems in using HDLs in this way come not
directly from the way signals or variables are used, but rather from
the use of an HDL to describe the solution in an abstract way. I
nearly always design in terms of registers and "clouds" for the logic.
I get a feel for how large the design is and if I need to optimize at
this block diagram level. I can even get an idea of how complex the
logic part is by looking at the equations that describe it. Then I use
an HDL to "describe" the hardware rather than describing the
functionality and letting the tool decide what hardware to invoke.

If I know I want a register, I add the code that will infer a register.
If I need a certain logic, I can include those equations in the
register process or I can use combinatorial descriptions separately. I
never start writing the HDL before I have a clear understanding of what
the hardware should look like. To me the HDL is just the lowest level
description of a successive decomposition of the design. The HDL is
never used to "program" a solution. This seldom results in the types
of problems you are discussing.

Just for the record, I do use integer variables for memory or other
sequential logic like counters. Memories simulate much faster when
coded with integer variables. This is both because of the integer and
the variable, IIRC.
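
A sketch of what that can look like (names, width and depth invented; many, though not all, synthesis tools will infer a RAM from a variable array like this):

  -- Sketch: memory stored in an integer variable array, which is cheap to simulate.
  library ieee;
  use ieee.std_logic_1164.all;
  use ieee.numeric_std.all;

  entity int_ram is
    port (
      clk     : in  std_logic;
      we      : in  std_logic;
      addr    : in  unsigned(7 downto 0);
      wr_data : in  unsigned(7 downto 0);
      rd_data : out unsigned(7 downto 0));
  end entity;

  architecture rtl of int_ram is
    subtype word_t is integer range 0 to 255;
    type mem_t is array (0 to 255) of word_t;
  begin
    process (clk)
      variable mem : mem_t;   -- integer storage: no vector arithmetic in simulation
    begin
      if rising_edge(clk) then
        if we = '1' then
          mem(to_integer(addr)) := to_integer(wr_data);
        end if;
        rd_data <= to_unsigned(mem(to_integer(addr)), 8);  -- registered read
      end if;
    end process;
  end architecture;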
 
rickman wrote:
mikegurche@yahoo.com wrote:
In synthesis, the problem is normally the abuse of sequential
statements, rather than the use of variables. I have seen people trying
to convert a C segment into a VHDL process (you can have variables, for
loop, while loop, if, case, and even break inside a process) and
expecting synthesis software to figure out everything.

Personally I think most problems in using HDLs in this way come not
directly from the way signals or variables are used, but rather from
the use of an HDL to describe the solution in an abstract way. I
nearly always design in terms of registers and "clouds" for the logic.
I get a feel for how large the design is and if I need to optimize at
this block diagram level. I can even get an idea of how complex the
logic part is by looking at the equations that describe it. Then I use
an HDL to "describe" the hardware rather than describing the
functionality and letting the tool decide what hardware to invoke.
I agree with you completely. What I am trying to say is that a variable
may not be synthesizable if you write the code with a "C
mentality."

Mike G.
 
 
