Forking in One-Hot FSMs

Kevin Neilson
Having two bits hot in a one-hot FSM would normally be a bad thing. But
I was wondering if anybody does this purposely, in order to fork, which
might be a syntactically nicer way to have a concurrent FSM. This would
imply that multiple states in the FSM could be active at once. This
would be an example:

parameter STATE1=1, STATE2=2, STATE3=3, ... // state defs
casex (state)
....
if (state[STATE1]) begin
  if (condition)
    begin
      m             <= a*b;
      state[STATE3] <= 1;    // fork into 2 new states
      state[STATE4] <= 1;
      state[STATE1] <= 0;    // leave current state
    end
end
if (state[STATE3]) begin     // DSP48 Adder Stage
  p             <= m+c;
  state[STATE3] <= 0;        // this fork dies
end
if (state[STATE4]) begin
  m             <= a2*b2;
  state[STATE3] <= 1;        // fork into 2 new states
  state[STATE5] <= 1;
  state[STATE4] <= 0;        // leave current state
end

In this case I have a pipeline (as in a DSP48) which I can keep
continuously fed. A separate fork of the SM runs the pipeline. I can
turn on two one-hot bits (essentially ORing the states) to fork into
multiple states. One fork eventually kills itself. This might be nicer
than having a separate concurrent FSM. There may be a better syntax
that still allows a case statement. I just wondered if this is a common
or useful technique.
-Kevin
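
A self-contained sketch of the forking idea might look like the following. The module wrapper, clock, reset, and port widths are my additions for illustration; the state indices and the multiply-add pipeline follow the fragment above.

```verilog
// Sketch of a forking one-hot FSM: 'state' is one bit per state, and
// more than one bit may deliberately be hot at a time. Module shell,
// clk/rst, and signal widths are assumptions, not from the post.
module fork_fsm (
  input  wire        clk, rst, condition,
  input  wire [17:0] a, b, a2, b2,
  input  wire [35:0] c,
  output reg  [35:0] p
);
  localparam STATE1 = 1, STATE2 = 2, STATE3 = 3, STATE4 = 4, STATE5 = 5;
  reg [31:0] state;
  reg [35:0] m;

  always @(posedge clk) begin
    if (rst) begin
      state         <= 0;
      state[STATE1] <= 1;           // start in STATE1
    end else begin
      if (state[STATE1] && condition) begin
        m             <= a*b;
        state[STATE3] <= 1;         // fork into 2 new states
        state[STATE4] <= 1;
        state[STATE1] <= 0;         // leave current state
      end
      if (state[STATE3]) begin      // DSP48-style adder stage
        p             <= m + c;
        state[STATE3] <= 0;         // this fork dies
      end
      if (state[STATE4]) begin
        m             <= a2*b2;
        state[STATE3] <= 1;         // fork again to keep the pipe fed
        state[STATE5] <= 1;
        state[STATE4] <= 0;         // leave current state
      end
    end
  end
endmodule
```

Note that several branches can assign `state[STATE3]` in the same cycle; with nonblocking assignments the last assignment in source order wins, which is exactly what keeps a re-forked state alive.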
 
Kevin Neilson wrote:
parameter STATE1=1, STATE2=2, STATE3=3, ... // state defs
reg [31:0] state;
if (state[STATE1]) begin
  if (condition)
    begin
      m             <= a*b;
      state[STATE3] <= 1;    // fork into 2 new states
      state[STATE4] <= 1;
      state[STATE1] <= 0;    // leave current state
    end
end
if (state[STATE3]) begin     // DSP48 Adder Stage
  p             <= m+c;
  state[STATE3] <= 0;        // this fork dies
end
if (state[STATE4]) begin
  m             <= a2*b2;
  state[STATE3] <= 1;        // fork into 2 new states
  state[STATE5] <= 1;
  state[STATE4] <= 0;        // leave current state
end

Sorry; there was not supposed to be a case statement here.
 
Aiken wrote:
But why not combine these two states into one state and let that
state do the pipeline stuff?
Your coding may make your design slower, and it may not be implemented
as a state machine in the final design.

The example might not show this well, but you may want to fork from
several different states and the length of the fork, before it dies,
could be several states. So if I just have the single state, I would
have to manually branch out through the state tree and figure out all
states I could possibly be in while the "fork" would be operating and
then add the fork logic to all those states. If that makes sense.
Anyway, it's completely unmaintainable, because when you add a new state
to the machine you would have to figure out if the pipeline is supposed
to be full at that time and remember to add in the logic for that.
-Kevin
 
 
Kevin Neilson wrote:
Having two bits hot in a one-hot FSM would normally be a bad thing.
But I was wondering if anybody does this purposely, in order to fork,
which might be a syntactically nicer way to have a concurrent FSM.
DEC used that style of design in the PDP-16 Register Transfer Modules.
Possibly also in the control units of some of their asynchronous
processors such as the PDP-6 and KA10.
 
Brad Smallridge wrote:

Perhaps someone could suggest a better term than state
machine "forking"? And are there any guidelines on how
to code and debug pipelined architectures? I'm with Kevin,
it gets real messy, real soon.
There is no requirement that a process/block must update only
a single register named 'State'.

When I look at large, textbook style state machine
examples, like the ones in this thread, I imagine
a much simpler process that updates several smaller registers.
Maybe an input reg, output reg, a couple of counters
and a few well-named booleans.

-- Mike Treseler
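
Mike's suggestion of several small registers in one process might look something like this. All names, widths, and the placeholder datapath are illustrative only, not from any post in the thread.

```verilog
// Illustrative only: one clocked process updating a few small,
// well-named registers (a boolean and a counter) instead of a
// single large 'state' vector.
module small_regs_example (
  input  wire       clk, rst, start,
  input  wire [7:0] din,
  output reg  [7:0] dout,
  output reg        done
);
  reg       running;      // a well-named boolean instead of a state code
  reg [3:0] beat_count;   // a small counter does the sequencing

  always @(posedge clk) begin
    if (rst) begin
      running    <= 0;
      beat_count <= 0;
      done       <= 0;
    end else begin
      done <= 0;
      if (start && !running) begin
        running    <= 1;
        beat_count <= 0;
      end else if (running) begin
        beat_count <= beat_count + 1;
        dout       <= din + beat_count;  // placeholder datapath
        if (beat_count == 4'd9) begin    // ten beats, then finish
          running <= 0;
          done    <= 1;                  // one-cycle completion strobe
        end
      end
    end
  end
endmodule
```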
 
Eric Smith wrote:
Kevin Neilson wrote:
Having two bits hot in a one-hot FSM would normally be a bad thing.
But I was wondering if anybody does this purposely, in order to fork,
which might be a syntactically nicer way to have a concurrent FSM.

DEC used that style of design in the PDP-16 Register Transfer Modules.
Possibly also in the control units of some of their asynchronous
processors such as the PDP-6 and KA10.
That's interesting--I'm not even familiar with an "asynchronous
processor". What does that mean? -Kevin
 
On May 5, 12:13 pm, "Brad Smallridge" <bradsmallri...@dslextreme.com>
wrote:
You'll also find that changes (like switching the NoBL SRAM to DRAM as an
example) can be accommodated without having to change *everything*.

That has been on my mind because there is a DRAM on my board. Not only
will the DRAM require more cycles but perhaps too a varying number of
cycles depending on the sequentiality or randomness of the addressing.
Except for the most special case examples, DRAM access will be a
variable delay because of page changes and memory refresh.

Trying to design a state machine that is simply trying to *access*
memory for some algorithmic purpose would likely result in a difficult
to maintain design.

Designing a request/acknowledge interface to some other process or
entity (in this case the 'other' being a DRAM controller) results in a
much easier to maintain design.

Using the exact same interface signal functionality whether one is
talking to internal FPGA memory, NoBL or SDRAM or SPI results in a
design that can be reused, retargeted and improved upon if necessary.

Using the same signal naming functionality as an existing documented
specification (i.e. Avalon, Wishbone) allows others to (re)use your
design without getting bogged down in details that they are not
currently interested in and allows them (and you when you re-use the
design) to be more productive.

Figure out where you are and where you want to be in the design
productivity chain. The synthesis cost in terms of logic resource is
zero, the upfront learning cost will start to pay back in the form of
quicker debug and reusable designs.

Kevin Jennings
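
A minimal request/acknowledge client in the spirit KJ describes might look like the sketch below. The signal names are my own, loosely Avalon-flavored; nothing here is from the post itself.

```verilog
// Illustrative request/acknowledge client: the FSM issues a read and
// waits for the controller's ack, so the memory behind the interface
// (internal RAM, NoBL, SDRAM, SPI) can change without touching this side.
module mem_client (
  input  wire        clk, rst,
  input  wire        go,
  output reg         req,       // request strobe to the controller
  output reg  [15:0] addr,
  input  wire        ack,       // controller says data is valid
  input  wire [31:0] rdata,
  output reg  [31:0] result
);
  localparam IDLE = 0, WAIT_ACK = 1;
  reg state;

  always @(posedge clk) begin
    if (rst) begin
      state <= IDLE;
      req   <= 0;
    end else case (state)
      IDLE: if (go) begin
        req   <= 1;
        addr  <= 16'h0100;      // example address only
        state <= WAIT_ACK;
      end
      WAIT_ACK: if (ack) begin
        req    <= 0;
        result <= rdata;
        state  <= IDLE;
      end
    endcase
  end
endmodule
```

Kevin's reply below points out the cost of this style: every cycle spent in WAIT_ACK is memory bandwidth left on the table.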
 
KJ wrote:
On May 5, 12:13 pm, "Brad Smallridge" <bradsmallri...@dslextreme.com
wrote:
You'll also find that changes (like switching the NoBL SRAM to DRAM as an
example) can be accommodated without having to change *everything*.
....
Designing a request/acknowledge interface to some other process or
entity (in this case the 'other' being a DRAM controller) results in a
much easier to maintain design.

Using the exact same interface signal functionality whether one is
talking to internal FPGA memory, NoBL or SDRAM or SPI results in a
design that can be reused, retargeted and improved upon if necessary.
....
Kevin Jennings
This is a great example, because switching from one type of RAM to
another means you *do* have to change everything, if you want the
controller to be good. You can certainly modularize the code and make
concurrent SMs with handshaking, and this is easy to maintain. And a lot
of DRAM controllers are designed this way. But here is the problem:
while you are waiting around for acknowledges, you have just wasted a
bunch of memory bandwidth. If you want to make better use of your
bandwidth, you can't use handshaking. You have to start another burst
while one is in the pipe. You have to look ahead in the command FIFO to
see if the next request is going to be in the same row/bank to see if
you need to close the row during this burst and precharge or if you can
continue in the same open row in a different bank, etc. If I do all
that with handshaking, I'm frittering away cycles. And to do this in a
way that doesn't fritter away cycles with standard methodology means
everything is so tightly bound together that to change from SDRAM to
some other type of RAM means I have to tear up most of the design.

Another issue I came up with today in the design of my current SM is
that I updated a value x and then in the next cycle realized I wanted
the old value of x. But I hadn't really updated x; I had issued a
request that gets put into a matching delay line and then goes to a
concurrent FSM which then updates x. So even though I had "updated" x,
I could still use the old value for a few cycles and didn't need a
temporary storage register. Again, I can't just send the request to
update x and then wait for an ack because the SM has to keep on
trucking. This is confusing, and I'd like to have some sort of
methodology that would be as efficient as what I'm doing but somewhat
more abstract.
-Kevin
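
One way to read the x-update scheme above is as a request going through a matching delay line before a concurrent process finally writes x. The sketch below is my interpretation of that idea; the 3-cycle depth, names, and widths are all assumptions.

```verilog
// Hypothetical sketch: an update request to x travels through a 3-stage
// delay line, so the "old" x stays readable for 3 cycles after the
// request is issued -- no temporary storage register needed upstream.
module delayed_update #(parameter W = 8) (
  input  wire         clk,
  input  wire         req,      // request to update x
  input  wire [W-1:0] new_x,
  output reg  [W-1:0] x         // still holds the old value for 3 cycles
);
  reg [2:0]   req_d;            // request delay line
  reg [W-1:0] val_d0, val_d1, val_d2;  // matching value delay line

  always @(posedge clk) begin
    req_d  <= {req_d[1:0], req};
    val_d0 <= new_x;
    val_d1 <= val_d0;
    val_d2 <= val_d1;
    if (req_d[2]) x <= val_d2;  // x actually changes 3 cycles later
  end
endmodule
```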
 
Someone asked about state machines using encoding similar to one-hot but
with "forking", where multiple states may be active simultaneously,
and I wrote:
DEC used that style of design in the PDP-16 Register Transfer Modules.
Possibly also in the control units of some of their asynchronous
processors such as the PDP-6 and KA10.
Kevin Neilson wrote:
That's interesting--I'm not even familiar with an "asynchronous
processor". What does that mean? -Kevin
There's no central clock. At any given time, one particular "unit"
in the computer is active. When it completes its work, it sends a
pulse to the next unit that needs to do something, thus handing off
control.

In some situations, a unit might trigger two other units. Usually
in such a case, a later unit implements a "join" between the two paths,
by waiting for both to complete.

The logic implementing such a control system looks just like a flowchart.

There were quite a few asynchronous computers in the old days, but
the world settled on synchronous designs for various reasons. In recent
years there has been a resurgence of interest in asynchronous designs,
partly due to the possibility of power savings. There are still no
mainstream asynchronous processors, though.
 
Eric Smith wrote:
(snip)

Kevin Neilson wrote:

That's interesting--I'm not even familiar with an "asynchronous
processor". What does that mean? -Kevin

There's no central clock. At any given time, one particular "unit"
in the computer is active. When it completes its work, it sends a
pulse to the next unit that needs to do something, thus handing off
control.
Sometimes also known as "self timed logic", and probably easier
to search under that name.
(snip)

There were quite a few asynchronous computers in the old days, but
the world settled on synchronous designs for various reasons. In recent
years there has been a resurgence of interest in asynchronous designs,
partly due to the possibility of power savings. There are still no
mainstream asynchronous processors, though.
There are rumors of asynchronous functional modules, such as
multipliers or dividers. That might make more sense in current
systems than a completely asynchronous design.

-- glen
 
On May 7, 12:21 pm, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
Eric Smith wrote:
There were quite a few asynchronous computers in the old days, but
the world settled on synchronous designs for various reasons. In recent
years there has been a resurgence of interest in asynchronous designs,
partly due to the possibility of power savings. There are still no
mainstream asynchronous processors, though.

There are rumors of asynchronous functional modules, such as
multipliers or dividers. That might make more sense in current
systems than a completely asynchronous design.
The industry trend for the last few years has been GALS (Globally
Asynchronous, Locally Synchronous) for many good reasons, including:
- managing the clock skew across a large design is hard, expensive,
and power hungry.
- it's a natural paradigm when you want to run islands at different
speeds, or even power them down for power savings.

I expect it could also be a life saver, isolating a speed path to just
its island rather than impacting the whole chip.

I don't know how common this is in FPGA design, but the LPRP reference
design uses a handful of clocks.

Tommy
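
In an FPGA, the usual building block for moving a single control bit between such clock islands is a two-flop synchronizer. This is a standard technique, not something from Tommy's post; the attribute shown is a Xilinx-style hint.

```verilog
// Standard two-flop synchronizer for crossing a single-bit control
// signal between clock islands in a GALS-style design. The first flop
// may go metastable; the second gives it a full cycle to settle.
module sync_2ff (
  input  wire dest_clk,
  input  wire async_in,   // bit arriving from the other clock island
  output wire sync_out
);
  (* ASYNC_REG = "TRUE" *) reg meta, stable;  // keep flops adjacent

  always @(posedge dest_clk) begin
    meta   <= async_in;   // may sample mid-transition
    stable <= meta;       // very likely settled by here
  end
  assign sync_out = stable;
endmodule
```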
 
