
Kevin,

Having two bits hot in a one-hot FSM would normally be a bad thing. But I
was wondering if anybody does this purposely, in order to fork, which
might be a syntactically nicer way to have a concurrent FSM.
I wonder about this too. I am currently doing a pipeline and
some code is shown below. I wrote out the states as individual
signals rather than an array, so that when ModelSim comes up I
don't have to expand the states to see them. I also group signals
that I want to see associated with the states in the declarative
region, so I don't have to futz too much with ModelSim.

Some states stay on longer via the state_count condition.
I read one header record on state31 and follow it with three
data records of another type on state32.

Now the "branching" happens because these states address
a NoBL SRAM and there is a two cycle lag between the
address and the data. Not show below, I also have clock
delays on these states, state32_1, state32_2, and so on
so when the address goes out on state32, I then have data
to process on state32_2.
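
Purely for illustration, here is a minimal sketch of those delay
flops as a stand-alone process (the names are mine and this is not
part of the code below):

library ieee;
use ieee.std_logic_1164.all;

entity state_delay is
  port(
    clk       : in  std_logic;
    state32   : in  std_logic;  -- read address goes out to the SRAM here
    state32_2 : out std_logic   -- SRAM data is valid to process here
  );
end entity;

architecture rtl of state_delay is
  signal state32_1 : std_logic := '0';
begin
  process(clk)
  begin
    if rising_edge(clk) then
      state32_1 <= state32;     -- one clock after the address
      state32_2 <= state32_1;   -- two clocks after the address
    end if;
  end process;
end architecture;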

In my Zen thinking about this,
I always have a When state associated with every What.

It actually gets deeper than that because there are FIFOs
involved as well. You'll need FIFOs in your design if
you are going to tackle a Sobel function. Here the trick
is to start thinking about your processes starting from
the READ data and figure out how many delays you need to
deliver an answer, then figure out where the WRITE data
should marry into the flow. I now have states out to _7.

Perhaps someone could suggest a better term than state
machine "forking"? And are there any guidelines on how
to code and debug a pipelined architecture? I'm with Kevin,
it gets real messy, real soon.

Brad Smallridge
AiVision

inner_cell_state_machine : process(clk)
begin
  if (clk'event and clk='1') then
    inner_cell_restart <= '0';
    if (reset='1') then
      state30 <= '0'; state31 <= '0'; state32 <= '0'; state33 <= '0';
      state34 <= '0'; state35 <= '0'; state36 <= '0'; state37 <= '0';
      state38 <= '0'; state39 <= '0';
      inner_cell_rd_ad <= std_logic_vector(to_unsigned(inner_cell_start,18));
      inner_cell_wr_ad <= std_logic_vector(to_unsigned(inner_cell_start,18));
      state_count <= (others=>'0');
    -- state29 automatically turns off from init_state_machine
    elsif (state29='1') then
      state30 <= '1';
    -- State30 Initial inner cell state
    elsif (state30='1') then
      state30 <= '0';
      state31 <= '1';
      state_count <= (others=>'0');
    -- State31 Read the inner cell
    elsif (state31='1') then
      state31 <= '0';
      state32 <= '1';
      inner_cell_rd_ad <= inner_cell_rd_ad + 1;
    -- State32 Read the inner cell connections
    elsif (state32='1') then
      if (state_count=2) then
        state32 <= '0';
        state33 <= '1';
        state_count <= (others=>'0');
      else
        state_count <= state_count + 1;
      end if;
      inner_cell_rd_ad <= inner_cell_rd_ad + 1;
    -- State33 Wait for SRAM to deliver first connection
    elsif (state33='1') then
      state33 <= '0';
      state34 <= '1';
      state_count <= (others=>'0');
    -- State34 Read connection
    elsif (state34='1') then
      if (state_count=2) then
        state34 <= '0';
        state35 <= '1';
        state_count <= (others=>'0');
      else
        state_count <= state_count + 1;
      end if;
      . . .
 
"Brad Smallridge" <bradsmallridge@dslextreme.com> wrote in message
news:Y9MSj.9245$sd4.3805@fe109.usenetserver.com...
Kevin,

Having two bits hot in a one-hot FSM would normally be a bad thing.
'One hot' is a particular implementation of an FSM, but from a logic
perspective (i.e. how you go about designing your state machine, the states
needed, the branching, etc.) it means absolutely nothing.

But I was wondering if anybody does this purposely, in order to fork,
which might be a syntactically nicer way to have a concurrent FSM.
'Concurrent' state machines, though, are simply another way of saying state
machines that either are totally independent of one another, or only loosely
connected (i.e. there is some signalling going on between the two, but you
can *usually* futz with one without breaking the other).

As I mentioned in more detail in my response on 'Style for Highly-Pipelined
State Machines', I only really see two basic approaches:

- The first is some form of counting or adding states where you have a known,
fixed number of states between 'doing this' and 'doing that'. This method
works, but it quickly leads to rather complicated code that is
difficult to understand and (because of the complexity) probably has some
logic holes as well that may take some time to surface. In certain designs,
though, this method is just fine and the result is easy to
maintain. The problem comes when the realization sets in that the code
is getting out of control, and the question becomes how to manage it (which
was the point of the other thread).

- The second method uses request/acknowledge handshaking between the
'concurrent' state machines. This method scales very nicely from a design
perspective and is just as efficient from an implementation perspective as
well.

Bottom line here is to realize that a 'fork in an FSM' is really a call to
think of it as two separate state machines that have a
communication/signalling requirement and don't try to force your mental
model as being 'one' state machine. After all, the entire design can be
considered to be a single state machine...but it is generally of no use to
think of it that way from a design perspective.
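
As a minimal, purely illustrative sketch of the second method (all the
names here are mine), consider two clocked processes joined by nothing
more than a req/ack pair:

library ieee;
use ieee.std_logic_1164.all;

entity req_ack_pair is
  port(clk, reset : in std_logic);
end entity;

architecture rtl of req_ack_pair is
  signal req, ack : std_logic := '0';
begin
  -- Requesting state machine: raise req, then wait for ack.
  requester : process(clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        req <= '0';
      elsif req = '0' then
        req <= '1';             -- kick off the other machine
      elsif ack = '1' then
        req <= '0';             -- handshake complete, move on
      end if;
    end if;
  end process;

  -- Serving state machine: do the work, then acknowledge.
  server : process(clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        ack <= '0';
      elsif req = '1' and ack = '0' then
        ack <= '1';             -- the work goes here, however many clocks it takes
      elsif req = '0' then
        ack <= '0';
      end if;
    end if;
  end process;
end architecture;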

<snip>
Now the "branching" happens because these states address
a NoBL SRAM and there is a two cycle lag between the
address and the data. Not show below, I also have clock
delays on these states, state32_1, state32_2, and so on
so when the address goes out on state32, I then have data
to process on state32_2.

In my Zen thinking about this,
I always have a When state associated with every What.

It actually gets deeper than that because there are FIFOs
involved as well. You'll need FIFOs in your design if
you are going to tackle a Sobel function. Here the trick
is to start thinking about your processes starting from
the READ data and figure out how many delays you need to
deliver an answer, then figure out where the WRITE data
should marry into the flow. I now have states out to _7.
Try looking at it now from a somewhat different perspective. Let's say you
have one state machine whose sole purpose is to generate read and write
commands and addresses to the memory, but not to process the data at all. In
addition there is a second state machine whose sole purpose is to process
the data that gets read back from memory and produce some sort of result,
which maybe goes to memory, maybe goes somewhere else; it doesn't matter.

If the 'address generator' state machine needs the results from some
computation in order to proceed, then it sends a read request to the
'data processor' state machine and waits until it gets the acknowledge. It
may sound queer, but what that does is allow you to design with any sort of
latency; it does not require any a priori knowledge of how many clock ticks
it will take to get that result back. The address generator state machine is
waiting for the acknowledge back from the data processor state machine
(which in turn is waiting for the data to come back from the memory).

Now you could argue that the data processor state machine can't just
*process data*; it most likely needs to know what it is supposed to be doing
with it, and that knowledge likely lives in the address generator state
machine. Fair enough, but all that means is that the address generator
needs to be able to send commands over to the data processor. This could be
done in an ad hoc manner by setting some signal (or more likely multiple
signals) that are inputs to the data processor. This method works fine.
You can also view this interface somewhat more abstractly as the data
processor having a command port that is written to by the address
generator...and again, a simple request/acknowledge handshake is all you
need here as well. In many cases the simple signal(s) will be sufficient,
but in other cases the two state machines interact a bit more closely, and a
more well-defined communication channel between the two will clear up a lot
from the perspective of getting to a functional design that is easy to
maintain.

Perhaps someone could suggest a better term than state
machine "forking"?
Separate state machines that signal each other in some fashion.

And are there any guidelines on how
to code and debug a pipelined architecture? I'm with Kevin,
it gets real messy, real soon.
Ponder a bit on breaking up things as I've suggested, starting with something
small, and see how it comes out. It may take a bit to get used to, but the
end result is smaller, easier to understand and debug state machines (albeit
more of them) that communicate over well-defined internal interfaces.
You'll also find that changes (like switching the NoBL SRAM to DRAM as an
example) can be accommodated without having to change *everything*.

Kevin Jennings
 
"Kevin Neilson" <kevin_neilson@removethiscomcast.net> wrote in message
news:fvfm29$on81@cnn.xsj.xilinx.com...
KJ wrote:
"Kevin Neilson" <kevin_neilson@removethiscomcast.net> wrote in message
news:fv7i38$69n6@cnn.xsj.xilinx.com...
My question: what is the cleanest way to describe an FSM requiring
pipelining?
...
The other thing to consider is whether the latency being introduced by
this outsourced logic needs to be 'compensated for' in some fashion, or is
it OK to simply wait for the acknowledge. In some instances, it is fine
for the FSM to simply wait in a particular state until the acknowledge
comes back. In others you need to be feeding new data into the
hunk-o-logic on every clock cycle even though you haven't got the results
from the first back. In that situation you still have the req/ack pair,
but now the ack is simply saying that the request has been accepted for
processing; the actual results will be coming out later. Now the
hunk-o-logic needs an additional output to flag when the output is
actually valid. This output data valid signal would typically tend to
feed into a separate FSM or some other logic (i.e. 'usually' not the
first FSM). The first FSM controls feeding stuff in; the second FSM or
other processing logic is in charge of taking up the outputs and doing
something with them.

...

Kevin Jennings
In this case I do indeed have to continue to keep the pipe full, so
inserting wait states is not an option. And the latency of the "hunk of
logic", aka concurrent process, is actually significant because I have to
get the result and feed it back into the FSM. This example shows why:

STATE2: begin
  if (condition)
  begin
    state <= STATE3;
    y     <= (a*b+c)*d;
  end
end

I have to get the result (a*b+c) and then feed it back into the FSM so I
can multiply by d. Why not just let the concurrent process handle that?
Because I want to limit my resource usage to a single DSP48, so I have to
schedule the multiplications inside the FSM. But I'll have to check out
the Wishbone thing you're talking about.
Well, just the fact that you're time-sharing the DSP48 means that you're not
processing something new on every clock cycle, which just screams out to me
that you'd want to implement this with a request/acknowledge type of
framework. Consider having a black box that has two logical interfaces
called 'inp' and 'out'. The 'inp' interface will be written to by some
external thingy and provide the 'a', 'b', 'c' and 'd' inputs. The black box
will compute "y <= (a*b+c)*d;" and make it available on the 'out'
interface via a read command. Using Avalon nomenclature, this
black box will have the following set of signals:

inp_write: input
inp_writedata: input vector
inp_waitrequest: output

out_read: input
out_readdata: output vector
out_waitrequest: output
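
(Rendered as a VHDL entity, with widths chosen arbitrarily just for
illustration, that port list might look like this:)

library ieee;
use ieee.std_logic_1164.all;

entity black_box is
  port(
    clk             : in  std_logic;
    -- 'inp' interface: 'a', 'b', 'c' and 'd' arrive here
    inp_write       : in  std_logic;
    inp_writedata   : in  std_logic_vector(71 downto 0);  -- a & b & c & d packed
    inp_waitrequest : out std_logic;
    -- 'out' interface: 'y' is read back here
    out_read        : in  std_logic;
    out_readdata    : out std_logic_vector(47 downto 0);
    out_waitrequest : out std_logic
  );
end entity;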

What this black box would do is provide the output of the calculation "y <=
...." on the signal 'out_readdata' in response to a read request on
'out_read'. If the calculation has not been completed (maybe 'a', 'b' and
'c' aren't even available yet), the output signal 'out_waitrequest' would be
set. So from the perspective of someone trying to use this black box, they
would simply set 'out_read' active and wait until 'out_waitrequest' is not
active. On that one particular clock cycle, 'out_readdata' has the data.

Now in order to perform the calculation the black box needs 'a', 'b', 'c'
and 'd'. For simplicity, I'll assume that they all become available at the
same time. At that time the external thingy talking to the black box will
set 'inp_write' active and 'inp_writedata' to contain 'a', 'b', 'c' and 'd'
(don't constrain yourself into thinking of this as being 'bytes' or 'words',
etc.). If the black box in turn sets the 'inp_waitrequest' output to a 1,
then it means that the external thingy needs to hold 'inp_write' and
'inp_writedata' without changing them, because the black box, for whatever
reason, is not quite ready (maybe because the results of the previous
calculation have not yet been read, for example).

The 'external thingy' that controls the black box is possibly your state
machine, which is basically signalling the black box when it has new data to
process. The interface between these two then consists of the above set of
well-defined signals. The state machine doesn't need to know or care
explicitly about what the actual latency is in getting through the black
box. In order for your system to operate properly, you may need to have
some latency requirement, but the point here is that the controlling state
machine can be oblivious to what that latency actually is.

Now turn to the black box. Since you want to share the DSP48 so that
it gets reused, there will be a state machine inside it in
some fashion. After receiving new data on the 'inp' interface, the black
box would set inp_waitrequest active on the next clock cycle to prevent
subsequent writes from occurring until the black box is ready to accept more
data. Then you go through your calculation, and a couple of clock cycles
later you've finally computed the requested output...now what? You've got the
output needed for 'out_readdata' to send out with the result. Up until that
point, 'out_waitrequest' would be active to indicate that the output is not
available. But once it is, the black box would drop 'out_waitrequest'. If
'out_read' is active then the result has been passed along; if not, then you
need to hold on to the result until 'out_read' does get set.
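
(A hypothetical control sketch of that behaviour, ignoring the DSP48
arithmetic itself and keeping only the handshaking just described; all
names here are invented:)

library ieee;
use ieee.std_logic_1164.all;

entity bb_ctrl is
  port(
    clk             : in  std_logic;
    inp_write       : in  std_logic;
    inp_waitrequest : out std_logic;
    out_read        : in  std_logic;
    out_waitrequest : out std_logic;
    calc_done       : in  std_logic   -- from the arithmetic: result is ready
  );
end entity;

architecture rtl of bb_ctrl is
  type state_t is (IDLE, BUSY, HAVE_RESULT);
  signal state : state_t := IDLE;
begin
  -- Hold off new writes while a calculation is in flight or unread.
  inp_waitrequest <= '0' when state = IDLE else '1';
  -- Hold off reads until the result exists.
  out_waitrequest <= '0' when state = HAVE_RESULT else '1';

  process(clk)
  begin
    if rising_edge(clk) then
      case state is
        when IDLE =>
          if inp_write = '1' then       -- a, b, c, d accepted this cycle
            state <= BUSY;              -- waitrequest goes up next cycle
          end if;
        when BUSY =>
          if calc_done = '1' then       -- however many clocks that takes
            state <= HAVE_RESULT;
          end if;
        when HAVE_RESULT =>
          if out_read = '1' then        -- result passed along this cycle
            state <= IDLE;
          end if;
      end case;
    end if;
  end process;
end architecture;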

It all might sound complicated, and as though it will take up considerable
logic resources to implement, but in fact it doesn't. When all the dust has
settled, the logic resources used up in the part will be pretty much
identical to any other scheme you can come up with (such as counters or
extra states in a state machine). In exchange you'll get a much easier to
understand and maintain design, with fewer chances of having some logic hole
that shows up only under very infrequent conditions, which is very easy to
get when you have a convoluted, difficult-to-follow single state machine.

Lastly, I've used the Avalon naming convention, but Wishbone is logically
identical; instead of a 'waitrequest' it has an 'acknowledge', which is
logically just the not() of the other. Avalon is better when it comes to
dealing with read cycle latency: it defines a specific signal for this,
whereas Wishbone doesn't directly handle this at all but has some additional
signals that can be used for any purpose, one of which can be to handle read
cycle latency.

Kevin Jennings
 
"Jack" <love.theme@gmail.com> wrote in message
news:22b812fb-4c97-4af8-83bb-60c09189c5c9@i36g2000prf.googlegroups.com...
Hi ~
I have a question about assign.

ex)

output a;
output b;
output c;

reg a;
reg b;
reg [2:0] count;
wire c;

assign c = (a || b) ? count + 1 : 0;

Here, a and b are used as inputs.
I think that's not good, but some other people said it is good.
I don't know whether it is right or wrong.

If you declare them as outputs, and then don't drive them to any
value in particular, that's not likely to be good.
 
"Jack" <love.theme@gmail.com> wrote in message
news:22b812fb-4c97-4af8-83bb-60c09189c5c9@i36g2000prf.googlegroups.com...
Hi ~
I have some question about assign.

ex)

output a;
output b;
output c;

reg a;
reg b;
reg [2:0] count;
wire c;

assign c = (a || b) ? count + 1 : 0

Here, a and b was used to input.
I think that it's not good. but some other people said that good.
I don't know that it's is right or wrong ?

If you declare them as outputs, and then don't drive them to any
value in particular, that's not likely to be good.
 
You'll also find that changes (like switching the NoBL SRAM to DRAM as an
example) can be accommodated without having to change *everything*.
That has been on my mind, because there is a DRAM on my board. Not only
will the DRAM require more cycles, but perhaps also a varying number of
cycles, depending on how sequential or random the addressing is.

I have seen controllers on the Xilinx site, but nothing that talks about
several ports and how the handshaking is handled. My FAE has said that
some multiport examples are available.

Brad Smallridge
AiVision
 
"Kevin Neilson" <kevin_neilson@removethiscomcast.net> wrote in message
news:fvo2o9$p0m1@cnn.xsj.xilinx.com...
KJ wrote:
On May 5, 12:13 pm, "Brad Smallridge" <bradsmallri...@dslextreme.com>
wrote:
You'll also find that changes (like switching the NoBL SRAM to DRAM as an
example) can be accommodated without having to change *everything*.
...
Designing a request/acknowledge interface to some other process or
entity (in this case the 'other' being a DRAM controller) results in a
much easier to maintain design.

Using the exact same interface signal functionality whether one is
talking to internal FPGA memory, NoBL or SDRAM or SPI results in a
design that can be reused, retargeted and improved upon if necessary.
...
Kevin Jennings

This is a great example, because switching from one type of RAM to another
means you *do* have to change everything, if you want the controller to be
good.
The methodology I use makes use of every clock cycle, DRAMs are running full
tilt, transfers from fast FPGA through a PCI bus to some other processor,
etc., the whole 9 yards.

You can certainly modularize the code and make concurrent SMs with
handshaking, and this is easy to maintain. And a lot of DRAM controllers
are designed this way. But here is the problem: while you are waiting
around for acknowledges, you have just wasted a bunch of memory bandwidth.
Then you're waiting for the wrong acknowledgement. Taking the DRAM again as
an example, every data transfer consists of two parts: address/command and
data. During a memory write, all of this happens on the same clock cycle.
When the controller 'fills up' it sets the wait request to hold off until it
can accept more commands (reads or writes).

During a read though, the address/command portion happens on one clock
cycle; the actual delivery of the data back to the requestor occurs sometime
later. The state machine that requests the read does not necessarily have
to wait for the data to come back before starting up the next read. The
acknowledge that comes back from a 'memory read' command means that the
request to read has been accepted and another command (read or write) can now
be started. There are also situations where one really does need to wait
until the data is returned to continue on, but in many data processing
applications the data can lag significantly with no real impact on
performance; the read requests can be queued up as fast as the controller
can accept them.

Although I've been using the DRAM as an example, nothing in the handshaking
or methodology is 'DRAM specific', it is simply having to do with
transmitting information (i.e. writing) and requesting information (i.e.
reading) and having a protocol that separates the request for information
from the delivery of that information (i.e. specifically allowing for
latency and allowing multiple commands to be queued up).
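
(Avalon, for instance, separates delivery from request with a
'readdatavalid' signal. A hypothetical requestor sketch, widths invented,
that queues up reads as fast as the controller will accept them:)

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity read_master is
  port(
    clk           : in  std_logic;
    waitrequest   : in  std_logic;   -- '1' = pipe full, hold the command
    readdatavalid : in  std_logic;   -- '1' = a queued read is delivering now
    readdata      : in  std_logic_vector(31 downto 0);
    read          : out std_logic;
    address       : out std_logic_vector(17 downto 0)
  );
end entity;

architecture rtl of read_master is
  signal addr     : unsigned(17 downto 0) := (others => '0');
  signal captured : std_logic_vector(31 downto 0);
begin
  read    <= '1';                    -- always have the next read queued up
  address <= std_logic_vector(addr);

  process(clk)
  begin
    if rising_edge(clk) then
      if waitrequest = '0' then
        addr <= addr + 1;            -- command accepted, queue the next one
      end if;
      if readdatavalid = '1' then
        captured <= readdata;        -- delivery arrives some latency later
      end if;
    end if;
  end process;
end architecture;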

If you want to make better use of your bandwidth, you can't use
handshaking.
I disagree.

You have to start another burst while one is in the pipe.
That's correct...but you can't start one if the pipe is full (which can
happen when a memory refresh or a page miss occurs and the pipe fills up
waiting while those things get serviced). The handshake tells you that the
pipe is full, and you absolutely need to have it. The 'pipe full' signal is
a handshake: when it is full, it says 'wait'; when it is not full, it says
'got it'.

You have to look ahead in the command FIFO to see if the next request is
going to be in the same row/bank to see if you need to close the row
during this burst and precharge or if you can continue in the same open
row in a different bank, etc.
OK

If I do all that with handshaking, I'm frittering away cycles.
Then you're not doing it properly. It's called pipelining, not frittering.

And to do this in a way that doesn't fritter away cycles with standard
methodology means everything is so tightly bound together that to change
from SDRAM to some other type of RAM means I have to tear up most of the
design.
Latency can matter in certain situations; in others it doesn't. If there is
some situation where latency mattered, one would have to come up with a way
where the requestor could start up the read cycle earlier...but if there is
such a way to start it up earlier, then that change could be applied equally
well to the lower-latency situation as well, which means that you could have
a common design.

Kevin Jennings
 
"Kevin Neilson" <kevin_neilson@removethiscomcast.net> wrote in message
news:fvo2o9$p0m1@cnn.xsj.xilinx.com...
KJ wrote:
On May 5, 12:13 pm, "Brad Smallridge" <bradsmallri...@dslextreme.com
wrote:
You'll also find that changes (like switching the Nobl SRAM to DRAM as
an
example) can be accomodated without having to change *everything*.
...
Designing a request/acknowledge interface to some other process or
entity (in this case the 'other' being a DRAM controller) results in a
much easier to maintain design.

Using the exact same interface signal functionality whether one is
talking to internal FPGA memory, NoBL or SDRAM or SPI results in a
design that can be reused, retargeted and improved upon if necessary.
...
Kevin Jennings

This is a great example, because switching from one type of RAM to another
means you *do* have to change everything, if you want the controller to be
good.
The methodology I use makes use of every clock cycle, DRAMs are running full
tilt, transfers from fast FPGA through a PCI bus to some other processor,
etc., the whole 9 yards.

You can certainly modularlize the code and make concurrent SMs with
handshaking and this is easy to maintain. And a lot of DRAM controllers
are designed this way. But here is the problem: while you are waiting
around for acknowledges, you have just wasted a bunch of memory bandwidth.
Then you're waiting for the wrong acknowledgement. Taking the DRAM again as
an example, every data transfer consists of two parts: address/command and
data. During a memory write, all of this happens on the same clock cycle.
When the controller 'fills up' it sets the wait request to hold off until it
can accept more commands (reads or writes).

During a read though, the address/command portion happens on one clock
cycle, the actual delivery of the data back to the requestor occurs sometime
later. The state machine that requests the read does not necessarily have
to wait for the data to come back before starting up the next read. The
acknowledge that comes back from a 'memory read' command is that the request
to read has been accepted, another command (read or write) can now be
started. There are also situations where one really does need to wait until
the data is returned to continue on, but in many data processing
applications, the data can lag significantly with no real impact on
performance, the read requests can be queued up as fast as the controller
can accept them.

Although I've been using the DRAM as an example, nothing in the handshaking
or methodology is 'DRAM specific', it is simply having to do with
transmitting information (i.e. writing) and requesting information (i.e.
reading) and having a protocol that separates the request for information
from the delivery of that information (i.e. specifically allowing for
latency and allowing multiple commands to be queued up).

If you want to make better use of your bandwidth, you can't use
handshaking.
I disagree.

You have to start another burst while one is in the pipe.
That's correct...but you can't start one if the pipe is full (which can
happen when a memory refresh or a page hit occurs and the pipe fills up
waiting while those things get serviced). The handshake tells you that the
pipe is full and you absolutely need to have it. The 'pipe full' signal is
a handshake, when it is full, it says 'wait', when it is not full, it says
'got it'.

You have to look ahead in the command FIFO to see if the next request is
going to be in the same row/bank to see if you need to close the row
during this burst and precharge or if you can continue in the same open
row in a different bank, etc.
OK

If I do all that with handshaking, I'm frittering away cycles.
Then you're not doing it properly. It's called pipelining, not frittering.

And to do this in a way that doesn't fritter away cycles with standard
methodology means everything is so tightly bound together that to change
from SDRAM to some other type of RAM means I have to tear up most of the
design.
Latency can matter in certain situations, in others it doesn't. If there is
some situation where latency mattered, one would have to come up with a way
where the requestor could start up the read cycle earlier...but if there is
such a way to start it up earlier, then that change could be applied equally
well to the lower latency situation as well which means that you could have
a common design

Kevin Jennings
 
"Kevin Neilson" <kevin_neilson@removethiscomcast.net> wrote in message
news:fvo47b$on52@cnn.xsj.xilinx.com...
KJ wrote:
"Kevin Neilson" <kevin_neilson@removethiscomcast.net> wrote in message
news:fvfm29$on81@cnn.xsj.xilinx.com...
KJ wrote:
"Kevin Neilson" <kevin_neilson@removethiscomcast.net> wrote in message
news:fv7i38$69n6@cnn.xsj.xilinx.com...
My question: what is the cleanest way to describe an FSM requiring
pipelining?

Well, just the fact that you're time sharing the DSP48 means that you're
not processing something new on every clock cycle which just screams out
to me that you'd want to implement this with a request/acknowledge type
of framework. ...

Kevin Jennings
But I *do* have to process something on every cycle.
You're not able to process a new set of 'a', 'b', 'c' and 'd' on every clock
cycle, since the DSP48 is time-shared (by your choice), and that was my point.
Time-multiplexing the DSP48 to keep *it* busy on every clock cycle is not
the same thing.

Consider that I have to process these two equations:

y0 <= (a0*b0+c0)*d0;
y1 <= (a1*b1+c1);

Now, if you look at the structure of the DSP48, you can see that I can't
even process these two sequentially. I can send off (a0*b0+c0)*d0 to the
black box thingy you speak of, but this can't be processed without dead
cycles: I have to get the result of (a0*b0+c0) before I multiply it with
d0, and if the DSP48 is fully pipelined, that means that the multiplier is
unused for three cycles.
That's only true if the addition can't be done combinatorially. If it can,
then the calculation of 'y0' takes two clock cycles and the DSP48 is fully
utilized. The answer pops out after two clock cycles of latency; the DSP48
hums along doing something useful on every tick.

It's similar to a superscalar processor with dependencies. So I have to
reschedule: I put (a0*b0+c0) into the pipe, then put in (a1*b1+c1) (which
has no dependency on what is in the pipe), and then when the result of
(a0*b0+c0) pops out I can feed it back into the DSP48 and multiply it with
d0 to get y0. In the meantime y1 pops out. Without this intermixed
scheduling I end up with too many dead cycles, and then I need to use too
many DSP48s.
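
(Spelled out, assuming purely for illustration a three-cycle latency
through the DSP48 pipe, that schedule looks something like:)

clock 1: (a0*b0+c0) enters the pipe
clock 2: (a1*b1+c1) enters the pipe (no dependency on clock 1)
clock 3: free for some other independent operation
clock 4: (a0*b0+c0) pops out; feed (a0*b0+c0)*d0 straight back in
clock 5: y1 = (a1*b1+c1) pops out
clock 7: y0 = (a0*b0+c0)*d0 pops out
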
And depending on just what the bottlenecks in the design are, one can do all
kinds of things. But no matter what, you still need to interface *to* that
thing, no matter what it does and no matter how wide of an input vector it
takes (i.e. a0, b0, c0, d0, a1, b1, c1...if that's what it takes). In other
words, a0, b0, c0, d0, a1, b1 and c1 all need to get in somehow; y0 and y1
both need to make it out, and you need to flag when they are valid; that
flagging is functionally the same thing as handshaking.

Kevin Jennings
 
"Kevin Neilson" <kevin_neilson@removethiscomcast.net> wrote in message
news:fvo47b$on52@cnn.xsj.xilinx.com...
KJ wrote:
"Kevin Neilson" <kevin_neilson@removethiscomcast.net> wrote in message
news:fvfm29$on81@cnn.xsj.xilinx.com...
KJ wrote:
"Kevin Neilson" <kevin_neilson@removethiscomcast.net> wrote in message
news:fv7i38$69n6@cnn.xsj.xilinx.com...
My question: what is the cleanest way to describe an FSM requiring

Well, just the fact that you're time sharing the DSP48 means that you're
not processing something new on every clock cycle which just screams out
to me that you'd want to implement this with a request/acknowledge type
of framework. ...

Kevin Jennings
But I *do* have to process something on every cycle.
You're not able to process a new set of 'a', 'b', 'c' and 'd' on every clock
cycle since the DSP48 is time shared (by your choice) and that was my point.
Time multiplexing the DSP48 to keep *it* busy on every clock cycle is not
the same thing.

Consider that I have to process these two equations:

y0 <= (a0*b0+c0)*d0;
y1 <= (a1*b1+c1);

Now, if you look at the structure of the DSP48, you can see that I can't
even process these two sequentially. I can send off (a0*b0+c0)*d0 to the
black box thingy you speak of, but this can't be processed without dead
cycles: I have to get the result of (a0*b0+c0) before I multiply it with
d0, and if the DSP48 is fully pipelined, that means that the multiplier is
unused for three cycles.
That's only true if the addition can't be done combinatorially. If it can
then the calculation of 'y0' takes two clock cycles and the DSP48 is fully
utilized. The answer pops out after two clock cycles of latency, the DSP48
hums along doing something useful on every tick.

It's similar to a superscalar process with dependencies. So I have to
reschedule: I put (a0*b0+c0) into the pipe, then put in (a1*b1+c1) (which
has no dependency on what is in the pipe), and then when the result of
(a0*b0+c0) pops out I can feed it back into the DSP48 and multiply it with
d0 to get y0. In the meantime y1 pops out. Without this intermixed
scheduling I end up with too many dead cycles and then I need to use too
many DSP48s.
And depending on just what the bottlenecks in the design are, one can do all
kinds of things. But no matter what, you still need to interface *to* that
thing, no matter what it does and no matter how wide of an input vector it
takes (i.e. a0, b0, c0, d0, a1, b1, c1...if that's what it takes). In other
words, a0, b0, c0, d0, a1, b1 and c1 all need to get in somehow; y0 and y1
both need to make it out and you need to flag when they are valid and that
flagging is functionally the same thing as handshaking.

Kevin Jennings
 
"John Larkin" <jjlarkin@highNOTlandTHIStechnologyPART.com> wrote in message
news:o1e424d2h2uldtu4qm4589v667lu96hip8@4ax.com...
On Wed, 7 May 2008 12:19:40 -0700 (PDT), John_H
<newsgroup@johnhandwork.com> wrote:

John Larkin wrote:

To Lattice:

We dumped Lattice over buggy compilers and dinky performance. Now that
you're spamming our group, I'll make the ban permanent.


To the group:

Whenever anybody spams us, please

1. Blackball them as a vendor

2. Say bad things about their companies and products, preferably with
lots of google-searchable keywords.

John

Was this really necessary?

If there were technical webcasts from any of the big vendors, I'd like
to know about them though preferably more than 8 minutes beforehand.
If the posts of this nature got to be more than a couple a month from
any one source I'd agree with the spam categorization, but it isn't
that frequent.

I'm disappointed that you had problems with them in the past and won't
trust them for future designs because of your history; competition is
almost always good. But is it reason to be publicly vocal?

Kill-lists are easy to manage if bart's messages offend you.

- John_H


If we don't discourage commercial posts, newsgroups will be flooded
with them. I can't kill-file the tens of thousands of companies who
would spam newsgroups if they thought it would pay off. So let's make
sure it *doesn't* pay off.

If they want to advertise, let them pay for it somewhere else.


John
For what it's worth, I agree with John.

It's a real shame that we now have to go out of our way to filter
commercial and sexual posts. There are proper places for both of those.
Usenet is not one of them, in my opinion.

Bob
--
== NOTE: I automatically delete all Google Group posts due to uncontrolled
SPAM ==
 
"BobW" <nimby_NEEDSPAM@roadrunner.com> wrote in message
news:q9KdnYE_8uqU2r_VnZ2dnUVZ_hmtnZ2d@giganews.com...
"John Larkin" <jjlarkin@highNOTlandTHIStechnologyPART.com> wrote in
message news:o1e424d2h2uldtu4qm4589v667lu96hip8@4ax.com...
On Wed, 7 May 2008 12:19:40 -0700 (PDT), John_H
<newsgroup@johnhandwork.com> wrote:

John Larkin wrote:

To Lattice:

We dumped Lattice over buggy compilers and dinky performance. Now that
you're spamming our group, I'll make the ban permanent.


To the group:

Whenever anybody spams us, please

1. Blackball them as a vendor

2. Say bad things about their companies and products, preferably with
lots of google-searchable keywords.

John

Was this really necessary?

If there were technical webcasts from any of the big vendors, I'd like
to know about them though preferably more than 8 minutes beforehand.
If the posts of this nature got to be more than a couple a month from
any one source I'd agree with the spam categorization, but it isn't
that frequent.

I'm disappointed that you had problems with them in the past and won't
trust them for future designs because of your history; competition is
almost always good. But is it reason to be publicly vocal?

Kill-lists are easy to manage if bart's messages offend you.

- John_H


If we don't discourage commercial posts, newsgroups will be flooded
with them. I can't kill-file the tens of thousands of companies who
would spam newsgroups if they thought it would pay off. So let's make
sure it *doesn't* pay off.

If they want to advertise, let them pay for it somewhere else.


John


For what it's worth, I agree with John.

It's a real shame that we now have to go out of our way to filter
commercial and sexual posts. There are proper places for both of those.
Usenet is not one of them, in my opinion.
Come on guys, get over it, really.
The heading clearly had "ANNC:" and what it was about clearly stated, so the
OP did the right thing.
It only takes a split second to scan the header to see if you are
interested. If you aren't interested then you shouldn't have even opened it.
I'd consider this ON TOPIC and not spam as it was a one-off announcement to
the correct groups with the correct formatting.
Some people might very well be interested; this is a professional design
group with many FPGA designers after all.

Dave.
 
"David L. Jones" <altzone@gmail.com> wrote in message
news:4822f3a7$1@dnews.tpgi.com.au...
"BobW" <nimby_NEEDSPAM@roadrunner.com> wrote in message
news:q9KdnYE_8uqU2r_VnZ2dnUVZ_hmtnZ2d@giganews.com...

"John Larkin" <jjlarkin@highNOTlandTHIStechnologyPART.com> wrote in
message news:o1e424d2h2uldtu4qm4589v667lu96hip8@4ax.com...
On Wed, 7 May 2008 12:19:40 -0700 (PDT), John_H
<newsgroup@johnhandwork.com> wrote:

John Larkin wrote:

To Lattice:

We dumped Lattice over buggy compilers and dinky performance. Now that
you're spamming our group, I'll make the ban permanent.


To the group:

Whenever anybody spams us, please

1. Blackball them as a vendor

2. Say bad things about their companies and products, preferably with
lots of google-searchable keywords.

John

Was this really necessary?

If there were technical webcasts from any of the big vendors, I'd like
to know about them though preferably more than 8 minutes beforehand.
If the posts of this nature got to be more than a couple a month from
any one source I'd agree with the spam categorization, but it isn't
that frequent.

I'm disappointed that you had problems with them in the past and won't
trust them for future designs because of your history; competition is
almost always good. But is it reason to be publicly vocal?

Kill-lists are easy to manage if bart's messages offend you.

- John_H


If we don't discourage commercial posts, newsgroups will be flooded
with them. I can't kill-file the tens of thousands of companies who
would spam newsgroups if they thought it would pay off. So let's make
sure it *doesn't* pay off.

If they want to advertise, let them pay for it somewhere else.


John


For what it's worth, I agree with John.

It's a real shame that we now have to go out of our way to filter
commercial and sexual posts. There are proper places for both of those.
Usenet is not one of them, in my opinion.

Come on guys, get over it, really.
The heading clearly had "ANNC:" and what it was about clearly stated, so
the OP did the right thing.
It only takes a split second to scan the header to see if you are
interested. If you aren't interested then you shouldn't have even opened
it.
I'd consider this ON TOPIC and not spam as it was a one-off announcement
to the correct groups with the correct formatting.
Some people might very well be interested; this is a professional design
group with many FPGA designers after all.

Dave.
The message was crossposted to five newsgroups, not just one. Are the
people who say to accept it in the same newsgroups as the ones who say not to?
 
"David L. Jones" <altzone@gmail.com> wrote in message
news:4822f3a7$1@dnews.tpgi.com.au...
"BobW" <nimby_NEEDSPAM@roadrunner.com> wrote in message
news:q9KdnYE_8uqU2r_VnZ2dnUVZ_hmtnZ2d@giganews.com...

"John Larkin" <jjlarkin@highNOTlandTHIStechnologyPART.com> wrote in
message news:eek:1e424d2h2uldtu4qm4589v667lu96hip8@4ax.com...
On Wed, 7 May 2008 12:19:40 -0700 (PDT), John_H
newsgroup@johnhandwork.com> wrote:

John Larkin wrote:

To Lattice:

We dumped Lattice over buggy compilers and dinky performance. Now that
you're spamming our group, I'll make the ban permanent.


To the group:

Whenever anybody spams us, please

1. Blackball them as a vendor

2. Say bad things about their companies and products, preferably with
lots of google-searchable keywords.

John

Was this really necessary?

If there were technical webcasts from any of the big vendors, I'd like
to know about them though preferably more than 8 minutes beforehand.
If the posts of this nature got to be more than a couple a month from
any one source I'd agree with the spam catagorization but it isn't
that frequent.

I'm disappointed that you had problems with them in the past and won't
trust them for future designs because of your history; competition is
almost always good. But is it reason to be publicly vocal?

Kill-lists are easy to manage if bart's messages offend you.

- John_H


If we don't discourage commercial posts, newsgroups will be flooded
with them. I can't kill-file the tens of thousands of companies who
would spam newsgroups if they thought it would pay off. So let's make
sure it *doesn't* pay off.

If they want to advertise, let them pay for it somewhere else.


John


For what it's worth, I agree with John.

It's a real shame that we, now, have to go out of our way to filter
commercial and sexual posts. There are proper places for both of those.
Usenet is not one of them, in my opinion.

Come on guys, get over it, really.
The heading clearly had "ANNC:" and what it was about clearly stated, so
the OP did the right thing.
It only takes a split second to scan the header to see if you are
interested. If you aren't interested then you shouldn't have even opened
it.
I'd consider this ON TOPIC and not spam as it was a one-off announcement
to the correct groups with the correct formatting.
Some people might very well be interested, this is a professional design
group with many FPGA designers afer all.

Dave.
The message was crossposted to five newsgroups, not just one. Are the
people who say accept it in the same newsgroup as the one who say don't?
 
"CBFalconer" <cbfalconer@yahoo.com> wrote in message
news:4823ACED.97C41D24@yahoo.com...
John Larkin wrote:
CBFalconer <cbfalconer@yahoo.com> wrote:

... snip ...

Please snip the quotes on your replies.

Feel free to snip whatever you like.

The point of that request is to avoid burdening all group users
with the burden of paging down over irrelevant material, and to
reduce the overall load on the Usenet system.

--
[mail]: Chuck F (cbfalconer at maineline dot net)
I get it, Chuck. It's okay to overload Usenet with spam, but it's not okay to
overload Usenet by not trimming one's replies.

You should run for president. Next week, perhaps, you'll explain how to
reconcile the Arabs and the Jews.

Bob
--
== NOTE: I automatically delete all Google Group posts due to uncontrolled
SPAM ==
 
John_H wrote:
googler wrote:
I am trying to design an asynchronous FIFO that works with two clock
domains clk_A and clk_B. Writes to the FIFO happen on clk_A and reads
happen on clk_B. clk_A is faster than clk_B. I thought about using
gray-coded write and read pointers that are usually recommended for
designing async fifos, but the problem is that in this case the depth
of the FIFO is 688, which is not a power of 2. So it seems I cannot
use the Gray pointer technique. Is that right?
snip
Gray counters are often easier to design as binary counters with a
binary-to-Gray conversion, arranged so that the wrap is an increment from
binary 343 to binary -344, and then let the Gray conversion do its stuff.
But make sure that the Gray value comes directly out of a flop before it
crosses into the other clock domain. Don't put the conversion logic
in-between the clock domains.
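
(A hypothetical sketch for the 688-deep case: a 10-bit counter running
from -344 to 343, so the binary-to-Gray wrap still flips exactly one bit,
with the Gray value registered before it crosses domains:)

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity gray_ptr is
  port(
    clk  : in  std_logic;
    inc  : in  std_logic;
    gray : out std_logic_vector(9 downto 0)  -- flop output, safe to cross
  );
end entity;

architecture rtl of gray_ptr is
  signal bin : signed(9 downto 0) := to_signed(-344, 10);
begin
  process(clk)
  begin
    if rising_edge(clk) then
      if inc = '1' then
        if bin = 343 then
          bin <= to_signed(-344, 10);  -- wrap changes only one Gray bit
        else
          bin <= bin + 1;
        end if;
      end if;
      -- bin xor (bin srl 1), registered so no logic sits between domains
      gray <= std_logic_vector(unsigned(bin) xor shift_right(unsigned(bin), 1));
    end if;
  end process;
end architecture;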

John

--
John Penton, posting as an individual unless specifically indicated
otherwise.
 
googler wrote:
I am trying to design an asynchronous FIFO that works with two clock
domains clk_A and clk_B. Writes to the FIFO happen on clk_A and reads
happen on clk_B. clk_A is faster than clk_B. I thought about using
gray-coded write and read pointers that are usually recommended for
designing async fifos, but the problem is that in this case the depth
of the FIFO is 688, which is not a power of 2. So it seems I cannot
use the Gray pointer technique. Is that right?

Can this be done using binary pointers instead of Gray pointers?
Maybe we can use a handshaking technique, exchanging 'ready' and
'acknowledge' between the two sides whenever the write/read pointer is
to be sent across the clock domain boundary. So suppose we want to send
the write pointer from the clk_A domain to the clk_B domain. When the
write pointer in the clk_A domain changes, it generates a 'ready' that
goes to the clk_B domain. This 'ready' is synchronized in the clk_B
domain, and then clk_B reads the value of the write pointer. Then the
clk_B domain sends an 'acknowledge' back to the clk_A domain (where it is
synchronized). Only after the clk_A domain gets this 'acknowledge' back
can it change the write pointer value. Is this concept correct? If yes,
then I guess there is still a problem. In the above example, the write
side is on the faster clock. If the write pointer cannot change until the
write side gets back the 'acknowledge', then we probably need to stall
writes to the FIFO, correct?

Please advise how this can be done. Thanks.
Others have commented that a Gray scheme can be made to work. Your scheme
above is also reasonable (if rather high latency). Of course you need a
ready and an ack for both of the pointers. FIFOs almost always need a method
to stall the input side - unless you can guarantee (by analysis) that it can
never fill up. However, to avoid the excessive delay that you envisage, you
could have two sets of registers for each pointer: a current state,
recording how many items have been written or read, and then a state for
transmission, which captures the current state and holds it stable while
the handshake is ongoing.
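
(A sketch of that two-register idea on the write side, with invented
names and a toggling 'ready' level; purely illustrative:)

library ieee;
use ieee.std_logic_1164.all;

entity ptr_xfer is
  port(
    clk_A     : in  std_logic;
    wr_ptr    : in  std_logic_vector(9 downto 0);  -- free-running current state
    ack_sync  : in  std_logic;                     -- 'acknowledge', already synchronized
    ready     : out std_logic;                     -- toggles to start a transfer
    wr_ptr_tx : out std_logic_vector(9 downto 0)   -- held stable during the handshake
  );
end entity;

architecture rtl of ptr_xfer is
  signal ready_i : std_logic := '0';
begin
  ready <= ready_i;

  process(clk_A)
  begin
    if rising_edge(clk_A) then
      if ack_sync = ready_i then     -- previous transfer acknowledged, channel idle
        wr_ptr_tx <= wr_ptr;         -- capture the current state for transmission
        ready_i   <= not ready_i;    -- toggle announces the new value
      end if;
    end if;
  end process;
end architecture;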

A final thought I have is that 688 entries is pretty big. In particular, I
think it requires a 688-input mux (or muxes) to read out the data. I
suppose that is only 11 gates deep, but if the FIFO is particularly wide, it
will also involve a lot of routing (and of course, it is possible to break
timing across an asynchronous interface). I would be thinking about a
two-stage system, with a large synchronous FIFO on the front (possibly built
with a RAM if the data is wide) and a small asynchronous FIFO on the back.
I guess it all depends on the data characteristics and the design
requirements.

John

--
John Penton, posting as an individual unless specifically indicated
otherwise.
 
SweetMusic <adi.cojo@gmail.com> wrote:
solved thx^^
How?
--
Uwe Bonnes bon@elektron.ikp.physik.tu-darmstadt.de

Institut fuer Kernphysik Schlossgartenstrasse 9 64289 Darmstadt
--------- Tel. 06151 162516 -------- Fax. 06151 164321 ----------
 
