FDE vs latch?

MikeWhy

Guest
What's the big deal about a latch? Is it less efficient in floorspace than
an FDE? Or is it just some amount of combinatorial concerns? For example:

tsc_start <= tsc when sof_in_n = '0' and rising_edge(clk);
versus
tsc_start <= tsc when sof_in_n = '0';

What concerns are there with crossing clock domains with either? For
example, a 10 ns tick clock would be more directly meaningful than, say, a
125 MHz clock in this application. (The logic reading the value runs on a
different clock than the logic updating the value.)

Also, I had apparently missed the explanation of the semantic difference
between 'to' and 'downto'. Does simple intuition apply here, that with 'to',
low indexes reference the more significant bits or words?

Last, does the following infer a multiplier and adder in synthesis? I
wouldn't expect to see one, and didn't see one in the synthesis log or RTL
schematic, but the whole thing got really messy with simply adding the very
wide registers.

function w_tsc(val : std_logic_vector; i : natural) return std_logic_vector is
  variable lo : natural := i * 8;
  variable hi : natural := lo + 7;
begin
  return val(hi downto lo);
end function;

begin -- arch
with tx_state select
dout <= ....
w_tsc(tsc_start, 7) when SEND_TSC_START,
w_tsc(tsc_start, 6) when SEND_TSC_START_1,
w_tsc(tsc_start, 5) when SEND_TSC_START_2,
w_tsc(tsc_start, 4) when SEND_TSC_START_3,
w_tsc(tsc_start, 3) when SEND_TSC_START_4,
w_tsc(tsc_start, 2) when SEND_TSC_START_5,
...

Truly and finally last, is there a good way to generate the above, in the
midst of a selected assignment having other, unrelated states?

Thanks.
 
> What's the big deal about a latch?

It's usually harder to meet Static Timing Analysis with latches.
They are as troublesome as a very troublesome thing.
If you think you need to use one, think again.


 
"RCIngham" <robert.ingham@n_o_s_p_a_m.n_o_s_p_a_m.gmail.com> wrote in
message news:kt-dnRe2Lb2cTi7SnZ2dnUVZ_rydnZ2d@giganews.com...
>> What's the big deal about a latch?
>
> It's usually harder to meet Static Timing Analysis with latches.
> They are as troublesome as a very troublesome thing.
> If you think you need to use one, think again.

In this particular case, it's grabbing a rather large 100 ns tick count to
inject into a message as a timestamp later. Timing isn't an issue. The only
concern was relative space efficiency. There seems to be no difference between
a latch and an FDE: either one eats up a slice for every 2 bits.

I "solved" the other problem of serializing the multi-word timestamps by
sending 8-bit counts after the first full timestamp. The question remains,
though, how to efficiently byte-serialize the large data word. I hate to
hand write a state machine to do this, for every word size I might
encounter. Shifting the latched value seems unnecessarily painful.

Rotating a single hot bit in a surrogate shadow word, one bit representing a
data byte, seems so far the best solution, but I can't dream up a way to
generate the required statements to go with it. Rotating the surrogate mask
effectively makes for a cheap, easy one-hot FSM, but I don't have the
language skill to generate the corresponding:

with vmask select
dout <= w(47 downto 40) when b"100000",
        w(39 downto 32) when b"010000",
        w(31 downto 24) when b"001000",
...

Writing one for each N-sized word is tedious and error prone. Any help or
ideas?
 
"Andy" <jonesandy@comcast.net> wrote in message
news:2afbb4b2-b865-4f40-b706-4cd60365535d@t35g2000yqd.googlegroups.com...
> WRT your function. If the i argument is known at synthesis time (this
> includes the index of a for-loop), then no hardware is synthesized at
> all, just wires. Otherwise, it will just be multiplexers, with no
> adder or multiplier (multiply/divide by power of two is just a shift,
> and the addition of 7 to a number that is already known to have zeroes
> in the least three bits does not take an adder, just a lut that always
> outputs 1, which may be shared with others, and some more wires).

Doh! A for loop does seem the obvious answer. Does that synthesize in a
clocked process?

Thanks.
 
On Wed, 16 May 2012 15:16:32 -0500
"MikeWhy" <boat042-nospam@yahoo.com> wrote:

> Doh! A for loop does seem the obvious answer. Does that synthesize in a
> clocked process?

It does if it's synthesizable.

More specifically, if the loop can be unrolled into some mess of
combinational logic, then that logic can be placed in front of a
register, and the for loop can be synthesized. If there is no
combinational logic that would generate that function (or if there is,
but it's enormous, e.g. a divider) then you can't synthesize it.

--
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order. See above to fix.
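
A minimal sketch of the "unrolls into combinational logic in front of a register" case, under stated assumptions: the entity name byte_mux_reg, the generic NBYTES, and the ports word, sel and dout are all hypothetical and do not come from the thread. The loop bounds are fixed at elaboration, so the loop should unroll into a byte-wide multiplexer feeding a register.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity; names and widths are assumptions.
entity byte_mux_reg is
  generic (NBYTES : positive := 6);
  port (
    clk  : in  std_logic;
    word : in  std_logic_vector(NBYTES*8 - 1 downto 0);
    sel  : in  natural range 0 to NBYTES - 1;  -- run-time byte select
    dout : out std_logic_vector(7 downto 0)
  );
end entity;

architecture rtl of byte_mux_reg is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- The loop range is static, so each iteration becomes one mux leg:
      -- the slice bounds are constants once the loop is unrolled.
      for i in 0 to NBYTES - 1 loop
        if sel = i then
          dout <= word(i*8 + 7 downto i*8);
        end if;
      end loop;
    end if;
  end process;
end architecture;
```

Exact resource usage depends on the synthesis tool and target device; the point is only that nothing in the loop needs hardware beyond a mux and the output register.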
 
WRT latches: With an FDE, all timing paths start and/or end at the FDE.
With a latch, they don't (e.g. while the latch is transparent, paths pass
straight through it). This makes for many more false paths, etc., that
cause STA to be very conservative, to the point that you often cannot
meet timing unless you use a lot of false path constraints. The problem
with those is that they (the constraints) are inherently very difficult
to verify (that they are correctly stated and applied) except by
inspection and manual analysis.

Also some (most) FPGA architectures do not have a latch primitive, but
use a macro built from one or more LUTs. These circuits are
notoriously glitchy in the presence of two or more inputs changing
simultaneously, especially likely since upstream logic is often merged
into the same LUT. Unless very carefully designed around, this can
cause latches to unlatch even though the enable did not change.

Short version: it hurts when you do that.
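
For reference, the two capture styles from the original question can be written as explicit processes. This is only a sketch: the entity name capture_styles, the output names tsc_ff and tsc_latch, and the 64-bit width are assumptions, not details given in the thread.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity; names and the 64-bit width are assumptions.
entity capture_styles is
  port (
    clk       : in  std_logic;
    sof_in_n  : in  std_logic;
    tsc       : in  std_logic_vector(63 downto 0);
    tsc_ff    : out std_logic_vector(63 downto 0);
    tsc_latch : out std_logic_vector(63 downto 0)
  );
end entity;

architecture rtl of capture_styles is
begin
  -- Flip-flop with clock enable (FDE style): the value changes only on a
  -- rising clock edge, so every timing path starts or ends at the register.
  ff_p : process (clk)
  begin
    if rising_edge(clk) then
      if sof_in_n = '0' then
        tsc_ff <= tsc;
      end if;
    end if;
  end process;

  -- Level-sensitive latch: while sof_in_n is '0' the output follows tsc,
  -- so timing paths run straight through the storage element.
  latch_p : process (sof_in_n, tsc)
  begin
    if sof_in_n = '0' then
      tsc_latch <= tsc;
    end if;
  end process;
end architecture;
```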

WRT your function: if the i argument is known at synthesis time (this
includes the index of a for-loop), then no hardware is synthesized at
all, just wires. Otherwise, it will just be multiplexers, with no
adder or multiplier (multiplying or dividing by a power of two is just a
shift, and adding 7 to a number whose three least significant bits are
already known to be zero does not take an adder, just a LUT that always
outputs 1, which may be shared with others, and some more wires).

I do not understand your last question.


Andy
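
A small sketch contrasting the two cases described above, a constant index versus a run-time index. The entity name byte_pick, the function name w_byte, and the ports are hypothetical. It assumes the vector passed in is declared with a descending (downto) range starting at 0, and that the synthesis tool accepts a fixed-width slice with run-time bounds, which is worth checking for a given tool.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity; names and widths are assumptions.
entity byte_pick is
  port (
    word   : in  std_logic_vector(63 downto 0);
    sel    : in  natural range 0 to 7;          -- run-time byte index
    top    : out std_logic_vector(7 downto 0);  -- selected by a constant
    picked : out std_logic_vector(7 downto 0)   -- selected by sel
  );
end entity;

architecture rtl of byte_pick is
  -- Returns byte i of val; val must have a (N downto 0) range.
  function w_byte(val : std_logic_vector; i : natural)
    return std_logic_vector is
    constant lo : natural := i * 8;
  begin
    return val(lo + 7 downto lo);
  end function;
begin
  -- Constant index: the slice bounds are known at elaboration, so this
  -- is expected to reduce to wiring word(63 downto 56) to the output.
  top <= w_byte(word, 7);

  -- Run-time index: the same function now describes an 8-to-1 byte mux.
  picked <= w_byte(word, sel);
end architecture;
```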
 
"Rob Gaddi" <rgaddi@technologyhighland.invalid> wrote in message
news:20120516132644.4ee18da4@rg.highlandtechnology.com...
> It does if it's synthesizable.
>
> More specifically, if the loop can be unrolled into some mess of
> combinational logic, then that logic can be placed in front of a
> register, and the for loop can be synthesized. If there is no
> combinational logic that would generate that function (or if there is,
> but it's enormous, e.g. a divider) then you can't synthesize it.

A counter and an array of bytes, rather than a shift mask and whatever, was
the trick. It synthesized to a counter and a big mux. Thanks for the help.

do_vcount : process (clk)
...

v_out <= word_bytes(vcount);

init_vword : for iword in 0 to NWORDS-1 generate
word_bytes(iword) <= vword(word, iword);
end generate;
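
One way the counter-plus-byte-array approach might look when written out in full. The names do_vcount, vcount, word_bytes and v_out follow the fragment above; everything else is an assumption: the entity name word_serializer, the generic NBYTES, the load port, the generate label, the inline slice in place of the vword function, and in particular the body of the counter process, which the post elides. This sketch counts down so the most significant byte goes out first.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity; the counter behaviour is an assumption.
entity word_serializer is
  generic (NBYTES : positive := 8);
  port (
    clk   : in  std_logic;
    load  : in  std_logic;  -- restart from the most significant byte
    word  : in  std_logic_vector(NBYTES*8 - 1 downto 0);
    v_out : out std_logic_vector(7 downto 0)
  );
end entity;

architecture rtl of word_serializer is
  type byte_array is array (0 to NBYTES - 1) of std_logic_vector(7 downto 0);
  signal word_bytes : byte_array;
  signal vcount     : natural range 0 to NBYTES - 1 := NBYTES - 1;
begin
  -- View the wide word as an array of bytes; the generate index is
  -- static, so this is wiring only.
  split : for i in 0 to NBYTES - 1 generate
    word_bytes(i) <= word(i*8 + 7 downto i*8);
  end generate;

  -- Byte counter: reloads on load or after byte 0, otherwise counts down.
  do_vcount : process (clk)
  begin
    if rising_edge(clk) then
      if load = '1' or vcount = 0 then
        vcount <= NBYTES - 1;
      else
        vcount <= vcount - 1;
      end if;
    end if;
  end process;

  -- Indexing the array with the counter describes the big mux.
  v_out <= word_bytes(vcount);
end architecture;
```

Because the word width comes from NBYTES, the same code serves any whole-byte word size without a hand-written state machine per width.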
 

Welcome to EDABoard.com

Sponsor

Back
Top