FIFO timing, the right way

P

Piotr Wyderski

Guest
Hi all,

I am working on a block that needs to accumulate (at least) K data items
and then consume them in a burst, while the next group of items might be
flowing in. As the items are not consumed sequentially, a very efficient
approach is to have a FIFO interface on the write side and a limited
lookahead random access interface one on the read side. The read side
works OK. The hard part turned out to be the FIFO full flag setting.
I would like to use the full capacity of the FIFO and stall the data
writer at the correct moment. Say I want to copy the content of a very
big ROM into a 4-elements deep FIFO, the basic logic would be as follows:

writer: process(clock) is begin

if (rising_edge(clock)) then

fifo_write_request <= '0';

if (fifo_full = '0') then

fifo_write_request <= '1';
rom_address <= rom_address + 1;

end if;

end if;

end process;

In the FIFO interface part I have a 3-bit counter SIZE that calculates
the number of items in the FIFO. I update it under the same "if
(rising_edge(clock))" condition to follow the instantaneous
fifo_write_request & fifo_read_request constellation ("00" => +0, "10"
=> +1, "01" => -1, "11" => +0).

And here comes the problem: simply setting fifo_full <= (SIZE = 4)
adds a cycle of latency and the writer thinks it is allowed to write one
more item, corrupting the data. To add insult to injury, SIZE becomes 5
and fifo_full is de-aserted, so the writer pumps in even more data.
Changing it to fifo_full <= (SIZE = 3) behaves equally wrong, just moves
the problem one cycle to the left.

If I move the following into the concurrent part:

fifo_full <= (SIZE = 4) or ((SIZE = 3) and (fifo_write_request = '1'))

(to account for a possible ongoing write), everything works great,
because the condition is very logical (no pun intended). But I think
this solution is lame due to its purely combinatorial nature. So, could
you please tell me either:

a) that it is not lame, if it works, it works and and my concerns are
based on a superstition.

b) or the correct way to setup the flag synchronously.

I don't care if the read side starts as soon as possible (including the
ongoing write) or one cycle later, so it can be excluded from the
write-side logic. This is to drive a string of WS2812 LEDs.

Best regards, Piotr
 
Hello

On 22/04/2019 08:42, Piotr Wyderski wrote:
[...]
In the FIFO interface part I have a 3-bit counter SIZE that calculates
the number of items in the FIFO. I update it under the same "if
(rising_edge(clock))" condition to follow the instantaneous
fifo_write_request & fifo_read_request constellation ("00" => +0, "10"
=> +1, "01" => -1, "11" => +0).

And here comes the problem: simply setting fifo_full <= (SIZE = 4)
adds a cycle of latency and the writer thinks it is allowed to write one
more item, corrupting the data. To add insult to injury, SIZE becomes 5
and fifo_full is de-aserted, so the writer pumps in even more data.
Changing it to fifo_full <= (SIZE = 3) behaves equally wrong, just moves
the problem one cycle to the left.

If I move the following into the concurrent part:

  fifo_full <= (SIZE = 4) or ((SIZE = 3) and (fifo_write_request = '1'))

Either you set the flag at the same time that the counter reaches its
max value in a sequential process :

fifo_full <= ((SIZE = 3) and (fifo_write_request = '1'));

which has a one cycle latency on the flag test side

Or you do it combinatorially like you did, because sometimes there is no
other way (and anyway, however you write your logic, it always ends up
with a bunch of combinatorial stuff followed by a flip-flop)
Instead of comparing SIZE = 4, you could use SIZE > 3 which takes care
of the case where SIZE goes beyond 4 and is very simple in terms of
logic because it only needs the most significant bit of SIZE.

BTW, note that in the expression above, fifo_full is of boolean type,
not std_logic)

Nicolas
 
Hi Nicholas,

Nicolas Matringe wrote:

Either you set the flag at the same time that the counter reaches its
max value in a sequential process :

fifo_full <= ((SIZE = 3) and (fifo_write_request = '1'));

which has a one cycle latency on the flag test side

It is deceptively simple, but unfortunately it doesn't work.
This is a write-only (no dequeue ops) simulation for a 4-entry FIFO:

https://i.postimg.cc/MHvqvmmt/sim1.png

All values are signals, i.e. the delay is included. It shows two bugs:
5 items have been accepted and no permanent stall is produced.

Or you do it combinatorially like you did, because sometimes there is no
other way (and anyway, however you write your logic, it always ends up
with a bunch of combinatorial stuff followed by a flip-flop)

The combinatorial solution works, but for some unknown reason I am not
particularly fond of it and wanted to consult it with experts here.

Instead of comparing SIZE = 4, you could use SIZE > 3 which takes care
of the case where SIZE goes beyond 4 and is very simple in terms of
logic because it only needs the most significant bit of SIZE.

SIZE > 4 means the disaster has already happened.

BTW, note that in the expression above, fifo_full is of boolean type,
not std_logic)

Sure, I just wanted to show the general idea, not to introduce the
otherwise required strong typing as an extra layer of obfuscation.

Best regards, Piotr
 
On 22/04/2019 12:29, Piotr Wyderski wrote:
Hi Nicholas,

Nicolas Matringe wrote:

Either you set the flag at the same time that the counter reaches its
max value in a sequential process :

fifo_full <= ((SIZE = 3) and (fifo_write_request = '1'));

which has a one cycle latency on the flag test side

It is deceptively simple, but unfortunately it doesn't work.
This is a write-only (no dequeue ops) simulation for a 4-entry FIFO:

https://i.postimg.cc/MHvqvmmt/sim1.png

You could generate an additional fifo_almost_full signal that is set
when SIZE reaches 3 and use it like this, along with fifo_full and
fifo_write_request :

....
if (fifo_full = 0) and ((fifo_almost_full and fifo_write_request) =
0) then
fifo_write_request <= '1';
rom_address <= rom_address + 1;
end if;
....


Instead of comparing SIZE = 4, you could use SIZE > 3 which takes care
of the case where SIZE goes beyond 4 and is very simple in terms of
logic because it only needs the most significant bit of SIZE.

SIZE > 4 means the disaster has already happened.

Inded but what about SIZE > 3, as I stated ? It gives you simple
comparison logic and disaster mitigating (stalling whatever the value is
once above the limit)

In general when designing these sorts of things I find it helps a huge
lot to draw the timing waveforms by hand before starting to code.

Nicolas
 
On Monday, April 22, 2019 at 2:42:29 AM UTC-4, Piotr Wyderski wrote:
Hi all,

I am working on a block that needs to accumulate (at least) K data items
and then consume them in a burst, while the next group of items might be
flowing in. As the items are not consumed sequentially, a very efficient
approach is to have a FIFO interface on the write side and a limited
lookahead random access interface one on the read side. The read side
works OK. The hard part turned out to be the FIFO full flag setting.
I would like to use the full capacity of the FIFO and stall the data
writer at the correct moment. Say I want to copy the content of a very
big ROM into a 4-elements deep FIFO, the basic logic would be as follows:

writer: process(clock) is begin

if (rising_edge(clock)) then

fifo_write_request <= '0';

if (fifo_full = '0') then

fifo_write_request <= '1';
rom_address <= rom_address + 1;

end if;

end if;

end process;

In the FIFO interface part I have a 3-bit counter SIZE that calculates
the number of items in the FIFO. I update it under the same "if
(rising_edge(clock))" condition to follow the instantaneous
fifo_write_request & fifo_read_request constellation ("00" => +0, "10"
=> +1, "01" => -1, "11" => +0).

And here comes the problem: simply setting fifo_full <= (SIZE = 4)
adds a cycle of latency and the writer thinks it is allowed to write one
more item, corrupting the data. To add insult to injury, SIZE becomes 5
and fifo_full is de-aserted, so the writer pumps in even more data.
Changing it to fifo_full <= (SIZE = 3) behaves equally wrong, just moves
the problem one cycle to the left.

If I move the following into the concurrent part:

fifo_full <= (SIZE = 4) or ((SIZE = 3) and (fifo_write_request = '1'))

(to account for a possible ongoing write), everything works great,
because the condition is very logical (no pun intended). But I think
this solution is lame due to its purely combinatorial nature. So, could
you please tell me either:

a) that it is not lame, if it works, it works and and my concerns are
based on a superstition.

b) or the correct way to setup the flag synchronously.

I don't care if the read side starts as soon as possible (including the
ongoing write) or one cycle later, so it can be excluded from the
write-side logic. This is to drive a string of WS2812 LEDs.

Best regards, Piotr

The usual procedure is to set full and empty inside a clocked process. I'm assuming that both read and write are clocked by a single clock. If read and write are in different clock domains then you need a dual clock fifo which would go beyond the scope of what you're asking for here.

Normally you want FIFO flag outputs to be clocked since that will maximize the amount of time available for the external logic that will use those flags. If you use 'just logic' to generate the flags then you run the risk of that FIFO flag needlessly becoming part of the critical timing path. In order to set the full flag synchronously, you simply set it when there are 'n-1' things in the FIFO and there is a write request and no read request. VHDL sample code would be:

if (Reset = '1') then
WrUsedw <= 0;
Full <= '0';
elsif (Write = '1') then
if (Read = '0') then
if (WrUsedw >= FIFO_DEPTH - 1) then
Full <= '1';
end if;
WrUsedw <= WrUsedw + 1;
end if;
elsif (Read = '1') then
Full <= '0';
WrUsedw <= WrUsedw - 1;
end if;
end if;

Kevin Jennings
 

Welcome to EDABoard.com

Sponsor

Back
Top