accumulator (again)

jmariano

Dear All,

I'm not an expert in VHDL, I'm just a curious person trying to solve a
research problem with an FPGA.

I'm using a 32-bit accumulator in an IP, as part of a SoC project with
a MicroBlaze, implemented on a Digilent Spartan-3 SKB (the FPGA is a
Xilinx XC3S200). The code is included at the end of this message. The
input is a 32-bit signed integer coded in two's complement and the
output is also a 32-bit signed integer. What I would like the
accumulator to do is accumulate synchronously with the rising edge of
clk when enb=1 and hold the result stable at the output when enb=0
(enb is an asynchronous signal generated elsewhere in the system).

But it does not work this way; it behaves in a strange manner...

Sometimes I get the expected results, but often I get strange values
(large when they should be small, often negative instead of positive,
etc.). If I look at the binary representation of the output, it looks
as if the output didn't have time to sum and propagate to the output
again. In fact, the post place and route simulation shows that when
the enb signal goes to 0, the output stays in an undetermined condition
(you know, red line with XXXX).

I'm guessing I'm making a very basic mistake that has something to do
with the timing of the enb signal, but after 3 days banging my head
against the wall, all I have is a monumental headache.

Can some kind soul help me with this?

jmariano



================

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity int_accum is
  port (clk : in  std_logic;
        clr : in  std_logic;
        enb : in  std_logic;
        d   : in  std_logic_vector(31 downto 0);
        ovf : out std_logic;      -- overflow
        q   : out std_logic_vector(31 downto 0));
end int_accum;

architecture archi of int_accum is

  signal tmp : signed(32 downto 0);

begin

  process(clk, clr)
  begin
    if (clr = '1') then
      tmp <= (others => '0');
    elsif (rising_edge(clk)) then
      if (enb = '1') then
        -- The result of the adder will be on 33 bits to keep the carry
        tmp <= tmp + signed('0' & d);
      end if;
    end if;
  end process;

  -- The carry is extracted from the most significant bit of the result
  ovf <= tmp(32);

  -- The q output is the 32 least significant bits of sum
  q <= std_logic_vector(tmp(31 downto 0));

end archi;
 
This is the key to your problem:

> enb is an asynchronous signal generated elsewhere in the system

You can't take an asynchronous signal into multiple (32 in this case)
registers in a synchronous domain and expect it to work reliably. You
need to first synchronize the asynchronous input to the synchronous
clock domain before you can use it.

Ed McGettigan
--
Xilinx Inc.
 
On Mon, 02 Jul 2012 17:19:59 -0700, Ed McGettigan wrote:

> You need to first synchronize the asynchronous input to
> the synchronous clock domain before you can use it.

Which means that you should latch enb in a register, with the same clock
that you're using to twiddle your accumulator, and use the output of that
register as your enable signal.

Paranoid logic designers will have a string of two or three registers to
avoid metastability, but I've been told that's not necessary. (I'm not
much of a logic designer).

--
Tim Wescott
Control system and signal processing consulting
www.wescottdesign.com
 
On Mon, 02 Jul 2012 16:20:52 -0700, jmariano wrote:

> I'm using a 32 bit accumulator in a IP,
> .... The
> input is a 32 bit signed integer coded in two's complement and the
> output also a 32 bit signed integer.
>
> But it does not work in this way, it behaves in a strange manner...

You have one likely answer from Ed and Tim: unless you KNOW that the
input signals "enb" and "d" are already synchronous with "clk" you MUST
synchronise them.

But there is another problem:

tmp <= tmp + signed ('0'& d);

This is NOT how to add a leading bit to d.
It will convert a small negative d to a very large positive value!


Instead you must replicate d's sign bit (MSB) into the leading bit.
tmp <= tmp + signed (d(d'high) & d);
(Or look for "resize" functions in numeric_std to do this for you).

This is far more likely to be the problem, especially if you are
detecting these errors in behavioural simulation (as you should be).
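
A small simulation-only sketch can make the difference between the two extensions visible. The entity name sign_ext_demo and the variable names are made up for illustration and are not part of the original design; the test value is d = -1.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical simulation-only entity; the name is not from the thread.
entity sign_ext_demo is
end entity sign_ext_demo;

architecture sim of sign_ext_demo is
begin
  process
    -- d = -1 in 32-bit two's complement
    constant d : std_logic_vector(31 downto 0) := x"FFFFFFFF";
    variable zero_ext   : signed(32 downto 0);
    variable sign_ext   : signed(32 downto 0);
    variable resize_ext : signed(32 downto 0);
  begin
    zero_ext   := signed('0' & d);        -- leading bit forced to '0'
    sign_ext   := signed(d(d'high) & d);  -- sign bit replicated
    resize_ext := resize(signed(d), 33);  -- numeric_std does the same

    -- With a '0' in front, the 33-bit value is no longer negative.
    assert zero_ext(32) = '0'
      report "zero extension: unexpected leading bit" severity failure;
    -- Both sign-extending forms keep the value at -1.
    assert sign_ext = to_signed(-1, 33)
      report "sign extension: value is not -1" severity failure;
    assert resize_ext = sign_ext
      report "resize and manual sign extension differ" severity failure;

    report "sign extension demo finished";
    wait;
  end process;
end architecture sim;
```

Note that only bit 32 differs between the two forms; bits 31 downto 0 of the extended operand are identical.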

Incidentally, unless this is the top level of your design, I would
consider making the D and Q ports signed. Apart from keeping the type
conversions to a minimum, this means the external view of the design (the
entity specification) better reflects (or documents) what the design
does; preventing surprises when someone re-uses it with unsigned data...

- Brian
 
On Monday, July 2, 2012 10:24:02 PM UTC-7, Tim Wescott wrote:

> Paranoid logic designers will have a string of two or three registers to
> avoid metastability, but I've been told that's not necessary. (I'm not
> much of a logic designer).

It isn't just the paranoid logic designer, it should be every logic designer.

A single register solves only part of the problem. It gives all the destination registers the same view of the asynchronous input, but it does not solve the very real metastability problem. At least two registers should be used so that a metastable condition has time to resolve, and with increasing clock frequencies and finer process nodes, three or more stages may be necessary.
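
A minimal sketch of such a two-register synchronizer follows. The entity and port names (enb_sync, enb_in, enb_out) are assumptions made for illustration, and clk is assumed to be the same clock that drives the accumulator.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity and port names; not from the original design.
entity enb_sync is
  port (clk     : in  std_logic;   -- same clock as the accumulator
        enb_in  : in  std_logic;   -- asynchronous enable
        enb_out : out std_logic);  -- enable synchronized to clk
end entity enb_sync;

architecture rtl of enb_sync is
  -- Two flip-flops in series with no logic between them.
  signal meta   : std_logic := '0';
  signal stable : std_logic := '0';
begin
  process(clk)
  begin
    if rising_edge(clk) then
      meta   <= enb_in;  -- first stage: may go metastable
      stable <= meta;    -- second stage: samples one clock period later
    end if;
  end process;

  enb_out <= stable;
end architecture rtl;
```

The accumulator would then use enb_out in place of the raw enb. A third stage can be added the same way if more settling time is wanted.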

Ed McGettigan
--
Xilinx Inc.
 
On Jul 3, 5:45 pm, Ed McGettigan <ed.mcgetti...@xilinx.com> wrote:

> At least two registers should be used to ensure that the metastability
> condition has resolved and with increasing clock frequency and finer
> process nodes using three or more stages may be necessary.

Hi Ed. The way it was explained to me, I believe by Peter Alfke,
is that what really resolves metastability is the slack time in a
register-to-register path. Over the years, FPGA process improvements
have resulted in FFs which only need a couple of ns to resolve
metastability to one failure in a million years of operation or
something like that (I don't remember the metric, but it was good
enough for anything I do). It doesn't matter that you have logic in
that path, you just need those few ns in every part of the path. In
theory, even if you use multiple registers with no logic, what really
matters is the slack time in the path, and that is not guaranteed even
with no logic. So the design protocol should be to ensure that the
paths from the input register to all subsequent registers have
sufficient slack time.

Do you remember how much time that needs to be? I want to say 2 ns,
but it might be more like 5 ns, I just can't recall. Of course it
depends on your clock rates, but I believe Peter picked some more
aggressive speeds like 100 MHz for his example.

Rick
 
Dear All,

Thank you very much for your input, and sorry for the late reply.
It is really great to be able to get the opinion of such experts,
especially since, at my current location and within a radius of some
200 km, I must be the only person working with FPGAs and VHDL! I'm also
glad that the discussion has evolved to levels of complexity far beyond
my knowledge.

I was hoping that by now I would be able to say that the thing was
working as expected but, unfortunately, no.

I've synchronized the enable signal, as suggested by Ed and Tim, using
3 FFs (I'm not paranoid, I just have room). Also, following Brian's
suggestions, I've cleaned up the code regarding type conversions. All
this has allowed me to isolate the remaining source of error, thank you
very much.

Here's the full story: I'm implementing a gated integrator, as part
of a boxcar averager. This is the standard noise reduction technique
used in nuclear magnetic resonance (NMR). This is research, not a
commercial product! The module gets its data from four 8-bit ADCs at 5
MHz (adc0, adc90, adc180, adc270) and accumulates while enb=1. enb is
generated in a different module. The module does this:
1 - generates the acquisition clock (adc_clk) by dividing the S3-SKB
50 MHz main clock by 10
2 - generates the accumulation clock (acc_clk) by inverting adc_clk.
In this way, there is a delay of 100 ns from the moment the ADCs
receive the rising edge of the clock to the moment when the data gets
registered at the output.
3 - converts the data from the ADCs to excess 128 (bipolar ADC) and
extends it to 32-bit signed
4 - calculates u = adc0-adc180 and v = adc90-adc270. u and v go through
a switch and emerge as r and i, to be delivered to two identical
accumulators.
Of course, 3 and 4 must occur in less than 100 ns.
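
Steps 3 and 4 might look roughly like the sketch below. The thread does not show this code, so everything here is an assumption: the entity name adc_diff, the helper to_twos, and in particular the reading that each ADC delivers an 8-bit excess-128 (offset binary) code that has to become two's complement before it is extended.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity; names and the ADC coding are assumptions.
entity adc_diff is
  port (adc0, adc180 : in  std_logic_vector(7 downto 0);  -- excess-128 codes
        u            : out signed(31 downto 0));          -- adc0 - adc180
end entity adc_diff;

architecture rtl of adc_diff is
  -- Excess 128 to two's complement: invert the most significant bit.
  function to_twos(x : std_logic_vector(7 downto 0)) return signed is
  begin
    return signed((not x(7)) & x(6 downto 0));
  end function to_twos;
begin
  -- Sign-extend both operands to 32 bits first, then subtract,
  -- so the difference cannot overflow an 8-bit intermediate.
  u <= resize(to_twos(adc0), 32) - resize(to_twos(adc180), 32);
end architecture rtl;
```

v would be formed the same way from adc90 and adc270.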

The switch unit is very simple: it has a control signal, s[1:0], that
comes from a different module, and the following table: 00 -> r=u,
i=v; 01 -> r=v, i=-u; 10 -> r=-v, i=u; 11 -> r=-v, i=-u. The s signal
is generated in a different clock domain and is stable 500 us before
enb. enb has a typical duration of 10 us. The code is at the end
of this message.

I continue to get errors, especially when the input values are close
to zero, which means that the result is changing from, say, FFFFFFFF to
00000001, so lots of bits change.

I have (I think!) traced the source of the error to the switch_unit
because, if I tie the s signal to a fixed value, 11 for example, the
unit works well, but if I connect it to a real s signal, I get errors.
So I thought this must be because the real s is noisy and r and i
change during the acquisition period (1mm ns), so I have synchronized s
with acc_clk, but the problem persists. What is stranger is that, if I
do s <= "01" inside the synchronization process, I also get the same
type of errors.

Really, I don't know what to do next.

jmariano

=================
architecture archi of int_su is
begin
  -- Combinational switch: routes u and v (or their negations) to r and i
  process(u, v, s)
  begin
    case s is
      when "00" =>
        r <= u;
        i <= v;
      when "01" =>
        r <= v;
        i <= -u;
      when "10" =>
        r <= -u;
        i <= -v;
      when "11" =>
        r <= -v;
        i <= u;
      when others =>
        r <= (others => 'X');
        i <= (others => 'X');
    end case;
  end process;
end archi;
============
 
On Jul 5, 7:44 am, jmariano <jmarian...@gmail.com> wrote:

> 1 - generates the acquisition clock (adc_clk) by division by 10 of the
> S3-SKB 50 MHz main clock
> 2 - generates the accumulation clock (acc_clk) by inverting adc_clk.

I'm not real clear on your description of your design, but if you are
really generating clocks from the 50 MHz, I recommend that inside the
FPGA you instead use a single clock and generate clock enables for the
various functions. When you use multiple clocks in a circuit you have
to do extra work for every signal that crosses a clock domain. Could
that be your problem?
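
As a rough sketch of the single-clock approach, the block below produces a one-cycle enable pulse every tenth cycle of the 50 MHz clock instead of a separate 5 MHz clock. The entity name ce_gen and the port name ce are made up for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity; names are not from the original design.
entity ce_gen is
  port (clk : in  std_logic;          -- 50 MHz main clock
        ce  : out std_logic := '0');  -- high for one clk cycle in every ten
end entity ce_gen;

architecture rtl of ce_gen is
  signal count : integer range 0 to 9 := 0;
begin
  process(clk)
  begin
    if rising_edge(clk) then
      if count = 9 then
        count <= 0;
        ce    <= '1';  -- one pulse per ten cycles, i.e. a 5 MHz rate
      else
        count <= count + 1;
        ce    <= '0';
      end if;
    end if;
  end process;
end architecture rtl;
```

Every downstream register is then clocked by the 50 MHz clk and updates only inside "if ce = '1' then", so all internal logic stays in one clock domain.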

I don't see anything in your original post about simulation. Do you
simulate your modules? I highly recommend that you write a test
bench for each and every module you code. You may think this takes
too much time, but I believe it pays off in the end with shorter
integration time.
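
A minimal self-checking test bench for the accumulator could look like the sketch below. It assumes the int_accum entity from the first post has been compiled into the work library; the test bench name, the 50 MHz clock and the stimulus values are arbitrary choices made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical test bench; assumes work.int_accum from the first post.
entity int_accum_tb is
end entity int_accum_tb;

architecture sim of int_accum_tb is
  signal clk  : std_logic := '0';
  signal clr  : std_logic := '1';
  signal enb  : std_logic := '0';
  signal d    : std_logic_vector(31 downto 0) := (others => '0');
  signal ovf  : std_logic;
  signal q    : std_logic_vector(31 downto 0);
  signal done : boolean := false;
begin
  dut : entity work.int_accum
    port map (clk => clk, clr => clr, enb => enb, d => d, ovf => ovf, q => q);

  -- 50 MHz clock that stops once the stimulus is finished.
  clk <= not clk after 10 ns when not done else '0';

  stimulus : process
  begin
    wait for 25 ns;
    clr <= '0';                      -- release the asynchronous clear

    -- Change inputs on falling edges, away from the sampling edge.
    wait until falling_edge(clk);
    d   <= std_logic_vector(to_signed(5, 32));
    enb <= '1';
    for n in 1 to 3 loop             -- three rising edges with enb = '1'
      wait until falling_edge(clk);
    end loop;
    enb <= '0';
    assert signed(q) = to_signed(15, 32)
      report "accumulation failed" severity error;

    -- With enb = '0' the output must hold its value.
    wait until falling_edge(clk);
    wait until falling_edge(clk);
    assert signed(q) = to_signed(15, 32)
      report "hold failed" severity error;

    done <= true;
    wait;
  end process stimulus;
end architecture sim;
```

Because the stimulus here is synchronous to clk, this only checks the functional behaviour; it does not exercise an asynchronous enb.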

Rick
 
On Wednesday, July 4, 2012 12:49:07 PM UTC-7, rickman wrote:

> The way it was explained to me, I believe from Peter Alfke,
> is that what really resolves metastability is the slack time in a
> register to register path.

I'm glad to see that one of my 5-6 attempts to post was finally accepted by Google. I have got to switch to something else.

Peter Alfke's publications on metastability definitely fall into the seminal category, but you must be careful when extrapolating the original data to the latest technology nodes, circuits and design requirements. There are two major factors that impact the metastability equations: the tau, or metastability decay rate, and the settling time.

The tau value is an inherent characteristic of the circuit and technology node, and for a long time the expectation was that it would decrease with each generation, but this has stopped being true.

The settling time, Ts, is dependent on the design and is under the user's control. Ts depends on the destination clock frequency and the timing slack between registers. If you have a 100 MHz clock frequency, but you use up 9.5 ns to get to the destination, your slack is only 500 ps. Adding register stages allows for maximum use of the clock period, increasing the settling time, and each additional stage increases it again.
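
For reference, the first-order model usually quoted for synchronizer reliability ties these two factors together. The symbols T_0 (a device-dependent constant), f_clk and f_data (the sampling clock rate and the toggle rate of the asynchronous input) are not from this post and are introduced here only as the conventional notation; the numbers on the right restate the 100 MHz example.

```latex
\mathrm{MTBF} = \frac{e^{T_s/\tau}}{T_0 \, f_{\mathrm{clk}} \, f_{\mathrm{data}}},
\qquad
T_{\mathrm{clk}} = \frac{1}{100\ \mathrm{MHz}} = 10\ \mathrm{ns},
\qquad
T_s \approx 10\ \mathrm{ns} - 9.5\ \mathrm{ns} = 0.5\ \mathrm{ns}
```

Because Ts sits in the exponent, giving the path more of the clock period improves the MTBF far more than proportionally.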

Ed McGettigan
--
Xilinx Inc.
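
For illustration, here is a minimal sketch of the two-register synchronizer discussed above, applied to a single-bit asynchronous signal such as enb. The entity and signal names (sync_2ff, async_in, meta, sync) are made up for this example and do not come from anyone's design in this thread.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical two-stage synchronizer for ONE asynchronous bit.
entity sync_2ff is
  port (
    clk      : in  std_logic;  -- destination clock
    async_in : in  std_logic;  -- asynchronous input (e.g. enb)
    sync_out : out std_logic   -- copy of async_in aligned to clk
  );
end entity sync_2ff;

architecture rtl of sync_2ff is
  signal meta : std_logic := '0';  -- first stage, may go metastable
  signal sync : std_logic := '0';  -- second stage, samples meta one clock later
begin
  process (clk)
  begin
    if rising_edge(clk) then
      meta <= async_in;  -- no logic between the two registers,
      sync <= meta;      -- so meta gets most of a clock period to settle
    end if;
  end process;

  sync_out <= sync;
end architecture rtl;
```

Only sync_out should feed the rest of the logic. This handles a single bit; a multi-bit value crossing clock domains needs something more (Gray coding or a handshake), because the bits are not guaranteed to arrive on the same clock edge.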
 
Hi Rick, thanks for your help.

I'm not real clear on your description of your design, but if you are
really generating clocks from the 50 MHz, I recommend that inside the
FPGA you instead use a single clock and generate clock enables for the
various functions.

Yes, I generate a 5 MHz clock inside the module from the main 50 MHz clock by simple division by 10 because I need a 5 MHz ADC clock. I can't use a clock enable because the AD9058 ADC does not have an enable input, just a clock.

When you use multiple clocks in a circuit you have
to do extra work for every signal that crosses a clock domain. Could
that be your problem?

What is the extra work? I have no idea! Synchronization?

I don't see anything in your original post about simulation. Do you
simulate your modules? I highly recommend that you write a test
bench for each and every module you code. You may think this takes
too much time, but I believe it pays off in the end with shorter
integration time.

Sorry about that, I did, in fact, simulate each module and the top entity. The behavioral simulation gives the expected results, but the post-place-and-route simulation gives the same errors that I could not understand. I'll run the simulations again and post the results here.

jmariano
 
On Jul 5, 11:04 am, jmariano <jmarian...@gmail.com> wrote:
(snip)

The good news here is that you have a simulation that shows the same
behavior as the hardware. Looking at these simulation runs should tell
you exactly what the problem is. I don't think that anyone here will
be able to do the same without the full source code for the design.

Ed McGettigan
--
Xilinx Inc.
 
On Jul 5, 2:03 pm, Ed McGettigan <ed.mcgetti...@xilinx.com> wrote:
(snip)

The info I am referring to comes from posts that were made here and
pertained to the "current" generation of some six or eight years ago.
At that time Peter made the point that the "tau", as you call it, had
gotten so fast that the impact was negligible for all but the most
stringent designs and only a small amount of slack time is needed.

A quick search found these two posts about V2Pro devices. I assume
your newer devices are at least as good as 10 year old technology.
Note that Peter makes a point that the capture window T0, which is a
multiplicative factor in the formula, is not an important parameter.
Tau appears in the exponent (as the ratio Tslack/tau) and so makes a
much larger contribution to the result. Like T0, the two clock
frequencies are just multiplicative factors in the formula and so
don't make huge changes to the MTBF.
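
For reference, the formula being described is presumably the usual textbook expression for synchronizer MTBF, written out below. This form is an assumption on my part (it is not quoted in Peter's posts), with t_slack the settling time available, tau the decay constant, T0 the capture window, and f_clk and f_data the two frequencies.

```latex
\[
  \mathrm{MTBF} \;=\; \frac{e^{\,t_{\mathrm{slack}}/\tau}}
                           {T_0 \, f_{\mathrm{clk}} \, f_{\mathrm{data}}}
\]
```

Doubling T0 or either frequency only halves the MTBF, while adding slack multiplies it by e for every tau of extra time, which is why the exponent dominates.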

So it seems like not much would have changed in 10 years in how a
designer should deal with metastability. Leaving 2 ns of slack time
in the first register to register path should make literally all
designs extremely robust regardless of how many registers are
receiving the first register output or if there is logic in the path.
Just make sure there is 2 ns slack time and your designs should be
good for many, many years!

Rick


===================================
Peter Alfke comp.arch.fpga Oct 10 2002, 8:40 pm

You mentioned metastability, and that caught my attention.

Metastability is a reality, but it (and the fear of it) is highly
overrated.
We recently tested Virtex-IIPro flip-flops, made on 130 nm technology.
You might call that cutting edge technology, but not exotic.
When a 330 MHz clock synchronized a ~50 MHz input, there was a 200 ps
extra metastable delay (causing a clock-to-out + short routing + set-up
total of 1.5 ns) once every second. That translates into a metastable
capture window that has a width of 3 ns divided by 100 million (since
we looked at both edges of the 50 MHz signal).
So the window for a 200 ps extra delay is 0.03 femtoseconds.
If you can tolerate 500 ps more, the MTBF increases 100 000 times, and
the capture window gets that much smaller.
Metastability is a real, but highly overrated problem.

Peter Alfke, Xilinx Applications
===================================
===================================
Peter Alfke comp.arch.fpga Oct 15 2002, 1:11 pm

Here are the K2 values for Virtex-IIPro:

CLB @1.50V: K2 = 27.2, i.e. 1/K2 = tau = 36.8 picoseconds
CLB @1.35V: K2 = 23.3, i.e. 1/K2 = tau = 42.9 picoseconds
CLB @1.65V: K2 = 35.7, i.e. 1/K2 = tau = 28.0 picoseconds

IOB @1.50V: K2 = 24.4, i.e. 1/K2 = tau = 41.0 picoseconds
IOB @1.35V: K2 = 19.24, i.e. 1/K2 = tau = 52.0 picoseconds
IOB @1.65V: K2 = 44.05, i.e. 1/K2 = tau = 22.7 picoseconds

For each extra 100 ps of acceptable metastable delay,
the MTBF increases by a factor 10.3 for CLB @ 1.35 V,
or a factor 6.85 for IOB @ 1.35 V.
Much better values, of course, at nominal or high Vcc.

Click on
http://support.xilinx.com/support/techxclusives/techX-home.htm
in early November.

Here is the worst-case data point:

50 MHz asynchronous data rate, 330 MHz clock, single-stage synchronizer
in IOB, Vcc = 1.35 V:
clock-to-Q + short routing + set-up time + metastable delay exceeds
clock period once per 30,000 years.

At nominal Vcc: once per 100 million years.

At a 250 MHz clock rate, delay exceeds clock period less often than
once per billion years.

Peter Alfke, Xilinx Applications
====================================
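
As a quick check of the "factor per extra 100 ps" figures in the second post: the ratio of MTBFs for an extra delay allowance only depends on the exponent. The working below assumes K2 is expressed in 1/ns, which is what the listed 1/K2 = tau values in picoseconds imply.

```latex
\[
  \frac{\mathrm{MTBF}(t + \Delta t)}{\mathrm{MTBF}(t)}
    \;=\; e^{\Delta t/\tau} \;=\; e^{K_2\,\Delta t}
\]
\[
  \text{CLB @ 1.35 V:}\quad e^{\,23.3\ \mathrm{ns^{-1}} \times 0.1\ \mathrm{ns}}
    = e^{2.33} \approx 10.3
\]
\[
  \text{IOB @ 1.35 V:}\quad e^{\,19.24\ \mathrm{ns^{-1}} \times 0.1\ \mathrm{ns}}
    = e^{1.924} \approx 6.85
\]
```

Both results match the factors Peter quotes.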
 
On Jul 5, 8:04 pm, jmariano <jmarian...@gmail.com> wrote:
Hi Rick, thanks for your help.

I'm not real clear on your description of your design, but if you are
really generating clocks from the 50 MHz, I recommend that inside the
FPGA you instead use a single clock and generate clock enables for the
various functions.

Yes, I generate a 5 MHz clock inside the module from the main 50 MHz clock by simple division by 10 because I need a 5 MHz ADC clock. I can't use a clock enable because the AD9058 ADC does not have an enable input, just a clock.

You could just have a state machine running at 50 MHz that grabs the
data and sets/clears the clock,

which I guess is partly what you have in your divide by 10.

-Lasse
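
A rough sketch of what such a state machine could look like, assuming a 50 MHz system clock and one 8-bit ADC channel. All names here (adc_clk_gen, sample, sample_stb) are hypothetical, and the capture point (count = 8) is only a placeholder: the right count depends on the ADC's output timing in its datasheet and on board delays.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical divide-by-10 block: drives a 5 MHz clock to the ADC pin
-- and captures the ADC data in the 50 MHz domain, so the rest of the
-- design uses a single clock plus a one-cycle strobe.
entity adc_clk_gen is
  port (
    clk        : in  std_logic;                     -- 50 MHz system clock
    adc_data   : in  std_logic_vector(7 downto 0);  -- ADC output bus
    adc_clk    : out std_logic;                     -- 5 MHz clock to the ADC
    sample     : out std_logic_vector(7 downto 0);  -- captured ADC word
    sample_stb : out std_logic                      -- high for one clk cycle per sample
  );
end entity adc_clk_gen;

architecture rtl of adc_clk_gen is
  signal count     : integer range 0 to 9 := 0;
  signal adc_clk_r : std_logic := '0';
begin
  process (clk)
  begin
    if rising_edge(clk) then
      sample_stb <= '0';  -- default, overridden below once per 10 cycles

      -- Free-running modulo-10 counter
      if count = 9 then
        count <= 0;
      else
        count <= count + 1;
      end if;

      -- Registered ADC clock: high while count is 0..4, low while 5..9
      if count = 9 then
        adc_clk_r <= '1';
      elsif count = 4 then
        adc_clk_r <= '0';
      end if;

      -- ASSUMPTION: data is stable late in the ADC clock period
      if count = 8 then
        sample     <= adc_data;
        sample_stb <= '1';
      end if;
    end if;
  end process;

  adc_clk <= adc_clk_r;
end architecture rtl;
```

The 5 MHz signal only leaves the chip; nothing inside the FPGA is clocked by it. Downstream logic runs on clk and uses sample_stb as its clock enable.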
 
On Thu, 05 Jul 2012 11:04:36 -0700, jmariano wrote:

Hi Rick, thanks for your help.

I'm not real clear on your description of your design, but if you are
really generating clocks from the 50 MHz, I recommend that inside the
FPGA you instead use a single clock and generate clock enables for the
various functions.

Yes, I generate a 5 MHz clock inside the module from the main 50 MHz
clock by simple division by 10 because I need a 5 MHz ADC clock. I can't
use a clock enable because the AD9058 ADC does not have an enable input,
just a clock.

That's OK.

But you need to register the AD9058 outputs, inside the FPGA, to your
internal 50MHz clock. I would also register the S input and the U,V
outputs from the switch. (In fact I would make the switch a synch process
with only "clk" in its sensitivity list - it will effectively register
the switch outputs for you)

All these can be combined into a single synchronous process.

-- assuming u,v,r,i,adcnn are all signed!
process(clk)
begin
    if rising_edge(clk) then
        -- First pipe stage... synchronise the inputs
        if adc_enable then -- 10 MHz, when ADC is stable
            adc0_int <= adc0;
            ...
        end if;
        -- Second pipe stage... add the (synchronised inputs)
        u <= adc0_int - adc90_int;
        v <= ...
        -- I assume "s" has to be synchronised to "adcnn"
        -- so pipeline it to the same depth (also syncs it)
        s_int  <= s;
        s_int2 <= s_int;
        -- Third pipe stage ... the switch
        case s_int2 is
            when "00" =>
                r <= u;
                i <= v;
            when "01" =>
                ...
        end case;
        -- etc
    end if;
end process;

Addition at 50MHz in a Spartan-3 should be no problem.

As your sample rate is 1/10 of the clock rate, I would expect you can
afford a few cycles for internal processing. (If this is not the case you
need to think carefully about how you pipeline the design)

I don't see anything in your original post about simulation.

Sorry about that, I did, in fact, simulate each module and the top
entity. The behavior simulation gives the expected results, the post and
place simulation gives same errors that I could not understand,

Excellent.
Before changing the design, I would simulate with low level zero-crossing
signals, and see which inputs (s, ADCs) and internal signals (U, V, R, I)
are unstable whenever the large unexpected outputs are occurring.

Then what you need to do to fix will be clear.

You can also install multiple versions in the testbench, asserting their
outputs are the same, and reporting any difference.

- Brian
 
In article <nZ-dnch1rrNvHG_SnZ2dnUVZ_qSdnZ2d@web-ster.com>,
Tim Wescott <tim@seemywebsite.please> writes:

Paranoid logic designers will have a string of two or three registers to
avoid metastability, but I've been told that's not necessary. (I'm not
much of a logic designer).
Ahh, but are they paranoid enough?

The key is settling time.

In the old days of TTL chips, a pair of FFs (with no logic in between)
gave you as much settling time as the worst case logic delay in
the rest of the system. In practice, that was enough.

With FPGAs, routing is important. A pair of FFs close together
is probably good enough. If you put them on opposite sides of a big
chip, the routing delays may match the long path of the logic delays
and eat up all of your slack time.

Have any FPGA vendors published recent metastability info?
(Many thanks to Peter Alfke for all his good work in this area.)

I'm not a silicon wizard. Is it reasonable to simulate this stuff?
I'd like to know worst case rather than typicals. It should be possible
to do something like verify simulations with lab typicals and then
use simulations to find the numbers for the nasty corners.

--
These are my opinions. I hate spam.
 
On Jul 6, 10:00 pm, hal-use...@ip-64-139-1-69.sjc.megapath.net (Hal
Murray) wrote:
(snip)

I'm not a silicon wizard. Is it reasonable to simulate this stuff?
I'd like to know worst case rather than typicals. It should be possible
to do something like verify simulations with lab typicals and then
use simulations to find the numbers for the nasty corners.

I'm not sure what you would want to simulate. Metastability is
probabilistic. For a given length of settling time there is
some probability of it happening. Increasing the settling time
reduces the probability, but it will never be zero, meaning there is no
maximum length of time it takes for the output of a metastable FF to
settle.

Is that what you are asking?

Rick
 
Ed McGettigan <ed.mcgettigan@xilinx.com> wrote:
(snip)
But it does not work in this way, it behaves in a strange manner...

Some times I get the expected results but often I get strange values
(large when they should be small, often negative instead of positive,
etc.). If I look at the binary representation of the output, it looks
like if the output din't had time to sum and propagate to the output
again. In fact, the post place and route simulation shows that when the
enb signal goes to 0, the output stays in a undetermined condition (you
know, red line with XXXX).
(snip)
It isn't just the paranoid logic designer, it should be every
logic designer.

A single register only partially solves the problem of an
asynchronous input with multiple register destinations, but it
does not solve the very real metastability problem.
At least two registers should be used to ensure that the
metastability condition has resolved and with increasing
clock frequency and finer process nodes using three or more
stages may be necessary.
Metastability can be a problem, but often the problem is clocking
multiple FFs off the same clock edge, with different delays on
either the clock or data. (The chance of the delays being exactly
equal is close to zero.) The two effects are different.

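Note, for example, the common FIFO implementation using a
Gray code counter (or binary to Gray code converter).
Only one bit changes per count, so a sample taken during a
transition yields either the old or the new value. That avoids
the clock edge problem, as either value will work correctly.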

Metastability is a different problem, but one that also occurs
when using asynchronous input values.

-- glen
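
A minimal sketch of the binary to Gray code converter mentioned above. The entity name bin2gray and the generic N are made up for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical N-bit binary to Gray code converter.
entity bin2gray is
  generic (N : positive := 4);
  port (
    bin  : in  std_logic_vector(N-1 downto 0);
    gray : out std_logic_vector(N-1 downto 0)
  );
end entity bin2gray;

architecture rtl of bin2gray is
begin
  -- gray = bin xor (bin shifted right by one bit, zero filled)
  gray <= bin xor ('0' & bin(N-1 downto 1));
end architecture rtl;
```

With N = 4, the binary sequence 0, 1, 2, 3 maps to 0000, 0001, 0011, 0010, with exactly one bit changing at each step. In a FIFO pointer crossing, the Gray value would normally be registered in the source clock domain before being sampled by the other clock, since the combinational XOR output itself can glitch.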
 
rickman <gnuarm@gmail.com> wrote:
(snip)

Paranoid logic designers will have a string of two or three registers to
avoid metastability, but I've been told that's not necessary.  (I'm not
much of a logic designer).
(snip)
Hi Ed. The way it was explained to me, I believe from Peter Alfke,
is that what really resolves metastability is the slack time in a
register to register path. Over the years FPGA process has resulted
in FFs which only need a couple of ns to resolve metastability to 1 in
a million operation years or something like that (I don't remember the
metric, but it was good enough for anything I do). It doesn't matter
that you have logic in that path, you just need those few ns in every
part of the path. In theory, even if you use multiple registers with
no logic, what really matters is the slack time in the path and that
is not guaranteed even with no logic. So the design protocol should
be to assure the slack time from the input register to all subsequent
registers have sufficient slack time.
I suppose that is true, but really it shouldn't be a problem.
It is usual for many systems to clock as fast as you can,
consistent with the critical path delay. As metastability
is exponential, even a slightly shorter delay is usually enough
to make enough difference in the exponent.

That assumes that there is a FF to FF path that is faster than
the FF logic FF path. I believe that is usual for FPGAs, but
if you manage to get a critical path with only one LUT, then
I am not so sure. But that is pretty hard in most real systems.

Do you remember how much time that needs to be? I want to say 2 ns,
but it might be more like 5 ns, I just can't recall. Of course it
depends on your clock rates, but I believe Peter picked some more
aggressive speeds like 100 MHz for his example.
I would expect most systems to have at least a 10% margin.
That is, the clock period is at least 10% longer than the
critical path delay. Probably closer to 20%, but maybe 10%.
So, with a 10 ns clock there might be only 1 ns slack on the
critical path. Assuming some delay, say 1 ns minimum, from FF to FF,
the direct FF to FF path has 9 ns of slack, nine times as much,
and that slack is in an exponent.

-- glen
 
jmariano <jmariano65@gmail.com> wrote:

(snip)
Thank you very much for your input and sorry for the late reply.
It is really great to be able to get the opinion of such experts,
especially since, at my current location and in a radius of some 200
km, I must be the only person working with FPGA and VHDL! I'm also
glad that the discussion has evolved to levels of complexity far beyond
my knowledge.

I was hoping that by now I would be able to say that the thing was
working as expected but, unfortunately, no.
(snip)

Here's the full story: I'm implementing a gated integrator, as a part
of a boxcar averager. This is the standard noise reduction technique
used in nuclear magnetic resonance (NMR). This is research, not a
commercial product! The module gets its data from four 8-bit ADCs at 5
MHz (adc0, adc90, adc180, adc270) and accumulates while enb=1. enb is
generated in a different module. The module does this:
(big snip)

I believe that most FPGA families have FFs with clock enable.

Be sure that you are writing your logic in such a way that
the tools figure that out. In most cases, I believe that means
not writing it as a gated clock. Write it as FF's with enable.

(I know how to write it in verilog but not VHDL.)

-- glen
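
For what it's worth, a VHDL sketch of "FFs with enable" applied to an accumulator might look like the following. The port names follow the original post, but the entity name accum_ce is made up, the clear is synchronous here (the original used an asynchronous clr), there is no overflow flag, and enb is assumed to have already been synchronized to clk.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical accumulator written with a clock enable, not a gated clock.
entity accum_ce is
  port (
    clk : in  std_logic;
    clr : in  std_logic;                      -- synchronous clear (assumption)
    enb : in  std_logic;                      -- clock enable, synchronous to clk
    d   : in  std_logic_vector(31 downto 0);  -- two's complement input
    q   : out std_logic_vector(31 downto 0)   -- running sum, wraps on overflow
  );
end entity accum_ce;

architecture rtl of accum_ce is
  signal acc : signed(31 downto 0) := (others => '0');
begin
  process (clk)  -- the clock is the only signal that triggers the process
  begin
    if rising_edge(clk) then
      if clr = '1' then
        acc <= (others => '0');
      elsif enb = '1' then
        acc <= acc + signed(d);
      end if;
      -- no else branch: acc simply holds its value while enb = '0'
    end if;
  end process;

  q <= std_logic_vector(acc);
end architecture rtl;
```

The point is that clk goes straight to the rising_edge test and enb only appears inside it, which is the pattern synthesis tools normally map onto the flip-flop's clock enable pin.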
 
Hal Murray <hal-usenet@ip-64-139-1-69.sjc.megapath.net> wrote:

(snip)
With FPGAs, routing is important. A pair of FFs close together
is probably good enough. If you put them on opposite sides of a big
chip, the routing delays may match the long path of the logic delays
and eat up all of your slack time.
That is a good question. I usually assume that they won't have
a long route, but that might not be a good assumption.

Some time ago, I was working on a small design in a very large FPGA.
Expanding to fill the available space, things were very far apart.
(And, as I had so much space, I put three FFs in to synchronize,
but with long enough routes even that could fail.)

Have any FPGA vendors published recent metastability info?
(Many thanks to Peter Alfke for all his good work in this area.)

I'm not a silicon wizard. Is it reasonable to simulate this stuff?
I'd like to know worst case rather than typicals. It should be possible
to do something like verify simulations with lab typicals and then
use simulations to find the numbers for the nasty corners.
As I noted previously, though, often the problem isn't metastability
but multiple FFs on the same asynchronous clock. Different problem.

-- glen
 
