Why is this code persnickety?

K

KK6GM

Guest
I'm a software guy with a decent hardware understanding, just
beginning to teach myself VHDL and FPGAs. I've got a Nexys2 board and
in playing with the multiplexed 7-seg displays I've run across a
situation where my code to delay and switch between digits runs for
some count values and doesn't run for others. Here's the code section
in question:

prescale_to_100us: process (clk_50M)
variable counter : natural range 0 to 2500 := 1;
begin
if (rising_edge(clk_50M)) then
counter := counter - 1;
if (counter = 0) then
counter := 2500;
clk_100us <= not clk_100us;
end if;
end if;
end process;


drive_digits: process (clk_100us, sw)
constant Scan_Count : natural := 300;
variable counter : natural range 0 to Scan_Count := 1;
variable digs7_var : std_logic_vector (3 downto 0) := "1110";
begin
if (rising_edge(clk_100us)) then
counter := counter - 1;
if (counter = 0) then
case sw is
when "001" => counter := Scan_Count / 8;
when "010" => counter := Scan_Count / 4;
when "100" => counter := Scan_Count / 2;
when others => counter := Scan_Count;
end case;

case digs7_var is
when "1110" => digs7_var := "1101";
when "1101" => digs7_var := "1011";
when "1011" => digs7_var := "0111";
when "0111" => digs7_var := "1110";
when others => digs7_var := "1110";
end case;

digs7 <= digs7_var;
end if;
end if;
end process;

This code consistently works for e.g. Scan_Count = 300, 302 or 305,
but fails (no multiplexing, just a solid 0 in the LSD position) for
301, 303 and 304. Nothing else is changed, just the value of
Scan_Count. Naturally I assume that the problem lies in my code, but
I have no idea what it might be. Any ideas? Thanks.

BTW, do I need "sw" in the sensitivity list for drive_digits?

Mike
 
On Sun, 10 Oct 2010 09:38:00 -0700 (PDT), KK6GM wrote:

I'm a software guy with a decent hardware understanding, just
beginning to teach myself VHDL and FPGAs. I've got a Nexys2 board and
in playing with the multiplexed 7-seg displays I've run across a
situation where my code to delay and switch between digits runs for
some count values and doesn't run for others. Here's the code section
in question:

prescale_to_100us: process (clk_50M)
variable counter : natural range 0 to 2500 := 1;
begin
if (rising_edge(clk_50M)) then
counter := counter - 1;
if (counter = 0) then
counter := 2500;
clk_100us <= not clk_100us;
end if;
end if;
end process;


drive_digits: process (clk_100us, sw)
constant Scan_Count : natural := 300;
variable counter : natural range 0 to Scan_Count := 1;
variable digs7_var : std_logic_vector (3 downto 0) := "1110";
begin
if (rising_edge(clk_100us)) then
counter := counter - 1;
if (counter = 0) then
......... reload scan counter, update multiplex output
end if;
end if;
end process;

This code consistently works for e.g. Scan_Count = 300, 302 or 305,
but fails (no multiplexing, just a solid 0 in the LSD position) for
301, 303 and 304. Nothing else is changed, just the value of
Scan_Count. Naturally I assume that the problem lies in my code, but
I have no idea what it might be. Any ideas? Thanks.
Your code looks broadly OK to me (have you simulated it?).
It's good that you hide locals inside the processes as variables.
Symbolic names for the various constants would help readability
and maintainability, I'd suggest, and (see your latter question)
no, you don't want 'sw' in the clocked process's sensitivity
list - but it shouldn't do any harm, because the whole body
of the process runs only on rising_edge.

Carefully check out the synthesis report. My first suspicion
would be that the clk_100us signal is not being routed on a
clock net. Possibly it hasn't been recognized as a clock
because of that bad sensitivity list - I don't know - but
if the tools didn't recognise the clock then the results
you see are not too surprising because you'll get clock
skew between the various FFs of the second process's
counter and this can give rise to hold time violations.

As a sanity check on this, consider reworking the clock
divider to generate a clock enable so that you can clock
both processes from the common 50MHz.

Also, note that your "if counter=0" tests (in both
processes) are testing NEXT-STATE values of the counter,
and therefore will give slower logic than if you had
tested "counter" before decrementing it. Try this:

prescale_to_100us: process (clk_50M)
variable counter: natural range 0 to 2500 := 1;
begin
if counter=1 then -- test first, tests current state
counter := 2500;
enable_100us <= '1';
else
counter := counter-1;
enable_100us <= '0';
end if;
end process;

drive_digits: process (clk_50M)
constant Scan_Count : natural := 300;
variable counter : natural range 0 to Scan_Count := 1;
variable digs7_var : std_logic_vector (3 downto 0) := "1110";
begin
if (rising_edge(clk_50M)) then
if (enable_100us) then
-- same logic as before.

Don't try to roll the two "if" tests into a single test;
some synthesis tools are rather picky about keeping the
rising_edge test in a separate statement.

BTW, do I need "sw" in the sensitivity list for drive_digits?
No, definitely not - see above.

Hope this helps, but I fear it may not...
--
Jonathan Bromley
 
On Oct 10, 11:46 am, Jonathan Bromley <s...@oxfordbromley.plus.com>
wrote:
On Sun, 10 Oct 2010 09:38:00 -0700 (PDT), KK6GM wrote:
I'm a software guy with a decent hardware understanding, just
beginning to teach myself VHDL and FPGAs.  I've got a Nexys2 board and
in playing with the multiplexed 7-seg displays I've run across a
situation where my code to delay and switch between digits runs for
some count values and doesn't run for others.  Here's the code section
in question:

   prescale_to_100us: process (clk_50M)
           variable counter : natural range 0 to 2500 := 1;
   begin
           if (rising_edge(clk_50M)) then
                   counter := counter - 1;
                   if (counter = 0) then
                           counter := 2500;
                           clk_100us <= not clk_100us;
                   end if;
           end if;
   end process;

   drive_digits: process (clk_100us, sw)
           constant Scan_Count : natural := 300;
           variable counter : natural range 0 to Scan_Count := 1;
           variable digs7_var : std_logic_vector (3 downto 0) := "1110";
   begin
           if (rising_edge(clk_100us)) then
                   counter := counter - 1;
                   if (counter = 0) then

........ reload scan counter, update multiplex output

                   end if;
           end if;
   end process;

This code consistently works for e.g. Scan_Count = 300, 302 or 305,
but fails (no multiplexing, just a solid 0 in the LSD position) for
301, 303 and 304.  Nothing else is changed, just the value of
Scan_Count.  Naturally I assume that the problem lies in my code, but
I have no idea what it might be.  Any ideas?  Thanks.

Your code looks broadly OK to me (have you simulated it?).  
It's good that you hide locals inside the processes as variables.  
Symbolic names for the various constants would help readability
and maintainability, I'd suggest, and (see your latter question)
no, you don't want 'sw' in the clocked process's sensitivity
list - but it shouldn't do any harm, because the whole body
of the process runs only on rising_edge.

Carefully check out the synthesis report.  My first suspicion
would be that the clk_100us signal is not being routed on a
clock net.  Possibly it hasn't been recognized as a clock
because of that bad sensitivity list - I don't know - but
if the tools didn't recognise the clock then the results
you see are not too surprising because you'll get clock
skew between the various FFs of the second process's
counter and this can give rise to hold time violations.

As a sanity check on this, consider reworking the clock
divider to generate a clock enable so that you can clock
both processes from the common 50MHz.

Also, note that your "if counter=0" tests (in both
processes) are testing NEXT-STATE values of the counter,
and therefore will give slower logic than if you had
tested "counter" before decrementing it.  Try this:

  prescale_to_100us: process (clk_50M)
    variable counter: natural range 0 to 2500 := 1;
  begin
    if counter=1 then  -- test first, tests current state
      counter := 2500;
      enable_100us <= '1';
    else
      counter := counter-1;
      enable_100us <= '0';
    end if;
  end process;

  drive_digits: process (clk_50M)
    constant Scan_Count : natural := 300;
    variable counter : natural range 0 to Scan_Count := 1;
    variable digs7_var : std_logic_vector (3 downto 0) := "1110";
  begin
    if (rising_edge(clk_50M)) then
      if (enable_100us) then
        -- same logic as before.

Don't try to roll the two "if" tests into a single test;
some synthesis tools are rather picky about keeping the
rising_edge test in a separate statement.

BTW, do I need "sw" in the sensitivity list for drive_digits?

No, definitely not - see above.

Hope this helps, but I fear it may not...
--
Jonathan Bromley- Hide quoted text -

- Show quoted text -
That did indeed solve the problem - many thanks! Clearly I need to
get a better understanding of the proper idioms for good, reliable
clocking. Guess I'm lucky to have stumbled across this whole issue
early on.

Mike
 
In some of my code I had a divided clock. I found that it was easier
to detect and latch the end count 1 cycle before into a single bit and
then use that single bit on the next cycle to do the clock inversion.
This speeded up the design by over 4 times. I think this is because
the clock enable of the divided clock FF was the output of an FF
instead of 2 layers of 4LUTs.
 
As has been mentioned, your code compares count to zero after it has
been decremented, meaning both combinatorial circuits must settle in
series before you get a valid answer, and that takes more time than
would otherwise be necessary.

There are a couple of ways around this. The first is to check and then
only decrement if safe to do so:

if counter = 0 then -- parallel comparison of each bit
counter := half_100us_period; -- use a constant (hint: 2500 - 1)!
clk_100us <= not clk_100us;
else
counter := counter - 1;
end if;

This method compares the output of the count register to zero, before
it is decremented, thus avoiding the serial delays of decrement and
compare within one clock cycle.

The second is related to this, but uses a trait of integer arithmetic
to avoid the parallel comparison of the count to zero:

if count - 1 < 0 then -- checked the carry bit
count := half_100us_period;
clk_100us <= not clk_100us;
else
count := count - 1; -- share decrementer with the comparison
end if;

The above ONLY WORKS WITH INTEGER SUBTYPE COUNTERS! No matter what the
integer subtype range of the operands, the - operator (or any other
integer operator) returns a full integer range result. That is not an
error. It is only an error if you try to store the result in a
subtype, and it is outside of that subtype's range. So, even though
count cannot be less than zero, count - 1 can be less than zero, and
can be used to check the carry bit.

What's more, the synthesis tool will share the two decrementers (the
one for the comparison, and the one for the counting action),
resulting in only one decrementer being produced.

In real life, if the counter is plenty fast (and/or the clock is
plenty slow), use the method that is most easily understood by the
reader, that also still meets timing. That reader could be a reviewer,
a verification engineer, a test engineer, or another developer that
needs to update the design, and could even be you in another 6 weeks
after you have forgotten why you did it the way you did.

Andy
 
On Oct 11, 9:42 am, Andy <jonesa...@comcast.net> wrote:
As has been mentioned, your code compares count to zero after it has
been decremented, meaning both combinatorial circuits must settle in
series before you get a valid answer, and that takes more time than
would otherwise be necessary.
Right, this is obvious now that it has been pointed out to me. :)

To ask a question related to my OP, what is the preferred method (or
methods) of driving FFs at a rate derived from a master clock? For
example, suppose I want to clock some FFs at 1kHz from a 50MHz clock.
Is it to use the 1-clock-out-of-N enable as was discussed above? Are
there other "correct" ways as well? I guess the question is, do FFs
being updated at rates of MasterClock/N normally have their clock
inputs driven by MasterClock with suitable enables, or is it common to
actually derive other clocks from MasterClock?
 
On Oct 11, 9:56 pm, KK6GM <mjsi...@scriptoriumdesigns.com> wrote:

To ask a question related to my OP, what is the preferred method (or
methods) of driving FFs at a rate derived from a master clock?  
In FPGAs or CPLDs, generate a clock enable. In ASICs, an actual gated
clock might be considered appropriate. The reason gated clocks are
generally not acceptable in FPGAs or CPLDs is because then the design
becomes sensitive to *minimum* propogation delays through logic due to
the inevitable need to reliably transfer signals from one clock domain
to the other, but there are no resources to control this delay. This
will create hold time issues where the signals from one clock domain
can switch prior to and be accepted by the gated clock one clock cycle
'early' or can cause incorrect logic because hold times were
violated. While a timing analysis will reveal the problem, the only
recourse is to change a random number seed and try to re-route and get
something different and hope. Every iteration of the design will
require this same sort of hand holding...most people have better
things to do with their time.

For
example, suppose I want to clock some FFs at 1kHz from a 50MHz clock.
Is it to use the 1-clock-out-of-N enable as was discussed above?  
Yes

Are
there other "correct" ways as well?  
If the division is fixed and known at design time, then there are
better forms of counters. As an example, an LFSR counter would
minimize the amount of logic needed to create the counter.

Another technique is to break the counter up into smaller pieces to
improve the clock cycle performance. Consider a 24 bit counter.
Rather than having to decode all 24 bits to decide that the count has
rolled over, another technique would be to break the counter up into
two (or three) sections of 8-12 bits. When the 'more significant'
bits sections counts down it sets a flop. Then the final rollover
occurs when all higher order sections have their bit set, and the
'least significant' bit section also counts down to 0.

I guess the question is, do FFs
being updated at rates of MasterClock/N normally have their clock
inputs driven by MasterClock with suitable enables,
In CPLDs and FPGAs...yes

or is it common to
actually derive other clocks from MasterClock?
In ASICs...yes (clock enables also work here also)

Kevin Jennings
 
On Mon, 11 Oct 2010 18:56:54 -0700 (PDT), KK6GM <mjsilva@scriptoriumdesigns.com>
wrote:

On Oct 11, 9:42 am, Andy <jonesa...@comcast.net> wrote:
As has been mentioned, your code compares count to zero after it has
been decremented, meaning both combinatorial circuits must settle in
series before you get a valid answer, and that takes more time than
would otherwise be necessary.

Right, this is obvious now that it has been pointed out to me. :)

To ask a question related to my OP, what is the preferred method (or
methods) of driving FFs at a rate derived from a master clock? For
example, suppose I want to clock some FFs at 1kHz from a 50MHz clock.
Is it to use the 1-clock-out-of-N enable as was discussed above? Are
there other "correct" ways as well? I guess the question is, do FFs
being updated at rates of MasterClock/N normally have their clock
inputs driven by MasterClock with suitable enables, or is it common to
actually derive other clocks from MasterClock?
In FPGAs it is normal to use the 1-of-N clock enable. If your design meets
timings without any struggle that way (as I would expect at 50MHz), forget about
going any further and concentrate your energy elsewhere. It's simplest; it's
clean, and life is short.

Otherwise...

If you can identify signal paths that are entirely within the slow domain (i.e.
both source and dest run off the enable) you can add multi-cycle timing
constraints to them, so that P&R doesn't waste effort improving their speed.
Then it can concentrate on improving what needs to be fast.

(If you don't follow me here, try increasing your target clock rate 10MHz at a
time. You will probably see P&R time explode at some point...)


If you have a real reason for wanting to reduce the clock speed there are ways
to do it. The main reason would be to reduce power consumption ( e.g. for
battery operated equipment) or device temperature in more complex designs.

But recognise that FPGAs don't offer as much scope for power saving through
redesign as ASICS. There are low power device families, though.


The canonical way to change clock speed is to use the FPGA's clock generation
blocks (DCM in Xilinx, PLL in Altera or newer Xilinx). These offer frequency
synthesis options including programmable division ratios, variable phase shift,
automatically align the divided (or multiplied!) clock edges with the source
clock, and so on.

But more importantly, the timing analysis tools know about them. So timing
constraints are correctly propagated through the DCM with appropriate margins
for jitter and error, and output clocks will route using clock resources.

Contrast this with the mess you get simply dividing a clock yourself.

Typically you wouldn't want to change the source clock frequency by more than an
order of magnitude this way (cascading DCMs is a BIG no-no; each adds its own
jitter until the last one can't achieve lock) so you might divide your clock
down to 1 or 10MHz and use 1-of-n enables from there. Beyond that point, the
static power consumption will dominate, so there is no further saving.

- Brian
 
OK, things are much clearer now. Thanks for all the good info.

Mike
 

Welcome to EDABoard.com

Sponsor

Back
Top