Counter clocks on both edges sometimes, but not when differe

Mr.CRC · May 14, 2011

Hi:

I'm using a Xilinx Spartan 3E FPGA (on the Digilent NEXYS2 500k board)
to implement a quadrature encoder simulator, among other things.

The qep_sim.v code is shown below. The clock input to qep_sim() is
multiplexed from one of two buffered and isolated external world inputs.

The problem is that when the multiplexer selects one clock input, the
qep_sim() occasionally counts on a negedge, while if the other input
is used, it never glitches like this and performs correctly.

The 288kHz clocks at the point where they enter the NEXYS2 board have
been probed and are clean, including when the counting glitch occurs.

What is stranger is that if I try to send a copy of the multiplexed
clock out to an IO pin to scope it, then the glitching goes away no
matter which clock source is selected.

Thanks for ideas on what might be wrong. I suspect it has something to
do with non-ideal choices of IO pins for the clock inputs. I didn't
have time to test this today, but I suspect if I simply move the
ext_sim_clk to a different pin, the problem will go away.

I wouldn't be satisfied with this however, as I wish to understand the
real cause of the problem.

Here is the multiplexing code (excerpted from a longer module):

module shaft (
    // Begin QEP related ports
    // Switch SW0 selects: 0 = ext_sim_clk, 1 = dsp_sim_clk
    input clk_mux_sel,
    // Note: signal dsp_sim_clk is being temporarily supplied for
    // troubleshooting from an identical external buffer circuit
    // as the ext_sim_clk:
    input dsp_sim_clk, // sim clk generated by DSP.
    input ext_sim_clk, // external sim clk input
    // diagnostic sim clk output. Problem goes away if this is used:
    // output sim_clk_o,
    [edited out many lines]
);

    wire sim_clk; // muxed sim clk to feed to qep_sim.v

    // Select the simulation clock source:
    assign sim_clk = clk_mux_sel ? dsp_sim_clk : ext_sim_clk;

    // Instantiate the QEP simulator. Source for qep_maxcnt is not shown
    // here for brevity, and since this isn't being used in qs1 at present.
    qep_sim
        qs1( .clk(sim_clk), .maxcnt(qep_maxcnt),
             .a(qep_a_sim), .b(qep_b_sim), .m(qep_mex_sim) );

    // The outputs of qs1 go to another mux, to select the simulator vs.
    // a real encoder to send to the DSP's QEP counter peripheral. This
    // code is not shown.

endmodule

Here's the relevant excerpt from my .ucf file (the sim_clk_o is
commented out when the problem is happening):

NET "dsp_sim_clk" LOC = "R18" | IOSTANDARD = LVCMOS33; # JB-2
NET "sim_clk_o" LOC = "G15" | IOSTANDARD = LVCMOS33 | SLEW = FAST; # JC-1
NET "ext_sim_clk" LOC = "H16" | IOSTANDARD = LVCMOS33; # JC-4

------------------------------------------------------------------------
// This module simulates the outputs of a BEI incremental
// (quadrature) encoder
module qep_sim(
    input clk,
    input [15:0] maxcnt, // not used at present, until problem diagnosed
    output a,
    output b,
    output m
);

    parameter MAX_CNT = 31; // using fixed period during troubleshooting

    reg [15:0] cnt;

    always @ (posedge clk) begin
        if (cnt == MAX_CNT)
            cnt = 0;
        else
            cnt = cnt + 1;
    end

    assign a = ~cnt[1];
    assign b = cnt[1] ^ cnt[0];
    assign m = ~b & (cnt == 0 || cnt == MAX_CNT);
    // index pulse m is high
    // straddling max count and zero. Why the redundant '&' with ~b is
    // performed? I forget. Maybe this is unnecessary.
endmodule
------------------------------------------------------------------------

--
_____________________
Mr.CRC
crobcBOGUS@REMOVETHISsbcglobal.net
SuSE 10.3 Linux 2.6.22.17

Joel Williams · May 14, 2011

Thanks for ideas on what might be wrong. I suspect it has something to
do with non-ideal choices of IO pins for the clock inputs. I didn't
have time to test this today, but I suspect if I simply move the
ext_sim_clk to a different pin, the problem will go away.

Rather than using these signals as clocks, why not sample the incoming
signal with the Nexys2's 50 MHz clock?

Register the input and check for rising edges:

reg [1:0] ext_clk;

always @(posedge clk50)
ext_clk[1:0] <= {ext_clk[0], sim_clk};

wire rising_edge = ext_clk[1] & ~ext_clk[0];

and then count these:

always @(posedge clk50)
if (rising_edge)
if (cnt == MAX_CNT) ...

Joel

Andrew Holme · May 14, 2011

"Mr.CRC" <crobcBOGUS@REMOVETHISsbcglobal.net> wrote in message
news:iqkvcv01c5r@news6.newsguy.com...

reg [15:0] cnt;

always @ (posedge clk) begin
if (cnt == MAX_CNT)
cnt = 0;
else
cnt = cnt + 1;
end

assign a = ~cnt[1];
assign b = cnt[1] ^ cnt[0];
assign m = ~b & (cnt == 0 || cnt == MAX_CNT);
// index pulse m is high
// straddling max count and zero. Why the redundant '&' with ~b is
// performed? I forget. Maybe this is unnecessary.
endmodule

This module qep_sim will have glitches in the middle of the high pulse on
output b when cnt[1] changes picoseconds before cnt[0] as it can do,
depending on internal routing.

Mr.CRC · May 14, 2011

Andrew Holme wrote:

"Mr.CRC" <crobcBOGUS@REMOVETHISsbcglobal.net> wrote in message
news:iqkvcv01c5r@news6.newsguy.com...
reg [15:0] cnt;

always @ (posedge clk) begin
if (cnt == MAX_CNT)
cnt = 0;
else
cnt = cnt + 1;
end

assign a = ~cnt[1];
assign b = cnt[1] ^ cnt[0];
assign m = ~b & (cnt == 0 || cnt == MAX_CNT);
// index pulse m is high
// straddling max count and zero. Why the redundant '&' with ~b is
// performed? I forget. Maybe this is unnecessary.
endmodule

This module qep_sim will have glitches in the middle of the high pulse on
output b when cnt[1] changes picoseconds before cnt[0] as it can do,
depending on internal routing.

Yes, I see. Not sure why I didn't think of that. Now this reminds me
that the reason why m includes the & ~b is to prevent a glitch on m when
the cnt goes from MAX_CNT->0. But of course, that presumes b doesn't
have glitches...

I will try to work this out.

BTW, the potential glitching you describe has never actually been
observed from this module. Though with my slow slew rate outputs, it
may have just been unobservable externally.

So this is a separate issue from the original problem, which is that cnt
changes on the wrong edge of the clock sometimes. This results in
shortened pulses for a, b, and less frequently, m, since the phenomenon
is random so only in 2/(1+MAX_CNT) of cases does it overlap with m.

It is also true that when the clocking glitch occurs, the correct phase
of the a and b signals is preserved, so the encoder effectively "speeds
up" by half a clock cycle.

Thanks for the reply.


Mr.CRC · May 14, 2011

Joel Williams wrote:

Thanks for ideas on what might be wrong. I suspect it has something to
do with non-ideal choices of IO pins for the clock inputs. I didn't
have time to test this today, but I suspect if I simply move the
ext_sim_clk to a different pin, the problem will go away.

Rather than using these signals as clocks, why not sample the incoming
signal with the Nexys2's 50 MHz clock?

Register the input and check for rising edges:

reg [1:0] ext_clk;

always @(posedge clk50)
ext_clk[1:0] <= {ext_clk[0], sim_clk};

wire rising_edge = ext_clk[1] & ~ext_clk[0];

and then count these:

always @(posedge clk50)
if (rising_edge)
if (cnt == MAX_CNT) ...

I could probably do something like this.

BTW, doesn't the above produce 20ns pulses on "rising_edge" when sim_clk
goes H->L, ie. on the falling edge?

Also, I wonder about the use of a once-registered asynchronous input
signal. Shouldn't the sim_clk be registered at least twice before
engaging in combinatorial functions? Ie, it seems that if ext_clk[0]
waffles for a few extra ns due to metastability, then the pulse on
"rising_edge" could be shorter than 20ns and not reliable.

Wouldn't it make more sense to simply use:

wire rising_edge = ext_clk[1] & ext_clk[0]; // ???

In which case I think it is safe even employing the once-registered
ext_clk[0] ?

There is still an unresolved engineering principles question here
though. Do FPGA logic designers ALWAYS sync asynchronous inputs to the
internal clock? If there is a circuit which is to be clocked by an
external source, and it is not going to interact with other process on
different clocks, then why bother syncing this clock input to the 50MHz
on-board clock? Ie, my qep_sim.v exists in its own clock domain, albeit
there is still the mux to choose which external clock to use.

This also doesn't answer the question of why the behavior changes vs.
input pin.

I have gathered that when clocks are to be brought in to the FPGA, it is
highly recommended to use a GCLK IO pin, so the signal may be buffered
onto a global clock routing line. I have to see if I can rearrange my
IOs to get these external inputs onto GCLK IOs, but there are two of
them and the NEXYS2 isn't very liberal about providing GCLKs on the pin
headers. Some other GCLKs are available on another connector, but I
don't yet have the mating for that.

Of course, when muxing external clock sources, if there are a lot of
them, one could eat into the supply of GCLKs quickly, so this is
undesirable.

A more interesting question is then, is it possible to take a GP IO
input pin, and internally buffer it onto an internal clock routing line?

Thanks for input and clarification.


Andrew Holme · May 14, 2011

"Mr.CRC" <crobcBOGUS@REMOVETHISsbcglobal.net> wrote in message
news:iqm4ri02prr@news4.newsguy.com...

Andrew Holme wrote:
"Mr.CRC" <crobcBOGUS@REMOVETHISsbcglobal.net> wrote in message
news:iqkvcv01c5r@news6.newsguy.com...
reg [15:0] cnt;

always @ (posedge clk) begin
if (cnt == MAX_CNT)
cnt = 0;
else
cnt = cnt + 1;
end

assign a = ~cnt[1];
assign b = cnt[1] ^ cnt[0];
assign m = ~b & (cnt == 0 || cnt == MAX_CNT);
// index pulse m is high
// straddling max count and zero. Why the redundant '&' with ~b is
// performed? I forget. Maybe this is unnecessary.
endmodule

This module qep_sim will have glitches in the middle of the high pulse on
output b when cnt[1] changes picoseconds before cnt[0] as it can do,
depending on internal routing.

Yes, I see. Not sure why I didn't think of that. Now this reminds me
that the reason why m includes the & ~b is to prevent a glitch on m when
the cnt goes from MAX_CNT->0. But of course, that presumes b doesn't
have glitches...

I will try to work this out.

BTW, the potential glitching you describe has never actually been
observed from this module. Though with my slow slew rate outputs, it
may have just been unobservable externally.

So this is a separate issue from the original problem, which is that cnt
changes on the wrong edge of the clock sometimes.

If it's an external clock, slow rise and fall times plus additive noise can
cause multiple counts if it slews too slowly through the logic threshold.
If the external clock rate is low, the best solution is to sample it at a
rate comparable to the rise/fall time.
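
As a rough illustration of sampling a slow, noisy edge, here is a minimal sketch of a digital filter in the 50 MHz domain. It is not code from this thread: the module name edge_filter and the parameter N are made up for the example, and the window of N samples (80 ns at 50 MHz with N = 4) is an assumption that would need to be sized against the measured rise/fall time.

```verilog
// Sketch only: accept a new input level after N consecutive equal samples.
// N must be at least 2 for the history shift below to be legal.
module edge_filter #(
    parameter N = 4          // hypothetical window length, in clk50 samples
) (
    input      clk50,        // on-board 50 MHz clock
    input      din,          // asynchronous, possibly noisy input
    output reg dout          // filtered level in the clk50 domain
);
    reg [1:0]   sync = 2'b00;       // two-stage synchronizer
    reg [N-1:0] hist = {N{1'b0}};   // last N synchronized samples

    initial dout = 1'b0;

    always @(posedge clk50) begin
        sync <= {sync[0], din};
        hist <= {hist[N-2:0], sync[1]};
        if (&hist)
            dout <= 1'b1;    // N highs in a row
        else if (~|hist)
            dout <= 1'b0;    // N lows in a row
        // otherwise hold the previous level while the input is bouncing
    end
endmodule
```

The filtered dout would then feed an edge detector rather than being used as a clock, at the cost of a few clk50 periods of added latency.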

rickman · May 14, 2011

On May 14, 11:27 am, "Mr.CRC" <crobcBO...@REMOVETHISsbcglobal.net>
wrote:

Joel Williams wrote:
Thanks for ideas on what might be wrong. I suspect it has something to
do with non-ideal choices of IO pins for the clock inputs. I didn't
have time to test this today, but I suspect if I simply move the
ext_sim_clk to a different pin, the problem will go away.

Rather than using these signals as clocks, why not sample the incoming
signal with the Nexys2's 50 MHz clock?

Register the input and check for rising edges:

reg [1:0] ext_clk;

always @(posedge clk50)
ext_clk[1:0] <= {ext_clk[0], sim_clk};

wire rising_edge = ext_clk[1] & ~ext_clk[0];

and then count these:

always @(posedge clk50)
if (rising_edge)
if (cnt == MAX_CNT) ...

I could probably do something like this.

BTW, doesn't the above produce 20ns pulses on "rising_edge" when sim_clk
goes H->L, ie. on the falling edge?

Yes. The expression should be ~ext_clk[1] & ext_clk[0] for a rising
edge detect.

Also, I wonder about the use of a once-registered asynchronous input
signal. Shouldn't the sim_clk be registered at least twice before
engaging in combinatorial functions? Ie, it seems that if ext_clk[0]
waffles for a few extra ns due to metastability, then the pulse on
"rising_edge" could be shorter than 20ns and not reliable.

Are you using rising_edge as a pulse with a defined width or is it an
enable to the next FF? You can resolve metastability anywhere along
this chain, as long as you do it before the signal branches out to
more than one place.

Wouldn't it make more sense to simply use:

wire rising_edge = ext_clk[1] & ext_clk[0]; // ???

No, this does not detect an edge. This resolves metastability and
eliminates glitches, but you still need to detect the edge of the
clock signal. The idea of detecting the edge is to both bring the
signal into the clk50 domain and also create a one clock period wide
enable to use in place of a separate clock.
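
Putting the corrected edge detect and the clock enable together, a minimal sketch might look like the following. It assumes a free-running clk50 and adds a third register so that the stage that can go metastable never feeds logic directly; the module name qep_cnt_ce and the register name sync are made up for the example and are not from the original design.

```verilog
// Sketch only: count rising edges of an external clock as a clock enable
// in the clk50 domain.
module qep_cnt_ce #(
    parameter MAX_CNT = 31
) (
    input             clk50,     // on-board 50 MHz clock
    input             sim_clk,   // asynchronous external clock
    output reg [15:0] cnt
);
    // sync[0] samples the asynchronous input and may go metastable.
    // Only sync[1] (newer) and sync[2] (older) are used in logic.
    reg [2:0] sync = 3'b000;

    always @(posedge clk50)
        sync <= {sync[1:0], sim_clk};

    // High for one clk50 period after a low-to-high transition.
    wire rising_edge = sync[1] & ~sync[2];

    initial cnt = 16'd0;

    // The enable is sampled on the next clk50 edge after it is asserted.
    always @(posedge clk50)
        if (rising_edge) begin
            if (cnt == MAX_CNT)
                cnt <= 16'd0;
            else
                cnt <= cnt + 16'd1;
        end
endmodule
```

The counter then updates two to three clk50 periods after the external edge, so the outputs carry up to 20 ns of sampling jitter relative to sim_clk.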

In which case I think it is safe even employing the once-registered
ext_clk[0] ?

No, this doesn't even resolve metastability unless you provide more
slack time to the next FF.

There is still an unresolved engineering principles question here
though. Do FPGA logic designers ALWAYS sync asynchronous inputs to the
internal clock?

"Always" is a big statement. I nearly always pick one fast clock
domain as the "internal" clock and sync slow clocks to this fast one.
Each "external" clock becomes a clock enable internally. There are
times when I use the external clock directly, for example the serial
clock on an SPI port directly clocks the shift register in a recent
SPI design. Then I have lots of time to sync the parallel word to the
internal clock. But this could be done either way. Sometimes it
depends on the relative rate of the two clocks. The closer in speed
they are the harder it can be to fully analyze the timing.

If there is a circuit which is to be clocked by an
external source, and it is not going to interact with other process on
different clocks, then why bother syncing this clock input to the 50MHz
on-board clock? Ie, my qep_sim.v exists in its own clock domain, albeit
there is still the mux to choose which external clock to use.

If it is completely independent as if it were being done on a separate
chip, then yes, there is no reason to sync it to the internal clock as
long as you can get the clock to a clock input and even that is not
essential. The tools have to do less work to make a clock on a clock
tree meet setup and hold times.

This also doesn't answer the question of why the behavior changes vs.
input pin.

I suspect that has to do with your clock signal. Does it have a slow
rise/fall time? I suspect a bit of noise is making it double clock.
When you route it differently (since it is not on a clock IO pin) the
glitch can get filtered out depending on how it is routed. It only
takes a very tiny glitch to cause this sort of double clocking and
such a tiny glitch can be filtered by the random routing. Putting it
on a global clock tree should make it fail more often.

I have gathered that when clocks are to be brought in to the FPGA, it is
highly recommended to use a GCLK IO pin, so the signal may be buffered
onto a global clock routing line. I have to see if I can rearrange my
IOs to get these external inputs onto GCLK IOs, but there are two of
them and the NEXYS2 isn't very liberal about providing GCLKs on the pin
headers. Some other GCLKs are available on another connector, but I
don't yet have the mating for that.

Don't try to fix a problem if you don't understand the cause. Why
would a GCLK IO pin eliminate your problem?

Of course, when muxing external clock sources, if there are a lot of
them, one could eat into the supply of GCLKs quickly, so this is
undesirable.

Bingo! That is a big reason for treating slow clocks as clock enables
on an internal clock.

A more interesting question is then, is it possible to take a GP IO
input pin, and internally buffer it onto an internal clock routing line?

Isn't that what is already happening in your case?

Thanks for input and clarification.

Let us know what you find. It is hard to analyze problems like this
without doing tests. I may be completely off base in this case.

Rick

KJ · May 14, 2011

On May 14, 11:27 am, "Mr.CRC" <crobcBO...@REMOVETHISsbcglobal.net>
wrote:

There is still an unresolved engineering principles question here
though. Do FPGA logic designers ALWAYS sync asynchronous inputs to the
internal clock?

They do if they want to guarantee it will work and be maintainable and/
or reusable. Either that or they come back here posting about a
design that 'used to work just fine when...blah, blah, and now it does
not'

If there is a circuit which is to be clocked by an
external source, and it is not going to interact with other process on
different clocks, then why bother syncing this clock input to the 50MHz
on-board clock?

You wouldn't. The point of synchronizing to a clock is when you're
*changing* clock domains.

Ie, my qep_sim.v exists in its own clock domain, albeit
there is still the mux to choose which external clock to use.

The output of the clock mux is a new clock domain, different than
either of the two input clock domains. Thinking that the selected
input clock and the 'output of the mux clock' are the same clock
domain is a mistake that will come back to bite you.

This also doesn't answer the question of why the behavior changes vs.
input pin.

Because sometimes you get 'lucky'. There are probably lots of things
(such as moving pins, hand routing, etc.) that you can do to appear to
'fix' the problem. If you're lucky you'll find that none of these
things really works well and that you still occasionally get
glitches. If you're not lucky you won't find this out until much
later when it will get more and more difficult and expensive to fix if
you've deployed many of these boards.

If you'd rather be smart than lucky you'll stop using gated internal
clocks and adopt synchronous design practices.

I have gathered that when clocks are to be brought in to the FPGA, it is
highly recommended to use a GCLK IO pin, so the signal may be buffered
onto a global clock routing line.

It is even more recommended that you not generate your own internal
clocks with logic.

I have to see if I can rearrange my
IOs to get these external inputs onto GCLK IOs, but there are two of
them and the NEXYS2 isn't very liberal about providing GCLKs on the pin
headers. Some other GCLKs are available on another connector, but I
don't yet have the mating for that.

This would be some of the tricks that I mentioned earlier that, if
you're lucky, you'll find are not robust without spending too much
time. Or maybe your luck will run out, the design will appear to
work, you'll think you're home free...until a couple of months later
when you're handed a stack of boards that aren't quite working and you
need to fix them.

Of course, when muxing external clock sources, if there are a lot of
them, one could eat into the supply of GCLKs quickly, so this is
undesirable.

Yep...you shouldn't do that.

A more interesting question is then, is it possible to take a GP IO
input pin, and internally buffer it onto an internal clock routing line?

It may be an interesting question...but you'll get to the end line much
sooner if you use strictly synchronous practices and only cross clock
domains either with dual clock fifos or proper synchronization logic.

Kevin Jennings

Joel Williams · May 14, 2011

BTW, doesn't the above produce 20ns pulses on "rising_edge" when sim_clk
goes H->L, ie. on the falling edge?

Yes. The expression should be ~ext_clk[1] & ext_clk[0] for a rising
edge detect.

My mistake, sorry!

I'm just glad that everyone agreed with me about the method

Joel

Mr.CRC · May 15, 2011

Andrew Holme wrote:

If it's an external clock, slow rise and fall times plus additive noise can
cause multiple counts if it slews too slowly through the logic threshold.
If the external clock rate is low, the best solution is to sample it at a
rate comparable to the rise/fall time.

Thanks for the reply.

Ugh. I could have sworn a few minutes ago I found a spec of 500ns for
the min slew rate of an input in the hideous ds312.pdf "Spartan-3E FPGA
Family: Data Sheet" but now I can't find it again.

Anyway, getting a clean enough clock to a logic device really isn't that
hard. I'll scope it again Monday and see if I can't put it up somewhere.


Mr.CRC · May 15, 2011

Andrew Holme wrote:

"Mr.CRC" <crobcBOGUS@REMOVETHISsbcglobal.net> wrote in message
news:iqkvcv01c5r@news6.newsguy.com...
assign a = ~cnt[1];
assign b = cnt[1] ^ cnt[0];
assign m = ~b & (cnt == 0 || cnt == MAX_CNT);
// index pulse m is high
// straddling max count and zero. Why the redundant '&' with ~b is
// performed? I forget. Maybe this is unnecessary.
endmodule

This module qep_sim will have glitches in the middle of the high pulse on
output b when cnt[1] changes picoseconds before cnt[0] as it can do,
depending on internal routing.

Actually, on second thought, if the counter is synchronous, then I would
expect it to be safe to perform combinatorics on the output. This is a
classic approach with discrete logic, which works because even if there
is process skew (or even sub ns delays due to actual paths on a discrete
wired logic circuit with typical devices switching in the >=4ns range)
between the individual bits, the resulting differences in timing are
much shorter than gates of the same process can detect.

Thus, the simple rule that if you want to perform combinatorial
functions on the output of a counter, use a synchronous counter (where
all outputs change *effectively* at the same time within the timescales
that the gates can switch), and things will work out just fine. I have
always observed this to be correct in practice.

Where you would run into trouble, is if you do something silly like
using a slow 4000 CMOS counter to feed a combinatorial comparator made
out of fast HC or AC gates.

Thus, unless the combinatorials in the FPGA can actually switch on a few
ps glitch in the output data of the counter, then this is really not an
issue.

As a design principle though, what I suspect is that again, there is a
paradigm shift going from the discrete logic world to the FPGA, where in
the latter, everything is done synchronously.

So, in order to make this synchronous, do I simply register the output
of the combinatorial functions?
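
For what it's worth, registering the decoded outputs could look something like the sketch below. This is an assumption about one reasonable way to do it, not a confirmed fix: the outputs are computed from the next count value (the made-up wire nxt) so that a, b and m stay aligned with cnt instead of lagging it by one clock, and the unused maxcnt port is dropped for brevity.

```verilog
// Sketch only: qep_sim with outputs driven straight from flip-flops,
// so decode glitches between cnt[1] and cnt[0] never reach the pins.
module qep_sim_reg #(
    parameter MAX_CNT = 31
) (
    input      clk,
    output reg a,
    output reg b,
    output reg m
);
    reg  [15:0] cnt = 16'd0;

    // Next counter value, shared by the counter and the output decode.
    wire [15:0] nxt = (cnt == MAX_CNT) ? 16'd0 : cnt + 16'd1;

    initial begin
        a = 1'b1;   // matches ~cnt[1] at cnt = 0
        b = 1'b0;   // matches cnt[1] ^ cnt[0] at cnt = 0
        m = 1'b1;   // index is high at cnt = 0
    end

    always @(posedge clk) begin
        cnt <= nxt;
        a   <= ~nxt[1];
        b   <= nxt[1] ^ nxt[0];
        m   <= ~(nxt[1] ^ nxt[0]) & (nxt == 0 || nxt == MAX_CNT);
    end
endmodule
```

Note the nonblocking assignments (<=) in the clocked block, which is the usual style for sequential logic.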


Mr.CRC · May 15, 2011

KJ wrote:

On May 14, 11:27 am, "Mr.CRC" <crobcBO...@REMOVETHISsbcglobal.net>
wrote:
If there is a circuit which is to be clocked by an
external source, and it is not going to interact with other process on
different clocks, then why bother syncing this clock input to the 50MHz
on-board clock?

You wouldn't. The point of synchronizing to a clock is when you're
*changing* clock domains.

Ie, my qep_sim.v exists in its own clock domain, albeit
there is still the mux to choose which external clock to use.

The output of the clock mux is a new clock domain, different than
either of the two input clock domains. Thinking that the selected
input clock and the 'output of the mux clock' are the same clock
domain is a mistake that will come back to bite you.

Ok, so then what we have here is that we need to input two different
clocks, and use one or the other. Whichever one we use, after
multiplexing is a unique clock domain. The question then is what is the
proper way to multiplex clock sources, and once multiplexed, to
distribute that clock properly?

This also doesn't answer the question of why the behavior changes vs.
input pin.

Because sometimes you get 'lucky'. There are probably lots of things
(such as moving pins, hand routing, etc.) that you can do to appear to
'fix' the problem. If you're lucky you'll find that none of these
things really works well and that you still occasionally get
glitches. If you're not lucky you won't find this out until much
later when it will get more and more difficult and expensive to fix if
you've deployed many of these boards.

If you'd rather be smart than lucky you'll stop using gated internal
clocks and adopt synchronous design practices.

Recall I said: "but I suspect if I simply move the ext_sim_clk to a
different pin, the problem will go away. I wouldn't be satisfied with
this however, as I wish to understand the real cause of the problem."

You stated that once a clock is muxed, then it is a new clock domain.
You also stated that one wouldn't bother syncing a clock unless changing
clock domains. There is no data moving across clock domains here. The
data would be generated in the post-mux counter.

Are you trying to say that the only way to do this correctly is sample
the two clocks with another clock???

What if they are of similar frequency to the available sampling clock?
What if I don't want the sampling jitter? What if the result of the
counter that is to be clocked by one of two different sources must
remain in the same clock domain as the clock source outside the FPGA?
Ok then, the mux becomes a problem. But in this case, the delay of a
mux is insignificant. But the jitter of sampling might be significant.
There might be a legitimate need to keep the counter synchronous with
an external process.

So the question remains quite simply: what is the proper way to
multiplex clock sources, and once multiplexed, to distribute that clock
properly?
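
One candidate on the Xilinx side is the BUFGMUX primitive, which selects between two clocks and drives a global clock line. The sketch below is untested on this board. The wrapper module name sim_clk_select is made up, and whether the tools will route inputs from non-GCLK pins such as R18 and H16 to the buffer without complaint (or without extra constraints) is an assumption that needs checking in ISE.

```verilog
// Sketch only: replace the LUT-based clock mux with a global clock mux.
// Port polarity follows the original: SW0 = 0 selects ext_sim_clk.
module sim_clk_select (
    input  clk_mux_sel,
    input  dsp_sim_clk,
    input  ext_sim_clk,
    output sim_clk        // on a global clock net after the buffer
);
    BUFGMUX sim_clk_mux (
        .O  (sim_clk),
        .I0 (ext_sim_clk),   // selected when S = 0
        .I1 (dsp_sim_clk),   // selected when S = 1
        .S  (clk_mux_sel)
    );
endmodule
```

This would not clean up a noisy input edge by itself; it only addresses how the selected clock is switched and distributed inside the chip.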

There are discussions on the Altera forum of resources in the Altera
FPGAs for muxing clock sources and properly distributing them. So it is
clear that the chip maker sees the need for being able to do this correctly.

For ex:
"Ripple and gated clocks: clock dividers, clock muxes, and other
logic-driven clocks"
http://www.alteraforum.com/forum/showthread.php?t=2388

I think my situation conforms with this guideline:

Guideline #2: Have no synchronous data paths going to/from the derived
clock domain

Thus, this post seems to pertain:

"Gated clocks: On/off gating and muxing of clocks in clock control
blocks or logic"

A more interesting question is then, is it possible to take a GP IO
input pin, and internally buffer it onto an internal clock routing line?

It may be an interesting question...but you'll get to the end line much
sooner if you use strictly synchronous practices and only cross clock
domains either with dual clock fifos or proper synchronization logic.

Kevin Jennings

What if the clock that I want to feed into a pin, which is not a GCLK,
defines a clock domain? Then this is not crossing domains. Can a
regular IO pin buffer onto a proper clock routing network?

Thanks for input.


Mr.CRC · May 15, 2011

rickman wrote:

Also, I wonder about the use of a once-registered asynchronous input
signal. Shouldn't the sim_clk be registered at least twice before
engaging in combinatorial functions? Ie, it seems that if ext_clk[0]
waffles for a few extra ns due to metastability, then the pulse on
"rising_edge" could be shorter than 20ns and not reliable.

Are you using rising_edge as a pulse with a defined width or is it an
enable to the next FF? You can resolve metastability anywhere along
this chain, as long as you do it before the signal branches out to
more than one place.

Wouldn't it make more sense to simply use:

wire rising_edge = ext_clk[1] & ext_clk[0]; // ???

No, this does not detect an edge. This resolves metastability and
eliminates glitches, but you still need to detect the edge of the
clock signal. The idea of detecting the edge is to both bring the
signal into the clk50 domain and also create a one clock period wide
enable to use in place of a separate clock.

In which case I think it is safe even employing the once-registered
ext_clk[0] ?

No, this doesn't even resolve metastability unless you provide more
slack time to the next FF.

Ok, I am beginning to have my light bulb turn on slowly here. The clock
enable concept is new to me, but I think I get it. By sampling my
external clock and creating pulses in the clk50 domain, I can use those
to enable the counter.

The only thing I'm still unsure about is, that if these rising_edge
pulses are generated by logic, then they are delayed somewhat from
clk50. Do they enable count on the NEXT clk50 edge? (And then
disappear a few ns after that clock edge?)

Would it make more sense in this type of situation to sample with a 180
deg phase shifted ~clk50, so that the clock enable pulses would be
centered on the clk50 counting edges?

There is still an unresolved engineering principles question here
though. Do FPGA logic designers ALWAYS sync asynchronous inputs to the
internal clock?

"Always" is a big statement. I nearly always pick one fast clock
domain as the "internal" clock and sync slow clocks to this fast one.
Each "external" clock becomes a clock enable internally. There are
times when I use the external clock directly, for example the serial
clock on an SPI port directly clocks the shift register in a recent
SPI design. Then I have lots of time to sync the parallel word to the
internal clock. But this could be done either way. Sometimes it
depends on the relative rate of the two clocks. The closer in speed
they are the harder it can be to fully analyze the timing.

Yes, I see. And I also see how this becomes quite messy if the external
clock is near the speed of the internal one. But only if data needs to
cross between the two domains, right?

If there is a circuit which is to be clocked by an
external source, and it is not going to interact with other process on
different clocks, then why bother syncing this clock input to the 50MHz
on-board clock? Ie, my qep_sim.v exists in its own clock domain, albeit
there is still the mux to choose which external clock to use.

If it is completely independent as if it were being done on a separate
chip, then yes, there is no reason to sync it to the internal clock as
long as you can get the clock to a clock input and even that is not
essential. The tools have to do less work to make a clock on a clock
tree meet setup and hold times.

Ok, so while I have learned something so far about how to sync external
(slow) clocks to an internal fast clock, it is the case here that this
qep_sim() is a unique clock domain, so should be able to be clocked by
its external source directly.

This also doesn't answer the question of why the behavior changes vs.
input pin.

I suspect that has to do with your clock signal. Does it have a slow
rise/fall time? I suspect a bit of noise is making it double clock.
When you route it differently (since it is not on a clock IO pin) the
glitch can get filtered out depending on how it is routed. It only
takes a very tiny glitch to cause this sort of double clocking and
such a tiny glitch can be filtered by the random routing. Putting it
on a global clock tree should make it fail more often.

I have to probe it again Monday. I recall Friday that I was a bit
surprised that it had a rounder rising edge near the upper level than I
would have expected. However, I think it is still in the <200ns range.
It originates from a TI ISO7220C, which specifies 1ns output rise/fall.
Good grief, that can't be right. It's slowed down a little with a
resistor in there for various reasons related to power sequencing of the
IO buffer panel, vs. the DSP that it usually feeds, or optionally the FPGA.

I tried to find the min rise time spec in the Spartan 3E datasheet last
night, and I could swear I found a place where it said 500ns, and now
for the life of me I can't find it again in that hideously long document.

I will look into this again with the scope. I'm almost sure it's not
noise though, since a time magnification of the clock when a counting
glitch occurs is perfectly smooth.

An important fact is that the counting error always happens on a falling
edge of this ext_sim_clk, not at random times. So it's not picking up
junk from other traces which may carry asynchronous signals, of which
there are a few.

I have gathered that when clocks are to be brought in to the FPGA, it is
highly recommended to use a GCLK IO pin, so the signal may be buffered
onto a global clock routing line. I have to see if I can rearrange my
IOs to get these external inputs onto GCLK IOs, but there are two of
them and the NEXYS2 isn't very liberal about providing GCLKs on the pin
headers. Some other GCLKs are available on another connector, but I
don't yet have the mating for that.

Don't try to fix a problem if you don't understand the cause. Why
would a GCLK IO pin eliminate your problem?

Oh, I don't have any intention of fixing it until I have understood it
(I stated this in the OP)!

Of course, when muxing external clock sources, if there are a lot of
them, one could eat into the supply of GCLKs quickly, so this is
undesirable.

Bingo! That is a big reason for treating slow clocks as clock enables
on an internal clock.

Well in this case, it also gets muxed. So it would be a waste of GCLK
inputs if it gets fed through logic and becomes a "derived clock" anyway.

A more interesting question is then, is it possible to take a GP IO
input pin, and internally buffer it onto an internal clock routing line?

Isn't that what is already happening in your case?

I have no idea at this point until I learn more about constraints and
how to interpret the hundreds of pages of gibberish that the tools
report when I make my bit file.

As I mentioned in another post, there seems to be considerable
discussion on the Altera forum about using constraints to control how a
clock is distributed.

There is also a discussion of how muxing clocks can be dangerous. I am
very suspicious that this might be the real cause of my problem, rather
than a signal integrity issue (I'm pretty good with signal integrity):

"Gated clocks: On/off gating and muxing of clocks in clock control
blocks or logic"
http://www.alteraforum.com/forum/showpost.php?p=8506&postcount=7

I have to learn more about how these things are managed on the Xilinx
device as well.

Thanks for input and clarification.

Let us know what you find. It is hard to analyze problems like this
without doing tests. I may be completely off base in this case.

Rick

What do you make of the test that I did where I took a copy of the muxed
clock, and routed it back outside, then the glitches disappeared?

Since this was not perturbing the ext_sim_clk input path, and yet the
problem disappeared, it argues strongly that the glitches are
originating internally, possibly in the mux, where perhaps a change in
loading by adding the new output path makes the glitches go away.

Thanks very much for your feedback!

--
_____________________
Mr.CRC
crobcBOGUS@REMOVETHISsbcglobal.net
SuSE 10.3 Linux 2.6.22.17

KJ · May 15, 2011

On May 15, 2:40 pm, "Mr.CRC" <crobcBO...@REMOVETHISsbcglobal.net>
wrote:

KJ wrote:
The output of the clock mux is a new clock domain, different than
either of the two input clock domains. Thinking that the selected
input clock and the 'output of the mux clock' are the same clock
domain is a mistake that will come back to bite you.

Ok, so then what we have here is that we need to input two different
clocks, and use one or the other.

1. What exactly is the 'need'?
2. What exactly needs to be clocked?
3. What exactly are you sampling with these clocks that really does
need to be sampled with those clocks?

If you step back a bit, I believe you'll find (at least based on what
you've posted here) the following answers:
1. Nothing
2. A counter
3. Nothing

Whichever one we use, after
multiplexing is a unique clock domain. The question then is what is the
proper way to multiplex clock sources, and once multiplexed, to
distribute that clock properly?

In this case, the solution is not to clock anything with your muxed
'clocks' since there is nothing that actually needs to be clocked with
those signals. Instead you should mux those input 'clocks' to create
a logic signal that you then synchronize to your FPGA's clock (I'm
assuming here that there is some clock used for clocking logic in the
FPGA that has nothing to do with these input clocks...maybe that's not
what you have, will get to that later). Now take this muxed
logic signal, synchronize it and delay it and then simply look for a
clock cycle of '0' followed by a clock cycle of '1' to create a clock
enable for the counter. The counter is clocked by the FPGA's clock.
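
A minimal Verilog sketch of that scheme, under some assumptions: clk50 is
the free-running on-board clock mentioned elsewhere in the thread, and the
module name sim_clk_enable, the sync register and the 16-bit count are made
up for illustration rather than taken from qep_sim.v. The mux output is
treated as ordinary logic, so a glitch when clk_mux_sel changes can at worst
cost or add a count around the switch.

```verilog
// Illustrative only: names and widths are assumptions, not from qep_sim.v.
module sim_clk_enable (
    input  wire        clk50,        // free-running FPGA clock (assumed 50 MHz)
    input  wire        clk_mux_sel,  // 0 = ext_sim_clk, 1 = dsp_sim_clk
    input  wire        dsp_sim_clk,  // both slow 'clocks' are treated as
    input  wire        ext_sim_clk,  // ordinary logic inputs, not as clocks
    output reg  [15:0] count
);
    // Mux the two slow 'clocks' as a plain logic signal.
    wire sim_sig = clk_mux_sel ? dsp_sim_clk : ext_sim_clk;

    // sync[0] and sync[1] form a two-stage synchronizer;
    // sync[2] is sync[1] delayed by one more clk50 cycle.
    reg [2:0] sync = 3'b000;
    always @(posedge clk50)
        sync <= {sync[1:0], sim_sig};

    // '0' one clk50 cycle ago and '1' now: a one-clk50-wide enable
    // for each rising edge of the muxed signal.
    wire sim_ce = sync[1] & ~sync[2];

    // The counter itself is clocked only by clk50.
    initial count = 16'd0;
    always @(posedge clk50)
        if (sim_ce)
            count <= count + 16'd1;
endmodule
```

Only twice-registered samples feed the edge detect, so a narrow glitch on
the input is either missed entirely or seen as one clean edge.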

If there is no other FPGA clock as I assumed, then more info would be
needed about just what you have. Based on what you've described
though, the only situation where you would have to actually mux a
clock and use it as a muxed clock would be if those two input clocks
need to be selected AND the unselected clock is not guaranteed to be
there.

This also doesn't answer the question of why the behavior changes vs.
input pin.

Because sometimes you get 'lucky'. There are probably lots of things
(such as moving pins, hand routing, etc.) that you can do to appear to
'fix' the problem. If you're lucky you'll find that none of these
things really works well and that you still occasionally get
glitches. If you're not lucky you won't find this out until much
later when it will get more and more difficult and expensive to fix if
you've deployed many of these boards.

If you'd rather be smart than lucky you'll stop using gated internal
clocks and adopt synchronous design practices.

Recall I said: "but I suspect if I simply move the ext_sim_clk to a
different pin, the problem will go away. I wouldn't be satisfied with
this however, as I wish to understand the real cause of the problem."

Find the documentation that guarantees that the device, when
generating an internally generated clock signal, will distribute that
clock to arrive at each destination flop simultaneously enough to
guarantee that any logic input signal to that flop will not violate
the setup/hold window. Over all temperature ranges? Really? Or is
the guarantee only that an input clock pin (or PLL/DLL output) can be
freely distributed anywhere about the device and be guaranteed that
any logic input will meet setup/hold times? (Hint: That is likely
the thing that is guaranteed).

You stated that once a clock is muxed, then it is a new clock domain.
You also stated that one wouldn't bother syncing a clock unless changing
clock domains. There is no data moving across clock domains here. The
data would be generated in the post-mux counter.

Right, so there is no need in your case to use either of your input
clock signals as a signal that clocks anything in your device. All
you're doing is looking for edges on the muxed signal and when you
find it, use that as the clock enable to the counter (after having
synchronized that muxed signal).

Are you trying to say that the only way to do this correctly is sample
the two clocks with another clock???

Not quite. What I'm saying is that if you have another clock that
will always be running and is fast enough compared with your two input
clocks, then I'm saying that this is the best way in an FPGA.

What if they are of similar frequency to the available sampling clock?
What if I don't want the sampling jitter?

Then the clock mux should be external to the FPGA where you can
guarantee better defined performance.

What if the result of the
counter that is to be clocked by one of two different sources must
remain in the same clock domain as the clock source outside the FPGA?

Since the input to and output from a mux are not the same, this clock
mux (wherever it is located) would define different clock
domains...the counter would not be operating in the same clock
domain. However, what you're really trying to say, but not
expressing, is that there might be a latency requirement from the
rising edge of the input clocks to the mux until the counter output is
valid. That stated requirement, by itself, does *not* imply that
there must be a clock mux.

Assuming for the moment that the mux is external, then this simply
says you have a Tco requirement on the counter relative to the FPGA's
input clock. To calculate the required Tco for the FPGA, you would
have to subtract off the max prop delay through the external mux.

Ok then, the mux becomes a problem. But in this case, the delay of a
mux is insignificant. But the jitter of sampling might be significant.

You're not actually sampling anything here.

There might be a legitimate need to keep the counter synchronous with
an external process.

When you come across that need, you'll have a different description of
the problem at hand.

So the question remains quite simply: what is the proper way to
multiplex clock sources, and once multiplexed, to distribute that clock
properly?

And there isn't a single answer. The simplest, most portable answer
is the one that has been discussed here which is to use the muxed
clock not as something that actually clocks anything but instead
enables clocked logic. This has situations though where it is not
appropriate, namely where the free running clock is not significantly
faster than the clocks being muxed.

Another approach would mux the clocks external to the FPGA as I
mentioned here. This is appropriate in some cases.

Another approach is to input the two clocks into a PLL/DLL in the
device that allows for multiple input clocks to be muxed together.
Here the FPGA has dedicated resources with known delays that enable
this solution to work. Limits you to using devices that have this
ability.

Another approach is to simply run multiple sets of logic in parallel
each clock clocking its own set of logic. The mux then is on the
output. Takes more logic, but sometimes that is an acceptable way.

Another approach is to mux the signals in the FPGA and generate that
as an output signal. Feed that back on a real clock input pin to the
FPGA.

There are other approaches...the point is that each design situation
can present unique opportunities to take advantage of specific
situations.

There are discussions on the Altera forum of resources in the Altera
FPGAs for muxing clock sources and properly distributing them. So it is
clear that the chip maker sees the need for being able to do this correctly.

The Altera forum is open to posters from all over the world (as is
comp.arch.fpga)...don't believe everything you read on the internet.
One would also question the validity of using any technique that might
happen to take advantage of an Altera specific feature on a device
from Xilinx. If the technique is generic then it should be
applicable.

For ex:
"Ripple and gated clocks: clock dividers, clock muxes, and other
logic-driven clocks"
http://www.alteraforum.com/forum/showthread.php?t=2388

I think my situation conforms with this guideline:

That post is written by 'Brad' who has a status of 'guest'. Do you
have some specific reason for believing everything that Brad has to
say? One person who has earned a decent rep thought the post was
good, maybe it is OK. But what exactly in your design is requiring
you to actually use a generated clock?

Guideline #2: Have no synchronous data paths going to/from the derived
clock domain

Thus, this post seems to pertain:

Which is immediately after...

#1: Do not use ripple or gated clocks; use clock enables or PLLs
instead.

What if the clock that I want to feed into a pin, which is not a GCLK,
defines a clock domain? Then this is not crossing domains. Can a
regular IO pin buffer onto a proper clock routing network?

You're hung up a bit on 'clock domain' as a term. You need to focus
instead on delays. How long does it take to get from an external pin
to a clock input of a flop for example? Do you care about the delay?
What is the skew between multiple flops? Are there any special
requirements or can those flops be placed anywhere?

Kevin Jennings

Nial Stewart · May 16, 2011

In this case, the solution is not to clock anything with your muxed
'clocks' since there is nothing that actually needs to be clocked with
those signals. Instead you should mux those input 'clocks' to create
a logic signal that you then synchronize to your FPGA's clock

I would probably take it a step further and synchronise the two 'clocks' to
the FPGA clock at the IO then select which of the two synchronised inputs is
used as the enable for the counter.

Generalising....

Mr CRC, FPGAs and the tools are designed to guarantee that the output of
one register clocked with one clock will get to the input of any other
register clocked with the same clock (as long as the build meets timing).
This is the beauty of the devices, you don't need to worry about timing,
just functionality.

Once you start introducing asynchronous clock transfers guaranteeing what happens
between them is difficult to constrain, and for the tools to analyse. This is
what needs to be carefully handled in your design.

It is easiest to design, constrain, guarantee meeting timing and so get
guaranteed functional devices if you keep the number of clocks to a minimum.

Preferably 1 (though this is rarely possible).

Nial.

rickman · May 17, 2011

On May 15, 3:24 pm, "Mr.CRC" <crobcBO...@REMOVETHISsbcglobal.net>
wrote:

rickman wrote:
....snip...
Ok, I am beginning to have my light bulb turn on slowly here. The clock
enable concept is new to me, but I think I get it. By sampling my
external clock and creating pulses in the clk50 domain, I can use those
to enable the counter.

The only thing I'm still unsure about is, that if these rising_edge
pulses are generated by logic, then they are delayed somewhat from
clk50. Do they enable count on the NEXT clk50 edge? (And then
disappear a few ns after that clock edge?)

Would it make more sense in this type of situation to sample with a 180
deg phase shifted ~clk50, so that the clock enable pulses would be
centered on the clk50 counting edges?

It is hard to give specific advice to such a general question, but I
don't think I have ever had to do anything other than detect the edge
of the clock, no "phase shifting" required. However, if your sampled
clock is close in speed to half the sampling clock rate (i.e. close to
the Nyquist rate) then you might find that you need to also sample the
data to keep the two in sync. But the details here depend on the
details of your interface. A timing diagram is the only way to see
what is going on.

Recently I had a design that needed to sample the incoming clock and
data, but I first registered the serial data with the incoming clock,
then sampled both. This allowed me to minimize the skew introduced to
the incoming clock relative to the data and have a fresh alignment to
work with internally. This was only required because the clock edge
and other aspects of the design were programmable so I needed to
optimize the setup and hold times for a general case rather than
working with specific details of a single interface case.

"Always" is a big statement. I nearly always pick one fast clock
domain as the "internal" clock and sync slow clocks to this fast one.
Each "external" clock becomes a clock enable internally. There are
times when I use the external clock directly, for example the serial
clock on an SPI port directly clocks the shift register in a recent
SPI design. Then I have lots of time to sync the parallel word to the
internal clock. But this could be done either way. Sometimes it
depends on the relative rate of the two clocks. The closer in speed
they are the harder it can be to fully analyze the timing.

Yes, I see. And I also see how this becomes quite messy if the external
clock is near the speed of the internal one. But only if data needs to
cross between the two domains, right?

Even that doesn't have to be a problem. There is a circuit that will
guarantee that you get every clock edge in a crossing and by using a
combination of clocking with the incoming clock and sampling you can
assure that the data stays with the clock. The limitation is the
jitter in your clocks. That has to be smaller than the difference in
period. If one clock edge jitters too much so that a period has no
clock edge in it (or two if you are looking from the other
perspective), you can miss events and data.

Ok, so while I have learned something so far about how to sync external
(slow) clocks to an internal fast clock, it is the case here that this
qep_sim() is a unique clock domain, so should be able to be clocked by
its external source directly.

But you can use the internal clock as a means of "filtering" the
spurious edges of the incoming clock.

I suspect that has to do with your clock signal. Does it have a slow
rise/fall time? I suspect a bit of noise is making it double clock.
When you route it differently (since it is not on a clock IO pin) the
glitch can get filtered out depending on how it is routed. It only
takes a very tiny glitch to cause this sort of double clocking and
such a tiny glitch can be filtered by the random routing. Putting it
on a global clock tree should make it fail more often.

I have to probe it again Monday. I recall Friday that I was a bit
surprised that it had a rounder rising edge near the upper level than I
would have expected. However, I think it is still in the <200ns range.
It originates from a TI ISO7220C, which specifies 1ns output rise/fall.
Good grief, that can't be right. It's slowed down a little with a
resistor in there for various reasons related to power sequencing of the
IO buffer panel, vs. the DSP that it usually feeds, or optionally the FPGA.

200 ns is an eternity on the input of an FPGA. The internal FFs can
be clocked by glitches on the order of 10's of ps, IIRC. You will
never see this on a scope. The FPGA circuit itself is the best way to
detect double clocking problems.

I tried to find the min rise time spec in the Spartan 3E datasheet last
night, and I could swear I found a place where it said 500ns, and now
for the life of me I can't find it again in that hideously long document.

Even if they give you a number it will have no meaning since it all
depends on the construction of your board. Ground bounce is
inevitable. The only question is "how much"? A slow input is
susceptible to a smaller amount of ground noise than a fast one.

Another advantage (or maybe this was pointed out before) of the rising
edge detection on the resync approach is that it removes all such
glitches. They are so fast that at the slower sample rate they can't
be seen.

I will look into this again with the scope. I'm almost sure it's not
noise though, since a time magnification of the clock when a counting
glitch occurs is perfectly smooth.

Yeah, it will look great. But if it is crossing the threshold when a
bit of noise hits, it will either double clock on one rising edge or
can see a rising edge when the clock is falling.

An important fact is that the counting error always happens on a falling
edge of this ext_sim_clk, not at random times. So it's not picking up
junk from other traces which may carry asynchronous signals, of which
there are a few.

I don't really suspect cross talk although that could be happening.
FPGAs have a lot of really fast stuff going on inside and it all
happens within the same fraction of a nanosecond. That puts a current
spike on the ground and power pins which causes a voltage spike from
the dI/dt. I have seen this a number of times while I haven't seen an
issue with crosstalk causing this in maybe 20 years. I think the PCB
designers tend to watch for crosstalk pairs like a hawk while folks
sometimes forget about the ground noise of fast parts.

I have gathered that when clocks are to be brought in to the FPGA, it is
highly recommended to use a GCLK IO pin, so the signal may be buffered
onto a global clock routing line. I have to see if I can rearrange my
IOs to get these external inputs onto GCLK IOs, but there are two of
them and the NEXYS2 isn't very liberal about providing GCLKs on the pin
headers. Some other GCLKs are available on another connector, but I
don't yet have the mating for that.

Don't try to fix a problem if you don't understand the cause. Why
would a GCLK IO pin eliminate your problem?

Oh, I don't have any intention of fixing it until I have understood it
(I stated this in the OP)!

Of course, when muxing external clock sources, if there are a lot of
them, one could eat into the supply of GCLKs quickly, so this is
undesirable.

Bingo! That is a big reason for treating slow clocks as clock enables
on an internal clock.

Well in this case, it also gets muxed. So it would be a waste of GCLK
inputs if it gets fed through logic and becomes a "derived clock" anyway.

The big issue with muxing clocks is the added routing time which
disrupts the timing relationship between the clock and the incoming
data. The tools can help to analyze this, but it is better to not
have to deal with it at all if you can. Muxing a clock enable is
pretty much transparent since at that point it is just logic.

A more interesting question is then, is it possible to take a GP IO
input pin, and internally buffer it onto an internal clock routing line?

Isn't that what is already happening in your case?

I have no idea at this point until I learn more about constraints and
how to interpret the hundreds of pages of gibberish that the tools
report when I make my bit file.

As I mentioned in another post, there seems to be considerable
discussion on the Altera forum about using constraints to control how a
clock is distributed.

That sort of stuff is hard to deal with in my opinion. Better off
just not having the clock in the first place.

There is also a discussion of how muxing clocks can be dangerous. I am
very suspicious that this might be the real cause of my problem, rather
than a signal integrity issue (I'm pretty good with signal integrity):

"Gated clocks: On/off gating and muxing of clocks in clock control
blocks or logic"
http://www.alteraforum.com/forum/showpost.php?p=8506&postcount=7

I have to learn more about how these things are managed on the Xilinx
device as well.

In the Lattice parts I use they have clock mux blocks. But these
don't deal with the main problems of muxing clocks, the delay. They
just provide a way to "cleanly" switch between clocks without
generating splinter pulses. If the clock has no data interface then
you have a lot more freedom to run clocks through logic.

Thanks for input and clarification.

Let us know what you find. It is hard to analyze problems like this
without doing tests. I may be completely off base in this case.

Rick

What do you make of the test that I did where I took a copy of the muxed
clock, and routed it back outside, then the glitches disappeared?

Remember that the glitch can be as small as a few 10's of picoseconds
which doesn't take much to filter out. I wouldn't worry with that
test myself. The problem is not something mysterious. You said the
problem *always* happens on a clock edge and never in the middle of a
clock cycle, so that is clear enough to me that it is a combination of
the slow clock edges and a bit of ground noise. Fix one or the other
or sample the clock and data to, in essence, provide a digital low pass
filter.

Since this was not perturbing the ext_sim_clk input path, and yet the
problem disappeared, it argues strongly that the glitches are
originating internally, possibly in the mux, where perhaps a change in
loading by adding the new output path makes the glitches go away.

Thanks very much for your feedback!

I don't think this test shows that at all. You can filter the
glitches anywhere downstream of their source. Try adding a fast
buffer to the clock signal. If that makes the problem go away then it
is pretty clear that the slow clock edges are the problem. But that
doesn't mean you have to fix it with a buffer. I built a board with a
general purpose serial input/output. My customer at a major
networking company figured out what I was doing in the FPGA and liked
it so much they are doing the same thing in their larger products
which have dozens of external clocks. He got the details from me and
changed his design so it uses one internal clock instead. He
especially liked the "filtering" the sampling approach provides since
they don't know the nature of the signals their customers will use
this with.

Need any consulting help? Things are pretty slow for me at the
moment. Although this is working well for my kayaking schedule. :^)

Rick

rickman · May 17, 2011

On May 14, 11:17 pm, "Mr.CRC" <crobcBO...@REMOVETHISsbcglobal.net>
wrote:

Andrew Holme wrote:

If it's an external clock, slow rise and fall times plus additive noise can
cause multiple counts if it slews too slowly through the logic threshold.
If the external clock rate is low, the best solution is to sample it at a
rate comparable to the rise/fall time.

Thanks for the reply.

Ugh. I could have sworn a few minutes ago I found a spec of 500ns for
the min slew rate of an input in the hideous ds312.pdf "Spartan-3E FPGA
Family: Data Sheet" but now I can't find it again.

Anyway, getting a clean enough clock to a logic device really isn't that
hard. I'll scope it again Monday and see if I can't put it up somewhere.

One more comment and then I have to run. You will never see the
glitch on a scope, if for no other reason because it doesn't have to
be on the external signal. A small ground shift from internal
switching currents can add a very narrow spike to an input right at
the IO buffer in the FPGA. No noise on the input, spike on the
output. Ground bounce, the invisible noise source.

Rick

Mr.CRC · May 18, 2011

rickman wrote:

Would it make more sense in this type of situation to sample with a 180
deg phase shifted ~clk50, so that the clock enable pulses would be
centered on the clk50 counting edges?

It is hard to give specific advice to such a general question, but I
don't think I have ever had to do anything other than detect the edge
of the clock, no "phase shifting" required. However, if your sampled
clock is close in speed to half the sampling clock rate (i.e. close to
the Nyquist rate) then you might find that you need to also sample the
data to keep the two in sync. But the details here depend on the
details of your interface. A timing diagram is the only way to see
what is going on.

Ok, I made an edge-detect, clock-enable modification to my circuit, and it
worked fine the first try (after an hour of battling with the Xilinx
tools from hell over some silly coding issue).

Ok, so while I have learned something so far about how to sync external
(slow) clocks to an internal fast clock, it is the case here that this
qep_sim() is a unique clock domain, so should be able to be clocked by
its external source directly.

But you can use the internal clock as a means of "filtering" the
spurious edges of the incoming clock.

Yes. I still think the situation can arise where frequency division,
pulse generation, etc. must be done based on a clock or timebase from
another system. And where the resynchronization jitter can't be
tolerated by registering the input "clock" to another clock close to the
chip.

I work in scientific research, and critical timing systems with lasers,
engines, etc. are all over the place. Having to do ms delays with ns
accuracy and ps jitter is commonplace. Hence a significant part of our
equipment budget goes to the likes of Stanford Research Systems,
Berkeley Nucleonics, etc.

So the situation will eventually arise where I will have to deliver a
clean external clock to an FPGA. I underestimated the snappiness needed
here to feed a clock from my rather robust (from an idiot proofing from
the user, surge, and ESD proofing perspective) but sluggish (with
excessive protective impedance in the line) input buffering scheme. The
signal is certainly smooth, so it's not external noise, but at a 30ns
rise/fall, that's certainly enough to allow a little wiggle of ground
bounce to screw things up. Another reason why I love input hysteresis. I
always put Schmitt triggers on all my external world digital inputs.

200 ns is an eternity on the input of an FPGA. The internal FFs can
be clocked by glitches on the order of 10's of ps, IIRC. You will
never see this on a scope. The FPGA circuit itself is the best way to
detect double clocking problems.

This fact is now clear.

I tried to find the min rise time spec in the Spartan 3E datasheet last
night, and I could swear I found a place where it said 500ns, and now
for the life of me I can't find it again in that hideously long document.

Even if they give you a number it will have no meaning since it all
depends on the construction of your board. Ground bounce is
inevitable. The only question is "how much"? A slow input is
susceptible to a smaller amount of ground noise than a fast one.

Yes.

Another advantage (or maybe this was pointed out before) of the rising
edge detection on the resync approach is that it removes all such
glitches. They are so fast that at the slower sample rate they can't
be seen.

That is an advantage, and I think I'll use this approach whenever
possible now.

I don't really suspect cross talk although that could be happening.
FPGAs have a lot of really fast stuff going on inside and it all
happens within the same fraction of a nanosecond. That puts a current
spike on the ground and power pins which causes a voltage spike from
the dI/dt. I have seen this a number of times while I haven't seen an
issue with crosstalk causing this in maybe 20 years. I think the PCB
designers tend to watch for crosstalk pairs like a hawk while folks
sometimes forget about the ground noise of fast parts.

Yeah, I am convinced that it is the ground bounce at this point.

The big issue with muxing clocks is the added routing time which
disrupts the timing relationship between the clock and the incoming
data. The tools can help to analyze this, but it is better to not
have to deal with it at all if you can. Muxing a clock enable is
pretty much transparent since at that point it is just logic.

That's what I'm doing now. I made two edge detectors right on the input
timebases (I really don't want to call them clocks now) and then fed the
CEs to a mux, then on to the CE input of the enhanced qep_sim().
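
For anyone following along, the arrangement described above might look roughly like the sketch below. This is only a minimal illustration, not the actual code: the module name sim_ce_mux, the fast system clock clk, the output sim_ce, and the shift-register names are all assumptions made up for the example. Only clk_mux_sel, dsp_sim_clk and ext_sim_clk come from the posted shaft module.

```verilog
// Hypothetical sketch: every name except clk_mux_sel, dsp_sim_clk and
// ext_sim_clk is invented for illustration.
module sim_ce_mux (
    input  wire clk,          // assumed fast on-chip system clock
    input  wire clk_mux_sel,  // 0 = ext_sim_clk, 1 = dsp_sim_clk
    input  wire dsp_sim_clk,  // slow external timebase, sampled as data
    input  wire ext_sim_clk,  // slow external timebase, sampled as data
    output wire sim_ce        // one-clk-wide pulse per selected rising edge
);
    // Bits 0 and 1 resynchronize the input to clk; bit 2 keeps the
    // previous synchronized sample for the edge comparison.
    reg [2:0] dsp_sr = 3'b000;
    reg [2:0] ext_sr = 3'b000;

    always @(posedge clk) begin
        dsp_sr <= {dsp_sr[1:0], dsp_sim_clk};
        ext_sr <= {ext_sr[1:0], ext_sim_clk};
    end

    // Rising edge: the newer synchronized sample is 1, the older one is 0.
    wire dsp_ce = dsp_sr[1] & ~dsp_sr[2];
    wire ext_ce = ext_sr[1] & ~ext_sr[2];

    // The mux now selects between clock enables, which is ordinary logic.
    assign sim_ce = clk_mux_sel ? dsp_ce : ext_ce;
endmodule
```

The consumer would then run entirely on clk and only advance when sim_ce is high. Note that clk_mux_sel is not synchronized in this sketch, so a stray or missed enable is possible at the moment the switch is flipped.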

The Lattice parts I use have clock mux blocks. But these
don't deal with the main problem of muxing clocks: the delay. They
just provide a way to "cleanly" switch between clocks without
generating splinter pulses. If the clock has no data interface then
you have a lot more freedom to run clocks through logic.

In my case, splinter pulses (I call them "seams" or phase
discontinuities) aren't an issue, since there are a few recalibration
revolutions of the QEP every time the mux might be switched before
anything important happens. But this needs to be carefully considered.

Since this was not perturbing the ext_sim_clk input path, and yet the
problem disappeared, it argues strongly that the glitches are
originating internally, possibly in the mux, where perhaps a change in
loading by adding the new output path makes the glitches go away.

I don't think this test shows that at all.

Correct. I had gotten a bit sidetracked by the fact that muxes of the
Y=s&a|~s&b equation can glitch when s switches while a and b are both
true. This can be easily fixed, but isn't related to my problem, which
has s constant.
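
For reference, the usual textbook fix for that static hazard is to add the redundant consensus term a&b. A minimal sketch follows, with a hypothetical module name and with the caveat that a synthesizer is free to optimize the logically redundant term away, so this shows the gate-level idea rather than something guaranteed to survive FPGA mapping:

```verilog
// Hypothetical sketch of a 2:1 mux with the consensus term added.
module mux2_consensus (
    input  wire s,   // select
    input  wire a,   // chosen when s = 1
    input  wire b,   // chosen when s = 0
    output wire y
);
    // (a & b) keeps y high while s changes with a = b = 1, covering the
    // gap between the (s & a) and (~s & b) product terms.
    assign y = (s & a) | (~s & b) | (a & b);
endmodule
```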

You can filter the
glitches anywhere downstream of their source. Try adding a fast
buffer to the clock signal. If that makes the problem go away then it
is pretty clear that the slow clock edges are the problem.

And that is what happened. Goes away with faster input.

Need any consulting help? Things are pretty slow for me at the
moment. Although this is working well for my kayaking schedule. :^)

Rick

Please send your CV to my work email:

crcarleREMOVETHISCRAP@ANDTHISTOOsandia.gov

I have used a consultant before (someone who hangs here), but wound up
doing things my way anyway, after learning a few important bits from
him. In general I have the time to figure things out, and I love the
learning process, so it's unlikely we'll need you. There are also some
expert FPGA designers on site, but somewhat partitioned from my
division. But I do like to keep a list of prospective consultants on
file, because the situation occasionally arises where a consultant is
the right solution.

Good day!

--
_____________________
Mr.CRC
crobcBOGUS@REMOVETHISsbcglobal.net
SuSE 10.3 Linux 2.6.22.17

Brian Drummond · May 18, 2011

On Tue, 17 May 2011 06:40:07 -0700, rickman wrote:

Need any consulting help? Things are pretty slow for me at the moment.
Although this is working well for my kayaking schedule. :^)

Rick

Heh, you too, huh?

If only the wind dropped (here on the Isle of Skye), anyway.

At least the three sheets of ply lurking in the garage since 2004 are now
kayak-shaped...

- Brian

Pete Fraser · May 18, 2011

Brian Drummond wrote:

On Tue, 17 May 2011 06:40:07 -0700, rickman wrote:

this is working well for my kayaking schedule. :^)

Heh, you too, huh?

Is there some requirement that FPGA coders
are also kayakers?

Pete Fraser
Looksha IV HV
