Soft failures (?) 9536XL

Josep Duran

I have a small circuit using the 9536XL CPLD. The complete machine uses
64 of these circuits. I have tested it in the lab and everything works
fine.

The problem is that the other day, at the client's premises, I saw
something wrong with one of the boards. The CPLD stopped responding to the
commands sent by the computer. As I had no test equipment available, I just
tried sending some reset commands to the board and got no response. I
turned the power off to change the board, but just before replacing it I
gave it another try. To my surprise, everything worked fine this time. I
did some intensive testing on the board, and again everything went OK.

To me, it looks like the CPLD lost its configuration.
Is this at all possible? If so, what can I do to prevent it from
happening?
Has anybody seen something like this before?


NB - it is a 2 layer board (no GND plane) about 3 sq inches.


Thank you for your time.

Josep Duran
 
The XC9536XL is a Flash-based CPLD. Unlike an SRAM-based FPGA, it is
not reconfigured just by cycling Vcc. The CPLD would need a fresh
in-system programming operation, which is neither automatic nor
something that happens by accident.
So what might have happened is that your design got into an illegal
state from which it cannot escape, but which did not affect the
configuration.

A more far-fetched explanation is based on the fact that a small part of
the Flash-based configuration actually gets transferred into internal
latches (like in an FPGA), which of course might get upset, and that
would be fixed by cycling Vcc.
All CPLD manufacturers use this convenient mechanism, but hardly anybody
talks about it, since it creates the impression of "volatility"...

Peter Alfke
 
Peter Alfke wrote:
> The XC9536XL is a Flash-based CPLD. Unlike an SRAM-based FPGA, it is
> not reconfigured just by cycling Vcc. The CPLD would need a fresh
> in-system programming operation, which is neither automatic nor
> something that happens by accident.
> So what might have happened is that your design got into an illegal
> state from which it cannot escape, but which did not affect the
> configuration.

Correct. Check whether you have any state machines, and what they do
from illegal states.

> A more far-fetched explanation is based on the fact that a small part of
> the Flash-based configuration actually gets transferred into internal
> latches (like in an FPGA), which of course might get upset, and that
> would be fixed by cycling Vcc.
> All CPLD manufacturers use this convenient mechanism, but hardly anybody
> talks about it, since it creates the impression of "volatility"...

You can sometimes find this hidden in the fine print, in a form roughly
like "Vcc must be reduced to < 0.9V before being increased again".
Systems that are prone to brown-out and non-monotonic Vcc are
more at risk in this area.

If you have the resource room, you can add read-back or a similar
"check-it-actually-happened" feature to the PLD code, and have your system
watch for any out-to-lunch behaviour.
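
As a rough illustration of the read-back idea, here is a minimal VHDL sketch. It assumes the host can write an 8-bit control register in the CPLD and read it back over the same interface. The entity name ctrl_reg_readback, the register width and all port names are invented for this example and are not taken from the original design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: a control register that the host can read back
-- to confirm that a write really took effect inside the CPLD.
entity ctrl_reg_readback is
  port (
    clk      : in  std_logic;
    wr_en    : in  std_logic;                     -- write strobe, assumed synchronous to clk
    data_in  : in  std_logic_vector(7 downto 0);  -- value written by the host
    data_out : out std_logic_vector(7 downto 0)   -- value the host reads back
  );
end entity ctrl_reg_readback;

architecture rtl of ctrl_reg_readback is
  signal reg : std_logic_vector(7 downto 0);
begin
  -- Store the host's value on a write.
  process (clk)
  begin
    if rising_edge(clk) then
      if wr_en = '1' then
        reg <= data_in;
      end if;
    end if;
  end process;

  -- Read-back path: the host compares this with what it last wrote.
  data_out <= reg;
end architecture rtl;
```

The host would periodically write a known pattern, read it back, and flag the board (or power-cycle it) when the two differ. In a 36-macrocell part a narrower register may be all that fits.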

-jg
 
Did you analyze your design for nominal conditions
or for worst-case operating conditions?

I have seen boards fail when they got warm after
being closed up (or after a friend set their engineering
notebook on top of the box and it heated up a little).

This happened to a board that I was interfacing to.
The problem went away after we drilled larger vent
holes in the product. I guess that is what you
get when you use a product that is still in beta
testing.

Cheers,
Jim



--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jim Lewis
Director of Training mailto:Jim@SynthWorks.com
SynthWorks Design Inc. http://www.SynthWorks.com
1-503-590-4787

Expert VHDL Training for Hardware Design and Verification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
Sounds like you have an asynchronous circuit in your design.
I have seen troubles like yours many times in the past with CPLDs and
FPGAs. Every time, the trouble came from some kind of asynchronous circuit
or design approach. With asynchronous circuits, temperature can affect
your design, sometimes and sometimes not.

First, make sure you synchronize all your input signals before using
them! Make sure your design does NOT use combinatorial latches!
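
For reference, a common way to synchronize an asynchronous input is a two-flip-flop synchronizer. The sketch below is a generic textbook version, not code from the original design; the names sync_2ff, async_in and sync_out are placeholders.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Generic two-stage synchronizer (illustrative names only).
entity sync_2ff is
  port (
    clk      : in  std_logic;
    async_in : in  std_logic;  -- input not related to clk
    sync_out : out std_logic   -- version safe to use in clk-domain logic
  );
end entity sync_2ff;

architecture rtl of sync_2ff is
  signal meta   : std_logic;  -- first stage, may go metastable
  signal stable : std_logic;  -- second stage, given a full clock period to settle
begin
  process (clk)
  begin
    if rising_edge(clk) then
      meta   <= async_in;
      stable <= meta;
    end if;
  end process;

  sync_out <= stable;
end architecture rtl;
```

Only sync_out should feed the state machine. Using async_in directly in next-state logic lets different state flip-flops see different values of the same input on the same clock edge.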

Regards,
Laurent
www.amontec.com



 
Thank you Peter,

"Peter Alfke" <peter@xilinx.com> wrote in message
news:400F2439.EC57ADB0@xilinx.com...
> The XC9536XL is a Flash-based CPLD. Unlike an SRAM-based FPGA, it is
> not reconfigured just by cycling Vcc. The CPLD would need a fresh
> in-system programming operation, which is neither automatic nor
> something that happens by accident.

Yes. That part I understand.

> So what might have happened is that your design got into an illegal
> state from which it cannot escape, but which did not affect the
> configuration.

This was my first thought. I double-checked the state machine, and I don't
think there is a problem there.

> A more far-fetched explanation is based on the fact that a small part of
> the Flash-based configuration actually gets transferred into internal
> latches (like in an FPGA), which of course might get upset, and that
> would be fixed by cycling Vcc.

This is the part I am actually concerned about. Could a noisy or poorly
decoupled Vcc be the source of the problems? How far-fetched is this
explanation? Is it really possible?

If I read the configuration through the JTAG port, do I get the actual
internal RAM configuration, or the Flash configuration?

Or should I be looking for a bad solder joint or some other, more
mechanical explanation?


Josep Duran
 
Josep Duran wrote:
>> So what might have happened is that your design got into an illegal
>> state from which it cannot escape, but which did not affect the
>> configuration.
>
> This was my first thought. I double-checked the state machine, and I
> don't think there is a problem there.

Illegal states have to do with combinations of your state FFs that are
not accounted for in your machine. Or, if you have more than one machine
and have not accounted for all the state combinations, you can get into
trouble. Sometimes two machines interact in such a way that they need to
be considered one machine. Make sure you have a bubble in your state
diagram that corresponds to every possible state encoding; then there
are no "illegal" states. Also account for all combinations of inputs at
every state.
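
To make the "a bubble for every encoding" advice concrete, here is a small VHDL sketch of a three-state machine held in two flip-flops, where the unused fourth encoding is given an explicit way out. The machine itself (states, the start and done inputs, the busy output) is invented for illustration and has nothing to do with the original design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Illustrative FSM: three used states in two flip-flops, with the
-- unused encoding "11" explicitly routed back to idle.
entity safe_fsm is
  port (
    clk   : in  std_logic;
    start : in  std_logic;  -- assumed already synchronized to clk
    done  : in  std_logic;  -- assumed already synchronized to clk
    busy  : out std_logic
  );
end entity safe_fsm;

architecture rtl of safe_fsm is
  constant S_IDLE : std_logic_vector(1 downto 0) := "00";
  constant S_RUN  : std_logic_vector(1 downto 0) := "01";
  constant S_WAIT : std_logic_vector(1 downto 0) := "10";
  signal state : std_logic_vector(1 downto 0);
begin
  process (clk)
  begin
    if rising_edge(clk) then
      case state is
        when S_IDLE =>
          if start = '1' then state <= S_RUN; end if;
        when S_RUN =>
          if done = '1' then state <= S_WAIT; end if;
        when S_WAIT =>
          if start = '0' then state <= S_IDLE; end if;
        when others =>
          -- Covers "11" (and 'U'/'X' in simulation): recover to idle.
          state <= S_IDLE;
      end case;
    end if;
  end process;

  busy <= '1' when state = S_RUN else '0';
end architecture rtl;
```

The state is written as an explicit std_logic_vector rather than an enumerated type so that the unused code is visible in the source. With an enumerated type and automatic encoding (one-hot, for instance), a synthesis tool may not build recovery logic for encodings that the type does not name, so check the tool's encoding and safe-implementation options if you go that way.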


>> A more far-fetched explanation is based on the fact that a small part
>> of the Flash-based configuration actually gets transferred into
>> internal latches (like in an FPGA), which of course might get upset,
>> and that would be fixed by cycling Vcc.
>
> This is the part I am actually concerned about. Could a noisy or poorly
> decoupled Vcc be the source of the problems? How far-fetched is this
> explanation? Is it really possible?

Yes, noise on Vcc can cause trouble for any design that has volatile
storage, state machine or not. Your state FFs can be corrupted by noise
on Vcc.
 
Your problem reminds me of a problem I had a while ago. The FPGA
locked up just as your CPLD did. By digging a bit, I found that ISE
had implemented my state machines as one-hot, so I thought that somehow
the FSM had gone into an illegal state. Forcing the FSM to binary
encoding reinforced my belief.

To shorten the story, it turned out the FPGA wasn't really going into
an illegal state: the problem was poor signal integrity on the clock
signal, which occasionally would have a double edge, causing the bit
in the one-hot encoding to be lost.

You might want to check out the clock after looking at the Vcc.
 
If you have clock glitch problems, and cannot resolve them by proper
attention to board-level signal integrity methods (which you should!),
then there is always a band-aid method to make the problem vanish. A few
years ago, I published a way to suppress clock glitches, which has saved
several designs already:

http://www.xilinx.com/xcell/xl34/xl34_54.pdf

Peter Alfke, Xilinx Applications
============================
Pascal Chamberland wrote:
> You might want to check out the clock after looking at the Vcc.
 
Hi,
What I've done in the past is this. Get your noisy clock into the
FPGA and call it CLKA. Feed it through spare unbonded IOBs with the input
delay feature turned on to make a delayed version, CLKB. Make the
delay longer than the glitch time by using as many IOB delays as
necessary. Now make two signals:
SET <= CLKA and CLKB;
RESET <= not(CLKA) and not(CLKB);
Use SET to set a latch and RESET to reset it. The output of the latch
is your debounced clock, which you feed to your circuit. Disgusting
but effective.
Cheers, Syms.
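
For anyone who wants to see the latch part of this trick written out, here is one possible VHDL sketch. It covers only the SET/RESET latch and assumes the delayed clock CLKB is produced elsewhere by device-specific means (the IOB delay chain described above), which plain VHDL cannot express portably. The entity and signal names are made up for this example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Sketch of the set/reset latch stage of the clock glitch filter.
-- clkb must be a copy of clka delayed by more than the glitch width.
entity clk_deglitch is
  port (
    clka    : in  std_logic;  -- raw, possibly glitchy clock
    clkb    : in  std_logic;  -- delayed copy of clka (generated elsewhere)
    clk_out : out std_logic   -- filtered clock
  );
end entity clk_deglitch;

architecture rtl of clk_deglitch is
  signal set_i   : std_logic;
  signal reset_i : std_logic;
  signal q       : std_logic;
begin
  set_i   <= clka and clkb;              -- both high: clock is really high
  reset_i <= (not clka) and (not clkb);  -- both low: clock is really low

  -- Level-sensitive latch: holds its value while clka and clkb disagree,
  -- which is exactly the window in which a glitch would show up.
  process (set_i, reset_i)
  begin
    if set_i = '1' then
      q <= '1';
    elsif reset_i = '1' then
      q <= '0';
    end if;
  end process;

  clk_out <= q;
end architecture rtl;
```

set_i and reset_i can never be active together, so the priority in the if/elsif does not matter. This deliberately infers a latch, so expect a synthesis warning, and make sure the tool does not optimize the delay chain away.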

 
