Electronic News Article on 90 nm soft error FUD

Austin Lesea

Hello from the SEU Desk:

Peter defended us rather well, but how can one seriously question real
data vs. babble and drivel?

Well, after 919 equivalent device years of experiment at sea level,
Albuquerque (~5100 feet), and White Mountain Research Center (12,500
feet) the Rosetta Experiment* on the 3 groups of 100 2V6000s has logged
a grand total of 45 single soft error events, for a grand total of 20.4
years MTBF (or 5335 FITs -- FITs and MTBF are related by a simple
formula -- mean time between failures vs failures per billion hours or
FITs).
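The "simple formula" can be written out as a short sketch. FITs are failures per billion device-hours, so the two figures are reciprocals once years are converted to hours. The 8760-hour year and the function names are assumptions for illustration; the post does not say how 5335 was derived, and this straightforward conversion lands a little higher.

```python
HOURS_PER_YEAR = 8760  # assumption: 365-day year

def mtbf_years_to_fits(mtbf_years):
    """FITs = failures per 1e9 device-hours."""
    return 1e9 / (mtbf_years * HOURS_PER_YEAR)

def fits_to_mtbf_years(fits):
    """Inverse conversion, back to years."""
    return 1e9 / fits / HOURS_PER_YEAR

device_years = 919   # from the post
events = 45          # from the post

mtbf_years = device_years / events
fits = mtbf_years_to_fits(mtbf_years)

print(f"MTBF: {mtbf_years:.1f} years")   # MTBF: 20.4 years
print(f"Rate: {fits:.0f} FITs")          # Rate: 5590 FITs
```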

In actual tests done by third parties, it takes from 6 to 80 soft errors
(flips), with about 10 flips on average, to affect a standard
non-redundant design in our FPGA. This is just common sense, as for years
ASIC vendors trashed FPGAs as they "use 30 transistors to do the job of
just one!" Guess what? What was our "downfall" is now a strength!

True. So that means that a 2V6000 at sea level gets a logic disturbing
hit once every 200 years.

535 FITs (soft errors affecting customer design) for a 6 million gate
FPGA.
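As a rough check of the derating step, assuming the average of 10 flips per design-affecting error applies uniformly and an 8760-hour year (both simplifications, and the variable names are illustrative only):

```python
HOURS_PER_YEAR = 8760      # assumption: 365-day year

raw_fits = 5335            # raw soft-error rate quoted in the post
flips_per_failure = 10     # average flips per logic-affecting error, per the post

# Only about one flip in ten disturbs the user design.
design_fits = raw_fits / flips_per_failure

# Convert the derated rate back to years between logic-disturbing hits.
years_between_hits = 1e9 / design_fits / HOURS_PER_YEAR

print(f"{design_fits:.1f} FITs")          # 533.5 FITs
print(f"{years_between_hits:.0f} years")  # 214 years
```

This agrees with the rounded figures of 535 FITs and one hit every 200 years.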

The biggest part A**** makes is 6 times smaller, so for our 2V1000, we
get about 90 FITs. For a 3S1000, it is 30% better (see below), or 63
FITs. OK A****, tell us what your actual as measured FIT rate is for
your largest device? Go ahead, I'd like to know. How many device years
do you have to back it up? 1000 actual years? Nope. Didn't think so.
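The scaling above can be sketched as follows, assuming the upset rate scales linearly with device size and that the 30% improvement applies as a flat multiplier. Both are simplifications, and the variable names are illustrative.

```python
fits_2v6000 = 535        # design-affecting FITs for the 2V6000, from the post
size_ratio = 6           # "6 times smaller", from the post
improvement_90nm = 0.30  # 90 nm quoted as 30% better, from the post

# Assumption: FIT rate is proportional to device size.
fits_2v1000 = fits_2v6000 / size_ratio

# Assumption: the process improvement is a simple multiplier.
fits_3s1000 = fits_2v1000 * (1 - improvement_90nm)

print(f"2V1000: {fits_2v1000:.0f} FITs")  # 2V1000: 89 FITs
print(f"3S1000: {fits_3s1000:.0f} FITs")  # 3S1000: 62 FITs
```

The post rounds the first figure up to 90 before applying the 30%, which gives its 63.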

You know, if you want to use FITs, we'll use FITs. But I am afraid it
will give those spreading nonsense fits (pun intended).

Now if you use triple redundant logic, checksums, ECC codes, you can
design so you NEVER HAVE AN UPSET.
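To illustrate the triple-redundancy idea in software terms, here is a minimal sketch of a bitwise 2-of-3 majority voter. It is a model of the voting principle only, not of any particular FPGA implementation, and the function name is made up for the example.

```python
def majority_vote(a, b, c):
    """Bitwise 2-of-3 majority of three redundant copies of a value."""
    return (a & b) | (a & c) | (b & c)

golden = 0b10110010
upset = golden ^ 0b00001000   # one copy suffers a single flipped bit

# The two good copies outvote the corrupted one.
voted = majority_vote(golden, upset, golden)
print(voted == golden)   # True
```

The vote only masks an upset while the other two copies still agree on that bit, which is why it is usually paired with some way of repairing the flipped copy.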

As has been published, Xilinx FPGAs are on the Mars Landers (on their
way there now), so someone is not concerned about upsets. Even periodic
reconfiguring (scrubbing) eliminates a major portion of the probability
of logic affecting upsets. Virtex II, and II Pro have ways to actually
check, detect, and correct the flipped bits using the ICAP feature. For
details, contact your FAE. If 535 FITs is completely unacceptable for
that critical application you have, this makes it 0 FITs from soft
errors.

Some of our customers have now qualified Virtex II Pro as the ONLY
solution to the soft error problem, as ASICs can't solve it (easily like
we have), and other FPGAs do not have the facts to back up their
claims. That is quite new: the Xilinx FPGA is the only safe design
choice to make? Maybe it is right now, as it is the only choice where
all of the variables are measured, understood, and techniques exist to
reduce the risk to near zero, or whatever level is acceptable.

Oh, and yes, the 90nm technology is now 30% better than the 150 nm
technology (15% better than the 130 nm technology) as proven by our
tests (as presented to the MAPLD conference last month).

So, you can run around blathering on about data taken by grad students
(no offense, I was one at one time), or you can look at our real time
results from three locations on 300 devices being tested 24 by 7, or
talk to us about our beam tests in protons and neutrons, or ways to
design to get the desired level of reliability for your system.

And, you may want to consider going with the vendor who has been
actively working on soft error mitigation for more than five years now.
And has real results to show for it.

Let Moore's Law Rule!

Austin

*Rosetta Stone was the key that unlocked ancient Egyptian wisdom to the
world. The stone had an inscription in three languages, which allowed
archeologists to decipher ancient Egyptian writings. The Rosetta FPGA
Experiment is designed to translate beam testing (proton or neutron)
into actual atmospheric, or high altitude results, without having to
actually build huge arrays of FPGAs and send them to mountain tops
around the world to get real results. It was also designed to answer the
basic questions of altitude effects, position effects, and how smaller
device geometries behave in the real world.
 
Austin Lesea <Austin.Lesea@xilinx.com> writes:
Well, after 919 equivalent device years of experiment at sea level,
Albuquerque (~5100 feet), and White Mountain Research Center (12,500
feet) the Rosetta Experiment* on the 3 groups of 100 2V6000s has logged
a grand total of 45 single soft error events, for a grand total of 20.4
years MTBF (or 5335 FITs -- FITs and MTBF are related by a simple
formula -- mean time between failures vs failures per billion hours or
FITs).
But keep in mind that SEUs are random events, unlike other failure
mechanisms that depend on cumulative damage, so if one device has an
MTBF of 20 years then a system with 20 devices has an MTBF of one year.
Most professionals in the radiation effects field don't use MTBF as
a measure of SEU immunity, they use errors/bit-day or a similar metric.

So that means that a 2V6000 at sea level gets a logic disturbing
hit once every 200 years.
And that if you have 200,000 in the field at sea level then 2 or 3 are
getting a logic disturbing hit EVERY DAY. Or if you have a critical mission
that lasts for five years then your chance of getting a logic disturbing
upset is one in forty. OK for a PC running Windows, perhaps, but if you
are building warheads....
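The fleet and mission figures above can be reproduced with a short sketch, assuming upsets arrive at a constant rate (an exponential model) and using the 200-year figure per device. The variable names are illustrative.

```python
import math

mtbf_years = 200        # logic-disturbing hit interval per device, from the thread
fleet_size = 200_000    # devices in the field
mission_years = 5
DAYS_PER_YEAR = 365.25  # assumption

# Expected hits per day across the whole fleet.
hits_per_day = fleet_size / (mtbf_years * DAYS_PER_YEAR)

# Probability that one device sees at least one hit during the mission,
# assuming a constant upset rate.
p_upset = 1 - math.exp(-mission_years / mtbf_years)

print(f"{hits_per_day:.2f} hits per day")     # 2.74 hits per day
print(f"about 1 in {1 / p_upset:.0f}")        # about 1 in 41
```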

You know, if you want to use FITs, we'll use FITs. But I am afraid it
will give those spreading nonsense fits (pun intended).
Again, FITs is not a good metric. These aren't "failure in time", they
are random events. An SEU can happen in the first millisecond of operation
or after 200 years of operation.

Some of our customers have now qualified Virtex II Pro as the ONLY
solution to the soft error problem, as ASICs can't solve it (easily like
we have),
Now that's just misinformation. We've put a number of ASICs in space,
and in worse environments than the surface of Mars. How did Galileo
survive the sulfur ions around Jupiter for ten years without your products?

Can you tell us what the penalty in area and speed would be in going
to TMR? And exactly which of your products have sufficient resistance
to total ionizing dose to be considered for space applications...do your
current state-of-the-art products fit in this category?

And, you may want to consider going with the vendor who has been
actively working on soft error mitigation for more than five years now.
I've been in this business for twenty years, on both the military and
civilian side. I've designed full custom, ASIC and FPGA products for
a variety of space applications.

Methinks the lady doth protest too much...

Joe
--
K. Joseph Hass
Center for Advanced Microelectronics & Biomolecular Research
721 Lochsa St., Suite 8 Post Falls, ID 83854
 
Let me just address the relatively simple subject of FITs vs MTBF.
100 FITs means an MTBF = 10 million hours (about 1,140 years).

But nobody I know would be silly enough to interpret this to mean that
each circuit lives that long and then suddenly dies. We all assume a
statistically even distribution (with different parameters describing
infant mortality).
That's why we laughed when Actel (in the original press quote) made
such a big issue about the difference:

"Actel, currently the only anti-fuse FPGA maker, refuted this
suggestion, pointing out that Xilinx's use of mean time between failures
(MTBF) is the wrong metric to measure error rates: "MTBF is the wrong
statistic, because a neutron event is
random," said Brian Cronquist, senior director of technology at Actel."

I sent him an e-mail suggesting for us to disagree on more relevant
things. No answer. Seems like they don't have a more meaningful
rebuttal.
Enough said.

Obviously the Xilinx large scale "Rosetta" test results have given the
antifuse community fits (pun intended). They should.

That is not to say that we are perfect, or that we have the only viable
solution. But antifuses have lost their (high-priced, small size)
monopoly.
And fresh blood and competition is always healthy, even in aerospace!

Peter Alfke
 
Joe,

Thanks for giving me the opportunity to reply.

I thought no one cared to comment.

See below.

Austin

Joe Hass wrote:

Austin Lesea <Austin.Lesea@xilinx.com> writes:
Well, after 919 equivalent device years of experiment at sea level,
Albuquerque (~5100 feet), and White Mountain Research Center (12,500
feet) the Rosetta Experiment* on the 3 groups of 100 2V6000s has logged
a grand total of 45 single soft error events, for a grand total of 20.4
years MTBF (or 5335 FITs -- FITs and MTBF are related by a simple
formula -- mean time between failures vs failures per billion hours or
FITs).

But keep in mind that SEUs are random events, unlike other failure
mechanisms that depend on cumulative damage, so if one device has an
MTBF of 20 years then a system with 20 devices has an MTBF of one year.
Most professionals in the radiation effects field don't use MTBF as
a measure of SEU immunity, they use errors/bit-day or a similar metric.
So, the device has 20 million bits. Do the math. I have stated all the
arguments. You like cross section? bit errors/time? Just poke the buttons on
your calculator. It is all statistics (even MTBF or FITs). Soft errors are no
different from any other failure: they are random!
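"Doing the math" for an errors/bit-day figure might look like the sketch below. It takes the 45 events and 919 device-years at face value, uses the round figure of 20 million bits per device, and averages over all three altitudes, so it is an illustration rather than a per-site result. The variable names are made up for the example.

```python
events = 45              # upsets logged, from the thread
device_years = 919       # total exposure, from the thread
bits_per_device = 20e6   # "20 million bits", from the reply
DAYS_PER_YEAR = 365.25   # assumption

# Total exposure expressed in bit-days.
bit_days = device_years * DAYS_PER_YEAR * bits_per_device

upsets_per_bit_day = events / bit_days

print(f"{upsets_per_bit_day:.1e} upsets/bit-day")   # 6.7e-12 upsets/bit-day
```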

So that means that a 2V6000 at sea level gets a logic disturbing
hit once every 200 years.

And that if you have 200,000 in the field at sea level then 2 or 3 are
getting a logic disturbing hit EVERY DAY. Or if you have a critical mission
that lasts for five years then your chance of getting a logic disturbing
upset is one in forty. OK for a PC running Windows, perhaps, but if you
are building warheads....
Yes! And if I had 200 million of them, I would be getting an error every
millisecond! Oh my! Help! Oh s**t! Give me a break. This is standard 5
o'clock news hype: just make it sound as bad as possible. Fact: each unit
will still fail only once every 200 years. If you are fortunate enough to have
sold a million units, then you should also be smart enough to use the necessary
design techniques to mitigate being put out of business by the more dominant
failure rate of the hardware in the system itself. Soft errors are a small
part of the overall system reliability calculation you must perform. That is
my point here.

You know, if you want to use FITs, we'll use FITs. But I am afraid it
will give those spreading nonsense fits (pun intended).

Again, FITs is not a good metric. These aren't "failure in time", they
are random events. An SEU can happen in the first millisecond of operation
or after 200 years of operation.
Oh yes, and it happened right now! Oh my! Stop it. Give it up. You can only
scare people who are ignorant of real world effects.

Some of our customers have now qualified Virtex II Pro as the ONLY
solution to the soft error problem, as ASICs can't solve it (easily like
we have),

Now that's just misinformation. We've put a number of ASICs in space,
and in worse environments than the surface of Mars. How did Galileo
survive the sulfur ions around Jupiter for ten years without your products?
I don't know? Did it use 90nm technology? Nope.

Can you tell us what the penalty in area and speed would be in going
to TMR?
None. Uses up 3X+ logic though.

And exactly which of your products have sufficient resistance
to total ionizing dose to be considered for space applications...do your
current state-of-the-art products fit in this category?
Yes. We have rad hard FPGAs for total ionizing doses. Look it up in the Q-Pro
line on the web. The devices are immune to SEL, too. ASICs and standard parts
are having problems with SEL now. Didn't you know that? Haven't been reading
your LANSCE test updates, huh?

And, you may want to consider going with the vendor who has been
actively working on soft error mitigation for more than five years now.

I've been in this business for twenty years, on both the military and
civilian side. I've designed full custom, ASIC and FPGA products for
a variety of space applications.
Good, then you should welcome all the work we are doing, and the progress we
are making. And you should recognize the FUD that is being spread about by
others who are not only ignorant of what is going on, but have no other intent
than to save their own skins by spreading as much false information as
possible.

Methinks the lady doth protest too much...
All the world's a stage.....

Joe
--
K. Joseph Hass
Center for Advanced Microelectronics & Biomolecular Research
721 Lochsa St., Suite 8 Post Falls, ID 83854
 
Hi Austin,
Out of interest, how many of the 300 parts in your experiment broke
permanently? Any at all? If there were any 'hard' failures, did altitude
affect this statistic, or were these failures due to other mechanisms?
Syms.
 
Symon,

None. We have no possibility of Single Event Latch-up (already tested every
family in the neutron beam). No hard failures whatsoever.

The device FIT rate is probably somewhere around 20 FITs (estimate from high
temperature operating life testing).

Austin

 
