EDK : FSL macros defined by Xilinx are wrong

“Design and Implementation of a CFAR Processor for Target Detection”,
César Torres-Huitzil, Rene Cumplido-Parra, Santos López-Estrada. 14th
International Conference on Field Programmable Logic (FPL04), Antwerp,
August 2004. Lecture Notes in Computer Science Vol. 3203, pp.
943-947. ISBN 3540229892

“On the Implementation of an efficient FPGA-based CFAR Processor for
Target Detection”, Rene Cumplido, César Torres, Santos López. 1st
International Conference on Electrical and Electronics Engineering,
Acapulco, México, September 2004. IEEE Catalog number: 04EX865C.
ISBN: 0-7803-8532-2
 
Hi there

I have the same problem: two presorted arrays have to be merged (but
here in Verilog). Can anybody provide me code fragments or hints?

Thanks a lot

Chris
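As a hint only: below is a minimal software reference model of the usual two-pointer merge, the kind of model a Verilog version could be checked against in simulation. It is a sketch, not a hardware design. A hardware merge would typically hold the same two read indices in registers and compare the current heads each cycle, but that mapping is a suggestion, not something stated in the post.

```python
def merge_sorted(a, b):
    """Two-pointer merge of two ascending lists into one ascending list."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        # '<=' takes equal keys from `a` first, which keeps the merge stable.
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    # At most one of these tails is non-empty.
    out.extend(a[i:])
    out.extend(b[j:])
    return out
```

For example, `merge_sorted([1, 3, 5], [2, 3, 4, 6])` gives `[1, 2, 3, 3, 4, 5, 6]`.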
 
lecroy,

Regardless of what any piece of paper claims, it is the memory of many
here that the only way to recover is by powering down.

Since this is a 15-year-old problem, we have only our (failing)
memories to rely on.

There was no answer database in those days.

There was no hotline.

Austin

lecroy7200 wrote:
The oscillator itself is at a much higher frequency, and is divided down
to the number listed in the data sheet. At least, we still do it that
way, even today.


This is not what the data sheet states. The 4000 data sheet makes a
distinction: it runs at 8MHz and divides down to 1MHz, whereas the 3000
runs at 1MHz. I am not disagreeing with you. I believe that the 3000 was
changed over time, that the clock was part of these changes, and that it
now runs at around 16MHz. The documents were never updated to reflect this
change because it was "transparent" to the end user. Of course this is all
a guess on my part.


The accuracy of this oscillator would be from 1/2 to 2X the nominal (it
just isn't critical).


Agree, it just needs to work. Too bad it seems to have problems.


Since this part still had paper schematics (REALLY) it is far too old
for us to go look at its design.


Funny, we can still pull up our paper documents if needed. I agree, it's
not fun, but sometimes you just have to roll up your sleeves and dig in.


Phil is on the right track.

This part did have a brownout issue (if the voltage dropped just
right, for just the right amount of time, and came back up) that would
place it in a locked state that could not be recovered until the power
was cycled.


Again, I read Xilinx's app note on the brownout problem, and it makes it
clear that the part can be reset without removing power. I don't dispute
that the internal logic could get into a locked state, or that there was a
brownout problem. I also think it is very possible that the devices
currently being sold could have a second problem, with the internal
oscillator. The brownout app note makes no mention anywhere of the
oscillator failing to start or locking up. I am sure that if Xilinx had
known about this, it would have been documented and the power cycle
requirements would have been called out, which they are not.


I solved this problem 15 years ago by using a Dallas Semi Power on Reset
part to reset the power supply if it detected a glitch.


Again, power cycling the device, no matter how it could be done, is not an
option for this system.

It sounds like Xilinx is not willing to dig into the root cause of the
oscillator problem. I can understand this to some degree; after all, the
software has not supported the device in several years. So my next
question: is the oscillator design used in the currently sold 3000s also
used in other Xilinx devices?
 
No one changed the frequency of any oscillator.
I will see if I can locate one of the pre 97 devices to verify this.
We may have some in stock in our area. I will let you know what I
find.

The layout has been shrinking so as to be able to be fabricated; that
is all. Did the oscillator go from 8 MHz to 16 MHz over this period of
time?
Maybe.

But, for you to infer something from a 16 MHz signal is suspect: does
failure to configure 100% correlate with this signal?
Well, there is certainly nothing that prevents Xilinx from running
their own tests to validate what I am seeing. I certainly cannot
force them to do so.
I am 99.9% confident that the 16MHz is from the 3000's internal
oscillator and that this is the fundamental frequency. I see this
signal on every working device and nowhere else. It is a very loose
frequency: it tracks the individual device's temperature, and it
has the most energy in my sweeps.

So far there is 100% correlation with the failure, but we are only
talking about one data point. I can also tell you that once I reset the
power, the 16MHz signal for that device was present again (it was not
present while in the failed state) and the part began to function
normally.

The oscillator locking fits with what I am seeing, not being able to
reprogram the devices.

If you power it down, and back on, can it be reprogrammed?
Again, this depends on what you mean. Looking at the power on the device,
I can bring it below what I can detect for over 1mS and turn it back on,
and the part will not allow me to reprogram it, nor will the
oscillator start running. I have to remove power for a much longer time
for the oscillator to start and allow me to reprogram. This
appears to be the case with all six failures I have seen: they
need the power removed for several seconds in order to recover.

If not, it is a bad part. If it can be reprogrammed, then it is a good
part (as far as configuration is concerned).
Agree. Again, out of the six times I have now seen this, the failure
has not appeared to cause any damage to the devices.

If you have a case open with the hotline, what have they said, and what
are they doing?
I am not so sure this is the place to discuss this. If you want the
name of the person I have been in contact with, or the case number, feel
free to send me a direct e-mail. During the first contact I was asked
whether I was the person posting in this forum, to which I responded yes.
I explained in detail what I knew at the time, including providing exact
part numbers and lot codes for the parts I was testing. I was told by
the person I spoke with that this was outside what the hotline could
handle, that it would be escalated to a higher group, and that they
would get back to me. I continued to work on the problem and mentioned
in one of my postings that I had not yet heard back from Xilinx. That
same night I received an e-mail from the hotline as follows:

"It's been a few days since we last communicated, and I wanted to check in.
Since this device isn't officially supported by the hotline any longer, I'm
having to do a fair amount of work to find any information on it here. I'll
keep investigating this, and I'll let you know when I come across anything
that hasn't been tried already according to the suggestions on
comp.arch.fpga."

Later after reproducing the failure I tried to contact the support
group and left a voice message stating what I had found and asked for
them to return my call. After I did not hear back from them for
several hours I continued my testing. Once I discovered the oscillator
problem I again tried to contact support and left a second voice mail.
Again, there was no return call. On my third attempt I finally spoke
with my original contact and was told that you were the expert at
Xilinx and that your posting about the part being designed on paper was
correct and that there was nothing they could do. Google groups was
down that day and I was not able to read your posting, so they
forwarded me your post. I still have the case open, and if I
learn any more about the problem I will try to contact them again.

So, now that we have established you as the expert at Xilinx, the
question becomes whether you can help. I am setting up one more test to
determine the state of the program/done pin prior to attempting to
reprogram the device after the failure. I will publish these findings
once I have them. Also, if I am able to locate an older device and
test it, I will publish what I find on the internal oscillator.
 
The oscillator locking fits with what I am seeing, not being able to
reprogram the devices.

I would expect (generally speaking) a Config Ring osc to gate itself
off, after config is completed.
What does a normally operating device show - does this osc appear
to gate in normal usage ?
From what I see with the spectrum analyzer, all of the devices except the
failed part will keep their clocks running at all times.

It may pay to get a closer number on that - < 5 seconds and > 1ms is
quite wide...
Yes, I agree, except I am not sure what value this information has. Next time
I reproduce the problem, I will measure it.
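One way to get a closer number than "> 1ms and < 5 seconds" is to bisect on the off time. The sketch below assumes a hypothetical bench hook `recovers(t)` that removes VCC for `t` seconds, restores it, and reports whether the internal oscillator is seen again; it also assumes the response is monotonic (if an off time recovers the part, any longer one does too). Each trial has to start from the failed state, so this only helps once there is a reliable way to induce the failure.

```python
from typing import Callable


def find_recovery_time(recovers: Callable[[float], bool],
                       lo_s: float = 1e-3, hi_s: float = 5.0,
                       resolution_s: float = 0.01) -> float:
    """Bisect for the shortest power-off time that lets the oscillator restart.

    `recovers` is a hypothetical hook; the part must be back in the failed
    state before every call. Assumes longer off times never do worse.
    """
    if recovers(lo_s):
        return lo_s                    # already recovers at the lower bound
    if not recovers(hi_s):
        raise ValueError("no recovery even at the upper bound")
    # Invariant: lo_s fails to recover, hi_s recovers.
    while hi_s - lo_s > resolution_s:
        mid = (lo_s + hi_s) / 2
        if recovers(mid):
            hi_s = mid
        else:
            lo_s = mid
    return hi_s                        # within resolution_s of the threshold
```

Starting from the 1 ms to 5 s window, a 10 ms resolution takes about nine trials instead of a linear sweep.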

You could ask Xilinx explicitly if newer devices have any buried POR
cells, that are not also replicated by a RESET ?

I would expect this type of oops to be eliminated :)
I agree. But I am also a bit surprised that, with as much work as was being
done with the DOD back then, a known problem like this would not have
been documented.

I did receive a third email from the hotline after posting that last message
basically stating that they have been in contact with Austin and Peter. So
at least it sounds like you should have all of the information I provided to
them.

I received a direct mail about this subject. I assume that the person wanted
to remain anonymous, so I will respect that. The person wrote:

" It seems possible that from the outside an 8MHz oscillator would
look like 16. That is, you see both transitions as a pulse.

Note, for example, that the "60Hz" sound that we are used to
hearing from things like transformers is actually 120Hz. "
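The correspondent's point can be checked numerically under idealized assumptions: treat "seeing both transitions" as full-wave rectification of a pure 8 MHz sine and look at its spectrum. This is only a model (perfect sine, perfect rectification, no probe response) and says nothing about the actual XC3000 oscillator waveform; the 8 MHz figure is the correspondent's hypothetical.

```python
import cmath
import math


def dft_magnitude(samples, fs_hz, f_hz):
    """Single-bin DFT magnitude, scaled so a unit-amplitude tone reads ~1.0."""
    n = len(samples)
    acc = sum(x * cmath.exp(-2j * math.pi * f_hz * i / fs_hz)
              for i, x in enumerate(samples))
    return 2 * abs(acc) / n


F0 = 8e6      # hypothetical oscillator frequency
FS = 256e6    # sample rate: 32 samples per oscillator period
N = 4096      # 16 us window = exactly 128 periods, so no spectral leakage

# Responding to both transitions is modeled as |sin|, which repeats at 2*F0.
sensed = [abs(math.sin(2 * math.pi * F0 * i / FS)) for i in range(N)]

at_8mhz = dft_magnitude(sensed, FS, 8e6)    # essentially zero
at_16mhz = dft_magnitude(sensed, FS, 16e6)  # about 4/(3*pi), roughly 0.42
```

In this model all of the AC energy sits at 16 MHz and its harmonics, with nothing at 8 MHz, so a strong 16 MHz line with no 8 MHz line is not by itself enough to tell the two cases apart.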

It is an interesting point. I know nothing of the internal design. We do
not know the symmetry, or whether there are resonances that could fake
out the measurements.
However, if this were the case I would not expect a 1MHz signal to have
the majority of its power at 16MHz. I agree that 8MHz is possible, but I
took note that there was nothing at 8MHz, or if there was, it was buried
in the noise floor of the analyzer.


I spent some time trying to locate some older parts to test. I cannot find
any documents that state how the date code was marked, so I am supplying
all of the markings as they are shown on each device. Note that these
parts are in different packages, different sizes, etc., so I am not even
sure how valid any of this data is.
Also, the amplitude is a relative number. I have nothing to gauge it
against other than comparing one device to the next. Also, the probe was
moved to detect the peak reading.

XC3190A
PQ160AKJ9901
A2025068A
Assumed date: 99
Fundamental frequency: 16MHz @ -60dB

XC3164A
PC84CKG9649
A71686A
4C
Assumed date: 96
Fundamental frequency: 20MHz @ -40dB

XC3164A-5
PC84C
X24961M
AIG9406
Assumed date: 94
Fundamental frequency: 20MHz @ -40dB

XC3120-5
PC84C
XG2936M
AJG9537
Assumed date: 95
Fundamental frequency: 20MHz @ -50dB *
* It was very difficult to lay the probe flat on this device because of
the PCB it was mounted on. I suspect the reading would otherwise have been
much higher.


From this I would agree that the basic frequency was the same from at
least as far back as 94. It is interesting that there seems to have been a
shift and that the amplitude changed so much, but I don't know if this is
any kind of an indicator. After all, it is a different package.
 
Several more tests were conducted using the same test configuration.
During this test I monitored the state of the done/program pins of all
of the devices prior to the failure. The test would read and store the
D/P pin status, then attempt to program the devices. If it failed to
program all eight after five attempts, it reported the original status
of the D/P pins, followed by their status after the attempt to program
the devices.
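The trial described above could be structured roughly like this. `read_dp` and `program_all` are hypothetical hooks standing in for the real loader software: the first returns the raw D/P pin levels (one per device), the second runs one load attempt on all eight devices and reports success.

```python
from typing import Callable, Dict, List


def run_config_trial(read_dp: Callable[[], List[int]],
                     program_all: Callable[[], bool],
                     attempts: int = 5) -> Dict[str, object]:
    """One trial: snapshot D/P, try to load, report pin levels on failure."""
    before = read_dp()                 # raw levels before any load attempt
    for attempt in range(1, attempts + 1):
        if program_all():
            return {"ok": True, "attempts": attempt}
    # Every attempt failed: report the pre-load and post-load D/P levels.
    return {"ok": False, "attempts": attempts,
            "dp_before": before, "dp_after": read_dp()}
```

Logging the raw electrical level (0 or 1) rather than a processed "status" keeps the report honest about polarity, which matters given the inverted display that comes up later in this thread.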

I wanted to also collect enough data in an attempt to determine if the
failure of the internal oscillator could be duplicated.

I was able to replicate the failure three more times and it would
appear that when the device fails, the initial state of the D/P pin is
high. After an attempt was made to program the devices, the D/P pin
latched in the low state.
It also appears that with every failure something happens to the
16MHz oscillator, in that I no longer see anything in that area. What
is interesting is that if the oscillator were dead, I would not expect
the D/P pin to latch low. Maybe it is not a sampled input but is
truly edge triggered.

Also note that once the power was cycled, that in all three cases the
oscillator returned to normal and the devices were able to be
programmed.
 
I was able to replicate the failure three more times and it would
appear that when the device fails, the initial state of the D/P pin is
high. After an attempt was made to program the devices, the D/P pin
latched in the low state.
I need to retract the above statement. As it turns out, the software
that was being used to monitor the status of these pins inverted them
prior to displaying them. So, the devices appear to go into the program
state.


In this last test I wanted to try to decouple the CORE that was being
loaded into the device. For this test, all that was done was to cycle
the supply. I used a 5mS off time and cycled at 100Hz. Using the
spectrum analyzer I monitored the 16MHz clock. After about 10 minutes
of testing, the oscillator had failed to start. I probed the remaining
devices and found that three others also had failed to start. I then
started to increase the off time using the one-shot mode. I noted that
at about 200mS - 250mS, two of the devices' oscillators restarted. The
third device took more than a second of off time before starting.

During the above tests, I noted that the D/P pin was low for all
devices during the test, regardless of the state of the oscillator.
Also, during the tests, no attempt was made to reprogram the devices.
Only the spectrum analyzer with the near field probe was used to
determine if the part had failed.
 
lecroy7200@chek.com wrote:
<Snip>
In this last test I wanted to try and decouple the CORE that was being
loaded into the device. For this test, all that was done was to cycle
the supply. I used a 5mS off time and cycled at 100Hz. Using the
spectrum analyzer I monitored the 16MHz clock. After about 10 minutes
of testing, the oscillator had failed to start. I probed the remaining
devices and found that three others also had failed to start.
Is this multiple devices on one board, or multiple boards being cycled?

I then
started to increase the off time using the one-shot mode. I noted that
at about 200mS - 250mS two of the devices oscillators restarted. The
third device took more than a second of off time before starting.
Sounds like you now have a reasonably rapid means of entering the
suspect state, and some numbers on Trec. ( which probably also varies
with temperature... )

Is 5ms enough time to exit pgm load mode, or is this test removing
Vcc before the Load state engine has finished ?

This does sound like a 'sticky trigger' test, in that any of the
~60,000 power cycles that causes an upset, will not clear on the next
cycle, as that Toff is < Trec.

-jg
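The arithmetic behind the "sticky trigger" remark: cycling at 100 Hz for about 10 minutes is 100 × 600 = 60,000 power cycles, and with a 5 ms off time, far below the 200 ms to over 1 s recovery times reported above, the first upset can never clear. A minimal model follows; the per-cycle `upsets` list is a hypothetical input, not measured data.

```python
def cycles_in_run(rate_hz: float, minutes: float) -> int:
    """Number of power cycles in a run at a fixed cycling rate."""
    return round(rate_hz * minutes * 60)


def first_stuck_cycle(upsets, t_off_s: float, t_rec_s: float):
    """Index of the cycle whose upset persists for the rest of the run, or None.

    upsets[i] is True if cycle i upsets the part (hypothetical input).
    An upset clears only during an off period of at least t_rec_s.
    """
    if t_off_s >= t_rec_s:
        return None              # every off period is long enough to clear
    for i, upset in enumerate(upsets):
        if upset:
            return i             # Toff < Trec: this upset never clears
    return None
```

So a run that ends "stuck" only shows that at least one of the 60,000 cycles caused an upset; it does not show which cycle, or that the last one did.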
 
Not that I do not appreciate everyone's help in this matter, but I have
received several PMs, including from Xilinx tech support, asking if I have
tried the following:

- Bring the DONE/PROGB pin low
- Hold RESETB low for at least 6 us
- Start the re-configuration

I am not sure if some people are not able to read the entire thread and
that is the cause. The following are from my first and fourth posts:

"Pulling the XC3000's reset low for 10us has no effect. ... The
only way to reprogram the part is to power down the IC. "

"The note to your link suggests that setting Reset high for > 6us then
setting it and the Prog/Done pin low for > 6us will bring the device
back to the clear configuration state. Looking at the loader code,
this is pretty much what is being done on every load. The Reset
normally idles high and it along with the Program pin are pulled low
for 7.5us. I verified this as well. Doing this does not make the
device exit this strange mode. So far, the only thing that seems to
clear it from this state is a hard power down."
 
In this last test I wanted to try and decouple the CORE that was being
loaded into the device. For this test, all that was done was to cycle
the supply. I used a 5mS off time and cycled at 100Hz. Using the
spectrum analyzer I monitored the 16MHz clock. After about 10 minutes
of testing, the oscillator had failed to start. I probed the remaining
devices and found that three others also had failed to start.

This is multiple devices on one board, or multiple boards being cycled ?

I once again may need to retract this. I have not been able to
reproduce the power cycle test results. I am beginning to wonder if
there was something flawed in my first attempt.

I have been testing multiple boards with multiple devices per board.

I then
started to increase the off time using the one-shot mode. I noted that
at about 200mS - 250mS two of the devices oscillators restarted. The
third device took more than a second of off time before starting.

Sounds like you now have a reasonably rapid means of entering the
suspect state, and some numbers on Trec. ( which probably also varies
with temperature... )
Again, temperature does not appear to be a factor. I have done
numerous temperature tests and have never seen any correlation. I am
seeing a failure in four days on average. The rapid failure appears to
have been a fluke of nature. Just one more random data point.

Is 5ms enough time to exit pgm load mode, or is this test removing
Vcc before the Load state engine has finished ?
Again, there is no loading. I am just looking at the internal oscillator and
watching how long power must be removed before it recovers. Nothing to
do with reprogramming the device.

This does sound like a 'sticky trigger' test, in that any of the
~60,000 power cycles that causes an upset, will not clear on the next
cycle, as that Toff is < Trec.

From what I see, it all is pointing to a problem with the internal
oscillator.
It would be great if there were a way to probe it to verify what I am
seeing with the analyzer.
 
?

I thought we were going to take this offline, but since you are still
posting here (fine with me, by the way):

Yes. We found the schematic. We found the hand written note in the margin.

Basically what Rob sent you from the hotline.

If that doesn't work, then I am afraid we are at the end of our
resources to provide help.

Changes were later made to the XC4000 so that it did not have this issue.

It is caused by a power supply glitch (and made worse if you use the
power down mode as well). Remove the glitch, and the problem goes away.
Perhaps you just need to add a 1,000 uF capacitor to the power supply?
(or remove one, to prevent the glitch)

Time spent on the KNOWN CAUSE (the glitch) would be beneficial (in my
opinion). You are unlikely (in fact: never going) to fix the chip. The
issue was addressed in later families, and never in the XC3000.

If anyone else out there can help, please do.

Austin

(and the rest of us back here at Xilinx that actually remember the XC3000)
 
On 29 Mar 2005 05:33:50 -0800, "lecroy7200@chek.com" <lecroy7200@chek.com> wrote:
Not that I do not appreciate everyones help in this matter, but I have
received several PMs included from Xilinx tech support asking if I have
tried the following:

- Bring the DONE/PROGB pin low
- Hold RESETB low for at least 6 us
- Start the re-configuration

I am not sure if some people are not able to read the entire thread and
that is the cause.
Well, I have read all of your posts, and everyone elses too. The problem
is one of clarity of communications.

The following are from my first and fourth posts:

"Pulling the XC3000's reset low for 10us has no effect. ... The
only way to reprogram the part is to power down the IC. "
Ok, this seems pretty clear,

But in another article you write "I would drop the old 3000A" and in
another article you write:

******
XC3190A
PQ160AKJ9901
A2025068A
Assumed date: 99
Fundamental frequency: 16MHz @ -60dB
******

Xilinx produces an XC3000 family, an XC3000A family, an XC3100 family
and an XC3100A family (and many others too). My point about clarity is
that your original article says XC3000, another article says XC3000A,
and finally, with actual part numbers, it turns out to be XC3100A.

Are all the devices on all the boards XC3100A? It matters, as the various
families had slightly different config logic.


"The note to your link suggests that setting Reset high for > 6us then
setting it and the Prog/Done pin low for > 6us will bring the device
back to the clear configuration state. Looking at the loader code,
this is pretty much what is being done on every load.
"pretty much" is not clear.

You are dealing with a tough problem. It is rare, difficult to reproduce,
and in an area (configuration) in which almost every designer has at least
at one time had problems, some times intermittent, sometimes easy to
repeat. The experience has been that except in extremely rare situations
the problem has been traced back to something outside the FPGA.

I understand your frustration, you've been at this for over 2 weeks, and
no magic bullet yet.

The Reset
normally idles high and it along with the Program pin are pulled low
for 7.5us.
See, this is surprising. This is not the way configuration is supposed
to be started.

In normal configuration, the reset is high, Init and Done/Prog both have
pullup resistors.

The software in your configuration processor should test that INIT is high
indicating that housecleaning is complete, then it should test D/P, it
should be high too. To start the program process, you pull D/P low, and
wait till INIT goes high, indicating it is ready for configuration data.
The clock and data should start greater than 10us after INIT goes high.
Starting sooner than this can cause the header to not be read correctly.
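The start sequence Philip describes can be written down as a sketch for the configuration processor. Everything here is an assumption for illustration: `read_pin`, `drive_pin` and `release_pin` are hypothetical GPIO hooks, "INIT" and "DP" are placeholder pin names, the D/P pulse width is not given in the text, and Python sleeps are only illustrative of the real microsecond timing.

```python
import time
from typing import Callable

ReadPin = Callable[[str], int]          # hypothetical: pin name -> 0/1
DrivePin = Callable[[str, int], None]   # hypothetical: actively drive a level
ReleasePin = Callable[[str], None]      # hypothetical: tri-state, pullup wins


def wait_for_level(read_pin: ReadPin, name: str, level: int,
                   timeout_s: float) -> bool:
    """Poll `name` until it reads `level`; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while read_pin(name) != level:
        if time.monotonic() > deadline:
            return False
    return True


def start_configuration(read_pin: ReadPin, drive_pin: DrivePin,
                        release_pin: ReleasePin,
                        dp_pulse_s: float = 10e-6,
                        timeout_s: float = 0.1) -> bool:
    """Start a normal configuration; True means CCLK/DIN may begin."""
    # INIT high: housecleaning is complete.
    if read_pin("INIT") != 1:
        return False
    # D/P should also be high (pulled up) before starting.
    if read_pin("DP") != 1:
        return False
    # Low-going pulse on D/P starts the program process (width assumed).
    drive_pin("DP", 0)
    time.sleep(dp_pulse_s)
    release_pin("DP")
    # Wait for INIT high: ready for configuration data. INIT may dip low
    # while the device clears; this loop only waits for the high level.
    if not wait_for_level(read_pin, "INIT", 1, timeout_s):
        return False
    # Clock and data must start more than 10 us after INIT goes high.
    time.sleep(20e-6)
    return True
```

Checking INIT and D/P before pulling D/P low is what distinguishes this from simply pulsing RESET and PROGRAM together on every load.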


In the fault mode you have described, the D/P is permanently low.
For this situation, assuming that the device is in slave serial mode,
I believe you would supply a clock (at 1MHz or slower to CCLK), and
try taking INIT high for > 10uS, then low for 10 uS, then stop driving it
and let the pullup resistor try to pull it high. I would expect for
INIT to stay low for the house cleaning, and then eventually, the FPGA
would stop asserting it low, and the pullup resistor would then pull it
high. At this point, stop driving the CCLK signal. It would now be ready
for configuration. The D/P signal (which you should not be driving) should
also go high because of the pullup resistor. If you get this far, then
things are back to normal, and a low going pulse on D/P should start the
config process as described in the previous paragraph.
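The recovery attempt for a stuck-low D/P, in slave serial mode, might look like this. As above, all five callables are hypothetical board hooks and the pin names are placeholders; the 20 us "high for more than 10 us" figure is an arbitrary choice that satisfies the text, and the timeout is an assumption.

```python
import time


def recover_from_stuck_dp(read_pin, drive_pin, release_pin,
                          start_cclk, stop_cclk,
                          cclk_hz: float = 1e6,
                          timeout_s: float = 1.0) -> bool:
    """Try to bring a slave-serial device out of the D/P-stuck-low state.

    Returns True if INIT ends up high and D/P has been pulled high.
    """
    start_cclk(cclk_hz)                # free-running CCLK, 1 MHz or slower
    try:
        drive_pin("INIT", 1)           # INIT high for more than 10 us ...
        time.sleep(20e-6)
        drive_pin("INIT", 0)           # ... then low for 10 us ...
        time.sleep(10e-6)
        release_pin("INIT")            # ... then let the pullup take over.
        # Housecleaning: the FPGA should hold INIT low, then release it.
        deadline = time.monotonic() + timeout_s
        while read_pin("INIT") != 1:
            if time.monotonic() > deadline:
                return False
    finally:
        stop_cclk()                    # stop CCLK once INIT is high (or on failure)
    # We never drive D/P here; its pullup should now have pulled it high.
    return read_pin("DP") == 1
```

If this returns True, a normal low-going pulse on D/P should start configuration as in the previous paragraph.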

I know you think you have done the above, and the problem is the internal
oscillator, but I am unconvinced. I would suggest the following laborious
process.

Describe in excruciating detail the signal sequence and timing you observe
on ALL the following signals, including timing relationships, and whether
the signal has a pullup resistor or not, and when the processor is driving
it or not. These are the signals:

DONE/PROG
CCLK
DIN
DOUT
INIT
RESET
PWRDWN
LDC
HDC
M0
M1
M2
RDY/BSY

Somewhere in all of this there is an answer.



Still trying to help
Philip Freidin



===================
Philip Freidin
philip.freidin@fpga-faq.org
Host for WWW.FPGA-FAQ.ORG
 
Proper internal oscillator startup would normally be guaranteed
by the monotonic VCC rise requirements for the part in question;
oscillator failure would be consistent with the earlier speculation of
a hypothetical transient of some sort taking out the FPGA.
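If power-up traces are being captured anyway, the monotonic-rise requirement can be checked directly. A minimal sketch: `vcc` is a list of sampled voltages from a scope capture, and the start/end thresholds and noise tolerance are placeholders to be set from the data sheet, not values from this thread.

```python
from typing import Optional, Sequence


def first_nonmonotonic_sample(vcc: Sequence[float], v_start: float,
                              v_end: float,
                              tolerance_v: float = 0.0) -> Optional[int]:
    """Index where a captured VCC ramp dips, or None if the rise is monotonic.

    Only the portion of the ramp between v_start and v_end is checked;
    tolerance_v allows for scope noise. Thresholds are placeholders.
    """
    peak = None
    for i, v in enumerate(vcc):
        if peak is None:
            if v >= v_start:
                peak = v               # ramp has entered the checked window
            continue
        if peak >= v_end:
            break                      # ramp has left the checked window
        if v < peak - tolerance_v:
            return i                   # VCC fell back below an earlier level
        peak = max(peak, v)
    return None
```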

BTW, on a failed part, have you observed DOUT for activity under
the test conditions described in Philip's earlier posts?

Also, what value pullup/pulldown resistors are you using for the
mode and powerdown pins? I have another vague recollection
that the internal pullups were "stiffer" in later 3xxx series parts,
and needed lower values for the external resistors.

From what I see, it all is pointing to a problem with the internal
oscillator. It would be great if there were a way to probe it to
verify what I am seeing with the analyzer.

At the risk of sounding repetitive, the method you seek is
called "master serial mode", which lets you directly observe
CCLK ( or a divided down version thereof ).

Yes, this requires changing another variable in your test setup,
which might affect your chances of observing something.

However, it provides the benefit that you would now have a
signal that can be directly probed, and used to catch whatever
transient event is perturbing the FPGA: e.g., trigger a deep
memory scope on "loss of CCLK" while probing any likely suspects
(VCC, configuration pins, VEE, translator output pins, etc.) at
a high sample rate with plenty of pretrigger storage.
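The same "loss of CCLK" idea can also be applied after the fact to a long logic-analyzer or scope capture. A minimal sketch, assuming CCLK has been captured as a list of 0/1 samples at `fs_hz`; `max_gap_s` is a hypothetical threshold set somewhat above the normal CCLK period.

```python
from typing import List, Optional, Sequence


def edge_times(samples: Sequence[int], fs_hz: float) -> List[float]:
    """Times (s) of rising edges in a sampled logic trace."""
    return [i / fs_hz for i in range(1, len(samples))
            if samples[i - 1] == 0 and samples[i] == 1]


def loss_of_clock(samples: Sequence[int], fs_hz: float,
                  max_gap_s: float) -> Optional[float]:
    """Time of the last CCLK edge before a gap longer than max_gap_s.

    Returns None if the clock never stops for longer than max_gap_s.
    """
    edges = edge_times(samples, fs_hz)
    for prev, nxt in zip(edges, edges[1:]):
        if nxt - prev > max_gap_s:
            return prev
    # Also catch a clock that stops and never restarts before the end.
    end = (len(samples) - 1) / fs_hz
    if edges and end - edges[-1] > max_gap_s:
        return edges[-1]
    return None
```

The returned time marks where to look in the VCC and configuration-pin channels captured alongside CCLK.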

Brian
 
I thought we were going to take this offline, but since you are still
posting here (fine with me, by the way):


From my previous post to Peter:
"Seeing that you have decided to continue to post to the public thread
rather than contact me directly, I will assume that this is how you
wish to handle this issue. "

You had my direct contact information. I expected that you and Peter
would have used it rather than continue to post.


Yes. We found the schematic. We found the hand written note in the margin.

Basically what Rob sent you from the hotline.
I believe this is a different problem than what was originally noted.
I only state this as it seems that there was never any mention of a
non-recoverable state like I am seeing and there is never any mention
of the internal oscillator failing. Maybe this was the original
problem.

If that doesn't work, then I am afraid we are at the end of our
resources to provide help.
Your call. My guess is that had the device been used in some of the DOD
designs, help would be coming out of the woodwork.

Changes were later made to the XC4000 so that it did not have this issue.

It is caused by a power supply glitch (and made worse if you use the
power down mode as well). Remove the glitch, and the problem goes away.
Perhaps you just need to add a 1,000 uF capacitor to the power supply?
(or remove one, to prevent the glitch)
Again, the problem I am seeing could very well be caused by a
transient of some kind. That is why I am running so many different
transients to try to reproduce the problem. If I am unable to find a
way to reproduce the problem, it will be nearly impossible to know if it
can be fixed or if any changes I make have an impact on it.
It's nice to be able to throw out a recommendation of a 1000uF bulk
cap, but without proof that it did anything to improve or hurt the
design, there is little value. That is why testing at this stage is so
important.

Time spent on the KNOWN CAUSE (the glitch) would be beneficial (in my
opinion). You are unlikely (in fact: never going) to fix the chip. The
issue was addressed in later families, and never in the XC3000.
I agree that fixing the device is not an option. I never expected
this. Again, to make it very clear, I need to make sure that we do not
run into this with whatever device we replace the 3000 with. I had
hoped that Xilinx would have been more proactive in helping to identify
the problem. If it is an oscillator design issue, I had hoped you would
be able to tell me that the problem was found and that corrections were
made to newer devices to prevent it.

It would seem that getting anything from Xilinx is impossible. So the
next step will be to qualify a new device based on the tests I am
currently running.

On the upside, it seems that the D/P pin going low is a side effect of
the problem. So at least I think we can limit our customers' exposure
to the problem.
 
lecroy,

We have been in contact with you directly (through Rob).

I am cc'd on all of the emails, and since I escalated this to the fire
department, I was responsible for all communications.

I am sorry you are frustrated.

We found the schematics.

We (and you) know this is caused by a glitch, yet you will do nothing to
change the setup, so nothing changes!

A famous line by the owner and CEO of California Microwave - Dave Leason
- is as follows: (said to a technician staring at a broken pcb)-

"Well, what have you tried?"

"I don't know what is wrong, so I don't know what to do."

"If you do nothing, nothing will be the result."

Basically, by refusing to add a capacitor to the supply (or in your best
judgement do anything to the supply that would modify its behavior) you
are in exactly the same state as the technician: doing nothing will
result in no change.

Sometimes you have to do something to get something. In fact, I would
state that stronger: you must do something to get any information at all.

Playing with a spectrum analyzer is like looking for your keys under the
streetlamp: because to look anywhere else is tough (it is dark there!?).

To imply that your application is not important enough to warrant a
response from Xilinx is an insult to the good folks on the hotline, and
to me personally.

I am now taking time out of my day to reply to you (again). I could be
working with the NSA, JPL, NASA, or the US AF, or any one of the
government folks that I am responsible for working with on the many
government programs that we work on everyday.

But, no, I am working on trying to help you.

Abuse is not going to make me likely to post further. As of this
moment, the case is closed. We have done what we can with what you are
willing to do (look under the streetlamp). I hope you take the other
advice here on the newsgroup, and do some of the things they suggest, if
you do not like the suggestions we have provided.

Sorry that you are upset, we are upset as well now.

Austin
 
<lecroy7200@chek.com> wrote in message
news:1112113812.367672.94020@z14g2000cwz.googlegroups.com...

<snip>

I agree that fixing the device is not an option. I never expected
this. Again, to make it very clear, I need to make sure that we do not
run into this with whatever device we replace the 3000 with. I had
hoped that Xilinx would have been more proactive in helping to identify
the problem. If it is an oscillator design issue that you would be
able to tell me that the problem was found and that corrections were
made to newer devices to prevent it.

It would seem that getting anything from Xilinx is impossible. So the
next step will be to qualify a new device based on the tests I am
currently running.

On the upside, it seems that the D/P pin going low is a side effect of
the problem. So at least I think we can limit our customers exposure
to the problem.
Personally I would consider it unreasonable to expect Ford Motor Company to
figure out why a '73 Pinto station wagon is experiencing occasional
vapor-lock AND base my decision whether to buy a 2005 Thunderbird on what
their level of support was... whether they fixed the problem or not.

I applaud your efforts to exhaustively address a problem you're experiencing
with ancient parts. Those parts aren't old, they're ancient in the progress
of FPGAs.

Be happy for the support you HAVE received - the Xilinx and non-Xilinx folks
that continue to add their insights are good people. Don't look to hold the
FPGA manufacturer accountable when they HAVE addressed the issue you're
encountering but it was put to bed a decade ago.

Often the true cause of something can't be determined without excessive
investment of time, money, or newsgroup postings. I wish you luck in
finding your happy place with respect to the error you encountered.

Respectfully,
- John_H
 
Austin Lesea wrote:
lecroy,

We have been in contact with you directly (through Rob).

From your following comment and your original comment about going
off-line, I assumed you would be in contact.
"So, your support for this issue is now Peter Alfke and Austin Lesea."

What I did get from Rob was the following:
"Have you been in contact with Austin or Peter on this issue yet,
aside from the postings on comp.arch.fpga? If so, can you please CC me
on those e-mails to keep me in the loop on this case?"
Again, leading me to think you would be in touch.



We found the schematics.

We (and you) know this is caused by a glitch, yet you will do nothing to
change the setup, so nothing changes!
It's great that you're putting words in my mouth. I am not sure of the
cause of the problem. Sure, it could be what you refer to as a
"glitch". I really do not know, nor can I seem to find any correlation
with any tests I have run.

A famous line by the owner and CEO of California Microwave, Dave Leeson,
is as follows (said to a technician staring at a broken PCB):

"Well, what have you tried?"

"I don't know what is wrong, so I don't know what to do."

"If you do nothing, nothing will be the result."
Nice. I am sorry you feel this way about my efforts.

Basically, by refusing to add a capacitor to the supply (or, in your best
judgement, do anything to the supply that would modify its behavior), you
are in exactly the same state as the technician: doing nothing will
result in no change.
I have taken the opposite direction of trying to cause the failure, and
from this you feel I am doing nothing.

Sometimes you have to do something to get something. In fact, I would
state that more strongly: you must do something to get any information at
all.

Playing with a spectrum analyzer is like looking for your keys under the
streetlamp because looking anywhere else is tough (it is dark there!?).

It's just another tool to me that provides another way to look at the
problem.

To imply that your application is not important enough to warrant a
response from Xilinx is an insult to the good folks on the hotline, and
to me personally.
That's fine, but it's the truth.

I am now taking time out of my day to reply to you (again). I could be
working with the NSA, JPL, NASA, or the US AF, or any one of the
government folks that I am responsible for working with on the many
government programs that we work on every day.

But, no, I am working on trying to help you.
I am sorry you felt that all your hard work on this problem has taken
away from your other customers.

Abuse is not going to make me likely to post further. As of this
moment, the case is closed. We have done what we can with what you are
willing to do (look under the streetlamp). I hope you take the other
advice here on the newsgroup, and do some of the things they suggest,
if you do not like the suggestions we have provided.
Abuse, LOL!!! I needed that bit of humor.

Sorry that you are upset, we are upset as well now.
Sorry you are upset. I am just trying to find the root problem.
 
Well, I have read all of your posts, and everyone else's too. The problem
is one of clarity of communications.
Good that you know that everyone read them all. I for sure could not
make that statement.

Xilinx produces an XC3000 family, an XC3000A family, an XC3100 family
and an XC3100A family (and many others too). My point about clarity is
that your original article says XC3000, another article says XC3000A,
and finally, with actual part numbers, it turns out to be XC3100A.
Very good point!! The part in question is an XC3190A. But, I have
also tried some tests with the non A devices as well. When I first
opened the call with the hotline I provided them with all of the
details but did not even think about it in my original posts.

Are all the devices on all the boards XC3100A? It matters, as the various
families had slightly different config logic.
Yes, all of the parts on this board are the same: all are XC3190A.

"The note to your link suggests that setting Reset high for > 6us then
setting it and the Prog/Done pin low for > 6us will bring the device
back to the clear configuration state. Looking at the loader code,
this is pretty much what is being done on every load."

"pretty much" is not clear.
I measured 7.5 us. Also note that I ran some tests at 10 us. All are
greater than the minimum. While I don't think I posted it, I even tried a
test where I held reset for well over a second.

You are dealing with a tough problem. It is rare, difficult to reproduce,
and in an area (configuration) in which almost every designer has at least
once had problems, sometimes intermittent, sometimes easy to repeat.
Experience has been that, except in extremely rare situations, the problem
has been traced back to something outside the FPGA.
Good enough.


I understand your frustration: you've been at this for over 2 weeks, and
no magic bullet yet.
LOL. Not looking for any magic.

In the fault mode you have described, the D/P is permanently low.
For this situation, assuming that the device is in slave serial mode,
Which they all are.
 
Philip,

The performance enhancement in the XC3100/3100A family is achieved by the
use of on-chip charge pumps. ... These charge pumps use free-running
oscillators that are separate from the config oscillator, and are almost
certainly the 16 MHz that you are seeing.
Thanks for all your insight!! I was a bit surprised that the "smartest
and most helpful engineers" at Xilinx did not pick up on the charge
pump right away.

As per our off-line talks, I have gone ahead and rebuilt the design
using slew limited outputs for the two pins in question. I have begun
running my transient tests but it will be a few weeks before I am
convinced this was the problem.

The following link is to my post about the reflected energy causing
possible problems:

http://groups-beta.google.com/group/comp.arch.fpga/browse_frm/thread/1423e577bf37d509/1f921b2ef9ae4542?q=reflected&rnum=3#1f921b2ef9ae4542

The following was taken from a Xilinx app note.

"For all FPGA families, ringing signals are not a cause for reliability
concerns. To cause such a problem, the Absolute Maximum DC conditions
need to be violated for a considerable amount of time (seconds)."

I am including parts of our off-line talks that may be a benefit to
others reading this thread.



So, I dragged out the trusty scope to start probing around. I made a nice
ground plane around the device for a reference; the ground plane is
attached to the device's ground in multiple places. The scope is a LeCroy
7300: 3 GHz BW with a sample rate of 20 GS/s. I am using a 3.5 GHz active
probe with a loop of about 0.5". All measurements are taken at the
FPGA's pins, using no filtering, etc. If there is a glitch, I will
find it.

The outputs are not terminated (except by the next device) and there
is some undershoot from the reflection. This undershoot can be more
than 0.5 Volts below the rail. On their newer parts I had seen that
they started to specify the SWR of the next stage, but I was not able
to obtain this document. You may recall my lengthy post about this last
summer. I have never seen a problem where, say, all of the energy was
reflected back to the device's output and caused a problem. Maybe the
3100A was prone to having problems with this. Any idea?
Well, anything that goes more than .5V below ground would concern me, even
for a very short duration. I don't think the 3100A was particularly prone to
this.

While normally you worry about undershoot and overshoot at a receiver,
in the case of FPGAs, all pins are both. So even if you are using a pin
only as an output, it still has an input structure including the
protection diodes. The undershoot can cause the diode to conduct, and
this can in turn upset the local ground reference inside the FPGA. This
may be your fault mode. Note that this type of thing can have data
pattern sensitivity, e.g., a bunch of outputs all switching low at the
same time, maybe on pins that are further away from the ground pins
rather than nearer, with reflections arriving at about the same time,
etc.

Two suggestions: can you force the data outputs to bang between patterns
that are predominantly all '1's and then all '0's? Other idea: set up a low
impedance pulse generator to generate, say, a 1 uS pulse of -1V, and
apply it to some pins (one at a time) and see if this induces the
problem.
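As a rough illustration of the first suggestion, the C sketch below generates the alternating all-'1's / all-'0's sequence that some host or test firmware could feed to the FPGA's data outputs. The bus width (`BUS_WIDTH`) and the function names `sso_pattern` and `fill_sso_patterns` are hypothetical; how the words actually reach the output pins depends entirely on the board and design.

```c
#include <stdint.h>

/* Hypothetical bus width: set to the number of data outputs under test
 * (1 to 32 for this sketch). */
#define BUS_WIDTH 16u

/* All ones on the low BUS_WIDTH bits. */
static uint32_t bus_all_ones(void) {
    return (BUS_WIDTH >= 32u) ? UINT32_C(0xFFFFFFFF)
                              : ((UINT32_C(1) << BUS_WIDTH) - 1u);
}

/* Pattern for step n: all '1's on even steps, all '0's on odd steps,
 * so every output switches in the same direction at the same time. */
uint32_t sso_pattern(unsigned n) {
    return (n % 2u == 0u) ? bus_all_ones() : 0u;
}

/* Fill buf[0..len-1] with the alternating sequence. */
void fill_sso_patterns(uint32_t *buf, unsigned len) {
    for (unsigned i = 0; i < len; ++i)
        buf[i] = sso_pattern(i);
}
```

The odd steps, where every output falls at once, are the case the undershoot and ground-reference explanation above points at.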
 
I'd agree, the PCI will kill you first, anything difficult for the FPGA but
easy on the PC will kill you again, and finally C++ will not be as fast
as HDL, by my estimate maybe 2-5x (my pure prejudice). If you must use C,
take a look at Handel-C; at least it's based on Occam, so it's provably
able to synthesize into HW, because it ain't really C, it just looks like it.

If you absolutely must use IEEE floating point to get particular results,
forget it, but I usually find these barriers are artificial; a good amount
of transforms can flip things around entirely.

To be fair, an FPGA PCI card could wipe out a PC only if the problem is a
natural fit, say continuously processing a large stream of raw data either
from converters or a special interface and then reducing it in some way
to a report level. Perhaps a hard disk could be specially interfaced to the
PCI card to bypass the OS; not sure if that can really help, that is getting
high end. Better still if the operators involved are simple but occur in
the hundreds, at least, in parallel.

The x86 has at least a 20x starting clock advantage, i.e., 20 ops per FPGA
clock for simple inline code. An FPGA solution would really have to be
several times faster to even make it worth considering. A couple of
years ago, when PCI was relatively faster and PCs & FPGAs relatively
slower, the bottleneck would have been less of a problem.

BUT, I also think that x86 is way overrated, at least when I measure the numbers.

One thing FPGAs do with relatively no penalty is randomized processing.
The x86 can take a huge hit, maybe a factor of 5, if the application goes
from running entirely inside the cache to almost never inside it, but that
depends on how close the data is temporally and spatially.

Now stand things upside down. Take some arbitrary HW function based
on some simple math that is unnatural to the PC, say summing a vector of
13b saturated numbers. This uses less HW than the 16b version by about a
quarter, but that sort of thing starts to torture the x86, since now each
trivial operator needs to do a couple of things, maybe even perform
a test and branch per point, which will hurt the branch predictor. Imagine
the test is strictly a random choice: real murder on the predictor and
pipeline.
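A minimal C sketch of the 13-bit saturating sum, assuming two's-complement 13-bit values (range -4096 to 4095) and saturation after every addition; the names `sat13` and `sum_sat13` are illustrative only. Whether the two comparisons in `sat13` compile to real branches or to conditional moves is up to the compiler, so the branch-predictor cost is the post's estimate rather than a guarantee. In an FPGA the same clamp is a little extra logic beside a 13-bit adder.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed 13-bit two's-complement range. */
#define SAT13_MAX  4095
#define SAT13_MIN (-4096)

/* Clamp a wider intermediate result into the 13-bit range. */
static int32_t sat13(int32_t x) {
    if (x > SAT13_MAX) return SAT13_MAX;
    if (x < SAT13_MIN) return SAT13_MIN;
    return x;
}

/* Sum a vector, saturating after every addition as a 13-bit
 * saturating accumulator in hardware would. Inputs are assumed
 * to already lie in the 13-bit range. */
int32_t sum_sat13(const int16_t *v, size_t n) {
    int32_t acc = 0;
    for (size_t i = 0; i < n; ++i)
        acc = sat13(acc + (int32_t)v[i]);
    return acc;
}
```

For example, summing {4000, 200, -100} clamps at 4095 after the second element and ends at 3995, whereas an unsaturated sum would give 4100.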

Taken to its logical extreme, even quite simple projects such as, say, a
CPU emulator can run 100s of times slower as C code than as the
actual HW, even at the FPGA's leisurely rate of 1/20th the PC clock.

It all depends. One thing to consider, though, is the system bandwidth in
your problem for moving data into & out of RAMs or buffers. Even a
modest FPGA can handle 200-plus reads/writes per clock, whereas I
suspect most x86s can really only express one ld or st to a cached location
about every 5 ops. Then the FPGA starts to shine: 200 vs. 20/5, or about 4,
per FPGA clock.
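Working through the post's own rough estimates (about 20 simple x86 ops per FPGA clock, one cached load or store every ~5 ops, and 200 FPGA memory accesses per clock), the per-FPGA-clock comparison comes out as below; these are the author's ballpark figures, not measurements.

```latex
\frac{20\ \text{ops}}{\text{FPGA clock}} \times \frac{1\ \text{access}}{5\ \text{ops}}
  = 4\ \frac{\text{accesses}}{\text{FPGA clock}},
\qquad
\frac{200\ \text{accesses}}{4\ \text{accesses}} = 50\times
```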

Also, when you start in C++, you have already favored the PC, since you
likely expressed ints as 32b numbers and used FP. If you're using FP when
integer can work, you really stacked the deck, but that can often be
undone. When you code in HDL for the data size you actually need, you
are favoring the FPGA by the same margin in reverse. Mind you, I have
never seen FP math get synthesized; you would have to instantiate a
core for that.

One final option to consider: use an FPGA CPU, take a 20x performance
cut, and run the code on that. The hit might not even be 20x, because the
SRAM or even DRAM is at your speed rather than 100s of times slower than
on the PC. Then look for opportunities to add a special-purpose
instruction and see what the impact of one kernel op might be. An example
crypto op might easily replace 100 opcodes with just 1 op. Now also
consider that you can gang up a few CPUs too.

It just depends on what you are doing and whether it's mostly IO or
mostly internal crunching.

johnjakson at usa dot com
 
