Simon Peacock
Guest
One extra point.. it might be possible to implement C, but C++ is less
likely. C++ supports abstract concepts which are difficult to put into
hardware. The trick with VHDL or Verilog is often thinking "how would I
implement this in logic?", so abstraction is out the door immediately. C is
relatively good at low level, so consider an FPGA as something akin to a
driver.
One seriously huge advantage an FPGA has over any processor is massive
parallelisation.. that is, it can do many things simultaneously. At the
extreme, there are brute-force encryption breakers that simply try every
combination of key in parallel!! This is where you gain... If you are
considering the FPGA as a peripheral to a processor, then it will most
likely run only as fast as the processor can give it data and take back
results. But give an FPGA freedom and it will direct-convert a 1.5 GHz
radio signal and not break a sweat.
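To make the parallelism point concrete, here is a minimal C sketch of a brute-force key search over a toy XOR "cipher" (names and the cipher are purely illustrative, not a real algorithm): the CPU must walk the key space one candidate per iteration, while an FPGA could instantiate all 256 comparators and test every key in a single clock.

```c
#include <stdint.h>

/* Toy illustration only: recover an 8-bit XOR "key" by trying every
 * candidate.  A CPU iterates sequentially; an FPGA can instantiate
 * all 256 comparators and check every key in one clock. */
static uint8_t toy_encrypt(uint8_t plain, uint8_t key) {
    return plain ^ key;
}

/* Sequential brute force: worst case 256 trials on a CPU. */
int brute_force_key(uint8_t plain, uint8_t cipher) {
    for (int key = 0; key < 256; key++)
        if (toy_encrypt(plain, (uint8_t)key) == cipher)
            return key;          /* found the matching key */
    return -1;                   /* no key matches */
}
```

The hardware version is simply 256 copies of the comparator; the software version pays one loop iteration per candidate.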
I would have to suggest that you look at Verilog or VHDL... for starters,
they support parallelism... next, they are typed languages and force
casting. And there aren't any pointers or gotos.
There are simulators for VHDL and Verilog which will allow you to see what
you are doing... Otherwise what you write in C.. which is a sequential
language... may not be what actually happens. VHDL processes, which
translate only roughly (not well) to C functions, all execute at the same
time. C only handles this by kicking off threads, and a C program with 200
threads might not be considered manageable by some.. but that's what VHDL
does! And C's support of semaphores would make a hardware designer cringe.
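A rough C analogue of the point above, assuming POSIX threads (all names here are illustrative): two "processes" that in VHDL would simply run concurrently each need their own thread plus explicit locking in C, which is exactly the overhead being complained about.

```c
#include <pthread.h>
#include <stdint.h>
#include <stddef.h>

/* Two concurrent "processes" sharing a signal: free in hardware,
 * but in C each needs a thread and a lock around the shared state. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static uint32_t shared_counter = 0;

static void *process_body(void *arg) {
    (void)arg;
    for (int i = 0; i < 1000; i++) {
        pthread_mutex_lock(&lock);
        shared_counter++;              /* like a clocked signal update */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

uint32_t run_two_processes(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, process_body, NULL);
    pthread_create(&b, NULL, process_body, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return shared_counter;             /* both increments accounted for */
}
```

Scale this to 200 processes and the thread-management burden the post describes becomes obvious.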
Simon
"JJ" <johnjakson@yahoo.com> wrote in message
news:1112666920.468559.68270@l41g2000cwc.googlegroups.com...
I'd agree, the PCI will kill you first, and anything difficult for the FPGA
but easy on the PC will kill you again, and finally C++ will not be as fast
as HDL: by my estimate maybe 2-5x slower (my pure prejudice). If you must
use C, take a look at Handel-C; at least it's based on Occam, so it's
provably able to synthesize into HW because it isn't really C, it just
looks like it. If you absolutely must use IEEE math to get particular
results, forget it, but I usually find these barriers are artificial; a
good amount of transforms can flip things around entirely.
To be fair, an FPGA PCI card could wipe out a PC only if the problem is a
natural fit: say, continuously processing a large stream of raw data,
either from converters or a special interface, and then reducing it in some
way to a report level. Perhaps a hard drive could be specially interfaced
to the PCI card to bypass the OS; I'm not sure if that can really help,
getting high end there. Better still if the operators involved are simple
but occur in the hundreds at least in parallel.
The x86 has at least a 20x starting clock advantage: 20 ops per FPGA clock
for simple inline code. An FPGA solution would really have to be several
times faster to even make it worth considering. A couple of years ago, when
PCI was relatively faster and PCs & FPGAs relatively slower, the bottleneck
would have been less of a problem.
BUT, I also think that x86 is way overrated, at least when I measure
numbers. One thing FPGAs do with relatively no penalty is randomized
processing. The x86 can take a huge hit if the application goes from
entirely inside cache to almost never inside, by maybe a factor of 5, but
it depends on how close data is temporally and spatially.
Now, standing things upside down: take some arbitrary HW function based on
some simple math that is unnatural to the PC, say summing a vector of
13-bit saturated numbers. This uses about a quarter less HW than the 16-bit
version, but that sort of thing starts to torture the x86, since now each
trivial operator needs to do a couple of things, maybe even perform a test
and branch per point, which will hurt the branch predictor. Imagine the
test is strictly a random choice: real murder on the predictor and
pipeline.
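A sketch of that 13-bit saturated sum in C (names and exact widths are my own, purely illustrative): note the mask plus compare-and-branch on every element, which is the per-point cost that never appears in the narrower FPGA adder.

```c
#include <stdint.h>
#include <stddef.h>

#define SAT13_MAX 0x1FFFu   /* largest 13-bit unsigned value, 8191 */

/* Sum a vector with 13-bit saturation.  In an FPGA the narrower
 * adder is simply less logic; on x86 every element costs an extra
 * compare and a potentially mispredicted branch. */
uint32_t sum_sat13(const uint16_t *v, size_t n) {
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += v[i] & SAT13_MAX;   /* confine each input to 13 bits */
        if (acc > SAT13_MAX)       /* the per-point test that hurts */
            acc = SAT13_MAX;       /* the branch predictor */
    }
    return acc;
}
```

When the saturation test fires at random, as the post notes, the branch is essentially unpredictable.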
Taken to its logical extreme, even quite simple projects, such as say a CPU
emulator, can run hundreds of times slower as C code than as the actual HW,
even at the FPGA's leisurely rate of 1/20th the PC clock.
It all depends. One thing to consider, though, is the system bandwidth in
your problem for moving data into & out of RAMs or buffers. Even a modest
FPGA can handle 200-plus reads/writes per clock, whereas I suspect most
x86s can really only express 1 load or store to a cached location about
every 5 ops. Then the FPGA starts to shine with a 200 vs 4 ratio (20 ops
per FPGA clock at 1 access per 5 ops).
Also, when you start in C++, you have already favored the PC, since you
likely expressed ints as 32-bit numbers and used FP. If you're using FP
when integer can work, you really stacked the deck, but that can often be
undone. When you code in HDL for the data size you actually need, you are
favoring the FPGA by the same margin in reverse. Mind you, I have never
seen FP math get synthesized; you would have to instantiate a core for
that.
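For instance, coding for exactly the precision you need might look like Q4.12 fixed point in C (the format and names are chosen arbitrarily here for illustration): the multiply is a widen, multiply, and shift, with no FP unit needed, which maps directly to a small HDL multiplier.

```c
#include <stdint.h>

/* Q4.12 fixed point: a 16-bit signed value with 12 fractional bits,
 * i.e. range roughly -8.0 .. +8.0.  Illustrative format only. */
typedef int16_t q4_12;

#define Q12_ONE (1 << 12)   /* the value 1.0 in Q4.12 */

/* Multiply two Q4.12 values: widen to 32 bits, multiply, then drop
 * the 12 extra fractional bits the product accumulated. */
q4_12 q12_mul(q4_12 a, q4_12 b) {
    return (q4_12)(((int32_t)a * (int32_t)b) >> 12);
}
```

In HDL the same operation is a 16x16 multiplier and a wire slice; defaulting to 32-bit floats instead is the deck-stacking the post describes.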
One final option to consider: use an FPGA CPU, take a 20x performance cut,
and run the code on that. The hit might not even be 20x, because the SRAM
or even DRAM is at your speed rather than hundreds of times slower than the
PC's. Then look for opportunities to add a special-purpose instruction and
see what the impact of 1 kernel op might be. An example crypto op might
easily replace 100 opcodes with just 1 op. Now also consider you can gang
up a few CPUs too.
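As a sketch of the "1 op replaces many opcodes" idea, consider population count, assuming a soft CPU where the parallel reduction below could become a single custom instruction while the generic loop costs up to 32 shift/mask/add iterations:

```c
#include <stdint.h>

/* Generic code: up to 32 iterations of shift, mask and add opcodes. */
uint32_t popcount_generic(uint32_t x) {
    uint32_t count = 0;
    while (x) {
        count += x & 1u;
        x >>= 1;
    }
    return count;
}

/* The same function as a branch-free parallel reduction.  On a soft
 * CPU this whole expression could be wired up as one custom
 * instruction, collapsing the loop above into a single opcode. */
uint32_t popcount_custom(uint32_t x) {
    x = x - ((x >> 1) & 0x55555555u);                 /* 2-bit sums  */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit sums  */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit sums  */
    return (x * 0x01010101u) >> 24;                   /* total count */
}
```

The crypto-op example in the post is the same trade: a routine of dozens of opcodes becomes one cheap piece of dedicated logic.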
It just depends on what you are doing and whether it's mostly I/O or mostly
internal crunching.
johnjakson at usa dot com