EDK : FSL macros defined by Xilinx are wrong

"Mike Harrison" <mike@whitewing.co.uk> wrote in message
news:d49901h2b5o38knfjla4orvjk9tjiofkun@4ax.com...
On Fri, 4 Feb 2005 19:16:23 +0000 (UTC), Alex <uksb@greenbank.org> wrote:

[ Apologies if this is a frequently asked question but I've
been googling for hours and haven't found much info. ]

I'm looking to buy a Spartan-3 Starter Kit:-
http://www.xilinx.com/products/spartan3/s3boards.htm#
in the UK.

The only supplier in the UK with a website is nuhorizons.com
and I've seen some warnings about using them.

I'll give Xilinx UK a call tomorrow (or Monday) but I'm
wondering if anyone has any suggestions for suppliers I can
check with.

Or how quick/expensive is it to order from the US and deliver
to the UK. Which suppliers would you recommend?

Ta,

-Alex

I ordered one recently off the Xilinx website - it only took a few days to
arrive.
ISTR recently (probably here) seeing a well-hidden link on the Xilinx site
offering the S3 and CPLD
devkits together for $99 (i.e. the same as the S3 kit alone).
I think it was for visitors of Electronica 2004; the link was open, but
hidden. In any case, that offer isn't valid any more. But maybe there is
some other similar offer, who knows.

Antti
 
The 'Altera Max+plus II advanced synthesis' is available at:
https://www.altera.com/support/software/download/altera_design/mp2_adv_syn/dnl-mp2_adv_syn.jsp

It will be useful if you are using VHDL or Verilog files in your design.


Hope this helps,
Subroto Datta
Altera Corp.



"SeungHeun, Lee" <kis2kima@hitel.net> wrote in message
news:ctvcs6$t6n$1@ccsun2.sogang.ac.kr...
I used a FLEX8000 device and developed with MAX+plus II. Because of its poor
performance, Altera produced an advanced synthesis tool, as far as I know.
The synthesis tool is called 'Altera Max+plus II advanced synthesis'. It is a
free download, but you may find it hard to locate on the Altera web site. :)

Regards,
S.H, Lee

"Vincent Perron" <vincent.perron@usherbrooke.ca> wrote in message
news:8290822c.0502021107.6689e006@posting.google.com...
Here's a question I know has already been asked but I was not
satisfied with the answer.

How could I get Quartus II to support the FLEX 8000 devices?

I've already got a couple of FLEX 8000 chips and a complete version of
Quartus II 4.1. All the FLEX family is supported (6000, 10K, 10KA and
10KE) except for the 8000.

Is there a way to add the FLEX 8000 to this list? I would really
prefer working with Quartus II rather than Max Plus II.

thx,
Vincent
 
Yeah, the 39% seems cooked to me, especially with no way to check it
for the interested public.
Where is that Altera guy hiding ?
I have posted all I need to say on the subject. Clearly, I believe the +39%
is real. We have invested years of engineering time to gather benchmark
designs, fairly convert them between architectures, and figured out how to
get the best out of both tools, and to produce comparisons. We have
disclosed how we run our tests, and the results we achieved. But short of
releasing the actual designs, which we cannot do, we will never be able to
convince those people (such as you) who believe we are cooking the numbers
when we are not. I can't blame you for being in disbelief -- trust me, we
double- and triple-checked our results because we were so surprised that
Virtex-4 came out so poorly.

Do I believe 39% tells the whole story of comparing these two device
families? No; it is just one (very important) parameter.

Regards,

Paul Leventis
Altera Corp.
 
Paul, let me help you.
There are three ingredients to this "surprise":
1. Altera used its fastest (of three) speed grade against the middle of
three Xilinx speed grades.
( I have previously explained your reason for, and the much stronger
reason against doing that.)
2. Altera did not exercise the Xilinx software as strongly as they
pushed their own. The software tools are quite different, and require a
different approach if absolute highest speed is the goal. Which it was.
3. It is reasonable to assume that Altera's stored designs are more
Stratix-friendly.

So, don't you guys play the surprised innocent onlookers. Nobody
expected Altera to be fair.
Hell, I think the whole business of competitive benchmarks being run
and promoted by an interested party is a sham and a disgusting
deception. That's why I refused to enter the mudbath...
Peter Alfke
 
Peter Alfke wrote:

Paul, let me help you.
There are three ingredients to this "surprise":
1. Altera used its fastest (of three) speed grade against the middle of
three Xilinx speed grades.
( I have previously explained your reason for, and the much stronger
reason against doing that.)
Correct me if I am wrong, but didn't Altera use the most current speed
file data that was available at the time? Or was the data available in
the speed file and just the parts are not available? Let's face it.
Even if the speed file data was available, data based on estimates is
pretty pointless. We have seen significant changes in speed files even
*after* a chip is in production. So the data is pretty meaningless
*before* the parts are in production.

2. Altera did not exercise the Xilinx software as strongly as they
pushed their own. The software tools are quite different, and require a
different approach if absolute highest speed is the goal. Which it was.
This is a point that no one can prove either way. Xilinx does not
release their benchmark designs and Altera does not either. So the
users are left not knowing if any of the info is correct.

3. It is reasonable to assume that Altera's stored designs are more
Stratix-friendly.
That sounds like marketing-speak. Regardless, until we get a set of
benchmarks that are open *and* useful, this is all just a tempest in a
teapot.

So, don't you guys play the surprised innocent onlookers. Nobody
expected Altera to be fair.
Hell, I think the whole business of competitive benchmarks being run
and promoted by an interested party is a sham and a disgusting
deception. That's why I refused to enter the mudbath...
But here you are... :)

--

Rick Collins

rick.collins@XYarius.com

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design http://www.arius.com
4 King Ave. 301-682-7772 Voice
Frederick, MD 21701-3110 GNU tools for the ARM http://www.gnuarm.com
 
Paul Leventis (at home) wrote:
<snip>
But short of releasing the actual designs, which we cannot do, we
will never be able to
convince those people (such as you) who believe we are cooking the numbers
when we are not.
And there you have the best possible argument for Public (WEB) Source
code.
- I cannot believe the designers at Altera feel happy to have
"invested years of engineering time", and find themselves unable to
publicly verify the numbers. ( and even have them openly laughed at ? ) -
To me, that is a total waste of time. You, and your customers deserve
better.

Simple solution: Get some designs you CAN release !?

-jg
 
Hi Jim,

And there you have the best possible argument for Public (WEB) Source
code.
- I cannot believe the designers at Altera feel happy to have "invested
years of engineering time", and find themselves unable to publicly verify
the numbers. ( and even have them openly laughed at ? ) -
To me, that is a total waste of time. You, and your customers deserve
better.
From a marketing perspective, yes, it makes life difficult. But marketing is only a
secondary goal of our benchmarking effort. The primary reasons we collect
designs and measure our performance are to (a) improve our CAD tools and (b)
experiment on new architectures. When developing new cad algorithms and new
architectures, we need to be able to compare the new vs. the old to see if
the change is a useful one. For example, there is no way we could have ever
made the radical change of moving from our old Stratix 4-LUT based LE to the
Stratix II decomposable 6-LUT with shared LUT function capability without
such data. There is
a lot of pain (synthesis effort, IP changes, customer impact, etc.)
associated with changing the logic architecture of a family, and we need
good, solid data to back it up. Similarly, when we make changes to our
synthesis, placement and routing algorithms, every such change must be
validated for functionality and quality.

Hopefully someone out there will put together some new public domain big
benchmarks (like the old MCNC benchmarks, still quoted so often in academic
literature). It would do the academic community some good to see what real
designs look like these days.

Paul Leventis
Altera Corp.
 
Hi Rick,

Correct me if I am wrong, but didn't Altera use the most current speed
file data that was available at the time? Or was the data available in
the speed file and just the parts are not available? Let's face it.
You are correct. No speed files are available for -12. No numbers are in
the datasheet. So we compare to -11.

Even if the speed file data was available, data based on estimates is
pretty pointless. We have seen significant changes in speed files even
*after* a chip is in production. So the data is pretty meaningless
*before* the parts are in production.
I think performance comparisons based on preliminary timing models are still
valid. Regardless of how correct speed files are, that is the performance a
customer will see, and is what customers are using to select devices and
speed grades. Of course, performance comparisons need to be updated with
each release of speed files (and cad software too -- algorithms are always
improving).

I cannot speak for Xilinx and their speed files. But on the Altera side of
things, I would not expect much change in core performance. For all
families I've been involved with (Stratix, Cyclone, and beyond), our core
performance predictions made in the preliminary timing models have been very
close (within 5%) to final production numbers. Stratix II core (logic +
routing) speed will not be changing more than a few % in the future. Our
models have already been correlated to silicon and compare very well. The
toggle rate limitations on DSP and memory blocks will likely increase
(again) since we're still in the process of finishing off our
characterization of these blocks and we like to stick with conservative
limits until that characterization is completed.

Regards,

Paul Leventis
Altera Corp.
 
This is getting tedious, but Paul did write "we were so surprised that
Virtex-4 came out so poorly." That's where I got the SURPRISE from.
:)

It looks like everybody agrees: the evolution from 130 nm to 90 nm does
not automatically give a big performance boost. (it lowers the cost,
though).
So Altera and Xilinx used additional means to improve Stratix-II and
Virtex-4 performance, but the two companies did this in very different
ways.
Altera changed the LUT structure significantly, and I can believe that
this makes certain applications faster, if they can tolerate shared
inputs. But Altera made no systems-oriented functional changes, added
no new functions or structures.
Xilinx did the opposite, leaving the LUT structure pretty much as
before, but adding and improving functionality in many ways (as I
emphasized in the web-seminar).
What does that mean for the old benchmarks?
Since all Altera benchmarks use established legacy designs, and only
designs available to Altera, they will benefit from LUT-level
improvements, but will of course have no clue about major structural
and functional improvements, as introduced by Virtex-4.

I bet there are no dual-clock FIFOs in the Altera benchmarks, or 32-tap
FIR filters, or Gigabit SerDes, or microprocessors, or even SRL16s or
LUT-RAMs. Such applications do not exist in the old designs, or they
are implemented in such different, less efficient ways that they do not
migrate.

Altera evolved Stratix-II in a direction that old legacy benchmarks can
easily take advantage of. Xilinx evolved Virtex-4 into a more
systems-oriented direction. I am convinced the Virtex-4 innovations
provide a bigger performance boost for new designs that can take
advantage of the new features. Who cares about porting obsolete
designs?

This also answers the quest for public benchmarks: There is no way that
otherwise nice guys like Paul and Peter would ever agree on such
benchmarks. I would insist on SRL16s, FIFOs etc, and Paul would load up
with applications that are favored by the complex LUTs with their
shared inputs. Both of us would have to be selfish, since there is so
much at stake.
There can be no common ground, since there is no "typical benchmark".

Benchmarks are dead, long live performance!
Peter Alfke
 
Paul Leventis (at home) wrote:
<snip>
I cannot speak for Xilinx and their speed files. But on the Altera side of
things, I would not expect much change in core performance. For all
families I've been involved with (Stratix, Cyclone, and beyond), our core
performance predictions made in the preliminary timing models have been very
close (within 5%) to final production numbers. Stratix II core (logic +
routing) speed will not be changing more than a few % in the future. Our
models have already been correlated to silicon and compare very well.
snip

Funny, I was sure this was an Altera link ( you have seen this ? )

http://www.altera.com/corporate/news_room/releases/products/nr-perf_power.html

To me that looks rather more than a 5% nudge ?

Does this mean the prelim models were not as good as you claim, or are
you more carefully qualifying the best ones, to better compete with
Xilinx's claims ?

In all this speed/benchmark hoopla, I see one thing that is never
mentioned is price.

Will we see a 'bragging rights' bin, that is the
three sigma yield limit, and so very expensive, but hey, look how
fast we are !!

-jg
 
rickman wrote:

Peter Alfke wrote:
snip
This also answers the quest for public benchmarks: There is no way that
otherwise nice guys like Paul and Peter would ever agree on such
benchmarks. I would insist on SRL16s, FIFOs etc, and Paul would load up
with applications that are favored by the complex LUTs with their
shared inputs. Both of us would have to be selfish, since there is so
much at stake.
There can be no common ground, since there is no "typical benchmark".

Benchmarks are dead, long live performance!
Peter Alfke


I think most of what Peter has said is very reasonable. But I think you
*can* have common public benchmarks if you start at a higher level
rather than try to share HDL code. Typically, a project will know which
vendor is going to be used and can design their architecture to fit. So
why not spec a design function and let the vendor write their own code
to implement it?
Exactly, this is the 'optimised' branch I was talking about.

Users do not care what tricks you use, they want to know the peak MHz,
MHz/$, or mW/MHz or whatever, for a particular application.

They EXPECT differences in fit and ideal usage, by end application, but
right now, this type of information is sparse indeed.

They also are more interested in a design broadly in their field, than
in how fast one node can spin.

FPGAs are getting more complex, with the dedicated blocks, so HDL
designers need vendor-tuned examples of how to efficiently use those blocks.

-jg
 
There is a lot of negative baggage associated with the term Benchmark.

Yet there is a lot of performance related info that would help
designers, both in choosing between chips and when working on
a design.

I'm thinking of things like the speed of a 32 bit counter.
It's a reasonably basic building block. It's not the whole
story, but it's one very useful data point.

The next question is how many variations make sense? Do they
have major impacts on the speed and/or take extra resources?
Enable, load, overflow flag, ...

Maybe the info I'm looking for should be printed in
green if there is a good fit with the hardware (sweet spot)
and flagged with red if the answer might surprise you (say
because it takes an extra level of logic).

Other basic things I'd like to see...
Data rate between 2 chips (same type/speed).
Routing delays to go N steps H or V.


Maybe that's all BS because most engineering time is
spent working at a higher level. (But I always get involved
with the details.)

Maybe a handful of medium size tasks would be interesting.
Maybe library elements. Maybe just good examples.
Pulse width modulators - fixed and programmable parameters.
FIR filters.
State machines - need several examples.
DRAM controller


I think most of what Peter has said is very reasonable. But I think you
*can* have common public benchmarks if you start at a higher level
rather than try to share HDL code. Typically, a project will know which
vendor is going to be used and can design their architecture to fit. So
why not spec a design function and let the vendor write their own code
to implement it?
I was thinking about that too. One complication is that you
can push some things off to the non-FPGA parts of the system
or pull things in.

What sort of problem would be a good whole-system example?

Vendor neutral would be nice, but I'm also interested in
good examples that take advantage of special features,
and I'm willing to push things around and/or change the
problem if that makes things fit better for the total
system.


--
The suespammers.org mail server is located in California. So are all my
other mailboxes. Please do not send unsolicited bulk e-mail or unsolicited
commercial e-mail to my suespammers.org address or any of my other addresses.
These are my opinions, not necessarily my employer's. I hate spam.
 
Hal Murray wrote:
<snip>
What sort of problem would be a good whole-system example?
How about:

Frequency/Pulse Counter [24/32/40/48 bits ]
This needs counters, and capture. Counters can be either
Decimal, or Binary, with some Bin-BCD post conversion, for
display of the results.
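
For readers who want to see what the Bin-BCD post conversion could look like, here is a minimal Python sketch of the shift-and-add-3 ("double dabble") method. The choice of that method is an assumption for illustration, not something stated in this thread, and the name `bin_to_bcd` and its parameters are hypothetical. The loop structure maps fairly directly onto a shift register plus one small adder per digit.

```python
def bin_to_bcd(value, n_digits):
    """Convert a non-negative integer to packed BCD (4 bits per digit).

    Shift-and-add-3 ("double dabble"): before each left shift, any BCD
    digit that is 5 or more gets 3 added, so that the shift carries
    correctly into the next decimal digit.
    """
    if value < 0 or value >= 10 ** n_digits:
        raise ValueError("value does not fit in the requested number of digits")

    n_bits = max(value.bit_length(), 1)
    bcd = 0
    for i in range(n_bits - 1, -1, -1):          # MSB first
        for d in range(n_digits):                # adjust every digit >= 5
            if ((bcd >> (4 * d)) & 0xF) >= 5:
                bcd += 3 << (4 * d)
        bcd = (bcd << 1) | ((value >> i) & 1)    # shift in the next binary bit
    return bcd


def bcd_to_str(bcd, n_digits):
    """Render packed BCD as a fixed-width decimal string for display."""
    return "".join(str((bcd >> (4 * d)) & 0xF) for d in range(n_digits - 1, -1, -1))
```

A 32-bit binary count needs 10 decimal digits, so a display path for that width could call `bcd_to_str(bin_to_bcd(count, 10), 10)`.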

DDS - same widths, this needs wide adders
Expanding to Waveform generation, and pulse generation...

These are complex enough to push the FPGA, have easily
verifiable output everyone can relate to, and would
be genuinely useful for education.
Some would be simple enough for the MAX II or
smallest ProASIC3...


and more ambitious: a Storage Oscilloscope, that needs
a timebase, and wide data path pumping, [assume external
fast-enough ADC] to any choices of USB / Firewire / ATA / SerialATA
(etc) for the results.

Vendor neutral would be nice, but I'm also interested in
good examples that take advantage of special features,
and I'm willing to push things around and/or change the
problem if that makes things fit better for the total
system.
Of course..
-jg
 
Jim, I like your idea, but it is not all that straightforward.
I mentioned in the seminar that there is a loadable synchronous counter
inside every DSP Slice, and it runs at 500 MHz, no ifs, no buts.
Now, I will build a 5 GHz counter using the MGT. Is that allowed?
DDS obviously runs at 500 MHz in the DSP slice, but I will run a
virtual 8 GHz DDS either by using 16 accumulators, or by doing some
clever math. Is that kosher?
If you saw my seminar, I mentioned those things, and there will be app
notes, etc.
Storage oscilloscope is dear to my heart, but it has many arbitrary
parameters.
Its performance and cost are determined by the A/D at the input, not
the FPGA.

I suggest we keep generating creative app notes, reference designs, and
evaluation boards, without calling them benchmarks.

BTW:
Choosing between X and A isn't all that difficult, it's only between
two suppliers. Selecting between umpteen brands of breakfast cereals,
washing machines, cars, or colleges for your kids is a much tougher
task. ( Life in NZ may be simpler, but the choices in the US are
mindboggling. ;-)
Peter Alfke
 
"Peter Alfke" <alfke@sbcglobal.net> wrote in message
news:1107733371.888845.5080@l41g2000cwc.googlegroups.com...
Jim, I like your idea, but it is not all that straightforward.
I mentioned in the seminar that there is a loadable synchronous counter
inside every DSP Slice, and it runs at 500 MHz, no ifs, no buts.
Now, I will build a 5 GHz counter using the MGT. Is that allowed?
DDS obviously runs at 500 MHz in the DSP slice, but I will run a
virtual 8 GHz DDS either by using 16 accumulators, or by doing some
Hi Peter

I have already done that! It's relatively simple to use the MGT as a DDS with a
virtual 10 GHz clock: for every user clock, 40 accumulator samples are calculated.
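
To make the "several accumulator samples per user clock" idea concrete, here is a minimal behavioural sketch in Python. It is not the actual design discussed in this thread: the function names, the tuning word and the accumulator width are hypothetical, and it only models the phase arithmetic, not the MGT or any hardware timing. Each user clock produces `lanes` consecutive phase values, and the shared base phase advances by `lanes * ftw`.

```python
def serial_dds(ftw, width, n_samples):
    """Reference model: one phase accumulator stepped once per sample."""
    mask = (1 << width) - 1
    phase = 0
    out = []
    for _ in range(n_samples):
        out.append(phase)
        phase = (phase + ftw) & mask
    return out


def parallel_dds(ftw, width, lanes, n_clocks):
    """Produce `lanes` phase samples per (slow) user clock.

    Lane k adds a precomputed offset k*ftw to a shared base phase,
    and the base advances by lanes*ftw every user clock.
    """
    mask = (1 << width) - 1
    offsets = [(k * ftw) & mask for k in range(lanes)]
    step = (lanes * ftw) & mask
    base = 0
    out = []
    for _ in range(n_clocks):
        out.extend((base + off) & mask for off in offsets)
        base = (base + step) & mask
    return out


def msb_bits(phases, width):
    """Take the top phase bit of each sample, e.g. as a 1-bit serial stream."""
    return [p >> (width - 1) for p in phases]
```

The parallel version yields exactly the same sample sequence as the serial one, only `lanes` at a time. With 16 lanes and a 500 MHz clock that corresponds to the "virtual 8 GHz" figure quoted above; 40 lanes at a 10 GHz virtual rate would imply a 250 MHz user clock.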

and I also am using MGT as 3GS/S logic analyzer with ChipScope

I wish it would make sense for me to publish all that work

Antti
 
The data sheet says there are 16 BlockRAMs in the XC3S400. Go for 16!

"design" <vasus_ss@yahoo.co.in> wrote in message
news:1107590479.459500.312620@f14g2000cwb.googlegroups.com...
When I synthesize my design for a Spartan-3 XC3S400 FPGA, the synthesis
report says that there are 8 BlockRAMs available in the device.
But after place and route it says there are 16 BlockRAMs
available.
So which one is actually correct?
Also, if we use 9 BlockRAMs, are we still under
the BlockRAM limit, or have we exceeded the device resources, as the
synthesis tool says?

Thanks in advance
 
Paul Leventis (at home) wrote:
(non-constructive criticism snipped)

Altera changed the LUT structure significantly, and I can believe that
this makes certain applications faster, if they can tolerate shared
inputs. But Altera made no systems-oriented functional changes, added
no new functions or structures.

Shared inputs do not need to be "tolerated" -- they are an available but not
necessary feature. Ignoring all other aspects of the ALM, a 6-input LUT can
do a lot more than a 4-LUT, reducing the depth of the critical path and
hence increasing its speed. Where straight 6-LUTs lose out to
4-LUTs is in area (or silicon cost) -- that's where all the other
innovations in the ALM come in, including shared inputs. For example, the
ALM can split into two (fully independent) 4-LUTs. Or you can share some
inputs and/or LUT mask bits and create two larger functions. With the ALM,
you get the speed when you need it and good area when you do not.
I do wonder if the optimal LUT size has changed over the years.

Is there work showing the optimal LUT size as a function of
silicon resources needed to implement such LUTs?

-- glen
 
Antti Lukats wrote:
"Peter Alfke" <alfke@sbcglobal.net> wrote in message
news:1107733371.888845.5080@l41g2000cwc.googlegroups.com...

Jim, I like your idea, but it is not all that straightforward.
I mentioned in the seminar that there is a loadable synchronous counter
inside every DSP Slice, and it runs at 500 MHz, no ifs, no buts.
Now, I will build a 5 GHz counter using the MGT. Is that allowed?
DDS obviously runs at 500 MHz in the DSP slice, but I will run a
virtual 8 GHz DDS either by using 16 accumulators, or by doing some


Hi Peter

I have already done that! It's relatively simple to use the MGT as a DDS with a
virtual 10 GHz clock: for every user clock, 40 accumulator samples are calculated.

and I also am using MGT as 3GS/S logic analyzer with ChipScope

I wish it would make sense for me to publish all that work

Antti
Hi Antti
-- just an idea -- how about Xilinx keep you supplied with
FPGA Eval Boards, & Tools, on 'long loan', and you supply Xilinx with
source codes... ? Peter?
-jg
 
Glen,

"I do wonder if the optimal LUT size has changed over the years.
Is there work showing the optimal LUT size as a function of silicon
resources needed to implement such LUTs?"
Good point. Paul has referred to their studies of replacing a 4 LUT
with a 6 LUT, and then re-running synthesis to see just how much
improvement one sees.

Assuming one can get enough >4 term, <=6 term logic synthesised, one
saves logic levels, and improves speed (even if a 6 LUT is slower than a
4 LUT).
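
The logic-level saving can be illustrated with a small Python sketch. It assumes a function that decomposes cleanly into a tree (a wide AND, OR or XOR, for example); arbitrary functions can need more LUTs and more levels, and real synthesis results depend on the tools. The function name `lut_levels` is hypothetical.

```python
def lut_levels(n_inputs, k):
    """Minimum number of LUT levels for a tree-decomposable function.

    Each level groups up to k signals into one k-input LUT, so the
    number of live signals shrinks by a factor of k (rounded up) per level.
    A single signal needs no LUT at all, hence 0 levels.
    """
    if n_inputs < 1 or k < 2:
        raise ValueError("need at least one input and LUTs with k >= 2")
    levels = 0
    signals = n_inputs
    while signals > 1:
        signals = -(-signals // k)   # ceiling division
        levels += 1
    return levels


if __name__ == "__main__":
    for n in (6, 16, 32, 36):
        print(n, "inputs:", lut_levels(n, 4), "levels of 4-LUTs,",
              lut_levels(n, 6), "levels of 6-LUTs")
```

Under this model a 6-input function drops from two levels of 4-LUTs to one 6-LUT, and a 32-input function from three levels to two. Whether that turns into a speed gain depends on how much slower each 6-LUT is than a 4-LUT, which this sketch deliberately does not model.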

Then come the other nagging questions:
- are inputs shared?
- how badly does that mess up the results?
- is it a universal 6 LUT, or two 5 LUTs with some sharing and some extra
logic to almost give you a 6 LUT? How badly does that work?
- given the smallest LUT is not a 4 LUT, for smaller than 5 logic terms,
how badly does that increase the delay?

I would claim a properly engineered 6 LUT would improve the overall
performance. A compromise would provide some improvement. A poor
implementation would make no difference.

Should all LUTs be 6 LUTs? Or a mixture of both? In what ratio?

Can you use them as SRL? LUT RAM?

The synthesis tools all have to be retuned, and debugged to take
advantage, so this is not without risk.

As for area, a 6 LUT is not all that big as technology shrinks, so some
combination of variable LUT sizes and alternate architectures is, in my
opinion, inevitable.

As for speed, the smaller the technology, the less improvement in speed
(ITRS roadmap, and anyone who says differently can be confidently
ignored). For speed, one now has to use triple oxide to get both speed
and static power reduction (eg compare us to S2 at 25C and there is no
difference for leakage, but compare us at 85C, and we are 1/2 to 1/3 the
static power!).

The wonderful thing about standard CMOS, is just that. No one has a
remarkably different or unique process. But one can use standard CMOS
with all of the available tricks, and see a 1/2 to 1/3 reduction in
static power, an improvement in speed, and an improvement in SEU
resilience. Like V4.

Also, there is room for improvement with the P&R tools, so software is
always looking for that QOR improvement that gives us another speed
grade advantage without any process change. So far they do that every
generation (they get credit for part of the improvement in speed with
each generation, too, including the most recent ones).

Austin
 
Hi Glen,

I do wonder if the optimal LUT size has changed over the years.
Is there work showing the optimal LUT size as a function of silicon
resources needed to implement such LUTs?
Elias Ahmed & Jonathan Rose from the University of Toronto published "The
Effect of LUT and Cluster Size on Deep-Submicron FPGA Performance and
Density". See http://www.eecg.toronto.edu/~jayar/pubs/ahmed/fpga00.pdf.
Elias's M.A.Sc. thesis was on clustering and optimal lut sizes. This paper
contains many references to previous work in the area and is probably a good
starting point. The paper's conclusion is that LUT sizes between 4 and 6
and cluster sizes between 3 and 10 LEs are best from a balanced
area-delay perspective. If you want higher speed, larger LUTs are better.
One suggested area of future research is finding a way to reduce logic
levels without the area cost of large LUTs -- and this is what we have done
in Stratix II with the ALM. Figure 12 is particularly interesting.

I think Guy Lemieux had some work in this area from his PhD -- not sure if
it's published anywhere yet.

At the FPGA 2005 conference in two weeks, the Stratix II logic architecture
and some experimental results will be presented in a paper by David Lewis et
al.

Regards,

Paul Leventis
Altera Corp.
 
