EDK : FSL macros defined by Xilinx are wrong

I saw it.
It's not of much help, as we are doing it based on sub-pipelining
concepts and composite field arithmetic. If you find something of that
sort, please do help us.
Thanks in advance
 
Austin Lesea wrote:
John,

Interesting opinions.
snipping

How much circuit design for FPGAs is done outside the US? Outside
Silicon Valley? How much software for CAD tool support of FPGAs is done
elsewhere? Where are the patents being filed?
I notice on the job listings for Xilinx that almost all of the good stuff
is in the US (CA, CO, etc.), but on that other company's website, they
seemed to have half their VLSI design reqs in Malaysia last time I
looked - perhaps cost cutting or something.

John Jakson
 
JJ wrote:

I notice on the job listings for Xilinx that almost all of the good stuff
is in the US (CA, CO, etc.), but on that other company's website, they
seemed to have half their VLSI design reqs in Malaysia last time I
looked - perhaps cost cutting or something.

John Jakson


Xilinx have a circuit design group in Ireland also.

Alan
 
On Tue, 07 Mar 2006 11:15:33 -0800, Austin Lesea <austin@xilinx.com>
wrote:

http://www.eetimes.com/news/semi/showArticle.jhtml;jsessionid=2VY5CYWYDOXWUQSNDBCSKH0CJUMEKJVN?articleID=181501385

Well, I guess that about wraps it up for the attempt to disguise ASIC
design as something different...
Pity; I'm doing a RapidChip at the moment. I guess they got their
business model wrong, but there are still half-a-dozen other people in
the market, and I'd be surprised if we don't see more.

On the RapidChip, it was (then) a no-brainer: 110nm, (much) bigger
than any 'real' FPGA, much better performance, good unit price, and
the total NRE+tools cost was equivalent to the cost of about 50 or 60
of the same-size (but slower) FPGAs that I was quoted for.

I don't know about the others, but LSI wasn't disguising this as
anything other than ASIC design. ASIC design is not actually that
different from (good) FPGA design.

Evan
--
Riverside
emlat
riverside-machinesdotcodotuk
 
manjunath.rg@gmail.com wrote:
I saw it.
It's not of much help, as we are doing it based on sub-pipelining
concepts and composite field arithmetic. If you find something of that
sort, please do help us.
Do you have a C-based implementation somewhere as an example?
 
I've made an implementation of the AES core in an FPGA
which works with pipelining - i.e. I use only 4 S-boxes and iterate
each one 5 times for every round. I cannot give you the code/spec due to IP
issues...
The nature of the design depends on what speed (i.e. clock cycles) you
need for each round and how many memories you can spare (one dual-port
block RAM = 2 S-boxes).
Hope it helps, Mordehay.
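
Just to illustrate the kind of folding described above, here is a
hypothetical C sketch - not Mordehay's actual design; the precomputed
sbox table and the assumption that the 5th pass covers the key-schedule
lookups are mine:

#include <stdint.h>

/* Hypothetical sketch (not the poster's IP-restricted design): one AES
 * round's SubBytes step folded onto 4 shared S-box lookups per clock.
 * 'sbox' stands for a precomputed 256-entry table; one dual-port block
 * RAM gives two read ports, so 4 lookups per clock = 2 block RAMs. */
void subbytes_folded(uint8_t state[16], const uint8_t sbox[256])
{
    /* 16 state bytes / 4 shared S-box ports = 4 passes (clock cycles);
     * a 5th pass (not shown) could cover the 4 key-schedule lookups,
     * which would account for "iterate each one 5 times for every round". */
    for (int pass = 0; pass < 4; pass++)          /* one pass per clock     */
        for (int port = 0; port < 4; port++)      /* 4 parallel S-box ports */
            state[4 * pass + port] = sbox[state[4 * pass + port]];
}

The point is only the sharing pattern: 4 S-box read ports serving a
16-byte state over several clocks instead of 16 parallel S-boxes.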
 
On 8 Mar 2006 10:31:12 -0800, laura_pretty05@yahoo.com.hk wrote:

I want to know which VHDL book is better for learning...??
A good starting point is Peter Ashenden's "VHDL Cookbook".
You should be able to find it online for free download, using Google.

His bigger book "Designer's Guide to VHDL" is very good too.

- Brian
 
"Thomas Womack" <twomack@chiark.greenend.org.uk> wrote in message
news:u9c*pabbr@news.chiark.greenend.org.uk...
In article <dunp4j$ast22@xco-news.xilinx.com>,

Austin Lesea <austin@xilinx.com> wrote:

And, have you heard anything about that 65nm ASIC process being
ready for anyone, anywhere? For anything? Other than it is too
much money, and too much power? (with no proven IP)

As I'm sure you know, Intel has shipped several million 65nm ASICs in
the last few months.
I think Austin most likely meant "is anyone selling a 65nm cell-based ASIC
design flow", rather than "is anyone shipping a product based on a
full-custom 65nm process".

As you pointed out, it makes sense for Intel to do deep-submicron design.
But for most people, it's surely far more economical to let (e.g.) Xilinx
solve that particular very hard problem for you. After all, you already have
your own solution space to worry about - graphics, or DSP, or wireless, or
whatever - why focus on anything else?

Cheers,

-Ben-
 
In article <dunp4j$ast22@xco-news.xilinx.com>,
Austin Lesea <austin@xilinx.com> wrote:

Of course, I disagree that ASICs have any rosy future at all, and I
also feel that your conclusion that whoever controls the foundry
controls the technology is also quite bizarre.
It depends what you mean by ASIC, of course; I can just about imagine
a future in which a fair number of the functions we nowadays associate
with chipsets are performed in an FPGA (indeed, the Cray XD1's
network-chip to processor-chip logic is a large Xilinx chip).

I can even nearly imagine one where FPGAs have taken over the job of
graphics processors; current graphics processors are pretty much a
large grid of floating-point units with interconnect.

I don't see one in which the functions we nowadays associate with CPUs
in things we think of as computers happen in an FPGA: the incentive for
greater performance is enough to make it worth doing actual circuit
design, and an FPGA containing a credible CPU as a functional block is
not going to be competitive with that CPU implemented without the
surrounding FPGA.

Though this may mean that you end up with an ecosystem containing some
FPGA vendors, Intel, and possibly a couple of other CPU manufacturers
(IBM, AMD?).

I think there will be a role for ASICs when you want to move data
around at enormous rates in truly-commodity applications: an
eight-port gigabit ethernet switch ASIC, essentially eight gigabit
transceivers bought from an IP firm and pasted around the edge of a
small amount of routing logic, is in volume going to be cheaper than
an FPGA with lots of logic in the middle and eight transceivers around
the edge. OK, Cisco's bigger routers are sold at margins where you
can afford to buy Virtex-4 chips just for the transceivers, but there's
a much larger, more price-sensitive market in the home.

And, have you heard anything about that 65nm ASIC process being
ready for anyone, anywhere? For anything? Other than it is too
much money, and too much power? (with no proven IP)
As I'm sure you know, Intel has shipped several million 65nm ASICs in
the last few months.

Tom
 
See http://www.ht-lab.com/freecores/AES/aes.html

No pipelining but perhaps the testbench can save you some time.

Hans
www.ht-lab.com

<manjunath.rg@gmail.com> wrote in message
news:1141827140.488659.53020@u72g2000cwu.googlegroups.com...
We have been doing a project on high-speed AES using sub-pipelining
concepts, and we would be happy to find some code which may help us. If
anyone in this group has access to any, please help us.
 
Hi Joseph,

"Altera: hardcopy-II - these are structured ASICs. They look
appealing, but
I did not get the impression that there were many conversions (at least
as
of a year ago)."

We have many customers lined up to use HardCopy II, and have had a
significant number of conversions in the original HardCopy devices; one
example is TI's DLP chipset (yes, I'm talking about those fancy HDTVs).
HardCopy II is particularly exciting because it uses a very efficient
fine-grained logic fabric and provides a choice of migration devices
allowing greater cost reductions than previous members of the family.
HardCopy II also provides a significant speed-up over the equivalent
Stratix II FPGA devices and cuts power consumption in half.

I don't know why LSI is exiting this market. But there are a few
advantages Altera has in offering a structured ASIC over pure-play
structured ASIC vendors. First, we can leverage existing sales
channels and contacts from our FPGA products; selling a structured ASIC
is a lot more like selling an FPGA than it is selling a standard-cell
ASIC product.

Second, our customers can prototype immediately in an FPGA, and can
sell that prototype for as long as it takes them to finalize their
design and commit to the HardCopy II device. They can wait for their
product to take off before migrating to HardCopy II for a
cost reduction, or they can move immediately into HardCopy II in order
to attain speed, power and cost advantages.

Third, and certainly not least, we can leverage our software and
intellectual property from our FPGAs. We put a full-fledged CAD suite
(Quartus II) on the customer's desktop, allowing them to design,
iterate, and close timing before handing off the design to us. The
software is easy to use (especially by ASIC standards), giving
push-button RTL-to-silicon results. In addition, we have a large
library of Altera and third-party soft IP, proven in numerous FPGA
designs, that customers can use in their designs. And HardCopy II
devices incorporate much of the hard IP from our FPGA devices, such as
PLLs, I/Os and RAMs, which works the same way as in the Stratix II FPGA.

Regards,

Paul Leventis
Altera Corp.
 
After all, in spite of the best spin efforts from all the marketing
departments, the Silicon performances are actually quite similar -
expected given they depend mostly on what Process the leading edge FABs
can run.
Hi Jim,

Actually, there is a lot of R&D spent on the logic and routing
architectures of FPGAs (never mind other integrated IP) that has a
large influence on the speed of the resulting chip. And then there's
the software... what is the difference between transistors getting
faster by 5% and an algorithm enhancement in the placer (for example)
that gives 5%? From a user's perspective, nothing.

Let's pretend for a moment that Altera & Xilinx high-end FPGAs are the
same speed (based on our experiments, Stratix II has a considerable
advantage, but anyway...). The reason for this is not because it's easy
to make good FPGAs, but because both Altera and Xilinx employ a large
number of smart, experienced FPGA architects, IC designers, and
software engineers who are capable of wringing out every drop of
performance they can. These innovations result in improved performance
from one generation to the next, more than would have been possible by
process technology alone.

Sure, process technology helps, but even that isn't as free as it used
to be -- it takes a lot more effort these days to simultaneously gain
speed, reduce area and keep power under control when moving to new
processes.

Regards,

Paul Leventis
Altera Corp.
 
Paul Leventis wrote:

After all, in spite of the best spin efforts from all the marketing
departments, the Silicon performances are actually quite similar -
expected given they depend mostly on what Process the leading edge FABs
can run.


<snip>

Sure, process technology helps, but even that isn't as free as it used
to be -- it takes a lot more effort these days to simultaneously gain
speed, reduce area and keep power under control when moving to new
processes.
Yes, but my statement still stands. The devices' net performance
(SW and silicon combined) is actually quite similar - there is not a
4:1 edge, as one could expect from a 15% improvement every year, for a
decade.

Same with Intel and AMD. Years of effort, and still very similar results.

Instead, the marketing depts have to resort to selective spin,
to try and make each iteration sound larger than it really is.

-jg
 
I don't have any stats on re-spins and failures (I'm not in that
group). But HardCopy II is an ASIC. If the customer makes a mistake
in the design and discovers it after tape-out, they have to pay for it.
It costs Altera engineering resources to perform the migration, and
costs us for the few custom masks needed, and for the production run.
We are not a charity :)

The beauty of HardCopy is that we've had quite a few instances of
customers who sent us their "final" design, only to let us know moments
before tape-out that they'd discovered another bug in their FPGA
prototype system. Do these customers get charged for a partial
tape-out (we didn't make the masks, but we did do some engineering
work)? I don't know. But I'm sure it cost the customer a heck of a
lot less than the full ASIC (or structured-ASIC) run plus the three
months to market that it would have cost them without the ability to
prototype in advance.

Regards,

Paul Leventis
Altera Corp.
 
The Intel/AMD example is a good one. Sometimes one vendor has the
advantage over the other (AMD has one right now). But overall, the two
companies basically innovate at a similar rate. Does this mean they
are only reaping the benefits of process technology? Does this mean
their claims of performance improvements due to innovation are not
valid? I'd argue no (well, let's ignore NetBurst for a moment ;-)).

All I'm saying is that the average rate of performance increase in
FPGAs is much greater than that of the underlying process technology
(which I believe was your initial statement). I would not expect a 4:1
advantage from one FPGA vendor to the next; both vendors are innovating
at a similar rate per year, with some jumps here and there. I'd argue
this with a simple existence proof -- if one company was doing so
badly, they would disappear from the market. This has happened in the
CPU market and in the FPGA market, where companies that fall behind in
innovation fail and disappear.

Even the Xilinx vs. Altera epic battle has not been a neck-and-neck
race. Xilinx was getting trounced by Altera back in the Flex10K days
-- but a combination of innovation and competitor mis-execution got
them back into things. Similarly, Altera was starting to fall behind
until Stratix & Cyclone came out.

If it were easy to make a cost, power and speed competitive FPGA, I'm
sure there would be other players in our market (given the profit
margins). But it isn't, which is good -- it keeps my job exciting.

Regards,

Paul Leventis
Altera Corp.
 
I've put together a preliminary slide showing before/after
eye diagram comparisons of the ADS5273 -> V4 interface:

- IBIS/HyperLynx models vs. simple SPICE model
- no back termination vs. 6mA back term + attenuation scheme

Plots are temporarily at the following location until I update the
original file:
ftp://members.aol.com/fpgastuff/temp_plots.pdf

Many thanks to Symon for running those HyperLynx sims for me,
and for reminding me that real-world current sources are less
reflective away from midstream than ideal ones.

Setup:
- simple lossless models as originally described in <lvds_current.pdf>
- PRBS-5 data pattern
- note varying scales on plots

Comments:

Again, I'd not trust either method without lab verification;
see notes below for specific concerns, particularly regarding
the DC offset seen on the Xilinx IBIS input models.

In any case, I'd say the plots clearly show the improvement
in ISI crossing jitter and eye closure over the direct connection.

The back terminated version also significantly reduces the
peak-peak reflected junk at the pins of the precision mixed
signal A/D. (bearing in mind real world Tlines have more loss)

The two simulation methods match fairly well, but there's a
smaller eye opening in the SPICE model than in the IBIS plots.

I think they'd match much better without the huge DC offset
in the IBIS models, which seems to be causing the driver to
saturate and/or clamp in the HyperLynx sims.

This causes the asymmetrical TX overshoot in the lower left
IBIS plots, the imbalance from which then causes the squiggle
in the leading edge of the IBIS Rx crossing waveform.

IBIS model concerns:

- Xilinx v4 IBIS model for LVDS inputs generates IBIS parser
warnings about non-zero clamp currents

- V4 IBIS model pulls up LVDS driver output common mode
from the expected 1.2V to around 1.5V

- DT terminator modeled as simple resistor in IBIS files;
how much does it vary over allowed input range of diff Rx?

Spice model concerns:

- ideal current source model of the driver is perfectly reflective,
unlike an actual device which can't swing past its headroom

- 9 dB may be too much with real-world Tline loss at the weak driver
corner (reduce attenuation, or remove/change the 100 ohm back termination)


Brian
 
Allan Herriman wrote:
You want to try it in your C -> hardware compiler? I'd be interested
in the results.
Me too :)

I'll take a look at it this weekend, as it might make another
interesting example for the next FpgaC release. I have a pipelined
RSA-72 I did two years ago when looking at building dnet engines that
is a monster because of the barrel shifters and LUT RAMs required for
retiming. First glance at the referenced materials suggests the problem
with AES is going to be 80 or more block RAMs for S-box lookup tables
to get any reasonable parallelism. It's not clear there is an easy way
to avoid using S-box tables, as the algorithm for creating the table is
iterative. The rest of the requirements per round seem pretty timid. I
have a couple of ideas to ponder first.

The feedback chaining clearly limits performance unless you have a fair
number of independent concurrent streams that can be muxed into the
pipeline - like an 11-port mux/switch used to break out a very fast
connection.
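
As a back-of-envelope illustration of that last point, a small C sketch
of the utilisation argument (the 11-deep pipeline and the stream counts
are assumed numbers, not from any particular design):

#include <stdio.h>

/* Hypothetical utilisation model: with a round pipeline DEPTH stages
 * deep, CBC feedback lets each stream issue only one block per DEPTH
 * clocks, so sustained utilisation is min(streams, DEPTH) / DEPTH.
 * Muxing roughly DEPTH independent streams restores one block per clock. */
#define DEPTH 11

int main(void)
{
    for (int streams = 1; streams <= DEPTH; streams++)
        printf("%2d independent CBC stream(s): %3.0f%% of pipeline throughput\n",
               streams, 100.0 * streams / DEPTH);
    return 0;
}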
 
Google shows that there are many papers claiming rather fast AES in
FPGAs (with some fine print saying they're using a non-feedback mode).
I've never seen a feedback mode cypher in a real world application get
anything over some Gb/s.

Regards,
Allan
Hi Allan,
interesting point, but have you thought about what the reasons may be?

Let's do some (approximate) calculations.

Assume you have a single AES round that runs with a 100 MHz clock.
This round needs at least 10 clocks to produce an AES ciphertext block.

With 128 bits of data width, that gives:

128 * 100e6 / 10 = 1.28e9 bits per second

So that is the limit for the assumed circuit.

Adding a feedback path for block cipher modes will increase the number
of clocks needed to create a ciphertext block.

Let's assume 14 clocks to produce a CBC ciphertext block.

Now we have:

128 * 100e6 / 14 = 914.3e6 bits per second

That's all that's possible with the assumed circuit.

How can we increase the throughput?

1) Wait for better silicon that allows higher clock rates.

2) Use more chip area to implement additional rounds and decrease the
number of iterations needed per block. But that may be rather expensive!

3) Improve the round's latency. Make it as fast as the silicon allows.
(Which is at about 500 MHz, as some vendors claim for their products ;-) )

Now let's assume our circuit will still run at 100 MHz, but the improved
round runs at 500 MHz. That reduces the 10 rounds' latency to 2 cycles
at 100 MHz, which gives 6 cycles to create the CBC ciphertext block.

Now we have:

128 * 100e6 / 6 = 2.1e9 bits per second

So that's the theoretical limit for the assumed circuit. You can exceed
it by investing in additional or better (ASIC) silicon, if you have the
money.
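
For anyone who wants to play with the numbers, here is the same
arithmetic as a small C program (same assumptions as above: 100 MHz
clock, 10/14/6 clocks per 128-bit block; nothing here is measured):

#include <stdio.h>

/* Throughput of a single shared AES round unit processing 128-bit
 * blocks at clock rate f_clk, needing clks_per_block clocks per block. */
static double aes_throughput_bps(double f_clk, double clks_per_block)
{
    return 128.0 * f_clk / clks_per_block;
}

int main(void)
{
    /* Assumed figures from the discussion above. */
    printf("non-feedback, 10 clks/block : %.3g bit/s\n", aes_throughput_bps(100e6, 10.0));
    printf("CBC,          14 clks/block : %.3g bit/s\n", aes_throughput_bps(100e6, 14.0));
    printf("CBC, fast round, 6 clks/block: %.3g bit/s\n", aes_throughput_bps(100e6, 6.0));
    return 0;
}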

As I understand the original posting, these guys want to put some work
into solution 3 somehow.

My tip to manjunath & co.: have a look at the standard implementations
and the book "The Design of Rijndael" (ISBN 3540425802).
Identify the modules and start optimizing the design towards whatever
your goal is.

Have a nice synthesis
Eilert
 
Paul,

Nice post. Not often we agree.

Overall, I am very pleased with the thread.

LSI Logic was the dominant structured ASIC player with $42M in 2005
(numbers from iSuppli). There are 10 other vendors left now in this
space, for a total market of $155M in 2005 (same source).

Out of those 10 vendors, Toshiba announced in June that "there is no
money in this market" (from an EE Times article, 6/5/2005). Is this a
case of the "emperor having no clothes"? Or just no money in this business?

5 of those vendors offer 90nm; the rest all have older technologies,
some as old as 0.18 micron.

Two vendors that used to be in that business also left in the last year.

If Altera gets some of that $42M that LSI has dropped (although LSI will
convert anyone who desires it to the standard-cell flow), then that will be
good for Altera. Good for the other 9 players, too.

Generally, the structured ASIC vendors have as many mask sets as they
have customers, which means that the vendors have not saved a penny, and
in fact are losing money.

So, it is a great deal for a customer who wants a cheaper ASIC, but how
long will companies stay in the business of losing money?

When the dominant (that means #1) supplier of structured ASICs calls it
quits, that is not a sign of a healthy market (IMHO).

So, while Altera runs off to do (structured) ASICs, we will instead
continue to believe that programmable devices are the future, and
continue to spend all of our effort (as in R&D $) on innovation in that
field.

Austin
 
In article <1141986452.587421.137510@i40g2000cwc.googlegroups.com>,
<fpga_toys@yahoo.com> wrote:
Allan Herriman wrote:
You want to try it in your C -> hardware compiler? I'd be interested
in the results.

Me too :)

I'll take a look at it this weekend, as it might make another
interesting example for the next FpgaC release. I have a pipelined
RSA-72 I did two years ago when looking at building dnet engines that
is a monster because of the barrel shifters and LUT RAMs required for
retiming. First glance at the referenced materials suggests the problem
with AES is going to be 80 or more block RAMs for S-box lookup tables
to get any reasonable parallelism. It's not clear there is an easy way
to avoid using S-box tables, as the algorithm for creating the table is
iterative.
There has been a lot of research put into efficient implementations of
the S-boxes without using lookup tables;

http://www.st.com/stonline/press/magazine/stjournal/vol00/pdf/art08.pdf

might be an example. I went to a conference in August where
http://class.ee.iastate.edu/tyagi/cpre681/papers/AESCHES05.pdf was
presented, which runs AES at 25 Gbits/second on an XC3S2000; the round
function is pipelined into seven stages of three levels of LUT each.

Tom
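
As a footnote on the "table creation is iterative" point earlier in the
thread: the S-box can be generated in a few lines of C using the
standard construction (GF(2^8) inverse followed by the affine
transform). This is offered only as a software reference model to check
hardware against, not as anything taken from the papers above:

#include <stdint.h>
#include <stdio.h>

/* Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1. */
static uint8_t gmul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
        b >>= 1;
    }
    return p;
}

int main(void)
{
    uint8_t sbox[256];
    uint8_t p = 1, q = 1;

    /* Walk the whole multiplicative group using the generator 0x03:
     * after k steps, p = 3^k and q = 3^(-k) = p^(-1)  (0xF6 = 3^(-1)). */
    do {
        p = gmul(p, 0x03);
        q = gmul(q, 0xF6);

        /* Affine transform of the inverse: q ^ rotl(q,1..4) ^ 0x63. */
        uint8_t x = (uint8_t)(q
                  ^ (uint8_t)((q << 1) | (q >> 7))
                  ^ (uint8_t)((q << 2) | (q >> 6))
                  ^ (uint8_t)((q << 3) | (q >> 5))
                  ^ (uint8_t)((q << 4) | (q >> 4))
                  ^ 0x63);
        sbox[p] = x;
    } while (p != 1);

    sbox[0] = 0x63;   /* 0 has no inverse; it maps to the affine constant */

    for (int i = 0; i < 256; i++)
        printf("%02x%c", sbox[i], ((i & 15) == 15) ? '\n' : ' ');
    return 0;
}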
 
