FPGA with fully asynchronous RAM

Fuchs Gottfried · Jun 30, 2004

I am looking for an FPGA development board with an FPGA that supports
fully asynchronous RAMs. I know that the Altera APEX20K supports fully
asynchronous RAMs.
Are there also Xilinx FPGAs that provide this functionality (large FPGAs
comparable to the APEX20K1000C or APEX20K1500E)?

regards
Gottfried

John_H · Jun 30, 2004

If you need small RAMs, sure! But small is the key word. The two types of
memory in the Xilinx devices are BlockRAMs, which require a clock for the
read, and CLB SelectRAM distributed memory. In the Virtex-II and
Virtex-II Pro devices, the distributed memory supports a depth of up to 128
entries for single-port memories or 64 for dual-port. The memories can be any
width, distributed throughout the logic fabric. While BlockRAMs are more
efficient, supplying 18 kbit chunks, they require a clock to get the read
value. The CLB SelectRAMs have a fully asynchronous read. You still need an
edge for the write - the write pulse is generated internally from the
supplied edge - but you have the
asynchronous nature you just *have* to have.
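As a rough illustration of the behaviour described above (asynchronous read, edge-triggered write), here is a minimal behavioral sketch in Python. It is not Xilinx code and does not model timing: the class name `LutRam`, its method names, and the address wrap-around are assumptions made purely for illustration.

```python
class LutRam:
    """Behavioral model: asynchronous read, write on a rising clock edge.

    Hypothetical illustration only; names and wrap-around are assumptions.
    """

    def __init__(self, depth=64, width=1):
        self.depth = depth
        self.mask = (1 << width) - 1   # keep stored data within the word width
        self.mem = [0] * depth
        self._prev_clk = 0             # remembered to detect a 0 -> 1 edge

    def read(self, addr):
        # Asynchronous read: the output follows the address, no clock needed.
        return self.mem[addr % self.depth]

    def tick(self, clk, we, addr, data):
        # Synchronous write: only on a rising edge with write enable asserted.
        if clk == 1 and self._prev_clk == 0 and we:
            self.mem[addr % self.depth] = data & self.mask
        self._prev_clk = clk


ram = LutRam(depth=64, width=8)
ram.tick(0, 1, 5, 0xAB)   # clock low: nothing is written
before = ram.read(5)
ram.tick(1, 1, 5, 0xAB)   # rising edge: the write takes effect
after = ram.read(5)
```

The point of the sketch is the asymmetry: `read` needs no clock at all, while a write only happens when `tick` sees the clock go from 0 to 1.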

Peter Alfke · Jun 30, 2004

Xilinx LUT-RAMs read asynchronously and write synchronously (clocked).
Xilinx BlockRAMs perform read and write operations synchronously (clocked).
In most situations, clocked operation is preferable and inherently more
reliable.

Peter Alfke

From: Fuchs Gottfried <fuchs@ecs.tuwien.ac.at>
Newsgroups: comp.arch.fpga
Date: Wed, 30 Jun 2004 17:24:24 +0200
Subject: FPGA with fully asynchronous RAM

Symon · Jun 30, 2004

Hi Gottfried,
I assume you're talking about on chip RAM? If so, I respectfully suggest
that you don't need asynchronous RAMs at all. What you do need to do is
think harder about your application! You'll have a more robust solution if
you keep everything synchronous.
cheers, Syms.

Fuchs Gottfried · Jul 1, 2004

I know that synchronous RAMs are preferable, but the asynchronous RAMs
are needed in my design due to the fact that it is a processor design
that is fully asynchronous.

regards
Gottfried

Symon · Jul 1, 2004

Hi Gottfried,
I don't know, you academics poisoning those kids' minds with your crazy
plans! ;-)
Just kidding, sounds like an interesting project, I wish you the best of
luck with the timing tools provided by the FPGA manufacturers!
cheers, Syms.

Peter Alfke · Jul 1, 2004

"Fully asynchronous", ohmygod!
Well, you can always generate a strobe pulse when you want to read or write.
We would call that a clock, but if your religion forbids that, you can call
it something else. We are agnostic, up to a point.

Peter Alfke
=================================================

From: "Symon" <symon_brewer@hotmail.com>
Newsgroups: comp.arch.fpga
Date: Thu, 1 Jul 2004 09:39:08 -0700
Subject: Re: FPGA with fully asynchronous RAM

Nicholas Weaver · Jul 1, 2004

In article <40E3C0E0.5040101@ecs.tuwien.ac.at>,
Fuchs Gottfried <fuchs@ecs.tuwien.ac.at> wrote:

I know that synchronous RAMs are preferable, but the asynchronous RAMs
are needed in my design due to the fact that it is a processor design
that is fully asynchronous.

I'm sorry for the pain you are going to suffer making self timing
circuits in an FPGA.
--
Nicholas C. Weaver nweaver@cs.berkeley.edu

Ray Andraka · Jul 15, 2004

I'm not. FPGAs are specifically designed for synchronous logic designs.
While an async design can be done if done very carefully, the lack of support
for this by the tools makes it excruciating at best. Every once in a while
someone comes along with the bright idea to use an FPGA as a platform for
async logic experiments. A search of the literature should provide a trail of
efforts that all come to the same conclusion: that the tools and FPGAs don't
support it. There are plenty of academics that have already plowed this
path. Read their work and be forewarned.

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759

E. Backhus · Jul 16, 2004

Hi everybody,
while you are discussing asynchronous designs on FPGAs...

Does anybody have experience with the Balsa tools?

http://www.cs.man.ac.uk/apt/projects/tools/balsa/

And YES! Today's FPGAs are designed to work synchronously,
but there are asynchronous chips around (e.g. part of the SPARC 2i and
other µPs)
that seem to work sufficiently well. So, while it's still hard work, it's not
impossible to do.
My final question is: when will there be FPGAs with C-elements etc.?
(Small market at this time, I think.)

john jakson · Jul 16, 2004

ibis@tiscalinet.de (E. Backhus) wrote in message news:<e5e7ca2e.0407160017.61ceb013@posting.google.com>...

Hi everybody,
while you are discussing asynchronous designs on FPGAs...

Does anybody have experience with the Balsa tools?

http://www.cs.man.ac.uk/apt/projects/tools/balsa/

And YES! Today's FPGAs are designed to work synchronously,
but there are asynchronous chips around (e.g. part of the SPARC 2i and
other µPs)
that seem to work sufficiently well. So, while it's still hard work, it's not
impossible to do.
My final question is: when will there be FPGAs with C-elements etc.?
(Small market at this time, I think.)

An outgrowth of the Amulet project!

Well, Manchester always had a great reputation for these sorts of
things, but even in full custom design, async design is pushing a rock
up a hill. There are few commercial EDA tools in the ASIC market to my
knowledge, and few takers.

I suspect that for large ASICs, a better approach is to design clocked
subsystems with pseudo-async interfaces so any number of free cycles can
pass by between clocked blocks. It's still really a synced design with
sync handshakes, but more tolerant of delays that are in clock periods.
Still, that's hard too.

Mind you, with Xilinx talking up 500 MHz and my timing reports giving me
many single-wire nets in the 1-4 ns zone, maybe FPGAs will have to do the
same thing.

regards

johnjakson

What's a C-element?

E. Backhus · Jul 19, 2004

johnjakson@yahoo.com (john jakson) wrote in message

I suspect that for large ASICs, a better approach is to design clocked
subsystems with pseudo-async interfaces so any number of free cycles can
pass by between clocked blocks. It's still really a synced design with
sync handshakes, but more tolerant of delays that are in clock periods.
Still, that's hard too.

You're talking about "Asynchronous Communicating Processes":
systems that communicate across clock domains by handshake signals.
This can save a lot of energy if your system can be divided into parts
running at different clock speeds.
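To make the handshake idea concrete, here is a minimal sketch of one common scheme, a four-phase (return-to-zero) request/acknowledge handshake, written as a plain sequential trace in Python. The function name `four_phase_transfer` and the signal names `req` and `ack` are illustrative assumptions; real hardware would also pass each signal through synchronizer flip-flops in the receiving clock domain, which this sketch leaves out.

```python
def four_phase_transfer(values):
    """Trace a four-phase req/ack handshake for each value sent.

    Hypothetical illustration: returns the values the receiver latched
    and the (req, ack) pair after every handshake step.
    """
    req = ack = 0
    bus = None
    received = []
    trace = []
    for value in values:
        bus, req = value, 1        # 1. sender drives data, raises req
        trace.append((req, ack))
        received.append(bus)       # 2. receiver latches data, raises ack
        ack = 1
        trace.append((req, ack))
        req = 0                    # 3. sender sees ack, drops req
        trace.append((req, ack))
        ack = 0                    # 4. receiver sees req low, drops ack
        trace.append((req, ack))
    return received, trace


received, trace = four_phase_transfer([3, 7])
```

Neither side depends on how long the other takes between steps, which is why the two blocks can run at unrelated clock speeds.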

Mind you, with Xilinx talking up 500 MHz and my timing reports giving me
many single-wire nets in the 1-4 ns zone, maybe FPGAs will have to do the
same thing.

Using multiple independent clock domains and communicating via
handshakes?
I think lots of FPGA designs are doing that already.
Just look at the increasing number of available dedicated clock nets
in large FPGAs. Someone's gotta need them.

Besides that, speed is not the only (wanted or unwanted) advantage of
asynchronous designs. Power is another one. Have you recently taken a
look into a state-of-the-art PC? What good is the integration of the
several million transistors of a Pentium 4 when you need a cooling
plate the size of a big man's fist plus an extra fan? (Not to mention
the power supply.)

I wonder, besides any asynchronous stuff, if the same space could hold
a bunch of (slow) low power SoC microcomputers, working together as
known from grid computers.

Remember the old days, when a '386 was just a bare PLCC on the PCB? From
that time on, all the higher speed and integration made cooling
plates grow like fungi on the motherboards.

Well, it's starting to get philosophical here...but, I was just
wondering.

What's a C-Element?

A special kind of latch designed for use in asynchronous designs, as
mentioned in "Logic Synthesis of Asynchronous Controllers and
Interfaces" (Springer, ISBN 3-540-43152-7).
Unfortunately there is no commonly known terminology for asynchronous
building blocks.
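For readers who have not met one: the usual two-input (Muller) C-element drives its output high when both inputs are high, low when both are low, and otherwise holds its previous value. A minimal behavioral sketch in Python, where the class name `CElement` and the method name `update` are illustrative assumptions:

```python
class CElement:
    """Two-input Muller C-element: the output changes only when inputs agree."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, a, b):
        # Inputs agree: the output follows them. Inputs differ: hold state.
        if a == b:
            self.out = a
        return self.out


c = CElement()
outputs = [c.update(a, b) for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]]
```

The hold behaviour is what makes it useful for joining handshake signals: the output only moves once both inputs have arrived.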

regards
Eilert Backhus

john jakson · Jul 19, 2004

ibis@tiscalinet.de (E. Backhus) wrote in message news:<e5e7ca2e.0407182337.570d1070@posting.google.com>...

Have you recently taken a
look into a state-of-the-art PC? What good is the integration of the
several million transistors of a Pentium 4 when you need a cooling
plate the size of a big man's fist plus an extra fan? (Not to mention
the power supply.)

My family is all too aware of AthlonXP heat output, looking to kill it
one day with a Transputer or 2, but 1 should suffice. After all it
only surfs & plays net TV.

I wonder, besides any asynchronous stuff, if the same space could hold
a bunch of (slow) low power SoC microcomputers, working together as
known from grid computers.

Yes, that's what I am working on. Inmos did this 20 years ago, probably
20 years too early. I keep doing the same engineering calculation: Intel
ups the frequency of x86 by 30x and gets 30x the performance of the P100. BUT
transistor count also went up (a big no), and heat, noise and space too. That
used to be called bad engineering. Notice that bridge builders today
build lighter bridges than IKB did many years ago.

The Intel supporters will pooh-pooh that analysis, but if you have
distributed CPUs and local memory and know how to use them (Transputer
people do), you also get far more total memory bandwidth than pushing it all
up one pipe. There is also no reason to be limited to standard DRAM; there's
RLDRAM available with 20 ns RAS times. And with an MTA architecture, branching
and memory delays are better hidden than in single-threaded CPUs with ever
bigger caches.

Remember the old days, when a '386 was just a bare PLCC on the PCB? From
that time on, all the higher speed and integration made cooling
plates grow like fungi on the motherboards.

Yes, it's been a long time since I saw a fanless computer. My old BBC & QL
still got quite warm, though!

regards

johnjakson_usa_com

rickman · Jul 19, 2004

john jakson wrote:

My family is all too aware of AthlonXP heat output, looking to kill it
one day with a Transputer or 2, but 1 should suffice. After all it
only surfs & plays net TV.

Then why don't you dial down the clock and save all that heat? The
Athlon will run at lower voltages as well when you slow it down. The
entire system can run cooler and you might even be able to get rid of
the fan altogether. I did that with a Celeron. Of course I dialed it
back up after I ran the test. I need the speed for FPGA work.

I wonder, besides any asynchronous stuff, if the same space could hold
a bunch of (slow) low power SoC microcomputers, working together as
known from grid computers.

Yes, that's what I am working on. Inmos did this 20 years ago, probably
20 years too early. I keep doing the same engineering calculation: Intel
ups the frequency of x86 by 30x and gets 30x the performance of the P100. BUT
transistor count also went up (a big no), and heat, noise and space too. That
used to be called bad engineering. Notice that bridge builders today
build lighter bridges than IKB did many years ago.

The Intel supporters will pooh-pooh that analysis, but if you have
distributed CPUs and local memory and know how to use them (Transputer
people do), you also get far more total memory bandwidth than pushing it all
up one pipe. There is also no reason to be limited to standard DRAM; there's
RLDRAM available with 20 ns RAS times. And with an MTA architecture, branching
and memory delays are better hidden than in single-threaded CPUs with ever
bigger caches.

The only problem is the same one Motorola ran into: all the code is
written for the Intel architecture. You can say you have a better
solution, but it really is not practical for a typical app. One CPU
(even a monster Intel with heatsink) is much easier to design a system
with than 30 Transputers. Of course you get higher memory bandwidth,
you have a dozen more pins to memory!

Remember the old days, when a '386 was just a bare PLCC on the PCB? From
that time on, all the higher speed and integration made cooling
plates grow like fungi on the motherboards.

Yes, it's been a long time since I saw a fanless computer. My old BBC & QL
still got quite warm, though!

I bought a Walmart special with a Via CPU (I forget the name of it). It
was not fanless, but I unhooked the fan and it idled nearly at room temp
and only got up to about 55-60C when running a heat test program. That
is a little hot, but with a better cooling design I believe it could be
kept under 50C.

--

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design URL http://www.arius.com
4 King Ave 301-682-7772 Voice
Frederick, MD 21701-3110 301-682-7666 FAX

E. Backhus · Jul 20, 2004

johnjakson@yahoo.com (john jakson) wrote in message news:<adb3971c.0407190711.45a7a53a@posting.google.com>...

ibis@tiscalinet.de (E. Backhus) wrote in message news:<e5e7ca2e.0407182337.570d1070@posting.google.com>...
johnjakson@yahoo.com (john jakson) wrote in message

My family is all too aware of AthlonXP heat output, looking to kill it
one day with a Transputer or 2, but 1 should suffice. After all it
only surfs & plays net TV.

I just didn't dare to mention Transputers.

I wonder, besides any asynchronous stuff, if the same space could hold
a bunch of (slow) low power SoC microcomputers, working together as
known from grid computers.

Yes, that's what I am working on. Inmos did this 20 years ago, probably
20 years too early. I keep doing the same engineering calculation: Intel
ups the frequency of x86 by 30x and gets 30x the performance of the P100. BUT
transistor count also went up (a big no), and heat, noise and space too. That
used to be called bad engineering. Notice that bridge builders today
build lighter bridges than IKB did many years ago.

The Intel supporters will pooh-pooh that analysis, but if you have
distributed CPUs and local memory and know how to use them (Transputer
people do), you also get far more total memory bandwidth than pushing it all
up one pipe. There is also no reason to be limited to standard DRAM; there's
RLDRAM available with 20 ns RAS times. And with an MTA architecture, branching
and memory delays are better hidden than in single-threaded CPUs with ever
bigger caches.

There are lots of pros and cons to Transputer technology, but
what really broke their neck was the high price of the CPU alone
compared to a whole PC with (then) cheap network cards. The parallel
processing people started grid computing, and the controller people
were just happy with their (then) fast controllers (ARM etc.).

Today there is a possibility for the return of integrated parallel
processing architectures. The IEEE 1355 and SpaceWire interfaces
(successors of the Transputer links) are available as FPGA cores, and
combined with a CPU core and other fancy stuff (e.g. a hardware
scheduler) we get powerful Transputer substitutes on cool and cheap(?)
FPGA silicon.

regards
Eilert Backhus

john jakson · Jul 21, 2004

Exactly, but today it is even more prudent to consider the pluses of
FPGA design and work around the minuses to build a new Transputer, or
any CPU for that matter.

Once you have x MHz in an FPGA you get maybe 2-5x more in an ASIC. I've been
keeping an eye on 0.13 µm cell libs, and the critical path in both is
ultimately how fast a dual-port BlockRAM can cycle for about 512x32.
In Samsung's, it's near 1 GHz.

One very nice advantage of MTA is that it allows even that
bottleneck to be pipelined, although all the cells I've seen are single-cycle
1.0 ns designs. A fully pipelined DP SRAM could probably go 2x
faster still.

The cost isn't as low as I'd originally hoped, maybe in the 1K to 1.5K
flops ballpark, but an FPGA does allow interesting architectures to be
tried out for <<1/10th of the original Tp and probably 10-50x the performance,
so that's not a bad combination. Anyway, the IP will be fully portable.

SpaceWire/1355 certainly helps, but I haven't decided yet on link-layer
HW issues.

regards

johnjakson_usa_com
