Info on FPGA routing algorithms?

Fred Ma · Jul 9, 2004

I've been reading papers about routing in island-style FPGAs. Most
cite Xilinx architectures, though I've looked at a few papers about
H-tree networks. Often, there is a simplified model being used.
There is mention (kind of dated) that commercial routers use
derivatives of maze routing, with some more recent mention of channel
routing. Is there some papers that can give a good idea of how the
real industry software does global and detailed routing, what
algorithms are actually used? What is the typical lag time between
the advent of certain approaches in conference/journal papers versus
uptake in commercial routers? I'm kind of curious how much I can
should trust the papers as an indication of actual practice. As well,
I am still in rummaging mode, and have yet to rummage into a paper
that shows how the switches in the switch boxes are actually explored
to get detailed routing, given a non-full crossbar. I've looked at

Wu &Tsukiyama et al.: Graph analysis of 2D FPGA routing

but I'm hoping to rummage into something more applied.

Thanks.

Fred
--
Fred Ma
Dept. of Electronics, Carleton University
Ottawa, Ontario, Canada

Paul Leventis (at home) · Jul 10, 2004

Hi Fred,

These days, the best academic routers do combined global + detailed routing
using negototiated congestion. The base algorithm employed is known as
Pathfinder -- see "Placement and Routing Tools for the Triptych FPGA" by C.
Ebeling, L. McMurchie, S.A. Hauck, and S. Burns. A widely used tool is VPR
(by Vaughn Betz), which performs clustering, followed by placement (via
simulated annealing), and then timing-driven negotiated-congestion based
routing. It uses an improved form of the Pathfinder algorithm, but what
exact improvements it adds I forget, since VPR was commercialized and I no
longer remember what was in the original academic version! I *think* that
VPR (or some improved version of it) is still the best academic router,
quickly providing small channel widths and good timing.

There is a book "Architecture and CAD for Deep-Submicron FPGAs" by Betz,
Rose, and Marquardt that covers the algortihms employed in VPR and T-VPACK
(a timing-driven clusterer by Marquardt). It also covers the architecture
of island-style FPGAs and various experiments on the detailed routing
design. And there are a slew of papers at FPGA, ICCAD, FPL and other
conferences on FPGA routing. Go to http://www.eecg.toronto.edu/~vaughn/ to
see information on the "purple book" plus a list of papers on VPR by
Vaughn -- the refernces in these papers should serve as good starting point.

Regards,

Paul Leventis
Altera Corp.

Yttrium · Jul 10, 2004

thanx for the info, was looking for the same information ...

thx

"Paul Leventis (at home)" <paul.leventis@utoronto.ca> wrote in message
news:uWQHc.850486$Ar.77113@twister01.bloor.is.net.cable.rogers.com...

Hi Fred,

These days, the best academic routers do combined global + detailed
routing
using negototiated congestion. The base algorithm employed is known as
Pathfinder -- see "Placement and Routing Tools for the Triptych FPGA" by
C.
Ebeling, L. McMurchie, S.A. Hauck, and S. Burns. A widely used tool is
VPR
(by Vaughn Betz), which performs clustering, followed by placement (via
simulated annealing), and then timing-driven negotiated-congestion based
routing. It uses an improved form of the Pathfinder algorithm, but what
exact improvements it adds I forget, since VPR was commercialized and I no
longer remember what was in the original academic version! I *think* that
VPR (or some improved version of it) is still the best academic router,
quickly providing small channel widths and good timing.

There is a book "Architecture and CAD for Deep-Submicron FPGAs" by Betz,
Rose, and Marquardt that covers the algortihms employed in VPR and T-VPACK
(a timing-driven clusterer by Marquardt). It also covers the architecture
of island-style FPGAs and various experiments on the detailed routing
design. And there are a slew of papers at FPGA, ICCAD, FPL and other
conferences on FPGA routing. Go to http://www.eecg.toronto.edu/~vaughn/
to
see information on the "purple book" plus a list of papers on VPR by
Vaughn -- the refernces in these papers should serve as good starting
point.

Regards,

Paul Leventis
Altera Corp.

Fred Ma · Jul 10, 2004

"Paul Leventis (at home)" wrote:

These days, the best academic routers do combined global + detailed routing
using negototiated congestion. The base algorithm employed is known as
Pathfinder -- see "Placement and Routing Tools for the Triptych FPGA" by C.
Ebeling, L. McMurchie, S.A. Hauck, and S. Burns. A widely used tool is VPR
(by Vaughn Betz), which performs clustering, followed by placement (via
simulated annealing), and then timing-driven negotiated-congestion based
routing. It uses an improved form of the Pathfinder algorithm, but what
exact improvements it adds I forget, since VPR was commercialized and I no
longer remember what was in the original academic version! I *think* that
VPR (or some improved version of it) is still the best academic router,
quickly providing small channel widths and good timing.

There is a book "Architecture and CAD for Deep-Submicron FPGAs" by Betz,
Rose, and Marquardt that covers the algortihms employed in VPR and T-VPACK
(a timing-driven clusterer by Marquardt). It also covers the architecture
of island-style FPGAs and various experiments on the detailed routing
design. And there are a slew of papers at FPGA, ICCAD, FPL and other
conferences on FPGA routing. Go to http://www.eecg.toronto.edu/~vaughn/ to
see information on the "purple book" plus a list of papers on VPR by
Vaughn -- the refernces in these papers should serve as good starting point.

Thanks, Paul,

I did in fact encounter Vaughn Betz's papers on VPR. There is pretty
intuitive description of the VPR, and Pathfinder is described as an
iterative application of shortest path. I'm still gnawing
breadth-first on promising references from web-found papers. I'll
look more closely at Pathfinder to see if it details how the switchbox
settings are determined. Thanks for the pointer to Vaughn's book.

If anyone can comment on how relatively wide-spring are the various
algorithms, both in academia and industry, that would be helpful.
Papers often reference other papers, but don't actually indicate which
algorithms are used by which commercial tools, and how prevalent are
the various commercial tools. Vaughn's website says Right Track is
now part of Altera, so maybe Altera's own tools may start using ideas
from VPR. I have yet to come across information about what was the
algorithm prior to this. What about Xilinx? Is it the case that they
don't like to disclose the actual inner workings of their native
tools? How common is it to use 3rd party tools for place-and-route
rather than those from the vendor? In general, information about
which commercial tool (vendors) use which algorithms seems much rarer
than just the algorithms presented in isolation.

Fred
--
Fred Ma
Dept. of Electronics, Carleton University
Ottawa, Ontario, Canada

Paul Leventis (at home) · Jul 12, 2004

Hi Fred,

I'll look more closely at Pathfinder to see if it details how the
switchbox
settings are determined.

I'm not quite sure what you mean by this. If you're referring to how the
topology of the switchbox is determined (an architectural decision), there
are some other papers I can point you to.

From a routing algorithm perspective, it is really quite simple. You
represent the entire chip as a graph where each node represents a routing
resource -- a block input, block output, or routing wire. Directed edges
are placed between these nodes to represent the presence of a (programmable)
connection from one resource to another. So a swich box that is not a
complete cross-bar is encoded in this graph as edges. For each net, the
router starts at the source block output, traverses the edges from that
node, and assigns a desirability/cost to each of the nodes seen and places
them in a heap. It then removes the best node from the heap and repeats
from there until it hits the desired destination (a block input). This is
known as an a-star or best-first traversal, and clearly all the
work/intelligence lies in how you cost the nodes (including a prediction of
future cost), how you guide the router to correct congestion, etc. For a
multi-terminal net, the routing thus far is placed on the heap and the
router expands from there. There are optimizations that can be done to
reduce the amount of re-expansion of the wavefront, and other techniques for
quickly routing high-fanout nets.

Global routing (not worrying about precise wires and switch box settings) is
done using the same algorithm, except that rather than nodes representing
one wire, you represent a group of wires with a node of capacity = channel
width. A well-written global + detailed router does not require a global
route and performs fast enough that there really isn't any point to look at
global routing (in FPGAs) anymore.

If anyone can comment on how relatively wide-spring are the various
algorithms, both in academia and industry, that would be helpful.
Papers often reference other papers, but don't actually indicate which
algorithms are used by which commercial tools, and how prevalent are
the various commercial tools. Vaughn's website says Right Track is
now part of Altera, so maybe Altera's own tools may start using ideas
from VPR. I have yet to come across information about what was the
algorithm prior to this. What about Xilinx? Is it the case that they
don't like to disclose the actual inner workings of their native
tools? How common is it to use 3rd party tools for place-and-route
rather than those from the vendor? In general, information about
which commercial tool (vendors) use which algorithms seems much rarer
than just the algorithms presented in isolation.

We don't like to tell each other what we do in our tools for obvious reasons

VPR served as the basis for Right Track CAD's work, and Right Track
provided Altera with enhanced place & route results for their FLEX10K
products in MaxPlus II. What the algorithms were before or what they've
become since is not a matter of public record.

Regards,

Paul Leventis
Altera Corp.

Joseph H Allen · Jul 12, 2004

Do any placement algorithms try to make regular structures? For datapaths,
for example? This is how a human would do it, and might give excellent
results in some cases.

Although, you certainly want to architect FPGAs with enough routing
resources so that regular structures are not needed.

--
/* jhallen@world.std.com (192.74.137.5) */ /* Joseph H. Allen */
int a[1817];main(z,p,q,r){for(p=80;q+p-80;p-=2*a[p])for(z=9;z--

q=3&(r=time(0)
+r*57)/7,q=q?q-1?q-2?1-p%79?-1:0

%79-77?1:0

<1659?79:0

>158?-79:0,q?!a[p+q*2
]?a[p+=a[p+=q]=q]=q:0:0;for(;q++-1817

printf(q%79?"%c":"%c\n"," #"[!a[q-1]]);}

Tom Seim · Jul 12, 2004

Fred Ma <fma@doe.carleton.ca> wrote in message news:<40EF0407.8F0A430E@doe.carleton.ca>...

I've been reading papers about routing in island-style FPGAs. Most
cite Xilinx architectures, though I've looked at a few papers about
H-tree networks. Often, there is a simplified model being used.
There is mention (kind of dated) that commercial routers use
derivatives of maze routing, with some more recent mention of channel
routing. Is there some papers that can give a good idea of how the
real industry software does global and detailed routing, what
algorithms are actually used? What is the typical lag time between
the advent of certain approaches in conference/journal papers versus
uptake in commercial routers? I'm kind of curious how much I can
should trust the papers as an indication of actual practice. As well,
I am still in rummaging mode, and have yet to rummage into a paper
that shows how the switches in the switch boxes are actually explored
to get detailed routing, given a non-full crossbar. I've looked at

Wu &Tsukiyama et al.: Graph analysis of 2D FPGA routing

but I'm hoping to rummage into something more applied.

1 On optimum switch box designs for 2-D FPGAs
Hongbing Fan; Jiping Liu; Yu-Liang Wu; Chak-Chung Cheung;
Design Automation Conference, 2001. Proceedings , 18-22 June 2001
Pages:203 - 208

2 General models and a reduction design technique for FPGA switch box
designs
Hongbing Fan; Jiping Liu; Yu-Liang Wu;
Computers, IEEE Transactions on , Volume: 52 , Issue: 1 , Jan. 2003
Pages:21 - 30

3 Not necessarily more switches more routability [sic.]
Yu-Liang Wu; Chang, D.; Marek-Sadowska, M.; Tsukiyama, S.;
Design Automation Conference 1997. Proceedings of the ASP-DAC '97.
Asia and South Pacific , 28-31 Jan. 1997
Pages:579 - 584

4 The effect of switch box flexibility on routability of field
programmable gate arrays
Rose, J.; Brown, S.;
Custom Integrated Circuits Conference, 1990., Proceedings of the IEEE
1990 , 13-16 May 1990
Pages:27.5/1 - 27.5/4

5 General models for optimum arbitrary-dimension FPGA switch box
designs
Hongbing Fan; Jiping Liu; Yu-Liang Wu;
Computer Aided Design, 2000. ICCAD-2000. IEEE/ACM International
Conference on , 5-9 Nov. 2000
Pages:93 - 98

6 Graph based analysis of 2-D FPGA routing
Yu-Liang Wu; Tsukiyama, S.; Marek-Sadowska, M.;
Computer-Aided Design of Integrated Circuits and Systems, IEEE
Transactions on , Volume: 15 , Issue: 1 , Jan. 1996
Pages:33 - 44

7 On improving FPGA routability applying multi-level switch boxes
Jiping Liu; Hongbing Fan; Yu-Liang Wu;
Design Automation Conference, 2003. Proceedings of the ASP-DAC 2003.
Asia and South Pacific , 21-24 Jan. 2003
Pages:366 - 369

8 On computational complexity of a detailed routing problem in two
dimensional FPGAs
Yu-Liang Wu; Shuji Tsukiyama; Malgorzata Marek-Sadowska;
VLSI, 1994. 'Design Automation of High Performance VLSI Systems'. GLSV
'94, Proceedings., Fourth Great Lakes Symposium on , 4-5 March 1994
Pages:70 - 75

9 On optimal hyperuniversal and rearrangeable switch box designs
Hongbing Fan; Jiping Liu; Yu-Liang Wu; Chak-Chung Cheung;
Computer-Aided Design of Integrated Circuits and Systems, IEEE
Transactions on , Volume: 22 , Issue: 12 , Dec. 2003
Pages:1637 - 1649

10 A global routing model for universal switch box design
Hongbing Fan; Jiping Liu; Yu-Liang Wu;
Electronics, Circuits and Systems, 2000. ICECS 2000. The 7th IEEE
International Conference on , Volume: 1 , 17-20 Dec. 2000
Pages:78 - 81 vol.1

Fred Ma · Jul 13, 2004

"Paul Leventis (at home)" wrote:

I'll look more closely at Pathfinder to see if it details how the
switchbox
settings are determined.

I'm not quite sure what you mean by this. If you're referring to how the
topology of the switchbox is determined (an architectural decision), there
are some other papers I can point you to.

From a routing algorithm perspective, it is really quite simple. You
represent the entire chip as a graph where each node represents a routing
resource -- a block input, block output, or routing wire. Directed edges
are placed between these nodes to represent the presence of a (programmable)
connection from one resource to another. So a swich box that is not a
complete cross-bar is encoded in this graph as edges. For each net, the
router starts at the source block output, traverses the edges from that
node, and assigns a desirability/cost to each of the nodes seen and places
them in a heap. It then removes the best node from the heap and repeats
from there until it hits the desired destination (a block input). This is
known as an a-star or best-first traversal, and clearly all the
work/intelligence lies in how you cost the nodes (including a prediction of
future cost), how you guide the router to correct congestion, etc. For a
multi-terminal net, the routing thus far is placed on the heap and the
router expands from there. There are optimizations that can be done to
reduce the amount of re-expansion of the wavefront, and other techniques for
quickly routing high-fanout nets.

I sort of got lost with the heap, but don't worry, I was just getting
a rough idea. If needed, I will look up A* (I have papers on it).
The representation for wires/switches above matches that in
Pathfinder, and I noticed that Betz's VPR also has tricks to cut down
some work in restarting the wave front for multiterminal nets. The
use of arcs to represent switches seems to get rid of the division
between detailed and global routing.

Global routing (not worrying about precise wires and switch box settings) is
done using the same algorithm, except that rather than nodes representing
one wire, you represent a group of wires with a node of capacity = channel
width. A well-written global + detailed router does not require a global
route and performs fast enough that there really isn't any point to look at
global routing (in FPGAs) anymore.

Yes, I was noticing that in recent papers.

We don't like to tell each other what we do in our tools for obvious reasons
VPR served as the basis for Right Track CAD's work, and Right Track
provided Altera with enhanced place & route results for their FLEX10K
products in MaxPlus II. What the algorithms were before or what they've
become since is not a matter of public record.

Aaawwww. OK. Thanks. I understanding why there is a paucity of such
information.

Fred
--
Fred Ma
Dept. of Electronics, Carleton University
Ottawa, Ontario, Canada

Fred Ma · Jul 13, 2004

Thanks for the references, Tom. I looked briefly at the abstracts of a few,
and they are quite relevant. I will look at them in more detail.

Fred
--
Fred Ma
Dept. of Electronics, Carleton University
Ottawa, Ontario, Canada

Fred Ma · Jul 13, 2004

Joseph H Allen wrote:

Do any placement algorithms try to make regular structures? For datapaths,
for example? This is how a human would do it, and might give excellent
results in some cases.

Joseph, I'm no expert in the area, but seem to recall seeing this.
Look up the authors of C compilers for Garp, I think one of them
works on a fast linear placement algorithm. It might be from arraying
bit slices. I expect the granularity of one's slice to be highly
dependent on the target platforms array architecture.

Although, you certainly want to architect FPGAs with enough routing
resources so that regular structures are not needed.

That's the constant balance, it seems. Deciding ow to focused to make
your application domain, which determines the suite of applications
(and kinds of operations) you need to support, which helps you avoid
excessive flexibility in the array architecture. It's pretty vague,
but it seems to be the general motivation for reconfigurable logic.

Fred
--
Fred Ma
Dept. of Electronics, Carleton University
Ottawa, Ontario, Canada

Info on FPGA routing algorithms?

Fred Ma

Guest

Paul Leventis (at home)

Guest

Yttrium

Guest

Fred Ma

Guest

Paul Leventis (at home)

Guest

Joseph H Allen

Guest

Tom Seim

Guest

Fred Ma

Guest

Fred Ma

Guest

Fred Ma

Guest

Welcome to EDABoard.com

Sponsor

Online statistics

Forum statistics

Info on FPGA routing algorithms?

Fred Ma

Guest

Paul Leventis (at home)

Guest

Yttrium

Guest

Fred Ma

Guest

Paul Leventis (at home)

Guest

Joseph H Allen

Guest

Tom Seim

Guest

Fred Ma

Guest

Fred Ma

Guest

Fred Ma

Guest

Log in

Welcome to EDABoard.com

Sponsor