Why is router software not multi-threaded?

P

Peter Sommerfeld

Guest
Excuse my ignorance if there is an obvious answer ...

Is it true that the routing algorithms used in Altera's and Xilinx'
tools are single-threaded? If so, what is the primary difficulty with
making them multi-threaded? Is the algorithm particularly difficult to
partition into parallel units? Or has it simply not been a priority
for companies making the tools?

Now that some of the projects I am working on are taking quite a while
to fit, I'm thinking it would be nice to parcel out a fit between 5
machines sometime in the future when/if the software supports it.

-- Pete
 
Look at the PAR section of the dev.pdf (Developers Reference Guide)
in your Xilinx doc directory. The -m option of par allows you to specify
a list of workstations to parse out place-and-route runs. I used this 3
years ago in environment of Solaris workstations ..... the effort level to
get it working was minimal.

This does not mean that par is multi-threaded. I just means that you
can run N different place-and-routes using different placement starting
points (cost table entries) across multiple machines.

--
Regards,
John Retta
Owner and Designer
Retta Technical Consulting Inc.

email : jretta@rtc-inc.com
web : www.rtc-inc.com


"Peter Sommerfeld" <petersommerfeld@hotmail.com> wrote in message
news:5c4d983.0401220801.700517ba@posting.google.com...
Excuse my ignorance if there is an obvious answer ...

Is it true that the routing algorithms used in Altera's and Xilinx'
tools are single-threaded? If so, what is the primary difficulty with
making them multi-threaded? Is the algorithm particularly difficult to
partition into parallel units? Or has it simply not been a priority
for companies making the tools?

Now that some of the projects I am working on are taking quite a while
to fit, I'm thinking it would be nice to parcel out a fit between 5
machines sometime in the future when/if the software supports it.

-- Pete
 
Is it true that the routing algorithms used in Altera's and Xilinx'
tools are single-threaded? If so, what is the primary difficulty with
making them multi-threaded? Is the algorithm particularly difficult to
partition into parallel units? Or has it simply not been a priority
for companies making the tools?
This would be nice...Xilinx willing comment?
 
Peter Sommerfeld wrote:
Excuse my ignorance if there is an obvious answer ...

Is it true that the routing algorithms used in Altera's and Xilinx'
tools are single-threaded? If so, what is the primary difficulty with
making them multi-threaded? Is the algorithm particularly difficult to
partition into parallel units? Or has it simply not been a priority
for companies making the tools?
I think by definition, a routing task is a single threaded process.
Although there may be some ways to use multitasking. I belive that a
routing problem is what computer scientists call NP complete. It is a
process of exploring a tree (a very huge tree) finding the shortest path
to a leaf (which represents a completed route). The only algorithm I
know of that can find the best path without exploring the entire tree is
to prune branches when you decide that they are already longer than the
shortest path you have found so far. So at any given node, there are
multiple paths that can be followed which could be partioned out as
multiple tasks, but if this is done at a low level it will be a very
short lived process. Even near the top levels any given subtask could
reach a dead end very quickly.

In the end, no one ever finds the best path. Algorithms are used that
try the most likely *good* paths and hope for the best.

I expect the work involved in splitting off subtasks would be
significant compared to the basic function of searching the tree. But
certainly this is worth a look unless it has already been determined
that the overhead is too large.


Now that some of the projects I am working on are taking quite a while
to fit, I'm thinking it would be nice to parcel out a fit between 5
machines sometime in the future when/if the software supports it.
If you are just trying to get a solution, then multiprocessing would be
a benifit. But for most people who are looking to improve their fit,
multiple runs work pretty well.
 
"John Retta" <jretta@rtc-inc.com> wrote in message news:<hxUPb.22161$1e.13971@newsread2.news.pas.earthlink.net>...
Look at the PAR section of the dev.pdf (Developers Reference Guide)
in your Xilinx doc directory. The -m option of par allows you to specify
a list of workstations to parse out place-and-route runs. I used this 3
years ago in environment of Solaris workstations ..... the effort level to
get it working was minimal.

This does not mean that par is multi-threaded. I just means that you
can run N different place-and-routes using different placement starting
points (cost table entries) across multiple machines.
This feature (-m) is called the Turns Engine and will be available
under Linux beginning with version 7.1i. No multi-threaded support
yet, but it's being considered for a future release.

Regards,
Bret

--
Regards,
John Retta
Owner and Designer
Retta Technical Consulting Inc.

email : jretta@rtc-inc.com
web : www.rtc-inc.com


"Peter Sommerfeld" <petersommerfeld@hotmail.com> wrote in message
news:5c4d983.0401220801.700517ba@posting.google.com...
Excuse my ignorance if there is an obvious answer ...

Is it true that the routing algorithms used in Altera's and Xilinx'
tools are single-threaded? If so, what is the primary difficulty with
making them multi-threaded? Is the algorithm particularly difficult to
partition into parallel units? Or has it simply not been a priority
for companies making the tools?

Now that some of the projects I am working on are taking quite a while
to fit, I'm thinking it would be nice to parcel out a fit between 5
machines sometime in the future when/if the software supports it.

-- Pete
 
(reverse order -- information first, pseudo-rant second :))

Now that some of the projects I am working on are taking quite a while
to fit, I'm thinking it would be nice to parcel out a fit between 5
machines sometime in the future when/if the software supports it.
Quartus has a mode known as Fast Fit. This turns down the fitter effort
levels and disables some of the fancier but expensive algorithms. The
result is a 50% faster compile at the expense of about 10% performance. If
you're hitting your performance target, you might as well enable this
option.

You can also parallelize the multiple compiles across multiple processors
(trying different options, etc.). This doesn't help you with your big
design's compile time, unless you break it up into pieces and use the
modular design features such as LogicLock to compile each piece and then
recombine them in a final compile. But this would likely loose you some
performance too...

Is it true that the routing algorithms used in Altera's and Xilinx'
tools are single-threaded? If so, what is the primary difficulty with
making them multi-threaded? Is the algorithm particularly difficult to
partition into parallel units? Or has it simply not been a priority
for companies making the tools?
Parallelizing a routing (or placement) algorithm is very difficult. The
problem is that the routing of different nets in different
threads/processors may seem like a natural division, except that the routing
of one net will be affected by the routing of another. I guess you could
partition the nets in the device in a way that they are unlikely to interact
with one another, and then route these nets in parallel, and after each
iteration repartition and try to reconcile conflicts (nets using the same
wire twice) the next time around, but I imagine that you rapidly run into
the situation that after the first few iterations, everything left to
resolve is interconnected in some way and you can no longer parallelize...

On top of this, we'd quickly hit Amdahl's law -- if you make routing
parallelizable by N, run time will quickly be dominated by another piece of
P&R and so performance will not get better by a factor of N. If you look at
all the things done during a compile -- synthesis, tech-mapping, clustering,
placement, routing, delay annotation/computation, timing analysis at various
steps, etc. -- that's a lot of very different algorithms that all need to be
parallelized in order to realize a significant gain in compile time.

In addition, the approximations/limitations we would need to impose on the
algorithms in order to parallize them may very well result in a reduction in
achieved performance & fitting. Let's say parallel place & route achieves
10% worse performance for 60% run-time gain. Would this be worth it? Not
if you can instead enable "Fast Fit" mode in serial place and route and try
a little less hard and thus lose 10% performance for a 50% gain...

There is a fair bit of academic literature on the subject. I can't recall
having seen any implementations of parallel place & route that competed well
with the best academic serial place & route tools, but then again I haven't
looked that hard.

Regards,

Paul Leventis
Altera Corp.
 

Welcome to EDABoard.com

Sponsor

Back
Top