Need to speed up Stratix compiles.


Pete Fraser

We're currently running a 3 GHz Pentium with 2 GB
memory under Windows 2000.

We hope to speed things up by 15-20%, by going
to AMD X86-64 and / or Linux.

Has anybody tried this?
Any feedback?
 
No, but I'd like to know the results on the AMD 64.

I sped up my 3.2GHz P4 compiles by 20% by making sure my memory was
running at dual-channel 400MHz. It turned out it was running dual-channel
333MHz.

I actually had to downgrade my memory slightly, because my motherboard saw
CAS 2 DDR and dropped to 333. With CAS 2.5 it was happy to run at 400.

YMMV,
Ken

"Pete Fraser" <pete@rgb.com> wrote in message
news:103vj8jovcv761e@news.supernews.com...
We're currently running a 3 GHz Pentium with 2 GB
memory under Windows 2000.

We hope to speed things up by 15-20%, by going
to AMD X86-64 and / or Linux.

Has anybody tried this?
Any feedback?
 
I went from a 2.5GHz Pentium to a 3GHz Xeon and got a very consistent
33% speed increase in Stratix compiles and SOPC Builder generation. I
suspect the increased cache size is the most critical thing, since
clock rate increased by only 20%, but I'm only speculating. Both
machines ran XP with 1 GB of RAM.

I'm curious to hear how your compiles improve with AMD/Linux.

-- Pete

"Pete Fraser" <pete@rgb.com> wrote in message news:<103vj8jovcv761e@news.supernews.com>...
We're currently running a 3 GHz Pentium with 2 GB
memory under Windows 2000.

We hope to speed things up by 15-20%, by going
to AMD X86-64 and / or Linux.

Has anybody tried this?
Any feedback?
 
Pete Fraser wrote:
We're currently running a 3 GHz Pentium with 2 GB
memory under Windows 2000.

We hope to speed things up by 15-20%, by going
to AMD X86-64 and / or Linux.

Has anybody tried this?
Any feedback?
I can't tell you what to expect from an AMD64 with today's software, but
I expect this will be the platform of choice for the next couple of
years. It may not be the best investment at the moment, but I expect by
the end of the year, much of the software will be optimized for 64 bit
operation and you will see over half the new engineering workstations
running an AMD64 processor.

IIRC, AMD is producing a low-cost version of the AMD64. I expect sales
will take off very quickly. Once these start showing up on software
developers' desks, we will see them optimizing for it.

--

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design URL http://www.arius.com
4 King Ave 301-682-7772 Voice
Frederick, MD 21701-3110 301-682-7666 FAX
 
"Pete Fraser" <pete@rgb.com> writes:

We're currently running a 3 GHz Pentium with 2 GB
memory under Windows 2000.

We hope to speed things up by 15-20%, by going
to AMD X86-64 and / or Linux.

Has anybody tried this?
Quartus II 3.0 does not run on X86-64. See
news:<87k76ox8ya.fsf@zener.home.gustad.com> or http://tinyurl.com/2pvlj

The fix should be trivial since it's just the driver script that doesn't
recognize the architecture. If it tried to run some x86 code rather than
checking uname, it would work. I dunno about 4.0 though.
Anybody tried?
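
In the meantime, a crude workaround might be to make the script's uname
check see a 32-bit architecture. A minimal sketch of the idea (untested,
and how the check actually works is my assumption): put a fake uname ahead
of /bin/uname in PATH just while launching Quartus.

#!/usr/bin/env python
# Hypothetical fake "uname": forwards to the real one but reports i686
# instead of x86_64, so a launcher script's architecture check passes.
import os
import sys

args = " ".join(sys.argv[1:])
real = os.popen("/bin/uname " + args).read()
sys.stdout.write(real.replace("x86_64", "i686"))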

Petter
--
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
 
On Fri, 27 Feb 2004 15:04:46 -0800, Pete Fraser wrote:

We're currently running a 3 GHz Pentium with 2 GB
memory under Windows 2000.
The server versions of Win2000 can handle up to 32GB of RAM, if that's
the main limiting factor. Can get pricey, though :o((

We hope to speed things up by 15-20%, by going
to AMD X86-64 and / or Linux.

Has anybody tried this?
Any feedback?
Unless there's a version of the compiler specifically for a 64-bit
architecture, then you're unlikely to see any real speed gain to
justify the cost, and even if there is, I doubt the gains would be all
that impressive.

64-bit CPUs really only come into their own in applications that need
to access very large virtual address spaces (>4GB). Mostly, that's
server-type apps. The need for 64-bit arithmetic is likely very small
in this case.

Can the compiler multi-thread? If so, a mobo with a couple of HT Xeons
(4 CPUs) will give you all the extra horsepower you'll need.
If not, a dual-processor system would still perform a lot better,
since one CPU can work flat-out on the compile, while the other is
handling the OS and other background tasks.

All this assumes that the compiler's performance is, in fact, CPU or
memory bound as you imply. Are you actually certain that this is
indeed the case? Might a faster disk system help?

--
Max
 
Seems to be a common misconception that 64 bits just increases the amount of
addressable memory. More important for most applications is that twice
the data is moved or operated on per clock cycle.

Ken

"Max" <mtj2@btopenworld.com> wrote in message
news:6ig9409fn2h2futm3hjjmkd4uiv4lpkmu2@4ax.com...
On Fri, 27 Feb 2004 15:04:46 -0800, Pete Fraser wrote:

We're currently running a 3 GHz Pentium with 2 GB
memory under Windows 2000.

The server versions of Win2000 can handle up to 32GB of RAM, if that's
the main limiting factor. Can get pricey, though :o((

We hope to speed things up by 15-20%, by going
to AMD X86-64 and / or Linux.

Has anybody tried this?
Any feedback?

Unless there's a version of the compiler specifically for a 64-bit
architecture, then you're unlikely to see any real speed gain to
justify the cost, and even if there is, I doubt the gains would be all
that impressive.

64-bit CPUs really only come into their own in applications that need
to access very large virtual address spaces (>4GB). Mostly, that's
server-type apps. The need for 64-bit arithmetic is likely very small
in this case.

Can the compiler multi-thread? If so, a mobo with a couple of HT Xeons
(4 CPUs), will give you all the extra horsepower you'll need.
If not, a dual-processor system would still perform a lot better,
since one CPU can work flat-out on the compile, while the other is
handling the OS and other background tasks.

All this assumes that the compiler's performance is, in fact, CPU or
memory bound as you imply. Are you actually certain that this is
indeed the case? Might a faster disk system help?

--
Max
 
On the disk speed issue I have one data point. I upgraded my 1GHz PIII-M
laptop drive from a slow 4200 RPM to the fastest 7200 RPM available (for
laptops) and my Nios system build went from about 16 min. to about 15 min.
Not worth the pain and expense of swapping the drive.

On memory, I upgraded the memory in my 3.2 GHz P4 from 512 to 1GB and there
was no noticeable difference until I set the memory from 333MHz to 400MHz
dual channel. Then my system build went from 5 min. to 4 min. - 20%.

Ken

"Max" <mtj2@btopenworld.com> wrote in message
news:6ig9409fn2h2futm3hjjmkd4uiv4lpkmu2@4ax.com...
On Fri, 27 Feb 2004 15:04:46 -0800, Pete Fraser wrote:

We're currently running a 3 GHz Pentium with 2 GB
memory under Windows 2000.

The server versions of Win2000 can handle up to 32GB of RAM, if that's
the main limiting factor. Can get pricey, though :o((

We hope to speed things up by 15-20%, by going
to AMD X86-64 and / or Linux.

Has anybody tried this?
Any feedback?

Unless there's a version of the compiler specifically for a 64-bit
architecture, then you're unlikely to see any real speed gain to
justify the cost, and even if there is, I doubt the gains would be all
that impressive.

64-bit CPUs really only come into their own in applications that need
to access very large virtual address spaces (>4GB). Mostly, that's
server-type apps. The need for 64-bit arithmetic is likely very small
in this case.

Can the compiler multi-thread? If so, a mobo with a couple of HT Xeons
(4 CPUs), will give you all the extra horsepower you'll need.
If not, a dual-processor system would still perform a lot better,
since one CPU can work flat-out on the compile, while the other is
handling the OS and other background tasks.

All this assumes that the compiler's performance is, in fact, CPU or
memory bound as you imply. Are you actually certain that this is
indeed the case? Might a faster disk system help?

--
Max
 
Hi Max,

All this assumes that the compiler's performance is, in fact, CPU or
memory bound as you imply. Are you actually certain that this is
indeed the case? Might a faster disk system help?
Provided the peak memory consumption of Quartus for the compilation in
question is less than the amount of physical memory in the system,
increasing the amount of memory will not help compile time. For non-trivial
designs, a Quartus compile will be most heavily influenced by CPU speed, and
then by memory sub-system speed -- disk speed will have little influence.

CAD tools process a lot of data. I don't know if a Xeon (bigger cache) is
much faster than a normal P4 (smaller cache), but I wouldn't be surprised if
this were the case for the same reason that a Xeon processor is supposedly
better for server applications -- bigger cache helps applications whose data
set doesn't fit into the cache.

Regards,

Paul Leventis
Altera Corp.
 
Seems to be a common misconception that 64bits just increases the amount
of
addressable memory. More importantly for most applications is that twice
the data is moved or operated on per clock cycle.
64-bitness _is_ mostly about addressable memory -- it is rare that 64-bit
integers help reduce run-time. Please see my previous postings on the topic
and some of the replies to it:

http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&q=Paul+Leventis+64-bit

Regards,

Paul Leventis
Altera Corp.
 
Paul Leventis (at home) wrote:
Seems to be a common misconception that 64bits just increases the amount
of addressable memory. More importantly for most applications is that twice
the data is moved or operated on per clock cycle.

64-bitness _is_ mostly about addressable memory -- it is rare that 64-bit
integers help reduce run-time. Please see my previous postings on the topic
and some of the replies to it:

http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&q=Paul+Leventis+64-bit
I think the OP was referring to the wider datapaths.
I don't know the cycle-level details of the AMD or Intel 64-bit parts,
but an obvious and simple speed gain can come from a wider HW fetch
(even running < 64-bit opcodes), plus a simple check whether the next
opcode / next data value is already in that block.

This works in systems where the CPU must wait for slower downstream
memories, and even the smaller single-chip microcontrollers are
starting to do this, e.g. the Philips ARM uCs have a 128-bit fetch.
Clearly, random code or data will not be helped, but a large %
of code will be sequential.
I'm not sure the AMD/Intel offerings match the SIMD (single
instruction, multiple data) capabilities of other cores, but even without
that, some HW gains would be expected.
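
Just to make the idea concrete, here's a toy model (my own sketch, not any
real CPU's fetch logic): sequential accesses after the first are served from
the already-fetched wide block, while scattered accesses pay the full memory
latency every time.

FETCH_WORDS = 4      # e.g. a 128-bit fetch of 32-bit words
MEM_LATENCY = 10     # cycles for an external memory access (made-up number)
BUF_LATENCY = 1      # cycles when the word is already in the fetch buffer

def cycles(addresses):
    buffered = None                  # base address of the block currently held
    total = 0
    for a in addresses:
        base = a - (a % FETCH_WORDS)
        if base == buffered:         # the simple "is it already here?" check
            total += BUF_LATENCY
        else:
            total += MEM_LATENCY
            buffered = base
    return total

print(cycles(range(16)))             # sequential: 4 fetches + 12 buffer hits = 52
print(cycles([0, 9, 2, 13] * 4))     # scattered: every access refetches = 160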
-jg
 
Hi Jim,

I think the OP was refering to the wider datapaths.
I don't know the cycle level details of the AMD or Intel 64 bit
but an obvious and simple speed gain can come from a wider HW fetch.
(even running < 64 bit opcodes ) and then a simple check if the next
opcode / next data value is in that block.
Yes, wider memory interfaces/cache data lines can help, but as you say, this
is independent of op-code size. If I recall correctly, AMD and Intel
processors already fetch 64-bit blocks, but this may have been increased.
The latest m/b chipsets for both families of processors use dual-channel DDR
(128-bits wide) and so I would not be surprised if they've increased the
size of fetches.

As vendors introduce 64-bit capable processors (such as Opteron), they often
also enhance various aspects of the CPU architecture in ways that help both
32- and 64-bit code. And while the 64-bitness of x86-64 may not matter much
for speed, the doubling of the register files etc. could result in faster
performance.

It's every computer engineer's dream to be a processor architect, isn't it?
:)

Regards,

- Paul
 
Max <mtj2@btopenworld.com> writes:

If you want a 64-bitter to really earn it's corn, use it in something
like a database server with 64GB of RAM and a multi-TB disk farm. Give
Or running synthesis, place & route, static timing analysis etc. on an
ASIC design requiring 6GB RAM.

Petter
--
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
 
"Paul Leventis (at home)" wrote:
Hi Jim,

I think the OP was refering to the wider datapaths.
I don't know the cycle level details of the AMD or Intel 64 bit
but an obvious and simple speed gain can come from a wider HW fetch.
(even running < 64 bit opcodes ) and then a simple check if the next
opcode / next data value is in that block.

Yes, wider memory interfaces/cache data lines can help, but as you say, this
is independent of op-code size. If I recall correctly, AMD and Intel
processors already fetch 64-bit blocks, but this may have been increased.
The latest m/b chipsets for both families of processors use dual-channel DDR
(128-bits wide) and so I would not be surprised if they've increased the
size of fetches.

As vendors introduce 64-bit capable processors (such as Opteron), they often
also enhance various aspects of the CPU architecture in ways that help both
32- and 64-bit code. And while the 64-bitness of x86-64 may not matter much
for speed, the doubling of the register files etc. could result in faster
performance.

It's every computer engineers dream to be a processor architect, isn't it?
:)
We can all speculate about the relative merits of processor
enhancements, but these machines are very complex and the only real way
to tell what helps is to try it. Since we are not all ancient Greeks
philosophizing in our armchairs, it would be a good idea to pick a
design and to run it on a few different workstations, hopefully
including an AMD64.

I have always been surprised that the FPGA vendors don't put some effort
into evaluating platforms and releasing the results. I know this can be
a bit of a can of worms, but every time I look at buying a new machine,
the first question I research is how fast it will run the FPGA design
software. Then I am often trying to speculate on my own since I don't
have much info to go on.

I seem to recall that there at least used to be some available info on
how much memory was needed to optimize run time as a function of part
size. But I haven't seen new info on that in quite a while.

--

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design URL http://www.arius.com
4 King Ave 301-682-7772 Voice
Frederick, MD 21701-3110 301-682-7666 FAX
 
On Tue, 2 Mar 2004 19:57:58 -0600, Kenneth Land wrote:

Seems to be a common misconception that 64bits just increases the amount of
addressable memory.
The only common misconception is that swapping for a 64-bit processor
in a desktop PC will lead to a large performance increase. It doesn't.
(Other than any gain from a higher clock speed, of course.)

Like to make a guess as to the extra overhead in a 64-bit version of
current OSs, btw?

More importantly for most applications is that twice
the data is moved or operated on per clock cycle.
Data is only data if it's meaningful. The use of 64-bit arithmetic
variables is comparatively rare in most applications. Certain
scientific and CAD packages do make heavy use of 64-bit floats, but I
doubt that's the case here (and high-end processors tend to use 80-bit
data paths around the FPU anyway). There's not a lot to be gained from
accessing memory in 64-bit chunks if you're only interested in 32 of
them (there is an effect on cache hits with vectors, but it's not
measurably worthwhile in practice).

There will be some effect on prefetch, but it depends on the state of
the L1 and L2 caches and the instruction pipeline(s) themselves. Tests
I've seen suggest an increase of memory bandwidth efficiency of only
around 1-2% at best.

If you want a 64-bitter to really earn its corn, use it in something
like a database server with 64GB of RAM and a multi-TB disk farm. Give
the poor thing something *meaningful* to do with the extra 32 bits.
You'd still need 64-bit software though.

--
Max
 
"rickman" <spamgoeshere4@yahoo.com> wrote in message
news:404603D6.AA20F818@yahoo.com...

We can all speculate about the relative merits of processor
enhancements, but these machines are very complex and the only real way
to tell what helps is to try it. Since we are not all ancient Greeks
philosophizing in our armchairs, it would be a good idea to pick a
design and to run it on a few different workstations, hopefully
including an AMD64.

I have always been surprised that the FPGA vendors don't put some effort
into evaluating platforms and releasing the results.
I had assumed that had happened already. Silly me.

Perhaps we'll just buy an AMD machine and see what it does,
but I thought somebody might have tried that already.

Anybody know how solid the Quartus II 4.0 Linux port is?
I can't get an answer out of Altera.
 
On Tue, 2 Mar 2004 20:05:37 -0600, Kenneth Land wrote:

On the disk speed issue I have one data point. I upgraded my 1GHz PIII-M
laptop drive from a slow 4200 RPM to the fastest 7200 RPM available (for
laptops) and my Nios system build went from about 16 min. to about 15 min.
Not worth the pain and expense of swapping the drive.
Not in a low-spec machine like that, no. The options in a laptop are
limited, and there's no way to increase the disk controller bandwidth.
But the effect on a powerful workstation of installing a RAID with a
high-bandwidth controller and drives such as U-320 SCSI can have a
dramatic impact. As always though, it depends on the application.

On memory, I upgraded the memory in my 3.2 GHz P4 from 512 to 1GB and there
was no noticable difference until I set the memory from 333MHz to 400MHz
dual channel. Then my system build went from 5 min. to 4 min. - 20%.
That doesn't mean a lot. You only need to add more memory if you're
running out of it ;o)

--
Max
 
Max <mtj2@btopenworld.com> writes:

Is there any possibility of making Quartus multi-threaded? That
strikes me as the most likely way to get a dramatic performance
increase, though I know it's not always easy to achieve with heuristic
apps.
I would like to see synthesis and place-and-route tools I could
run on a cluster of cheap PCs. I would be happy with less-than-linear
speedups, e.g. using a 16-node cluster to get an 8x speedup.
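
A quick Amdahl's-law sanity check of that (my own numbers, just for intuition
about how much of the flow would have to distribute cleanly, and ignoring
communication overhead):

def speedup(parallel_fraction, nodes):
    # Amdahl's law: the serial part stays serial, the parallel part divides by nodes
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / nodes)

for p in (0.90, 0.933, 0.95):
    print("p = %.3f -> %.1fx on 16 nodes" % (p, speedup(p, 16)))
# p = 0.900 -> 6.4x, p = 0.933 -> 8.0x, p = 0.950 -> 9.1x
# i.e. roughly 93% of the run time has to parallelize to hit 8x on 16 nodes.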

Petter
--
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
 
On Wed, 03 Mar 2004 04:47:19 GMT, Paul Leventis (at home) wrote:

Provided the peak memory consumption of Quartus for the compilation in
question is less than the amount of physical memory in the system,
increasing the amount of memory will not help compile time. For non-trivial
designs, a Quartus compile will be most heavily influenced by CPU speed, and
then by memory sub-system speed -- disk speed will have little influence.
I suspected that might be the case, but I wasn't quite sure.
I'm more used to programming language tools that use library files
extensively, where a fast disk system (or a big ramdisk) can give very
worthwhile speed gains.

Is there any possibility of making Quartus multi-threaded? That
strikes me as the most likely way to get a dramatic performance
increase, though I know it's not always easy to achieve with heuristic
apps.

CAD tools process a lot of data. I don't know if a Xeon (bigger cache) is
much faster than a normal P4 (smaller cache), but I wouldn't be surprised if
this were the case for the same reason that a Xeon processor is supposedly
better for server applications -- bigger cache helps applications whose data
set doesn't fit into the cache.
While the extra cache is important in itself, much of the performance
gain of the Xeon is also due to the greater degree of parallelism and
deeper prefetch lookahead, thus making better use of memory bandwidth
throughout.

--
Max
 
On 03 Mar 2004 18:46:34 +0100, Petter Gustad wrote:

I would like to get see synthesis and place and route tools I could
run on a cluster of cheap PC's. I would be happy with less than linear
speedups, e.g. using a 16-node cluster to get a 8x speedup.
I doubt you'd get anywhere near. Trying to implement those algorithms
efficiently on the sort of loosely-coupled architecture you propose
would be nigh-on impossible. It's not easy on a single SMP box, but
it's doable.

A quad Xeon (8 x CPU) box would cost less than four single decent-spec
machines anyway.

--
Max
 
