Simon Peacock
Guest
One extra point.. it might be possible to implement C, but C++ is less
likely. C++ supports abstract concepts which are difficult to put into
hardware. The trick with VHDL or Verilog is often thinking "how would I
implement this in logic?", so abstraction is out the door immediately. C is
relatively good at low level, so consider an FPGA as something akin to a
driver.
One seriously huge advantage an FPGA has over any processor is massive
parallelisation.. that is, it can do many things simultaneously. At the
extreme, there are brute-force encryption breakers that simply try every
combination of key in parallel!! This is where you gain... If you are
considering the FPGA as a peripheral to a processor, then it will most
likely run only as fast as the processor can give it data and take back
results. But give an FPGA freedom and it will direct-convert a 1.5 GHz
radio signal and not break a sweat.
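To make the parallelism point concrete, here is a minimal C sketch of a brute-force key search over a toy XOR "cipher" (names and the cipher are purely illustrative, not a real algorithm): the CPU must walk the key space one candidate per iteration, while an FPGA could instantiate all 256 comparators and test every key in a single clock.

```c
#include <stdint.h>

/* Toy illustration only: recover an 8-bit XOR "key" by trying every
 * candidate.  A CPU iterates sequentially; an FPGA can instantiate
 * all 256 comparators and check every key in one clock. */
static uint8_t toy_encrypt(uint8_t plain, uint8_t key) {
    return plain ^ key;
}

/* Sequential brute force: worst case 256 trials on a CPU. */
int brute_force_key(uint8_t plain, uint8_t cipher) {
    for (int key = 0; key < 256; key++)
        if (toy_encrypt(plain, (uint8_t)key) == cipher)
            return key;          /* found the matching key */
    return -1;                   /* no key matches */
}
```

The hardware version is simply 256 copies of the comparator; the software version pays one loop iteration per candidate.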
I would have to suggest that you look at Verilog or VHDL... for starters,
they support parallelism... next, they are typed languages and force
casting. And there aren't any pointers or gotos.
There are simulators for VHDL and Verilog which will allow you to see what
you are doing... Otherwise what you write in C.. which is a sequential
language... may not be what actually happens. VHDL processes, which
translate only roughly (not well) to C functions, all execute at the same
time. C only handles this by kicking off threads, and a C program with 200
threads might not be considered manageable by some.. but that's what VHDL
does! And C's support of semaphores would make a hardware designer cringe.
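A rough C analogue of the point above, assuming POSIX threads (all names here are illustrative): two "processes" that in VHDL would simply run concurrently each need their own thread plus explicit locking in C, which is exactly the overhead being complained about.

```c
#include <pthread.h>
#include <stdint.h>
#include <stddef.h>

/* Two concurrent "processes" sharing a signal: free in hardware,
 * but in C each needs a thread and a lock around the shared state. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static uint32_t shared_counter = 0;

static void *process_body(void *arg) {
    (void)arg;
    for (int i = 0; i < 1000; i++) {
        pthread_mutex_lock(&lock);
        shared_counter++;              /* like a clocked signal update */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

uint32_t run_two_processes(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, process_body, NULL);
    pthread_create(&b, NULL, process_body, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return shared_counter;             /* both increments accounted for */
}
```

Scale this to 200 processes and the thread-management burden the post describes becomes obvious.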
Simon
"JJ" <johnjakson@yahoo.com> wrote in message
news:1112666920.468559.68270@l41g2000cwc.googlegroups.com...
I'd agree, the PCI will kill you first, and anything difficult for the FPGA
but easy on the PC will kill you again, and finally C++ will not be as fast
as HDL: by my estimate maybe 2-5x slower (my pure prejudice). If you must
use C, take a look at Handel-C; at least it's based on Occam, so it's
provably able to synthesize into HW because it isn't really C, it just
looks like it. If you absolutely must use IEEE math to get particular
results, forget it, but I usually find these barriers are artificial; a
good amount of transforms can flip things around entirely.
To be fair, an FPGA PCI card could wipe out a PC only if the problem is a
natural fit: say, continuously processing a large stream of raw data,
either from converters or a special interface, and then reducing it in some
way to a report level. Perhaps a hard drive could be specially interfaced
to the PCI card to bypass the OS; I'm not sure if that can really help,
getting high end there. Better still if the operators involved are simple
but occur in the hundreds at least in parallel.
The x86 has at least a 20x starting clock advantage: 20 ops per FPGA clock
for simple inline code. An FPGA solution would really have to be several
times faster to even make it worth considering. A couple of years ago, when
PCI was relatively faster and PCs & FPGAs relatively slower, the bottleneck
would have been less of a problem.
BUT, I also think that x86 is way overrated, at least when I measure
numbers. One thing FPGAs do with relatively no penalty is randomized
processing. The x86 can take a huge hit if the application goes from
entirely inside cache to almost never inside, by maybe a factor of 5, but
it depends on how close data is temporally and spatially.
Now, standing things upside down: take some arbitrary HW function based on
some simple math that is unnatural to the PC, say summing a vector of
13-bit saturated numbers. This uses about a quarter less HW than the 16-bit
version, but that sort of thing starts to torture the x86, since now each
trivial operator needs to do a couple of things, maybe even perform a test
and branch per point, which will hurt the branch predictor. Imagine the
test is strictly a random choice: real murder on the predictor and
pipeline.
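A sketch of that 13-bit saturated sum in C (names and exact widths are my own, purely illustrative): note the mask plus compare-and-branch on every element, which is the per-point cost that never appears in the narrower FPGA adder.

```c
#include <stdint.h>
#include <stddef.h>

#define SAT13_MAX 0x1FFFu   /* largest 13-bit unsigned value, 8191 */

/* Sum a vector with 13-bit saturation.  In an FPGA the narrower
 * adder is simply less logic; on x86 every element costs an extra
 * compare and a potentially mispredicted branch. */
uint32_t sum_sat13(const uint16_t *v, size_t n) {
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += v[i] & SAT13_MAX;   /* confine each input to 13 bits */
        if (acc > SAT13_MAX)       /* the per-point test that hurts */
            acc = SAT13_MAX;       /* the branch predictor */
    }
    return acc;
}
```

When the saturation test fires at random, as the post notes, the branch is essentially unpredictable.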
Taken to its logical extreme, even quite simple projects, such as say a CPU
emulator, can run hundreds of times slower as C code than as the actual HW,
even at the FPGA's leisurely rate of 1/20th the PC clock.
It all depends. One thing to consider, though, is the system bandwidth in
your problem for moving data into & out of RAMs or buffers. Even a modest
FPGA can handle 200-plus reads/writes per clock, whereas I suspect most
x86s can really only express 1 load or store to a cached location about
every 5 ops. Then the FPGA starts to shine with a 200 vs 4 ratio (20 ops
per FPGA clock at 1 access per 5 ops).
Also, when you start in C++, you have already favored the PC, since you
likely expressed ints as 32-bit numbers and used FP. If you're using FP
when integer can work, you really stacked the deck, but that can often be
undone. When you code in HDL for the data size you actually need, you are
favoring the FPGA by the same margin in reverse. Mind you, I have never
seen FP math get synthesized; you would have to instantiate a core for
that.
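For instance, coding for exactly the precision you need might look like Q4.12 fixed point in C (the format and names are chosen arbitrarily here for illustration): the multiply is a widen, multiply, and shift, with no FP unit needed, which maps directly to a small HDL multiplier.

```c
#include <stdint.h>

/* Q4.12 fixed point: a 16-bit signed value with 12 fractional bits,
 * i.e. range roughly -8.0 .. +8.0.  Illustrative format only. */
typedef int16_t q4_12;

#define Q12_ONE (1 << 12)   /* the value 1.0 in Q4.12 */

/* Multiply two Q4.12 values: widen to 32 bits, multiply, then drop
 * the 12 extra fractional bits the product accumulated. */
q4_12 q12_mul(q4_12 a, q4_12 b) {
    return (q4_12)(((int32_t)a * (int32_t)b) >> 12);
}
```

In HDL the same operation is a 16x16 multiplier and a wire slice; defaulting to 32-bit floats instead is the deck-stacking the post describes.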
One final option to consider: use an FPGA CPU, take a 20x performance cut,
and run the code on that. The hit might not even be 20x, because the SRAM
or even DRAM is at your speed rather than hundreds of times slower than the
PC's. Then look for opportunities to add a special-purpose instruction and
see what the impact of 1 kernel op might be. An example crypto op might
easily replace 100 opcodes with just 1 op. Now also consider you can gang
up a few CPUs too.
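As a sketch of the "1 op replaces many opcodes" idea, consider population count, assuming a soft CPU where the parallel reduction below could become a single custom instruction while the generic loop costs up to 32 shift/mask/add iterations:

```c
#include <stdint.h>

/* Generic code: up to 32 iterations of shift, mask and add opcodes. */
uint32_t popcount_generic(uint32_t x) {
    uint32_t count = 0;
    while (x) {
        count += x & 1u;
        x >>= 1;
    }
    return count;
}

/* The same function as a branch-free parallel reduction.  On a soft
 * CPU this whole expression could be wired up as one custom
 * instruction, collapsing the loop above into a single opcode. */
uint32_t popcount_custom(uint32_t x) {
    x = x - ((x >> 1) & 0x55555555u);                 /* 2-bit sums  */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit sums  */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit sums  */
    return (x * 0x01010101u) >> 24;                   /* total count */
}
```

The crypto-op example in the post is the same trade: a routine of dozens of opcodes becomes one cheap piece of dedicated logic.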
It just depends on what you are doing and whether it's mostly I/O or mostly
internal crunching.
johnjakson at usa dot com