graphics card accelerator vs. FPGA: which is better for the f

walala

Dear all,

I guess this is a ray-tracing problem... but I need to do this task at the
highest possible speed/throughput. Here is my problem:

Suppose I am given 25 rays and a 3D cube, and all parameters of these rays
and of the cube are known...

I need to compute, as fast as possible, the length of the segment along
which each ray intersects this cube. If a ray falls completely outside the
cube, the output is 0; otherwise it is the length.

I have heard there are some very good graphics cards with accelerators...
and I have heard the bus bandwidth can be as high as 500MHz... but I am not
sure whether they have acceleration functions suited to my task.

I am also thinking of doing this with an FPGA hooked onto an Intel PC
running Linux... I don't know the details, but I guess it would use PCI or
some other bus to talk to the CPU and serve as a coprocessor...

I want to know which method is better.

Once this throughput problem is solved, the next bottleneck will be the 1GB
of memory that I need... Does a graphics card have 1GB of cache/memory on
board? Since it often needs to do triple-buffering, I guess it should have a
huge high-speed memory, right?

I also don't know how the maximum processing speed of a high-end graphics
card compares with that of a high-end FPGA implementation.

Can anybody give me some comments/suggestions/advice/hints/pointers on this?

Thanks a lot,

-Walala
 
Hi!

I guess this is a ray-tracing problem... but I need to do this task at the
highest possible speed/throughput. Here is my problem:

Suppose I am given 25 rays and a 3D cube, and all parameters of these rays
and of the cube are known...

I need to compute, as fast as possible, the length of the segment along
which each ray intersects this cube. If a ray falls completely outside the
cube, the output is 0; otherwise it is the length.

- Are the edges of the cube parallel to the coordinate axes?
- Is there anything special about the cube (size, orientation, rotation,
location, etc.)?
- In what format are the rays and the cube defined?

I have heard there are some very good graphics cards with accelerators...
and I have heard the bus bandwidth can be as high as 500MHz... but I am not
sure whether they have acceleration functions suited to my task.

Possibly not. First, you would have to get data *back* from the
accelerator, which is something they are not designed for. As someone once
told me, they operate in a 'write and forget' mode. Second, none of the
accelerators I know of do ray-tracing. However, your problem is not a
complete ray-tracing problem, so you might be able to tweak the functions of
an accelerator to give you your answer.

I am also thinking of doing this with an FPGA hooked onto an Intel PC
running Linux... I don't know the details, but I guess it would use PCI or
some other bus to talk to the CPU and serve as a coprocessor...

That's a possibility. You can find PCI FPGA prototyping cards for this
purpose.


BTW, if you need to process 1GB of data (assuming that's the total amount
of traffic), you would need at least 7.75 seconds just to transfer the data
over a 33MHz PCI bus, not counting other PCI traffic and other issues. If
that's too slow, you would need a) 66MHz, b) 64-bit, c) a PCI-X bus, and of
course a PC that supports these.
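A quick way to redo this arithmetic for the faster options is a few lines of Python. This is only a sketch: it assumes peak rates (one full-width bus word per clock, no protocol overhead) and takes 1 GB as 10^9 bytes, which gives about 7.6 s for 33MHz/32-bit PCI (about 8.1 s if 1 GB means 2^30 bytes).

```python
# Back-of-the-envelope bus transfer times (peak rates, no protocol overhead).
# Assumption: 1 GB = 10**9 bytes.
DATA_BYTES = 10**9

# (name, clock in Hz, data width in bytes), nominal peak figures
BUSES = [
    ("PCI 33MHz/32-bit", 33e6, 4),
    ("PCI 66MHz/32-bit", 66e6, 4),
    ("PCI 33MHz/64-bit", 33e6, 8),
    ("PCI 66MHz/64-bit", 66e6, 8),
    ("PCI-X 133MHz/64-bit", 133e6, 8),
]


def transfer_seconds(data_bytes, clock_hz, width_bytes):
    """Lower bound on transfer time: one full-width word per clock."""
    return data_bytes / (clock_hz * width_bytes)


if __name__ == "__main__":
    for name, clock, width in BUSES:
        rate_mb_s = clock * width / 1e6
        secs = transfer_seconds(DATA_BYTES, clock, width)
        print(f"{name:20s} {rate_mb_s:6.0f} MB/s  {secs:5.2f} s per GB")
```

Even at these idealized peak rates, only the PCI-X entry moves 1GB in under a second; real throughput will be lower.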

I want to know which method is better.

It depends on many things:
- Speed requirement ("as fast as possible" is not specific enough)
- Numerical precision
- Price concerns
- Project deadlines
- More precise description of the problem (see above)

Once this throughput problem is solved, the next bottleneck will be the 1GB
of memory that I need... Does a graphics card have 1GB of cache/memory on
board?

None I've heard of.

Since it often needs to do triple-buffering, I guess it should have a huge
high-speed memory, right?

Huge means 128-256MB nowadays.

I also don't know how the maximum processing speed of a high-end graphics
card compares with that of a high-end FPGA implementation.

Impossible to answer in general.

Can anybody give me some comments/suggestions/advice/hints/pointers on this?

What I would suggest is to write a software-only solution for your problem.
That would give you, or anyone else, a pretty good definition of the
problem. After that, you will probably be able to ask much better questions.

Regards,
Andras Tantos
 
"Andras Tantos" <andras_tantos@tantos.yahoo.com> wrote in message news:<3fbe66ee$1@news.microsoft.com>...

Hi, Andras,

Thank you very much for your answer!

I guess the first thing I need to make clear to myself is the essence of
this problem: is it a ray-tracing problem or a collision-detection problem?

I need to identify the name of the problem first; then I can go out and
search for similar applications...

Can you help me on that?

Thanks a lot,

-Walala
 
Hi!

I would think your problem is an intersection problem, but only you can find
out the true nature of your problem.

Andras
 
"walala" <mizhael@yahoo.com> wrote in message news:<bpjug7$1so$1@mozo.cc.purdue.edu>...
Dear all,

I guess this is a ray-tracing problem... but I need to do this task at the
highest possible speed/throughput. Here is my problem:

Suppose I am given 25 rays and a 3D cube, and all parameters of these rays
and of the cube are known...

I need to compute, as fast as possible, the length of the segment along
which each ray intersects this cube. If a ray falls completely outside the
cube, the output is 0; otherwise it is the length.

Let the cube be given by three normal vectors n1 to n3 and six points p1 to
p6, one on each of the six planes.
(Actually, you can use the same point multiple times.)
Assume your rays start at the origin and are given by a direction vector r
of length 1.
Then the ray intersects the first plane at a distance
d1 = n1.p1/(n1.r) = n1.p1 * (1/(n1.r))
(the plane is the set of points x with n1.x = n1.p1; substituting x = d1*r
gives d1*(n1.r) = n1.p1).
See http://geometryalgorithms.com/Archive/algorithm_0104/algorithm_0104B.htm#Line-Plane%20Intersection
Then you order the planes according to d.
If the ray does not cross the three front planes (of each pair of parallel
planes, the one it reaches first) before any of the other three, the cube
is missed; otherwise the difference between the fourth and the third
distance is the length of the intersection.
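Before going to hardware, the procedure can be checked in software, as Andras suggested. The following is a minimal Python sketch of the ordering test described above, not Kolja's own code. Assumptions not in the post: the six face points are passed so that points[2*i] and points[2*i+1] lie on the two faces with normal normals[i] (a hypothetical layout); rays parallel to a pair of faces are handled as a special case; and the result is clipped to t >= 0 because the rays start at the origin. For a ray that starts elsewhere, subtract its start point from the six points first.

```python
import math


def dot(a, b):
    """3-component dot product."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def segment_length(normals, points, r):
    """Length of the part of the ray t*r (t >= 0) that lies inside the cube.

    normals: three face normals n1..n3.
    points:  six points; points[2*i] and points[2*i+1] lie on the two
             opposite faces with normal normals[i] (hypothetical layout).
    r:       unit direction of a ray starting at the origin.
    """
    events = []  # (distance along the ray, True if this is an entry plane)
    for i, n in enumerate(normals):
        nr = dot(n, r)
        c_a = dot(n, points[2 * i])      # n.p depends only on the cube
        c_b = dot(n, points[2 * i + 1])
        if nr == 0.0:
            # Ray parallel to this pair of faces: it is either always
            # between them (origin between the planes) or never.
            if min(c_a, c_b) <= 0.0 <= max(c_a, c_b):
                events += [(-math.inf, True), (math.inf, False)]
                continue
            return 0.0
        inv = 1.0 / nr                   # one division per pair of planes
        d_a, d_b = c_a * inv, c_b * inv  # d = n.p / (n.r)
        events += [(min(d_a, d_b), True), (max(d_a, d_b), False)]

    events.sort(key=lambda e: e[0])      # order the planes by distance
    if not all(is_entry for _, is_entry in events[:3]):
        return 0.0                       # an exit plane came first: miss
    entry, exit_ = events[2][0], events[3][0]
    # Clip to t >= 0 because the rays start at the origin.
    return max(0.0, exit_ - max(entry, 0.0))
```

For example, with the axis-aligned box 1 <= x <= 3, -1 <= y <= 1, -1 <= z <= 1 and normals along the axes, the ray along +x should give a length of 2 and the ray along +y should give 0.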

So for each ray you get three divisions, four multiplications and a couple
of min/max cells.
(Many more optimizations are possible due to symmetries.)
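The min/max cells map naturally onto the well-known "slab" formulation, which replaces the sort with a running maximum of entry distances and a running minimum of exit distances; it should give the same lengths as the ordering test described above. In this minimal Python sketch (not necessarily what Kolja had in mind), the products n.p depend only on the cube, so they are computed once and shared by all 25 rays. Per ray, this sketch then does three dot products n.r, three divisions and six multiplications; for an axis-aligned cube the dot products reduce to picking one component of r. The function names and the points[2*i] / points[2*i+1] face layout are hypothetical.

```python
def dot(a, b):
    """3-component dot product."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def precompute_cube(normals, points):
    """Per-cube constants (normal, n.p on one face, n.p on the opposite face).

    Assumes points[2*i] and points[2*i+1] lie on the faces with normal normals[i].
    """
    return [(n, dot(n, points[2 * i]), dot(n, points[2 * i + 1]))
            for i, n in enumerate(normals)]


def segment_length_slab(cube, r):
    """Intersection length of the ray t*r (t >= 0, |r| = 1) with the cube."""
    t_enter, t_exit = 0.0, float("inf")    # start already clipped to t >= 0
    for n, c_a, c_b in cube:
        nr = dot(n, r)
        if nr == 0.0:
            # Parallel to this pair of faces: inside the slab or a miss.
            if not (min(c_a, c_b) <= 0.0 <= max(c_a, c_b)):
                return 0.0
            continue
        inv = 1.0 / nr                      # one division per pair of faces
        d_a, d_b = c_a * inv, c_b * inv     # two multiplications
        t_enter = max(t_enter, min(d_a, d_b))
        t_exit = min(t_exit, max(d_a, d_b))
    return max(0.0, t_exit - t_enter)


def segment_lengths(cube, rays):
    """Apply one precomputed cube to a batch of rays (e.g. the 25 rays)."""
    return [segment_length_slab(cube, r) for r in rays]
```

The loop body has no data-dependent branching apart from the parallel-ray case, which is part of what makes this form attractive for a hardware pipeline.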

With integers you should be able to do that in a pipeline in a small
Spartan-III a lot faster than you can get data into the chip.
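To put rough numbers on "a lot faster than you can get data into the chip", here is a Python back-of-the-envelope sketch. The 100MHz pipeline clock and the 12 bytes per ray are assumptions chosen for illustration, not figures from the thread, and the sketch ignores the traffic needed to send results back.

```python
# Rough throughput comparison: how fast PCI can feed rays vs. how fast a
# pipeline could consume them. All figures are illustrative assumptions.
PCI_BYTES_PER_S = 33e6 * 4      # 33MHz, 32-bit PCI, peak rate
BYTES_PER_RAY = 3 * 4           # assumption: ray direction as three 32-bit integers
PIPELINE_CLOCK_HZ = 100e6       # assumption: fully pipelined, one ray per clock

rays_in_per_s = PCI_BYTES_PER_S / BYTES_PER_RAY   # what the bus can deliver
rays_out_per_s = PIPELINE_CLOCK_HZ                # what the pipeline could accept
busy_fraction = rays_in_per_s / rays_out_per_s

if __name__ == "__main__":
    print(f"PCI delivers about {rays_in_per_s / 1e6:.0f} M rays/s")
    print(f"pipeline accepts up to {rays_out_per_s / 1e6:.0f} M rays/s")
    print(f"pipeline busy about {busy_fraction:.0%} of the time")
```

Under these assumptions the pipeline would be idle most of the time, so the bus, not the arithmetic, sets the throughput.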

With floating-point numbers it should still be very fast in an FPGA,
but the design gets a lot more complicated and larger.

Have fun,

Kolja Sulimma
 
walala wrote:

Once this throughput problem is solved, the next bottleneck will be the 1GB
of memory that I need... Does a graphics card have 1GB of cache/memory on
board? Since it often needs to do triple-buffering, I guess it should have a
huge high-speed memory, right?

Graphics cards use AGP, now at 8x: the AGP 8x interface gives 2.1 GB/s of
bandwidth. Since this is MUCH faster than the PCI bus, they handle huge
amounts of data a lot better.
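The 2.1 GB/s figure follows from the AGP base clock of about 66MHz, eight transfers per clock and a 32-bit data path. The sketch below reproduces it and compares it with 33MHz/32-bit PCI. These are peak rates with no protocol overhead, and they say nothing about how fast results can be read back from the card, which was the concern raised earlier in the thread.

```python
# Peak AGP 8x bandwidth vs. 33MHz/32-bit PCI (no protocol overhead).
AGP_BASE_CLOCK_HZ = 66.67e6   # AGP base clock (about 66.67 MHz)
TRANSFERS_PER_CLOCK = 8       # "8x" = eight data transfers per base clock
AGP_WIDTH_BYTES = 4           # 32-bit data path

agp8x_bytes_per_s = AGP_BASE_CLOCK_HZ * TRANSFERS_PER_CLOCK * AGP_WIDTH_BYTES
pci_bytes_per_s = 33e6 * 4

if __name__ == "__main__":
    print(f"AGP 8x peak: {agp8x_bytes_per_s / 1e9:.2f} GB/s")
    print(f"ratio to 33MHz PCI: {agp8x_bytes_per_s / pci_bytes_per_s:.0f}x")
    print(f"1 GB (10**9 bytes) over AGP 8x: at least {1e9 / agp8x_bytes_per_s:.2f} s")
```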

BUT suppose someone builds an AGP 8x board with a fast FPGA.
Then you have suddenly reduced the data transfer bottleneck.

Suppose you use a Xilinx Pro, or add a PowerPC as an option, together
with monitor-out circuitry.
Then you would get quite an interesting board that could be used
to run the X server for Unix/Linux :)

But you can still buy PCI graphics boards...

/RogerL
--
Roger Larsson
Skellefteå
Sweden
 
"Kolja Sulimma" <news@sulimma.de> wrote in message
news:b890a7a.0311260523.6d9f4f77@posting.google.com...
Thanks a lot, Kolja,

Very informative... I need to digest your answer...

-Walala
 

Welcome to EDABoard.com
