wallge
Guest
I was wondering if anyone has experience combining FPGA-based CPUs
with surrounding logic to perform iterative algorithms.
For instance, if we want to implement more complex computer vision
algorithms in an embedded system, we may wish to use the parallelism
of an FPGA to compute multiple parts of a 2D convolution or matrix
operation at the same time.
While the FPGA may be able to handle the number-crunching requirements
of a given algorithm, it seems to me to be ill-suited to handle the
iterative (often non-systolic) nature of many advanced image processing
algorithms. The control flow of the more advanced computer vision
algorithms often seems too complex to be handled solely by FPGA-based
logic.
I was thinking of the case where we have an FPGA connected directly to a
video source, and data is flowing into the system at some fixed rate.
We may wish to process this data at several scales, and iteratively
search the low scales up to the higher ones until we have found
features of interest in the video stream. Perhaps we wish to mark those
features by altering pixels in their local neighborhood.
We may need to iteratively process multiple scales of image data and
buffer the original video frame in off-FPGA DRAM, since there will not
be enough on-FPGA BRAM to store full images. Once we find the region of
interest, we may then wish to retrieve the original to be marked and
then sent off as output video. A good example of this process might be,
say, face detection.
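
To make the multi-scale idea concrete, here is a minimal software sketch
of a coarse-to-fine search. It is only an illustration under stated
assumptions: the "feature" is simply the brightest pixel (a stand-in for
a real detector such as a face classifier), the image dimensions are
assumed to be divisible by 2 at every level, and the names downsample,
build_pyramid, brightest and coarse_to_fine are hypothetical.

```python
def downsample(img):
    """Halve each dimension by averaging 2x2 blocks (even dimensions assumed)."""
    h, w = len(img), len(img[0])
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]


def build_pyramid(img, levels):
    """Return [full_res, half_res, quarter_res, ...] with `levels` entries."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def brightest(img, y0, y1, x0, x1):
    """Coordinates of the brightest pixel inside the window [y0, y1) x [x0, x1)."""
    best = None
    for y in range(max(y0, 0), min(y1, len(img))):
        for x in range(max(x0, 0), min(x1, len(img[0]))):
            if best is None or img[y][x] > img[best[0]][best[1]]:
                best = (y, x)
    return best


def coarse_to_fine(img, levels=3):
    """Find the feature at the coarsest scale, then refine it scale by scale."""
    pyramid = build_pyramid(img, levels)
    coarse = pyramid[-1]
    y, x = brightest(coarse, 0, len(coarse), 0, len(coarse[0]))
    for level in range(levels - 2, -1, -1):
        # A coarse pixel (y, x) covers the 2x2 block starting at (2y, 2x)
        # in the next finer level, so only that block is searched.
        y, x = brightest(pyramid[level], 2 * y, 2 * y + 2, 2 * x, 2 * x + 2)
    return y, x


frame = [[0] * 8 for _ in range(8)]
frame[5][2] = 255  # synthetic "feature"
print(coarse_to_fine(frame))
```

In the system described above, the full-resolution frame would sit in
off-FPGA DRAM and only the small window around the current candidate
would need to be fetched at each refinement step. A real detector would
probably search a wider window than the 2x2 block used here.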
It seems to me that the iterative nature of these kinds of algorithms
needs to be handled by a combination of CPU and FPGA logic. The FPGA
would handle the number crunching and parallel data paths, while the
CPU would decide when to iterate, when to stop, or, in general, which
decision to take next based on the results of the FPGA's number
crunching. The CPU could be built from programmable logic, or placed
off-FPGA.
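
The split I have in mind can be sketched in software as follows. This
is a rough model only, not a real driver: FakeAccelerator stands in for
the FPGA datapath, and the run method, the scores and the threshold are
all made up for illustration. The point is that the CPU side holds
nothing but the iterate/stop policy.

```python
class FakeAccelerator:
    """Stand-in for the FPGA datapath: scores every window at one scale."""

    def __init__(self, scores_per_scale):
        self.scores_per_scale = scores_per_scale  # hypothetical canned results
        self.calls = 0

    def run(self, scale):
        # Real hardware would be started through a control register and
        # would report completion via an interrupt or a status flag.
        self.calls += 1
        return max(self.scores_per_scale[scale])


def control_loop(accel, num_scales, threshold):
    """CPU-side policy: go from coarsest to finest scale, stop at the first hit."""
    for scale in range(num_scales - 1, -1, -1):
        score = accel.run(scale)
        if score >= threshold:
            return scale, score  # feature found, no need to iterate further
    return None  # nothing of interest in this frame


accel = FakeAccelerator({2: [0.1, 0.2], 1: [0.3, 0.9], 0: [0.95, 0.4]})
result = control_loop(accel, num_scales=3, threshold=0.8)
print(result, accel.calls)
```

Here the loop stops after two accelerator runs because the second scale
already exceeds the threshold, so the finest scale is never processed.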
Does anyone have experience with this kind of thing, or know of
somewhere I might be able to find more information about optimal ways
of coupling heterogeneous processors?
I am aware of Altera's C2H compiler, but have not used it, and don't
know how optimally it combines FPGA/CPU resources.
I might be in the market to hire a consultant, if one were
knowledgeable in this area.