JustJohn
On Jan 24, 11:36 am, "wallge" <wal...@gmail.com> wrote:
I am doing some embedded video processing, where I store an incoming
frame of video and then, based on some calculations in another part of the
system, warp that buffered frame. When the frame goes into the buffer
(an off-FPGA SDRAM chip), it is simply written one pixel at a time
in row-major order.
The problem is that I will not be reading it back in that order. I
may want to do an arbitrary image rotation, which means
the first pixel I want to access is not the first one I put in the
buffer; it might actually be the last one. If I do
full-page reads, or even burst reads, I get a bunch of pixels that
I don't need to determine the output pixel value. If I just do
single reads, I waste a bunch of clock cycles setting up the SDRAM:
telling it which row to activate and which column to read from, and, after
the read is done, issuing the precharge command to close
the row. This is very inefficient: it takes 5,
maybe 10 clock cycles just to retrieve one pixel value.
Does anyone know a good way to organize a frame buffer so that it is
friendlier to (and more efficient for) nonsequential access, like the kind we
might need to warp the input image via some
linear/nonlinear transformation?
A fairly simple technique, reasonably well-known among video system
designers, is to use what is sometimes called tiling the image into the
((DDR)S)DRAM columns.
E.g., assume a 1Kx1K image with vertical and horizontal address bits
(V9..V0) and (H9..H0), and also DRAM with row and column address bits
(R9..R0) and (C9..C0). Do _not_ use the straight mapping of:
(V9..V0) <=> (R9..R0) and (H9..H0) <=> (C9..C0)
Instead, map the H/V LSBs into the DRAM column address, and the H/V
MSBs into the DRAM row address, so that each DRAM row holds one 32x32
tile of the image:
(V4..V0,H4..H0) <=> (C9..C0) and (V9..V5,H9..H5) <=> (R9..R0)
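To make the bit mapping concrete, here is a minimal Python model of both mappings, assuming the 1Kx1K image and the 10-bit DRAM row and column addresses from the example, with bank bits ignored. The function names are hypothetical; in an FPGA this is just a rewiring of address bits.

```python
# Hypothetical helpers: map a pixel (v, h) of a 1Kx1K image to a
# (row, col) DRAM address with 10 row bits and 10 column bits.
# Bank bits are ignored (an assumption for simplicity).


def map_linear(v: int, h: int) -> tuple[int, int]:
    """Straight mapping: (V9..V0) -> (R9..R0), (H9..H0) -> (C9..C0)."""
    return v & 0x3FF, h & 0x3FF


def map_tiled(v: int, h: int) -> tuple[int, int]:
    """Tiled mapping: (V9..V5,H9..H5) -> (R9..R0), (V4..V0,H4..H0) -> (C9..C0).

    Each DRAM row then holds one 32x32 tile of the image.
    """
    row = (((v >> 5) & 0x1F) << 5) | ((h >> 5) & 0x1F)
    col = ((v & 0x1F) << 5) | (h & 0x1F)
    return row, col


if __name__ == "__main__":
    # DRAM rows touched by a 2x2 neighbourhood at (v=100, h=200).
    quad = [(100, 200), (100, 201), (101, 200), (101, 201)]
    print("linear rows:", sorted({map_linear(v, h)[0] for v, h in quad}))
    print("tiled rows: ", sorted({map_tiled(v, h)[0] for v, h in quad}))
```

With the straight mapping the 2x2 neighbourhood spans two DRAM rows (one per image line); with the tiled mapping it stays within a single row unless it straddles a tile edge.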
When warping, the image sample addresses are pipelined out to DRAM,
with enough time designed into the pipeline to check each address for a
DRAM row-boundary crossing, so the pipeline stalls only when it is
necessary to re-RAS (open a new row).
Stalls then occur only when the sampling area overlaps the edge of a tile,
instead of on every 2x2 or 3x3 fetch.
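The stall-only-on-re-RAS behaviour can be modelled by tracking the open row and counting row activations over a stream of reads. This is a simplified sketch, assuming a single bank with one open row (real SDRAM has several banks, each able to keep a row open); the two mappings are redeclared so the block stands alone, and the test pattern in `rotated_quads` is hypothetical.

```python
import math


# Hypothetical address mappings for the 1Kx1K example
# (10 row bits, 10 column bits, bank bits ignored).
def map_linear(v, h):
    return v & 0x3FF, h & 0x3FF


def map_tiled(v, h):
    return ((((v >> 5) & 0x1F) << 5) | ((h >> 5) & 0x1F),
            ((v & 0x1F) << 5) | (h & 0x1F))


def count_activations(pixels, mapping):
    """Count row activations (re-RAS events) for a stream of pixel reads.

    Single-bank model: a read to the open row proceeds without a stall;
    a read to any other row needs a precharge and a new activate.
    """
    open_row = None
    activations = 0
    for v, h in pixels:
        row, _col = mapping(v, h)
        if row != open_row:
            activations += 1
            open_row = row
    return activations


def rotated_quads(angle_deg, length, v0=512.0, h0=512.0):
    """Yield the 2x2 bilinear taps for samples along a straight line at
    angle_deg through the source image, as one output line of a rotation
    would visit them (hypothetical test pattern)."""
    s = math.sin(math.radians(angle_deg))
    c = math.cos(math.radians(angle_deg))
    for i in range(length):
        vi = math.floor(v0 + i * s)
        hi = math.floor(h0 + i * c)
        for dv in (0, 1):
            for dh in (0, 1):
                yield vi + dv, hi + dh


if __name__ == "__main__":
    column = [(v, 0) for v in range(1024)]  # e.g. a 90 degree rotation
    print("column walk:", count_activations(column, map_linear),
          "vs", count_activations(column, map_tiled))
    quads = list(rotated_quads(30.0, 400))
    print("30 degree line:", count_activations(quads, map_linear),
          "vs", count_activations(quads, map_tiled))
```

Walking straight down one image column opens 1024 rows with the straight mapping but only 32 with tiling; along a 30 degree line of bilinear fetches, the tiled layout likewise re-RASes only near tile edges.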
(You posed your question so well, I'll bet this has already occurred to
you.)
Caching can also be used to bypass the external RAM access pipeline
when the required pixels are already in the FPGA. There are lots of
different caching techniques; I haven't looked at them in a while.
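As one example among those many techniques (not necessarily what is meant here), a small LRU cache of whole tiles held in block RAM can be modelled as below. The class and function names, the 32x32 tile size, the capacity, and the LRU replacement policy are all assumptions for illustration.

```python
from collections import OrderedDict


class TileCache:
    """Hypothetical LRU cache of whole image tiles held in FPGA block RAM.

    A hit is served on-chip. A miss models one external DRAM access that
    burst-reads the whole tile (one DRAM row under the tiled mapping).
    """

    def __init__(self, num_tiles, tile_shift=5):
        self.num_tiles = num_tiles    # how many tiles fit in BRAM (assumed)
        self.tile_shift = tile_shift  # 5 -> 32x32-pixel tiles
        self.tiles = OrderedDict()    # tile id -> tile data (omitted here)
        self.hits = 0
        self.misses = 0

    def read(self, v, h):
        tile = (v >> self.tile_shift, h >> self.tile_shift)
        if tile in self.tiles:
            self.hits += 1
            self.tiles.move_to_end(tile)    # mark as most recently used
            return
        self.misses += 1
        if len(self.tiles) >= self.num_tiles:
            self.tiles.popitem(last=False)  # evict least recently used
        self.tiles[tile] = None             # a real design fills BRAM here


def simulate(cache, pixels):
    """Feed a stream of pixel reads through the cache; return (hits, misses)."""
    for v, h in pixels:
        cache.read(v, h)
    return cache.hits, cache.misses


if __name__ == "__main__":
    # 2x2 bilinear taps along one horizontal line of a 1Kx1K image.
    taps = [(10 + dv, h + dh) for h in range(1023) for dv in (0, 1) for dh in (0, 1)]
    print(simulate(TileCache(num_tiles=4), taps))
```

For that line of 4092 bilinear taps, the model reports 4060 hits and only 32 tile fetches from external DRAM.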
Block processing is a variant of caching: read a tile from
external DRAM into BRAM, warp from that BRAM into another BRAM, then
send the results back out. Border calculations get messy for
completely arbitrary warps, though.
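For affine warps such as rotation, the border calculation has a simple form: the source footprint of a rectangular output tile is a parallelogram, so the bounding box of its four back-mapped corners, padded for the interpolation kernel, tells you which source block to load into BRAM. This is a sketch under that affine assumption; the function names and the (a, b, c, d, e, f) coefficient layout are hypothetical.

```python
import math


def rotation_inverse(angle_deg, cv, ch):
    """Affine coefficients (a, b, c, d, e, f) mapping an output pixel (v, h)
    back to its source position for a rotation about (cv, ch):
    src = R(-angle) * (out - centre) + centre, R a 2x2 rotation matrix."""
    co = math.cos(math.radians(angle_deg))
    si = math.sin(math.radians(angle_deg))
    return (co, si, cv - co * cv - si * ch,
            -si, co, ch + si * cv - co * ch)


def source_bbox(tile_v0, tile_h0, tile_size, inv_affine, margin=1, size=1024):
    """Inclusive (v_min, v_max, h_min, h_max) of source pixels needed to
    warp one output tile whose top-left pixel is (tile_v0, tile_h0).

    inv_affine maps output (v, h) to source (a*v + b*h + c, d*v + e*h + f).
    Because the map is affine, the four tile corners bound the footprint.
    `margin` pads both sides (1 is enough for a bilinear 2x2 kernel).
    The result is clamped to a size x size image.
    """
    a, b, c, d, e, f = inv_affine
    last = tile_size - 1
    corners = [(tile_v0 + dv, tile_h0 + dh) for dv in (0, last) for dh in (0, last)]
    vs = [a * v + b * h + c for v, h in corners]
    hs = [d * v + e * h + f for v, h in corners]
    v_min = max(0, math.floor(min(vs)) - margin)
    v_max = min(size - 1, math.floor(max(vs)) + margin)
    h_min = max(0, math.floor(min(hs)) - margin)
    h_max = min(size - 1, math.floor(max(hs)) + margin)
    return v_min, v_max, h_min, h_max


if __name__ == "__main__":
    inv = rotation_inverse(45.0, 512.0, 512.0)
    v_min, v_max, h_min, h_max = source_bbox(512, 512, 32, inv)
    print("fetch", v_max - v_min + 1, "x", h_max - h_min + 1, "source pixels")
```

For a 45 degree rotation the source box for a 32x32 output tile comes out at about 46x46, roughly twice the tile area, so the source BRAM must hold noticeably more than one output tile's worth of pixels. For nonlinear warps the four corners no longer bound the footprint, which is the messy case.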
HTH
Just John