DMA operation to a 64-bit PC platform

  • Thread starter: Frank van Eijkelenburg

Frank van Eijkelenburg

Hi,

I have a custom-made PCIe board with a Virtex-5 FPGA on which I
implemented a DMA unit using the PCIe Endpoint Block Plus v1.14 core.
I also implemented simple read/write operations from the PC to the
board (the board responds with completion TLPs). The read/write
operations are working; DMA is not.

The board is inserted in a PC running 64-bit Windows 7. An
application allocates virtual memory and passes the memory block to
the driver. The driver locks the memory and converts the virtual
addresses into physical addresses. These physical addresses are
written to the FPGA.

When I start a DMA operation, I can see the correct physical
addresses in the TLP header in ChipScope. However, I do not see the
correct values in the allocated memory. What can I do to check where
it is going wrong?

Another question is about the memory request TLPs. What should I use,
32-bit or 64-bit write requests? Or do I have to check at runtime
whether the physical memory address is below or above 4 GB (and use
32-bit or 64-bit requests respectively)?


Thanks in advance,

Frank
 
I have done a similar design myself, but using 32-bit Windows 7. I use
32-bit TLPs and have had no problems with the design. BTW, I have used
WinDriver to generate the device driver.

Jon

---------------------------------------
Posted through http://www.FPGARelated.com
 
Frank van Eijkelenburg wrote:
Hi,

I have a custom-made PCIe board with a Virtex-5 FPGA on which I
implemented a DMA unit using the PCIe Endpoint Block Plus v1.14 core.
I also implemented simple read/write operations from the PC to the
board (the board responds with completion TLPs). The read/write
operations are working; DMA is not.

The board is inserted in a PC running 64-bit Windows 7. An
application allocates virtual memory and passes the memory block to
the driver. The driver locks the memory and converts the virtual
addresses into physical addresses. These physical addresses are
written to the FPGA.
How are you doing this? Normally, an application requests a buffer using malloc()
or new and gets a handle to the driver using CreateFile(). You then use
WriteFile(hDevice, Buffer, ...), ReadFile(hDevice, Buffer, ...) or
DeviceIoControl() to initiate a transfer to/from the device. That's the
application side.
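For illustration, a minimal sketch of that application side. It is only a sketch: the device name \\.\FpgaDma is a placeholder for whatever symbolic link the driver actually creates, and error handling is trimmed.

    #include <windows.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        /* "\\\\.\\FpgaDma" is a placeholder; use the symbolic link name
           your driver actually exposes. */
        HANDLE hDevice = CreateFileA("\\\\.\\FpgaDma",
                                     GENERIC_READ | GENERIC_WRITE,
                                     0, NULL, OPEN_EXISTING, 0, NULL);
        if (hDevice == INVALID_HANDLE_VALUE) {
            printf("CreateFile failed: %lu\n", GetLastError());
            return 1;
        }

        /* Application-allocated buffer; with direct I/O the I/O manager
           locks these pages for the duration of the request. */
        DWORD  size      = 1024 * 1024;
        char  *buffer    = malloc(size);
        DWORD  bytesRead = 0;

        /* Synchronous read: returns when the driver completes the request. */
        if (!ReadFile(hDevice, buffer, size, &bytesRead, NULL))
            printf("ReadFile failed: %lu\n", GetLastError());

        free(buffer);
        CloseHandle(hDevice);
        return 0;
    }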

On the driver (kernel) side, I would strongly recommend that you write a KMDF-based
driver. Download the Windows WDK; all it costs is your email address. (You have to log in
over Microsoft Connect, last time I looked.) There are lots of examples there,
including for PCI(e)-based DMA. To (very quickly) summarise, your driver requests
the scatter/gather list describing the buffers (see
WdfDmaTransactionInitializeUsingRequest() in the WDK API docs as a starting point)
and passes these to your hardware one by one, which then does DMA in or out.
With a call to WdfRequestComplete the buffers are released by the kernel and your
application can reuse them or free them as required. (This is of course all
considerably more than a day's work, by the way.)
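A compressed sketch of that KMDF flow, purely for orientation. Status checks are omitted, and FpgaWriteDescriptor(), FpgaStartDma() and g_DmaEnabler are hypothetical stand-ins for your own register accesses and a DMA enabler created in EvtDeviceAdd with WdfDmaEnablerCreate():

    #include <ntddk.h>
    #include <wdf.h>

    VOID FpgaWriteDescriptor(WDFCONTEXT Ctx, ULONGLONG Addr, ULONG Len);  /* hypothetical */
    VOID FpgaStartDma(WDFCONTEXT Ctx);                                    /* hypothetical */

    static WDFDMAENABLER g_DmaEnabler;  /* created earlier with WdfDmaEnablerCreate() */

    /* KMDF calls this with the scatter/gather list for the locked-down
       application buffer; hand each element to the FPGA. */
    BOOLEAN EvtProgramDma(WDFDMATRANSACTION Transaction, WDFDEVICE Device,
                          WDFCONTEXT Context, WDF_DMA_DIRECTION Direction,
                          PSCATTER_GATHER_LIST SgList)
    {
        for (ULONG i = 0; i < SgList->NumberOfElements; i++) {
            FpgaWriteDescriptor(Context,
                                SgList->Elements[i].Address.QuadPart,
                                SgList->Elements[i].Length);
        }
        FpgaStartDma(Context);
        return TRUE;
    }

    /* In EvtIoRead: wrap the request in a DMA transaction. */
    VOID EvtIoRead(WDFQUEUE Queue, WDFREQUEST Request, size_t Length)
    {
        WDFDMATRANSACTION txn;

        WdfDmaTransactionCreate(g_DmaEnabler, WDF_NO_OBJECT_ATTRIBUTES, &txn);
        WdfDmaTransactionInitializeUsingRequest(txn, Request, EvtProgramDma,
                                                WdfDmaDirectionReadFromDevice);
        WdfDmaTransactionExecute(txn, NULL);
        /* When the FPGA signals completion (e.g. via MSI), call
           WdfDmaTransactionDmaCompleted() and then WdfRequestComplete(). */
    }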

You do not have to explicitly lock down the buffer yourself. Windows does this for
you while the I/O request is active (Read/WriteFile from your app up to
WdfRequestComplete from the driver).

When I start a DMA operation, I can see the correct physical
addresses in the TLP header in ChipScope. However, I do not see the
correct values in the allocated memory. What can I do to check where
it is going wrong?
In this case, I would first doubt whether the addresses are correct.

Another question is about the memory request TLPs. What should I use,
32-bit or 64-bit write requests? Or do I have to check at runtime
whether the physical memory address is below or above 4 GB (and use
32-bit or 64-bit requests respectively)?
The PCIe spec says: a transfer below 4 GB must use a 3 DWord header, and a transfer
above 4 GB must use a 4 DWord header; i.e. a 4 DWord header with Address[63:32]
set to zero is invalid.

Thanks in advance,

Frank
 
On Jul 1, 5:03 pm, Frank van Eijkelenburg <fei.technolut...@gmail.com>
wrote:
Another question is about the memory request TLPs. What should I use,
32-bit or 64-bit write requests? Or do I have to check at runtime
whether the physical memory address is below or above 4 GB (and use
32-bit or 64-bit requests respectively)?

Thanks in advance,

Frank
Memory accesses below 4 GB have to use 3DW (32-bit) TLP headers.
4DW TLP headers addressing memory below 4 GB are prohibited by the PCIe
standard, although they occasionally work on some chipsets, e.g.
the Intel 5000P/5000X series.
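To make the run-time check concrete, here is a C sketch of how the header DWords of a Memory Write request could be chosen, mirroring what the FPGA logic has to do. The bit packing follows my reading of the PCIe memory-request format, with each header DWord stored most-significant-byte-first; double-check it against the spec and the Endpoint Block Plus documentation before trusting it.

    #include <stdint.h>

    /* Build the header DWords for a PCIe Memory Write, choosing the 3DW
       (32-bit) or 4DW (64-bit) format at run time.  Returns 3 or 4.
       Assumes full-DWord byte enables (for a 1-DW transfer the Last BE
       field must be 0 instead). */
    static int build_mwr_header(uint64_t addr, uint32_t len_dw,
                                uint16_t requester_id, uint8_t tag,
                                uint32_t hdr[4])
    {
        int is64 = (addr >= 0x100000000ULL);          /* above 4 GB?      */

        /* DW0: Fmt/Type byte (0x40 = MWr 3DW, 0x60 = MWr 4DW), Length[9:0]. */
        hdr[0] = ((is64 ? 0x60u : 0x40u) << 24) | (len_dw & 0x3FFu);

        /* DW1: Requester ID, Tag, Last BE, First BE. */
        hdr[1] = ((uint32_t)requester_id << 16) | ((uint32_t)tag << 8) | 0xFFu;

        if (is64) {
            hdr[2] = (uint32_t)(addr >> 32);          /* Address[63:32]   */
            hdr[3] = (uint32_t)addr & ~0x3u;          /* Address[31:2],00 */
            return 4;
        }
        hdr[2] = (uint32_t)addr & ~0x3u;              /* Address[31:2],00 */
        return 3;
    }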
 
Hi Frank,

The way it works is as follows:
- the application allocates the memory (malloc).
- a pointer to this memory is passed to the driver (custom-made
driver).
- the driver creates a scatter-gather list by using the
GetScatterGatherList method from the DMA_ADAPTER object.
Are you aware of the following text from the Microsoft WDK documentation? Particularly the
first line.

GetScatterGatherList is not a system routine that can be called directly by name.
This routine is callable only by pointer from the address returned in a
DMA_OPERATIONS structure. Drivers obtain the address of this routine by calling
IoGetDmaAdapter.

As soon as the appropriate DMA channel and any necessary map registers are
available, GetScatterGatherList creates a scatter/gather list, initializes the map
registers, and then calls the driver-supplied AdapterListControl routine to carry
out the I/O operation.

GetScatterGatherList combines the actions of the AllocateAdapterChannel and
MapTransfer routines for drivers that perform scatter/gather DMA.
GetScatterGatherList determines how many map registers are required for the
transfer, allocates the map registers, maps the buffers for DMA, and fills in the
scatter/gather list. It then calls the supplied AdapterListControl routine,
passing a pointer to the scatter/gather list in ScatterGather. The driver should
retain this pointer for use when calling PutScatterGatherList. Note that
GetScatterGatherList does not have the queuing restrictions that apply to
AllocateAdapterChannel.

In its AdapterListControl routine, the driver should perform the I/O. On return
from the driver-supplied routine, GetScatterGatherList keeps the map registers but
frees the DMA adapter structure. The driver must call PutScatterGatherList (which
flushes the buffers) before it can access the data in the buffer.
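For orientation, a minimal WDM sketch of that call sequence, under the assumption of direct I/O (so the IRP carries an MDL). The DEVICE_EXTENSION layout and FpgaWriteDescriptor() are hypothetical, and status checks are omitted:

    #include <ntddk.h>

    typedef struct _DEVICE_EXTENSION {      /* hypothetical per-device data */
        PDEVICE_OBJECT DeviceObject;
        PDMA_ADAPTER   DmaAdapter;          /* from IoGetDmaAdapter()       */
    } DEVICE_EXTENSION, *PDEVICE_EXTENSION;

    VOID FpgaWriteDescriptor(PVOID Ctx, ULONGLONG Addr, ULONG Len);  /* hypothetical */

    /* Called back by the system with the scatter/gather list; each element's
       physical address and length is handed to the FPGA here. */
    VOID AdapterListControl(PDEVICE_OBJECT DeviceObject, PIRP Irp,
                            PSCATTER_GATHER_LIST ScatterGather, PVOID Context)
    {
        for (ULONG i = 0; i < ScatterGather->NumberOfElements; i++) {
            FpgaWriteDescriptor(Context,
                                ScatterGather->Elements[i].Address.QuadPart,
                                ScatterGather->Elements[i].Length);
        }
        /* Keep the ScatterGather pointer: PutScatterGatherList() must be
           called after the DMA has finished, before touching the data. */
    }

    VOID StartSgDma(PDEVICE_EXTENSION devExt, PIRP Irp)
    {
        PMDL  mdl = Irp->MdlAddress;               /* present with direct I/O */
        PVOID va  = MmGetMdlVirtualAddress(mdl);
        ULONG len = MmGetMdlByteCount(mdl);

        /* GetScatterGatherList is reached through the DMA_OPERATIONS table
           returned by IoGetDmaAdapter(), never called by name. */
        devExt->DmaAdapter->DmaOperations->GetScatterGatherList(
            devExt->DmaAdapter, devExt->DeviceObject, mdl, va, len,
            AdapterListControl, devExt,
            FALSE /* FALSE: the device writes to memory */);
    }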
- the driver writes each entry of the scatter-gather list (which
contains a physical address and length) to the FPGA.
- the FPGA receives data (through another interface) and writes this
data to the memory of the PC by means of DMA (it just generates write
requests).
- after writing the data the FPGA generates an interrupt over PCIe (not
working yet, but we know when the FPGA has finished a transaction).

I now understand I have to verify at runtime whether the physical address is
below or above 4 GB and use a 3 DW or 4 DW TLP header respectively. I
will change that in the FPGA and give it a try.

About the addresses: these are correct. We did the following test:
write to the virtual memory from the application and read the memory
using the physical addresses in the driver. In the driver we read back what
the application has written.


Any other suggestions?
If you are convinced the addresses are correct, I would look at two other things.

1) Is your driver completing the request properly (IoCompleteRequest())?

2) Are the data being cached somewhere? Here I would try a zero-length read from
the driver (a PCIe TLP with length 1 and all BEs zero) on the last address
transferred to memory. Just discard the resulting completion. The PCIe spec says
the system must interpret this as a flush.

By the way, which buffering method is your driver using for the DMA transfer
(Buffered, Direct or Neither)?
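A hedged sketch of what such a flush read could look like at the TLP level, using the same packing assumptions as the write-header sketch earlier in the thread (verify against the spec):

    #include <stdint.h>

    /* Zero-length read used as a flush: a Memory Read with Length = 1 and
       both byte-enable fields zero, aimed at the last address written.
       The completion that comes back is simply discarded. */
    static int build_flush_mrd_header(uint64_t addr, uint16_t requester_id,
                                      uint8_t tag, uint32_t hdr[4])
    {
        int is64 = (addr >= 0x100000000ULL);

        hdr[0] = ((is64 ? 0x20u : 0x00u) << 24) | 0x1u;   /* MRd, Length = 1   */
        hdr[1] = ((uint32_t)requester_id << 16)
               | ((uint32_t)tag << 8);                    /* First/Last BE = 0 */
        if (is64) {
            hdr[2] = (uint32_t)(addr >> 32);
            hdr[3] = (uint32_t)addr & ~0x3u;
            return 4;
        }
        hdr[2] = (uint32_t)addr & ~0x3u;
        return 3;
    }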

 
On Jul 2, 2:19 am, Charles Gardiner <charles.gardi...@invalid.invalid>
wrote:
Frank van Eijkelenburg wrote:
<snip>

How are you doing this?
<snip>
The way it works is as follows:
- the application allocates the memory (malloc).
- a pointer to this memory is passed to the driver (custom-made
driver).
- the driver creates a scatter-gather list by using the
GetScatterGatherList method from the DMA_ADAPTER object.
- the driver writes each entry of the scatter-gather list (which
contains a physical address and length) to the FPGA.
- the FPGA receives data (through another interface) and writes this
data to the memory of the PC by means of DMA (it just generates write
requests).
- after writing the data the FPGA generates an interrupt over PCIe (not
working yet, but we know when the FPGA has finished a transaction).

I now understand I have to verify at runtime whether the physical address is
below or above 4 GB and use a 3 DW or 4 DW TLP header respectively. I
will change that in the FPGA and give it a try.

About the addresses: these are correct. We did the following test:
write to the virtual memory from the application and read the memory
using the physical addresses in the driver. In the driver we read back what
the application has written.

Any other suggestions?

Frank
 
On Jul 2, 9:41 am, Charles Gardiner <charles.gardi...@invalid.invalid>
wrote:
Hi Frank,
<snip>

1) Is your driver completing the request properly (IoCompleteRequest())?
<snip>

By the way, which buffering method is your driver using for the DMA transfer
(Buffered, Direct or Neither)?
Hi Charles,

I am not sure if we understand each other. What do you mean by
completing the request with IoCompleteRequest? There is no request
from the software point of view. The FPGA will do a DMA write (data from
FPGA to PC memory) on its own initiative. The allocated memory is used
as long as the software is running. I do not allocate new memory for
each DMA transfer; instead, at startup a large piece of memory is
allocated and the physical addresses are written to the FPGA by the
driver software.

And yes, we use a DMA adapter in combination with the
GetScatterGatherList method. We already used this in another project
but that was PCI and DMA read (data from PC memory to FPGA).

By the way, where can I set the type of DMA?

best regards,

Frank
 
Hi Frank,

I am not sure if we understand each other.
Yes, it certainly sounds like that.

What do you mean by
completing the request with IoCompleteRequest? There is no request
from the software point of view.
I think this might clear up the reason why your data is missing. (See also below
about the type of DMA.) I don't think the S/G list you are getting describes
your application buffer. This is best done by specifying DO_DIRECT_IO as the DMA
method for your device. If you specify DO_BUFFERED_IO you will get an S/G list
describing an intermediate buffer in kernel space, and this probably never gets
copied over to your application-space buffer unless you terminate the request.
I've never done the 'neither' method myself and, from what I hear, it's a
complicated beast.

The FPGA will do a DMA write (data from
FPGA to PC memory) on its own initiative. The allocated memory is used
as long as the software is running. I do not allocate new memory for
each DMA transfer, but at startup a large piece of memory is
allocated and the physical addresses are written to the FPGA by the
driver software.
Sounds like you are doing something like a circular buffer in memory which stays
alive as long as your device does?

And yes, we use a DMA adapter in combination with the
GetScatterGatherList method. We already used this in another project
but that was PCI and DMA read (data from PC memory to FPGA).

By the way, where can I set the type of DMA?
Typically, you set the DMA buffering method in your AddDevice function after you
create your device object. Quoting from Oney's book:

NTSTATUS AddDevice(..) {
    PDEVICE_OBJECT fdo;

    IoCreateDevice(....., &fdo);
    fdo->Flags |= DO_BUFFERED_IO;
    /* or */
    fdo->Flags |= DO_DIRECT_IO;
    /* or */
    fdo->Flags |= 0;   // i.e. neither Direct nor Buffered

And, you can't change your mind afterwards.


By the way, if my assumption about the circular buffer in your design is correct,
there is a slightly more standard solution (standard in the sense that everybody
on the Microsoft drivers newsgroup seems to do it). It does, however, require two threads
in your application. The first one requests a buffer (using new or malloc) and
sets up an I/O request (ReadFile, WriteFile or DeviceIoControl) referencing this
buffer. This is performed as an asynchronous request.

The driver recognises this request and pends it indefinitely (typically you terminate
it when your driver is shutting down, otherwise Windows will probably hang).
Pending the request has the nice side effect that the buffer now becomes locked
down permanently.

Assuming you have set up your driver to use DO_DIRECT_IO DMA, you can get the S/G
list describing the application-space buffer as you are currently doing and feed
this to your FPGA.

Using the second thread in your application you can constantly read data from the
locked-down pages (your app-space buffer) that are being written by your FPGA.
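A rough sketch of the application side of that pend-forever pattern. The IOCTL code, device name and buffer size are made up for the example; METHOD_OUT_DIRECT is what makes the kernel build an MDL for the buffer:

    #include <windows.h>
    #include <winioctl.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical IOCTL meaning "here is my circular buffer: pend this
       request, keep the pages locked and stream into them". */
    #define IOCTL_FPGA_SUBMIT_BUFFER \
        CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, METHOD_OUT_DIRECT, FILE_ANY_ACCESS)

    int main(void)
    {
        HANDLE hDev = CreateFileA("\\\\.\\FpgaDma", GENERIC_READ | GENERIC_WRITE,
                                  0, NULL, OPEN_EXISTING,
                                  FILE_FLAG_OVERLAPPED, NULL);
        if (hDev == INVALID_HANDLE_VALUE)
            return 1;

        DWORD      size = 4 * 1024 * 1024;     /* the circular buffer */
        void      *ring = malloc(size);
        OVERLAPPED ov   = {0};
        ov.hEvent = CreateEventA(NULL, TRUE, FALSE, NULL);

        /* Submit and do NOT wait: the driver pends the IRP (which keeps the
           buffer locked down), builds the S/G list and programs the FPGA. */
        if (!DeviceIoControl(hDev, IOCTL_FPGA_SUBMIT_BUFFER,
                             NULL, 0, ring, size, NULL, &ov)
            && GetLastError() != ERROR_IO_PENDING)
            printf("submit failed: %lu\n", GetLastError());

        /* From here on (or from a second thread) the application can poll
           the ring buffer while the FPGA streams data into it.  A custom
           "stop DMA" IOCTL, or CancelIo(), eventually lets the driver
           complete the IRP. */
        return 0;
    }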


Assuming DO_DIRECT_IO solves your problem (I think there is a good chance), I
would still consider migrating to a KMDF-based driver, particularly if
you are writing a new one. It's much easier to maintain and is probably more
portable to future MS versions.

best regards,
Charles
 
Frank van Eijkelenburg <fei.technolution@gmail.com> wrote:

<snip>

The way it works is as follows:
- the application allocates the memory (malloc).
- a pointer to this memory is passed to the driver (custom-made
driver).
I strongly doubt you can pass a malloc pointer to a driver. Actually
I'm quite sure this doesn't work. While the driver is active, the
application memory may be swapped out to the hard drive. And the pointer
must be translated to a physical address.

I'd go the other way around: have the driver allocate the memory and
pass a pointer to this memory to the application (this will require
some messing around with translation and access rights).
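For what it's worth, a hedged sketch of that driver-allocates approach; the "messing around" is mostly the user-space mapping. The devExt fields are hypothetical and error cleanup is trimmed:

    #include <ntddk.h>

    typedef struct _DEVICE_EXTENSION {        /* hypothetical per-device data */
        PDMA_ADAPTER     DmaAdapter;          /* from IoGetDmaAdapter()       */
        PVOID            KernelVa;
        PHYSICAL_ADDRESS DeviceLogical;       /* address the FPGA should use  */
        PMDL             Mdl;
    } DEVICE_EXTENSION, *PDEVICE_EXTENSION;

    NTSTATUS AllocateAndMapDmaBuffer(PDEVICE_EXTENSION devExt, ULONG length,
                                     PVOID *userVa)
    {
        /* Device-accessible buffer allocated by the driver. */
        devExt->KernelVa = devExt->DmaAdapter->DmaOperations->AllocateCommonBuffer(
            devExt->DmaAdapter, length, &devExt->DeviceLogical, TRUE);
        if (devExt->KernelVa == NULL)
            return STATUS_INSUFFICIENT_RESOURCES;

        /* Map the same (non-paged) pages into the requesting process. */
        devExt->Mdl = IoAllocateMdl(devExt->KernelVa, length, FALSE, FALSE, NULL);
        if (devExt->Mdl == NULL)
            return STATUS_INSUFFICIENT_RESOURCES;
        MmBuildMdlForNonPagedPool(devExt->Mdl);

        __try {
            /* Must run in the context of the process that will use the buffer. */
            *userVa = MmMapLockedPagesSpecifyCache(devExt->Mdl, UserMode, MmCached,
                                                   NULL, FALSE, NormalPagePriority);
        } __except (EXCEPTION_EXECUTE_HANDLER) {
            *userVa = NULL;
        }
        return (*userVa != NULL) ? STATUS_SUCCESS : STATUS_INSUFFICIENT_RESOURCES;
    }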

--
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
nico@nctdevpuntnl (punt=.)
--------------------------------------------------------------
 
On Jul 2, 11:35 pm, n...@puntnl.niks (Nico Coesel) wrote:
Frank van Eijkelenburg <fei.technolut...@gmail.com> wrote:



On Jul 2, 2:19=A0am, Charles Gardiner <charles.gardi...@invalid.invalid
wrote:
Frank van Eijkelenburg schrieb:

Hi,

I have a custom made PCIe board with a Virtex 5 FPGA on which I
implemented a DMA unit which uses the PCIe endpoint block plus v1.14..
I also implemented simple read/write operations from the PC to the
board (the board responds with completion TLPs). The read/write
operations are working, DMA is not working

The board is inserted in a pc with Windows 7 64 bits platform. An
application allocates virtual memory and passes the memory block to
the driver. The driver locks the memory and converts the virtual
addresses into physical addresses. These physical addresses are
written to the FPGA.

How are you doing this? Normally, an application requests a buffer using > >malloc()
or new() and gets a handle to the driver using CreateFile(). You then use
WriteFile(hDevice, Buffer,...), ReadFile(hDevice, Buffer,....) or
DeviceIoControl() to initiate a transfer to/from =A0the device. Thats the
application side.

On the driver(kernel) side, I would strongly recommend that you write a K> >MDF based
driver. Download the windows WDK, all it costs is your email. (You have t> >o log in
over Microsoft Connect, last time I looked). There are lots of examples t> >here,
including for PCI(e) based DMA. To (very quickly) summarise, your driver > >requests
the scatter/gather list describing the buffers (see
WdfDmaTransactionInitializeUsingRequest() in the WDK API docs as a starti> >ng point)
above and passes these to your hardware one-by-one which then does DMA in> > or out.
With a call to WdfRequestComplete the buffers are released by the kernel > >and your
application can reuse them or free them up as required. (This is of cours> >e all
considerably more than a days work, by the way.)

You do not have to explicitly lock down the buffer yourself. Windows does> > this for
you while the I/O request is active. (Read/WriteFile from your app up to
WdfRequestComplete from the driver)

When I start an DMA operation, I can see in chipscope the correct
physical addresses in the TLP header. However, I do not see the
correct values in the allocated memory. What can I do to check where
it is going wrong?

In this case, I would first doubt whether the addresses are correct.

Another question is about the memory request TLPs. What should I use,
32 or 64 bit write requests? Or do I have to check runtime if the
physical memory address is below or above the 4 GB (and use
respectively 32 and 64 bit requests)?

The PCIe spec says: a transfer below 4 GB must use a 3 DWord header, a tr> >ansfer
above 4 GB must use a 4 DWord header. i.e. a four dword header wth addres> >s[63:32]
set to zero is invalid.

Thanks in advance,

Frank

The way it works is as follows:
- the application allocates the memory (malloc).
- a pointer to this memory is passed to the driver (custom-made
driver).

I strongly doubt you can pass a malloc pointer to a driver. Actually
I'm quite sure this doesn't work. While the driver is active, the
application memory may be swapped out to the hard drive. And the pointer
must be translated to a physical address.

Nah, malloc() at application level is OK.
If the I/O operation is specified as DIRECT_IO, then the I/O manager takes care
of locking the pages.
If the operation is specified as NEITHER, then the driver itself should call
MmProbeAndLockPages in user context (in this case you should never
install filter drivers in between your driver and the app).
In both cases it is very important not to complete the IRP associated
with the user buffer until you finish all DMA activities.

If the I/O operation is specified as BUFFERED_IO, then the I/O manager
allocates a kernel buffer, passes it to the driver, and copies the
results from the kernel to the user buffer after the driver has completed the IRP.
Obviously, BUFFERED_IO is not suitable for the OP's case, since he wants the
results back without completing the original I/O request.
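A minimal sketch of the NEITHER-case locking mentioned above; it has to run in the calling process's context (e.g. in the top-level dispatch routine), and IRP queuing/unlocking are left out:

    #include <ntddk.h>

    NTSTATUS LockUserBuffer(PVOID userVa, ULONG length, LOCK_OPERATION op,
                            PMDL *mdlOut)
    {
        PMDL mdl = IoAllocateMdl(userVa, length, FALSE, TRUE, NULL);
        if (mdl == NULL)
            return STATUS_INSUFFICIENT_RESOURCES;

        __try {
            /* op is IoWriteAccess when the device will write into the buffer. */
            MmProbeAndLockPages(mdl, UserMode, op);
        } __except (EXCEPTION_EXECUTE_HANDLER) {
            IoFreeMdl(mdl);
            return GetExceptionCode();
        }

        *mdlOut = mdl;     /* later: MmUnlockPages(mdl); IoFreeMdl(mdl); */
        return STATUS_SUCCESS;
    }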

I'd go the other way around: have the driver allocate the memory and
pass a pointer to this memory to the application (this will require
some messing around with translation and access rights).

--
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
nico@nctdevpuntnl (punt=.)
--------------------------------------------------------------
On a general-purpose system, allocation of big buffers by the driver is
rarely a good idea. On a system dedicated to just one task it can be
pragmatically OK, but I still don't like it from a purely theoretical
point of view.

Anyway, the discussion doesn't belong here. I recommend
http://groups.google.com/group/microsoft.public.development.device.drivers
 
On Jul 2, 9:04 pm, Charles Gardiner <charles.gardi...@invalid.invalid>
wrote:
<snip>
By the way if my assumption about the circular buffer in your design is correct,
there is a slightly more standard solution (standard in the sense that everybody
on the microsoft drivers newgroup seems to do it). It however requires two threads
in your application. The first one requests a buffer (using new or malloc) and
sets up an I/O Request ReadFile, WriteFile or DeviceIoControl referencing this
buffer. This is performed as an asynchronous request.
There is absolutely no need for a second thread. Just issue the "submit
buffer" overlapped I/O request from the first thread and don't wait for
completion.

The driver recognizes this request and pends it indefinitely, (typically terminate
it when your driver is shutting down, otherwise windows will probably hang).
That's pretty bad advice. As a minimum, you should install a cancel
handler and complete the request on cancellation. Besides, normally
you should stop all DMA activities and complete the request in the CLEANUP
routine.
Also, it is always a good idea to have a custom "stop DMA and withdraw
circular buffer" IOCTL for a normal user-initiated termination.

<snip>

Assuming DO_DIRECT_IO solves your problem (I think there is a good chance), I
would still consider migrating to a KMDF-based driver, particularly if
you are writing a new one. It's much easier to maintain and is probably more
portable to future MS versions.
Yes, KMDF is MUCH easier than plain WDM.
 
Michael S wrote:

Anyway, the discussion doesn't belong here. I recommend
http://groups.google.com/group/microsoft.public.development.device.drivers
No it doesn't. It belongs just where it is. If you followed from the beginning,
the OP has two problems/questions:

1) How to do 64-bit addressing in PCIe. This is definitely FPGA-related.

2) Why his data are not appearing as expected. It only became clear in the course
of the discussion that this may be due to his driver's DMA type setting. (He hasn't
confirmed this yet.)

Nit-picking when everything is (maybe) solved is easy when you haven't really been
contributing constructively.
 
Michael S wrote:

I insist that subthread started by Nico Coesel could be discussed in
mpddd with better depth and precision.

You insist? Do I care?

If you followed from the beginning, you would realize that I
constructively contributed the answer to the first question 4 minutes
ahead of you ;)
4 mins.? WOW. Incredible. Impressive, Awesome.....

What I see in this posting is a terse reply to the easy part of the problem, with
not a single thought spent on the more complex part, even though this is really
the OP's show-stopper.

For the rest of your postings, I mainly see blowholed personal seals of approval
assigned to what others have said (aka nit-picking), all coming in "posthumously"
(still assuming that specifying DO_DIRECT_IO actually solves the problem).

I really love attending meetings when guys like you are involved, especially in
the last phase, when those who haven't contributed much insist on getting a chance
to lift their leg. The German comedian Karl Valentin had perhaps a more literary
way of putting it: "I suppose everything has been said then, but not yet by
everybody."

Oh, don't take my personal aversion too much to heart. I'm sure somebody on this
planet recognises your contribution, even if it's only Dilbert.
 
On Jul 4, 2:01 am, Charles Gardiner <charles.gardi...@invalid.invalid>
wrote:
Michael S wrote:



Anyway, the discussion doesn't belong here. I recommend
http://groups.google.com/group/microsoft.public.development.device.dr...

No it doesn't. It belongs just where it is. If you followed from the beginning,
the OP has two problems/questions
I insist that the subthread started by Nico Coesel could be discussed in
mpddd with better depth and precision.


1) How to do 64-Bit addressing in PCIe. This is definitely FPGA related.
You mean, 32-bit addressing.

2) Why are his data not appearing as expected. It only became clear in the course
of discussion that this may be due to his driver DMA type setting. (He hasn't
confirmed yet)
Microsoft calls them "data buffer access methods". DMA type (or, by
KMDF terminology, profile) is something else.
I hope you realize how important it is to use the same terminology as
the rest of the world.

http://msdn.microsoft.com/en-us/library/ff554436%28v=VS.85%29.aspx

Nit-picking when everything is (maybe) solved is easy when you haven't really been
 contributing constructively.
If you followed from the beginning, you would realize that I
constructively contributed the answer to the first question 4 minutes
ahead of you ;)
Also, IMHO, a pointer to the most appropriate newsgroup is very
constructive.
 
On Jul 4, 11:59 am, Charles Gardiner
<charles.gardi...@invalid.invalid> wrote:
<snip>
For the rest of your postings, I mainly see blowholed personal seals of approval
assigned to what others have said (aka nit-picking). All coming in "posthumously"
(still assuming that specifying DO_DIRECT_IO actually solves the problem)
And I am 90% sure that 3DW was his only problem.
The following sentence in his original post suggests that he doesn't
use BUFFERED_IO:
"The driver locks the memory". To me it sounds like he is doing
NEITHER I/O.
So it's either the 3DW issue or, less likely, he erroneously completes
his ReadFile() request too early.

<snip>
Oh, don't take my personal aversion too much to heart. I'm sure somebody on this
planet recognises your contribution, even if it's only Dilbert
You are wrong. I was never really obsessed with getting the credit.
More like obsessed with precise formulations, although more so in real
life than on the Web.
Also, I am one of those types that don't like to jump to conclusions
quickly.
 
Frank van Eijkelenburg wrote:

We tried your suggestion (we were using BUFFERED_IO). Unfortunately it
was not the (final) solution.
Was there any noticeable change in the behaviour at all?

Is it still valid that your FPGA can _read_ data from the buffer when your
application writes it there?

With DO_DIRECT_IO specified, it's not clear to me off-hand why you are not seeing
the memory locations in both directions now.

Perhaps there are more causes for the
problem. Anyway, thanks for your suggestion. We are almost out of
ideas of what we can test. Do you have other ideas or tests we can do
to find the cause? I hope to fix the problem before my vacation (only
one day left :)
Oops, that's tight. I'm just on my way to a customer's, so I don't have my usual
references at hand. Have you tried the flush (zero-length read from the FPGA) after a
write to memory? Although, to be honest, I don't think that's the solution (just a
straw to grasp at in case your system has some caching behaviour I haven't seen
before). My last (KMDF-based) design was similar to yours. The FPGA was streaming
to memory and the SW application was reading from the buffer shared between
application memory and kernel memory. I never had any data loss, even without the
zero-length read.

If you can send me as much relevant info as possible, I'll have another look this
evening.

Regards,
Charles
 
On Jul 2, 9:04 pm, Charles Gardiner <charles.gardi...@invalid.invalid>
wrote:
<snip>

Assuming the DO_DIRECT_IO solves your problem (I think there is a good chance), I
would however still consider migrating to a KMDF-based driver, particularly if
you are writing a new one.
<snip>
Hi Charles,

We tried your suggestion (we were using BUFFERED_IO). Unfortunately it
was not the (final) solution. Perhaps there are more causes for the
problem. Anyway, thanks for your suggestion. We are almost out of
ideas of what we can test. Do you have other ideas or tests we can do
to find the cause? I hope to fix the problem before my vacation (only
one day left :)

best regards,

Frank
 
On Jul 6, 11:00 am, Frank van Eijkelenburg
<fei.technolut...@gmail.com> wrote:
<snip>

Hi Charles,

We tried your suggestion (we were using BUFFERED_IO).
If you were using BUFFERED_IO, why was your driver locking the pages?
In the case of BUFFERED_IO the pages come from kernel non-paged pool and
don't have to be locked explicitly. The only case where the driver
is responsible for locking/unlocking pages is NEITHER I/O.

Unfortunately it
was not the (final) solution. Perhaps there are more causes for the
problem. Anyway, thanks for your suggestion. We are almost out of
ideas of what we can test. Do you have other ideas or tests we can do
to find the cause? I hope to fix the problem before my vacation (only
one day left :)

best regards,

Frank
Another typical mistake is that the driver forgets to call IoMarkIrpPending().
KMDF does it automatically, but in plain WDM it is your driver's
responsibility. However, a forgotten IoMarkIrpPending() normally shows
different symptoms.
 
On Jul 6, 11:00 am, Frank van Eijkelenburg
<fei.technolut...@gmail.com> wrote:

I hope to fix the problem before my vacation (only one day left :)
Something I most certainly DO NOT RECOMMEND as a final solution, but
it could help you go on vacation in a better mood.
Scrap all the schoolbook nice&complex Windows DMA API stuff. Instead,
take your Irp->MdlAddress, call MmGetMdlPfnArray() and access the physical
addresses directly. It's wrong, it's immoral, but on a simple x86/x64 PC
or a small dual-processor server it always works.
Just don't forget to bring back the official DMA API when you are back
from vacation and have more time than a few hours.
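For completeness, a sketch of that shortcut (again: not recommended as a final solution, and it ignores map registers / IOMMU remapping entirely):

    #include <ntddk.h>

    /* Pull the physical page addresses straight out of the IRP's MDL. */
    VOID DumpPhysicalPages(PIRP Irp)
    {
        PMDL        mdl    = Irp->MdlAddress;
        PPFN_NUMBER pfns   = MmGetMdlPfnArray(mdl);
        ULONG       offset = MmGetMdlByteOffset(mdl);
        ULONG       pages  = ADDRESS_AND_SIZE_TO_SPAN_PAGES(
                                 MmGetMdlVirtualAddress(mdl),
                                 MmGetMdlByteCount(mdl));

        for (ULONG i = 0; i < pages; i++) {
            ULONGLONG phys = ((ULONGLONG)pfns[i] << PAGE_SHIFT)
                             + (i == 0 ? offset : 0);
            KdPrint(("page %lu: physical 0x%llx\n", i, phys));
        }
    }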
 
