Mark Schellhorn
Guest
If anyone can offer help with this (or a pointer to a more suitable forum) it
would be greatly appreciated. I am posting here because I know there are some
folks with expertise who frequent this group (and we're using a Xilinx core).
We are attempting to perform 64B burst PCI-X DMA write transfers from our add-in
card into host memory on a dual Xeon system.
Our Linux device driver (kernel 2.4.x/2.6.x) is notified via an interrupt and a
single-dword "doorbell" write that the larger 64B entry is available for processing.
The order of operation on the PCI-X bus is as follows:
64B data write --> 4B doorbell write --> interrupt.
Upon receiving the interrupt, the device driver polls the location in memory
where the 4B doorbell write is expected to show up. Once the driver sees the
doorbell location written, it reads the 64B data location. PCI ordering should guarantee
that the 64B data location is written to system memory before the 4B doorbell
write is.
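
For reference, the consuming side looks roughly like the sketch below (the
struct and all names are illustrative, not our actual code). Note the rmb()
between the doorbell check and the data read, which should rule out the CPU
reordering our own loads as the source of the staleness:

    #include <linux/types.h>
    #include <linux/string.h>    /* memcpy() */
    #include <asm/processor.h>   /* cpu_relax() */
    #include <asm/system.h>      /* rmb() on 2.4/2.6-era kernels */

    /* Illustrative only. Both pointers target one coherent DMA buffer
     * that only the card ever writes. */
    struct our_dev {
            volatile u32 *doorbell_va;   /* kernel VA of the 4B doorbell */
            void         *entry_va;      /* kernel VA of the 64B entry */
            u32           expected_seq;  /* value the card rings with */
            u8            local_copy[64];
    };

    static void consume_entry(struct our_dev *dev)
    {
            /* Wait for the doorbell dword the card writes *after* the entry. */
            while (*dev->doorbell_va != dev->expected_seq)
                    cpu_relax();

            /* Don't let the CPU speculate the 64B data loads past the
             * doorbell load. */
            rmb();

            memcpy(dev->local_copy, dev->entry_va, 64);
    }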
The above writes are performed as Memory Write Block transactions (we have also
tried Memory Write transactions), the No Snoop bit is cleared, and the Relaxed
Ordering bit is cleared.
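
(If it helps anyone reproduce this: the Enable Relaxed Ordering bit in the
PCI-X capability's command register can also be checked and cleared from the
driver. A minimal sketch, with register/bit names as defined in the kernel's
PCI headers:)

    #include <linux/pci.h>
    #include <linux/errno.h>

    /* Sketch: make sure the function isn't even permitted to request
     * Relaxed Ordering. PCI_X_CMD is the command register offset within
     * the PCI-X capability; PCI_X_CMD_ERO is its Enable Relaxed
     * Ordering bit. */
    static int clear_relaxed_ordering(struct pci_dev *pdev)
    {
            int pos = pci_find_capability(pdev, PCI_CAP_ID_PCIX);
            u16 cmd;

            if (!pos)
                    return -ENODEV;         /* not a PCI-X function */

            pci_read_config_word(pdev, pos + PCI_X_CMD, &cmd);
            if (cmd & PCI_X_CMD_ERO)
                    pci_write_config_word(pdev, pos + PCI_X_CMD,
                                          cmd & ~PCI_X_CMD_ERO);
            return 0;
    }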
We consistently encounter a situation where the device driver correctly receives
the interrupt & single dword doorbell write, but the 64B write fails to appear
in memory. Instead, the device driver reads a stale 64B block of data (data from
the last successful DMA write).
As a debug measure, we had the FPGA on our add-in card perform a readback
(Memory Read Block) of the 64B entry immediately after writing it. We observed
that the data read back was stale and matched the stale data that the device
driver saw. E.g.:
1) Location 0xABCDEF00 is known to contain stale 64B data 0xAAAA....AAAA.
2) FPGA does a Memory Write Block of 64B 0xBBBB....BBBB at address 0xABCDEF00.
3) FPGA does a Memory Read Block of 64B at address 0xABCDEF00 (Split Response).
4) Split Completion is returned by the bridge with data 0xAAAA....AAAA.
This appears to be a violation of PCI ordering rules. Again, the No Snoop and
Relaxed Ordering bits are cleared for all of these transactions.
The device driver *never* writes to the 64B location, so there should be no
possibility of a collision where the driver writes or flushes stale data that
overwrites the incoming DMA write.
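
(For context, a buffer like this would normally be plain consistent DMA memory
allocated with the 2.4/2.6-era API, so no explicit CPU cache flush/invalidate
calls are in play on x86. A sketch, with a made-up layout:)

    #include <linux/pci.h>

    /* Sketch: one page holds both the 64B entry and the 4B doorbell.
     * pci_alloc_consistent() memory is coherent between the CPU and
     * the device, so the driver never flushes or invalidates it. */
    static void *setup_shared_buf(struct pci_dev *pdev, dma_addr_t *bus)
    {
            /* The card is later programmed with: entry at *bus,
             * doorbell at *bus + 64. */
            return pci_alloc_consistent(pdev, PAGE_SIZE, bus);
    }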
This tells me that the location is NOT getting written: according to PCI
ordering rules, the FPGA's read *must* push the earlier Memory Write Block into
system memory before the read itself is allowed to complete.
We observe this behaviour in dual Xeon systems with both the Intel E7501 chipset
and the Broadcom Serverworks GC-LE chipset.
We observe this in SMP and single processor configurations.
When bus traffic is light at 133MHz, or whenever the bus is running at 66MHz, we
do *not* observe this problem. We occasionally observe the problem when the bus
is running at 100MHz with heavy traffic. This suggests that we are hitting a
narrow timing window at higher bus speeds.
We suspect that we might be hitting a cache erratum in the Xeon, and are
wondering if anyone can confirm this and possibly suggest a workaround.
We've been banging our heads on this for a couple of weeks now.
Mark