Question on PCI-Express versus Standard PCI performance


Benjamin Couillard

Hi everyone,

I'm working on a conversion project where we needed to convert a PCI
acquisition card to a PCI-Express (x1) acquisition card. The project
is essentially the same, except that the new acquisition card is a
PCI-Express endpoint instead of a standard-PCI endpoint. The project
is implemented on a Xilinx FPGA, but I don't think my issue is
Xilinx-specific.

The conversion has worked fine on all levels except one: the read
latency of PCI-Express is about 4 times higher than that of standard
PCI. For example, on the old product it takes about 0.9 us to perform
a 1-DWORD read; with the PCI-Express product it takes about 3-4 us.
I've seen this read latency both in real life (with a real board) and
in VHDL simulation, so I don't think this is a driver issue. Have any
of you experienced similar performance issues?

Don't get me wrong: for me PCI-Express is a major step ahead, and its
write-burst and read-burst performance is way better than standard
PCI's. Perhaps this is the reason: since most PCI-Express cards are
used mostly in burst transactions, read latency does not really
matter, so some read latency was sacrificed to obtain better burst
performance.

Best regards
 
On Mon, 25 Jul 2011 13:23:12 -0700 (PDT), Benjamin Couillard
<benjamin.couillard@gmail.com> wrote:


A one-lane PCIe 1.x link should be able to turn a word read around in
about 250 ns, assuming not too much else is going on. Of course, an
excessive number of switches (or slow switches), or slow hardware on
either end, are obviously possible issues. But PCIe is certainly much
faster than 3-4 us to read a word.
 
On Jul 25, 9:23 pm, Benjamin Couillard <benjamin.couill...@gmail.com>
wrote:
The conversion has worked fine on all levels except one: the read
latency of PCI-Express is about 4 times higher than that of standard
PCI. For example, on the old product it takes about 0.9 us to perform
a 1-DWORD read; with the PCI-Express product it takes about 3-4 us.
I've seen this read latency both in real life (with a real board) and
in VHDL simulation, so I don't think this is a driver issue. Have any
of you experienced similar performance issues?
I have no actual experience of experimenting with this; however, I
have been interested in a latency-sensitive device that may
potentially use PCI-E, so I have been looking around for answers.

Have a look at this write-up comparing HyperTransport and PCI-E. The
authors claim around 250 nanoseconds (page 9) to read the first byte:

http://www.hypertransport.org/docs/wp/Low_Latency_Final.pdf

It would be interesting to hear what is causing you to see 3-4 us.
That would kill off my potential project, so I am hoping to be able to
match the results in the above paper.

Could there be some inaccuracy in your measurements; how do you
measure the latency?

Rupert
 
When designing with PCI or PCIe you should really try to avoid reads
as much as possible. What do you need it for, anyway? In a
multitasking operating system you are going to have microseconds of
jitter on the software side in kernel mode, and tens of milliseconds
in user mode, anyway. So I am wondering: what is the scenario that
benefits from sub-microsecond latency for software reads?

Kolja
cronologic.de
 
Generally speaking, PCI Express is much more prone to latency than
conventional PCI because packets have to be constructed, passed
through a structure of nodes, and checked at most levels. Data
checking, and onward transmission, isn't completed until the last
data arrives and the CRCs are checked.

If you do a "read", this involves one packet going out and one coming
back, so it is doubly worse. If you can, do a DMA-like operation where
data is sent from the data source, which then interrupts your system
to use the data already in memory.

The latency will also vary from system to system because routing
structures differ between motherboards. The amount of other traffic
going on will also affect latency, as different things contend for the
data pipes. Generally speaking, if you are trying to do anything
real-time, it is something of a nightmare if you are planning on using
the host motherboard's processor for control functions.

You can try to make the latency smaller by using smaller packet sizes,
and this sometimes helps. Ultimately, if there is a real-time element
to this, then putting the processing and/or control on your card is
probably best for performance and accuracy.

John Adair
Home of Raggedstone2. The Spartan-6 PCIe Development Board.


 
On Jul 26, 5:19 pm, John Adair <g...@enterpoint.co.uk> wrote:
If you do a "read", this involves one packet going out and one coming
back, so it is doubly worse. If you can, do a DMA-like operation where
data is sent from the data source, which then interrupts your system
to use the data already in memory.
In the paper I posted a link to, I think the times are for an
interrupt, or for DMA, not a software-initiated "read". Thanks for
explaining the difference.

Rupert
 
"Benjamin Couillard" <benjamin.couillard@gmail.com> wrote in message
news:62427806-eeec-499b-a0f0-15ffafa0e3ab@w27g2000yqk.googlegroups.com...
Is it possible that time-stamping the data would disconnect you
somewhat from the latency problem? Usually data can't be processed and
presented in real time at those speeds anyway.
 
