XILINX PCIe read of slow device

David Binette
What is the correct way to handle a PCIe request to a slow device?

I have a Xilinx Spartan-6 PCIe design using the Integrated Block for PCI Express.

The BAR memory map is decoded, and some addresses map to fast RAM or local registers; these work OK.
But some addresses map to slow devices, like I2C, or to internal processes that need a few cycles before they can produce valid data to be returned to the PCI bus.

Is there a way to tell the PCI bus to wait, or retry?

Thanks
 
In article <d7e5311e-f5ea-4170-bd07-524c71da5c2b@googlegroups.com>,
David Binette <david.binette@gmail.com> wrote:
> Is there a way to tell the PCI bus to wait, or retry?

David,

What specific problem are you trying to address?

The Completion Timeout Mechanism of the PCIe spec is
optional, and must be enabled by SW during device configuration.

Can you just disable this? You can force it disabled on either end
(root complex or endpoint). I don't think it's enabled by default,
but I can't check at the moment...
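For reference, the Completion Timeout controls live in the PCI Express capability structure: Device Capabilities 2 (offset 0x24 within that capability) says whether the timeout can be disabled, and Device Control 2 (offset 0x28) holds the timeout value code and the disable bit. The timer belongs to the requester, so for host reads of an endpoint BAR it is the root port's Device Control 2 that applies. The Python below is only a decoding sketch under those assumptions; the helper names are hypothetical, and actually reading or writing the registers from a driver is left out.

```python
# Completion Timeout fields, per the PCIe Base Specification.
# Offsets are relative to the PCI Express capability, not to config space 0.
DEVCAP2_OFFSET = 0x24
DEVCTL2_OFFSET = 0x28

DEVCAP2_CPL_TIMEOUT_RANGES = 0x000F    # bits 3:0, timeout ranges supported
DEVCAP2_CPL_TIMEOUT_DIS_SUPP = 0x0010  # bit 4, disable supported
DEVCTL2_CPL_TIMEOUT_VALUE = 0x000F     # bits 3:0, 0b0000 = default range
DEVCTL2_CPL_TIMEOUT_DISABLE = 0x0010   # bit 4, timeout disabled


def decode_devctl2(devctl2: int) -> dict:
    """Return the Completion Timeout fields of a Device Control 2 value."""
    return {
        "value_code": devctl2 & DEVCTL2_CPL_TIMEOUT_VALUE,
        "disabled": bool(devctl2 & DEVCTL2_CPL_TIMEOUT_DISABLE),
    }


def with_timeout_disabled(devctl2: int, devcap2: int) -> int:
    """Compute a Device Control 2 value with the timeout disabled.

    Refuses if Device Capabilities 2 says disabling is not supported.
    """
    if not devcap2 & DEVCAP2_CPL_TIMEOUT_DIS_SUPP:
        raise ValueError("Completion Timeout Disable not supported")
    return devctl2 | DEVCTL2_CPL_TIMEOUT_DISABLE


print(decode_devctl2(0x0000))
print(hex(with_timeout_disabled(0x0000, 0x0010)))
```

Checking the capability bit first matters because support for disabling the timeout is itself optional.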

Or are you asking something else?

Regards,

Mark
 
On Monday, October 27, 2014 1:37:05 PM UTC-5, Mark Curry wrote:
> The Completion Timeout Mechanism of the PCIe spec is
> optional, and must be enabled by SW during device configuration.
>
> Can you just disable this?

Mark, thanks. I will look into the 'completion timeout mechanism' to see if it is the answer to my need. Am I asking something else? I don't know; it is all kind of new to me.

Part of the difficulty is that the PCIe core and the local application are on different clock domains. When the PCIe read occurs I deal with the clock crossing, but it takes clock cycles before I can return something to the PCIe read request.
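To put a rough number on "it takes clock cycles", here is an idealized Python model of a request/acknowledge toggle handshake: the request toggle crosses into the fast domain through a two-flop synchronizer, the fast side captures the data and toggles the acknowledge on that same edge (an assumption), and the acknowledge crosses back through another two-flop synchronizer. Metastability resolution and setup/hold margins are ignored, and the clock periods (62.5 MHz PCIe user clock, 200 MHz fast clock) are assumptions, not figures from David's design.

```python
def next_edge(t_ps: int, period_ps: int, phase_ps: int = 0) -> int:
    """First rising edge strictly after t_ps for a clock with this phase."""
    k = (t_ps - phase_ps) // period_ps + 1
    return phase_ps + k * period_ps


def handshake_round_trip_ps(pcie_period_ps: int, fast_period_ps: int,
                            fast_phase_ps: int = 0, sync_stages: int = 2) -> int:
    """Time from the request toggle (at t=0, on a PCIe edge) until the
    PCIe side sees the acknowledge toggle, ignoring metastability."""
    t = 0
    # Request toggle ripples through the fast-domain synchronizer flops.
    for _ in range(sync_stages):
        t = next_edge(t, fast_period_ps, fast_phase_ps)
    # Assumption: data captured and ack toggled on that same fast edge.
    # Ack toggle ripples back through the PCIe-domain synchronizer flops.
    for _ in range(sync_stages):
        t = next_edge(t, pcie_period_ps)
    return t


PCIE_PERIOD_PS = 16_000   # 62.5 MHz, assumed
FAST_PERIOD_PS = 5_000    # 200 MHz, assumed
for phase in (0, 1_000, 4_000):
    rt = handshake_round_trip_ps(PCIE_PERIOD_PS, FAST_PERIOD_PS, phase)
    print(f"fast clock phase {phase} ps: {rt / PCIE_PERIOD_PS:.2f} PCIe cycles")
```

Under these assumptions the crossing costs a few PCIe user-clock cycles, which is the delay the completion logic has to absorb.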
 
On Monday, 27 October 2014 19:05:32 UTC+1, David Binette wrote:
> But some addresses map to slow devices, like I2C, or to internal processes that need a few cycles before they can produce valid data to be returned to the PCI bus.

For peripherals that are slow, like I2C on a normal MCU, you would normally have
a register to initiate the read, and a status register you can poll to see when the result is ready.

-Lasse
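Lasse's suggestion can be sketched as a behavioral model in Python (not HDL): a write to a start register kicks off the slow access, a status register reports when it is done, and a data register holds the result. Every register read is answered immediately, so the PCIe completion is never delayed. The register offsets, the latency, and the `host_read` polling loop are hypothetical.

```python
# Behavioral model, not HDL. Offsets and latency are hypothetical.
REG_START = 0x00    # write: start an access to the slow device
REG_STATUS = 0x04   # read: bit 0 set when the result is ready
REG_DATA = 0x08     # read: result of the last completed access


class SlowReadBridge:
    def __init__(self, latency_cycles, source):
        self.latency_cycles = latency_cycles
        self.source = source        # callable producing the device's answer
        self.countdown = 0
        self.ready = False
        self.data = 0

    def write(self, offset, value):
        if offset == REG_START:     # value is ignored in this sketch
            self.ready = False
            self.countdown = self.latency_cycles

    def read(self, offset):
        # Every read is answered at once, so the PCIe completion never waits.
        if offset == REG_STATUS:
            return int(self.ready)
        if offset == REG_DATA:
            return self.data
        return 0

    def tick(self):
        """One clock of the slow side."""
        if self.countdown > 0:
            self.countdown -= 1
            if self.countdown == 0:
                self.data = self.source()
                self.ready = True


def host_read(bridge, max_polls=1000):
    """Driver side: start the access, then poll status until ready."""
    bridge.write(REG_START, 1)
    for polls in range(max_polls):
        if bridge.read(REG_STATUS) & 1:
            return bridge.read(REG_DATA), polls
        bridge.tick()               # stands in for time passing between polls
    raise TimeoutError("slow device did not respond")


value, polls = host_read(SlowReadBridge(latency_cycles=5, source=lambda: 0x42))
print(hex(value), polls)
```

The cost of this design is moved to software: one extra write and some status reads per access, in exchange for never holding a completion open.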
 
On Monday, October 27, 2014 6:09:09 PM UTC-5, lang...@fonz.dk wrote:
> For peripherals that are slow, like I2C on a normal MCU, you would normally have
> a register to initiate the read, and a status register you can poll to see when the result is ready.

Yes, that is a good solution, but for a different problem.
In this case the data is always 'ready': it is continuously changing on a faster clock domain, and I need a couple of cycles for the read request to cross domains.

I've tried, unsuccessfully, to manipulate the IP core's 'trn_tsrc_rdy_n' line so that I could look at the read address and pre-fetch the data before setting the start-of-frame line, but for some reason the core will not tolerate any delays.
 
On Tuesday, 28 October 2014 15:12:33 UTC+1, David Binette wrote:
> In this case the data is always 'ready': it is continuously changing on a faster clock domain, and I need a couple of cycles for the read request to cross domains.

Can't you just keep a copy of the data on the other clock domain?

-Lasse
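A timing-only Python model of "keep a copy on the other clock domain": the fast side publishes every new value, and a copy becomes readable in the PCIe domain a fixed number of PCIe clocks later, so a PCIe read is answered at once without waiting for a crossing. The model deliberately abstracts away how a multi-bit value crosses safely (in hardware that needs a handshake or a small asynchronous FIFO, not a per-bit synchronizer chain); `sync_delay` and the class name are hypothetical.

```python
from collections import deque


class ShadowRegister:
    """PCIe-side copy of a value that lives in a faster clock domain.

    Timing-only model: a published value becomes readable sync_delay
    PCIe clocks later. How the multi-bit value crosses is abstracted away.
    """

    def __init__(self, sync_delay: int = 2, reset_value: int = 0):
        self.latest = reset_value                    # fast-domain value
        self.stages = deque([reset_value] * sync_delay, maxlen=sync_delay)

    def publish(self, value: int) -> None:
        """Fast domain: a new value is available."""
        self.latest = value

    def pcie_tick(self) -> None:
        """One PCIe clock: the crossing pipeline advances by one stage."""
        self.stages.append(self.latest)              # oldest stage drops out

    def pcie_read(self) -> int:
        """Answered in the same cycle the read arrives; never stalls."""
        return self.stages[0]


reg = ShadowRegister()
for sample in (10, 11, 12):
    reg.publish(sample)
    reg.pcie_tick()
print(reg.pcie_read())   # 11: one update behind the newest sample
```

The read never stalls, at the price of returning a value that may be one or more updates old.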
 
On Tuesday, October 28, 2014 9:37:43 AM UTC-5, lang...@fonz.dk wrote:
> Can't you just keep a copy of the data on the other clock domain?

Yes, that is feasible for a small number of items, and it may be 'plan B' if no PCIe bus solution is available to me.

I like your suggestions; they are all reasonable, and I'll take the best alternative I can get if I don't find a way to do this via PCIe.
 
On Mon, 27 Oct 2014 11:05:29 -0700, David Binette wrote:

> Is there a way to tell the PCI bus to wait, or retry?

This is out of UG654, page 133, for a simple PIO access. I'm not
sure what your host driver might be using.

"While the read is being processed, the PIO design RX state machine
deasserts trn_rdst_rdy_n, causing the Receive TRN interface to stall
receiving any further TLPs until the internal Memory Read controller
completes the read access from the block RAM and generates the
completion. Deasserting trn_rdst_rdy_n in this way is not required for all
designs using the core. The PIO design uses this method to simplify the
control logic of the RX state machine."
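The PIO behavior quoted above can be modeled in a few lines of Python: while one MRd is being served, the RX side reports "not ready" (trn_rdst_rdy_n deasserted, i.e. high), so the next request simply waits inside the core, and the completion for the current request goes out whenever the local read finishes. This is a cycle-level sketch under assumptions: at most one TLP accepted per cycle, a hypothetical per-address latency function, and no modeling of the TX-side handshake the real core also requires.

```python
from typing import Callable, List, Tuple


def simulate_pio(requests: List[int],
                 read_latency: Callable[[int], int]) -> List[Tuple[int, int]]:
    """Return (address, cycle the CplD is sent) for back-to-back MRd TLPs."""
    pending = list(requests)   # TLPs held inside the core
    current = None             # address being served by the user logic
    done_at = 0
    completions = []
    cycle = 0
    while pending or current is not None:
        rdst_rdy = current is None          # i.e. trn_rdst_rdy_n driven low
        if rdst_rdy and pending:
            current = pending.pop(0)        # accept the next MRd
            done_at = cycle + read_latency(current)
        elif current is not None and cycle >= done_at:
            completions.append((current, cycle))   # CplD goes out
            current = None                  # ready again next cycle
        cycle += 1
    return completions


def example_latency(addr: int) -> int:
    """Hypothetical map: 0x40 is the slow address."""
    return 20 if addr == 0x40 else 1


print(simulate_pio([0x00, 0x40, 0x04], example_latency))
```

In the model the slow address only delays its own completion and the requests queued behind it; nothing is dropped or retried.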

Also take a look at page 141

--
Chisolm
Republic of Texas
 
David Binette wrote:
> Yes, that is feasible for a small number of items, and it may be 'plan B' if no PCIe bus solution is available to me.
>
> I like your suggestions; they are all reasonable, and I'll take the best alternative I can get if I don't find a way to do this via PCIe.

I'm still not sure what exactly your requirement is. In one post you
write that you want to read from slow devices (like I2C). That would
mean the problem is this:
- you issue a PCIe read request
- this read request triggers something, e.g. a read from an I2C device,
which takes a certain time
- meanwhile, you cannot respond to the PCIe read request in time because
you haven't received the result yet

In that case, do what Lasse suggests: Have one register to trigger the
read and another one that can be polled via PCIe indicating when the
data is ready.

But in another post you write "the data is always 'ready' it is
continuously changing, on a faster clock domain", which is something
entirely different. Is it streaming data? Do you need to catch all the
data or do you want to read out only one single value occasionally? Is
it dependent on your read, meaning that your read request initiates a
calculation or something that you want the result of, or is the data
totally independent and you only occasionally want to read the current
value?

Since I don't understand what you really want to do, here are a few other
possibilities:

- You could just always transfer the data you have to the PCIe clock
domain whenever it changes. Each time there is a new value, always
transfer it to the PCIe clock domain immediately and put it e.g. into a
BAR register. So when you issue a PCIe read request, there's data
already there that you can put into your reply message immediately.
Worst case is you don't get the very latest value but the one before that.

- If you need to catch all the values, I'd put the data into a FIFO. You
could then, e.g., issue an MSI (Message Signaled Interrupt) when the FIFO
is half-full (or keep polling prog_full or something) and then read
it out in a burst from the PCIe side. No clock-domain crossing is needed
for the read request, as you only read from the FIFO, whose read
port is in the PCIe clock domain. Nor does PCIe have to wait long for data,
since data from the FIFO is available one or two clock cycles
after the read request was issued (depending on how you configure the FIFO).

- If in your design the read request itself triggers something that
takes a while, do what Lasse suggests.
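Sean's FIFO option, sketched as a Python behavioral model: the fast side pushes every sample, a programmable-full threshold triggers a callback standing in for the MSI, and the PCIe side drains the FIFO in bursts. Depth, threshold, the overflow flag, and the callback are hypothetical choices, not a model of any particular FIFO IP.

```python
from collections import deque
from typing import Callable, List, Optional


class SampleFifo:
    def __init__(self, depth: int = 16, prog_full_threshold: int = 8,
                 on_prog_full: Optional[Callable[[], None]] = None):
        self.q = deque()
        self.depth = depth
        self.threshold = prog_full_threshold
        self.on_prog_full = on_prog_full   # stands in for raising an MSI
        self.overflow = False              # sticky: a sample was dropped

    @property
    def prog_full(self) -> bool:
        return len(self.q) >= self.threshold

    def push(self, sample: int) -> None:
        """Write port, fast clock domain: one sample per call."""
        if len(self.q) >= self.depth:
            self.overflow = True
            return
        self.q.append(sample)
        # Fire when the fill level climbs to the threshold, not on every push.
        if len(self.q) == self.threshold and self.on_prog_full:
            self.on_prog_full()

    def burst_read(self, n: int) -> List[int]:
        """Read port, PCIe clock domain: up to n samples, oldest first."""
        return [self.q.popleft() for _ in range(min(n, len(self.q)))]


irqs = []
fifo = SampleFifo(depth=16, prog_full_threshold=8,
                  on_prog_full=lambda: irqs.append("MSI"))
for i in range(10):
    fifo.push(i)
print(irqs, fifo.burst_read(8), fifo.prog_full)
```

The overflow flag is there because a FIFO that software drains too slowly will lose samples, and software should be able to tell.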

HTH,
Sean
 
On Tuesday, October 28, 2014 5:37:12 PM UTC-5, Joe Chisolm wrote:
> This is out of UG654, page 133, for a simple PIO access.

I understand this:
"deasserts trn_rdst_rdy_n, causing the Receive TRN interface to stall
receiving any further TLPs"

but I'm not so much interested in stalling "any further TLPs" as in allowing the current TLP to continue processing. It seems that if I delay even a single extra cycle, it causes distress to the Linux host.
 
On Monday, October 27, 2014 1:05:32 PM UTC-5, David Binette wrote:
> Is there a way to tell the PCI bus to wait, or retry?

Hi Sean,
Thanks for the suggestions, but I think what I really need is a way
to stall the current TLP to allow the read/access to complete.

-- Is it streaming data? Do you need to catch all the
-- data or do you want to read out only one single value occasionally? Is

The data is always changing, and only needs to be read occasionally.

-- You could just always transfer the data you have to the PCIe clock
-- domain whenever it changes. Each time there is a new value, always
-- transfer it to the PCIe clock domain immediately and put it e.g. into a
-- BAR register. So when you issue a PCIe read request, there's data
-- already there that you can put into your reply message immediately.
-- Worst case is you don't get the very latest value but the one before that.

That would be OK for most cases, but some reads have side effects,
such as clearing another register upon read. This could be overcome
and is not a show stopper; that part could be redesigned.

Also, since the external device has a lot of registers, and they are
typically accessed by setting their address and reading the result
(sometimes a calculated result), it would require significant changes
to create a bank of shadow values capturing them all for
instantaneous retrieval instead of indexed, on-demand access.

How do other people handle things like SMBus reads over PCIe, or
reads from an I2C device? The first read is certainly going to need some time
to complete before it can return data.

Perhaps I just fumbled something during my tests and subsequently discarded
what should have been a viable approach.

If I knew exactly how it should be done I could focus my efforts on that.
 
On Wednesday, October 29, 2014 9:33:54 AM UTC-5, David Binette wrote:

P.S. I know that SMBus is an independent bus on the PCIe connector; I don't mean to complicate the topic with it. It was only an example to illustrate.
 
On Wednesday, October 29, 2014 2:33:54 PM UTC, David Binette wrote:

> That would be OK for most cases, but some reads have side effects,
> such as clearing another register upon read. This could be overcome
> and is not a show stopper; that part could be redesigned.

It's generally best to avoid side-effects if at all possible and make all reads idempotent. Life is much easier for software that way.

For example, TLPs may be re-ordered, accesses above a certain size may not occur in the order you expect, the root complex may attempt to pre-fetch a value, and in future you may be using this device over a lossy medium like Ethernet.

All of these things can be controlled (or worked around) in software but often lead to inefficiencies. If you have the choice, it's always better to design your interface with a view to simplifying the software interaction. This generally also yields simpler hardware and fewer gotchas in the documentation so everyone's a winner!
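A small Python model (hypothetical class names) contrasting a clear-on-read status register with an idempotent write-1-to-clear one. The "stray read" stands in for any read software did not intend to count, such as a speculative or duplicated access: with clear-on-read the event is lost, with write-1-to-clear it is not. A related rule from the PCI specifications: a BAR may only be marked prefetchable if reads from it have no side effects, so a clear-on-read register belongs behind a non-prefetchable BAR.

```python
class ClearOnReadStatus:
    """Status register whose read has a side effect: it clears the bits."""

    def __init__(self):
        self.bits = 0

    def set_event(self, mask: int) -> None:
        self.bits |= mask

    def read(self) -> int:
        value, self.bits = self.bits, 0
        return value


class W1CStatus:
    """Idempotent alternative: reads never change state; software clears
    bits explicitly by writing 1s to them (write-1-to-clear)."""

    def __init__(self):
        self.bits = 0

    def set_event(self, mask: int) -> None:
        self.bits |= mask

    def read(self) -> int:
        return self.bits

    def write(self, mask: int) -> None:
        self.bits &= ~mask


def driver_poll(reg, stray_read_first: bool) -> int:
    """What the driver sees if an extra read hits the register first."""
    if stray_read_first:
        reg.read()
    value = reg.read()
    if isinstance(reg, W1CStatus):
        reg.write(value)        # acknowledge exactly what was observed
    return value


for cls in (ClearOnReadStatus, W1CStatus):
    reg = cls()
    reg.set_event(0x1)
    print(cls.__name__, driver_poll(reg, stray_read_first=True))
```

Acknowledging only the bits that were read also avoids clearing an event that arrives between the read and the write.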

Thanks,

Chris
 
In article <b22fff2a-6bf2-4285-8632-7cda5fa59541@googlegroups.com>,
David Binette <david.binette@gmail.com> wrote:
> How do other people handle things like SMBus reads over PCIe, or
> reads from an I2C device? The first read is certainly going to need some time
> to complete before it can return data.
>
> Perhaps I just fumbled something during my tests and subsequently discarded
> what should have been a viable approach.

David,

I can't offer any specific advice, but generally all PCIe transactions
are "stalled", whether they're reading from a slow device on another clock
or a "fast" device on the same clock.

For a PIO read you get:
1. The host issues a PIO read.
2. An MRd TLP is formed and sent across the serial interface.
3. The Xilinx endpoint decodes the packet and determines that it
is meant for the user logic (you). It sends the information
out to the user interface logic.
4. Your logic issues the read, and responds.
5. The CplD packet is formatted and transmitted back across the PCIe
link.
...

All of that takes quite a bit of time. The fact that step 4 takes
a few cycles (give or take 10s or perhaps even 100s) is almost irrelevant.
The PCIe completion timeout mechanism doesn't come into play until this number is
very high (I've not used it, but I'd think we're talking 10s of ms).
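Some arithmetic behind "a few cycles aren't going to matter": per the PCIe Base Specification's default Completion Timeout range (Device Control 2 value 0000b), the timer must not fire for a completion arriving within 50 µs, must fire by 50 ms, and is strongly recommended not to fire before 10 ms. The sketch converts those points into user-clock cycles, assuming the Spartan-6 x1 core's 62.5 MHz user clock; treat the clock figure as an assumption to check against the core's documentation.

```python
USER_CLK_HZ = 62_500_000   # assumption: Spartan-6 x1 PCIe user clock

# Default Completion Timeout range (Device Control 2 value 0000b).
TIMEOUT_POINTS_NS = {
    "never earlier than": 50_000,
    "recommended minimum": 10_000_000,
    "at the latest": 50_000_000,
}


def cycles_in(ns: int, clk_hz: int = USER_CLK_HZ) -> int:
    """Whole clock cycles in ns nanoseconds (integer math, no float rounding)."""
    return ns * clk_hz // 1_000_000_000


for label, ns in TIMEOUT_POINTS_NS.items():
    print(f"{label:>20}: {cycles_in(ns):>9,} cycles")
```

Even the shortest permitted timeout leaves thousands of user-clock cycles for the local read.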

The whole process has quite a bit of latency. A few cycles
here or there aren't going to matter.

I don't use that specific PCIe core, nor Xilinx logic (I'm using the Virtex-7 core,
with AXIS interfaces tied to my logic). But the general flow should be the
same. I'd review the interface specification to fully understand
what's required. Are you running sims with the Xilinx logic?

Regards,

Mark
 
On Monday, October 27, 2014 1:05:32 PM UTC-5, David Binette wrote:

Thanks, Mark, for your time and comments, which were helpful.

I haven't put it on the simulator; I'm just doing compiles and tests, but the turn time is long.
 
David Binette <david.binette@gmail.com> writes:

> I haven't put it on the simulator, just doing compiles and tests but the turn time is long.

Does Xilinx provide a realistic Root Complex model or some other type of
PCIe verification environment?

Rolling your own can be some amount of work. However, it might be
possible to instantiate a Xilinx Root Complex in your testbench and use
that to stimulate your DUT.


//Petter


 
On Friday, October 31, 2014 6:58:53 AM UTC-4, Petter Gustad wrote:
> Does Xilinx provide a realistic Root Complex model or some other type of
> PCIe verification environment?

Yes, the example design provided with the PCIe EP block contains a root port model.

I've recently worked on a Spartan-6 design similar to the OP's, in which the FPGA is a bridge between the processor (over PCIe) and a local bus with several peripherals. I started with the example design and modified the PIO Rx and Tx engines to work for my application. Most of the local bus cycles are fast enough that software does not have to wait. A timeout was implemented on the local bus cycles that issues an MSI interrupt on the PCIe link if the peripheral doesn't respond within the timeout period (~1 µs). One issue we ran into with PCIe packet timing is that the MSI interrupt was not being seen by software before the next transaction was issued on the link. We ended up using a status register for software to poll instead.
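A Python behavioral sketch of the bridge described above: each PIO read becomes a local-bus cycle with a timeout, and a failure is recorded in a status register that software polls. The post does not say what data the bridge returns for a timed-out cycle; the sketch assumes it still completes the PCIe read with a filler value. The register layout, filler value, clock rate, and cycle counts are all hypothetical.

```python
USER_CLK_HZ = 62_500_000                  # assumed PCIe user clock
TIMEOUT_CYCLES = USER_CLK_HZ // 1_000_000  # about 1 us (hypothetical choice)
FILLER = 0xFFFF_FFFF                      # hypothetical data on a timeout
STATUS_TIMEOUT = 0x1                      # hypothetical sticky error bit


class LocalBusBridge:
    """PIO reads forwarded to a local bus, with a per-cycle timeout."""

    def __init__(self, peripherals):
        # peripherals: address -> (response latency in cycles or None, value)
        self.peripherals = peripherals
        self.status = 0

    def pio_read(self, addr: int):
        """Return (data, cycles spent on the local bus)."""
        latency, value = self.peripherals.get(addr, (None, 0))
        if latency is None or latency > TIMEOUT_CYCLES:
            # Complete the PCIe read anyway so the host never waits long,
            # and record the failure where software can poll for it.
            self.status |= STATUS_TIMEOUT
            return FILLER, TIMEOUT_CYCLES
        return value, latency

    def read_status(self) -> int:
        return self.status

    def clear_status(self, mask: int) -> None:
        self.status &= ~mask                # write-1-to-clear


bridge = LocalBusBridge({0x10: (5, 0xAB), 0x20: (None, 0)})
for addr in (0x10, 0x20):
    data, cycles = bridge.pio_read(addr)
    if bridge.read_status() & STATUS_TIMEOUT:   # software polls the status
        print(f"{addr:#x}: timed out after {cycles} cycles")
        bridge.clear_status(STATUS_TIMEOUT)
    else:
        print(f"{addr:#x}: {data:#x} after {cycles} cycles")
```

Polling a status register after the read avoids relying on an interrupt that may arrive after the next transaction, which matches the problem described above.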
 
