serial protocol specs and verification

alb
Hi all,

I have the following specs for the physical level of a serial protocol:

For the communication with Frontend asynchronous LVDS connection is used.
The bitrate is set to 20 Mbps.
Data encoding on the LVDS line is NRZI:
- bit '1' is represented by a transition of the physical level,
- bit '0' is represented by no transition of the physical level,
- insertion of an additional bit '1' after 6 consecutive bits '0'.
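
To be sure I read the encoding right, here is a minimal Python model of
it (that the stuffing applies to the logical bit stream before the NRZI
mapping is my assumption; the spec doesn't spell the order out):

def stuff(bits):
    """Insert a '1' after every run of 6 consecutive '0's."""
    out, run = [], 0
    for b in bits:
        out.append(b)
        run = run + 1 if b == 0 else 0
        if run == 6:
            out.append(1)   # stuffed bit, to be dropped by the receiver
            run = 0
    return out

def nrzi(bits, level=0):
    """NRZI: a '1' toggles the line level, a '0' leaves it unchanged."""
    levels = []
    for b in bits:
        level ^= b          # transition on '1', none on '0'
        levels.append(level)
    return levels

# an all-zero (idle) stream makes the line toggle every 7th bit time
print(nrzi(stuff([0] * 20)))
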
Isn't there a missing requirement on the reset condition of the line?
System clock is implicitly defined on a different section of the specs
and is set at 40MHz.

At the next layer there's a definition of a 'frame' as a sequence of 16
bit words preceded by a 3 bit sync pattern (111) and a header of 16 bits
defining the type of the packet and the length of the packet (in words).

I'm writing a test bench for it and I was wondering whether there's any
recommendation you would suggest. Should I take care to randomly select
the phase between the system clock and the data?

Any pointer is appreciated.
Cheers,

Al

--
A: Because it fouls the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
 
On 7/26/2013 11:22 AM, alb wrote:
Hi all,

I have the following specs for the physical level of a serial protocol:

For the communication with Frontend asynchronous LVDS connection is used.
The bitrate is set to 20 Mbps.
Data encoding on the LVDS line is NRZI:
- bit '1' is represented by a transition of the physical level,
- bit '0' is represented by no transition of the physical level,
- insertion of an additional bit '1' after 6 consecutive bits '0'.

Isn't there a missing requirement on reset condition of the line?
System clock is implicitly defined on a different section of the specs
and is set at 40MHz.

At the next layer there's a definition of a 'frame' as a sequence of 16
bit words preceded by a 3 bit sync pattern (111) and a header of 16 bits
defining the type of the packet and the length of the packet (in words).

I'm writing a test bench for it and I was wondering whether there's any
recommendation you would suggest. Should I take care about randomly
select the phase between the system clock and the data?
Async, eh? At 2x clock to data? Not sure I would want to design this.
I assume you have to phase lock to the data stream somehow? I think
that is the part I would worry about.

In simulation I would recommend that you both jitter the data clock at a
high bandwidth and also with something fairly slow. The slow variation
will test the operation of your data extraction with a variable phase
and the high bandwidth jitter will check for problems from only having
two samples per bit. I don't know how this can be expected to work myself.
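
Something along these lines for the stimulus, say (Python behavioral
model; every magnitude here is a made-up placeholder to be tuned, not a
recommendation):

import math
import random

def edge_times(n_bits, bit_time=50e-9, ppm=100.0,
               fast_rms=0.5e-9, slow_amp=2e-9, slow_period=1e-3):
    """Bit-boundary times for the test bench: a static frequency
    offset (mismatched oscillators), a slow sinusoidal wander
    (temperature-like drift) and fast Gaussian jitter."""
    bt = bit_time * (1.0 + ppm * 1e-6)   # off-frequency transmitter
    edges = []
    for i in range(1, n_bits + 1):
        t = i * bt
        t += slow_amp * math.sin(2.0 * math.pi * t / slow_period)
        t += random.gauss(0.0, fast_rms)
        edges.append(t)
    return edges

edges = edge_times(1000)   # drive the LVDS line model from these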

I did something similar where I had to run a digital phase locked loop
on standard NRZ data (no encoding) and used a 4x clock, but I think I
proved to myself I could do it with a 3x clock, it just becomes
impossible to detect when you have a sample error... lol.

--

Rick
 
On 27/07/2013 01:59, rickman wrote:
On 7/26/2013 11:22 AM, alb wrote:
Hi all,

I have the following specs for the physical level of a serial protocol:

For the communication with Frontend asynchronous LVDS connection is
used.
The bitrate is set to 20 Mbps.
Data encoding on the LVDS line is NRZI:
- bit '1' is represented by a transition of the physical level,
- bit '0' is represented by no transition of the physical level,
- insertion of an additional bit '1' after 6 consecutive bits '0'.

[]

Async, eh? At 2x clock to data? Not sure I would want to design this.
I assume you have to phase lock to the data stream somehow? I think
that is the part I would worry about.
currently they are experiencing a large loss of packets as well as many
corrupted packets (CRC errors). I'm not sure the current implementation
is doing phase lock.

In simulation I would recommend that you both jitter the data clock at a
high bandwidth and also with something fairly slow. The slow variation
will test the operation of your data extraction with a variable phase
and the high bandwidth jitter will check for problems from only having
two samples per bit. I don't know how this can be expected to work myself.
Since the modules are far apart and likely to be at different
temperatures, I would certainly expect a phase problem. Your idea to have a slow and a
high frequency variation in the phase generation might bring out some
additional info.

I did something similar where I had to run a digital phase locked loop
on standard NRZ data (no encoding) and used a 4x clock, but I think I
proved to myself I could do it with a 3x clock, it just becomes
impossible to detect when you have a sample error... lol.
what do you mean by saying 'it becomes impossible to detect when you
have a sample error'?
 
On 7/26/13 11:22 AM, alb wrote:
Hi all,

I have the following specs for the physical level of a serial protocol:

For the communication with Frontend asynchronous LVDS connection is used.
The bitrate is set to 20 Mbps.
Data encoding on the LVDS line is NRZI:
- bit '1' is represented by a transition of the physical level,
- bit '0' is represented by no transition of the physical level,
- insertion of an additional bit '1' after 6 consecutive bits '0'.

Isn't there a missing requirement on reset condition of the line?
System clock is implicitly defined on a different section of the specs
and is set at 40MHz.

At the next layer there's a definition of a 'frame' as a sequence of 16
bit words preceded by a 3 bit sync pattern (111) and a header of 16 bits
defining the type of the packet and the length of the packet (in words).

I'm writing a test bench for it and I was wondering whether there's any
recommendation you would suggest. Should I take care about randomly
select the phase between the system clock and the data?

Any pointer is appreciated.
Cheers,

Al
You don't need to specify a reset state, as either level will work. At
reset the line will be toggling every 7 bit times due to the automatic
insertion of a 1 after 6 0s.

I would be hard pressed to use 40 MHz as a system clock, unless I was
allowed to use both edges of the clock (so I could really sample at a 4x
rate).

For a test bench, I would build something that could be set to work
slightly "off frequency" and maybe even with some phase jitter in the
data clock. I am assuming that the system clock does NOT travel between
devices, or there wouldn't be as much need for the auto 1 bit, unless
this is just for bias leveling, but it isn't really great for that.
 
On 29/07/2013 03:05, Richard Damon wrote:
On 7/26/13 11:22 AM, alb wrote:
Hi all,

I have the following specs for the physical level of a serial protocol:

For the communication with Frontend asynchronous LVDS connection is used.
The bitrate is set to 20 Mbps.
Data encoding on the LVDS line is NRZI:
- bit '1' is represented by a transition of the physical level,
- bit '0' is represented by no transition of the physical level,
- insertion of an additional bit '1' after 6 consecutive bits '0'.

Isn't there a missing requirement on reset condition of the line?
System clock is implicitly defined on a different section of the specs
and is set at 40MHz.
[]
You don't need to specify a reset state, as either level will work. At
reset the line will be toggling every 7 bit times due to the automatic
insertion of a 1 after 6 0s.
Uhm, since there's a sync pattern of '111' I have to assume that no
frame is transmitted when only zeros are flowing (with the '1' stuffed
every 6 zeros).

I would be hard pressed to use 40 MHz as a system clock, unless I was
allowed to use both edges of the clock (so I could really sample at a 4x
rate).
I'm thinking about having the system clock multiplied internally via a PLL
and then going for x4 or x8 in order to center the bit properly.

For a test bench, I would build something that could be set to work
slightly "off frequency" and maybe even with some phase jitter in the
data clock.
Rick was suggesting a phase jitter with a high and a low frequency
component. This can be an even more realistic case since it models slow
drifts due to temperature variations... I do not know how critical it would
be to simulate *all* jitter components of a clock (they may depend on
temperature, power noise, ground noise, ...).

I am assuming that system clock does NOT travel between
devices, or there wouldn't be as much need for the auto 1 bit, unless
this is just a bias leveling, but if isn't real great for that.
Your assumption is correct. No clock distribution between devices.
 
On 7/28/2013 2:32 PM, alb wrote:
On 27/07/2013 01:59, rickman wrote:
On 7/26/2013 11:22 AM, alb wrote:
Hi all,

I have the following specs for the physical level of a serial protocol:

For the communication with Frontend asynchronous LVDS connection is
used.
The bitrate is set to 20 Mbps.
Data encoding on the LVDS line is NRZI:
- bit '1' is represented by a transition of the physical level,
- bit '0' is represented by no transition of the physical level,
- insertion of an additional bit '1' after 6 consecutive bits '0'.

[]

Async, eh? At 2x clock to data? Not sure I would want to design this.
I assume you have to phase lock to the data stream somehow? I think
that is the part I would worry about.

currently they are experiencing a large loss of packets as well as many
corrupted packets (CRC errors). I'm not sure the current implementation
is doing phase lock.


In simulation I would recommend that you both jitter the data clock at a
high bandwidth and also with something fairly slow. The slow variation
will test the operation of your data extraction with a variable phase
and the high bandwidth jitter will check for problems from only having
two samples per bit. I don't know how this can be expected to work myself.

Since modules are likely to have different temperatures being far apart,
I would certainly expect a phase problem. Your idea to have a slow and a
high frequency variation in the phase generation might bring out some
additional info.


I did something similar where I had to run a digital phase locked loop
on standard NRZ data (no encoding) and used a 4x clock, but I think I
proved to myself I could do it with a 3x clock, it just becomes
impossible to detect when you have a sample error... lol.

what do you mean by saying 'it becomes impossible to detect when you
have a sample error'?
I was assuming that perhaps you were doing something I didn't quite
understand, but I'm pretty sure I am on target with this. You *must* up
your sample rate by a sufficient amount so that you can guarantee you
get a minimum of two samples per bit. Otherwise you have no way to
distinguish a slipped sample due to clock mismatch. Clock frequency
mismatch is guaranteed, unless you are using the same clock somehow. Is
that the case? If so, the sampling would just be synchronous and I
don't follow where the problem is.

It is not just a matter of phase, but of frequency. With a 2x clock,
seeing a transition 3 clocks later doesn't distinguish one bit time from
two bit times.

I'm having trouble expressing myself I think, but I'm trying to say the
basic premise of this design is flawed because the sample clock is only
2x the data rate. I say you need 3x and I strongly encourage 4x. At 4x
the samples have four states, expected timing, fast timing, slow timing
and "error" timing meaning the loop control isn't working.

Data ____----____----____----____----____----____----____
SmplClk --__--__--__--__--__--__--__--__--__--__--__--__--__
SmplData -----____----____----____----____----____----____----

This is how you expect it to work. But if the data is sampled slightly
off it looks like this.

Data ____---____----____----____----____----____----____
SmplClk --__--__--__--__--__--__--__--__--__--__--__--__--__
SmplData -----________----____----____----____----____----___

You can't use a locked loop like this because you have no info on
whether you are sampling fast or slow.

The sample clock does not need to be any particular ratio to the data
stream if you use an NCO to control the sample rate. Then the phase
detection will bump the rate up and down to suit.
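
By NCO I just mean a phase accumulator whose carry out is the sample
enable; the phase detector nudges the increment up or down. A sketch
(widths and rates here are arbitrary):

class NCO:
    """Phase-accumulator NCO: the accumulator's carry out is the
    sample-enable tick; 'bump' lets the phase detector nudge it."""
    def __init__(self, incr, width=16):
        self.incr = incr            # nominal rate = incr / 2**width
        self.acc = 0
        self.mod = 1 << width

    def step(self, bump=0):
        total = self.acc + self.incr + bump
        self.acc = total % self.mod
        return total >= self.mod    # carry out -> take a sample now

nco = NCO(incr=1 << 14)             # 2**14/2**16: a tick every 4 clocks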

Do you follow what I am saying? Or have I mistaken what you are doing?

--

Rick
 
On 7/29/2013 5:09 AM, alb wrote:
Rick was suggesting a phase jitter with a high and a low frequency
component. This can be even a more realistic case since it models slow
drifts due to temperature variations... I do not know how critical would
be to simulate *all* jitter components of a clock (they may depend on
temperature, power noise, ground noise, ...).
Just to be clear my suggestion for simulating with both fast and slow
clock frequency variations is not intended to match any real world
conditions so much, but just to exercise the circuit in two ways that I
would expect to detect failures.

If the clock is sampling the data on the edge, it is random which level
is measured. This can be simulated by a fast jitter in the clock. A
slow noise component in the clock frequency would provide for simulation
of mismatched clock frequencies in both the positive and negative
directions. Another way of implementing the slow drift is to just
simulate at a very slightly higher frequency and at a very slightly
lower frequency. That might show errors faster and more deterministically.

--

Rick
 
rickman <gnuarm@gmail.com> wrote:
On 7/28/2013 2:32 PM, alb wrote:
(snip)

For the communication with Frontend asynchronous LVDS
connection is used.
(snip)
Async, eh? At 2x clock to data? Not sure I would want to
design this.
I assume you have to phase lock to the data stream somehow?
I think that is the part I would worry about.
two samples per bit. I don't know how this can be expected to work myself.
(snip)
Since modules are likely to have different temperatures being far apart,
I would certainly expect a phase problem. Your idea to have a slow and a
high frequency variation in the phase generation might bring out some
additional info.
(snip)
I was assuming that perhaps you were doing something I didn't quite
understand, but I'm pretty sure I am on target with this.
You *must* up your sample rate by a sufficient amount so that
you can guarantee you get a minimum of two samples per bit.
Otherwise you have no way to distinguish a slipped sample due
to clock mismatch. Clock frequency mismatch is guaranteed,
unless you are using the same clock somehow.
Everyone's old favorite asynchronous serial RS232 usually uses a
clock at 16x, though I have seen 64x. From the beginning of the
start bit, it counts half a bit time (in clock cycles), verifies
the start bit (and not random noise) then counts whole bits and
decodes at that point. So, the actual decoding is done with a 1X
clock, but with 16 (or 64) possible phase values. It resynchronizes
at the beginning of each character, so it can't get too far off.
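
Sketched in Python, with the usual idle-high line and LSB-first data
(details beyond the above are assumptions):

def uart_rx_char(samples, oversample=16, n_bits=8):
    """Receive one character from a line sampled at 16x the bit rate:
    hunt for the start bit's falling edge, verify it half a bit time
    later, then sample each data bit at its center."""
    i = 0
    while i < len(samples) and samples[i] == 1:
        i += 1                       # hunt for the start-bit edge
    i += oversample // 2             # middle of the start bit
    if i >= len(samples) or samples[i] != 0:
        return None                  # noise, not a real start bit
    bits = []
    for _ in range(n_bits):
        i += oversample              # center of the next data bit
        if i >= len(samples):
            return None
        bits.append(samples[i])
    return bits                      # LSB first; stop bit not checked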

It is not just a matter of phase, but of frequency. With a 2x clock,
seeing a transition 3 clocks later doesn't distinguish one bit
time from two bit times.
For 10Mbit ethernet, on the other hand, as well as I understand it
the receiver locks (PLL) to the transmitter. Manchester coding is
wasteful of bandwidth, but allows for a simpler receiver.
I believe it is usual to feed the transmit clock to the PLL to keep
it close to the right frequency until a signal comes in. Speeds up
the lock time.

I'm having trouble expressing myself I think, but I'm trying to say the
basic premise of this design is flawed because the sample clock is only
2x the data rate. I say you need 3x and I strongly encourage 4x. At 4x
the samples have four states, expected timing, fast timing, slow timing
and "error" timing meaning the loop control isn't working.
Seems to me that it should depend on how far off you can get.
For async RS232, you have to stay within about a quarter bit time
over 10 bits, so even if the clock is 2% off, it still works.
But as above, that depends on having a clock of the appropriate
phase.

-- glen
 
On 7/29/2013 1:40 PM, glen herrmannsfeldt wrote:
rickman<gnuarm@gmail.com> wrote:
On 7/28/2013 2:32 PM, alb wrote:

(snip)

For the communication with Frontend asynchronous LVDS
connection is used.

(snip)
Async, eh? At 2x clock to data? Not sure I would want to
design this.
I assume you have to phase lock to the data stream somehow?
I think that is the part I would worry about.
two samples per bit. I don't know how this can be expected to work myself.

(snip)
Since modules are likely to have different temperatures being far apart,
I would certainly expect a phase problem. Your idea to have a slow and a
high frequency variation in the phase generation might bring out some
additional info.

(snip)
I was assuming that perhaps you were doing something I didn't quite
understand, but I'm pretty sure I am on target with this.
You *must* up your sample rate by a sufficient amount so that
you can guarantee you get a minimum of two samples per bit.
Otherwise you have no way to distinguish a slipped sample due
to clock mismatch. Clock frequency mismatch is guaranteed,
unless you are using the same clock somehow.

Everyone's old favorite asynchronous serial RS232 usually uses a
clock at 16x, though I have seen 64x. From the beginning of the
start bit, it counts half a bit time (in clock cycles), verifies
the start bit (and not random noise) then counts whole bits and
decodes at that point. So, the actual decoding is done with a 1X
clock, but with 16 (or 64) possible phase values. It resynchronizes
at the beginning of each character, so it can't get too far off.
Yes, that protocol requires a clock matched to the sender's clock to at
least 2.5% IIRC. The protocol the OP describes has much longer char
sequences which implies much tighter clock precision at each end and I'm
expecting it to use a clock recovery circuit... but maybe not. I think
he said they don't use one but get "frequent" errors.


It is not just a matter of phase, but of frequency. With a 2x clock,
seeing a transition 3 clocks later doesn't distinguish one bit
time from two bit times.

For 10Mbit ethernet, on the other hand, as well as I understand it
the receiver locks (PLL) to the transmitter. Manchester coding is
wasteful of bandwidth, but allows for a simpler receiver.
I believe it is usual to feed the transmit clock to the PLL to keep
it close to the right frequency until a signal comes in. Speeds up
the lock time.

I'm having trouble expressing myself I think, but I'm trying to say the
basic premise of this design is flawed because the sample clock is only
2x the data rate. I say you need 3x and I strongly encourage 4x. At 4x
the samples have four states, expected timing, fast timing, slow timing
and "error" timing meaning the loop control isn't working.

Seems to me that it should depend on how far of you can get.
For async RS232, you have to stay within about a quarter bit time
over 10 bits, so even if the clock is 2% off, it still works.
But as above, that depends on having a clock of the appropriate
phase.
Not sure why you mention phase. In 232 type character async you have
*no* phase relationship between clocks. There is no PLL so you aren't
phase locked to the data either. I guess you mean a clock with enough
precision?

I've never analyzed an async design with longer data streams so I don't
know how much precision would be required, but I'm sure you can't do
reliable data recovery with a 2x clock (without a pll). I think this
would contradict the Nyquist criterion.

In my earlier comments when I'm talking about a PLL I am referring to a
digital PLL. I guess I should have said a DPLL.

--

Rick
 
rickman <gnuarm@gmail.com> wrote:

(snip, I wrote)

Everyone's old favorite asynchronous serial RS232 usually uses a
clock at 16x, though I have seen 64x. From the beginning of the
start bit, it counts half a bit time (in clock cycles), verifies
the start bit (and not random noise) then counts whole bits and
decodes at that point. So, the actual decoding is done with a 1X
clock, but with 16 (or 64) possible phase values. It resynchronizes
at the beginning of each character, so it can't get too far off.

Yes, that protocol requires a clock matched to the senders clock to at
least 2.5% IIRC. The protocol the OP describes has much longer char
sequences which implies much tighter clock precision at each end and I'm
expecting it to use a clock recovery circuit... but maybe not. I think
he said they don't use one but get "frequent" errors.
(snip)
Seems to me that it should depend on how far of you can get.
For async RS232, you have to stay within about a quarter bit time
over 10 bits, so even if the clock is 2% off, it still works.
But as above, that depends on having a clock of the appropriate
phase.

Not sure why you mention phase. In 232 type character async you have
*no* phase relationship between clocks. There is no PLL so you aren't
phase locked to the data either. I guess you mean a clock with enough
precision?
The reason for the 16x clock is that it can then clock the bits
in one at a time with any of 16 different phases. That is, the actual
bits are only looked at once (usually).

I've never analyzed an async design with longer data streams
so I don't know how much precision would be required, but I"m
sure you can't do reliable data recovery with a 2x clock (without
a pll). I think this would contradict the Nyquist criterion.
If you start from the leading edge of the start bit, choose which
cycle of the 2x clock is closest to the center, and count from there,
seems to me you do pretty well if the clocks are close enough. Also,
the bit times should be pretty close to correct.

In my earlier comments when I'm talking about a PLL I am
referring to a digital PLL. I guess I should have said a DPLL.
I was thinking of an analog one. I still remember when analog (PLL
based) data separators were better for floppy disk reading.
Most likely by now, digital ones are better, possibly because
of a higher clock frequency.

-- glen
 
On 7/29/2013 4:36 PM, glen herrmannsfeldt wrote:
rickman<gnuarm@gmail.com> wrote:

(snip, I wrote)

Everyone's old favorite asynchronous serial RS232 usually uses a
clock at 16x, though I have seen 64x. From the beginning of the
start bit, it counts half a bit time (in clock cycles), verifies
the start bit (and not random noise) then counts whole bits and
decodes at that point. So, the actual decoding is done with a 1X
clock, but with 16 (or 64) possible phase values. It resynchronizes
at the beginning of each character, so it can't get too far off.

Yes, that protocol requires a clock matched to the senders clock to at
least 2.5% IIRC. The protocol the OP describes has much longer char
sequences which implies much tighter clock precision at each end and I'm
expecting it to use a clock recovery circuit... but maybe not. I think
he said they don't use one but get "frequent" errors.

(snip)
Seems to me that it should depend on how far of you can get.
For async RS232, you have to stay within about a quarter bit time
over 10 bits, so even if the clock is 2% off, it still works.
But as above, that depends on having a clock of the appropriate
phase.

Not sure why you mention phase. In 232 type character async you have
*no* phase relationship between clocks. There is no PLL so you aren't
phase locked to the data either. I guess you mean a clock with enough
precision?

The reason for the 16x clock is that it can then clock the bits
in one at a time with any of 16 different phases. That is, the actual
bits are only looked at once (usually).

I've never analyzed an async design with longer data streams
so I don't know how much precision would be required, but I"m
sure you can't do reliable data recovery with a 2x clock (without
a pll). I think this would contradict the Nyquist criterion.

If you start from the leading edge of the start bit, choose which
cycle of the 2x clock is closest to the center, and count from there,
seems to me you do pretty well if the clocks are close enough. Also,
the bit times should be pretty close to correct.
That is the point. With a 2x clock there isn't enough resolution to
"pick" an edge. The clock that detects the edge is somewhere in the
first *half* of the start bit and the following clock is somewhere in
the second half of the start bit... which do you use? Doesn't matter,
if the clock detecting the start bit is close enough to the wrong point,
one or the other will be far too close to the next transition to
guarantee that you are sampling data from the correct bit.


In my earlier comments when I'm talking about a PLL I am
referring to a digital PLL. I guess I should have said a DPLL.

I was thinking of an analog one. I still remember when analog (PLL
based) data separators were better for floppy disk reading.
Most likely by now, digital ones are better, possibly because
of a higher clock frequency.
If you have an analog PLL then you just need to make sure your sample
clock is *faster* than 2x the bit rate. Then you can be certain of how
many bits are between adjacent transitions. But if at any time due to
frequency error or jitter you sample on the wrong side of a transition
you will get an unrecoverable error.

When it comes to analog media like disk drives where the position of the
bit pulse can jitter significantly I would expect a significantly higher
clock rate would be very useful. It all comes down to distinguishing
which half of the bit time the transition falls into. With a run of six
zeros (no transition) between 1 bits (transition) it becomes more
important to sample with adequate resolution with a DPLL or to use an
analog PLL.

I did a DPLL design for a data input to an IP circuit to packet card.
It worked well in simulation and in product test and verification. I'm
not sure they have used this feature in the field though. It was added
to the product "just in case" and that depends on the customer needing
the feature.

--

Rick
 
On Saturday, July 27, 2013 1:59:46 AM UTC+2, rickman wrote:
On 7/26/2013 11:22 AM, alb wrote:
[]

Async, eh? At 2x clock to data? Not sure I would want to design this.
I assume you have to phase lock to the data stream somehow? I think
that is the part I would worry about.

In simulation I would recommend that you both jitter the data clock at a
high bandwidth and also with something fairly slow. The slow variation
will test the operation of your data extraction with a variable phase
and the high bandwidth jitter will check for problems from only having
two samples per bit. I don't know how this can be expected to work myself.

I did something similar where I had to run a digital phase locked loop
on standard NRZ data (no encoding) and used a 4x clock, but I think I
proved to myself I could do it with a 3x clock, it just becomes
impossible to detect when you have a sample error... lol.

Doesn't sound so different from USB (full speed), usually done by
sampling the 12 Mbit/s stream using a 48 MHz clock, or the rising and
falling edges of a 24 MHz clock.


-Lasse
 
rickman <gnuarm@gmail.com> wrote:

(snip, I wrote)
If you start from the leading edge of the start bit, choose which
cycle of the 2x clock is closest to the center, and count from there,
seems to me you do pretty well if the clocks are close enough. Also,
the bit times should be pretty close to correct.

That is the point. With a 2x clock there isn't enough resolution to
"pick" an edge. The clock that detects the edge is somewhere in the
first *half* of the start bit and the following clock is somewhere in
the second half of the start bit... which do you use?
The easy way is to use the opposite edge of the clock. I suppose that
really means that the clock is 4x, though, so maybe that doesn't count.
Say you clock on the falling edge. If the clock is currently high,
the next falling edge will be less than half a cycle away. If it
is currently low, then it will be more. Using that, you can find the
falling edge closest to the center.

The hard way is to have the receive clock slightly faster or slightly
slower. That is, the speed such that if the first edge is in the
first half, later edges will be later in the bit time, and not past
the 3/4 mark. Now, having different receive and transmit clocks is
inconvenient, but not impossible.

Doesn't matter, if the clock detecting the start bit is close
enough to the wrong point, one or the other will be far too
close to the next transition to guarantee that you are sampling
data from the correct bit.
(snip)

I was thinking of an analog one. I still remember when analog (PLL
based) data separators were better for floppy disk reading.
Most likely by now, digital ones are better, possibly because
of a higher clock frequency.

If you have an analog PLL then you just need to make sure your sample
clock is *faster* than 2x the bit rate. Then you can be certain of how
many bits are between adjacent transitions. But if at any time due to
frequency error or jitter you sample on the wrong side of a transition
you will get an unrecoverable error.
It is interesting in the case of magnetic media. The read head reads
changes in the recorded magnetic field. For single density (FM) there
is a flux transition at the edge of the bit cell (clock bit), and either
is or isn't one in the center (data bit). So, including jitter, the
data bit is +/- one quarter bit time from the center, and the clock
bits are +/- one quarter from the cell boundary. The data rate is half
the maximum flux transition rate. The time between transitions is
either 1/2 or 1 bit time.

For the usual IBM double density (MFM), the data bits are again in the
center of the bit cell, but clock bits only occur on bit cell boundaries
between two zero (no transition) bits. The data rate is then equal to
the maximum flux transition rate. The time between transitions is then
1, 1.5 or 2 bit times. The result, though, as you noted, is that
it is more sensitive to jitter. In the case of magnetic media response,
though, there is a predictable component to the transition times.
As the field doesn't transition infinitely fast, the result is that
as two transitions get closer together, when read back they come
slightly farther apart than you might expect. Precompensation is then
used to correct for this. Transitions are moved slightly earlier
or slightly later, depending on the expected movement of the read
pulse.
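
Reduced to a sketch, the two coding rules above look like this (each
bit cell split into a clock half and a data half, 1 meaning a flux
transition in that half; my formulation, not standard notation):

def fm_cells(bits):
    """FM: a clock transition at every cell boundary, plus a data
    transition mid-cell for each '1' bit."""
    return [(1, b) for b in bits]

def mfm_cells(bits):
    """MFM: a data transition mid-cell for '1'; a clock transition
    at the boundary only between two consecutive '0' bits."""
    cells, prev = [], 1        # assume a preceding '1' bit
    for b in bits:
        cells.append((int(prev == 0 and b == 0), b))
        prev = b
    return cells

# FM never goes a full bit time without a transition; MFM spaces
# them between 1 and 2 bit times apart.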

When it comes to analog media like disk drives where the position
of the bit pulse can jitter significantly I would expect a
significantly higher clock rate would be very useful.
One way to do the precompensation is to run a clock fast enough
such that you can move the transition one cycle early or late.
The other way is with an analog delay line.

It all comes down to distinguishing which half of the bit time
the transition falls into. With a run of six zeros (no transition)
between 1 bits (transition) it becomes more important to sample
with adequate resolution with a DPLL or to use an analog PLL.
The early magnetic tape used NRZI coding, flux transition for one,
no transition for zero. Odd parity means at least one bit will
change for every character written to tape. Even parity means
at least two will change, but you can't write the character
with all bits zero. Both were used for 7 track (six bit characters)
and odd parity was used for 800 BPI 9 track tapes. There can be
long runs of zero (no transition) for any individual track, but
taken together there is at least one.

For 1600 BPI tapes, IBM changed to PE, which is pretty similar to
that used for single density floppies. The flux transition rate can
be twice the bit rate (3200/inch) but each track has its own clock
pulse. It is fairly insensitive to head azimuth, unlike 800 BPI NRZI.
There are no long periods without a transition on any track.
Reading tapes is much more reliable, especially on a different
drive than the data was written on.

IBM 6250 tapes use GCR, with more complicated patterns of bit
transitions, and more variation in time between transitions.
Again, much more reliable than its predecessor.

I did a DPLL design for a data input to an IP circuit to packet card.
It worked well in simulation and in product test and verification. I'm
not sure they have used this feature in the field though. It was added
to the product "just in case" and that depends on the customer needing
the feature.
-- glen
 
On 7/29/13 5:09 AM, alb wrote:
On 29/07/2013 03:05, Richard Damon wrote:
On 7/26/13 11:22 AM, alb wrote:
Hi all,

I have the following specs for the physical level of a serial protocol:

For the communication with Frontend asynchronous LVDS connection is used.
The bitrate is set to 20 Mbps.
Data encoding on the LVDS line is NRZI:
- bit '1' is represented by a transition of the physical level,
- bit '0' is represented by no transition of the physical level,
- insertion of an additional bit '1' after 6 consecutive bits '0'.

Isn't there a missing requirement on reset condition of the line?
System clock is implicitly defined on a different section of the specs
and is set at 40MHz.
[]
You don't need to specify a reset state, as either level will work. At
reset the line will be toggling every 7 bit times due to the automatic
insertion of a 1 after 6 0s.

Uhm, since there's a sync pattern of '111' I have to assume that no
frame is transmitted when only zeros are flowing (with the '1' stuffed
every 6 zeros).
My assumption for the protocol would be that between frames an "all
zero" pattern is sent. (note that this is on the layer above the raw
transport level, where every time 6 zeros are sent, a 1 is added). Thus
all frames will begin with three 1s in a row, as a signal for start of
frame (and also gives a lot of transitions to help lock the clock if
using a pll).
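
On the receive side that implies, after the NRZI decode, a destuffer and
a hunt for the '111' start of frame; roughly (my sketch, under the idle
assumption above):

def destuff(bits):
    """Drop the '1' the transmitter inserts after six '0's."""
    out, run, i = [], 0, 0
    while i < len(bits):
        out.append(bits[i])
        run = run + 1 if bits[i] == 0 else 0
        if run == 6:
            i += 1       # skip the stuffed '1' (a '0' here would be
            run = 0      # a line error a real design should flag)
        i += 1
    return out

def find_sof(bits):
    """Return the index just past the first '111' sync pattern."""
    for i in range(len(bits) - 2):
        if bits[i] == bits[i + 1] == bits[i + 2] == 1:
            return i + 3
    return None
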
I would be hard pressed to use 40 MHz as a system clock, unless I was
allowed to use both edges of the clock (so I could really sample at a 4x
rate).

I'm thinking about having a system clock multiplied internally via PLL
and then go for a x4 or x8 in order to center the bit properly.
I would think that sampling at 4x of the data rate is a minimum, faster
will give you better margins for frequency errors. So with a 20 MHz data
rate, you need to sample the data at 80 MHz, faster can help, and will
cause less jitter in your recovered data clock out.

Note that the first level of processing will perform data detection and
clock recovery, and this might be where the 40 MHz came from: a 40 MHz
processing system can be told to take data every other clock cycle most
of the time, but it has the bandwidth to take data on two consecutive
clocks at times, if the data is coming in slightly faster. You don't want
to make this clock much faster than that, as then it becomes harder to
design for no benefit. Any higher speed bit detection clock needs to
have the results translated to this domain for further processing. (You
could also generate a recovered clock, but that starts you down the road
to an async design as the recovered clock isn't well related to your
existing clock, being a combinatorial result of registers clocked on
your sampling clock.)
For a test bench, I would build something that could be set to work
slightly "off frequency" and maybe even with some phase jitter in the
data clock.

Rick was suggesting a phase jitter with a high and a low frequency
component. This can be even a more realistic case since it models slow
drifts due to temperature variations... I do not know how critical would
be to simulate *all* jitter components of a clock (they may depend on
temperature, power noise, ground noise, ...).

I am assuming that system clock does NOT travel between
devices, or there wouldn't be as much need for the auto 1 bit, unless
this is just a bias leveling, but if isn't real great for that.

Your assumption is correct. No clock distribution between devices.
 
Hi Rick,

On 29/07/2013 17:19, rickman wrote:
[]
what do you mean by saying 'it becomes impossible to detect when you
have a sample error'?

I was assuming that perhaps you were doing something I didn't quite
understand, but I'm pretty sure I am on target with this. You *must* up
your sample rate by a sufficient amount so that you can guarantee you
get a minimum of two samples per bit. Otherwise you have no way to
distinguish a slipped sample due to clock mismatch. Clock frequency
mismatch is guaranteed, unless you are using the same clock somehow. Is
that the case? If so, the sampling would just be synchronous and I
don't follow where the problem is.
There's no clock distribution, therefore each end has its own clock
on-board. We are certainly talking about the same nominal oscillator
frequency, but how well they match is something we *do not* want to rely on.

It is not just a matter of phase, but of frequency. With a 2x clock,
seeing a transition 3 clocks later doesn't distinguish one bit time from
two bit times.
I agree with you, the 2x clock is not fine enough to adjust for phase
shifts and/or frequency mismatch.

I'm having trouble expressing myself I think, but I'm trying to say the
basic premise of this design is flawed because the sample clock is only
2x the data rate. I say you need 3x and I strongly encourage 4x. At 4x
the samples have four states, expected timing, fast timing, slow timing
and "error" timing meaning the loop control isn't working.
Uhm, I didn't quite follow what you mean by 'fast timing' and 'slow
timing'. With perfect frequency matching I would expect a bit to have a
transition on cycle #2 (see graph). If the bit is slightly shifted I
would either notice the transition in cycle 2 or cycle 3 depending on
whether it is slightly earlier or slightly later than the clock edge.

bit
center
^
|
cycles 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0
Data ________--------________--------________--------_____
SmplClk -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-
SmplData __________--------________--------________--------___

On perfect frequency match SmplData will be 1 clock delayed.

Data ____----____----____----____----____----____----____
SmplClk --__--__--__--__--__--__--__--__--__--__--__--__--__
SmplData -----____----____----____----____----____----____----

This is how you expect it to work. But if the data is sampled slightly
off it looks like this.
Uhm, this graphic shows a clock frequency which is 1x the bit rate of
the data... Am I missing something??? This will never work, of course...

The sample clock does not need to be any particular ratio to the data
stream if you use an NCO to control the sample rate. Then the phase
detection will bump the rate up and down to suit.
I might use the internal PLL to multiply the clock frequency to x4 data
frequency (=80 MHz) and then phase lock on data just looking at the
transition. If for some reason I see a transition earlier or later I
would adjust my recovered clock accordingly.

I'm sure this stuff has been implemented a gazillions of times.

Do you follow what I am saying? Or have I mistaken what you are doing?
I follow partially... I guess you understood what I'm saying, but I'm
losing you somewhere in the middle of the explanation (especially with
the graph representing a 1x clock rate...).
 
On 29/07/2013 19:40, glen herrmannsfeldt wrote:
[]
Everyone's old favorite asynchronous serial RS232 usually uses a
clock at 16x, though I have seen 64x. From the beginning of the
start bit, it counts half a bit time (in clock cycles), verifies
the start bit (and not random noise) then counts whole bits and
decodes at that point. So, the actual decoding is done with a 1X
clock, but with 16 (or 64) possible phase values. It resynchronizes
at the beginning of each character, so it can't get too far off.
I believe that with 4x or 8x you could easily resync at the bit level.
The first transition goes into a shift register (4 or 8 FFs); when the
shift register has half of its bits set and half reset, you generate a
clock to sample the data. The second transition comes in and the same
mechanism applies. The recovered clock is adjusted so that each
transition lands in the middle of the shift register.

Since the protocol is bit stuffed, it won't get too far off.
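
At 4x that centering test is tiny; something like this (sketch only,
and between transitions a free-running divide-by-4 has to take over):

def transition_centered(sr):
    """4-deep shift register of line samples, newest first: fire the
    sample enable when a transition sits exactly in the middle, i.e.
    the newer half and the older half disagree but are each stable."""
    return sr[0] == sr[1] and sr[2] == sr[3] and sr[1] != sr[2]

assert transition_centered([0, 0, 1, 1])   # edge centered: resync here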

[]
I'm having trouble expressing myself I think, but I'm trying to say the
basic premise of this design is flawed because the sample clock is only
2x the data rate. I say you need 3x and I strongly encourage 4x. At 4x
the samples have four states, expected timing, fast timing, slow timing
and "error" timing meaning the loop control isn't working.

Seems to me that it should depend on how far of you can get.
For async RS232, you have to stay within about a quarter bit time
over 10 bits, so even if the clock is 2% off, it still works.
But as above, that depends on having a clock of the appropriate
phase.
IMO a phase shift does not matter too much, while a frequency mismatch
will accumulate timing differences and lead the transmitter and receiver
to drift apart. But if you lock on the phase it means you lock on the
frequency as well.
 
On 7/30/2013 1:01 PM, alb wrote:
Hi Rick,

On 29/07/2013 17:19, rickman wrote:
[]
what do you mean by saying 'it becomes impossible to detect when you
have a sample error'?

I was assuming that perhaps you were doing something I didn't quite
understand, but I'm pretty sure I am on target with this. You *must* up
your sample rate by a sufficient amount so that you can guarantee you
get a minimum of two samples per bit. Otherwise you have no way to
distinguish a slipped sample due to clock mismatch. Clock frequency
mismatch is guaranteed, unless you are using the same clock somehow. Is
that the case? If so, the sampling would just be synchronous and I
don't follow where the problem is.

There's no clock distribution, therefore each end has its own clock
on-board. We are certainly talking about same oscillator frequency, but
how well they match is certainly something we *do not* want to rely on.

It is not just a matter of phase, but of frequency. With a 2x clock,
seeing a transition 3 clocks later doesn't distinguish one bit time from
two bit times.

I agree with you, the 2x clock is not fine enough to adjust for phase
shifts and/or frequency mismatch.
Ok, we are on the same page then.


I'm having trouble expressing myself I think, but I'm trying to say the
basic premise of this design is flawed because the sample clock is only
2x the data rate. I say you need 3x and I strongly encourage 4x. At 4x
the samples have four states, expected timing, fast timing, slow timing
and "error" timing meaning the loop control isn't working.

uhm, I didn't quite follow what you mean by 'fast timing' and 'slow
timing'. With perfect frequency matching I would expect a bit to have a
transition on cycle #2 (see graph). If the bit is slightly shifted I
would either notice the transition in cycle 2 or cycle 3 depending on
being it slightly earlier or slightly later than the clock edge.

bit
center
^
|
cycles 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0
Data ________--------________--------________--------_____
SmplClk -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-
SmplData __________--------________--------________--------___

On perfect frequency match SmplData will be 1 clock delayed.
No point in even discussing the "perfect" frequency match.


Data ____----____----____----____----____----____----____
SmplClk --__--__--__--__--__--__--__--__--__--__--__--__--__
SmplData -----____----____----____----____----____----____----

This is how you expect it to work. But if the data is sampled slightly
off it looks like this.

Uhm this graphics shows a clock frequency which is 1x the clock
frequency of the data... Am I missing something??? This will never work
of course...
Yes, you are right; still, your diagram above shows a 4x clock. That
will work all day long. It is the 2x clock that doesn't work well. A
3x clock will work but can't provide any info on whether it is sync'd or
not. A 4x clock can tell if the data has slipped giving an error.

What I meant further up by the timing is that your circuit will detect
the data transitions and try to sample near the middle of the stable
portion. So with a 4x clock if it sees a transition where it expects
one, it is "on time". If it sees a transition one clock early it knows
it is "slow", if it sees a transition clock one late it knows it is
"fast". When it sees a transition in the fourth phase, it should assume
that it is out of sync and needs to go into hunt mode. Or you can get
fancier and use some hysteresis for the transitions between "hunt" and
"locked" modes.

I designed this with an NCO controlled PLL. With your async protocol
you should be able to receive a packet based on the close frequency
matching of the two ends. This would really just be correcting for the
phase of the incoming data and not worrying about the frequency
mismatch... like a conventional UART. This circuit can realign every 7
pulses max. That would work I think.

I was making this a bit more complicated because in my case I didn't
have matched frequency clocks, it was specified in the software to maybe
1-2% and the NCO had to PLL to the incoming data to get a frequency
lock. I also didn't have bit stuffing so a long enough string without
transitions would cause a lock slip.


The sample clock does not need to be any particular ratio to the data
stream if you use an NCO to control the sample rate. Then the phase
detection will bump the rate up and down to suit.

I might use the internal PLL to multiply the clock frequency to x4 data
frequency (=80 MHz) and then phase lock on data just looking at the
transition. If for some reason I see a transition earlier or later I
would adjust my recovered clock accordingly.
Yes, that is it exactly. The bit stuffing will give you enough
transitions that you should never lose lock. It is trying to do this at
2x that won't work well because you can't distinguish early from late.


I'm sure this stuff has been implemented a gazillions of times.

Do you follow what I am saying? Or have I mistaken what you are doing?

I follow partially...I guess you understood what I'm saying, but I'm
loosing you somewhere in the middle of the explanation (especially with
the graph representing a 1x clock rate...).
Sorry. If this is not clear now, I'll try the diagram again... lol

I would give you my code, but in theory it is proprietary to someone
else. Just think state machine that outputs a clock enable every four
states, then either adds a state or skips a state to stay in alignment
only when it sees data transitions. If it sees a transition in the
fourth state, it is not in alignment. If there is no transition the FSM
just counts...

A timing diagram is worth a thousand words.
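
Failing that, a behavioral sketch of the idea in Python (a
reconstruction from the description above, *not* the proprietary code):

def dpll_4x(line):
    """2-bit phase counter at 4x the bit rate. Transitions are
    expected at phase 0: one at phase 3 means we are slow, one at
    phase 1 means we are fast (the counter re-centers on the edge,
    skipping or adding a state), and one at phase 2 means we are out
    of alignment and should go hunt. Sample mid-bit at phase 2."""
    phase, prev, levels = 0, line[0], []
    for s in line[1:]:
        if s != prev:             # transition on the line
            if phase in (1, 3):   # early or late: re-center on the edge
                phase = 0
            elif phase == 2:      # "error" timing; real code would go
                phase = 0         #  to hunt mode, here we just re-center
        if phase == 2:
            levels.append(s)      # mid-bit line level; NRZI decoding
                                  # and destuffing happen downstream
        phase = (phase + 1) & 3
        prev = s
    return levels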

--

Rick
 
On 29/07/2013 22:14, rickman wrote:
[]
Everyone's old favorite asynchronous serial RS232 usually uses a
clock at 16x, though I have seen 64x. From the beginning of the
start bit, it counts half a bit time (in clock cycles), verifies
the start bit (and not random noise) then counts whole bits and
decodes at that point. So, the actual decoding is done with a 1X
clock, but with 16 (or 64) possible phase values. It resynchronizes
at the beginning of each character, so it can't get too far off.

Yes, that protocol requires a clock matched to the senders clock to at
least 2.5% IIRC. The protocol the OP describes has much longer char
sequences which implies much tighter clock precision at each end and I'm
expecting it to use a clock recovery circuit... but maybe not. I think
he said they don't use one but get "frequent" errors.
At the physical level the bit stuffing will allow us to resync
continuously, so I'm not concerned as long as there's a clock recovery circuit.

We are using 40 MHz (0.5 ppm stability) but after a few seconds you can
already see how many cycles two clocks can drift apart (0.5 ppm of
40 MHz is up to 20 clock cycles of drift per second per oscillator).

I've never analyzed an async design with longer data streams so I don't
know how much precision would be required, but I"m sure you can't do
reliable data recovery with a 2x clock (without a pll). I think this
would contradict the Nyquist criterion.
<neatpick mode on>
Nyquist criterion has nothing to do with being able to sample data. As a
matter of fact your internal clock is perfectly capable to sample data
flowing in your fpga without the need to be 2x the data rate.
<neatpick mode off>

In my earlier comments when I'm talking about a PLL I am referring to a
digital PLL. I guess I should have said a DPLL.
Why bother? If you have a PLL on your FPGA you can profit from it;
otherwise you need something fancier.
 
On 30/07/2013 06:45, Richard Damon wrote:
[]
Uhm, since there's a sync pattern of '111' I have to assume that no
frame is transmitted when only zeros are flowing (with the '1' stuffed
every 6 zeros).

My assumption for the protocol would be that between frames an "all
zero" pattern is sent. (note that this is on the layer above the raw
transport level, where every time 6 zeros are sent, a 1 is added). Thus
all frames will begin with three 1s in a row, as a signal for start of
frame (and also gives a lot of transitions to help lock the clock if
using a pll).
A frame is defined as follows:

- sync :'111'
- header: dtype (4) - n.u.(2) - length (10)
- data : (16) * length

In principle between frames there can be any number of zeros (with bit
stuffing). An 'all zero' pattern in this sense might be of any number of
bits.
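
For the test bench I model the header like this (field order and bit
numbering are my guess at the spec's intent, not confirmed):

def pack_header(dtype, length):
    """dtype(4) | n.u.(2) | length(10) packed into one 16-bit word."""
    assert 0 <= dtype < 16 and 0 <= length < 1024
    return (dtype << 12) | (length & 0x3FF)

def unpack_header(word):
    return (word >> 12) & 0xF, word & 0x3FF   # dtype, length

w = pack_header(dtype=3, length=5)    # frame = '111' + w + 5 data words
assert unpack_header(w) == (3, 5)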

[]
I'm thinking about having a system clock multiplied internally via PLL
and then go for a x4 or x8 in order to center the bit properly.

I would think that sampling at 4x of the data rate is an minimum, faster
will give you better margins for frequency errors. So with a 20 MHz data
rate, you need to sample the data at 80 MHz, faster can help, and will
cause less jitter in your recovered data clock out.
I also agree with you, no way a 2x would be sufficient to recover a
phase shift.

Note that the first level of processing will perform data detection and
clock recovery, and this might be where the 40 MHz came from, a 40 MHz
processing system can be told most of the time to take data every other
clock cycle, but have bandwidth to at times if the data is coming in
slightly faster to take data on two consecutive clocks.
A 40 MHz would be sampling 2x, which is clearly not sufficient.

You don't want
to make this clock much faster than that, as then it becomes harder to
design for no benefit. Any higher speed bit detection clock needs to
have the results translated to this domain for further processing. (You
could also generate a recovered clock, but that starts you down the road
to an async design as the recovered clock isn't well related to your
existing clock, being a combinatorial result of registers clocked on
your sampling clock.)
The deframed data (the data portion of the above-mentioned frame
structure) goes into a FIFO; I think I can rework it to be a dual-clock
FIFO to cross the domains.
 
On 7/31/2013 3:36 AM, alb wrote:
On 29/07/2013 22:14, rickman wrote:
[]
Everyone's old favorite asynchronous serial RS232 usually uses a
clock at 16x, though I have seen 64x. From the beginning of the
start bit, it counts half a bit time (in clock cycles), verifies
the start bit (and not random noise) then counts whole bits and
decodes at that point. So, the actual decoding is done with a 1X
clock, but with 16 (or 64) possible phase values. It resynchronizes
at the beginning of each character, so it can't get too far off.

Yes, that protocol requires a clock matched to the senders clock to at
least 2.5% IIRC. The protocol the OP describes has much longer char
sequences which implies much tighter clock precision at each end and I'm
expecting it to use a clock recovery circuit... but maybe not. I think
he said they don't use one but get "frequent" errors.

At the physical level the bit stuffing will allow to resync continuously
therefore I'm not concerned if there's a clock recovery circuit.

We are using 40MHz (0.5 ppm stability) but after few seconds you can
already see how many cycles two clocks can drift apart.

I've never analyzed an async design with longer data streams so I don't
know how much precision would be required, but I"m sure you can't do
reliable data recovery with a 2x clock (without a pll). I think this
would contradict the Nyquist criterion.

<nitpick mode on>
The Nyquist criterion has nothing to do with being able to sample data. As a
matter of fact your internal clock is perfectly capable of sampling data
flowing in your FPGA without needing to be 2x the data rate.
<nitpick mode off>
I don't know what you are talking about. If you asynchronously sample,
you very much do have to satisfy the Nyquist criterion. A 2x clock,
because it isn't *exactly* 2x, can *not* be used to capture a bitstream
so that you can find the transitions and know which bit is which.
Otherwise there wouldn't be so many errors in the existing circuit.


In my earlier comments when I'm talking about a PLL I am referring to a
digital PLL. I guess I should have said a DPLL.

Why bother? If you have a PLL on your FPGA you can profit from it;
otherwise you need something fancier.
Not sure of your context. You can't use the PLL on the FPGA to recover
the clock from an arbitrary data stream. It is not designed for that
and will not work because of the gaps in data transitions. It is
designed to allow the multiplication of clock frequencies. A DPLL can
be easily designed to recover the clock, but needs to be greater than 3x
the data rate in order to distinguish the fast condition from the slow
condition.

You can use the FPGA PLL to multiply your clock from 2x to 4x to allow
the DPLL to work correctly.

--

Rick
 
