Phase relationship management

Chuck McManis
So here is a "simple" thing that clearly isn't as simple as one would like.

I'm constructing a PWM unit for a robotics application. Given the shortage
of pins on my microprocessor, I'm serially clocking in my data.

My "module" has sdata_in, sdata_out, sclock_in, sclock_out, clock_in,
pwm_out.

As I'm doing this in an inexpensive CPLD, I'd like to be able to gang a
couple together and just tie the sclock_out, sdata_out, to the next chip in
series and then create a chain of these things. However, inside my CPLD I'm
using code like

process (sclock_in) is
begin
    if rising_edge(sclock_in) then
        data_reg  <= data_reg(14 downto 0) & sdata_in;
        sdata_out <= data_reg(15);
    end if;
end process;

sclock_out <= sclock_in;

Now the trick is that I want to ensure that the rising edge of sclock_out
really happens when sdata_out is valid. So basically I'd like to "push" the
assignment of sdata_out ahead of the clock, or delay the clock out. In a non-CPLD
design I could simply put a buffer between the output clock and the next
chip to add some setup time (at the cost of system throughput, I know, I
know), but since we're way under the speed limits here (400 kHz serial
bitstream) it wouldn't be too egregious.

Thoughts? Just let it ride? Pointers to a discussion on clock management?
--
--Chuck McManis
Email to the devnull address is discarded
http://www.mcmanis.com/chuck/
 
Chuck McManis wrote:
Now the trick is that I want to ensure that the rising edge of sclock_out
really happens when sdata_out is valid. So basically I'd like to "push" the
assignment of sdata_out ahead of the clock, or delay the clock out. In a non-CPLD
design I could simply put a buffer between the output clock and the next
chip to add some setup time (at the cost of system throughput, I know, I
know), but since we're way under the speed limits here (400 kHz serial
bitstream) it wouldn't be too egregious.
Buffers like that are kludges. They are asking for trouble
with CPLDs or FPGAs, since the software is likely to eat them
and/or the routing may be much slower than a buffer. (They
can be made to work if you don't have any other choices and
are willing to hand-route and hand-check that area.)

Can you clock data_out on the falling edge? You really want
to do something like that. (both in your repeater logic and
at the source)
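
For what it's worth, here is a minimal sketch of that falling-edge hand-off,
written against the serial-chain signal names from the original post. The
entity name, the 16-bit width, and the omission of the clock_in/pwm_out side
are assumptions made for the example, not something from the thread.

library ieee;
use ieee.std_logic_1164.all;

entity pwm_shift_stage is
    port (
        sclock_in  : in  std_logic;
        sdata_in   : in  std_logic;
        sclock_out : out std_logic;
        sdata_out  : out std_logic
    );
end pwm_shift_stage;

architecture rtl of pwm_shift_stage is
    signal data_reg : std_logic_vector(15 downto 0) := (others => '0');
begin

    -- Rising edge: shift new data in, exactly as in the original code.
    shift_p : process (sclock_in)
    begin
        if rising_edge(sclock_in) then
            data_reg <= data_reg(14 downto 0) & sdata_in;
        end if;
    end process;

    -- Falling edge: re-register the outgoing bit, so the next device in
    -- the chain sees sdata_out stable for half a bit time on either side
    -- of its own rising edge.
    out_p : process (sclock_in)
    begin
        if falling_edge(sclock_in) then
            sdata_out <= data_reg(15);
        end if;
    end process;

    -- Pass the clock straight through; at 400 kHz the half-cycle margin
    -- easily absorbs the pass-through and routing delays.
    sclock_out <= sclock_in;

end rtl;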

Another approach which takes more logic is to have a local
clock that runs much faster than the bit rate and just watch
for transitions on the "clock" line. Then you can insert
delays by whole clocks. (including negative delays)
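
A rough sketch of that oversampled variant follows, assuming the fast local
clock arrives on clock_in and that one fast-clock cycle of delay on the
forwarded serial clock gives enough margin. The entity name, synchronizer
depths, and the 16-bit width are illustrative assumptions only.

library ieee;
use ieee.std_logic_1164.all;

entity pwm_shift_stage_os is
    port (
        clock_in   : in  std_logic;   -- fast local clock, much faster than 400 kHz
        sclock_in  : in  std_logic;
        sdata_in   : in  std_logic;
        sclock_out : out std_logic;
        sdata_out  : out std_logic
    );
end pwm_shift_stage_os;

architecture rtl of pwm_shift_stage_os is
    signal sclk_sync : std_logic_vector(2 downto 0) := (others => '0');
    signal sdat_sync : std_logic_vector(1 downto 0) := (others => '0');
    signal data_reg  : std_logic_vector(15 downto 0) := (others => '0');
begin

    process (clock_in)
    begin
        if rising_edge(clock_in) then
            -- Synchronise the slow serial signals into the fast clock domain.
            sclk_sync <= sclk_sync(1 downto 0) & sclock_in;
            sdat_sync <= sdat_sync(0) & sdata_in;

            -- On a (synchronised) rising edge of the serial clock, shift.
            if sclk_sync(1) = '1' and sclk_sync(2) = '0' then
                sdata_out <= data_reg(15);                        -- old MSB goes out
                data_reg  <= data_reg(14 downto 0) & sdat_sync(1);
            end if;

            -- Forward the serial clock re-timed by one extra fast-clock
            -- cycle, so sdata_out settles a whole fast-clock period before
            -- the forwarded rising edge reaches the next device.
            sclock_out <= sclk_sync(2);
        end if;
    end process;

end rtl;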

--
The suespammers.org mail server is located in California. So are all my
other mailboxes. Please do not send unsolicited bulk e-mail or unsolicited
commercial e-mail to my suespammers.org address or any of my other addresses.
These are my opinions, not necessarily my employer's. I hate spam.
 
Hal Murray wrote:

Buffers like that are kludges. They are asking for trouble
with CPLDs or FPGAs, since the software is likely to eat them
and/or the routing may be much slower than a buffer. (They
can be made to work if you don't have any other choices and
are willing to hand-route and hand-check that area.)

Can you clock data_out on the falling edge? You really want
to do something like that. (both in your repeater logic and
at the source)

Another approach which takes more logic is to have a local
clock that runs much faster than the bit rate and just watch
for transitions on the "clock" line. Then you can insert
delays by whole clocks. (including negative delays)
Besides Hal's comments ('take care, and watch the tools like a hawk'),
I'd suggest:
Look at the 4094 / HC4094 data, which show a cascade scheme,
but with a common clock.

Buffering the clock can sometimes be necessary (e.g. cascaded
opto-isolation), and if you do that, it is good practice to
derive the SHIFT CLOCK from the OUTPUT pin - that avoids
creeping race conditions. Also, if ANY form of additional
buffering is being used, you will need to latch your cascade
data on the opposite edge. (That gives 1/2 CLK of Tsu/Th.)

If you want to cascade a lot of these, you will also need to
watch skew degradation.

-jg
 
On Sun, 16 May 2004 23:59:33 GMT, "Chuck McManis" <devnull@mcmanis.com> wrote:
....
cascaded devices with serial data In and Out, also considering
daisy chained clock. Freq is 400 KHz.
....
A good example of how this is solved is the serial daisy chain
configuration of the Xilinx FPGAs. It even uses one less pin than
what you are thinking of, since there is no need for a clock out
pin. The clock goes to all devices.

What Xilinx does, is it samples the data coming into each chip on
the rising edge, and the daisy chained data out is clocked on the
falling edge.

For your application with a 2.5 us cycle time, that gives you a
setup and hold window of 1.25 us, which should work regardless of
logic family, and the size of your PCB.

Depending on the length of the shift register within each device,
the latency is N + 0.5 clock cycles per device. But this does not
accumulate across devices, as the .5 cycle of latency is
treated as part of the device to device delay.


Example of devices with 5 bits of SR.

DEV1:
1st rising edge clocks in data bit 0.

5th rising edge clocks bit 0 into the last
position of the SR in device 1.
(DEV 1 now has bits 0..4)

On the following falling edge (5.5 cycles after
the start), serial out is updated with bit 0.

DEV2:
6th rising edge clocks bit 0 into the
second device. (and data bit 5 goes into
DEV 1)

This scheme (at your clock rates) is extremely
tolerant of variations in both clock and data
routing. It can tolerate hundreds of ns of skew.

Obviously it would be best if the original data
source follows this protocol too, changing the
source data at about the same time as the falling
edge of the clock source. This is easily done even
in a bit-banged micro interface.
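
As a sketch, here is that same ordering expressed as a simulation stimulus;
the testbench names and the example word are made up, but a micro's
bit-bang loop would follow the same sequence (change data while the clock
is low, raise the clock, drop it again before the next bit).

library ieee;
use ieee.std_logic_1164.all;

entity chain_tb is
end chain_tb;

architecture sim of chain_tb is
    signal sclock_src : std_logic := '0';
    signal sdata_src  : std_logic := '0';
begin

    stim_p : process
        constant BIT_TIME : time := 2.5 us;                        -- 400 kHz
        constant WORD : std_logic_vector(15 downto 0) := x"A5C3";  -- example value
    begin
        for i in WORD'high downto WORD'low loop
            sdata_src  <= WORD(i);        -- change data while the clock is low
            wait for BIT_TIME / 2;
            sclock_src <= '1';            -- receivers sample on this edge
            wait for BIT_TIME / 2;
            sclock_src <= '0';            -- next data change follows right after
        end loop;
        wait;
    end process;

end sim;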

Philip Freidin
Fliptronics
 
Good description. Thanks.

Philip Freidin wrote:
For your application with a 2.5 us cycle time, that gives you a
setup and hold window of 1.25 us, which should work regardless of
logic family, and the size of your PCB.
Unless the chain gets too long and the signal integrity on the
clock isn't good enough.

--
The suespammers.org mail server is located in California. So are all my
other mailboxes. Please do not send unsolicited bulk e-mail or unsolicited
commercial e-mail to my suespammers.org address or any of my other addresses.
These are my opinions, not necessarily my employer's. I hate spam.
 
"Philip Freidin" <philip@fliptronics.com> wrote in message
news:btnga0lfqn852p9rnj2ff1kjant6m4p76q@4ax.com...
On Sun, 16 May 2004 23:59:33 GMT, "Chuck McManis" <devnull@mcmanis.com> wrote:

....
cascaded devices with serial data In and Out, also considering
daisy chained clock. Freq is 400 KHz.
....


A good example of how this is solved is the serial daisy chain
configuration of the Xilinx FPGAs. It even uses one less pin than
what you are thinking of, since there is no need for a clock out
pin. The clock goes to all devices.

What Xilinx does, is it samples the data coming into each chip on
the rising edge, and the daisy chained data out is clocked on the
falling edge.
An excellent example, Philip. I've been re-reading Chang[1] on shift
registers as well. Chang's code for the "generic" shift register is:

1   entity SHIFTR is
2     port (
3       CLK, RSTn, SI : in  std_logic;
4       SO            : out std_logic
5     );
6   end SHIFTR;
7   architecture RTL of SHIFTR is
8     signal FF8 : std_logic_vector(7 downto 0);
9   begin
10  posedge: process (RSTn, CLK) is
11  begin
12    if (RSTn = '0') then
13      FF8 <= (FF8'range => '0');
14    elsif (CLK'event and CLK = '1') then
15      FF8 <= SI & FF8(FF8'length-1 downto 1);
16    end if;
17  end process;
18
19  SO <= FF8(0);
20  end RTL;

(typos are mine)

Where he uses concurrent assignment in line 19 to assign the Shift Out
signal to that of the last bit of the shift register.

The inferred hardware is a line of flip-flops configured as you would expect
Q->D from one to the next, clocks in parallel and resets in parallel.

This doesn't explicitly clock out SO on the falling edge; however, SO should
have the correct value on the rising edge of a parallel clock, right? Since
the propagation delay of clocking in the new data is non-zero, is it
reasonable to assume that SO will have the correct data on the rising edge?

Depending on the length of the shift register within each device,
the latency is N + 0.5 clock cycles per device. But this does not
accumulate across devices, as the .5 cycle of latency is
treated as part of the device to device delay.
This is where I get confused. Given that the clock is in parallel,
regardless of the length of the register, should the delay be simply 'x',
where x is the propagation delay of the flip-flop from D to Q? I'm thinking that
on any clock the data that is going into the next flip-flop is already
sitting on the Q output of its predecessor in the chain.

Example of devices with 5 bits of SR.
[elided]

Obviously it would be best if the original data
source follows this protocol too, changing the
source data at about the same time as the falling
edge of the clock source. This is easily done even
in a bit-banged micro interface.
That I think I can manage :) The driver is probably some 8-bit micro like a
PIC or AVR chip. My next challenge is to figure out how to infer a
transparent latch so that I can clock in new data "behind" the old data and
then "expose" it all at once (this keeps my PWM units in sync, which is
important in some cases).
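
One common way to get that behaviour, sketched here rather than taken from
the thread, is a second rank of edge-triggered registers loaded by a
separate strobe instead of a true transparent latch. The names load_in and
pwm_value are hypothetical; data_reg is the 16-bit serial shift register
from the earlier code.

-- In the architecture declarative region (hypothetical):
--   signal pwm_value : std_logic_vector(15 downto 0);
--
-- The serial shift register fills up "behind" the value the PWM logic is
-- using, and a separate load strobe copies it across in one edge so every
-- channel updates at the same instant.
load_p : process (load_in)
begin
    if rising_edge(load_in) then
        pwm_value <= data_reg;    -- all PWM channels see the new value together
    end if;
end process;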

--Chuck

-----

[1] "Digital Systems Design with VHDL and Synthesis", K.C. Chang, Chapter 6,
Basic sequential circuits. Pb IEEE Computer Society, ISBN 0-7695-0023-4
 
On Tue, 18 May 2004 04:32:17 GMT, "Chuck McManis" <devnull@mcmanis.com> wrote:
"Philip Freidin" <philip@fliptronics.com> wrote
What Xilinx does, is it samples the data coming into each chip on
the rising edge, and the daisy chained data out is clocked on the
falling edge.

An excellent example Philip. I've been re-reading Chang[1] on shift
registers as well. Chang's code for the "generic" shift register is :

...
VHDL 8-bit shifter, doing right shifts, with shift out coming
directly from the LSB
...

Where he uses concurrent assignment in line 19 to assign the Shift Out
signal to that of the last bit of the shift register.

This doesn't explicitly clock out SO on the falling edge, however SO should
have the correct value on the rising edge of a parallel clock right?
That is correct, SO will be updated on the rising edge of the clock. It
has 1 full clock cycle to get to the input of the first flipflop of
the next shifter in the next chip.

Since
the propagation delay of clocking in the new data is non-zero, is it
reasonable to assume that SO will have the correct data on the rising edge?
This is not a good assumption, because you are going across a PCB. The clock
could arrive early or late with respect to the destination of SO.

Given your 2.5 us cycle time, the half clock trick burns half of the cycle
to make this a non issue, as the data changes happen 1.25 us away from the
clock rising edge, thus easily meeting any setup and hold requirements at
the destination, regardless of any reasonably conceivable clock and data
skew.

Depending on the length of the shift register within each device,
the latency is N + 0.5 clock cycles per device. But this does not
accumulate across devices, as the .5 cycle of latency is
treated as part of the device to device delay.
So the last FF in the SR (FF8(0)) changes on the rising edge. Take
its output to another FF, clocked on the falling edge. The output of
this FF is the new, 1/2 cycle shifted SO signal.
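
As a sketch, that extra flip-flop written in the style of Chang's listing
(SO_ff is a signal name chosen here, not something from the book):

-- In the declarative part:  signal SO_ff : std_logic;
negedge : process (RSTn, CLK) is
begin
    if (RSTn = '0') then
        SO_ff <= '0';
    elsif (CLK'event and CLK = '0') then    -- falling edge
        SO_ff <= FF8(0);
    end if;
end process;

SO <= SO_ff;    -- replaces the direct "SO <= FF8(0);" on line 19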

This is where I get confused. Given that the clock is in parallel,
regardless of the length of the register, should the delay be simply 'x',
where x is the propagation delay of the flip-flop from D to Q?
The delay for any flop is CLK to Q, not D to Q.

I'm thinking that
on any clock the data that is going into the next flip flop is already
sitting on the Q output of its predecessor in the chain.
That is right. This is even true of the SO FF I am describing, as it has
had 1/2 a cycle (1.25 us) to get to the output FF.

The latency I was describing is what you see while you are debugging
with your Tek 465. If you look at the SO pin of (for example) a 5 bit
shift register, you will see the data coming out 5.5 cycles after it
went in, but you only use 1/2 a cycle to get to the next FF in the next
chip, so overall the shifter, as seen from the software point of view, is
oblivious to the extra SO FF and the 1/2 cycle delay.

Philip

Philip Freidin
Fliptronics
 
Chuck McManis wrote:
This doesn't explicitly clock out SO on the falling edge; however, SO should
have the correct value on the rising edge of a parallel clock, right? Since
the propagation delay of clocking in the new data is non-zero, is it
reasonable to assume that SO will have the correct data on the rising edge?
The rules of the game are that you have to meet both setup and hold times.

If you have a string of FFs like a shift register, you are depending
on the clock-out delay of the source FF to cover the hold time of the
target FF. You also have to leave room for clock skew.

If the manufacturer promises that things like that will work in their
silicon, the tools don't have to bother checking for hold times.
Xilinx works that way. I assume others do, but I'm not familiar
with the details.


When you go between chips, you can't ignore hold time anymore. You
have to check them just like you check the setup times.

The case you describe will probably work, but it's possible to
make things like that fail. Clock skew is probably the easiest
way.


The classic clock problem with (really) old CMOS logic was a clock
feeding a long string of DIPs. The capacitive load of the clock
pins turns the clock trace into a loaded transmission line which is
quite a bit slower than the speed of light. Unloaded data bits could
beat the clock and cause hold problems.

Or the input thresholds could be slightly different on a slowly
rising clock so one chip of an adjacent pair clocks ahead of
its neighbor.


Using the other edge of the clock avoids all that nonsense.
It gives you a half cycle of setup and a half cycle of hold.
It will work with horrible clock skew between chips.

--
The suespammers.org mail server is located in California. So are all my
other mailboxes. Please do not send unsolicited bulk e-mail or unsolicited
commercial e-mail to my suespammers.org address or any of my other addresses.
These are my opinions, not necessarily my employer's. I hate spam.
 
