Pipelining on Multiple Clock Edges

rickman
I recall a processor implementation where the guy tried to say that one
particular part of the pipeline design had a register inserted which was
clocked on the negative edge. I could never see how this would
positively impact anything. In fact, the setup and hold time of the
register, not to mention the routing time, would add to the delay in
that pipeline stage.

Was I missing something or is this ever used to advantage?

--

Rick C
 
On Saturday, 5/13/2017 5:52 PM, rickman wrote:
I recall a processor implementation where the guy tried to say that one
particular part of the pipeline design had a register inserted which was
clocked on the negative edge. I could never see how this would
positively impact anything. In fact, the setup and hold time of the
register, not to mention the routing time, would add to the delay in
that pipeline stage.

Was I missing something or is this ever used to advantage?

Opposite edge pipe registers can be useful if your clock distribution
scheme is not able to guarantee the required hold time. I've used
this in early Xilinx parts that had only 4 internal clock buffers
and I needed to bring in more (relatively slow) inputs using an
additional clock. In those parts you could use "low skew nets" to
route a clock, but even then you'd have hold time issues. In that
particular design everything on the poorly routed clocks went back
and forth between clock edges. That included things like counters,
which would typically use a single N-wide register and feedback from
their own outputs. Instead I needed two N-wide registers (one on
each clock) to remove hold time in the feedback paths. Obviously
this would be painful to do a whole design in, but for me it worked
enough to get the data into distributed RAM for transfer to one of
the internal global clock domains.
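
One way to read the two-register counter described above is as a "ping-pong" pair: one register loads on the rising edge, the other on the falling edge, so no register ever feeds logic that is clocked on the same edge. Below is a minimal behavioural sketch in Python under that reading. The function name and the 3-bit width are made up for illustration, and this is not the original design.

```python
def ping_pong_counter(width, cycles):
    """Behavioural model of a counter split across both clock edges."""
    mask = (1 << width) - 1
    rise_reg = 0  # hypothetical register clocked on the rising edge
    fall_reg = 0  # hypothetical register clocked on the falling edge
    trace = []
    for _ in range(cycles):
        # Rising edge: load the incremented value held by the falling-edge
        # register, which has been stable for half a period.
        rise_reg = (fall_reg + 1) & mask
        # Falling edge: copy the new count across, half a period later.
        fall_reg = rise_reg
        trace.append(rise_reg)
    return trace


print(ping_pong_counter(3, 10))  # [1, 2, 3, 4, 5, 6, 7, 0, 1, 2]
```

The count sequence is the same as a plain counter; what changes is that each feedback path gets roughly half a clock period of hold margin, at the cost of twice the flip-flops.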

--
Gabor
 
On 5/14/2017 4:14 PM, Gabor wrote:
On Saturday, 5/13/2017 5:52 PM, rickman wrote:
I recall a processor implementation where the guy tried to say that
one particular part of the pipeline design had a register inserted
which was clocked on the negative edge. I could never see how this
would positively impact anything. In fact, the setup and hold time of
the register, not to mention the routing time, would add to the delay
in that pipeline stage.

Was I missing something or is this ever used to advantage?


Opposite edge pipe registers can be useful if your clock distribution
scheme is not able to guarantee the required hold time. I've used
this in early Xilinx parts that had only 4 internal clock buffers
and I needed to bring in more (relatively slow) inputs using an
additional clock. In those parts you could use "low skew nets" to
route a clock, but even then you'd have hold time issues. In that
particular design everything on the poorly routed clocks went back
and forth between clock edges. That included things like counters,
which would typically use a single N-wide register and feedback from
their own outputs. Instead I needed two N-wide registers (one on
each clock) to remove hold time in the feedback paths. Obviously
this would be painful to do a whole design in, but for me it worked
enough to get the data into distributed RAM for transfer to one of
the internal global clock domains.

This is an issue of poor clock distribution. The guy using the opposite
edge registers was saying it added a pipeline stage the same as the
positive edge registers. Even if this was done for all logic on all
stages it would not be the same as adding more positive edge registers
because it doesn't speed up the clock. In fact the added setup and hold
time of the added register slows down the circuit.
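
To put rough numbers on that argument, here is a small sketch. It assumes a 50% duty cycle, no clock skew, and one lumped register overhead (clock-to-Q plus setup) that is the same for every register. The function names and the 8 ns / 1 ns figures are invented for illustration only.

```python
def min_period_one_stage(logic_ns, reg_overhead_ns):
    """Rising-edge registers only: the whole stage must fit in one period."""
    return logic_ns + reg_overhead_ns


def min_period_negedge_split(first_ns, second_ns, reg_overhead_ns):
    """Falling-edge register in the middle: each half must fit in T/2."""
    return 2 * (max(first_ns, second_ns) + reg_overhead_ns)


def min_period_posedge_split(first_ns, second_ns, reg_overhead_ns):
    """Extra rising-edge register: each half gets a full period."""
    return max(first_ns, second_ns) + reg_overhead_ns


print(min_period_one_stage(8, 1))         # 9
print(min_period_negedge_split(4, 4, 1))  # 10
print(min_period_posedge_split(4, 4, 1))  # 5
```

With these assumed numbers the falling-edge register makes the minimum period slightly longer (10 ns against 9 ns), while a real extra pipeline stage on the rising edge nearly halves it.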

--

Rick C
 
On Saturday, May 13, 2017 at 23:52:37 UTC+2, rickman wrote:
I recall a processor implementation where the guy tried to say that one
particular part of the pipeline design had a register inserted which was
clocked on the negative edge. I could never see how this would
positively impact anything. In fact, the setup and hold time of the
register, not to mention the routing time, would add to the delay in
that pipeline stage.

Was I missing something or is this ever used to advantage?

I guess there could be some way that the logic going to and from that register
is fast enough that it would be possible to get an extra cycle for free.
 
On Saturday, May 13, 2017 at 6:52:37 PM UTC-3, rickman wrote:
I recall a processor implementation where the guy tried to say that one
particular part of the pipeline design had a register inserted which was
clocked on the negative edge. I could never see how this would
positively impact anything. In fact, the setup and hold time of the
register, not to mention the routing time, would add to the delay in
that pipeline stage.

Sometimes you want a pipeline stage to work in a different clock phase from other stages. This is sometimes done to fit the write-back stage and the operand fetch stage in the same clock cycle. Another example was the original MIPS R2000, which used the same pins for both the instruction and data caches by using a different phase for the fetch pipeline stage.

And while it is something different, see how the three stage ARM Cortex M0+ pipeline is made to look like a two stage pipeline:

http://microchipdeveloper.com/32arm:m0-pipeline

The alternative is to use a clock with twice the frequency and have enables that make some stages work on even clocks and others on odd ones.
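
A minimal behavioural sketch of that double-rate alternative, in Python: a phase bit toggles on every tick of the fast clock and acts as the enable, so one stage only loads on even ticks and the other only on odd ticks. The stage names and the "+ 1" stand-in for real logic are hypothetical.

```python
def run_even_odd(values):
    """Two stages on one double-rate clock, gated by an even/odd enable."""
    phase = 0       # toggles on every tick of the fast clock
    stage_a = None  # hypothetical stage enabled on even ticks
    stage_b = []    # values captured by the stage enabled on odd ticks
    feed = iter(values)
    for _ in range(2 * len(values)):
        if phase == 0:
            stage_a = next(feed)         # even tick: stage A loads new input
        else:
            stage_b.append(stage_a + 1)  # odd tick: stage B captures A's result
        phase ^= 1
    return stage_b


print(run_even_odd([10, 20, 30]))  # [11, 21, 31]
```

Every register sees the same clock edge here, so timing analysis stays conventional; the price is distributing a clock at twice the frequency.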

-- Jecel
 
> Was I missing something or is this ever used to advantage?

I imagine it was used to transfer slack from one stage to another. Imagine it's 1976, and you have everything laid out, but then you find that you have some stage with negative slack (let's say a multiplier) followed by a stage with positive slack (let's say a mux). It's hard to move registers back into the multiplier, partly because it would increase the number of FFs, and partly because it's 1976 and you'd have to re-tape everything. So you just have the mux grab the data on the falling clock edge, transferring half a period of slack from the mux to the multiplier so the multiplier has 1.5 cycles and the mux has 0.5. Something like that.
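
As a rough worked example of that slack transfer (all numbers are invented, and clock-to-Q plus setup are lumped into one assumed overhead figure), the setup slack before and after moving the capture to the falling edge could be tallied like this:

```python
def setup_slack(available_ns, logic_ns, reg_overhead_ns):
    """Time left over after the logic delay and register overhead."""
    return available_ns - logic_ns - reg_overhead_ns


def slack_report(period_ns, mult_ns, mux_ns, reg_overhead_ns):
    """Return (multiplier slack, mux slack) for both clocking schemes."""
    half = period_ns // 2
    return {
        # Every register on the rising edge: one period per stage.
        "all_rising": (
            setup_slack(period_ns, mult_ns, reg_overhead_ns),
            setup_slack(period_ns, mux_ns, reg_overhead_ns),
        ),
        # Multiplier result captured on the falling edge: 1.5 T and 0.5 T.
        "falling_capture": (
            setup_slack(period_ns + half, mult_ns, reg_overhead_ns),
            setup_slack(half, mux_ns, reg_overhead_ns),
        ),
    }


report = slack_report(period_ns=100, mult_ns=120, mux_ns=20, reg_overhead_ns=5)
print(report["all_rising"])       # (-25, 75)
print(report["falling_capture"])  # (25, 25)
```

With these made-up delays the multiplier goes from failing by 25 ns to passing by 25 ns, and the mux still has slack left over. The sketch only covers setup timing.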
 
On Saturday, May 13, 2017 at 5:52:37 PM UTC-4, rickman wrote:
I recall a processor implementation where the guy tried to say that one
particular part of the pipeline design had a register inserted which was
clocked on the negative edge. I could never see how this would
positively impact anything. In fact, the setup and hold time of the
register, not to mention the routing time, would add to the delay in
that pipeline stage.

Was I missing something or is this ever used to advantage?

I don't know if you have seen this before, but something similar is
described in the book, "But How Do It Know?" by J. Clark Scott:

https://www.amazon.com/But-How-Know-Principles-Computers/dp/0615303765

Someone made a video describing how it is useful for certain types
of slow-clock CPUs:

https://www.youtube.com/watch?v=cNN_tTXABUA

If you look, the computation takes place nearer to the positive edge,
and the write operation takes place nearer to the negative edge, so
that enough time elapses in between for the work to complete.

I've seen several designs which trigger in this way. There are also
several methods described in (I believe) Lattice documentation, which
shows how to merge multiple clock signals together to obtain a clock
signal that will dwell fire around the negative edge, and dwell fire
around the positive edge for various purposes.

Thank you,
Rick C. Hodgin
 
On Monday, 5/15/2017 2:29 PM, Kevin Neilson wrote:
Was I missing something or is this ever used to advantage?

I imagine it was used to transfer slack from one stage to another. Imagine it's 1976, and you have everything laid out, but then you find that you have some stage with negative slack (let's say a multiplier) followed by a stage with positive slack (let's say a mux). It's hard to move registers back into the multiplier, partly because it would increase the number of FFs, and partly because it's 1976 and you'd have to re-tape everything. So you just have the mux grab the data on the falling clock edge, transferring half a period of slack from the mux to the multiplier so the multiplier has 1.5 cycles and the mux has 0.5. Something like that.

That implies that the minimum prop delay of the multiplier is
guaranteed to be more than 1/2 clock period. Probably also a
good bet in 1976. In any case this doesn't represent a pipe
stage for 1/2 clock but rather for 1 1/2 clocks.

--
Gabor
 
That implies that the minimum prop delay of the multiplier is
guaranteed to be more than 1/2 clock period. Probably also a
good bet in 1976. In any case this doesn't represent a pipe
stage for 1/2 clock but rather for 1 1/2 clocks.

Yes, it depends on mintimes so it's a poor design technique and would probably stop working when you shrink the die.
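
The min-time dependence can be sketched as a simple check. If a new operand is launched every period T but the result is captured on the falling edge at 1.5 T, the fastest path through the multiplier must take longer than T/2 plus the hold time, or the next result overwrites the current one before it is captured. The function name, the delay numbers, and the 0.6 scale factor for a die shrink are all assumptions for illustration.

```python
def falling_capture_holds(period_ns, min_path_ns, hold_ns):
    """True if the next launch cannot corrupt the falling-edge capture."""
    # Next operand launches at T; the capture happens at 1.5 T.
    # Earliest change at the capture register: T + min_path_ns.
    return min_path_ns > period_ns / 2 + hold_ns


print(falling_capture_holds(100, 70, 2))        # True
print(falling_capture_holds(100, 70 * 0.6, 2))  # False
```

In the second call the same clock period is kept while the minimum path delay is scaled down, which is the failure mode a process shrink would expose.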
 
