always blocks and delays

Strake · Jan 4, 2007

Hi,

I am using Verilog to create a control unit for a microprocessor. The
basic premise of the processor is that data transfer is done on the
posedge and that calculation and execution are done on the negedge. The
instruction is stored in a 32-bit register, of which bits 0-5 are
the opcode. Data transfer is done with four 64-bit wires known as
path[0:3]; generally, path[2:3] are the operands and path[1] is the
result. (path[0] is reserved for future use.) My code looks something
like this:

// begin verilog code

always @(posedge ck) case(instruction[0:5])
  14: begin // add immediate
    path[2] <= D_FORM_RA; // load register operand A
    path[3] <= D_FORM_SI; // load immediate operand B
    alop <= ALU_ADD; // tell ALU that we want it to add
    D_FORM_RT = path[1] @(posedge ck);
  end
endcase

// end verilog code

Obviously, the actual code will handle more than just one opcode, but
this is a mere example.

My questions are these:
Firstly, will the line
D_FORM_RT = path[1] @(posedge ck);
have the intended effect of transferring the data on the _next_
posedge?
Secondly, will the always block reexecute on the next posedge,
alongside the delayed assignment?

Jonathan Bromley · Jan 4, 2007

On 4 Jan 2007 06:53:33 -0800, "Strake" <strake888@gmail.com> wrote:

I am using verilog to create a control unit for a microprocessor. The
basic premise of the processor is that data transfer is done on the
posedge and that calculation and execution is done on the negedge.

Be aware that this may give your synthesis tool a harder job for
timing optimization, and it means that the duty cycle of your clock
becomes critically important. You may have excellent reasons
for doing this two-phase pipeline, but often it's better to stick
with a simple single-phase clock. Note that this most definitely
was NOT the case in older, latch-based technologies where
polyphase clocks were the order of the day.

always @(posedge ck) case(instruction[0:5])
  14: begin // add immediate
    path[2] <= D_FORM_RA; // load register operand A
    path[3] <= D_FORM_SI; // load immediate operand B
    alop <= ALU_ADD; // tell ALU that we want it to add
    D_FORM_RT = path[1] @(posedge ck);
  end
endcase

Will the line
D_FORM_RT = path[1] @(posedge ck);
have the intended effect of transferring the data on the _next_
posedge?

No; in fact it's illegal syntax. If you add a semicolon just
before the @ then it becomes a legal procedural delay,
but it's probably wrong - see below.

Secondly, will the always block reexecute on the next posedge,
alongside the delayed assignment?

Not if it's a procedural delay, because the body of the "always"
is still executing and hasn't yet worked its way back to the
@(posedge ck) control at the top.
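
A minimal simulation-only sketch of that point, assuming a free-running clock. The module name and the counters `edges` and `passes` are made up for illustration; the second `@(posedge ck)` plays the role of the wait in `D_FORM_RT = path[1]; @(posedge ck);`.

```verilog
// Simulation-only sketch; all names here are hypothetical.
module proc_delay_demo;
  reg ck = 0;
  integer edges  = 0;   // rising edges seen so far
  integer passes = 0;   // times the body of the second always block has started

  always #5 ck = ~ck;   // free-running clock, period 10

  always @(posedge ck) edges = edges + 1;

  always @(posedge ck) begin
    passes = passes + 1;  // the body starts on one posedge...
    @(posedge ck);        // ...and then sits here until the following posedge
    // only now does control return to the @(posedge ck) at the top
  end

  initial begin
    #100;
    $display("edges=%0d passes=%0d", edges, passes);
    $finish;
  end
endmodule
```

Because each pass through the body consumes two rising edges, `passes` should come out at roughly half of `edges`: the block skips every other clock instead of re-executing on it.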

What you *probably* want is this:

D_FORM_RT <= @(posedge ck) path[1];

This nonblocking assignment executes in zero time,
but postpones its signal update to the next clock. Consequently,
the next execution of the always @(posedge ck) will indeed
overlap with the update of D_FORM_RT.
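
A small simulation-only sketch of this behaviour. The names `src` and `dst` are stand-ins for path[1] and D_FORM_RT and are not from the original code. Note that the right-hand side is sampled when the statement executes; only the update is postponed.

```verilog
// Simulation-only sketch; src/dst are hypothetical stand-ins.
module nba_event_demo;
  reg       ck  = 0;
  reg [7:0] src = 8'd0;
  reg [7:0] dst = 8'd0;

  always #5 ck = ~ck;

  always @(posedge ck) begin
    src <= src + 8'd1;
    // Reads src now, schedules the update of dst for the next posedge,
    // and does not block: the always block is free to run again then.
    dst <= @(posedge ck) src;
  end

  // Print mid-cycle, when nothing is changing.
  always @(negedge ck)
    $display("t=%0t src=%0d dst=%0d", $time, src, dst);

  initial #60 $finish;
endmodule
```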

HOWEVER, delays in nonblocking assignments are not
synthesizable. You need to describe the pipeline explicitly.
Something like this... In each branch of the CASE, you
decide what you're going to do to D_FORM_RT on the
*next* clock:

NEXT_D_FORM_RT <= path[1];

And then, in the same clocked always block, but outside
the CASE so that it happens unconditionally, you provide
the pipeline delay:

D_FORM_RT <= NEXT_D_FORM_RT;
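
Put together, the two fragments above might look like the following sketch. The port list, the widths, and the flattening of instruction[0:5] and path[1] into `opcode` and `path1` are assumptions made only to keep the example self-contained.

```verilog
// Sketch of an explicit one-stage pipeline register; port names are assumed.
module rt_pipeline (
  input  wire        ck,
  input  wire [5:0]  opcode,     // stands in for instruction[0:5]
  input  wire [63:0] path1,      // stands in for path[1]
  output reg  [63:0] D_FORM_RT
);
  reg [63:0] NEXT_D_FORM_RT;

  always @(posedge ck) begin
    case (opcode)
      6'd14:   NEXT_D_FORM_RT <= path1;  // decide what happens on the next clock
      default: ;                         // other opcodes omitted
    endcase

    // Outside the case, so it happens unconditionally: the pipeline delay.
    D_FORM_RT <= NEXT_D_FORM_RT;
  end
endmodule
```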

Hope this helps
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
jonathan.bromley@MYCOMPANY.com
http://www.MYCOMPANY.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.

Strake · Jan 4, 2007

Be aware that this may give your synthesis tool a harder job for
timing optimization, and it means that the duty cycle of your clock
becomes critically important. You may have excellent reasons
for doing this two-phase pipeline, but often it's better to stick
with a simple single-phase clock. Note that this most definitely
was NOT the case in older, latch-based technologies where
polyphase clocks were the order of the day.

This seemed the natural way to do it. If we use every second posedge
for processing and the rest for transfer, an execution unit could
possibly become desynchronized with the rest of the processor (because
of, for example, clock skew or propagation delays) and the control unit
would assume that the execution's or calculation's result is ready when
in fact the unit is in mid-execution. If, however, a distinction is
made between "transfer" clock pulses (posedge) and "execute" clock
pulses (negedge) then this is far less likely to happen.

Perhaps i am wrong, but this is what my sense of logic tells me.

HOWEVER, delays in nonblocking assignments are not
synthesizable. You need to describe the pipeline explicitly.
Something like this... In each branch of the CASE, you
decide what you're going to do to D_FORM_RT on the
*next* clock:

NEXT_D_FORM_RT <= path[1];

And then, in the same clocked always block, but outside
the CASE so that it happens unconditionally, you provide
the pipeline delay:

D_FORM_RT <= NEXT_D_FORM_RT;

Thing is, i need D_FORM_RT to be assigned the (value of path[1] at the
next posedge), not (D_FORM_RT to be assigned the value of path[1]) at
the next clock pulse. The execution unit only starts executing on the
negedge (the inputs are not guaranteed to be stable before then) and
the result must only be saved to a register after the posedge (the
calculation or whatever the execution unit is doing is not guaranteed
to be finished before then). Will this code not assign path[1] to
NEXT_D_FORM_RT before the result is ready, and then transfer that to
D_FORM_RT?

Another thing: by that time, the next instruction will have been loaded
into the instruction register, from which the values D_FORM_RT and the
rest are taken, so one would also need to save D_FORM_RT.

Jonathan Bromley · Jan 4, 2007

On 4 Jan 2007 07:53:55 -0800, "Strake" <strake888@gmail.com> wrote:

Be aware that this may give your synthesis tool a harder job for
timing optimization, and it means that the duty cycle of your clock
becomes critically important. You may have excellent reasons
for doing this two-phase pipeline, but often it's better to stick
with a simple single-phase clock. Note that this most definitely
was NOT the case in older, latch-based technologies where
polyphase clocks were the order of the day.

This seemed the natural way to do it. If we use every second posedge
for processing and the rest for transfer, an execution unit could
possibly become desynchronized with the rest of the processor (because
of, for example, clock skew or propagation delays)

It's a fundamental assumption of synchronous design that this does
not happen. Synthesis and place/route tools will do whatever it
takes to maintain the synchronous assumption, and will report
a timing violation if for some reason they can't achieve it.

However, your biphase clock may well be OK too. Do whatever
you need to do, in order to get a design that you understand.
I wasn't trying to be dogmatic; I just wanted to flag some
possible issues.

Something like this... In each branch of the CASE, you
decide what you're going to do to D_FORM_RT on the
*next* clock:

NEXT_D_FORM_RT <= path[1];

And then, in the same clocked always block, but outside
the CASE so that it happens unconditionally, you provide
the pipeline delay:

D_FORM_RT <= NEXT_D_FORM_RT;

Thing is, i need D_FORM_RT to be assigned the (value of path[1] at the
next posedge), not (D_FORM_RT to be assigned the value of path[1]) at
the next clock pulse.

Ah. Feature creep

So, instead of pipelining path[1], you pipeline an enable signal
that - when it matures one cycle later - enables the then-current
value of path[1] onto D_FORM_RT. One fairly sensible way to
do this might be to keep a pipelined (one cycle old) copy of the
instruction register, and decode it again on the next clock cycle;
so that on each clock you are actually working on two instructions -
the one that you just fetched, and the one you fetched on the
previous clock. The current one decides some things, and the
old one decides other things. You'll need two separate CASE
statements, one to decode each of the two instructions.
Alternatively it may be better to pipeline just the set of enable
signals that you decoded from the instruction on its first clock.
Hey, this is starting to look like a real pipelined CPU

The point here is that, regardless of exactly what you're
trying to do, the pipeline stages need to be described explicitly
for synthesizable RTL coding. Whatever you're doing, there
may well be neater ways of describing it for simulation by
using some of Verilog's concurrency features - assignments
with intra-assignment delay, fork-join blocks... But if you
use those language features, you have a behavioural
model that won't synthesize. It may nevertheless be
useful to help you in visualising and debugging the design,
and it may provide you with a reference model that can
be compared with your synthesizable RTL code in a
testbench.

Strake · Jan 4, 2007

However, your biphase clock may well be OK too. Do whatever
you need to do, in order to get a design that you understand.
I wasn't trying to be dogmatic; I just wanted to flag some
possible issues.

I meant no offense. It's just, if i were to use a monophase clock, i'd
need to count clock pulses and then, for every cycle, check to see
whether the count is even or odd. A biphase clock seemed like a much
more elegant way to do it. Also, there's no guarantee that all
components of the processor will be constantly running; not yet, but
eventually, the processor will shut off any units which have been idle
for many cycles to save power. Execution units and other components may
be stopped and started at a moment's notice, and may start one cycle or
one half-cycle ahead of or behind the other components.

Ah. Feature creep

More or less. I just figured that it would be easier to design it this
way from the start than overhaul it later.

So, instead of pipelining path[1], you pipeline an enable signal
that - when it matures one cycle later - enables the then-current
value of path[1] onto D_FORM_RT. One fairly sensible way to
do this might be to keep a pipelined (one cycle old) copy of the
instruction register, and decode it again on the next clock cycle;
so that on each clock you are actually working on two instructions -
the one that you just fetched, and the one you fetched on the
previous clock. The current one decides some things, and the
old one decides other things. You'll need two separate CASE
statements, one to decode each of the two instructions.
Alternatively it may be better to pipeline just the set of enable
signals that you decoded from the instruction on its first clock.
Hey, this is starting to look like a real pipelined CPU

Wha...? Sorry, i just don't see what you're doing with all of these
saved values, enables, etc.

The point here is that, regardless of exactly what you're
trying to do, the pipeline stages need to be described explicitly
for synthesizable RTL coding. Whatever you're doing, there
may well be neater ways of describing it for simulation by
using some of Verilog's concurrency features - assignments
with intra-assignment delay, fork-join blocks... But if you
use those language features, you have a behavioural
model that won't synthesize. It may nevertheless be
useful to help you in visualising and debugging the design,
and it may provide you with a reference model that can
be compared with your synthesizable RTL code in a
testbench.

When i test it, i'll be using an emulator running on my x86 box.
Slower, but simpler to code

Chris F Clark · Jan 4, 2007

Preliminary caution, I'm not a chip designer (although these days my
official title is "hw architect"), much less an expert one, so take
what I say with an appropriately sized grain of salt.

"Strake" <strake888@gmail.com> writes:

Be aware that this may give your synthesis tool a harder job for
timing optimization, and it means that the duty cycle of your clock
becomes critically important. You may have excellent reasons
for doing this two-phase pipeline, but often it's better to stick
with a simple single-phase clock. Note that this most definitely
was NOT the case in older, latch-based technologies where
polyphase clocks were the order of the day.

This seemed the natural way to do it. If we use every second posedge
for processing and the rest for transfer, an execution unit could
possibly become desynchronized with the rest of the processor (because
of, for example, clock skew or propagation delays) and the control unit
would assume that the execution's or calculation's result is ready when
in fact the unit is in mid-execution. If, however, a distinction is
made between "transfer" clock pulses (posedge) and "execute" clock
pulses (negedge) then this is far less likely to happen.

Perhaps i am wrong, but this is what my sense of logic tells me.

I think that's really more the reason why there is both pre- and
post-synthesis verification. The pre-synthesis verification is to
prove the logical concepts: does the design work abstractly? The
post-synthesis simulation is to show that the design works as
implemented, and that implementation artifacts haven't broken it.

HOWEVER, delays in nonblocking assignments are not
synthesizable. You need to describe the pipeline explicitly.
Something like this... In each branch of the CASE, you
decide what you're going to do to D_FORM_RT on the
*next* clock:

NEXT_D_FORM_RT <= path[1];

And then, in the same clocked always block, but outside
the CASE so that it happens unconditionally, you provide
the pipeline delay:

D_FORM_RT <= NEXT_D_FORM_RT;

Thing is, i need D_FORM_RT to be assigned the (value of path[1] at the
next posedge), not (D_FORM_RT to be assigned the value of path[1]) at
the next clock pulse. The execution unit only starts executing on the
negedge (the inputs are not guaranteed to be stable before then) and
the result must only be saved to a register after the posedge (the
calculation or whatever the execution unit is doing is not guaranteed
to be finished before then). Will this code not assign path[1] to
NEXT_D_FORM_RT before the result is ready, and then transfer that to
D_FORM_RT?

Another thing: by that time, the next instruction will have been loaded
into the instruction register, from which the values D_FORM_RT and the
rest are taken, so one would also need to save D_FORM_RT.

Yes, those are all good concerns, and you should reflect those in the
design. That is Jonathan's point about describing the pipeline
explicitly. You need to figure out what values you need to save in
registers, and when those values are available. You may need to
create extra registers and extra states or even an extra state machine
to describe exactly which transfers should take place.

So, from your description you've got some clock pulses where you need
path[1] assigned to D_FORM_RT. Model that.

// This always block will transfer path[1] to D_FORM_RT on the "desired"
// clock edges (as specified by the setting of path1_to_dform_rt_xfer_needed).
always @(posedge ck)
  if (path1_to_dform_rt_xfer_needed) begin
    D_FORM_RT <= path[1];
  end

Now, in your main body of code, you decide whether you will need the
transfer on the next clock or not with something like this:

always @(posedge ck)
begin
  path1_to_dform_rt_xfer_needed <= 0; // don't need this xfer generally
  case(instruction[0:5])
    14: begin // add immediate
      path[2] <= D_FORM_RA; // load register operand A
      path[3] <= D_FORM_SI; // load immediate operand B
      alop <= ALU_ADD; // tell ALU that we want it to add
      path1_to_dform_rt_xfer_needed <= 1; // will need an xfer in this case
    end
  endcase
end

repeating a section:

Another thing: by that time, the next instruction will have been loaded
into the instruction register, from which the values D_FORM_RT and the
rest are taken, so one would also need to save D_FORM_RT.

If D_FORM_RT is an input from the instruction, then it probably isn't
set from path[1]. However, what I suspect you are saying is that
path[1] should be transferred to some location, perhaps a specific
register in the register file based on D_FORM_RT. That is, what you
really want in the body of the new transfer always block is something
like:

regfile[D_FORM_RT] <= path[1];

And, you are right, if the new instruction will have changed the
value, you will need to save it. Thus, your code might look like
this.

always @(posedge ck)
  if (path1_to_dform_rt_xfer_needed) begin
    regfile[saved_d_form_rt] <= path[1];
  end

always @(posedge ck)
begin
  path1_to_dform_rt_xfer_needed <= 0; // don't need this xfer generally
  case(instruction[0:5])
    14: begin // add immediate
      path[2] <= D_FORM_RA; // load register operand A
      path[3] <= D_FORM_SI; // load immediate operand B
      alop <= ALU_ADD; // tell ALU that we want it to add
      path1_to_dform_rt_xfer_needed <= 1; // will need an xfer in this case
      saved_d_form_rt <= D_FORM_RT; // save the transfer destination
    end
  endcase
end

Note, designing pipelined architectures is non-trivial, and one needs
to lay out exactly what happens when, often creating timing diagrams
to allow one to visualize the steps and validate that all of the steps
have a stage when they do their work and that the work is completed by
the stage when the work is needed.

On the other hand, there are now new tools that attempt to do some of
that work for you, by automatically creating pipelines with a variety
of tradeoffs. The companies Celoxica and Forte come to mind.

Hope this helps,
-Chris

*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)
------------------------------------------------------------------------------

Jonathan Bromley · Jan 4, 2007

On 4 Jan 2007 08:32:10 -0800, "Strake" <strake888@gmail.com> wrote:

I meant no offense.

None taken.

Wha...? Sorry, i just don't see what you're doing with all of these
saved values, enables, etc.

I sense that we're coming at this from two different ends of
a spectrum, with you very much at the algorithmic/software
end and me at the RTL/implementation/hardware end.
Unfortunately, synthesis tools are much closer to my end
than to yours

The basic problem is this. If, on clock cycle N, I wish
to schedule some activity that will take place on
clock cycle N+1, then I must keep copies of:
- any data that will be needed to accomplish that activity;
- enough information to allow me to decide, *at the time of
clock cycle N+1*, what it is that I'm supposed to do then.
The first of these is my "saved values", the second is my
"enables". These saved copies, which are given their
values on clock cycle N and will take effect on cycle N+1,
must be stored in real physical registers; and if I
wish to synthesise this design, I must declare those
registers and define their behaviour explicitly.

When i test it, i'll be using an emulator running on my x86 box.
Slower, but simpler to code

Yes, but then you're testing your ARCHITECTURE, not the
IMPLEMENTATION! At some point you will need a Verilog
testbench to verify that the synthesisable RTL design
indeed behaves in the same way as your instruction set
emulator. The reference model - your emulator - may
be written in C rather than Verilog, in which case you'll
need the Verilog simulator's PLI (Programming Language
Interface) to link the two languages. Alternatively you
may be able to get the emulation and the Verilog
simulation to both read and write data files in the
same format, and you can then diff the output files
from simulation and emulation to check that the
Verilog gave the same results as emulation.
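
A minimal sketch of the file-based approach, assuming one hexadecimal value per clock is the agreed trace format. The file name `rtl_trace.txt`, the `result` register, and its placeholder behaviour are all invented for the example; in a real testbench the traced value would come from the design under test.

```verilog
// Simulation-only sketch: dump one line per clock for later diffing.
module trace_dump;
  reg        ck     = 0;
  reg [63:0] result = 64'd0;  // placeholder for a value produced by the real design
  integer    fd;

  always #5 ck = ~ck;

  // Placeholder activity so that the trace has some content.
  always @(posedge ck) result <= result + 64'd1;

  // $fstrobe writes at the end of the time step, after nonblocking updates.
  always @(posedge ck) $fstrobe(fd, "%h", result);

  initial begin
    fd = $fopen("rtl_trace.txt", "w");
    if (fd == 0) begin
      $display("could not open trace file");
      $finish;
    end
    #100;
    $fclose(fd);
    $finish;
  end
endmodule
```

If the emulator writes the same one-value-per-line format, an ordinary `diff` of the two files shows the first cycle at which simulation and emulation disagree.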

Strake · Jan 5, 2007

If D_FORM_RT is an input from the instruction, then it probably isn't
set from path[1]. However, what I suspect you are saying is that
path[1] should be transferred to some location, perhaps a specific
register in the register file based on D_FORM_RT. That is, what you
really want in the body of the new transfer always block is something
like:

regfile[D_FORM_RT] <= path[1];

Apologies. I forgot to mention this, but D_FORM_RT is defined as the
_register_ encoded by that part of the instruction, not the bits
themselves from the instruction.

And, you are right, if the new instruction will have changed the
value, you will need to save it. Thus, your code might look like
this.

always @(posedge ck)
  if (path1_to_dform_rt_xfer_needed) begin
    regfile[saved_d_form_rt] <= path[1];
  end

always @(posedge ck)
begin
  path1_to_dform_rt_xfer_needed <= 0; // don't need this xfer generally
  case(instruction[0:5])
    14: begin // add immediate
      path[2] <= D_FORM_RA; // load register operand A
      path[3] <= D_FORM_SI; // load immediate operand B
      alop <= ALU_ADD; // tell ALU that we want it to add
      path1_to_dform_rt_xfer_needed <= 1; // will need an xfer in this case
      saved_d_form_rt <= D_FORM_RT; // save the transfer destination
    end
  endcase
end

D form, however, is not the only instruction form. There are others: X,
B, I, etc with operands encoded in different locations. How is the
first always block to know which form the previous instruction took?

Strake · Jan 5, 2007

I just found the problem with the biphase clock

Not sure if this is the one that you were thinking of, but i just
realized that the control unit is transferring the operands from the
registers to the ALU at the same time as it transfers the result of the
previous instruction to the registers, creating a race condition. :S

I concede this point.

The basic problem is this. If, on clock cycle N, I wish
to schedule some activity that will take place on
clock cycle N+1, then I must keep copies of:
- any data that will be needed to accomplish that activity;
- enough information to allow me to decide, *at the time of
clock cycle N+1*, what it is that I'm supposed to do then.
The first of these is my "saved values", the second is my
"enables". These saved copies, which are given their
values on clock cycle N and will take effect on cycle N+1,
must be stored in real physical registers; and if I
wish to synthesise this design, I must declare those
registers and define their behaviour explicitly.

So, if i understand correctly, you're saying that i must have 2 control
units? one for "now", and one for "1 instruction ago"?

Jonathan Bromley · Jan 5, 2007

On 4 Jan 2007 18:48:52 -0800, "Strake"
<strake888@gmail.com> wrote:

I just found the problem with the biphase clock

Urrrrm, I don't think so...

Not sure if this is the one that you were thinking of, but i just
realized that the control unit is transferring the operands from the
registers to the ALU at the same time as it transfers the result of the
previous instruction to the registers, creating a race condition. :S

This is NOT a race condition in synchronous hardware, and that's
exactly the point.

Consider this very simple pipeline:

always @(posedge clk)
stage1 <= input_value;

always @(posedge clk)
stage2 <= stage1;

If you're a software bod you might imagine that there's a
race here, because it transfers input data to stage1 at the
same clock edge as it transfers stage1 to stage2. However,
THERE IS NO RACE; the nonblocking (<=) assignments
see to that. (If you use regular blocking (=) assignment
then the race condition is there, in all its horrible glory.
Don't do that.)

In hardware, freedom from races is assured
by two things:
(1) after a clock edge, each flip-flop holds its old output
value for long enough that the next flip-flop has a chance
to capture the old value on the same clock;
(2) clock distribution networks ensure that your clocks
are supplied to all the flops at near enough the same time
for (1) to work correctly.

In Verilog simulation, the same effect is achieved with zero
time delay because the updating of variables assigned using
nonblocking <= assignment is postponed until all other
activity has occurred as a result of the clock edge - in
other words, both my "always" blocks get executed, both
assignments have their right-hand side expressions
evaluated, but the two target variables are not yet
updated until after ALL that stuff has taken place.
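
The difference can be seen side by side in a small simulation-only sketch. The `b_stage1` and `b_stage2` registers and the counter driving `input_value` are additions for the comparison and are not part of the pipeline above.

```verilog
// Simulation-only sketch: nonblocking pipeline next to a blocking one.
module nba_vs_blocking;
  reg       clk = 0;
  reg [3:0] input_value = 4'd0;
  reg [3:0] stage1 = 0, stage2 = 0;       // nonblocking version
  reg [3:0] b_stage1 = 0, b_stage2 = 0;   // blocking version, for contrast only

  always #5 clk = ~clk;

  // Change the input mid-cycle so it is stable at every posedge.
  always @(negedge clk) input_value <= input_value + 4'd1;

  always @(posedge clk) stage1 <= input_value;
  always @(posedge clk) stage2 <= stage1;      // always sees the old stage1

  always @(posedge clk) b_stage1 = input_value;
  always @(posedge clk) b_stage2 = b_stage1;   // old or new b_stage1: depends on
                                               // which block the simulator runs first

  always @(negedge clk)
    $display("t=%0t  nba: %0d %0d   blocking: %0d %0d",
             $time, stage1, stage2, b_stage1, b_stage2);

  initial #80 $finish;
endmodule
```

In the nonblocking pair, `stage2` trails `stage1` by one clock regardless of simulator. In the blocking pair the result depends on the order in which the two always blocks happen to be evaluated, which the language does not define.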

I hate to be grumpy and "I-told-you-so", but this is
absolutely basic Verilog-for-design stuff, and if you
are struggling with it then you very much need the
help of either a good textbook (plenty of suggestions
on the comp.lang.verilog FAQ) or a good training
course, which is where my company comes in

[...]

So, if i understand correctly, you're saying that i
must have 2 control units? one for "now",
and one for "1 instruction ago"?

Yes, or something equivalent; it may be that you
can determine all your control effects "now",
and then save some representation of those
effects (all the control signals they wiggle)
for use one cycle later. Those saved control
signals then represent "control effects of
the instruction I processed on the last clock".
It's up to you to decide which implementation
is more maintainable, or cheaper (= less logic),
or could run at a higher clock rate (= less
logic between pairs of registers). As others
have pointed out, the design of pipelined processors
isn't trivial, and there are many possible ways to
go about it.

Of course, if you're prepared to accept poorer
performance you can skip all the ghastly
pipelining stuff and just hang on to an instruction
until you're done with it several clock cycles later,
and only then move on to the next. Somehow
I feel that won't satisfy you, though ;-)

Strake · Jan 5, 2007

This is NOT a race condition in synchronous hardware, and that's
exactly the point.

Sorry. I've been programming software for years, but i started with
Verilog much more recently.

I hate to be grumpy and "I-told-you-so", but this is
absolutely basic Verilog-for-design stuff, and if you
are struggling with it then you very much need the
help of either a good textbook (plenty of suggestions
on the comp.lang.verilog FAQ) or a good training
course, which is where my company comes in

I'm in Canada, so i doubt that your company offers a course where i
live, but i'll certainly look around for some of those books that were
mentioned in the FAQ.

Up till now, the only resources i've been using were various online
tutorial and reference works and, of course, Usenet

Yes, or something equivalent; it may be that you
can determine all your control effects "now",
and then save some representation of those
effects (all the control signals they wiggle)
for use one cycle later. Those saved control
signals then represent "control effects of
the instruction I processed on the last clock".
It's up to you to decide which implementation
is more maintainable, or cheaper (= less logic),
or could run at a higher clock rate (= less
logic between pairs of registers). As others
have pointed out, the design of pipelined processors
isn't trivial, and there are many possible ways to
go about it.

Instead of saving all of this information, could one not have multiple
stages: one to load the data from registers, one to process it, etc,
and have each stage pass the data to the next when it's done? (Not sure
whether this is considered a proper pipeline or not).

Of course, if you're prepared to accept poorer
performance you can skip all the ghastly
pipelining stuff and just hang on to an instruction
until you're done with it several clock cycles later,
and only then move on to the next. Somehow
I feel that won't satisfy you, though ;-)

It probably won't

I figure, if it's possible to run one instruction every clock cycle,
why not do it? May as well, i'd get more instructions per second out of
it.

Jonathan Bromley · Jan 5, 2007

On 5 Jan 2007 03:03:51 -0800, "Strake"
<strake888@gmail.com> wrote:

I'm in Canada

In which case, what the blazes are you doing awake NOW?

Instead of saving all of this information, could one not have multiple
stages: one to load the data from registers, one to process it, etc,
and have each stage pass the data to the next when it's done? (Not sure
whether this is considered a proper pipeline or not).

It most certainly is a proper pipeline. However, it's not quite that
simple. In a CPU, the processing (etc) is not the same every time;
it's affected by the instruction. So as the data flows down the
pipeline, it needs to be accompanied by some control information
to say what the next pipeline stage is required to do.

In a sense, I suppose that's just part of the data. But it's usually
easier to handle control information somewhat separately.
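
As a rough sketch of control travelling alongside data, assuming a made-up two-bit operation code and 8-bit operands (none of these names or encodings come from the design under discussion):

```verilog
// Sketch: the operation code is registered together with its operands,
// so the next stage knows what to do with them one clock later.
module two_stage_alu (
  input  wire       clk,
  input  wire [1:0] op_in,        // hypothetical operation code
  input  wire [7:0] a_in, b_in,
  output reg  [7:0] result
);
  localparam OP_ADD = 2'd0, OP_SUB = 2'd1, OP_AND = 2'd2;

  // Stage 1 registers: the data and the control that belongs to it.
  reg [1:0] op_s1;
  reg [7:0] a_s1, b_s1;

  always @(posedge clk) begin
    // Stage 1: capture operands and their operation together.
    op_s1 <= op_in;
    a_s1  <= a_in;
    b_s1  <= b_in;

    // Stage 2: act on what stage 1 held, as directed by its control.
    case (op_s1)
      OP_ADD:  result <= a_s1 + b_s1;
      OP_SUB:  result <= a_s1 - b_s1;
      OP_AND:  result <= a_s1 & b_s1;
      default: result <= a_s1;
    endcase
  end
endmodule
```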

I figure, if it's possible to run one instruction every clock cycle,
why not do it? May as well, i'd get more instructions per
second out of it.

A laudable objective. Don't forget, though, that you may
be obliged to stall the pipeline if one stage is waiting on
the results of another that hasn't yet completed, or
if more than one stage needs the same resource such
as access to the external data bus. And you also need
to worry about flushing the pipeline when you take a
conditional branch. Fun, fun, fun.

Strake · Jan 5, 2007

In which case, what the blazes are you doing awake NOW?

We have six time zones...

It most certainly is a proper pipeline. However, it's not quite that
simple. In a CPU, the processing (etc) is not the same every time;
it's affected by the instruction. So as the data flows down the
pipeline, it needs to be accompanied by some control information
to say what the next pipeline stage is required to do.

In a sense, I suppose that's just part of the data. But it's usually
easier to handle control information somewhat separately.

Well, i suppose each stage could decode the instruction separately, but
there's probably a flaw in that somewhere.

A laudable objective. Don't forget, though, that you may
be obliged to stall the pipeline if one stage is waiting on
the results of another that hasn't yet completed, or
if more than one stage needs the same resource such
as access to the external data bus. And you also need
to worry about flushing the pipeline when you take a
conditional branch. Fun, fun, fun.

To stall, could one just "stop the clock" on the stalled pipeline
stages, or would it have to be more involved?

If the branch is accurately predicted, no flush is needed. If not,
well, that's when stickiness ensues. But as long as the incorrectly
predicted instruction hasn't yet been executed, all of the pipeline
stages can simply be given NOPs and the pipeline can begin execution
afresh.

Jonathan Bromley · Jan 5, 2007

On 5 Jan 2007 04:25:47 -0800, "Strake" <strake888@gmail.com> wrote:

In which case, what the blazes are you doing awake NOW?
We have six time zones...

in none of which are normal people awake at 10:00 GMT!

Well, i suppose each stage could decode the instruction separately, but
there's probably a flaw in that somewhere.

No, I don't think there is; not in principle, anyway. But it's worth
observing that in some architectures the instruction decode
process is rather complicated, so it may be cheaper to
replicate the *results* of the decode rather than the
decode logic itself. This is one of those YMMV things:
there is no one right answer.

To stall, could one just "stop the clock" on the stalled pipeline
stages, or would it have to be more involved?

You could; but it's typically easier to do a clock enable on
each stage:

always @(posedge clk) if (enable_this_stage) begin
.....
end

and then some overall pipeline manager distributes
enable signals to each stage of the pipeline.
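
A minimal sketch of that arrangement, assuming the simplest possible manager: a single `stall` input that freezes every stage at once. The module, the `stall` signal, and the two data stages are hypothetical.

```verilog
// Sketch: per-stage clock enables driven by a trivial pipeline manager.
module stallable_pipe (
  input  wire       clk,
  input  wire       stall,     // hypothetical: asserted when the pipeline must wait
  input  wire [7:0] din,
  output reg  [7:0] stage1,
  output reg  [7:0] stage2
);
  // The "manager": here a stall simply disables both stages.
  wire enable_stage1 = ~stall;
  wire enable_stage2 = ~stall;

  always @(posedge clk) if (enable_stage1) begin
    stage1 <= din;
  end

  always @(posedge clk) if (enable_stage2) begin
    stage2 <= stage1;
  end
endmodule
```

The clock itself is never gated; a disabled stage just keeps its old value. A real manager would usually need to stall only the stages upstream of the one that is waiting.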

If the branch is accurately predicted,

and the moons of Saturn are aligned with the
ley line passing through Stonehenge, and...

no flush is needed.
But as long as the incorrectly
predicted instruction hasn't yet been executed, all of the pipeline
stages can simply be given NOPs and the pipeline can begin execution
afresh.

Precisely so. It still needs managing, though; I suspect that
when you come to do the detailed implementation of it, you'll
find that it is not something you can do over a coffee break.
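
A bare-bones sketch of the "give every stage a NOP" idea, assuming a `flush` signal from whatever logic detects the misprediction. The stage names and the NOP encoding are placeholders; the real encoding depends on the instruction set.

```verilog
// Sketch: on a flush, overwrite the in-flight instructions with NOPs.
module flushable_pipe (
  input  wire        clk,
  input  wire        flush,        // hypothetical: asserted on a mispredicted branch
  input  wire [0:31] fetched,      // instruction arriving from fetch
  output reg  [0:31] decode_ir,    // instruction held in the decode stage
  output reg  [0:31] execute_ir    // instruction held in the execute stage
);
  localparam [0:31] NOP = 32'h6000_0000;  // placeholder NOP encoding

  always @(posedge clk) begin
    if (flush) begin
      decode_ir  <= NOP;           // discard the wrongly fetched instructions
      execute_ir <= NOP;
    end else begin
      decode_ir  <= fetched;
      execute_ir <= decode_ir;
    end
  end
endmodule
```

This only covers squashing the instructions; redirecting the fetch address and combining a flush with a stall are left out.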

Good luck with it. Obviously it's something that's been
done many times before, but designing even a simple CPU
is still a fantastic learning experience.

Strake · Jan 6, 2007

I think i can take it from here. Thanks for all of the help.
