Chris Hinsley
Guest
Folks, I'm having a hard time getting this bit of code to do what I want.
I have an execution unit I've done that has N results that can be
written back per clock to the register file, and I don't want a ladder
of multiplexers working out what value gets written to each register.
It's too serial and takes too long for my liking, and gets slower as I
add functional units.
So I thought 'aha, I'll put all the results onto a single bus to each
register using a demuz (a demux that drives z on the non-written paths)
and just write the values off the bus into the registers that are
getting updated'. Easy?
Not wanting to put my entire source here, I'll show the relevant bits.
And I should add that the design guarantees not to write to the same
register twice from any of the inputs.
This is how I prepare the inputs.
    tri [(NUM_REGS - 1):0] [(REG_WIDTH - 1):0] val;
    wire [(NUM_FUS - 1):0] [(NUM_REGS - 1):0] fuwe;

    // one demuz per functional unit, all driving the shared bus 'val'
    genvar u;
    generate
        for (u = 0; u < NUM_FUS; u++)
        begin : G1
            demuz #(.SEL_BITS(REG_SEL_BITS), .WIDTH(REG_WIDTH))
                U(.i_sel(i_c[u]), .i(i_valc[u]), .o_d(fuwe[u]), .o(val));
        end
    endgenerate
That puts all the inputs onto a shared bus to each register, and keeps
the decoder bits so I can work out which registers have to be written
to later on.
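For reference, a demuz of the kind described above might look roughly
like the sketch below. This is only an illustration under assumptions:
the port names and parameters are taken from the instantiation, but the
body, the lane count of 2**SEL_BITS and the generate label LANE are
hypothetical, not the actual source.

```systemverilog
// Hypothetical demuz: drives 'i' onto the selected lane of 'o' and
// leaves every other lane at high impedance. Body is an assumption.
module demuz #(parameter SEL_BITS = 2, parameter WIDTH = 8)
(
    input  wire [(SEL_BITS - 1):0] i_sel,                    // destination register
    input  wire [(WIDTH - 1):0] i,                           // value to write
    output wire [((1 << SEL_BITS) - 1):0] o_d,               // one-hot decode of i_sel
    output tri  [((1 << SEL_BITS) - 1):0] [(WIDTH - 1):0] o  // shared bus, one lane per register
);
    localparam NUM = 1 << SEL_BITS;

    genvar n;
    generate
        for (n = 0; n < NUM; n++)
        begin : LANE
            // decoder bit for lane n
            assign o_d[n] = (i_sel == n);
            // drive the lane only when selected, otherwise release it
            assign o[n] = o_d[n] ? i : {WIDTH{1'bz}};
        end
    endgenerate
endmodule
```

With NUM_FUS of these attached to the same 'val', each lane has NUM_FUS
potential drivers, and the no-double-write guarantee is what keeps them
from fighting.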
    reg [(NUM_REGS - 1):0] [(REG_WIDTH - 1):0] r;
    reg [(NUM_REGS - 1):0] p;
    reg [(NUM_REGS - 1):0] we;

    // calculate write enable bits for the registers
    integer i, j;
    we[0] = 0;
    for (i = 1; i < NUM_REGS; i++)
    begin
        we[i] = 0;
        for (j = 0; j < NUM_FUS; j++)
        begin
            // register i is written if any functional unit selects it
            we[i] |= fuwe[j][i];
        end
    end

    // write the registers and clear pending bits for those that get written
    r[0] = 0;
    p[0] = 0;
    for (i = 1; i < NUM_REGS; i++)
    begin
        if (we[i])
        begin
            r[i] = val[i];
            p[i] = 0;
        end
    end
This bit of code sits in an 'always @(posedge i_clk)' block and writes
the new values off the bus into any register that gets written to.
My problem is that Quartus keeps turning the tri-state bus into selector
logic, warning me that it's done so, and creates the piles of
multiplexers I'm desperately seeking to remove. Unless I've somehow got
it all wrong (which I might have), I should be able to have a single
shared bus to each register that contains a correct value for the case
where it gets written.
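For comparison, selector logic standing in for a tri-state bus is
typically an AND-OR structure along the lines of the sketch below, shown
for a single register lane. This is an assumption about the general
shape, not the netlist Quartus actually produces, and the module name
andor_bus and its ports are hypothetical.

```systemverilog
// Hypothetical AND-OR equivalent of one lane of the shared bus.
// Relies on the guarantee that at most one bit of i_we is set.
module andor_bus #(parameter NUM_FUS = 4, parameter REG_WIDTH = 32)
(
    input  wire  [(NUM_FUS - 1):0] i_we,                       // per-unit decoder bit for this register
    input  wire  [(NUM_FUS - 1):0] [(REG_WIDTH - 1):0] i_val,  // result from each functional unit
    output logic [(REG_WIDTH - 1):0] o_val,                    // value seen on the lane
    output logic o_we                                          // register write enable
);
    always_comb
    begin
        o_val = '0;
        for (int j = 0; j < NUM_FUS; j++)
        begin
            // gate each result with its enable, then OR them together
            o_val |= i_val[j] & {REG_WIDTH{i_we[j]}};
        end
        o_we = |i_we;
    end
endmodule
```

Because the enables never overlap, the OR carries no priority between
units, so it can be built as a balanced tree whose depth grows with the
logarithm of NUM_FUS rather than as a serial ladder.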
What I'm trying to do is produce a multiport register write (from
multiple functional units) that doesn't get slower and slower the more
functional units I add. Which it won't if the shared bus idea is
correct. I know I'm going to take a hit on wires, but at least all the
values come through in parallel.
Anyone care to take a look? Or have the key to getting Quartus to do
what I want?
Best regards
Chris