Matthias Alles
Guest
Hi!
I'm trying to implement an add-compare-select (ACS) unit in a Spartan3
but I am not satisfied with the speed. The code looks like this:
....
type sum_array is array(0 to 7) of signed(13 downto 0);
signal state_reg : sum_array;
....
process(clk, rst) is
  variable sum1, sum2, sum3, sum4 : sum_array;
begin
  if rst = '1' then
    ....
  elsif clk'event and clk = '1' then
    ...
    sum1(0) := state_reg(a1) + gamma(a2);
    sum2(0) := state_reg(b1) + gamma(b2);
    sum3(0) := state_reg(c1) + gamma(c2);
    sum4(0) := state_reg(d1) + gamma(d2);
    state_reg(0) <= MIN4(sum1(0), sum2(0), sum3(0), sum4(0));
    ...
  end if; -- rst,clk
end process;
where a1, a2, ..., d2 are some constants. The problem with this code is
that the newly calculated state_reg values are needed again in the very
next clock cycle (the constants can, for instance, be zero, so
state_reg(0) may depend directly on its own previous value). Hence
pipelining is not possible. The minimum search (MIN4) is done with 6
subtractions in parallel to avoid a two-stage minimum search tree.
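To make the MIN4 structure concrete, here is a minimal sketch of a one-level minimum search, assuming 14-bit signed operands as in sum_array. The package name min4_pkg, the subtype metric_t, and the internal flag names are hypothetical, and the real MIN4 may well differ (for instance by using explicit subtractions and their sign bits instead of the `<=` operator).

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical package; only the name MIN4 comes from the post.
package min4_pkg is
  subtype metric_t is signed(13 downto 0);
  function MIN4(a, b, c, d : metric_t) return metric_t;
end package min4_pkg;

package body min4_pkg is
  function MIN4(a, b, c, d : metric_t) return metric_t is
    -- Six pairwise comparisons; none depends on another,
    -- so they can all be evaluated in parallel.
    variable ab, ac, ad, bc, bd, cd : boolean;
  begin
    ab := a <= b;
    ac := a <= c;
    ad := a <= d;
    bc := b <= c;
    bd := b <= d;
    cd := c <= d;

    -- Select the first operand that is not larger than any other.
    if ab and ac and ad then
      return a;
    elsif (not ab) and bc and bd then
      return b;
    elsif (not ac) and (not bc) and cd then
      return c;
    else
      return d;
    end if;
  end function MIN4;
end package body min4_pkg;
```

Written this way, the critical path is one adder, one comparator and the select logic, instead of one adder followed by two comparator stages in series.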
I would like to boost the clock frequency of this architecture. Is there
a way to further improve the description in VHDL? Or is it possible to
do some hand optimisations?
Thanks in advance,
Matthias