a s
Guest
Dear all,
I have come up with two VHDL solutions for counting the number of set
bits (ones) in the input data.
What I don't understand is why the two solutions synthesize to different
results, at least with Xilinx ISE and its XST.
There is quite a substantial difference in the number of slices and
LUTs required:
1. solution with unrolled loop: 41 slices, 73 LUTs
2. solution with loop: 54 slices, 100 LUTs
The entity of both architectures is the same:
entity one_count is
    Port ( din  : in  STD_LOGIC_vector(31 downto 0);
           dout : out STD_LOGIC_vector(5 downto 0)
         );
end one_count;
The architecture with an unrolled loop is the following:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity one_count is
    Port ( din  : in  STD_LOGIC_vector(31 downto 0);
           dout : out STD_LOGIC_vector(5 downto 0)
         );
end one_count;

architecture one_count_unrolled_arch of one_count is
    signal cnt : integer range 0 to 32;
begin
    -- each one-bit slice is converted to 0 or 1 and all 32 are summed
    cnt <= to_integer(unsigned(din( 0 downto  0))) +
           to_integer(unsigned(din( 1 downto  1))) +
           to_integer(unsigned(din( 2 downto  2))) +
           to_integer(unsigned(din( 3 downto  3))) +
           to_integer(unsigned(din( 4 downto  4))) +
           to_integer(unsigned(din( 5 downto  5))) +
           to_integer(unsigned(din( 6 downto  6))) +
           to_integer(unsigned(din( 7 downto  7))) +
           to_integer(unsigned(din( 8 downto  8))) +
           to_integer(unsigned(din( 9 downto  9))) +
           to_integer(unsigned(din(10 downto 10))) +
           to_integer(unsigned(din(11 downto 11))) +
           to_integer(unsigned(din(12 downto 12))) +
           to_integer(unsigned(din(13 downto 13))) +
           to_integer(unsigned(din(14 downto 14))) +
           to_integer(unsigned(din(15 downto 15))) +
           to_integer(unsigned(din(16 downto 16))) +
           to_integer(unsigned(din(17 downto 17))) +
           to_integer(unsigned(din(18 downto 18))) +
           to_integer(unsigned(din(19 downto 19))) +
           to_integer(unsigned(din(20 downto 20))) +
           to_integer(unsigned(din(21 downto 21))) +
           to_integer(unsigned(din(22 downto 22))) +
           to_integer(unsigned(din(23 downto 23))) +
           to_integer(unsigned(din(24 downto 24))) +
           to_integer(unsigned(din(25 downto 25))) +
           to_integer(unsigned(din(26 downto 26))) +
           to_integer(unsigned(din(27 downto 27))) +
           to_integer(unsigned(din(28 downto 28))) +
           to_integer(unsigned(din(29 downto 29))) +
           to_integer(unsigned(din(30 downto 30))) +
           to_integer(unsigned(din(31 downto 31)));
    dout <= std_logic_vector(to_unsigned(cnt,6));
end one_count_unrolled_arch ;
And the architecture with a loop is the following:
architecture one_count_loop_arch of one_count is
    signal cnt : integer range 0 to 32;
begin
    process(din) is
        variable tmp : integer range 0 to 32;
    begin
        -- start with bit 0, then accumulate the remaining 31 bits
        tmp := to_integer(unsigned(din(0 downto 0)));
        for i in 1 to 31 loop
            tmp := tmp + to_integer(unsigned(din(i downto i)));
        end loop;
        cnt <= tmp;
    end process;
    dout <= std_logic_vector(to_unsigned(cnt,6));
end one_count_loop_arch ;
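To rule out a functional difference between the two architectures (as opposed to a difference in resource usage only), a small simulation check can help. The following is only a sketch: the testbench name one_count_tb and the signal names are made up for illustration, and it assumes that both architectures are compiled into library work for the same entity one_count. It only exercises contiguous runs of ones (0, 1, 3, 7, ...), so it is a sanity check rather than an exhaustive test.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical testbench; not part of the original design files.
entity one_count_tb is
end one_count_tb;

architecture sim of one_count_tb is
    signal din       : std_logic_vector(31 downto 0) := (others => '0');
    signal dout_unrl : std_logic_vector(5 downto 0);
    signal dout_loop : std_logic_vector(5 downto 0);
begin
    -- Assumption: both architectures belong to entity work.one_count.
    uut_unrolled : entity work.one_count(one_count_unrolled_arch)
        port map (din => din, dout => dout_unrl);

    uut_loop : entity work.one_count(one_count_loop_arch)
        port map (din => din, dout => dout_loop);

    stimulus : process
        variable v : std_logic_vector(31 downto 0) := (others => '0');
    begin
        -- Iteration n applies a vector with exactly n ones (n = 0 .. 32).
        for n in 0 to 32 loop
            din <= v;
            wait for 10 ns;
            assert to_integer(unsigned(dout_unrl)) = n
                report "unrolled architecture: wrong count" severity error;
            assert dout_loop = dout_unrl
                report "loop and unrolled results differ" severity error;
            -- Shift one more '1' in from the right for the next iteration.
            v := v(30 downto 0) & '1';
        end loop;
        report "one_count_tb finished" severity note;
        wait;
    end process;
end sim;
```

If no assertion fires, the two descriptions agree on these inputs, and any remaining difference is in how the synthesis tool maps them.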
I would be really grateful if somebody could point out what I did
wrong in the second solution (the one with the loop).
It certainly must be my mistake, but I cannot find it...
Additionally, I know that this "brute-force" way of counting ones might
not be the optimal approach, but this is just my first attempt to get
the job done. If somebody has a better solution, I would appreciate it
if you could share it.
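As a starting point for discussion, one commonly suggested alternative is to add the bits in a balanced tree (split the vector in halves and add the partial counts) rather than in one long chain. The following is only a sketch: the architecture name one_count_tree_arch and the function count_ones are made up for illustration, and it assumes the synthesis tool accepts recursive functions. No slice or LUT figures are claimed for it.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical third architecture for the same entity one_count.
architecture one_count_tree_arch of one_count is

    -- Recursive population count: split the vector into two halves
    -- and add the counts of each half.
    function count_ones(v : std_logic_vector) return natural is
        -- Copy into a (length-1 downto 0) range so that the function
        -- works for any slice, whatever its original index range.
        variable x    : std_logic_vector(v'length - 1 downto 0) := v;
        constant half : natural := v'length / 2;
    begin
        if x'length = 0 then
            return 0;
        elsif x'length = 1 then
            if x(0) = '1' then
                return 1;
            else
                return 0;
            end if;
        else
            return count_ones(x(x'length - 1 downto half)) +
                   count_ones(x(half - 1 downto 0));
        end if;
    end function;

begin
    dout <= std_logic_vector(to_unsigned(count_ones(din), 6));
end one_count_tree_arch;
```

Whether this actually maps to fewer slices or LUTs than the two versions above depends on the tool and target device, so it would have to be synthesized and compared.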
Regards,
Peter