Counting bits

S

Steve

Guest
I am trying to implement a simple circuit that counts the number of 1's in a
16 or 32 bit register.

I am fairly new to VHDL and have implemented it the following way

for i in 15 downto 0 loop
if (Reg(i) = '1') then
Bitcount := BitCount + 1;
end if;
end loop;

in synthesizing this code, i found that this generated numerous 16 bit
adders. is there a better way to implement consider the solution can be
stored in only 4 bits?

Steve
 
"Steve" <scoe.NoSpam@uoguelph.ca> wrote in message
news:uqemc.411116$2oI1.100046@twister01.bloor.is.net.cable.rogers.com...
I am trying to implement a simple circuit that counts the number of 1's in
a
16 or 32 bit register.

I am fairly new to VHDL and have implemented it the following way

for i in 15 downto 0 loop
if (Reg(i) = '1') then
Bitcount := BitCount + 1;
end if;
end loop;

in synthesizing this code, i found that this generated numerous 16 bit
adders. is there a better way to implement consider the solution can be
stored in only 4 bits?
Here a recursive solution I wrote a long time ago.
Notice that due to the variable declaration
VARIABLE nmb : INTEGER RANGE 0 TO a'LENGTH;
the synthesis tool should not use larger adders then needed.

I synthesised it resulting (with this w=8) in:
2 1-bist adders
2 2-bits adders
1 3-bits adder

Egbert Molenkamp

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
ENTITY count IS
GENERIC (w : positive := 8);
PORT (a : IN std_logic_vector(w-1 DOWNTO 0);
q : OUT integer);
END count;

ARCHITECTURE recursive OF count IS
FUNCTION cnt (a:std_logic_vector) RETURN integer IS
VARIABLE nmb : INTEGER RANGE 0 TO a'LENGTH;
VARIABLE ai : std_logic_vector(a'LENGTH-1 DOWNTO 0);
CONSTANT middle : integer := a'LENGTH/2;
BEGIN
ai := a;
IF ai'LENGTH>=2 THEN
nmb := cnt(ai(ai'LENGTH-1 DOWNTO middle))
+ cnt(ai(middle-1 DOWNTO 0));
ELSE
IF ai(0)='1' THEN nmb:=1; ELSE nmb:=0; END IF;
END IF;
RETURN nmb;
END cnt;
BEGIN
q <= cnt(a);
END recursive;
 
Steve wrote:

I am trying to implement a simple circuit that counts the number of 1's in a
16 or 32 bit register.

in synthesizing this code, i found that this generated numerous 16 bit
adders. is there a better way to implement consider the solution can be
stored in only 4 bits?
If it's possible in your design you could make a multicycle design using
a shift register and a 4-bit counter:

process(clk)
begin
if (rising_edge(clk)) then
for i in 15 downto 1 loop
Reg(i) <= Reg(i - 1);
end loop;
if (Reg(0) = '1') then
Bitcount <= Bitcount + 1;
end if;
end if;
end process;


Anders
 
"Steve" <scoe.NoSpam@uoguelph.ca> wrote in message news:<uqemc.411116$2oI1.100046@twister01.bloor.is.net.cable.rogers.com>...
I am trying to implement a simple circuit that counts the number of 1's in a
16 or 32 bit register.

I am fairly new to VHDL and have implemented it the following way

for i in 15 downto 0 loop
if (Reg(i) = '1') then
Bitcount := BitCount + 1;
end if;
end loop;

in synthesizing this code, i found that this generated numerous 16 bit
adders. is there a better way to implement consider the solution can be
stored in only 4 bits?

Steve
I've had similar experience with for-loops in synthesizable code --
there's little, if no control of the implementation. Here's a
Confluence solution. With the help of "tree", it recursively builds
an combinatorial adder network with minimal logic:

(* Parametric bit counter. Counter grows based on vector width. *)
component count_bits +vector -count with bin_op uni_op is
component bin_op +a +b -x is
x = ('0' '++' a) '+' ('0' '++' b) (* Pad a zero bit and add. *)
end
component uni_op +a -x is
x = '0' '++' a (* Pad a zero bit. *)
end
{tree bin_op uni_op {list_of_bits vector $} count}
end

(* Instantation example: *)
{count_bits '00110101' count} (* count would equal '0100' *)

-Tom
 
Earlier I replayed with a recursive solution using a function but also a
component can be used this way (and still synthesisable). This looks like
your solution.

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
ENTITY count IS
GENERIC (w : positive := 8);
PORT (a : IN std_logic_vector(w-1 DOWNTO 0);
q : OUT integer RANGE 0 TO w);
END count;

ARCHITECTURE recursive OF count IS
COMPONENT count
GENERIC (w : positive := 8);
PORT (a : IN std_logic_vector(w-1 DOWNTO 0);
q : OUT integer RANGE 0 TO w);
END COMPONENT;
CONSTANT middle : integer := w/2;
SIGNAL out_left,out_right : integer RANGE 0 TO w;
BEGIN
recursive:IF w >= 2 GENERATE
left:count
GENERIC MAP (w-middle)
PORT MAP (a(w-1 DOWNTO middle),out_left);
right:count
GENERIC MAP (middle)
PORT MAP (a(middle-1 DOWNTO 0),out_right);
q <= out_left + out_right;
END GENERATE;

last:IF w = 1 GENERATE
q <= 1 WHEN a(0)='1' ELSE 0;
END GENERATE;
END recursive;

Egbert Molenkamp

"Tom Hawkins" <tom1@launchbird.com> schreef in bericht
news:833030c0.0405070730.119f8148@posting.google.com...
"Steve" <scoe.NoSpam@uoguelph.ca> wrote in message
news:<uqemc.411116$2oI1.100046@twister01.bloor.is.net.cable.rogers.com>...
I am trying to implement a simple circuit that counts the number of 1's
in a
16 or 32 bit register.

I am fairly new to VHDL and have implemented it the following way

for i in 15 downto 0 loop
if (Reg(i) = '1') then
Bitcount := BitCount + 1;
end if;
end loop;

in synthesizing this code, i found that this generated numerous 16 bit
adders. is there a better way to implement consider the solution can be
stored in only 4 bits?

Steve

I've had similar experience with for-loops in synthesizable code --
there's little, if no control of the implementation. Here's a
Confluence solution. With the help of "tree", it recursively builds
an combinatorial adder network with minimal logic:

(* Parametric bit counter. Counter grows based on vector width. *)
component count_bits +vector -count with bin_op uni_op is
component bin_op +a +b -x is
x = ('0' '++' a) '+' ('0' '++' b) (* Pad a zero bit and add. *)
end
component uni_op +a -x is
x = '0' '++' a (* Pad a zero bit. *)
end
{tree bin_op uni_op {list_of_bits vector $} count}
end

(* Instantation example: *)
{count_bits '00110101' count} (* count would equal '0100' *)

-Tom
 
I guess that this is indeed after all the same solution as your initial one.

I like however by far the elegance of your first one. No need to make it
more complex.

Having said this , it's clear that a number of related problems in the
sorting and counting area can be solved by this recursive solutions
(that I like so much...) The only time I would suggest going to the
"recursive" structural solution you described in this post is when the
basic building block (simple adder in this case) is non-trivial and you
want to make sure that a predefined one is used.

Regards,

Jos De Laender


Egbert Molenkamp wrote:
Earlier I replayed with a recursive solution using a function but also a
component can be used this way (and still synthesisable). This looks like
your solution.

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
ENTITY count IS
GENERIC (w : positive := 8);
PORT (a : IN std_logic_vector(w-1 DOWNTO 0);
q : OUT integer RANGE 0 TO w);
END count;

ARCHITECTURE recursive OF count IS
COMPONENT count
GENERIC (w : positive := 8);
PORT (a : IN std_logic_vector(w-1 DOWNTO 0);
q : OUT integer RANGE 0 TO w);
END COMPONENT;
CONSTANT middle : integer := w/2;
SIGNAL out_left,out_right : integer RANGE 0 TO w;
BEGIN
recursive:IF w >= 2 GENERATE
left:count
GENERIC MAP (w-middle)
PORT MAP (a(w-1 DOWNTO middle),out_left);
right:count
GENERIC MAP (middle)
PORT MAP (a(middle-1 DOWNTO 0),out_right);
q <= out_left + out_right;
END GENERATE;

last:IF w = 1 GENERATE
q <= 1 WHEN a(0)='1' ELSE 0;
END GENERATE;
END recursive;

Egbert Molenkamp

"Tom Hawkins" <tom1@launchbird.com> schreef in bericht
news:833030c0.0405070730.119f8148@posting.google.com...

"Steve" <scoe.NoSpam@uoguelph.ca> wrote in message

news:<uqemc.411116$2oI1.100046@twister01.bloor.is.net.cable.rogers.com>...

I am trying to implement a simple circuit that counts the number of 1's

in a

16 or 32 bit register.

I am fairly new to VHDL and have implemented it the following way

for i in 15 downto 0 loop
if (Reg(i) = '1') then
Bitcount := BitCount + 1;
end if;
end loop;

in synthesizing this code, i found that this generated numerous 16 bit
adders. is there a better way to implement consider the solution can be
stored in only 4 bits?

Steve

I've had similar experience with for-loops in synthesizable code --
there's little, if no control of the implementation. Here's a
Confluence solution. With the help of "tree", it recursively builds
an combinatorial adder network with minimal logic:

(* Parametric bit counter. Counter grows based on vector width. *)
component count_bits +vector -count with bin_op uni_op is
component bin_op +a +b -x is
x = ('0' '++' a) '+' ('0' '++' b) (* Pad a zero bit and add. *)
end
component uni_op +a -x is
x = '0' '++' a (* Pad a zero bit. *)
end
{tree bin_op uni_op {list_of_bits vector $} count}
end

(* Instantation example: *)
{count_bits '00110101' count} (* count would equal '0100' *)

-Tom
 
"Steve" <scoe.NoSpam@uoguelph.ca> wrote in message
news:uqemc.411116$2oI1.100046@twister01.bloor.is.net.cable.rogers.com...
I am trying to implement a simple circuit that counts the number of
1's in a
16 or 32 bit register.

I am fairly new to VHDL and have implemented it the following way

for i in 15 downto 0 loop
if (Reg(i) = '1') then
Bitcount := BitCount + 1;
end if;
end loop;

in synthesizing this code, i found that this generated numerous 16 bit
adders. is there a better way to implement consider the solution can
be
stored in only 4 bits?

Steve

A few years ago I needed a circuit to do the same thing and looked at a
number of alternatives. Basically, what a circuit like this needs to do
is to add n 1-bit numbers. However, if you just start cascading a bunch
of 1-bit full adders you will end up with some redundant logic and a
number of untestable faults. The reason is that not all bit
combinations are possible in the final result. For example, for 16-bit
input vector you need a 5-bit vector to hold the result. But, only the
combinations of 00000 to 10000 (0-16) are possible. The remaining
combinations, 10001 to 11111 (17-31) can never occur. The same thing
applies to the intermediate stages leading to the final result. I spent
some time studying how to optimize this type of circuit and came up with
2, 4, 8, and 16-bit solutions that use optimized adders at each stage
that handle only the bit combinations that can possibly occur and none
others.

Here are my 8-bit and 16-bit summing networks. It may look like a lot
of equations, but the results are optimal with no redundancies and are
100% testable.

-- This macro outputs a binary count of the number of input bits that
-- are active at one time. It effectively sums 8 1-bit numbers.
LIBRARY IEEE;
USE IEEE.Std_Logic_1164.all;

ENTITY sum8 IS
PORT(a : IN Std_uLogic_vector(1 TO 8);
sum : OUT Std_uLogic_vector(3 downto 0) );
END ENTITY;

ARCHITECTURE beh OF sum8 IS

BEGIN

PROCESS(a)
VARIABLE s2aa, s2ab, s2ba, s2bb, s2ca, s2cb, s2da, s2db,
s4aa, s4ab, s4ac, s4ba, s4bb, s4bc, s8aaa
: std_logic;
VARIABLE s8 : Std_uLogic_vector(3 downto 0);
BEGIN
-- first level summing networks
s2aa := a(1) XOR a(2);
s2ab := a(1) AND a(2);

s2ba := a(3) XOR a(4);
s2bb := a(3) AND a(4);

s2ca := a(5) XOR a(6);
s2cb := a(5) AND a(6);

s2da := a(7) XOR a(8);
s2db := a(7) AND a(8);

-- second level summing networks
s4aa := s2aa XOR s2ba;
s4ab := (s2aa AND s2ba) OR (s2ab XOR s2bb);
s4ac := s2ab AND s2bb;

s4ba := s2ca XOR s2da;
s4bb := (s2ca AND s2da) OR (s2cb XOR s2db);
s4bc := s2cb AND s2db;

-- third level summing networks
s8(0) := s4aa XOR s4ba;
s8aaa := s4aa AND s4ba;
s8(1) := s8aaa XOR (s4ab XOR s4bb);
s8(2) := (s8aaa AND (s4ab OR s4bb)) OR (s4ab AND s4bb) OR (s4ac XOR
s4bc);
s8(3) := s4ac AND s4bc;

sum <= s8;
END PROCESS;

END ARCHITECTURE beh;


-- This macro outputs a binary count of the number of input bits that
-- are active at one time. It effectively sums 16 1-bit numbers.
LIBRARY IEEE;
USE IEEE.Std_Logic_1164.all;

ENTITY sum16 IS
PORT(a : IN Std_uLogic_vector(1 TO 16);
sum : OUT Std_uLogic_vector(4 downto 0) );
END ENTITY;

ARCHITECTURE beh OF sum16 IS

BEGIN

PROCESS(a)
VARIABLE s2aa, s2ab, s2ba, s2bb, s2ca, s2cb, s2da, s2db, s2ea, s2eb,
s2fa, s2fb, s2ga, s2gb, s2ha, s2hb,
s4aa, s4ab, s4ac, s4ba, s4bb, s4bc, s4ca, s4cb, s4cc, s4da, s4db,
s4dc,
s8aa, s8aaa, s8ab, s8ac, s8ad, s8ba, s8baa, s8bb, s8bc, s8bd,
s16aa, s16bb
: Std_uLogic;
VARIABLE s16 : Std_uLogic_vector(4 downto 0);
BEGIN
-- first level summing networks
s2aa := a(1) XOR a(2);
s2ab := a(1) AND a(2);

s2ba := a(3) XOR a(4);
s2bb := a(3) AND a(4);

s2ca := a(5) XOR a(6);
s2cb := a(5) AND a(6);

s2da := a(7) XOR a(8);
s2db := a(7) AND a(8);

s2ea := a(9) XOR a(10);
s2eb := a(9) AND a(10);

s2fa := a(11) XOR a(12);
s2fb := a(11) AND a(12);

s2ga := a(13) XOR a(14);
s2gb := a(13) AND a(14);

s2ha := a(15) XOR a(16);
s2hb := a(15) AND a(16);

-- second level summing networks
s4aa := s2aa XOR s2ba;
s4ab := (s2aa AND s2ba) OR (s2ab XOR s2bb);
s4ac := s2ab AND s2bb;

s4ba := s2ca XOR s2da;
s4bb := (s2ca AND s2da) OR (s2cb XOR s2db);
s4bc := s2cb AND s2db;

s4ca := s2ea XOR s2fa;
s4cb := (s2ea AND s2fa) OR (s2eb XOR s2fb);
s4cc := s2eb AND s2fb;

s4da := s2ga XOR s2ha;
s4db := (s2ga AND s2ha) OR (s2gb XOR s2hb);
s4dc := s2gb AND s2hb;

-- third level summing networks
s8aa := s4aa XOR s4ba;
s8aaa := s4aa AND s4ba;
s8ab := s8aaa XOR (s4ab XOR s4bb);
s8ac := (s8aaa AND (s4ab OR s4bb)) OR (s4ab AND s4bb) OR (s4ac XOR
s4bc);
s8ad := s4ac AND s4bc;

s8ba := s4ca XOR s4da;
s8baa := s4ca AND s4da;
s8bb := s8baa XOR (s4cb XOR s4db);
s8bc := (s8baa AND (s4cb OR s4db)) OR (s4cb AND s4db) OR (s4cc XOR
s4dc);
s8bd := s4cc AND s4dc;

-- fourth level summing network
s16(0) := s8aa XOR s8ba;
s16aa := s8aa AND s8ba;
s16(1) := s16aa XOR (s8ab XOR s8bb);
s16bb := (s16aa AND (s8ab OR s8bb)) OR (s8ab AND s8bb);
s16(2) := s16bb XOR (s8ac XOR s8bc);
s16(3) := (s16bb AND (s8ac OR s8bc)) OR (s8ac AND s8bc) OR (s8ad XOR
s8bd);
s16(4) := s8ad AND s8bd;

sum <= s16;
END PROCESS;

END ARCHITECTURE beh;


(I originally came up with these before I ever started using VHDL. With
some clever application of some nested VHDL Procedures the coding could
obviously be simplified. This was just a brute-force conversion.)

Charles
 
OK, here's a much more elegant coding of the same function that I
described in my previous post. Tested it; it works. Should synthesize
to the same optimal logic. Enjoy.

-- This macro outputs a binary count of the number of input bits that
-- are active at one time. It effectively sums 16 1-bit numbers.
LIBRARY IEEE;
USE IEEE.Std_Logic_1164.all;

ENTITY sum16 IS
PORT(a : IN Std_uLogic_vector(1 TO 16);
sum : OUT Std_uLogic_vector(4 downto 0) );
END ENTITY;

ARCHITECTURE beh2 OF sum16 IS

FUNCTION sum2bits (a : std_ulogic_vector(1 to 2) ) RETURN
std_ulogic_vector IS
VARIABLE s : std_ulogic_vector(1 downto 0);
BEGIN
s(0) := a(1) XOR a(2);
s(1) := a(1) AND a(2);
RETURN s;
END FUNCTION;

FUNCTION sum4bits (a : std_ulogic_vector(1 to 4) ) RETURN
std_ulogic_vector IS
VARIABLE sa, sb : std_ulogic_vector(1 downto 0);
VARIABLE s : std_ulogic_vector(2 downto 0);
BEGIN
sa := sum2bits(a(1 to 2));
sb := sum2bits(a(3 to 4));
s(0) := sa(0) XOR sb(0);
s(1) := (sa(0) AND sb(0)) OR (sa(1) XOR sb(1));
s(2) := sa(1) AND sb(1);
RETURN s;
END FUNCTION;

FUNCTION sum8bits (a : std_ulogic_vector(1 to 8) ) RETURN
std_ulogic_vector IS
VARIABLE sa, sb : std_ulogic_vector(2 downto 0);
VARIABLE c1 : std_ulogic;
VARIABLE s : std_ulogic_vector(3 downto 0);
BEGIN
sa := sum4bits(a(1 to 4));
sb := sum4bits(a(5 to 8));
s(0) := sa(0) XOR sb(0);
c1 := sa(0) AND sb(0);
s(1) := c1 XOR (sa(1) XOR sb(1));
s(2) := (c1 AND (sa(1) OR sb(1))) OR (sa(1) AND sb(1)) OR (sa(2) XOR
sb(2));
s(3) := sa(2) AND sb(2);
RETURN s;
END FUNCTION;

FUNCTION sum16bits (a : std_ulogic_vector(1 to 16) ) RETURN
std_ulogic_vector IS
VARIABLE sa, sb : std_ulogic_vector(3 downto 0);
VARIABLE c1, c2 : std_ulogic;
VARIABLE s : std_ulogic_vector(4 downto 0);
BEGIN
sa := sum8bits(a(1 to 8));
sb := sum8bits(a(9 to 16));
s(0) := sa(0) XOR sb(0);
c1 := sa(0) AND sb(0);
s(1) := c1 XOR (sa(1) XOR sb(1));
c2 := (c1 AND (sa(1) OR sb(1))) OR (sa(1) AND sb(1));
s(2) := c2 XOR (sa(2) XOR sb(2));
s(3) := (c2 AND (sa(2) OR sb(2))) OR (sa(2) AND sb(2)) OR (sa(3) XOR
sb(3));
s(4) := sa(3) AND sb(3);
RETURN s;
END FUNCTION;

BEGIN
sum <= sum16bits(a);
END ARCHITECTURE beh2;
 

Welcome to EDABoard.com

Sponsor

Back
Top