MAC design in VHDL

Guest
hi..
I want to design a MAC (multiply-accumulator). I have written the
following code. The problem is that when I run place & route in Xilinx ISE
7.1, I get too many timing errors (around 30). My clock is 70 MHz.

-- Library clauses were missing from the post; std_logic_unsigned is assumed
-- (std_logic_signed would be used instead for signed operands).
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;     -- conv_std_logic_vector
use ieee.std_logic_unsigned.all;  -- "+" and "*" on std_logic_vector

entity mac is
  generic(
    input_width1    : integer := 16;
    input_width2    : integer := 16;
    output_width    : integer := 36;
    mac_cycle_width : integer := 4
  );
  port (
    RESET : IN  STD_LOGIC;
    CLK   : IN  STD_LOGIC;
    FD    : IN  STD_LOGIC;
    ND    : IN  STD_LOGIC;
    A     : IN  STD_LOGIC_VECTOR(input_width1-1 DOWNTO 0);
    B     : IN  STD_LOGIC_VECTOR(input_width2-1 DOWNTO 0);
    Q     : OUT STD_LOGIC_VECTOR(output_width-1 DOWNTO 0);
    RDY   : OUT STD_LOGIC
  );
end entity mac;

architecture rtl of mac is

  signal cycle : STD_LOGIC_VECTOR(mac_cycle_width-1 DOWNTO 0);
  signal rdy1  : std_logic;
  signal sum   : STD_LOGIC_VECTOR(output_width-1 DOWNTO 0);
  signal temp2 : STD_LOGIC_VECTOR(output_width-1 DOWNTO 0);
  signal prod  : STD_LOGIC_VECTOR(input_width1 + input_width2 - 1 DOWNTO 0);

begin

  -- cycle determines the number of MAC accumulations
  -- fd indicates the start of a new accumulation
  process(reset, clk)
  begin
    if reset = '1' then
      cycle <= (others => '0');
    elsif (clk'event and clk = '1') then
      if (fd = '1') then
        cycle <= conv_std_logic_vector(1, cycle'length);
      else
        cycle <= cycle + '1';
      end if;
    end if;
  end process;

  -- ND indicates that new data is ready at the input.
  -- The 2 inputs are multiplied and the product is added
  -- to the previous accumulator result.
  -- In the last cycle the accumulator result is given out and at the
  -- same time the accumulator is reset to zeros.
  -- Here the accumulator is the variable temp1.
  -- SUM holds the final accumulated result and temp2 holds the
  -- intermediate results.
  -- Note: this process is clocked on the falling edge of clk.
  process(reset, clk)
    variable temp1 : STD_LOGIC_VECTOR(output_width-1 DOWNTO 0);
  begin
    if (reset = '1') then
      sum   <= (others => '0');
      prod  <= (others => '0');
      temp1 := (others => '0');
    elsif (clk'event and clk = '0') then
      if (nd = '1') then
        prod  <= A * (B);
        temp1 := temp1 + prod;
        if (cycle = conv_std_logic_vector(0, cycle'length)) then
          sum   <= temp1;
          temp1 := (others => '0');
        end if;
      end if;
    end if;
    temp2 <= temp1;
  end process;

  -- Here Q indicates the accumulator output during all cycles.
  -- SUM holds the final accumulated result and temp2 holds the
  -- intermediate results. Both are combined to form Q.
  process(clk)
  begin
    if (clk'event and clk = '1') then
      if (cycle = conv_std_logic_vector(0, cycle'length)) then
        q <= sum;
      else
        q <= temp2;
      end if;
    end if;
  end process;

  --q <= sum;

  -- At the end of the MAC cycle rdy is generated to indicate
  -- that the MAC output is ready.
  process(reset, clk)
  begin
    if (reset = '1') then
      rdy1 <= '0';
    elsif (clk'event and clk = '1') then -- ori '1'
      if (cycle = conv_std_logic_vector(0, cycle'length)) then
        rdy1 <= '1';
      else
        rdy1 <= '0';
      end if;
    end if;
  end process;

  rdy <= rdy1;

end rtl;
 
hi all..
Is there anything wrong with this design? Functionally, when I tested it in
ModelSim, it produced correct results. Please help.


 
sksaras@hotmail.com wrote:
hi all..
Is there anything wrong with this design? Functionally, when I tested it in
ModelSim, it produced correct results. Please help.

Have you taken a look at the timing report (*.twr) to see which paths are
failing? I assume it is the multiply that is failing. Are you using
a chip with built-in hardware multipliers? Are they being used?
 
hi.. thanks a lot.
You are right. When I looked at the timing analyzer (post-place & route
static timing analyzer), most errors were in the multiply path.
I have now replaced the multiply with a Xilinx multiplier core, and the errors
are reduced to 4, but the slice occupancy has increased a lot.
Also, I am not supposed to use any cores. Please suggest other
ways to design the MAC. Or can I modify the above code (optimise it) so
that the errors are reduced?

Duane Clark wrote:
Have you taken a look at the timing report (*.twr) to see which paths are
failing? I assume it is the multiply that is failing. Are you using
a chip with built-in hardware multipliers? Are they being used?
 
sksaras@hotmail.com wrote:
I have now replaced the multiply with a Xilinx multiplier core, and the errors
are reduced to 4, but the slice occupancy has increased a lot.
Also, I am not supposed to use any cores. Please suggest other
ways to design the MAC. Or can I modify the above code (optimise it) so
that the errors are reduced?

What chip are you targeting? The Virtex-II and later chips have built-in
hardware multipliers that consume no slices. Is this a homework problem?

Again, you need to look at the timing report, see which paths are
failing, and then determine what to do to fix them.
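
One way to act on that advice, offered as a sketch rather than a verified fix: in the posted code the accumulate process is clocked on the falling edge (clk = '0') while cycle and Q use the rising edge, so every path between the two has only half a period, about 7.1 ns at 70 MHz. The version below uses the rising edge throughout and gives the multiply and the add a register stage each. The entity name mac_pipe, the use of numeric_std, unsigned operands, the synchronous reset, and the simplified fd handling (no cycle counter or RDY) are all assumptions and not part of the original design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: a rising-edge-only, three-stage MAC.
-- Assumes unsigned operands and output_width >= input_width1 + input_width2.
entity mac_pipe is
  generic (
    input_width1 : positive := 16;
    input_width2 : positive := 16;
    output_width : positive := 36
  );
  port (
    clk   : in  std_logic;
    reset : in  std_logic;  -- synchronous, active high (assumption)
    fd    : in  std_logic;  -- marks the first sample of a new accumulation
    nd    : in  std_logic;  -- new data present on a and b
    a     : in  std_logic_vector(input_width1-1 downto 0);
    b     : in  std_logic_vector(input_width2-1 downto 0);
    q     : out std_logic_vector(output_width-1 downto 0)
  );
end entity mac_pipe;

architecture rtl of mac_pipe is
  signal a_r    : unsigned(input_width1-1 downto 0) := (others => '0');
  signal b_r    : unsigned(input_width2-1 downto 0) := (others => '0');
  signal prod_r : unsigned(input_width1+input_width2-1 downto 0) := (others => '0');
  signal acc    : unsigned(output_width-1 downto 0) := (others => '0');
  signal nd_1, nd_2 : std_logic := '0';  -- nd delayed to match the data
  signal fd_1, fd_2 : std_logic := '0';  -- fd delayed to match the data
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        a_r    <= (others => '0');
        b_r    <= (others => '0');
        prod_r <= (others => '0');
        acc    <= (others => '0');
        nd_1 <= '0'; nd_2 <= '0';
        fd_1 <= '0'; fd_2 <= '0';
      else
        -- Stage 1: register the inputs
        a_r  <= unsigned(a);
        b_r  <= unsigned(b);
        nd_1 <= nd;
        fd_1 <= fd;
        -- Stage 2: registered product (multiplier gets a full period)
        prod_r <= a_r * b_r;
        nd_2   <= nd_1;
        fd_2   <= fd_1;
        -- Stage 3: accumulate, or restart the sum when fd was set
        if fd_2 = '1' then
          if nd_2 = '1' then
            acc <= resize(prod_r, output_width);
          else
            acc <= (others => '0');
          end if;
        elsif nd_2 = '1' then
          acc <= acc + resize(prod_r, output_width);
        end if;
      end if;
    end if;
  end process;

  q <= std_logic_vector(acc);
end architecture rtl;
```

With this arrangement q changes three rising edges after the corresponding a and b are presented. Whether it meets 70 MHz on a given part still has to be confirmed in the timing report.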
 
hi.. I am using a Spartan-3 200k chip. I know that this chip has 12
dedicated multipliers. If I use '*' to multiply, will these
multipliers be used? If it uses these multipliers, then why has the
slice occupancy increased?

Also, when I use the Xilinx multiplier core, it should use only these built-in
multipliers for the implementation, and hence occupy few or no
slices. But the number of slices increases. Why?

Duane Clark wrote:
Again, you need to look at the timing report, see what paths are
breaking, and then determine what to do to fix them.

I have seen the timing report. The problem is only in the multiply. But
how do I fix it? Please suggest.



 
On 6 Sep 2006 00:13:50 -0700, sksaras@hotmail.com wrote:

+++hi.. I am using a Spartan-3 200k chip. I know that this chip has 12
+++dedicated multipliers. If I use '*' to multiply, will these
+++multipliers be used? If it uses these multipliers, then why has the
+++slice occupancy increased?
************

Download XAPP467 from the Xilinx web page; it will tell you what you
need to know about using the dedicated multipliers in the Spartan-3
series of devices. You can set the constraints so that XST
will infer a dedicated multiplier. Thus A * B will use a
dedicated multiplier and not one built from LUTs. There is also example
code that you can look at and adapt to your needs.

You should make use of these application notes, as they can be helpful
in understanding how the logic works internally.

james
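
As a rough illustration of the inference james describes (not the code from XAPP467): a registered A * B written with numeric_std, plus the XST mult_style attribute asking for a block multiplier. The entity name mult_reg, the fixed 16-bit widths, and unsigned operands are assumptions, and whether the attribute and the value "block" are honoured depends on the XST version and target device, so check the synthesis report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: 16x16 unsigned multiply with registered inputs
-- and a registered product.
entity mult_reg is
  port (
    clk : in  std_logic;
    a   : in  std_logic_vector(15 downto 0);
    b   : in  std_logic_vector(15 downto 0);
    p   : out std_logic_vector(31 downto 0)
  );
end entity mult_reg;

architecture rtl of mult_reg is
  signal a_r, b_r : unsigned(15 downto 0) := (others => '0');
  signal prod     : unsigned(31 downto 0) := (others => '0');

  -- XST-specific synthesis constraint (assumed available in the tool
  -- version in use); other tools ignore or reject it.
  attribute mult_style : string;
  attribute mult_style of prod : signal is "block";
begin
  process (clk)
  begin
    if rising_edge(clk) then
      a_r  <= unsigned(a);   -- input registers
      b_r  <= unsigned(b);
      prod <= a_r * b_r;     -- registered product
    end if;
  end process;

  p <= std_logic_vector(prod);
end architecture rtl;
```

The 16-bit unsigned operands fit within one 18x18 two's-complement multiplier, so if the inference works the synthesis report should list a block multiplier rather than LUT-based logic for prod.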
 
