Problem in optimization of VHDL code

ashu

Guest
Hi,

My code below uses too many logic cells (600), and the maximum
operating frequency is only 7 MHz, which I think is very low.
Can you please suggest a remedy?


library ieee ;
use ieee.std_logic_1164.all ;
use ieee.std_logic_arith.all ;

entity test is
    port ( clk, sta_in : in  std_logic ;
           sta_out     : out std_logic ;
           sel         : in  bit_vector ( 2 downto 0 ) ;
           data_in     : in  integer range -2047 to 2047 ;
           data_out    : out integer range -127 to 127 ) ;
end test ;

architecture a of test is
begin

    process ( clk, sta_in )
        variable mul   : integer range -65535 to 65535 ;
        variable mul1  : integer range -127 to 127 ;
        variable s     : std_logic_vector ( 0 to 16 ) ;
        variable b, b1 : bit_vector ( 0 to 16 ) ;
        variable b2    : bit_vector ( 0 to 7 ) ;
        variable s2    : std_logic_vector ( 0 to 7 ) ;
    begin
        if ( clk'event and clk = '1' ) then
            if ( sta_in = '1' ) then

                -- scale the input by 32, then by the factor chosen with sel
                mul := 32 * data_in ;

                case sel is
                    when "000" =>              -- x 1
                        mul := mul ;
                    when "001" =>              -- x 4/5
                        mul := mul / 5 ;
                        mul := mul * 4 ;
                    when "010" =>              -- x 3/5
                        mul := mul / 5 ;
                        mul := mul * 3 ;
                    when "011" =>              -- x 1/2
                        mul := mul / 2 ;
                    when "100" =>              -- x 2/5
                        mul := mul / 5 ;
                        mul := mul * 2 ;
                    when "101" =>              -- x 1/5
                        mul := mul / 5 ;
                    when "110" =>              -- x 1/10
                        mul := mul / 10 ;
                    when "111" =>              -- x 1/20
                        mul := mul / 20 ;
                end case ;

                -- shift out the 9 low bits
                s := conv_std_logic_vector( mul ,17 ) ;
                b := to_bitvector(s) ;
                b1 := b srl 9 ;
                s := to_stdlogicvector( b1 ) ;
                mul1 := conv_integer ( signed(s) ) ;

                -- round up when the highest discarded bit is set
                if ( b (8)= '1' ) then
                    mul1 := mul1 + 1 ;
                else
                    mul1 := mul1 ;
                end if ;

                data_out <= mul1 ;   -- was dat_out, which is not a declared port
                sta_out <= sta_in ;

            else
                sta_out <= '0' ;
            end if ;
        end if ;
    end process ;
end a ;



The code works logically and produces the required outputs, but the
timing analyzer shows a maximum delay of 130 ns, which limits the
maximum frequency to 7 MHz.
Should I try some other conversion functions? Please let me know.


ashwani anand
 
It probably has to do with your dividers. I see at least the following:

mul := mul / 5 ;
mul := mul / 10 ;
mul := mul / 20 ;

These are 'expensive' to implement in terms of logic resources and
performance. It can obviously be done, but that doesn't mean you don't pay
a price. Consider looking into the lpm_divide component and see if it will
work for your application. It implements a divider; the tradeoff is that it
takes several clock cycles for the output to become valid, but what you get
is something that will work at a much higher clock frequency.
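
As a rough sketch of what such an instantiation could look like (assuming an Altera/Quartus-style lpm library with the lpm_components package; the entity name div5_lpm_sketch, the 17-bit numerator width and the pipeline depth of 4 are illustrative choices, not values from the post above, and the generic and port names should be checked against the LPM documentation for your tool version):

```vhdl
library ieee;
use ieee.std_logic_1164.all;

library lpm;                      -- assumed: vendor-supplied LPM library
use lpm.lpm_components.all;

-- Hypothetical wrapper: divides a 17-bit signed value by the constant 5.
entity div5_lpm_sketch is
    port (
        clk      : in  std_logic;
        numer    : in  std_logic_vector(16 downto 0);
        quotient : out std_logic_vector(16 downto 0)
    );
end entity div5_lpm_sketch;

architecture rtl of div5_lpm_sketch is
    -- +5 as a 4-bit signed constant
    constant DENOM : std_logic_vector(3 downto 0) := "0101";
begin
    u_div : lpm_divide
        generic map (
            lpm_widthn          => 17,        -- numerator width
            lpm_widthd          => 4,         -- denominator width
            lpm_nrepresentation => "SIGNED",
            lpm_drepresentation => "SIGNED",
            lpm_pipeline        => 4          -- assumed latency in clock cycles
        )
        port map (
            clock    => clk,
            numer    => numer,
            denom    => DENOM,
            quotient => quotient,
            remain   => open
        );
end architecture rtl;
```

With a pipelined divider the result arrives several clock cycles after the input, so a flag such as sta_out would have to be delayed by the same number of cycles. The rounding of negative quotients may also differ from the VHDL integer "/" operator, so that is worth checking in simulation.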

KJ
 
In addition to KJ's suggestions...

Examine the tradeoffs between various architectures and verify what the
tools are doing. For example, with the dividers you can use one divider
with a selectable divisor or multiple dividers with fixed divisors; try
both to see which one works better.

You can replace the division with a multiplication and a shift. For
example, to divide by 5, multiply by 2^N/5 (rounded to an integer), then
shift right by N (larger values of N will give more accurate results).
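
A minimal sketch of this idea for the divide by 5, assuming ieee.numeric_std and N = 18. The entity name div5_sketch and the port names are made up for illustration. The constant 52429 is 2^18/5 rounded up; since 5 * 52429 = 2^18 + 1, the result matches truncating division for the 16-bit magnitudes that occur in this design. The sign is handled separately so that negative inputs round toward zero, like the integer "/" operator.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: q = x / 5 without a divider.
entity div5_sketch is
    port (
        clk : in  std_logic;
        x   : in  signed(16 downto 0);   -- same width as mul in the original
        q   : out signed(16 downto 0)    -- x / 5, truncated toward zero
    );
end entity div5_sketch;

architecture rtl of div5_sketch is
    constant N     : natural := 18;
    -- 2**18 / 5 = 52428.8, rounded up to 52429
    constant RECIP : unsigned(15 downto 0) := to_unsigned(52429, 16);
begin
    process (clk)
        variable mag  : unsigned(16 downto 0);   -- |x|
        variable prod : unsigned(32 downto 0);   -- |x| * RECIP (17 + 16 bits)
        variable quo  : signed(16 downto 0);     -- |x| / 5
    begin
        if rising_edge(clk) then
            mag  := unsigned(abs(x));
            prod := mag * RECIP;
            -- dropping the N low bits is the "shift right by N"
            quo  := signed(resize(prod(32 downto N), 17));
            if x < 0 then
                q <= -quo;
            else
                q <= quo;
            end if;
        end if;
    end process;
end architecture rtl;
```

The same constant can serve the divide by 10 and divide by 20 cases by dropping 19 or 20 low bits instead of 18.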

Most of the multiplications are by powers of 2; try shifting instead.

The multiplication by 3 could be replaced by two additions, which could
either be two adders or one pipelined adder.
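
The last two suggestions could be sketched as follows, assuming ieee.numeric_std. The entity and port names are hypothetical. Note that 3 * x is formed here as 2 * x + x, that is, one shift plus one addition, which is a variation on the two additions mentioned above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: constant scaling with shifts and adds only.
entity shift_add_sketch is
    port (
        x       : in  signed(11 downto 0);   -- wide enough for -2047 .. 2047
        times32 : out signed(16 downto 0);   -- 32 * x
        times3  : out signed(13 downto 0);   -- 3 * x
        half    : out signed(11 downto 0)    -- x / 2, rounded toward minus infinity
    );
end entity shift_add_sketch;

architecture rtl of shift_add_sketch is
begin
    -- widen first so the shifted result cannot overflow
    times32 <= shift_left(resize(x, 17), 5);

    -- 3 * x = 2 * x + x
    times3  <= shift_left(resize(x, 14), 1) + resize(x, 14);

    -- arithmetic shift right; for negative odd values this differs by one
    -- from the integer "/" operator, which truncates toward zero
    half    <= shift_right(x, 1);
end architecture rtl;
```

Shifts by a constant amount are only wiring, so they cost no logic cells.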

Good luck!
 
ashu wrote:

My code below uses too many logic cells (600), and the maximum
operating frequency is only 7 MHz, which I think is very low.
Can you please suggest a remedy?

Put your design in an RTL viewer and you
will see the problem: five large blocks
of combinational logic between the
input pins and the output register.

I will assume your device has no DSP blocks.
To increase Fmax you can infer registers
after each block of logic. For example,
using the variable s before you assign it will give you
one pipe register:

    end case ;
    -- s := conv_std_logic_vector( mul ,17 ) ;
    b := to_bitvector(s) ;
    s := conv_std_logic_vector( mul ,17 ) ;

But don't reuse the identifier s, declare s1.

    b1 := b srl 9 ;
    s := to_stdlogicvector( b1 ) ;
    ^------------ don't reuse s, declare s1
    mul1 := conv_integer ( signed(s) ) ;
                                  ^----- s1

Inferring registers costs you nothing here
as they are presently bypassed in your
combinational blocks.
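
A stripped-down sketch of the same trick, assuming ieee.numeric_std. The entity pipe_sketch, its ports and the two toy stages are invented for illustration and are not the original design. Because the variable prod_r is read before it is assigned within the clocked process, the read sees the value from the previous clock edge, and synthesis should infer a register between the two stages.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: two logic stages split by an inferred register.
entity pipe_sketch is
    port (
        clk  : in  std_logic;
        a, b : in  signed(7 downto 0);
        y    : out signed(16 downto 0)
    );
end entity pipe_sketch;

architecture rtl of pipe_sketch is
begin
    process (clk)
        -- holds the stage 1 result across clock edges
        variable prod_r : signed(15 downto 0) := (others => '0');
    begin
        if rising_edge(clk) then
            -- Stage 2: uses prod_r BEFORE it is updated below,
            -- so it works on the previous cycle's product.
            y <= resize(prod_r, 17) + 1;

            -- Stage 1: compute the new product for the next cycle.
            prod_r := a * b;
        end if;
    end process;
end architecture rtl;
```

The output y then lags the inputs by two clock edges instead of one, so any accompanying valid flag needs the same extra delay.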

To make this design easier to understand,
consider changing to signed and unsigned
vectors of generic widths and ieee.numeric_std
functions.

-- Mike Treseler
 
Hi Ashu,

Instead of performing one large multiplication, it's better to implement
the multiplication and division blocks as components.
Build multiplication and division blocks for small data widths (say
4 * 4 bits),
then instantiate those blocks in your code and use them as required.
This will reduce your logic cell count and will be easier to synthesize.

Regards
GJ
 

Welcome to EDABoard.com
