How can I infer resource re-use in my VHDL code?

Dear all,

I am facing a problem with resource (adder or multiplier) reuse in VHDL.

The target is to compute ZZ=XX*A' (8x8 matrix multiplication). I do the
matrix multiplication row by row, using "count" (a counter that increments
by one each cycle) as the row index, from 0 to 7. t1 through t8 are
temporary variables.

From the code it looks as if this needs a lot of adders and multipliers.
How can I infer resource reuse in my VHDL code? Which VHDL style should I
use to get the minimum number of adders and multipliers?

Is there anything related to this in Synopsys DC?

Thank you very much,

-Walala

------------------------------------------------------------


-- A'=
-- 91 91 91 91 91 91 91 91
-- 126 106 71 25 -25 -71 -106 -126
-- 118 49 -49 -118 -118 -49 49 118
-- 106 -25 -126 -71 71 126 25 -106
-- 91 -91 -91 91 91 -91 -91 91
-- 71 -126 25 106 -106 -25 126 -71
-- 49 -118 118 -49 -49 118 -118 49
-- 25 -71 106 -126 126 -106 71 -25

if count<8 then

-- ZZ(k)(0) = 91*XX(k)(0)+126*XX(k)(1)+118*XX(k)(2)+106*XX(k)(3)
--           +91*XX(k)(4)+71*XX(k)(5)+49*XX(k)(6)+25*XX(k)(7)
-- ZZ(k)(7) = 91*XX(k)(0)-126*XX(k)(1)+118*XX(k)(2)-106*XX(k)(3)
--           +91*XX(k)(4)-71*XX(k)(5)+49*XX(k)(6)-25*XX(k)(7)


t1:=91*XX(count)(0)+118*XX(count)(2)+91*XX(count)(4)+49*XX(count)(6);

t2:=126*XX(count)(1)+106*XX(count)(3)+71*XX(count)(5)+25*XX(count)(7);
ZZ(count)(0)<=t1+t2;
ZZ(count)(7)<=t1-t2;



-- ZZ(k)(1) = 91*XX(k)(0)+106*XX(k)(1)+49*XX(k)(2)-25*XX(k)(3)
--           -91*XX(k)(4)-126*XX(k)(5)-118*XX(k)(6)-71*XX(k)(7)
-- ZZ(k)(6) = 91*XX(k)(0)-106*XX(k)(1)+49*XX(k)(2)+25*XX(k)(3)
--           -91*XX(k)(4)+126*XX(k)(5)-118*XX(k)(6)+71*XX(k)(7)

t3:=91*XX(count)(0)+49*XX(count)(2)-91*XX(count)(4)-118*XX(count)(6);

t4:=106*XX(count)(1)-25*XX(count)(3)-126*XX(count)(5)-71*XX(count)(7);
ZZ(count)(1)<=t3+t4;
ZZ(count)(6)<=t3-t4;



-- ZZ(k)(2) = 91*XX(k)(0)+71*XX(k)(1)-49*XX(k)(2)-126*XX(k)(3)
--           -91*XX(k)(4)+25*XX(k)(5)+118*XX(k)(6)+106*XX(k)(7)
-- ZZ(k)(5) = 91*XX(k)(0)-71*XX(k)(1)-49*XX(k)(2)+126*XX(k)(3)
--           -91*XX(k)(4)-25*XX(k)(5)+118*XX(k)(6)-106*XX(k)(7)

t5:=91*XX(count)(0)-49*XX(count)(2)-91*XX(count)(4)+118*XX(count)(6);

t6:=71*XX(count)(1)-126*XX(count)(3)+25*XX(count)(5)+106*XX(count)(7);
ZZ(count)(2)<=t5+t6;
ZZ(count)(5)<=t5-t6;



-- ZZ(k)(3) = 91*XX(k)(0)+25*XX(k)(1)-118*XX(k)(2)-71*XX(k)(3)
--           +91*XX(k)(4)+106*XX(k)(5)-49*XX(k)(6)-126*XX(k)(7)
-- ZZ(k)(4) = 91*XX(k)(0)-25*XX(k)(1)-118*XX(k)(2)+71*XX(k)(3)
--           +91*XX(k)(4)-106*XX(k)(5)-49*XX(k)(6)+126*XX(k)(7)

t7:=91*XX(count)(0)-118*XX(count)(2)+91*XX(count)(4)-49*XX(count)(6);

t8:=25*XX(count)(1)-71*XX(count)(3)+106*XX(count)(5)-126*XX(count)(7);
ZZ(count)(3)<=t7+t8;
ZZ(count)(4)<=t7-t8;


end if;
 
"walala" <mizhael@yahoo.com> wrote in message
news:bjrm8n$2o5$1@mozo.cc.purdue.edu...
> I am facing a problem about resource(adder or multiplier) reuse in VHDL...
> The target is to compute ZZ=XX*A' (8x8 matrix multiplication).
> It seems from the code that it needs a lot of adders/multipliers, ... how
> can I infer the resource re-use in my VHDL code?

By performing the multiply/adds sequentially. It is not very difficult
to do that under control of a state machine. An architecture with exactly
one multiplier is easy to achieve; similarly, an architecture with exactly
as many multipliers as there are columns in the multiplier matrix is also
quite straightforward. (Indeed, I think that's what you were trying
to do.) Mapping the operation on to different numbers of multipliers
is more tricky.
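
The one-multiplier arrangement described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the thread: the names dct_types and row_mac1, the start/done handshake, the 12-bit input width and the 24-bit result width are all made up for the example. A two-level counter (i for the term, j for the output column) acts as the control state machine, and the process contains a single "*", so one row of ZZ takes 64 clocks. It does not exploit the t1+t2 / t1-t2 symmetry of the original code.

```vhdl
library ieee;
use ieee.numeric_std.all;

package dct_types is
  -- Assumed widths: 12-bit signed inputs, 24-bit signed results.
  type row_in_t   is array (0 to 7) of signed(11 downto 0);
  type row_out_t  is array (0 to 7) of signed(23 downto 0);
  type coef_row_t is array (0 to 7) of integer range -128 to 127;
  type coef_mat_t is array (0 to 7) of coef_row_t;
  -- A' from the original post: ZZ(k)(j) = sum over i of XX(k)(i) * A_T(i)(j)
  constant A_T : coef_mat_t := (
    ( 91,   91,   91,   91,   91,   91,   91,   91),
    (126,  106,   71,   25,  -25,  -71, -106, -126),
    (118,   49,  -49, -118, -118,  -49,   49,  118),
    (106,  -25, -126,  -71,   71,  126,   25, -106),
    ( 91,  -91,  -91,   91,   91,  -91,  -91,   91),
    ( 71, -126,   25,  106, -106,  -25,  126,  -71),
    ( 49, -118,  118,  -49,  -49,  118, -118,   49),
    ( 25,  -71,  106, -126,  126, -106,   71,  -25));
end package;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.dct_types.all;

entity row_mac1 is
  port (
    clk   : in  std_logic;
    start : in  std_logic;   -- pulse to begin one row
    x     : in  row_in_t;    -- one row of XX, held stable while busy
    z     : out row_out_t;   -- the matching row of ZZ
    done  : out std_logic);  -- one-clock pulse when z is complete
end entity;

architecture rtl of row_mac1 is
  signal i, j : integer range 0 to 7 := 0;  -- i: term index, j: output column
  signal busy : std_logic := '0';
  signal acc  : signed(23 downto 0) := (others => '0');
begin
  process (clk)
    variable prod : signed(19 downto 0);    -- 12 + 8 bits
    variable sum  : signed(23 downto 0);
  begin
    if rising_edge(clk) then
      done <= '0';
      if busy = '0' then
        if start = '1' then
          busy <= '1';
          i    <= 0;
          j    <= 0;
          acc  <= (others => '0');
        end if;
      else
        -- The only multiplier in the design; i and j select its operands.
        prod := x(i) * to_signed(A_T(i)(j), 8);
        sum  := acc + resize(prod, 24);
        if i = 7 then
          z(j) <= sum;                      -- column j finished
          acc  <= (others => '0');
          i    <= 0;
          if j = 7 then
            busy <= '0';
            done <= '1';
          else
            j <= j + 1;
          end if;
        else
          acc <= sum;
          i   <= i + 1;
        end if;
      end if;
    end if;
  end process;
end architecture;
```

The trade is area for time: one multiplier and one adder, at 64 clocks per row plus the start cycle.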

> Which VHDL style can I use to get minimum adder/multiplier?

This has very little to do with VHDL style. It's a question of
architecture of an arithmetic operation; the problem would be exactly
the same if you were designing in schematic, Verilog or anything else
at the register-transfer level. Consider a dedicated control state
machine, or a microcoded architecture, or a general-purpose CPU,
or....

For truly minimal resources you should do everything bit-serial :)

> Is there anything relate to Synopsys DC?

Do you have access to Synopsys Behavioral Compiler, or the "CoCentric"
products? Or A|RT Designer from Forte?
--

Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * Perl * Tcl/Tk * Verification * Project Services

Doulos Ltd. Church Hatch, 22 Market Place, Ringwood, Hampshire, BH24 1AW, UK
Tel: +44 (0)1425 471223 mail: jonathan.bromley@doulos.com
Fax: +44 (0)1425 471573 Web: http://www.doulos.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.
 
The key to designing this efficiently is pipelining. The latency (the
delay between the first input and the first output) may be large, but once
the pipeline is full you get a result out on every clock. Naturally, this
answer is very general, as I don't have any detailed information on your
case...

regards,
juza
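
As a rough illustration of the pipelining point, here is a minimal two-stage multiply-accumulate sketch. The entity name mac_pipe, the clear/valid controls and the bit widths are assumptions made for this example and do not come from the thread. Stage 1 registers the product and stage 2 adds it to the running sum, so each term reaches the accumulator two clock edges after it is presented, yet a new term can be accepted on every clock.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mac_pipe is
  port (
    clk   : in  std_logic;
    clear : in  std_logic;               -- '1' together with the first term of a new sum
    valid : in  std_logic;               -- '1' when x and c carry a term
    x     : in  signed(11 downto 0);     -- data operand (assumed 12 bits)
    c     : in  signed(7 downto 0);      -- coefficient (assumed 8 bits)
    acc   : out signed(23 downto 0));    -- running sum
end entity;

architecture rtl of mac_pipe is
  signal prod_r  : signed(19 downto 0) := (others => '0');
  signal valid_r : std_logic := '0';
  signal clear_r : std_logic := '0';
  signal acc_r   : signed(23 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Stage 1: multiply, and carry the control bits along with the product.
      prod_r  <= x * c;
      valid_r <= valid;
      clear_r <= clear;
      -- Stage 2: accumulate the product registered on the previous clock.
      if valid_r = '1' then
        if clear_r = '1' then
          acc_r <= resize(prod_r, 24);          -- start a new sum
        else
          acc_r <= acc_r + resize(prod_r, 24);  -- continue the current sum
        end if;
      end if;
    end if;
  end process;

  acc <= acc_r;
end architecture;
```

The register between the multiplier and the adder shortens the critical path at the cost of one extra clock of latency; throughput stays at one term per clock.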

 
"Jonathan Bromley" <jonathan.bromley@doulos.com> wrote
in message news:bjrued$jqt$1$8302bc10@news.demon.co.uk...

> Do you have access to Synopsys Behavioral Compiler, or the "CoCentric"
> products? Or A|RT Designer from Forte?

Whoops, that should be Adelante. Now part of ARM.
 
Hi, Jonathan,

Thank you for your answer.

The problem is that in my original code I am afraid I do 8
multiplications for each value of "count" (the row index)...

Will Synopsys DC know that these multipliers can be reused, since they
belong to different clock cycles? There must be some mechanism to tell it
that these multipliers can be reused here...

Otherwise, will it generate 8 multipliers per count x 8 counts (there are 8
rows) = 64 multipliers?
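
On the multiplier count: in RTL synthesis the operators written in a clocked process are generally built once and used again on every clock, so indexing with "count" selects which row feeds the hardware rather than creating a copy per row. What sets the count is the number of "*" operators after loops are unrolled (the code as posted writes 32 of them, with some terms such as 91*XX(count)(0) repeated). One way to fix that number by construction is sketched below, with one multiplier per output column. The entity name row_mac8, the streamed x/i/en interface, the flattened z port and the bit widths are assumptions for illustration only.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity row_mac8 is
  port (
    clk  : in  std_logic;
    en   : in  std_logic;                  -- '1' while x carries XX(row)(i)
    i    : in  integer range 0 to 7;       -- which element of the row x is
    x    : in  signed(11 downto 0);        -- assumed 12-bit input
    z    : out signed(8*24 - 1 downto 0);  -- ZZ(row)(0..7), column 0 in the low bits
    done : out std_logic);                 -- '1' for one clock when z is complete
end entity;

architecture rtl of row_mac8 is
  type coef_row_t is array (0 to 7) of integer range -128 to 127;
  type coef_mat_t is array (0 to 7) of coef_row_t;
  -- A' from the original post: ZZ(k)(j) = sum over i of XX(k)(i) * A_T(i)(j)
  constant A_T : coef_mat_t := (
    ( 91,   91,   91,   91,   91,   91,   91,   91),
    (126,  106,   71,   25,  -25,  -71, -106, -126),
    (118,   49,  -49, -118, -118,  -49,   49,  118),
    (106,  -25, -126,  -71,   71,  126,   25, -106),
    ( 91,  -91,  -91,   91,   91,  -91,  -91,   91),
    ( 71, -126,   25,  106, -106,  -25,  126,  -71),
    ( 49, -118,  118,  -49,  -49,  118, -118,   49),
    ( 25,  -71,  106, -126,  126, -106,   71,  -25));
  type acc_t is array (0 to 7) of signed(23 downto 0);
  signal acc : acc_t := (others => (others => '0'));
begin
  process (clk)
    variable prod : signed(19 downto 0);
  begin
    if rising_edge(clk) then
      done <= '0';
      if en = '1' then
        -- The loop unrolls into eight multiply-accumulate units, one per column.
        for j in 0 to 7 loop
          prod := x * to_signed(A_T(i)(j), 8);
          if i = 0 then
            acc(j) <= resize(prod, 24);           -- first term starts a new sum
          else
            acc(j) <= acc(j) + resize(prod, 24);
          end if;
        end loop;
        if i = 7 then
          done <= '1';                            -- last term of the row accepted
        end if;
      end if;
    end if;
  end process;

  -- Flatten the eight accumulators onto the output port.
  g_out : for j in 0 to 7 generate
    z(24*j + 23 downto 24*j) <= acc(j);
  end generate;
end architecture;
```

With this arrangement one row takes 8 clocks and the whole 8x8 product 64 clocks, using 8 multipliers regardless of how many rows are processed.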

Can you help me clarify this a little? I tried searching the web, but did
not find any example of what you mentioned, matrix multiplication using
one MAC. Nor do I have a good reference book at hand that covers this
topic...

By the way, I do have access to Synopsys Behavioral Compiler. Why do you
ask?

Thanks a lot,

-Walala