How to RLOC adders in VHDL/Synplify to avoid broken carry ch

Ken

Hello folks,

I am implementing a filter on a -5 Virtex-II part (3000) and the critical
path is one of the longest adder carry chains in the design (28 bits).

I have noticed that the minimum period of my design is being clobbered by
the carry chain of the longest adder changing CLB column half-way through
instead of carrying on up the carry chain in the column it started in...

So, I would like to be able to put in my VHDL code an RLOC constraint (or
something) that would inform Synplify Pro not to do any clever optimisation
that will prevent Xilinx ISE 5.2.03i from keeping the carry chain in one
column (an old Ray Andraka post has led me to believe this is what is
happening).

Googling about has yielded some discussions on it but I cannot see exactly
how I would specify this in my VHDL to ensure that the carry chain remains
in one column.

Can someone give me some pointers please (ideally a quick code snippet to
demonstrate :) )?

Thanks in advance for your time,

Ken



--
To reply by email, please remove the _MENOWANTSPAM from my email address.
 
Ken,
Have you read the constraints guide in the Xilinx software manuals? Look
for the RLOC section. You end up with stuff in your UCF like :-

INST "*un6_burp_cry_0" RLOC = "X6Y4";
INST "*un6_burp_cry_1" RLOC = "X6Y4";
INST "*un6_burp_cry_2" RLOC = "X6Y5";
INST "*un6_burp_cry_3" RLOC = "X6Y5";
INST "*un6_burp_cry_4" RLOC = "X6Y8";
etc...

I used the floorplanner to get the names of things I want to RLOC. For
your problem, you could place the carry chain with floorplanner and send the
output to a temporary UCF to give you a start on your RLOC stuff. Hope that
makes sense! Read about H_SETs, HU_SETs and U_SETs too.
good luck, Syms.
 
Hi

I prefer the approach used in one of Xilinx's TechXclusives, which embeds
RLOC attributes directly in the VHDL (Relationally Placed Macros). Here's an
example of an RPM that performs a registered a + b on the carry chain, using
the U_SET attribute.


-- begin VHDL code
library ieee;
use ieee.std_logic_1164.all;
library unisim;
use unisim.vcomponents.all;
use work.rlocs.all;

entity a_plus_b_reg is
  generic (width: integer := 32; setn: integer := 1);
  port (
    clock  : IN  std_logic;
    enable : IN  std_logic;
    a      : IN  std_logic_vector (width-1 downto 0);
    b      : IN  std_logic_vector (width-1 downto 0);
    q      : OUT std_logic_vector (width-1 downto 0)
  );
end a_plus_b_reg;

architecture rpm_arch of a_plus_b_reg is

  attribute INIT:  string;
  attribute BEL:   string;
  attribute RLOC:  string;
  attribute U_SET: string;

  signal prexor_int_q: std_logic_vector (width-1 downto 0);
  signal int_carry:    std_logic_vector (width-1 downto 0);
  signal y:            std_logic_vector (width-1 downto 0);

begin

  -- no carry in
  int_carry(0) <= '0';

  -- output registers, two per slice
  reg: for i in 0 to width-1 generate
    attribute U_SET of q_reg: label is "uset" & integer'image(setn);
    attribute RLOC of q_reg: label is "X0" & "Y" & integer'image(integer(i/2));
    attribute BEL of q_reg: label is "FF" & belname_xy(i);
  begin
    q_reg: FDE port map (
      D => y(i), CE => enable, C => clock,
      Q => q(i));
  end generate;

  -- bits 0 to width-2: LUT (a xor b), carry mux and sum xor
  gena: for i in 0 to width-2 generate
    attribute INIT of q_lut: label is "6";
    attribute U_SET of q_lut: label is "uset" & integer'image(setn);
    attribute U_SET of q_mxy: label is "uset" & integer'image(setn);
    attribute U_SET of q_xor: label is "uset" & integer'image(setn);
    attribute RLOC of q_lut: label is "X0" & "Y" & integer'image(integer(i/2));
    attribute RLOC of q_mxy: label is "X0" & "Y" & integer'image(integer(i/2));
    attribute RLOC of q_xor: label is "X0" & "Y" & integer'image(integer(i/2));
    attribute BEL of q_lut: label is belname_fg(i);
    attribute BEL of q_xor: label is "XOR" & belname_fg(i);
  begin
    q_lut: LUT2
      --synthesis off
      generic map (INIT => x"6")
      --synthesis on
      port map (
        I1 => b(i), I0 => a(i),
        O => prexor_int_q(i) );
    q_mxy: MUXCY port map (
      DI => a(i), CI => int_carry(i), S => prexor_int_q(i),
      O => int_carry(i+1) );
    q_xor: XORCY port map (
      LI => prexor_int_q(i), CI => int_carry(i),
      O => y(i) );
  end generate;

  -- top bit: no carry mux, because the carry out is not used
  genb: for i in width-1 to width-1 generate
    attribute INIT of q_lut: label is "6";
    attribute U_SET of q_lut: label is "uset" & integer'image(setn);
    attribute U_SET of q_xor: label is "uset" & integer'image(setn);
    attribute RLOC of q_lut: label is "X0" & "Y" & integer'image(integer(i/2));
    attribute RLOC of q_xor: label is "X0" & "Y" & integer'image(integer(i/2));
    attribute BEL of q_lut: label is belname_fg(i);
    attribute BEL of q_xor: label is "XOR" & belname_fg(i);
  begin
    q_lut: LUT2
      --synthesis off
      generic map (INIT => x"6")
      --synthesis on
      port map (
        I1 => b(i), I0 => a(i),
        O => prexor_int_q(i) );
    q_xor: XORCY port map (
      LI => prexor_int_q(i), CI => int_carry(i),
      O => y(i) );
  end generate;

end rpm_arch;
-- end VHDL code


The resulting RPM is a single column of w/2 slices, where w is the value
assigned to the width generic.
The setn generic lets you either give different instances of the entity
different U_SET names (if the instances have no relative positions), or give
them the same U_SET name and apply different RLOCs to each instance
(if the instances have relative positions).
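
As a minimal sketch of the first case (independent instances), two adders
could be instantiated as below. The top-level entity two_adders, its port
names and the 28-bit width are hypothetical; only a_plus_b_reg and its
generics come from the code above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical top level: two independent 28-bit adders, each its own RPM.
entity two_adders is
  port (
    clock  : in  std_logic;
    enable : in  std_logic;
    a0, b0 : in  std_logic_vector(27 downto 0);
    a1, b1 : in  std_logic_vector(27 downto 0);
    q0, q1 : out std_logic_vector(27 downto 0)
  );
end two_adders;

architecture structural of two_adders is
begin
  -- setn => 1 gives this instance the U_SET name "uset1"
  add0: entity work.a_plus_b_reg
    generic map (width => 28, setn => 1)
    port map (clock => clock, enable => enable, a => a0, b => b0, q => q0);

  -- setn => 2 gives "uset2", so the two columns are placed independently
  add1: entity work.a_plus_b_reg
    generic map (width => 28, setn => 2)
    port map (clock => clock, enable => enable, a => a1, b => b1, q => q1);
end structural;
```

If both instances used the same setn, their primitives would share one U_SET
and the identical RLOCs would collide, which is why the generic is there.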

The rlocs package contains a couple of simple functions that return the
strings "F" or "G", or "X" or "Y", to differentiate the LUTs/FFs inside a
single slice. Read the constraints guide about RLOC, RLOC_ORIGIN and the
different kinds of sets you can create. And the RPM TechXclusive,
of course.
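
The rlocs package itself is not shown in this post. A minimal sketch of what
it might contain follows. The function names belname_fg and belname_xy are
taken from the code above; the bodies are an assumption (even bits in the
F/X half of the slice, odd bits in the G/Y half), not the original package.

```vhdl
-- Hypothetical reconstruction of the rlocs package. Only the function
-- names come from the adder code; the bodies are an assumption.
package rlocs is
  function belname_fg (i : integer) return string;  -- "F" or "G"
  function belname_xy (i : integer) return string;  -- "X" or "Y"
end rlocs;

package body rlocs is

  -- Even bits use the F LUT, odd bits the G LUT of the same slice.
  function belname_fg (i : integer) return string is
  begin
    if i mod 2 = 0 then
      return "F";
    else
      return "G";
    end if;
  end belname_fg;

  -- Even bits use flip-flop FFX, odd bits FFY.
  function belname_xy (i : integer) return string is
  begin
    if i mod 2 = 0 then
      return "X";
    else
      return "Y";
    end if;
  end belname_xy;

end rlocs;
```

With these bodies, bit 0 of the adder gets the BEL values F, XORF and FFX and
bit 1 gets G, XORG and FFY, both at RLOC X0Y0, so the carry runs from the F
half to the G half of each slice.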

If you are happy for the placer to choose the absolute position of the RPM,
then that's all you need.
If you want total control, you can fix the RPM position by attaching an
RLOC_ORIGIN to the U_SET name in the UCF file.

I've used this entity successfully on the Virtex-II architecture with XST.
I don't know how to tell Synplify Pro to attach those attributes, but it
shouldn't be that difficult.

The drawback is that your design is no longer portable. You're stuck with
Xilinx parts that use the XY coordinate system (not all of them do). But you
can create different versions for different architectures, of course.

Best regards

Francisco Rodriguez
 
Syms,

Thanks for the reply,

I am familiar with the Xilinx constraints guide but I would like to put the
constraint in the VHDL rather than the ucf and I do not want to make it
Xilinx specific.

An adder is such a simple thing and the device has specific wires to
implement it quickly - surely there must be a way to inform the tools to use
the carry chain in one column only for max speed?

Cheers,

Ken
 
Ken wrote:
> I am familiar with the Xilinx constraints guide but I would like to put the
> constraint in the VHDL rather than the ucf and I do not want to make it
> Xilinx specific.

If you are using RLOCs, aren't you making it Xilinx specific?

Not only that, are RLOCs guaranteed to even be the same from one Xilinx
family to another Xilinx family?

> An adder is such a simple thing and the device has specific wires to
> implement it quickly - surely there must be a way to inform the tools to use
> the carry chain in one column only for max speed?

I'm sure you've already thought of this, but can you not break the adder
up?

Good luck,

Marc
 
> If you are using RLOCs, aren't you making it Xilinx specific?
>
> Not only that, are RLOCs guaranteed to even be the same from one Xilinx
> family to another Xilinx family?

I would rather not use RLOCs - I just want to inform the tools that using
the carry chain in one column is more important than any fancy optimisations
that save a few slices but cause the fast carry chain to be broken.

>> An adder is such a simple thing and the device has specific wires to
>> implement it quickly - surely there must be a way to inform the tools to
>> use the carry chain in one column only for max speed?
>
> I'm sure you've already thought of this, but can you not break the adder
> up?

Quite possibly but that would be a pain in the neck.

I just don't see why this should be difficult.

Cheers,

Ken
 
Francisco,

Many thanks for your detailed response and the code.

If I end up abandoning the attempt to get synthesis to accomplish this,
then I will certainly be referring to your implementation.

Cheers,

Ken



 
Ken wrote:
>> If you are using RLOCs, aren't you making it Xilinx specific?
>>
>> Not only that, are RLOCs guaranteed to even be the same from one Xilinx
>> family to another Xilinx family?
>
> I would rather not use RLOCs - I just want to inform the tools that using
> the carry chain in one column is more important than any fancy optimisations
> that save a few slices but cause the fast carry chain to be broken.

I agree - if the FPGA supports it, there is no reason the synthesis tool
shouldn't. I'd talk with the synthesis vendors about it if I were you.
Synplicity seems quite responsive.

Or perhaps you could get the synthesis tool to do what you are wanting
by placing a tiny period constraint on that portion of the design,
thereby forcing the tool to do everything in its power to make it
absolutely as fast as possible.

Marc
 
Marc,

> I agree - if the FPGA supports it, there is no reason the synthesis tool
> shouldn't. I'd talk with the synthesis vendors about it if I were you.
> Synplicity seems quite responsive.

I have emailed Synplicity support - they have been very good in the past and
I expect they will be on this too.

> Or perhaps you could get the synthesis tool to do what you are wanting
> by placing a tiny period constraint on that portion of the design,
> thereby forcing the tool to do everything in its power to make it
> absolutely as fast as possible.

Probably could - but the problem would then fall to another adder that is
1 microsecond behind the one just fixed.

In a design with many adders, I think global control is needed to force use
of the carry chains in one column.

Cheers,

Ken
 
Ken wrote:
> An adder is such a simple thing and the device has specific wires
> to implement it quickly - surely there must be a way to inform the
> tools to use the carry chain in one column only for max speed?

If you don't want to RLOC the primitives, perhaps the next best thing
to try is to put a syn_keep attribute on the input operand signals of
the adder; if it is in fact a logic optimization that is causing an
irregularity which breaks the carry chain placement, that will usually
put a stop to it.

If one of the operands is a constant, that can often cause this sort of
problem; you'll need to assign the constant to a signal having a syn_keep
rather than placing the syn_keep on the constant itself. (at least you used
to need to do that, I haven't used Synplify since last year)
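
A minimal sketch of the constant-operand case described above. The entity
name add_const, the signal names, the 28-bit width and the constant value are
all hypothetical; syn_keep is the Synplify attribute named in this post.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: registered 28-bit adder with one constant operand.
entity add_const is
  port (
    clk : in  std_logic;
    a   : in  unsigned(27 downto 0);
    sum : out unsigned(27 downto 0)
  );
end add_const;

architecture rtl of add_const is
  -- The constant is routed through a signal so that syn_keep has a net
  -- to act on, rather than being attached to the constant itself.
  signal kept_a : unsigned(27 downto 0);
  signal kept_k : unsigned(27 downto 0);

  attribute syn_keep : boolean;
  attribute syn_keep of kept_a : signal is true;
  attribute syn_keep of kept_k : signal is true;
begin
  kept_a <= a;
  kept_k <= to_unsigned(1000, 28);  -- arbitrary illustrative constant

  process (clk)
  begin
    if rising_edge(clk) then
      sum <= kept_a + kept_k;
    end if;
  end process;
end rtl;
```

Whether this is enough to keep the chain intact depends on the Synplify
version, so check the result in the technology view.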

If this is a counter, also note that Synplify has some hardcoded
internal thresholds below which it will implement random logic instead
of carry chain logic, which can cause similar problems for short counters.

Brian
 
Brian,

Thanks for the reply.

I have tried putting syn_keeps (and syn_preserves) on the adder in question
and then on all signals in the design - the Synplify logfile indicates no
replication/pruning has taken place but I still see Synplify reporting a
broken chain in the worst path report.

Bit annoying this.

Cheers,

Ken
 
Ken wrote:
> the Synplify logfile indicates no replication/pruning has taken place
> but I still see Synplify reporting a broken chain in the worst path report.

Can you post the offending fragment of code showing the adder and
input operands, and the broken chain report message that results?

some other suggestions:

- drop your synthesis target clock to 1 MHz and see if the
broken chain problem goes away

- put the offending adder in its own module with the
leave-my-module-hierarchy-alone attribute set
( you can then stick an area constraint on that module to force
it to stay in one column, but if the chain's already broken
that won't help too much )

- do the HDL-Analyst RTL/primitive views of the offending adder
show something odd splitting it apart near that bit?
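
The attribute for the second suggestion is not named here. In Synplify the
attribute that controls hierarchy flattening is syn_hier, so a minimal
sketch, assuming its "hard" value is what is meant and using a hypothetical
entity name adder28, might look like this:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical wrapper: the critical adder alone in its own entity.
entity adder28 is
  port (
    clk  : in  std_logic;
    a, b : in  unsigned(27 downto 0);
    sum  : out unsigned(27 downto 0)
  );
end adder28;

architecture rtl of adder28 is
  -- Assumption: syn_hier = "hard" is the attribute alluded to above. It asks
  -- Synplify not to optimise across this module boundary.
  attribute syn_hier : string;
  attribute syn_hier of rtl : architecture is "hard";
begin
  process (clk)
  begin
    if rising_edge(clk) then
      sum <= a + b;
    end if;
  end process;
end rtl;
```

Keeping the adder in its own preserved module also gives the area constraint
a stable instance name to refer to.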

Brian
 
Brian is correct: you sometimes need to put syn_keeps on the signals going
into the adder to keep Synplify from optimizing segments of the carry chain.
This typically happens when one or more of the adder input bits are
constants. The syn_keep has to be on the signal coming in, not on the adder:

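-- architecture declarative part
attribute syn_keep : boolean;
attribute syn_keep of kept_a : signal is true;
attribute syn_keep of kept_b : signal is true;

-- architecture body
kept_a <= a;
kept_b <= b;

sum_d <= kept_a + kept_b;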

If sum_d is not going immediately to a register, then you may also need to
put a syn_keep on sum_d.

In order to put RLOCs on the adder, you need to build the adder out of
instantiated primitives. In that case you may still need some syn_keeps on
either the LUT outputs or the carry signal to keep Synplify from messing with
it, depending on the version of Synplify. v7.3.3 requires a syn_keep on the
carry if you have a constant '1' going into the adder's carry in. An earlier
version, I think it was 5.3, needed a syn_keep on the LUT outputs, but 7.3.x
needs the syn_keep on the LUT outputs taken off or it will insert an extra
LUT between the LUT and the carry chain.



--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759
 
