Digesting runs of ones or zeros "well"

hmurray@suespammers.org (Hal Murray) wrote in message news:<vnv6iirmfn1jd2@corp.supernews.com>...
Do you realize how patronizing your response was?

Please, quickly give the appropriate INIT for a LUT4 where the desired
output is

&(in[3:1]^~in[2:0])

Since you can count to 4, this should be simple.
Can you guarantee that other engineers looking at your code later will
understand what you're trying to do?

Sorry. I wasn't trying to be an asshole.

I thought you were trying to partition logic into LUTs in some
sneaky way.

I assumed software is smart enough to compute an INIT string
from a logic equation. The old Xilinx tools could do that for
3000 series parts. Has that fallen through the cracks with the
newer software?

If I'm manually instantiating LUTs, the Synplify synthesizer has
nothing to base the INIT upon: no equations, no clue. If I manage to
convince the software to get into the LUTs that I need so I can go
into the EDIF netlist (or HDL Analyst) and find the INITs, I've
already achieved my goal of getting the design into the desired LUTs.
If I do get the INITs from those sources, I must be absolutely sure I
get the port order correct or the logic is blown (or I need to do the
ol' Carnot shuffle).
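
To make the port-order hazard concrete, here is a minimal Python sketch (not from the thread) that re-derives an INIT after the input pins are rewired. It assumes the usual LUT4 convention that INIT bit n is the output for address {I3,I2,I1,I0} == n; the helper names `lut4_init` and `permute_init` and the example function "a AND NOT b" are made up for illustration.

```python
def lut4_init(fn):
    """INIT for a LUT4, assuming bit n is the output when {I3,I2,I1,I0} == n."""
    init = 0
    for n in range(16):
        i0, i1, i2, i3 = n & 1, (n >> 1) & 1, (n >> 2) & 1, (n >> 3) & 1
        if fn(i0, i1, i2, i3):
            init |= 1 << n
    return init


def permute_init(init, perm):
    """Re-derive INIT after rewiring: new pin k carries what old pin perm[k] carried."""
    out = 0
    for n in range(16):
        # Rebuild the address the original LUT would have seen for the same signals.
        old = 0
        for k in range(4):
            if (n >> k) & 1:
                old |= 1 << perm[k]
        if (init >> old) & 1:
            out |= 1 << n
    return out


# Illustrative function: a AND NOT b, with a on I0 and b on I1.
straight = lut4_init(lambda i0, i1, i2, i3: i0 and not i1)
# Same signals with a and b swapped onto I1 and I0.
swapped = permute_init(straight, [1, 0, 2, 3])

print(f"{straight:04X} {swapped:04X}")   # 2222 4444
```

Swapping two pins turns 2222 into 4444, so an INIT copied from a netlist is only valid for the pin order it was extracted with.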

I tried to instantiate a LUT the other day and I couldn't figure out
the "right" way to do the Verilog for Synplify since there are INIT
parameters for simulation and xc_props="INIT=xxxx" for synthesis as
far as I can tell. I had something that looked un-LUT-like in the HDL
Analyst technology viewer so I didn't pursue that further.

Code with LUTs and INITs is sincerely less supportable than code with
something as annoying as an AND3 primitive instantiation. Interesting
thing with the AND3 - there was no primitive in the virtex2.v file
included in the Synplify flow but a quick black_box definition in my
source and the synthesizer knew it was a 3-input AND. It implemented
in the Xilinx device just fine.


Oh - Vinh, if you're reading... I used an inference of the form

bytesPlus1[8:1]==bytesPlus1[7:0]

and got a note in Synplify saying it "detected a comparator ==" and
produced your two-levels of logic with the 4-input AND. If you do
things "just" the right way.... Oy.
 
I tried to instantiate a LUT the other day and I couldn't figure out
the "right" way to do the Verilog for Synplify since there are INIT
parameters for simulation and xc_props="INIT=xxxx" for synthesis as
far as I can tell. I had something that looked un-LUT-like in the HDL
Analyst technology viewer so I didn't pursue that further.
This seems like a good addition to the tools discussion in other
threads.

There really should be a simple way to say something like:
Generate this signal in a LUT (or whatever) with these inputs.
Possibly constraining the location and/or locking some inputs to
particular ports.
The idea is that the system will figure out the details from the
info it already has.

--
The suespammers.org mail server is located in California. So are all my
other mailboxes. Please do not send unsolicited bulk e-mail or unsolicited
commercial e-mail to my suespammers.org address or any of my other addresses.
These are my opinions, not necessarily my employer's. I hate spam.
 
Hi,

The way I usually do it is to draw a Karnaugh map and include it in a
comment above the instantiated LUT.

Göran

John_H wrote:

hmurray@suespammers.org (Hal Murray) wrote in message news:<vnsdiro5e8e272@corp.supernews.com>...


I overuse the syn_keep attribute and I hate the idea of instantiating
LUTs. My Carnot skills aren't exactly used regularly.


Are Carnot skills needed? I can generally count to 4. A 4 input LUT
can implement any function of 4 inputs.



Do you realize how patronizing your response was?

Please, quickly give the appropriate INIT for a LUT4 where the desired
output is

&(in[3:1]^~in[2:0])

Since you can count to 4, this should be simple.
Can you guarantee that other engineers looking at your code later will
understand what you're trying to do?

With Regards,
John_H
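
For reference, the INIT being asked for can be found by brute force rather than by Karnaugh map. This is a small Python sketch, not something posted in the thread; it assumes in[0] is wired to I0 through in[3] to I3, and that INIT bit n is the output for input value n. The function name is hypothetical.

```python
def reduction_and_of_xnor(value):
    """Evaluate &(in[3:1] ^~ in[2:0]) for a 4-bit input value."""
    upper = (value >> 1) & 0b111       # in[3:1]
    lower = value & 0b111              # in[2:0]
    xnor = ~(upper ^ lower) & 0b111    # bitwise XNOR, 3 bits wide
    return int(xnor == 0b111)          # reduction AND


# Assumed pin mapping: in[0] -> I0 ... in[3] -> I3, so INIT bit n is the
# output for input value n.
init = sum(reduction_and_of_xnor(n) << n for n in range(16))
print(f"INIT = 16'h{init:04X}")        # INIT = 16'h8001
```

The output is 1 only when all four inputs are equal (0000 or 1111), which is why only bits 0 and 15 are set.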
 
Oh - Vinh, if you're reading... I used an inference of the form

bytesPlus1[8:1]==bytesPlus1[7:0]

and got a note in Synplify saying it "detected a comparator ==" and
produced your two-levels of logic with the 4-input AND. If you do
things "just" the right way.... Oy.
Great John :_) Glad you found a compact way of getting Synplify to do what
you want. It's a great synthesizer, but like all of them, it's got its own
quirky "dialect." Good ol' trial and error.

Heh one annoying situation is when the synthesizer "optimizes" 90% of your
design away, and you have to hunt down that little itty bitty bit of code
that caused it.
 
"John_H" <johnhandwork@mail.com> wrote in message
news:Xwoeb.26$XP3.4711@news-west.eli.net...
Greetings,

I need to detect runs. I want to look at 65 bits and show when there are 9
consecutive 1s or 0s from the byte boundaries resulting in 8 values per
clock. This should be comfortably done in two logic levels (I need clean
logic delays).

The idea is simple but the implementation is tough. I'm working with
Verilog in Synplify, targeting a Xilinx Spartan-3. I have to resort to
design violence to get the results that I believe are "best."

Any thoughts on how to do this "better?" (the following code likes fixed fonts)
I just started reading this thread.. Am I correct if you really want to
detect 9 EQUAL bits in a row from a stream?
Could you not do this just with a 4bits counter and a comparator/zero
detector?
 
hmurray@suespammers.org (Hal Murray) writes:

I overuse the syn_keep attribute and I hate the idea of instantiating
LUTs. My Carnot skills aren't exactly used regularly.

Are Carnot skills needed?
Not unless you're building a heat engine out of your FPGA...

Homann
--
Magnus Homann, M.Sc. CS & E
d0asta@dtek.chalmers.se
 
Two LUT's to look at two consecutive nibbles.
One LUT to AND the output of the above with the next most significant bit
(the ninth bit).
That's it. Two levels. 24 LUT's.
Is that what you wanted?

Almost. The LUTs can't look at full nibbles. Since I need to make
sure all bits are equal to each other, there's a "smear."
You can look at nibbles without the smear.
If you know that all bits in each of the nibbles are equal you can
select one bit for each nibble as a representative and check whether the
nibbles are equal.

for each byte:
eq3210 <= '1' when data(3 downto 0) = "0000" or data(3 downto 0) =
"1111" else '0';
eq7654 <= '1' when data(7 downto 4) = "0000" or data(7 downto 4) =
"1111" else '0';
eq840 <= '1' when data(8)&data(4)&data(0) = "000" or
data(8)&data(4)&data(0) = "111" else '0';
run_found <= eq3210 and eq7654 and eq840;

That's three lut's and a carry chain or four luts in two levels of
logic.
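
A quick way to convince yourself that the three terms are sufficient is to model them and compare against the direct definition for all 512 window values. The Python below is only a sketch of that check, not code from the thread; `run_found` and `detect_runs` are illustrative names, and the 65-bit word is treated as an integer with bit 0 at the first byte boundary.

```python
def run_found(window):
    """Three-term check on a 9-bit window (bit 0 sits on the byte boundary)."""
    low = window & 0xF
    high = (window >> 4) & 0xF
    eq3210 = low in (0x0, 0xF)
    eq7654 = high in (0x0, 0xF)
    b8, b4, b0 = (window >> 8) & 1, (window >> 4) & 1, window & 1
    eq840 = b8 == b4 == b0
    return eq3210 and eq7654 and eq840


def detect_runs(data65):
    """Eight results per 65-bit word: windows [8:0], [16:8], ... [64:56]."""
    return [run_found((data65 >> (8 * k)) & 0x1FF) for k in range(8)]


# Exhaustive check against the direct definition for every 9-bit window.
assert all(run_found(w) == (w in (0x000, 0x1FF)) for w in range(512))

# The two nibble terms alone are not enough: in 0x0F0 both nibbles are
# uniform, but the window is not a run, and eq840 rejects it.
assert not run_found(0x0F0)

print(detect_runs(0x1FF << 8))
```

In the example word only bits 16:8 are set, so window 1 is a run of ones, windows 3 to 7 are runs of zeros, and windows 0 and 2 each contain a single 1.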


Kolja Sulimma
 
"Morten Leikvoll" <m-leik@online.nospam> wrote in message news:<5z9gb.28389$os2.397003@news2.e.nsc.no>...
I just started reading this thread.. Am I correct if you really want to
detect 9 EQUAL bits in a row from a stream?
Could you not do this just with a 4bits counter and a comparator/zero
detector?
Correct, I need "equal" bits, either 9'h000 or 9'h1ff, starting from
0, 8, 16, ... 56.

The input is 65 bits per clock with a fast clock, output from BlockRAM
which was loaded at full width.

Counters require more than one clock.
 
Magnus Homann <d0asta@mis.dtek.chalmers.se> wrote in message news:<lthe2m7f3m.fsf@mis.dtek.chalmers.se>...
hmurray@suespammers.org (Hal Murray) writes:

I overuse the syn_keep attribute and I hate the idea of instantiating
LUTs. My Carnot skills aren't exactly used regularly.

Are Carnot skills needed?

Not unless you're building a heat engine out of your FPGA...

Homann
Just goes to show how little my Karnaugh skills are used.

Thanks for the laugh. :)
 
news@sulimma.de (Kolja Sulimma) wrote in message news:<b890a7a.0310060516.2056fc82@posting.google.com>...

<snip>

Two LUT's to look at two consecutive nibbles.
One LUT to AND the output of the above with the next most significant bit
(the ninth bit).
<snip>

Almost. The LUTs can't look at full nibbles. Since I need to make
sure all bits are equal to each other, there's a "smear."
I was trying to underscore that nibble checks with the 9th bit as the
qualifier were not sufficient. You expand upon this below by
qualifying with the [0] and [4] bits from the nibbles you looked at.

You can look at nibbles without the smear.
If you know that all bits in each of the nibbles are equal you can
select one bit for each nibble as a representative and check whether the
nibbles are equal.

for each byte:
eq3210 <= '1' when data(3 downto 0) = "0000" or data(3 downto 0) =
"1111" else '0';
eq7654 <= '1' when data(7 downto 4) = "0000" or data(7 downto 4) =
"1111" else '0';
eq840 <= '1' when data(8)&data(4)&data(0) = "000" or
data(8)&data(4)&data(0) = "111" else '0';
run_found <= eq3210 and eq7654 and eq840;

That's three lut's and a carry chain or four luts in two levels of
logic.


Kolja Sulimma
I appreciate the fresh perspective - I tried coding some things inline
similar to what you suggested, all with sub-optimal results. Using
syn_keeps on the three different variables and ANDing them together
would produce a valid result much like what has been achieved already.
It's too bad the nibble approach didn't convince the synthesizer to
do things any different than before.
 
The mapper seems to be pretty good at respecting LUTs and not changing them.

rickman wrote:

Actually, I don't think logic primitives will work since the back end
mapper can redo logic at will. The keep attribute is what is required
to define the LUTs and even that is not guaranteed since it only results
in a wire being kept; the LUT can still be split if other logic uses the
same inputs.
--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759
 
Ray Andraka <ray@andraka.com> writes:
<snip>
A third method is instantiating the LUT4, LUT3 or LUT2 component. In
this case, you don't have a separate component to define, but you do
have to supply the LUT contents in the form of a hex string in two
places: once for the INIT=attribute which gets passed onto the edif
netlist but not to simulation, and once to the INIT generic which is
used for simulation, but not passed to the netlist. As I understand
it, the latest version of synplify passes the INIT generic to the
netlist, but that is not portable. Great care has to be taken with
this method to make sure the simulation matches the synthesized
hardware. It is also extraordinarily error prone and difficult to
read and maintain. This can be worked around by writing a function to
parse a boolean expression and convert it to the appropriate INIT
strings, but it is not trivial to write.

snip
... The third is generally too hard to use and maintain without
a good INIT string generation function, however it does not require a
large library of 2,3 and 4 input functions. It is hardly usable
without the boolean to INIT function however.
Like this one? I remembered seeing this mentioned a few weeks ago - a
quick google and I found a link to it... from andraka.com of all
places :) [ Ray, you might want to update the link ]

http://www.rockylogic.com/freebies/freebies.html
Scroll down to "Locking Logic to a Single Xilinx Virtex LUT"

Cheers,
Martin
--
martin.j.thompson@trw.com
TRW Conekt, Solihull, UK
http://www.trw.com/conekt
 
Yes, like that one. I forgot I had a link to it. I went and updated all
the links on the links page this morning. There were a number of them that
had new homes, Thanks for reminding me and bringing it to my attention.

Martin Thompson wrote:

Like this one? I remembered seeing this mentioned a few weeks ago - a
quick google and I found a link to it... from andraka.com of all
places :) [ Ray, you might want to update the link ]

http://www.rockylogic.com/freebies/freebies.html
Scroll down to "Locking Logic to a Single Xilinx Virtex LUT"

Cheers,
Martin
--
martin.j.thompson@trw.com
TRW Conekt, Solihull, UK
http://www.trw.com/conekt
--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759
 
John_H wrote:
"rickman" <spamgoeshere4@yahoo.com> wrote in message
news:3F7B2060.A4FA04B@yahoo.com...
I can't say I really understand your problem statement. I also don't
see how your code is solving the problem. Can you give a better
explanation of the problem? What do you mean "from the byte
boundaries"? Are you counting only the bit sets of {0..8}, {8..16},
{16..24}...? If so, this seems like an easy problem to implement.

You have it right - 8:0, 16:8, 24:16.... In the 41 bits illustrated below I
want to note when the sequence across [16:8] (illustrated by the 9 1s below
the count) in result[1].

11010101000101010111101111111111110010101 pattern
09876543210987654321098765432109876543210 count
00000000000000000000000011111111100000000 (reference)
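
As a sanity check on the illustration, a behavioural reference model is easy to write. The Python below is a sketch (not the poster's code); `run_windows` is a made-up name, and the bit string is taken MSB first exactly as the `pattern` line is written.

```python
def run_windows(bits):
    """Indices k where bits [8k+8 : 8k] are all ones or all zeros.

    `bits` is written MSB first, like the `pattern` line above.
    """
    value = int(bits, 2)
    n_windows = (len(bits) - 1) // 8
    hits = []
    for k in range(n_windows):
        window = (value >> (8 * k)) & 0x1FF
        if window in (0x000, 0x1FF):
            hits.append(k)
    return hits


pattern = "11010101000101010111101111111111110010101"
print(run_windows(pattern))   # [1]
```

Only window 1, bits [16:8], qualifies, which matches the nine 1s on the reference line.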

The trouble is that while it seems easy to implement, getting the logic to
come out clean in the implementation - the "pushing the rope" problem -
doesn't come easily. If one does a loop looking for 8 adjacent equals or
the 8-wide AND of 8 XNORs, the realization takes up 3 levels of logic with
FDR primitives resulting in 2x-3x the resources and about twice the delay.

The code with the two nested for loops breaks up the 9-bit compare into two
4-bit and one 3-bit compare to verify all the 9 bits are equal to each other
and to break the implementation into 2 levels of logic rather than 3+ (the +
being from the flop's reset input taking some extra routing delay).

I'd love a cleaner approach that doesn't come off so complex and gets
realized in the resources it "should" use. It's tough to get it where I
want by pushing on the rope.
My experience has been that it does not much matter how you code
combinatorial logic like this. The tools run it through a grinder and
produce an optimal version (in its own mind). When I want to optimize
like this, I either use a "keep" attribute on the wire, or sometimes you
can instantiate primitives. For logic I don't think primitives work
since gates just get remapped.

But I still don't understand your code. Why does the outer loop range
over 64 values. I would code two nested loops where the outer loop
ranges over the 8 outputs and the inner loop ranges over the 9 inputs
for each output. Or just skip the inner loop and use two outputs from
two sets of four inputs feeding a 3 input function and use keeps on the
first two output arrays. Maybe that is what you are doing, but I can't
figure out the code easily.

I see you are incrementing the i variable by j and ranging j in the
second loop by some complex control expression. Can't you just
increment i by 8?

for( i=0; i<64; i=i+8 ) begin
  k = i / 8;  // output index 0..7 (i % 8 would always be 0 here)
  for( j=0; j<4; j=j+1 ) begin
    runBitsA_[k] = runBitsA_[k] & bytePlus1[i+j];
    runBitsB_[k] = runBitsB_[k] & bytePlus1[i+j+4];
  end
  runByte_[k] = runBitsA_[k] & runBitsB_[k] & bytePlus1[i+8];  // ninth bit of the window
end

Put the keep on runBitsA_ and runBitsB_ and you should get your two
level structure.



John_H wrote:

Greetings,

I need to detect runs. I want to look at 65 bits and show when there are 9
consecutive 1s or 0s from the byte boundaries resulting in 8 values per
clock. This should be comfortably done in two logic levels (I need clean
logic delays).

The idea is simple but the implementation is tough. I'm working with
Verilog in Synplify, targeting a Xilinx Spartan-3. I have to resort to
design violence to get the results that I believe are "best."

Any thoughts on how to do this "better?" (the following code likes fixed fonts)

- John_H
=====================================
module testRun ( input             clk
               , input      [64:0] bytePlus1
               , output reg [ 7:0] runByte /* synthesis xc_props = "INIT=R" */
               ); // INIT included to force register as FD primitive - bleah

  reg  [23:0] runBits;  // I wanted the syn_keep on this combinatorial "reg"
  wire [23:0] runBits_ /* synthesis syn_keep = 1 */ = runBits; // - bleah
  reg  [ 7:0] runByte_;
  integer     i,j,k;

  always @(*)
  begin
    runBits  = -24'h1;
    runByte_ = -8'h1;
    k = 0;                      // overlapping  aaa  aaaa
    for( i=0; i<64; i=i+j )     // consecutive    aaaa
    begin                       // bit regions  876543210
      for( j=0; (i%8+j<8) && (j<3); j=j+1 )
        runBits[k] = runBits[k] & (bytePlus1[i+j]==bytePlus1[i+j+1]);
      runByte_[i/8] = runByte_[i/8] & runBits_[k];
      k = k + 1;
    end
  end
  always @(posedge clk) runByte = runByte_;

endmodule
--

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design URL http://www.arius.com
4 King Ave 301-682-7772 Voice
Frederick, MD 21701-3110 301-682-7666 FAX
 
Hi,

What synthesis tool did you use?
When I instantiate primitives directly in my code, they tend to stay in
the netlist.
The synthesis tool usually leaves them alone.
If that isn't working, attach a U_SET attribute to them so the tools think
it's an RPM, which they normally leave alone.
That has been my approach for doing primitives.

Göran

John_H wrote:

Goran Bilski <goran@xilinx.com> wrote in message news:<blhm91$h0f1@cliff.xsj.xilinx.com>...


Hi,

Why not use the carry-chain?

You can do any kind of detection on that primitive and it will save you LUTs

Göran



I tried that approach earlier today but I wasn't getting the carry
chain I was trying to infer. The Virtex-IIs started getting poorer at
getting on/off carry chains timing-wise relative to the general logic
resources so I was trying to get general logic to work; I suspect
Spartan-3s are similar. If I go straight to register, I would need 4
LUTs to go into the register through the XORCY instead of the natural
XORCY so the logic savings isn't a given to achieve the speed but I
could keep it to 3 LUTs with a small routing hit.

I believe I'd need to implement all the carry chain primitives through
the generate block, including the MUXCYs and XORCY elements because
the synthesizer sees that "oh, it's a short chain" and converts my
simple arithmetic form to a cascade of LUTs rather than the carry
chain. I tried my tricks, I stopped pursuing.

Maybe I'll try to coax it again tomorrow. Thank goodness for that
generate!
 
Hi,

Yes, the crossover point between using the carry chain and LUTs is somewhere
between 2 and 3 levels.

But I have found that I can continue doing logic operation by continuing
the carry-chain.
It's easy to do "and","or" operation on the carry-chain.

So it depends on how signals are used afterwards.

But maybe for your problem, simple LUT implementation is a better solution.

Göran
John_H wrote:

Goran Bilski <goran@xilinx.com> wrote in message news:<blj78j$h0h1@cliff.xsj.xilinx.com>...


Hi,

What synthesis tool did you use?
When I instantiate primitives directly in my code, they tend to stay in
the netlist.



Synplify does a good job of leaving the instantiated primitives in the
code, sure. My first issue was that I believe carry chains are longer
than 2 levels of LUTs. The second issue is that I was trying to infer
- not instantiate - the adder chain by adding 1 to {1,1,1} when all
three LUTs worth of logic are valid, using the sign as my output. The
synthesizer turned the inference into LUTs which is probably more
effective in logic delay. I would need to generate 3 MUXCYs per chain
for 8 chains. If I want the quick-register destination, I need to
also instantiate the XOR in the sign bit. 4 primitives replicated 8
times for timing which *may* be worse. I chose not to pursue the
instantiations because of the expected lower performance.



The synthesis tool usually leaves them alone.
If that isn't working, attach a U_SET attribute to them so the tools think
it's an RPM, which they normally leave alone.
That has been my approach for doing primitives.

Göran

John_H wrote:



Goran Bilski <goran@xilinx.com> wrote in message news:<blhm91$h0f1@cliff.xsj.xilinx.com>...




Hi,

Why not use the carry-chain?

You can do any kind of detection on that primitive and it will save you LUTs

Göran




I tried that approach earlier today but I wasn't getting the carry
chain I was trying to infer. The Virtex-IIs started getting poorer at
getting on/off carry chains timing-wise relative to the general logic
resources so I was trying to get general logic to work; I suspect
Spartan-3s are similar. If I go straight to register, I would need 4
LUTs to go into the register through the XORCY instead of the natural
XORCY so the logic savings isn't a given to achieve the speed but I
could keep it to 3 LUTs with a small routing hit.

I believe I'd need to implement all the carry chain primitives through
the generate block, including the MUXCYs and XORCY elements because
the synthesizer sees that "oh, it's a short chain" and converts my
simple arithmetic form to a cascade of LUTs rather than the carry
chain. I tried my tricks, I stopped pursuing.

Maybe I'll try to coax it again tomorrow. Thank goodness for that
generate!




--
 
You can do it, and it is fairly simple. The shortcut method is to partition the
logic by using keep buffers (syn_keep attributes in synplify). The synthesizer
will preserve the signals with the keep buffer, which pushes those signals onto
LUT outputs. The lut contents are then specified with logic equations and the
luts are inferred. For example:

attribute syn_keep of d:signal is true;
attribute syn_keep of h: signal is true;

begin

d<= a and b and c;
h<= e and f and (not g);
out<= d or h;

forces the synthesis to put the equation for d in one lut, the equation for h in
one lut and out in a third lut. Actually what it does is force the outputs d,
and h to be lut outputs. If a,b and c come from flip flops or have syn_keeps on
them then you fully force the LUT. If you leave those off, then the synth is free
to move those. This gives you a method of controlling construction without
resorting to instantiation.

Synplicity also provides a mechanism for creating a component that you can put an
fmap on forcing it to be a LUT. In this case, the LUT has to be a separate
component. This one can be RLOC'd. To do this, you create a separate
component, in this case an AND3, and include an xc_map=lut attribute on the
component. The architecture of the component has the boolean equation for the
LUT in it.

--FMAP'd and3
library IEEE;
use IEEE.std_logic_1164.all;

entity fmap_and3 is
port ( a, b, c : in std_logic;
z : out std_logic);
end fmap_and3;
architecture rtl of fmap_and3 is
attribute xc_map : STRING;
attribute xc_map of rtl : architecture is "lut";
attribute syn_hier: string;
attribute syn_hier of rtl:architecture is "hard";
begin
z <= a and b and c;
end rtl;

This method depends on your naming to make it readable. Naming is not hard for
simple logic functions, but can be a bit of a pain when the 4-input equation gets
complicated.

A third method is instantiating the LUT4, LUT3 or LUT2 component. In this case,
you don't have a separate component to define, but you do have to supply the LUT
contents in the form of a hex string in two places: once for the INIT=attribute
which gets passed onto the edif netlist but not to simulation, and once to the
INIT generic which is used for simulation, but not passed to the netlist. As I
understand it, the latest version of synplify passes the INIT generic to the
netlist, but that is not portable. Great care has to be taken with this method
to make sure the simulation matches the synthesized hardware. It is also
extraordinarily error prone and difficult to read and maintain. This can be
worked around by writing a function to parse a boolean expression and convert it
to the appropriate INIT strings, but it is not trivial to write.

The first method gives you structure but no handles for naming or placing the
LUTs. The second is one I use for many of my library macros, as it permits
attachment of an RLOC placement attribute so you can get control over placement.
It does however result in a large library of small combinatorial functions which
can become awkward to maintain. The third is generally too hard to use and
maintain without a good INIT string generation function, however it does not
require a large library of 2,3 and 4 input functions. It is hardly usable
without the boolean to INIT function however.



Hal Murray wrote:

I tried to instantiate a LUT the other day and I couldn't figure out
the "right" way to do the Verilog for Synplify since there are INIT
parameters for simulation and xc_props="INIT=xxxx" for synthesis as
far as I can tell. I had something that looked un-LUT-like in the HDL
Analyst technology viewer so I didn't pursue that further.

This seems like a good addition to the tools discussion in other
threads.

There really should be a simple way to say something like:
Generate this signal in a LUT (or whatever) with these inputs.
Possibly constraining the location and/or locking some inputs to
particular ports.
The idea is that the system will figure out the details from the
info it already has.

--
The suespammers.org mail server is located in California. So are all my
other mailboxes. Please do not send unsolicited bulk e-mail or unsolicited
commercial e-mail to my suespammers.org address or any of my other addresses.
These are my opinions, not necessarily my employer's. I hate spam.
--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759
 
I'm not sure I caught your requirement completely. As I understand it, you need to detect
when there are 9 0's or 9 1's starting at a byte boundary?
That can be done with a bit of logic and a carry chain:

the lsb inverts the remaining bits if it is zero, otherwise the remaining bits are passed to
the carry chain unchanged. the carry chain gets the conditionally inverted bits starting at
LSB+1, and at the 9th bit you look at the carry chain output (the carry in at the base of
the chain is also a '1'). If all the bits match the LSB, you get a '1' on the carry chain
output. This is easiest to code with instantiated MUXCY's rather than pushing on a rope to
get the synthesizer to figure out you want to use the carry chain as a wide and gate. Not
sure if this meets your requirement or not, but it is illustrative of how you might tackle a
similar problem with a perhaps less than obvious approach.
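
The carry-chain idea can be checked with a small behavioural model before committing to instantiated primitives. The Python below is a sketch under assumptions, not a netlist: `muxcy` models the MUXCY as "pass the carry when S is 1, otherwise output DI", DI is assumed tied to 0, and `all_match_lsb` is an illustrative name.

```python
def muxcy(s, di, ci):
    """Behavioural MUXCY: pass the carry when S is 1, otherwise output DI."""
    return ci if s else di


def all_match_lsb(window):
    """Model of the chain for a 9-bit window; returns 1 when every bit equals bit 0."""
    lsb = window & 1
    carry = 1                                # carry-in at the base of the chain
    for i in range(1, 9):
        bit = (window >> i) & 1
        select = bit if lsb else bit ^ 1     # an LSB of 0 inverts the bit
        carry = muxcy(select, 0, carry)      # DI tied to 0 (assumption)
    return carry


# The chain output is 1 exactly for the all-zeros and all-ones windows.
assert all(all_match_lsb(w) == int(w in (0x000, 0x1FF)) for w in range(512))
```

Each stage's select is a two-input function of one data bit and the LSB, so in hardware several of them would share the LUTs that feed the chain.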


John_H wrote:

rickman <spamgoeshere4@yahoo.com> wrote in message news:<3F7B3F59.191901A4@yahoo.com>...
My experience has been that it does not much matter how you code
combinatorial logic like this. The tools run it through a grinder and
produce an optimal version (in its own mind). When I want to optimize
like this, I either use a "keep" attribute on the wire, or sometimes you
can instantiate primitives. For logic I don't think primitives work
since gates just get remapped.

I overuse the syn_keep attribute and I hate the idea of instantiating
LUTs. My Carnot skills aren't exactly used regularly.

But I still don't understand your code. Why does the outer loop range
over 64 values.

I've had problems with bit ranges in the past where [i+4:i] is a
complaint. Perhaps this isn't an issue with for loops but I've
learned to avoid them in general logic. They do work fine in generate
blocks, however. I stepped through every bit to make a comparison to
the adjacent bit; 3 adjacent comparisons lumped into one variable
(with an eventual syn_keep) would give me 4-input functions that
should pack into LUTs. The complex end of the inside loop is so that
the three "LUTs" per byte are 4-input, 4-input, and 3-input functions.
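
To see how the loop bounds carve each 9-bit window into the three functions, the index arithmetic can be replayed outside the synthesizer. This Python sketch mirrors only the loop control of the Verilog (the outer step `i=i+j` and the inner condition `(i%8+j<8) && (j<3)`); `partition` is a made-up helper name.

```python
def partition():
    """Replay the loop indices; return the bit pairs compared in each runBits[k]."""
    groups = []
    i = 0
    while i < 64:
        pairs = []
        j = 0
        while (i % 8 + j < 8) and (j < 3):
            pairs.append((i + j, i + j + 1))
            j += 1
        groups.append(pairs)
        i += j          # same step as i=i+j in the outer for loop
    return groups


groups = partition()
for k, pairs in enumerate(groups[:3]):
    inputs = sorted({bit for pair in pairs for bit in pair})
    print(k, pairs, "inputs:", inputs)
```

The first three groups cover bits 0-3, 3-6, and 6-8, the pattern repeats every 8 bits, and there are 24 groups in total, matching the width of runBits.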

I would code two nested loops where the outer loop
ranges over the 8 outputs and the inner loop ranges over the 9 inputs
for each output. Or just skip the inner loop and use two outputs from
two sets of four inputs feeding a 3 input function and use keeps on the
first two output arrays. Maybe that is what you are doing, but I can't
figure out the code easily.

I see you are incrementing the i variable by j and ranging j in the
second loop by some complex control expression. Can't you just
increment i by 8?

for( i=0; i<64; i=i+8 ) begin
  k = i / 8;  // output index 0..7 (i % 8 would always be 0 here)
  for( j=0; j<4; j=j+1 ) begin
    runBitsA_[k] = runBitsA_[k] & bytePlus1[i+j];
    runBitsB_[k] = runBitsB_[k] & bytePlus1[i+j+4];
  end
  runByte_[k] = runBitsA_[k] & runBitsB_[k] & bytePlus1[i+8];  // ninth bit of the window
end

Put the keep on runBitsA_ and runBitsB_ and you should get your two
level structure.

This works very well for runs of ones only. I need to identify runs
of ones or runs of zeros. The technique can be expanded to my needs
resulting in runBitsA, B, and C where one of them needs to cover 2
comparisons, not 3 like the others. ...which really is the
approach I was coding but using consecutive bits in a vector rather
than {A,B,C} and using the one statement rather than 3 to make the
assignments, dealing with the 2 comparison exception by terminating
the inside loop early.

Thanks for the help.

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759
 
Followup to: <3F83932E.16B23BC6@andraka.com>
By author: Ray Andraka <ray@andraka.com>
In newsgroup: comp.arch.fpga
The first method gives you structure but no handles for naming or placing the
LUTs. The second is one I use for many of my library macros, as it permits
attachment of an RLOC placement attribute so you can get control over placement.
It does however result in a large library of small combinatorial functions which
can become awkward to maintain. The third is generally too hard to use and
maintain without a good INIT string generation function, however it does not
require a large library of 2,3 and 4 input functions. It is hardly usable
without the boolean to INIT function however.
Here is a simple Perl script that produces the appropriate LUT bit
pattern for a 4-input LUT given any arbitrary boolean expression
involving "0", "1", "a", "b", "c", "d". The operators are the standard
C/Perl ~ | ^ & -- the booleanizing operators including ?: and ! should
not be used. The extension to 5-input LUTs should be obvious.

This should be easily tweakable to produce any particular syntax
desired.

Posted mostly as an example.

-hpa

#!/usr/bin/perl

$e = join(' ', @ARGV);
$e =~ s/1/\(\$one\)/g;
$e =~ s/a/\(\$a\)/g;
$e =~ s/b/\(\$b\)/g;
$e =~ s/c/\(\$c\)/g;
$e =~ s/d/\(\$d\)/g;

$one = 0xffff;
$a = 0xaaaa;
$b = 0xcccc;
$c = 0xf0f0;
$d = 0xff00;

printf "%04x\n", eval($e) & $one;
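
The constants in the script work because each one is the truth-table column of one LUT input, so a single bitwise evaluation fills all 16 entries at once. Here is the same trick in Python with a slow cross-check; this is a sketch for illustration, the helper `init_by_enumeration` is a made-up name, and it assumes input a is the least significant address bit.

```python
# One 16-bit column per LUT input: bit n of each mask is that input's value
# when the LUT address is n. These are the constants used in the Perl script.
A, B, C, D = 0xAAAA, 0xCCCC, 0xF0F0, 0xFF00
ONE = 0xFFFF


def init_by_enumeration(fn):
    """Slow cross-check: evaluate fn once per LUT address."""
    init = 0
    for n in range(16):
        a, b, c, d = n & 1, (n >> 1) & 1, (n >> 2) & 1, (n >> 3) & 1
        init |= (fn(a, b, c, d) & 1) << n
    return init


# Evaluating the expression once on the masks fills all 16 entries at once.
and3 = (A & B & C) & ONE
mixed = ((A ^ B) | (~C & D)) & ONE

print(f"{and3:04x} {mixed:04x}")   # 8080 6f66
```

The final `& ONE` matters for the same reason as in the Perl: `~` produces bits above bit 15 that have to be masked off.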

--
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
If you send me mail in HTML format I will assume it's spam.
"Unix gives you enough rope to shoot yourself in the foot."
Architectures needed: ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64
 
I'd rather use a function or procedure within the HDL so that the boolean expression
is in the code and is used directly to generate the init value. Otherwise, there is
a high chance of either transcription error or not updating the comment with the
boolean expression. There is a link on my website to one such VHDL function.

"H. Peter Anvin" wrote:

Followup to: <3F83932E.16B23BC6@andraka.com>
By author: Ray Andraka <ray@andraka.com>
In newsgroup: comp.arch.fpga

The first method gives you structure but no handles for naming or placing the
LUTs. The second is one I use for many of my library macros, as it permits
attachment of an RLOC placement attribute so you can get control over placement.
It does however result in a large library of small combinatorial functions which
can become awkward to maintain. The third is generally too hard to use and
maintain without a good INIT string generation function, however it does not
require a large library of 2,3 and 4 input functions. It is hardly usable
without the boolean to INIT function however.


Here is a simple Perl script that produces the appropriate LUT bit
pattern for a 4-input LUT given any arbitrary boolean expression
involving "0", "1", "a", "b", "c", "d". The operators are the standard
C/Perl ~ | ^ & -- the booleanizing operators including ?: and ! should
not be used. The extension to 5-input LUTs should be obvious.

This should be easily tweakable to produce any particular syntax
desired.

Posted mostly as an example.

-hpa

#!/usr/bin/perl

$e = join(' ', @ARGV);
$e =~ s/1/\(\$one\)/g;
$e =~ s/a/\(\$a\)/g;
$e =~ s/b/\(\$b\)/g;
$e =~ s/c/\(\$c\)/g;
$e =~ s/d/\(\$d\)/g;

$one = 0xffff;
$a = 0xaaaa;
$b = 0xcccc;
$c = 0xf0f0;
$d = 0xff00;

printf "%04x\n", eval($e) & $one;

--
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
If you send me mail in HTML format I will assume it's spam.
"Unix gives you enough rope to shoot yourself in the foot."
Architectures needed: ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64
--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759
 
