Lattice Diamond & tristate

  • Thread starter Aleksandar Kuktin

Aleksandar Kuktin

Guest
Hello folks!

Unrelated to the other recent thread about Diamond and MachXO2, does
anyone know how to make Lattice's Diamond and MachXO2 synthesize and use
tristate buffers?

Now, I'm not interested in tristates because "tristates" but because I am
trying to save up a bit of space by using a bidirectional bus instead of
two unidirectional. But for that, I need to alternate writing to the bus
and that is what I need tristates for.

Pursuant to this, in my Verilog sources, I have declared the relevant
ports as bidirectional (inout), used proper constructs for alternating
reading and writing (high impedance and all that), but when I try to
synthesize the resulting code, Lattice's synthesizer claims an error to
the effect "wire such_and_such is constantly being driven from multiple
places" and stops. When I try with Synplify Pro, Synplify does synthesize
the tristates, but when Diamond translates that to its own format, I get
warnings similar to "unknown attribute: origin_instead_of --
ignoring" (I'm typing this from memory). I take that error to mean that
the translation program removed the tristates and left me with broken
code.
 
In article <l2l2lr$3ji$1@speranza.aioe.org>,
Aleksandar Kuktin <akuktin@gmail.com> wrote:
> Hello folks!
>
> Unrelated to the other recent thread about Diamond and MachXO2, does
> anyone know how to make Lattice's Diamond and MachXO2 synthesize and use
> tristate buffers?

As others have said there are no tri-states. The next best alternative is
an OR-tree:

wor [3:0] foo;

wire [3:0] a;
wire drive_a;
assign foo = drive_a ? a : 4'd0;

wire [3:0] b;
wire drive_b;
assign foo = drive_b ? b : 4'd0;

wire [3:0] c;
wire drive_c;
assign foo = drive_c ? c : 4'd0;

etc. It's pretty efficient since you can make a 4 input OR gate with a
single LUT, or a 16 input OR gate with two levels of LUTs.

If you try to instantiate an internal tri-state bus, the tools usually
convert it to the above (I've not tried this in Lattice, but this is
certainly what happens in Xilinx and Altera).

You should get the same generated gates with this case statement:

reg [3:0] foo; // must be a reg: it is assigned inside an always block

always @(*)
  casex (enables) // synthesis parallel_case full_case
    4'bxxx1: foo = a;
    4'bxx1x: foo = b;
    4'bx1xx: foo = c;
    4'b1xxx: foo = d;
  endcase

The following code is safer (it cannot result in a synthesis/simulation
mismatch), but the code above is usually faster, unfortunately, even though
the synthesis tool should be able to infer the parallel_case:

always @(*)
  case (enables)
    4'b0001: foo = a;
    4'b0010: foo = b;
    4'b0100: foo = c;
    4'b1000: foo = d;
    default: foo = 4'bxxxx;
  endcase

--
/* jhallen@world.std.com AB1GO */ /* Joseph H. Allen */
int a[1817];main(z,p,q,r){for(p=80;q+p-80;p-=2*a[p])for(z=9;z--;)q=3&(r=time(0)
+r*57)/7,q=q?q-1?q-2?1-p%79?-1:0:p%79-77?1:0:p<1659?79:0:p>158?-79:0,q?!a[p+q*2
]?a[p+=a[p+=q]=q]=q:0:0;for(;q++-1817;)printf(q%79?"%c":"%c\n"," #"[!a[q-1]]);}
 
Aleksandar Kuktin wrote:
> Hello folks!
>
> Unrelated to the other recent thread about Diamond and MachXO2, does
> anyone know how to make Lattice's Diamond and MachXO2 synthesize and use
> tristate buffers?
>
> Now, I'm not interested in tristates because "tristates" but because I am
> trying to save up a bit of space by using a bidirectional bus instead of
> two unidirectional. But for that, I need to alternate writing to the bus
> and that is what I need tristates for.
>
> Pursuant to this, in my Verilog sources, I have declared the relevant
> ports as bidirectional (inout), used proper constructs for alternating
> reading and writing (high impedance and all that), but when I try to
> synthesize the resulting code, Lattice's synthesizer claims an error to
> the effect "wire such_and_such is constantly being driven from multiple
> places" and stops. When I try with Synplify Pro, Synplify does synthesize
> the tristates, but when Diamond translates that to its own format, I get
> warnings similar to "unknown attribute: origin_instead_of --
> ignoring" (I'm typing this from memory). I take that error to mean that
> the translation program removed the tristates and left me with broken
> code.

It's been a while since I last used Lattice parts, but I don't remember
them as having internal tristates. The last Xilinx devices with
internal tristates were the Virtex (original and E) and Spartan 2
series, both very long in the tooth, and older than the Lattice EC
and ECP devices that I first used. So if the synthesis tool is
translating your tristates into logic, whether or not this breaks
the code, you're not likely to free up any space this way.

--
Gabor
 
On 10/3/2013 8:38 PM, Aleksandar Kuktin wrote:
> Hello folks!
>
> Unrelated to the other recent thread about Diamond and MachXO2, does
> anyone know how to make Lattice's Diamond and MachXO2 synthesize and use
> tristate buffers?
>
> Now, I'm not interested in tristates because "tristates" but because I am
> trying to save up a bit of space by using a bidirectional bus instead of
> two unidirectional. But for that, I need to alternate writing to the bus
> and that is what I need tristates for.
>
> Pursuant to this, in my Verilog sources, I have declared the relevant
> ports as bidirectional (inout), used proper constructs for alternating
> reading and writing (high impedance and all that), but when I try to
> synthesize the resulting code, Lattice's synthesizer claims an error to
> the effect "wire such_and_such is constantly being driven from multiple
> places" and stops. When I try with Synplify Pro, Synplify does synthesize
> the tristates, but when Diamond translates that to its own format, I get
> warnings similar to "unknown attribute: origin_instead_of --
> ignoring" (I'm typing this from memory). I take that error to mean that
> the translation program removed the tristates and left me with broken
> code.

There are *no* internal tristate buffers in today's FPGAs. You can use
tristate buffers on I/O pins, but that is it. If you try to infer
tristate buffers it will either give errors or replace the tristates
with muxes and multiple busses.
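
For what it's worth, here is a minimal sketch of the I/O-pin case, assuming a hypothetical top-level module called bidir_port with an 8-bit inout bus named pad. All of the names are made up for illustration. The tristate exists only at the pins; everything inside the fabric is an ordinary unidirectional signal.

```verilog
// Hypothetical example: the tristate lives only at the top-level I/O pins.
module bidir_port (
    inout  wire [7:0] pad,      // physical bidirectional pins
    input  wire       drive,    // 1 = FPGA drives the pins, 0 = pins are inputs
    input  wire [7:0] data_out, // value to drive while 'drive' is high
    output wire [7:0] data_in   // value currently seen on the pins
);
    // Release the pins (high impedance on every bit) when not driving.
    assign pad     = drive ? data_out : 8'bzzzzzzzz;

    // The read path is a plain wire back into the fabric.
    assign data_in = pad;
endmodule
```

Inside the chip, data_out and data_in stay as two separate buses, which is why this does not save internal logic the way an internal tristate bus would on an older device.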

--

Rick
 
On Fri, 04 Oct 2013 15:22:46 +0000, Joseph H Allen wrote:

> In article <l2l2lr$3ji$1@speranza.aioe.org>,
> Aleksandar Kuktin <akuktin@gmail.com> wrote:
>> Hello folks!
>>
>> Unrelated to the other recent thread about Diamond and MachXO2, does
>> anyone know how to make Lattice's Diamond and MachXO2 synthesize and use
>> tristate buffers?
>
> As others have said there are no tri-states.

Well f--k. Pardon the language. This significantly reduces the amount of
fun one can have with FPGAs (internally)...

> The next best alternative
> is an OR-tree:
>
> wor [3:0] foo;
>
> wire [3:0] a;
> wire drive_a;
> assign foo = drive_a ? a : 4'd0;
>
> wire [3:0] b;
> wire drive_b;
> assign foo = drive_b ? b : 4'd0;
>
> wire [3:0] c;
> wire drive_c;
> assign foo = drive_c ? c : 4'd0;

Hmm... I'm pretty sure this won't work. The synthesizer will again
complain that the wire is driven from multiple sources. But I'll give it
a try.

The thought of using OR gates has crossed my mind before, but I didn't
really think much about it. I thought about trying something with a
register being alternately filled from different sources, but that also
wouldn't do what I want it to.
 
In article <l2ng7e$g35$1@speranza.aioe.org>,
Aleksandar Kuktin <akuktin@gmail.com> wrote:
> On Fri, 04 Oct 2013 15:22:46 +0000, Joseph H Allen wrote:
>
>> In article <l2l2lr$3ji$1@speranza.aioe.org>,
>> Aleksandar Kuktin <akuktin@gmail.com> wrote:
>>> Hello folks!
>>>
>>> Unrelated to the other recent thread about Diamond and MachXO2, does
>>> anyone know how to make Lattice's Diamond and MachXO2 synthesize and use
>>> tristate buffers?
>>
>> As others have said there are no tri-states.
>
> Well f--k. Pardon the language. This significantly reduces the amount of
> fun one can have with FPGAs (internally)...
>
> Hmm... I'm pretty sure this won't work. The synthesizer will again
> complain that the wire is driven from multiple sources. But I'll give it
> a try.

I just tried it in LSE: it works fine. Tri-state also works. Make sure
you are assigning like this:

assign bus = enable ? signal : 4'bzzzz;

It also works to have module outputs connected directly to a wor or
tri-state bus. Annoyingly, this does not work in Altera Quartus-II.

> The thought of using OR gates has crossed my mind before, but I didn't
> really think much about it. I thought about trying something with a
> register being alternately filled from different sources, but that also
> wouldn't do what I want it to.

Ah, this is a different issue. A 'reg' can only be driven by a single
always block (and always blocks can only directly assign to regs, not
wires).

You can do it in simulation, but it's too difficult to untangle the dataflow
in the general case for synthesis.
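
In the simple cases the usual way out is to fold both writers into one always block and let the if/else order decide who wins. A minimal sketch, with a made-up module name and made-up signal names (clear, load, d, q):

```verilog
// Hypothetical example: two sources for 'q', but only one always block
// ever assigns it, so synthesis sees a single driver.
module single_driver (
    input  wire       clk,
    input  wire       clear, // first source: force q to zero
    input  wire       load,  // second source: copy d into q
    input  wire [3:0] d,
    output reg  [3:0] q
);
    always @(posedge clk)
        if (clear)
            q <= 4'd0;       // clear has priority over load
        else if (load)
            q <= d;
        // otherwise q holds its value
endmodule
```

This only covers the case where the priority between the sources is fixed and known up front.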

You will find that you will sometimes want two always blocks to affect the
same reg in the same cycle, but the only way to do it is to take a Mealy
output from one always block and use it as an input to another. This is
certainly one cause of tendonitis for Verilog RTL coders. To get Mealy
outputs you have to use two always blocks, one clocked and one non-clocked.
Here is the style:

// Clocked block: you must use non-blocking assign for simulation to work
// properly.

reg [3:0] my_counter, nxt_my_counter; // Register and input to register

always @(posedge clk or negedge reset_l)
  if (!reset_l)
    begin
      my_counter <= 0;
    end
  else
    begin
      my_counter <= nxt_my_counter;
    end

// Logic block: use blocking assigns here, because the later statements read
// the value of nxt_my_counter assigned just above them.

always @(*)
  begin
    nxt_my_counter = my_counter; // Hold the current value by default

    if (increment)
      nxt_my_counter = nxt_my_counter + 1'd1;

    if (decrement)
      nxt_my_counter = nxt_my_counter - 1'd1;
  end

// Second always block, can use Mealy outputs (nxt_xxxxx signals) as inputs.

reg [3:0] foo;

always @(posedge clk or negedge reset_l)
  if (!reset_l)
    begin
      foo <= 0;
    end
  else
    begin
      if (bar || nxt_my_counter[3])
        foo <= foo + 1'd1;
    end

--
/* jhallen@world.std.com AB1GO */ /* Joseph H. Allen */
 
On Sat, 05 Oct 2013 04:23:27 +0000, Joseph H Allen wrote:

>> Hmm... I'm pretty sure this won't work. The synthesizer will again
>> complain that the wire is driven from multiple sources. But I'll give it
>> a try.
>
> I just tried it in LSE: it works fine. Tri-state also works. Make
> sure you are assigning like this:
>
> assign bus = enable ? signal : 4'bzzzz;
>
> It also works to have module outputs connected directly to a wor or
> tri-state bus. Annoyingly, this does not work in Altera Quartus-II.

I've only now found time to try this. And it works! What happened
previously was that I assigned like this:

assign bus = enable ? signal : 8'hz;

Turns out you _do_ need to 'z every bit.

Unfortunately, the bidir bus did not live up to my expectations.
Basically, yes, it reduces the number of SLICEs used, but only marginally
and not always, depending on other details (like the one below).

I found a different way to free up space: use the internal clock. MachXO2
comes with an internal oscillator and using this instead of another
clock source can easily slash high tens of SLICEs off. I don't know
exactly why this happens; the only thing I can assume is that many of
those SLICEs are used for synchronization or normalization or something
else that doesn't need to be done with the onboard clock.
 
On 10/15/2013 8:43 PM, Aleksandar Kuktin wrote:
> On Sat, 05 Oct 2013 04:23:27 +0000, Joseph H Allen wrote:
>
>>> Hmm... I'm pretty sure this won't work. The synthesizer will again
>>> complain that the wire is driven from multiple sources. But I'll give it
>>> a try.
>>
>> I just tried it in LSE: it works fine. Tri-state also works. Make
>> sure you are assigning like this:
>>
>> assign bus = enable ? signal : 4'bzzzz;
>>
>> It also works to have module outputs connected directly to a wor or
>> tri-state bus. Annoyingly, this does not work in Altera Quartus-II.
>
> I've only now found time to try this. And it works! What happened
> previously was that I assigned like this:
>
> assign bus = enable ? signal : 8'hz;
>
> Turns out you _do_ need to 'z every bit.

Generally I don't use hexadecimal for tristates. I have always found
that 8'bz is the same as 8'bzzzzzzzz. Remember that if you size the
tristate constant too small (e.g. 4'bz for an 8-bit reg) then it
gets _zero_ extended as required by the LRM.
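
A small simulation-only sketch of that extension rule follows. The module and signal names are invented for illustration, and it only shows what the language rules say a simulator should do, not what any particular synthesis tool accepts.

```verilog
// Hypothetical demo of how sized 'z' literals are extended.
module z_literal_demo;
    wire [7:0] full_hex  = 8'hz;        // sized literal: z fills all 8 bits
    wire [7:0] full_bin  = 8'bzzzzzzzz; // every bit spelled out
    wire [7:0] too_small = 4'bz;        // 4-bit literal, zero-extended to 8 bits

    initial begin
        #1; // let the continuous assignments settle
        $display("8'hz        -> %b", full_hex);
        $display("8'bzzzzzzzz -> %b", full_bin);
        $display("4'bz        -> %b", too_small);
    end
endmodule
```

Going by the LRM rules, the first two lines should print zzzzzzzz and the third 0000zzzz, so an undersized constant ends up driving zeros onto the upper bits of the bus.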

> Unfortunately, the bidir bus did not live up to my expectations.
> Basically, yes, it reduces the number of SLICEs used, but only marginally
> and not always, depending on other details (like the one below).
>
> I found a different way to free up space: use the internal clock. MachXO2
> comes with an internal oscillator and using this instead of another
> clock source can easily slash high tens of SLICEs off. I don't know
> exactly why this happens; the only thing I can assume is that many of
> those SLICEs are used for synchronization or normalization or something
> else that doesn't need to be done with the onboard clock.

huh? I would have thought that the slice usage would be the same
for any clock source. You might use some slices for reset and
lock detection if you add a PLL for the external clock. But just
running a pin to a global clock buffer shouldn't use any slice
resources. That being said, I've seen large differences in slice
usage for the exact same design built with different placements.
Much of this is due to LUTs used as route-throughs or differences
in slice packing. I'd check the LUT and register count rather
than slices before giving any credence to the idea that an external
clock source adds to logic usage.

--
Gabor
 
On 10/15/2013 10:10 PM, Gabor wrote:
> On 10/15/2013 8:43 PM, Aleksandar Kuktin wrote:
>
>> Unfortunately, the bidir bus did not live up to my expectations.
>> Basically, yes, it reduces the number of SLICEs used, but only marginally
>> and not always, depending on other details (like the one below).
>>
>> I found a different way to free up space: use the internal clock. MachXO2
>> comes with an internal oscillator and using this instead of another
>> clock source can easily slash high tens of SLICEs off. I don't know
>> exactly why this happens; the only thing I can assume is that many of
>> those SLICEs are used for synchronization or normalization or something
>> else that doesn't need to be done with the onboard clock.
>
> huh? I would have thought that the slice usage would be the same
> for any clock source. You might use some slices for reset and
> lock detection if you add a PLL for the external clock. But just
> running a pin to a global clock buffer shouldn't use any slice
> resources. That being said, I've seen large differences in slice
> usage for the exact same design built with different placements.
> Much of this is due to LUTs used as route-throughs or differences
> in slice packing. I'd check the LUT and register count rather
> than slices before giving any credence to the idea that an external
> clock source adds to logic usage.

This whole thread is a bit silly. There are *no* internal tristates in
FPGAs released in the last 10 years, possibly 15 years. The simulator
is showing tristates because that is what the code describes. Synthesis
is turning this into multiplexers inside the FPGA.

I can't say anything about the "slice" usage because slices are not the
primitive elements in even a Xilinx FPGA, rather slices have multiple
components which all get used in different ways including just for
routing. If any part of a slice is used the slice is counted as used.
This doesn't mean more logic can't be included... in other words, the
slice utilization could be as high as 100% and you can still add more
logic to your design.

So counting slice usage isn't really telling you the story you want to
know about. Try looking at the synthesis tool. Often they give a
schematic view of exactly what logic was generated. I believe this is
called "Technology View" in the tool I use for Lattice parts. Then you
can see exactly how they are implementing your HDL code.

Personally I would prefer to code for multiplexers because then you have
some control over how it is implemented. In some logic devices other
than Xilinx, muxes are more efficiently implemented as the cascaded AND
of ORs rather than the traditional OR of ANDs, product of sums vs. sum
of products if you will. Coding to the hardware can help the tool get
to an efficient solution, but not if you are imagining internal logic
that doesn't exist.
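
As a rough illustration of coding the mux directly, here is a sketch of a one-hot AND-OR multiplexer standing in for a four-driver internal bus. The module name, port names and the one-hot sel encoding are assumptions made for the example, not anything taken from the original design.

```verilog
// Hypothetical example: explicit one-hot AND-OR mux instead of an
// internal tristate bus with four drivers.
module onehot_mux4 (
    input  wire [3:0] a,   // would-be bus driver 0
    input  wire [3:0] b,   // would-be bus driver 1
    input  wire [3:0] c,   // would-be bus driver 2
    input  wire [3:0] d,   // would-be bus driver 3
    input  wire [3:0] sel, // one-hot select, one bit per source
    output wire [3:0] y
);
    // Gate each source with its select bit, then OR the results together.
    assign y = ({4{sel[0]}} & a)
             | ({4{sel[1]}} & b)
             | ({4{sel[2]}} & c)
             | ({4{sel[3]}} & d);
endmodule
```

With no select bit set the output is zero rather than floating, and with more than one set the sources are ORed together, so the behavior in those cases is at least defined.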

--

Rick
 
