Can someone try my code on other architectures/families ?

whygee

Hello,

I've spent a lot of time checking and optimizing the following VHDL code :
* http://yasep.org/VHDL/asu_rop2.vhd is the "interesting" code
* http://yasep.org/VHDL/testdiff.vhd is the testbench
(already configured for speed measurement)
This is going to be the Add/Sub/Logic execution unit of YASEP ( http://yasep.org )
and the same code works for 16-bit and 32-bit wide datapath versions
(just change the generic).

I intend to make it as portable as possible, though I have optimised it for the
target I can access : Actel ProASIC3 (which I appreciate more and more).
So the granularity is gates of 3 inputs instead of the more classic 4-input LUT.
I imagine that it's easier to use 3-input logic in a 4-input system than
the reverse (sacrificing the unused 4th input for portability and generality).

Due to several constraints, I have pipelined the unit with 1 logic layer
just before the first FF barrier, then 5 logic layers and finally 2 layers
after the second FFs. My goal is to run with a safe margin at 100MHz
(P&R said 114MHz last time i tried).

However i don't know what kind of speed and device occupation (LUTs ?)
this design will give. Also, I'm open to comments, suggestions and
advice about the code (style, methods, etc.). [note: I have used concurrent
signal assignments wherever I could to ease the manual netlist alterations.]
Finally, I only use Synplicity and some warnings might appear or disappear
with other tools : only experience can tell this.
And in my experience, trying to port code makes it more solid and useful.

Can somebody spend a few minutes downloading and trying the code
on Altera or Xilinx tools and chips ?

Thanks in advance,
YG
 
whygee wrote:
Hello,

I've spent a lot of time checking and optimizing the following VHDL code :
* http://yasep.org/VHDL/asu_rop2.vhd is the "interesting" code
Try modelsim first:

70 Mon Aug 11 /evtfs/home/tres/vhdl/asu_rop2> vcom asu_rop2.vhd
Model Technology ModelSim SE vcom 6.2a Compiler 2006.06 Jun 16 2006
-- Loading package standard
-- Loading package std_logic_1164
-- Loading package std_logic_arith
-- Loading package std_logic_unsigned
-- Compiling entity addsub32
-- Compiling architecture behaviour of addsub32
** Error: asu_rop2.vhd(36):
Cannot drive signal 'dataareg' from this subprogram.
** Error: asu_rop2.vhd(37):
Cannot drive signal 'databreg' from this subprogram.
** Error: asu_rop2.vhd(38):
Cannot drive signal 'databreg2' from this subprogram.
** Error: asu_rop2.vhd(39):
Cannot drive signal 'addsub2' from this subprogram.
** Error: asu_rop2.vhd(40):
Cannot drive signal 'sum' from this subprogram.
** Error: asu_rop2.vhd(41):
Cannot drive signal 'sum' from this subprogram.
** Error: asu_rop2.vhd(42):
Cannot drive signal 'rop2_out' from this subprogram.
** Error: asu_rop2.vhd(43):
Cannot drive signal 'rop2_combine' from this subprogram.
** Error: asu_rop2.vhd(77): VHDL Compiler exiting
71 Mon Aug 11 /evtfs/home/tres/vhdl/asu_rop2>
 
Mike Treseler wrote:
whygee wrote:
Hello,

I've spent a lot of time checking and optimizing the following VHDL code :
* http://yasep.org/VHDL/asu_rop2.vhd is the "interesting" code

Try modelsim first:
uh-ohhhhhh... shame on me /o\

Modelsim is integrated in Libero but since i got it working in june,
i have not yet bothered to start it :-/
I relied too much on synthesis output only, as well as tests
with the FPGA loaded with the configuration and fed with
test vectors from a PC... Is Synplicity too lax ?

70 Mon Aug 11 /evtfs/home/tres/vhdl/asu_rop2> vcom asu_rop2.vhd
Model Technology ModelSim SE vcom 6.2a Compiler 2006.06 Jun 16 2006
-- Loading package standard
-- Loading package std_logic_1164
-- Loading package std_logic_arith
-- Loading package std_logic_unsigned
-- Compiling entity addsub32
-- Compiling architecture behaviour of addsub32
snip
** Error: asu_rop2.vhd(43):
Cannot drive signal 'rop2_combine' from this subprogram.
** Error: asu_rop2.vhd(77): VHDL Compiler exiting
71 Mon Aug 11 /evtfs/home/tres/vhdl/asu_rop2
I see where and why i went wrong.
I tried to solve a problem the wrong way.

BTW, what is the scope/visibility of procedures/functions
regarding the signals ? Why does it work with Synplicity
and not Modelsim ? Is there a VHDL version (year) issue ?
I'm going to test this now.
I'm also updating other aspects of these big, scary files.

Thank you for your very fast answer,
YG
 
whygee wrote:

Modelsim is integrated in Libero but since i got it working in june,
i have not yet bothered to start it :-/
It's worth learning.
It has excellent error messages, and it is much more patient than I am.

I relied too much on synthesis output only, as well as tests
with the FPGA loaded with the configuration and fed with
test vectors from a PC... Is Synplicity too lax ?
If it will compile that file without warnings, yes.

BTW, what is the scope/visibility of procedures/functions
regarding the signals ?
A signal can only be driven from a procedure in process scope.
Example:
http://mysite.verizon.net/miketreseler/sync_template.vhd
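
For readers who cannot reach the linked template, here is a minimal sketch of the scope rule. The entity, process and procedure names are made up for this illustration and are not taken from asu_rop2.vhd. A procedure declared in the architecture (or in a package) may only assign to signals that are passed to it as signal-class parameters, while a procedure declared inside a process may drive signals visible to that process directly. Breaking this rule is what produces the "Cannot drive signal ... from this subprogram" errors in the vcom log.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity proc_scope_demo is   -- hypothetical name, for illustration only
  port (
    clk : in  std_logic;
    d   : in  std_logic;
    q1  : out std_logic;
    q2  : out std_logic
  );
end entity proc_scope_demo;

architecture rtl of proc_scope_demo is

  -- Declared in the architecture, outside any process:
  -- the assignment target must be a signal-class formal parameter.
  procedure copy_bit (src        : in  std_logic;
                      signal dst : out std_logic) is
  begin
    dst <= src;
  end procedure copy_bit;

begin

  with_param : process (clk)
  begin
    if rising_edge(clk) then
      copy_bit(d, q1);      -- q1 is handed over explicitly
    end if;
  end process with_param;

  in_process : process (clk)
    -- Declared inside the process: may drive q2 without a parameter.
    procedure update_q2 is
    begin
      q2 <= d;
    end procedure update_q2;
  begin
    if rising_edge(clk) then
      update_q2;
    end if;
  end process in_process;

end architecture rtl;
```

Moving update_q2 up into the architecture declarative part, unchanged, should make a strict compiler reject it for the same reason as in the log above.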

Why does it work with Synplicity
and not Modelsim ?
There are random processes in the universe.

Is there a VHDL version (year) issue ?
Unlikely.

I'm going to test this now.
I'm also updating other aspects of these big, scary files.
Consider making use of the numeric_std functions.
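
As a rough illustration of that suggestion (the vcom log shows the file loading std_logic_arith and std_logic_unsigned), an add/sub written against ieee.numeric_std could look like the sketch below. The entity name, port names and the WIDTH generic are assumptions for this example and do not come from asu_rop2.vhd.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity addsub_ns is                       -- hypothetical name
  generic (WIDTH : positive := 16);       -- 16 or 32 in the YASEP case
  port (
    a, b   : in  std_logic_vector(WIDTH-1 downto 0);
    sub    : in  std_logic;               -- '1' selects a - b
    result : out std_logic_vector(WIDTH-1 downto 0);
    cout   : out std_logic                -- carry out (borrow when sub = '1')
  );
end entity addsub_ns;

architecture rtl of addsub_ns is
  -- One extra bit so the carry or borrow is not lost.
  signal sum : unsigned(WIDTH downto 0);
begin
  sum <= resize(unsigned(a), WIDTH+1) + resize(unsigned(b), WIDTH+1)
           when sub = '0' else
         resize(unsigned(a), WIDTH+1) - resize(unsigned(b), WIDTH+1);

  result <= std_logic_vector(sum(WIDTH-1 downto 0));
  cout   <= sum(WIDTH);
end architecture rtl;
```

The explicit unsigned() conversions state how the vectors are to be interpreted, which the std_logic_unsigned package leaves implicit.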

Thank you for your very fast answer,
Thank vcom.

-- Mike Treseler
 
Hello again !

I've spent the last hours fighting on many fronts, but
i believe that it was worth it. The new versions are available
at the same address : http://yasep.org/VHDL
Later versions will appear at this same address.

Mike Treseler wrote:
whygee wrote:
Modelsim is integrated in Libero but since i got it working in june,
i have not yet bothered to start it :-/
It's worth learning.
It has excellent error messages, and it is much more patient than I am.
The messages that i have seen say what is wrong but not where or why :-(
I fortunately inferred enough to make the error messages disappear,
and it was at the cost of several things. Fortunately again,
i did a good job in the "netlist" part.
Even better : this transformation made a few errors apparent.

I relied too much on synthesis output only, as well as tests
with the FPGA loaded with the configuration and fed with
test vectors from a PC... Is Synplicity too lax ?
If it will compile that file without warnings, yes.
damnit !

BTW, what is the scope/visibility of procedures/functions
regarding the signals ?
A signal can only be driven from a procedure in process scope.
Example:
http://mysite.verizon.net/miketreseler/sync_template.vhd
I've spent the last hours looking at your website as well,
thanks for your work !

Why does it work with Synplicity and not Modelsim ?
There are random processes in the universe.
This is not the kind of explanation that will help me :-(
Anyway, compiling with multiple tools, once again proves to
be very important.

Is there a VHDL version (year) issue ?
Unlikely.
I have been away from VHDLand during the last few years and
hoped that the situation would get better :-(

I'm going to test this now.
I'm also updating other aspects of these big, scary files.
Consider making use of the numeric_std functions.
what would they bring ?
The "behaviour" implementation already uses the "+" operator
successfully, yet it was not "fast enough" so i reverse-engineered
the generated netlist.

What would have been ... great : a pipelined version of
the add/subtract unit, without the need to mess with the
detailed netlist. Actel seems to think that a monolithic
60MHz unit is good enough for all its customers...
I'm not designing a Cray and yet, i have to be careful
about not going beyond 5 logic levels and a fanout of 3 :-(
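
One behavioural way to get a pipelined adder without touching the netlist is to cut the carry chain in half with a register, as sketched below. This is only an illustration of the idea, not the structure used in asu_rop2.vhd: the names are hypothetical, WIDTH is assumed to be at least 2, subtraction is left out, and no claim is made about the clock rate it would reach on a ProASIC3.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity add_pipe2 is                        -- hypothetical name
  generic (WIDTH : positive := 32);        -- assumed >= 2
  port (
    clk  : in  std_logic;
    a, b : in  std_logic_vector(WIDTH-1 downto 0);
    sum  : out std_logic_vector(WIDTH-1 downto 0)  -- valid 2 clocks later
  );
end entity add_pipe2;

architecture rtl of add_pipe2 is
  constant HALF : positive := WIDTH / 2;
  signal lo_r   : unsigned(HALF downto 0);          -- low sum plus carry out
  signal a_hi_r : unsigned(WIDTH-HALF-1 downto 0);  -- delayed high operands
  signal b_hi_r : unsigned(WIDTH-HALF-1 downto 0);
begin
  process (clk)
    variable carry : unsigned(0 downto 0);
  begin
    if rising_edge(clk) then
      -- Stage 1: add the low halves, delay the high halves by one clock.
      lo_r   <= resize(unsigned(a(HALF-1 downto 0)), HALF+1)
              + resize(unsigned(b(HALF-1 downto 0)), HALF+1);
      a_hi_r <= unsigned(a(WIDTH-1 downto HALF));
      b_hi_r <= unsigned(b(WIDTH-1 downto HALF));

      -- Stage 2: add the high halves together with the registered carry.
      carry(0) := lo_r(HALF);
      sum <= std_logic_vector(a_hi_r + b_hi_r + carry)
           & std_logic_vector(lo_r(HALF-1 downto 0));
    end if;
  end process;
end architecture rtl;
```

Each stage now only has to propagate a carry through about half the bits, at the cost of one extra cycle of latency and the registers for the delayed high operands.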

Anyway, this was very instructive.

Thank you for your very fast answer,
Thank vcom.
well, thanks whoever :)

-- Mike Treseler
bed <= YG;
wait(not(sleepy));
 
whygee wrote:

compiling with multiple tools, once again proves to
be very important.
The important thing is to verify by simulation
before attempting synthesis. Simulators check
for language errors and don't worry much
about synthesis. Synthesis tools tend to
match patterns and templates and don't worry
as much about enforcing language rules.
You need to get both working, or there
will be long hours on the bench.

The "behaviour" implementation already uses the "+" operator
successfully, yet it was not "fast enough" so i reverse-engineered
the generated netlist.
Hmmm. I suspect that the descriptions were not equivalent,
but even if they were, this "speedup" is device dependent.

Since a netlist style is difficult to read anyway,
a vendor library instance might be the more useful description
in this case. Or consider a faster device.

-- Mike Treseler
 
Symon wrote:
Can somebody spend a few minutes downloading and trying the code
on Altera or Xilinx tools and chips ?

Yes, somebody can. You! Xilinx offer their software for free in a thing
called 'Webpack', and Altera has a similar thing called 'Web Edition', that
you can use to try out your code. You'll find the debug process goes a lot
quicker without the Usenet posting-response loops! ;-)
Good luck, Syms.
I understand and agree.

However, posting on Usenet has one advantage :
it makes one *think* about the code (tools don't think).
Some "peer review" does not harm, right ?
And it's very educational.
I thought that my code was "ok and finished" before I sent
the original post, and I understood it was not the case,
by reading the answers and other threads. I have spent
a lot of time in "submarine mode" without external input.
Also, I learned that I relied too much on synthesis output,
and that my code should be structured a bit differently.

Oh, and there is something more :
installing yet other tools and bundles on my computer is
painful and annoying. It was already a great achievement
for me to successfully install Libero (2 months ago),
and i am only starting to be at ease with it. yeah.

Please be patient : around year 2002, i had 3 or 4 different
VHDL simulators installed on my old linux box. I wrote
http://f-cpu.seul.org/new/VHDL-HOWTO.f-cpu back then.
Installing countless tools is a big investment in time
and effort that pays off in the future, but it keeps one
from implementing things _now_ :-/ I can't spend all my time
managing and chaperoning software, when all i want is
to code :-/

anyway, you probably understand what I mean :) </rant>

yg
 
Hello again,

Mike Treseler wrote:
whygee wrote:
compiling with multiple tools, once again proves to
be very important.

The important thing is to verify by simulation
before attempting synthesis. Simulators check
for language errors and don't worry much
about synthesis. Synthesis tools tend to
match patterns and templates and don't worry
as much about enforcing language rules.
You need to get both working, or there
will be long hours on the bench.
That is a good and clear explanation, thank you :)

The "behaviour" implementation already uses the "+" operator
successfully, yet it was not "fast enough" so i reverse-engineered
the generated netlist.
Hmmm. I suspect that the descriptions were not equivalent,
I have carefully checked every bit after every single alteration/enhancement.
And I came up with a few methods to do this.

As a final test, I have prepared a PC104 interface
between the FPGA evaluation board and an old industrial PC
that can feed about 100K test vectors per second.
Much faster than a SW simulator can do, just to be sure
that 1+2=3. Even though the input combinations are 2^64,
I use a good LFSR to "tickle" all the carry chains,
and past tests have found errors very fast,
most well before the first 1000 vectors.
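
The same LFSR idea can also be tried in a simulator before going to the board. The sketch below is an assumption-laden stand-in, not the PC104 setup described above: it uses a 64-bit Fibonacci LFSR with taps 64, 63, 61, 60 (a commonly tabulated maximal-length set), an arbitrary non-zero seed, and a small ripple-carry function in place of the real unit under test, checked against the numeric_std "+" operator.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity lfsr_vectors_tb is                  -- hypothetical testbench name
end entity lfsr_vectors_tb;

architecture sim of lfsr_vectors_tb is
  constant WIDTH : positive := 32;         -- must be <= 32 for the slices below

  -- One step of a 64-bit Fibonacci LFSR (taps 64, 63, 61, 60).
  function lfsr_next (s : std_logic_vector(63 downto 0))
    return std_logic_vector is
    variable fb : std_logic;
  begin
    fb := s(63) xor s(62) xor s(60) xor s(59);
    return s(62 downto 0) & fb;
  end function lfsr_next;

  -- Stand-in for the unit under test: a plain ripple-carry adder.
  function ripple_add (a, b : std_logic_vector(WIDTH-1 downto 0))
    return std_logic_vector is
    variable c : std_logic := '0';
    variable r : std_logic_vector(WIDTH-1 downto 0);
  begin
    for i in 0 to WIDTH-1 loop
      r(i) := a(i) xor b(i) xor c;
      c    := (a(i) and b(i)) or (c and (a(i) xor b(i)));
    end loop;
    return r;
  end function ripple_add;
begin
  stimulus : process
    -- Any non-zero seed works; this value is arbitrary.
    variable state    : std_logic_vector(63 downto 0) := x"0123456789ABCDEF";
    variable a, b     : std_logic_vector(WIDTH-1 downto 0);
    variable expected : std_logic_vector(WIDTH-1 downto 0);
  begin
    for n in 1 to 1000 loop
      -- Step 64 times so every vector is built from fresh bits.
      for k in 1 to 64 loop
        state := lfsr_next(state);
      end loop;
      a := state(63 downto 64-WIDTH);
      b := state(WIDTH-1 downto 0);
      expected := std_logic_vector(unsigned(a) + unsigned(b));
      assert ripple_add(a, b) = expected
        report "mismatch at vector " & integer'image(n)
        severity failure;
    end loop;
    report "all LFSR vectors checked";
    wait;
  end process stimulus;
end architecture sim;
```

To test a real design, the ripple_add call would be replaced by an instance of the unit and a clocked comparison that accounts for its pipeline latency.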

but even if they were, this "speedup" is device dependent.
I am aware of this dependency, and i tried other methods before
I invested more than a month into the analysis of the synthesiser's output.
It was the best place to start from because any other add/sub
structure or even parametric generator gave substantially lower speed.
And these 5 weeks of manual efforts were a good way to
explore Libero's capabilities. Finally, I have a slightly
faster version (though a bit larger than the original) but the
real speed increase comes from the pipeline gates.

Furthermore, even if the "optimal" speed is device dependent,
I believe that this code can "work" on other families (though less efficiently),
because I avoided any vendor-specific feature or code.
Only the structure of the boolean tree (3-input gates) makes
it adapted to the ProASIC family (though another recent post
by Rickman suggests that some Altera chips would be happy too).
I even have an eye on ASICs and their standard cell libraries.

Finally, the "manually pipelined version" /could/ also
give a boost on other FPGA families in some cases.
The "inefficiency" of the 3-tree can be offset by well-placed FFs.
For example, I have a pair of "new" (sigh), unsoldered XCV400 around
and it can maybe benefit from a little pipeline gate here or there...
If I ever find time to make a PCB.

But for the moment, I'm happy with the Actel chips,
now that I better understand their purposes, weaknesses
and strengths. So the device dependency is not a critical problem.

Since a netlist style is difficult to read anyway,
a vendor library instance might be the more useful description
in this case. Or consider a faster device.
I don't expect someone to dig deep into the netlist.
If a problem arises, I can spot an error quite fast now.
And since most people use faster FPGAs, they won't bother with
the netlist version.

If I (or someone else) find a vendor-provided pipelined Add/sub unit,
it's not difficult to include it in the code (just create another "architecture"
of the entity). Also, I have recently got an APEX evaluation board
(don't know much about it yet), but it will take months to get it running.
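
For reference, the "another architecture of the entity" approach can be sketched as follows. All names here are illustrative and not taken from the YASEP sources; the "gates" architecture merely stands in for a hand-tuned netlist or a vendor macro, and the user picks one architecture by name at instantiation.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity adder is                            -- hypothetical entity
  generic (WIDTH : positive := 16);
  port (
    a, b : in  std_logic_vector(WIDTH-1 downto 0);
    s    : out std_logic_vector(WIDTH-1 downto 0)
  );
end entity adder;

-- Architecture 1: let the synthesiser choose the structure.
architecture behaviour of adder is
begin
  s <= std_logic_vector(unsigned(a) + unsigned(b));
end architecture behaviour;

-- Architecture 2: explicit gates, standing in for a netlist or vendor block.
architecture gates of adder is
  signal c : std_logic_vector(WIDTH downto 0);
begin
  c(0) <= '0';
  bits : for i in 0 to WIDTH-1 generate
    s(i)   <= a(i) xor b(i) xor c(i);
    c(i+1) <= (a(i) and b(i)) or (c(i) and (a(i) xor b(i)));
  end generate bits;
end architecture gates;

library ieee;
use ieee.std_logic_1164.all;

entity adder_user is                       -- hypothetical wrapper
  port (
    x, y : in  std_logic_vector(15 downto 0);
    z    : out std_logic_vector(15 downto 0)
  );
end entity adder_user;

architecture rtl of adder_user is
begin
  -- Swap "gates" for "behaviour" to change implementation; ports stay the same.
  u_add : entity work.adder(gates)
    generic map (WIDTH => 16)
    port map (a => x, b => y, s => z);
end architecture rtl;
```

Because both architectures share one entity, the same testbench can be run against each of them to check that they agree.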

The fact is that, for several reasons, and mostly because
"I like Actel and this is a completely ungrounded and subjective,
hence undiscussable topic", I'll use this family in my next projects.
OTOH, I'll have to work on a Xilinx-based "paid" project and this is not
an issue for me, since I won't deal with the board directly,
I'll only provide "portable VHDL" to the engineer who will do the rest :)

...

BTW, i misunderstood your previous suggestion about "numeric_std".
I'm working on this now.

Thank you again for your time,

-- Mike Treseler
YG
 
