whygee
Guest
Hello,
I've spent a lot of time checking and optimizing the following VHDL code:
* http://yasep.org/VHDL/asu_rop2.vhd is the "interesting" code
* http://yasep.org/VHDL/testdiff.vhd is the testbench
(already configured for speed measurement)
This is going to be the Add/Sub/Logic execution unit of YASEP ( http://yasep.org )
and the same code works for 16-bit and 32-bit wide datapath versions
(just change the generic).
I intend to make it as portable as possible, though I have optimised it for the
target I can access: Actel ProASIC3 (which I appreciate more and more).
So the granularity is gates of 3 inputs instead of the more classic 4-input LUT.
I imagine that it's easier to use 3-input logic in a 4-input system than
the reverse (sacrificing the unused 4th input for portability and generality).
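To illustrate the 3-input granularity, here is a minimal sketch. It is not taken from asu_rop2.vhd: the entity name add3in_sketch, its ports and the plain ripple-carry structure are assumptions chosen only for the example. Every concurrent assignment in the generate loop is a function of exactly three signals, so each one should fit a single 3-input gate or leave one input unused in a 4-input LUT, and the width is set by the generic alone.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical illustration only, not the YASEP add/sub/logic unit.
entity add3in_sketch is
  generic (WIDTH : positive := 16);  -- 16 or 32, changed here only
  port (
    a, b : in  std_logic_vector(WIDTH-1 downto 0);
    cin  : in  std_logic;
    sum  : out std_logic_vector(WIDTH-1 downto 0);
    cout : out std_logic
  );
end entity add3in_sketch;

architecture rtl of add3in_sketch is
  -- c(i) is the carry into bit i, c(WIDTH) is the carry out
  signal c : std_logic_vector(WIDTH downto 0);
begin
  c(0) <= cin;

  bits : for i in 0 to WIDTH-1 generate
    -- 3-input XOR: sum bit of a full adder
    sum(i) <= a(i) xor b(i) xor c(i);
    -- 3-input majority: carry out of a full adder
    c(i+1) <= (a(i) and b(i)) or (a(i) and c(i)) or (b(i) and c(i));
  end generate bits;

  cout <= c(WIDTH);
end architecture rtl;
```

A ripple chain like this would be far too slow for the 100 MHz goal at 32 bits; it only shows how logic written as 3-input functions maps onto either kind of cell.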
Due to several constraints, I have pipelined the unit with 1 logic layer
just before the first FF barrier, then 5 logic layers and finally 2 layers
after the second FFs. My goal is to run with a safe margin at 100 MHz
(P&R said 114 MHz last time I tried).
However, I don't know what kind of speed and device occupation (LUTs?)
this design will give. Also, I'm open to comments, suggestions and
advice about the code (style, methods, etc.). [note: I have used concurrent
signal assignments wherever I could to ease manual netlist alterations.]
Finally, I only use Synplicity, and some warnings might appear or disappear
with other tools: only experience can tell.
And in my experience, trying to port code makes it more solid and useful.
Can somebody spend a few minutes downloading and trying the code
on Altera or Xilinx tools and chips?
Thanks in advance,
YG