JustJohn
On Mar 6, 11:30 pm, JustJohn <justjohna...@gmail.com> wrote:
On Mar 1, 8:41 am, Peter wrote:
Andy, I completely agree with what you have written above.
One should strive for a maintainable and understandable version.
However, in my particular case, I have to find a good solution
in terms of LUT resources, because I need 8 instances of
ones counters with 64-bit input data. And the device is getting full...
Peter,
It looks like you've solved your problem simply by moving to a
better synthesis tool, so this may not be of interest anymore.
However, in addition to space, finding a more compact implementation
often leads to a speed increase as well, AND power savings.
Additionally, I like this counting bits problem; it turns up often
enough that it deserves some attention. As to maintainability, nothing
promotes that more than good comments. Once the function is written, it
can be stuffed away in a package and never dealt with again (if anyone
copies the code, please include credit).
See my other post for the most compact/fastest way to implement the 32-bit
sum using 4-LUTs and taking advantage of the carry logic fabric.
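John's other post is not reproduced here, so the following is only a generic structure, not his code. One common way to let the carry chain do the work is a binary tree of adders: pair up the input bits, then pair up the 2-bit sums, and so on. Each level is a row of small ripple-carry adders. The Python sketch below (the function name is hypothetical) computes the sum and records the number and width of the operands at each level:

```python
def adder_tree_count(value: int, width: int = 32):
    """Sum the bits of `value` with a binary tree of adders.

    Returns (total, levels), where each levels entry is
    (number of operands, operand width in bits) at that tree level.
    """
    value &= (1 << width) - 1
    operands = [(value >> i) & 1 for i in range(width)]  # level 0: single bits
    op_width = 1
    levels = []
    while len(operands) > 1:
        levels.append((len(operands), op_width))
        if len(operands) % 2:
            operands.append(0)  # pad an odd operand count with a zero
        operands = [operands[i] + operands[i + 1]
                    for i in range(0, len(operands), 2)]
        op_width += 1  # adding two n-bit values needs n+1 bits
    return operands[0], levels


total, levels = adder_tree_count(0xFFFF_FFFF)
print(total)   # 32
print(levels)  # [(32, 1), (16, 2), (8, 3), (4, 4), (2, 5)]
```

This is a behavioral model of the structure only. It does not predict LUT or slice counts, which depend on how the synthesizer packs each adder onto the fabric.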
Gabor's post of the 35-LUT figure for 6-LUTs got me looking at
that case. Here are the results (Spartan 3 for the 4-LUTs, Spartan 6
for the 6-LUTs, XST for both; I'd be curious whether any other synthesis
tool does better). Synthesizers continually improve, but nothing beats
a good look at the problem, as the 6-LUT case illustrates with a
better-than-2:1 savings:
vec32_sum2: 4-input LUTs: 53 Slices: 31
vec32_sum3: 6-input LUTs: 15 Slices: 4
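The vec32_sum3 source is not in this post, so the following reading of the 6-LUT structure is an assumption. It is based on the "leaf ROMs" mentioned in the follow-up. In this reading, each 6-bit slice of the input addresses a small leaf ROM that holds that slice's ones count (0 to 6, so 3 bits), and the partial counts are then added. A Python model of that idea, with hypothetical names:

```python
# Leaf ROM: entry i holds the number of ones in the 6-bit address i (0..6).
LEAF_ROM = [bin(i).count("1") for i in range(64)]


def leaf_rom_count(value: int, width: int = 32) -> int:
    """Ones count built from 6-bit leaf ROM lookups plus a final sum."""
    value &= (1 << width) - 1         # keep only the input vector's bits
    partials = []
    for lsb in range(0, width, 6):    # slices start at bits 0, 6, 12, ...
        addr = (value >> lsb) & 0x3F  # the last slice is zero-padded
        partials.append(LEAF_ROM[addr])
    # In hardware the partial counts would feed an adder tree; here we just sum.
    return sum(partials)


print(len(range(0, 32, 6)))        # 6 leaves for 32 bits (5 full, 1 partial)
print(max(LEAF_ROM).bit_length())  # 3 output bits per leaf
print(leaf_rom_count(0xFFFF_FFFF)) # 32
```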
Finally, this is a neat problem because it's nice to make the little
things count.
Best regards all,
John L. Smith
Chagrined OOPS! The synthesizer was throwing the leaf ROMs into BRAMs
for the 6-LUT case. Actual numbers for the 6-LUT case without BRAMs
are (and this is not a synthesis benchmark; it is an illustration of
efficient circuit structure expressed via coding style to reduce LUT
usage, applicable across all synthesis tools. Thanks for the warning,
Philippe, and BTW I AGREE, EULAs forbidding benchmarks are evil):
vec32_sum3: 6-input LUTs: 30 Slices: 4
Still beats 35, although some may consider the code too complex for
such a marginal improvement.
Regards,
John
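A 6-bit ones-count leaf is a 64 x 3 ROM, which can be built from either block RAM or LUTs. Each of its three output bits depends on the same 6 address bits, so each one fits in a single 6-input LUT. The following minimal Python sketch (the function name is hypothetical) splits the ROM into one 64-bit truth table per output bit. It assumes a packing in which bit i of each word holds the output for address i:

```python
def leaf_truth_tables(leaf_bits: int = 6) -> list:
    """Return one truth-table word per output bit of a ones-count leaf ROM."""
    entries = 1 << leaf_bits
    out_bits = leaf_bits.bit_length()  # counts run from 0 to leaf_bits
    tables = [0] * out_bits
    for addr in range(entries):
        count = bin(addr).count("1")
        for b in range(out_bits):
            if (count >> b) & 1:
                tables[b] |= 1 << addr  # set the entry for this address
    return tables


for b, t in enumerate(leaf_truth_tables()):
    print(f"output bit {b}: {t:016X}")
```

Output bit 0 is simply the parity of the six inputs.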