Picking the best synthesis result before implementation

James07

Guest
Out of curiosity, I wrote a script to explore different options in the Vivado software (2014.4), especially the synthesis options under SYNTH_DESIGN, like FSM_extraction, MAX_BRAM etc. The script stops after synthesis, just enough to get the timing estimate. I explored everything except the directive, because it seems that if you use a directive, you cannot manually set the options.

My goal is to see if it will give me a better result before I move on to implementation. However, out of the 50 different results I see that a lot of the estimated worst slacks and timing scores are the same. About 40% of the results report the same values. I ran on 3 sample designs and it gave me the same thing.

So my question is, is there a way to differentiate what is a better synthesis result? What should I look at in the report?
 
On Thu, 30 Jul 2015 20:23:12 -0700, James07 wrote:

> Out of curiosity, I wrote a script to explore with different options in
> the Vivado software (2014.4), especially on the synthesis options under
> SYNTH_DESIGN, like FSM_extraction, MAX_BRAM etc. The script stops after
> synthesis, just enough to get the timing estimate. I explore everything
> except the directive because it seems like you use the directive, you
> cannot manually set the options

> My goal is to see if it will give me a better result before I move on to
> implementation. However, out of the 50 different results I see that a
> lot of the estimated worst slacks and timing scores are the same. About
> 40% of the results report the same values. I ran on 3 sample designs and
> it gave me the same thing.

> So my question is, is there a way to differentiate what is a better
> synthesis result? What should I look at in the report?

Did you also differentiate by resource usage? Same timing result and
lower usage would count as better, but sometimes different settings will,
after optimisation, yield the same result.
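A minimal sketch of that comparison, assuming the worst slack and resource counts of each run have already been extracted from the reports into plain records. The field names (`wns_ns`, `luts`, `ffs`), the run labels, and the numbers are hypothetical, not Vivado output:

```python
from typing import NamedTuple

class Run(NamedTuple):
    name: str      # hypothetical label for one option combination
    wns_ns: float  # estimated worst slack after synthesis, in ns
    luts: int      # LUTs used
    ffs: int       # flip-flops used

def rank_runs(runs):
    """Best first: higher slack wins, ties broken by lower LUT, then FF count."""
    return sorted(runs, key=lambda r: (-r.wns_ns, r.luts, r.ffs))

# Made-up numbers, for illustration only.
runs = [
    Run("fsm_one_hot", 0.412, 10250, 8100),
    Run("fsm_auto", 0.412, 10190, 8100),
    Run("max_bram_0", 0.105, 11800, 8350),
]
ranked = rank_runs(runs)
print([r.name for r in ranked])
```

Runs that tie on every field stay in their original order, which is one way the "identical" results show up as a block in the ranking.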

It's also worth trying ISE, with both the old and new VHDL parser (though
switching parsers is more likely to dance round bugs than improve synth
results).

While Vivado is relatively new, ISE has been heavily tuned across the
years and I wouldn't be surprised to find it sometimes gives better
results.

If you try it, I'd be interested to see your conclusions.

-- Brian
 
Brian Drummond wrote:
> On Thu, 30 Jul 2015 20:23:12 -0700, James07 wrote:

>> Out of curiosity, I wrote a script to explore with different options in
>> the Vivado software (2014.4), especially on the synthesis options under
>> SYNTH_DESIGN, like FSM_extraction, MAX_BRAM etc. The script stops after
>> synthesis, just enough to get the timing estimate. I explore everything
>> except the directive because it seems like you use the directive, you
>> cannot manually set the options

>> My goal is to see if it will give me a better result before I move on to
>> implementation. However, out of the 50 different results I see that a
>> lot of the estimated worst slacks and timing scores are the same. About
>> 40% of the results report the same values. I ran on 3 sample designs and
>> it gave me the same thing.

>> So my question is, is there a way to differentiate what is a better
>> synthesis result? What should I look at in the report?

> Did you also differentiate by resource usage? Same timing result and
> lower usage would count as better, but sometimes different settings will,
> after optimisation, yield the same result.

> It's also worth trying ISE, with both the old and new VHDL parser (though
> switching parsers is more likely to dance round bugs than improve synth
> results).

You can't use the old parser on 6 or 7 series parts. It's OK to
use the newer parser for older parts, but the use_new_parser
switch is ignored for 6 or 7 series. So in effect there's only
one XST implementation to try if you are using 7-series parts.
ISE does allow you to use SmartXplorer to investigate different
canned sets of options, though. I usually find that you need to
individually tune the settings to get the best results.

> While Vivado is relatively new, ISE has been heavily tuned across the
> years and I wouldn't be surprised to find it sometimes gives better
> results.

> If you try it, I'd be interested to see your conclusions.

> -- Brian

--
Gabor
 
On Friday, July 31, 2015 at 11:23:18 AM UTC+8, James07 wrote:
> Out of curiosity, I wrote a script to explore with different options in the Vivado software (2014.4), especially on the synthesis options under SYNTH_DESIGN, like FSM_extraction, MAX_BRAM etc. The script stops after synthesis, just enough to get the timing estimate. I explore everything except the directive because it seems like you use the directive, you cannot manually set the options

> My goal is to see if it will give me a better result before I move on to implementation. However, out of the 50 different results I see that a lot of the estimated worst slacks and timing scores are the same. About 40% of the results report the same values. I ran on 3 sample designs and it gave me the same thing.

> So my question is, is there a way to differentiate what is a better synthesis result? What should I look at in the report?

1. Lower area utilization with similar timing results would be considered good. However, it will be even better to take a look at the individual utilization of resources like LUTs, BRAM and DSP blocks. You may want to choose a synthesis result that allows you to add more features to your design in the future. Such features may require BRAM or DSP in different proportions. So, it might be good to see the synthesis results, especially area, with respect to expected feature changes in the future.

2. Power is another factor that you may consider when deciding which is a better synthesis result. If you have two synthesis results, where one uses a lot of LUTs while the other uses a lot of DSP blocks, it is very likely that the one with DSP blocks will dissipate less dynamic power. This is because DSP blocks are optimized hard IP blocks on the device.

3. Have you analysed your results with respect to pin assignment? If pin assignment is critical to how your FPGA will be placed on the board, you may want to see the synthesis results from that perspective. With no pin assignment constraint, the tool automatically assigns pins to the design. The pin assignment constraint is not applied by the tool during a "synthesis-only" run. But the default pin assignment and corresponding synthesis results can be analyzed with respect to your planned pin assignment.

4. If a large percentage of synthesis results give similar results, it also means that the tool is not finding many opportunities to perform various optimizations. It could be because your design is already very well architected or it could be that it needs to be re-architected if you are aiming for certain specific performance measures. As the designer, you know better which is the case with the design.
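One way to act on points 1 and 4 together is to keep only the runs that are not beaten on every metric by some other run (a Pareto filter). This is only a sketch: the dictionary keys and the numbers are hypothetical, and the metrics are assumed to have been extracted from the reports beforehand.

```python
def dominates(a, b):
    """True if run a is no worse than b on every metric and better on at least one."""
    no_worse = (a["wns_ns"] >= b["wns_ns"] and a["luts"] <= b["luts"]
                and a["bram"] <= b["bram"] and a["dsp"] <= b["dsp"])
    better = (a["wns_ns"] > b["wns_ns"] or a["luts"] < b["luts"]
              or a["bram"] < b["bram"] or a["dsp"] < b["dsp"])
    return no_worse and better

def pareto_front(runs):
    """Keep only the runs that no other run dominates."""
    return [r for r in runs if not any(dominates(o, r) for o in runs)]

# Made-up numbers, for illustration only.
runs = [
    {"name": "A", "wns_ns": 0.40, "luts": 10200, "bram": 24, "dsp": 12},
    {"name": "B", "wns_ns": 0.40, "luts": 9100, "bram": 24, "dsp": 30},
    {"name": "C", "wns_ns": 0.35, "luts": 10400, "bram": 26, "dsp": 12},
    {"name": "D", "wns_ns": 0.40, "luts": 10200, "bram": 24, "dsp": 12},
]
front = pareto_front(runs)
print([r["name"] for r in front])
```

Here A and B trade LUTs against DSP blocks, so both survive and the choice depends on which resource future features will need. C is dropped because A beats it on slack, LUTs and BRAM. D is an exact clone of A and survives with it, since neither dominates the other.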
 
On Friday, July 31, 2015 at 6:41:47 PM UTC+8, Brian Drummond wrote:
> Did you also differentiate by resource usage? Same timing result and
> lower usage would count as better, but sometimes different settings will,
> after optimisation, yield the same result.

As far as I can tell, the resource usage is almost the same. I am taking another look. At first glance, the 40% I mentioned look almost identical, which is partly why I can't tell these clone troopers apart.

> It's also worth trying ISE, with both the old and new VHDL parser (though
> switching parsers is more likely to dance round bugs than improve synth
> results).

> While Vivado is relatively new, ISE has been heavily tuned across the
> years and I wouldn't be surprised to find it sometimes gives better
> results.

> If you try it, I'd be interested to see your conclusions.

> -- Brian

Yes, I am intending to try it on ISE. The latest (and last!) ISE version, 14.7, works on one of the older V7 devices. I will try that and see what the result is, although I am not so sure if it gives estimated timing scores after synthesis. Need to look into it.
 
On Sunday, August 2, 2015 at 12:41:43 PM UTC+8, Sharad wrote:
> 1. Lower area utilization with similar timing results would be considered good. However, it will be even better to take a look at the individual utilization of resources like LUTs, BRAM and DSP blocks. You may want to choose a synthesis result that allows you to add more features to your design in the future. Such features may require BRAM or DSP in different proportions. So, it might be good to see the synthesis results, especially area, with respect to expected feature changes in the future.

> 2. Power is another factor that you may consider when deciding which is a better synthesis result. If you have two synthesis results, where one uses a lot of LUTs while the other uses a lot of DSP blocks, it is very likely that the one with DSP blocks will dissipate lesser dynamic power. This is because DSP blocks are optimized hard IP blocks on the device.

> 3. Have you analysed your results with respect to pin assignment? If pin assignment is critical to how your FPGA will be placed on the board, you may want to see the synthesis results with that perspective. Under no pin assignment constraint, the tool automatically assigns pins to the design. Pin assignment constraint is not applied by the tool during "synthesis-only" run. But the default pin assignment and corresponding synthesis results can be analyzed with respect to your planned pin assignment.

This is a good point. No, I haven't got to that step. Based on what I understand from the Vivado flow, that happens during the place_design phase. Hmm... so perhaps the next step is to take that 40% of the results, continue running them through the end of place_design, and check the timing estimates. I guess the later in the flow, the more accurate the estimate becomes.

> 4. If a large percentage of synthesis results give similar results, it also means that the tool is not finding many opportunities to perform various optimizations. It could be because your design is already very well architected or it could be that it needs to be re-architected if you are aiming for certain specific performance measures. As the designer, you know better which is the case with the design.

I wouldn't say it is already well-architected. Sometimes my hands are tied and I can't change the code. So I am exploring ways to work the tools to my advantage. Thanks for the helpful comments.
 
On 8/2/2015 6:56 AM, kt8128@gmail.com wrote:
> On Sunday, August 2, 2015 at 12:41:43 PM UTC+8, Sharad wrote:
>> 1. Lower area utilization with similar timing results would be
>> considered good. However, it will be even better to take a look at
>> the individual utilization of resources like LUTs, BRAM and DSP
>> blocks. You may want to choose a synthesis result that allows you
>> to add more features to your design in the future. Such features
>> may require BRAM or DSP in different proportions. So, it might be
>> good to see the synthesis results, especially area, with respect to
>> expected feature changes in the future.

>> 2. Power is another factor that you may consider when deciding
>> which is a better synthesis result. If you have two synthesis
>> results, where one uses a lot of LUTs while the other uses a lot of
>> DSP blocks, it is very likely that the one with DSP blocks will
>> dissipate lesser dynamic power. This is because DSP blocks are
>> optimized hard IP blocks on the device.

>> 3. Have you analysed your results with respect to pin assignment?
>> If pin assignment is critical to how your FPGA will be placed on
>> the board, you may want to see the synthesis results with that
>> perspective. Under no pin assignment constraint, the tool
>> automatically assigns pins to the design. Pin assignment constraint
>> is not applied by the tool during "synthesis-only" run. But the
>> default pin assignment and corresponding synthesis results can be
>> analyzed with respect to your planned pin assignment.

> This is a good point. No, I haven't got to that step. Based on what I
> understand from the Vivado flow, that happens during place_design
> phase. Hmm... so perhaps the next step is to take that 40% results
> and continue running them till end of place_design, and check out the
> timing estimates. I guess the later it is in the flow, the more
> accurate it becomes.

My experience is the timing numbers from synthesis are totally bogus.
You need to do a place and route if you want to compare timing data.
Even then you can get noticeable improvements in timing by running more
than one route with different settings. So the connection back to your
synthesis parameters is hard to explore without a lot of work. Using
one pass on place and route may show synthesis option A to be the best
by 4% but when you explore the routing options you may find synthesis
option B is now 7% better.

I think this problem space is very chaotic with small changes in initial
conditions giving large changes in results.

I worked on a project once where the timing analysis tools were broken
saying the project met timing when it didn't. The design would fail on
the bench until we hit it with cold spray. I tried using manual
placement to improve the routing, but everything I did to improve this
feature made some other feature worse or even unroutable.

We automated a process of tweaking the initial seed parameter to get
multiple runs each night. The next day we would test those runs on the
bench with a chip warmer. Eventually we found a good design and shipped
it. Ever since then I have treated the entire compile-place-route
process like an exploration of the Mandelbrot set.


>> 4. If a large percentage of synthesis results give similar results,
>> it also means that the tool is not finding many opportunities to
>> perform various optimizations. It could be because your design is
>> already very well architected or it could be that it needs to be
>> re-architected if you are aiming for certain specific performance
>> measures. As the designer, you know better which is the case with
>> the design.

> I wouldn't say it is already well-architected. Sometimes my hands are
> tied and I can't change the code. So I am exploring ways to work the
> tools to my advantage. Thanks for the helpful comments.

Is there a particular problem you are having with the results? Is the
design larger than you need? If you haven't done a place-route I guess
it can't be that it is too slow. If you are just trying to "optimize" I
suggest you don't bother and just move on to the place and route. See
what sorts of results you get before you spend time trying to optimize a
design that may be perfectly good.

There is a rule about optimization. It says *don't* unless you have to.
Optimizing for "this" can make it harder to get "that" working or at the
very least result in spending a lot of time on something that isn't
important in the end.

--

Rick
 
On Monday, August 3, 2015 at 12:59:14 AM UTC+8, rickman wrote:
> My experience is the timing numbers from synthesis are totally bogus.
> You need to do a place and route if you want to compare timing data.
> Even then you can get noticeable improvements in timing by running more
> than one routes with different settings. So the connection back to your
> synthesis parameters is hard to explore without a lot of work. Using
> one pass on place and route may show synthesis option A to be the best
> by 4% but when you explore the routing options you may find synthesis
> option B is now 7% better.

> I think this problem space is very chaotic with small changes in initial
> conditions giving large changes in results.

Yes, I understand that and have seen that myself. That is partly why I am struggling to define what a "good" synthesis result is, with meeting timing as the end goal. For example, let's say synthesis set "A" meets timing in 10% of runs across various P&R settings, while synthesis set "B" meets it in only 5%. *Something* has got to explain that difference.
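The comparison described here can be sketched as a pass rate per synthesis set, computed over many place-and-route runs. This assumes the post-route worst slack of each run has already been collected; the set labels and slack values below are made up for illustration.

```python
from collections import defaultdict

def pass_rates(results):
    """Map each synthesis set to the fraction of its P&R runs with non-negative slack."""
    met = defaultdict(int)
    total = defaultdict(int)
    for synth_set, wns_ns in results:
        total[synth_set] += 1
        if wns_ns >= 0.0:
            met[synth_set] += 1
    return {s: met[s] / total[s] for s in total}

# (synthesis set, post-route worst slack in ns); hypothetical values.
results = [
    ("A", -0.12), ("A", 0.03), ("A", -0.40), ("A", -0.05),
    ("B", -0.30), ("B", -0.22), ("B", -0.01), ("B", -0.15),
]
rates = pass_rates(results)
print(rates)
```

With only a handful of runs per set, a gap like 10% versus 5% may be noise, so the number of P&R runs behind each rate matters as much as the rate itself.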

> I worked on a project once where the timing analysis tools were broken
> saying the project met timing when it didn't. The design would fail on
> the bench until we hit it with cold spray.

This is hilarious!


> Is there a particular problem you are having with the results? Is the
> design larger than you need? If you haven't done a place-route I guess
> it can't be that it is too slow. If you are just trying to "optimize" I
> suggest you don't bother and just move on to the place and route. See
> what sorts of results you get before you spend time trying to optimize a
> design that may be perfectly good.

I have done place-route a couple of times and it takes around 8 hours. (1 hour for synthesis) I tried different directives as well and it gave me a variety of results.

I understand how I am approaching this may not be practical in the grand scheme of things. BUT I got curious when I read in the V design methodology that if you get -300ps after post-synthesis, you can definitely meet timing. I also vaguely remember an illustration showing synthesis has a 10x effect on end results. I wonder how and who did these estimations.

> There is a rule about optimization. It says *don't* unless you have to.
> Optimizing for "this" can make it harder to get "that" working or at
> very least result in spending a lot of time on something that isn't
> important in the end.

> --
> Rick
 
On 8/2/2015 10:14 PM, kt8128@gmail.com wrote:
> On Monday, August 3, 2015 at 12:59:14 AM UTC+8, rickman wrote:

>> My experience is the timing numbers from synthesis are totally
>> bogus. You need to do a place and route if you want to compare
>> timing data. Even then you can get noticeable improvements in
>> timing by running more than one routes with different settings. So
>> the connection back to your synthesis parameters is hard to explore
>> without a lot of work. Using one pass on place and route may show
>> synthesis option A to be the best by 4% but when you explore the
>> routing options you may find synthesis option B is now 7% better.

>> I think this problem space is very chaotic with small changes in
>> initial conditions giving large changes in results.

> Yes, I understand that and have seen that myself. Part of it is why I
> am struggling to qualify what is a "good" synthesize result, with
> meeting timing as the end goal. For example, let say "A" synthesis
> set has 10% of meeting timing with various P&R settings. "B"
> synthesis set has only 5%. *Something* has got to be that
> difference.

I think there is little about your synthesis result that can be easily
measured in a meaningful way to predict the timing result of routing.
That is what I mean about it being "chaotic". It is much like
predicting the weather more than a week out. You can see general
trends, but hard to predict any details with any accuracy. So the
weather man just doesn't try.

In FPGAs the synthesis result has no insight into routing so they just
measure the logic delays and then add a standard factor for routing.
Routing can be impacted by the logic partitioning in ways that are hard
to predict. I'd be willing to speculate it is a bit like the way they
proved in general the task of predicting the run time of a computer
algorithm will take as much run time as the algorithm itself. So the
best way to estimate run time is to run the task. Best way to estimate
routing result is to run routing. Routing is often half the total path
time, so without good info on that there is no decent guess to timing.
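A toy model of that argument, and not a description of how any vendor tool actually computes its estimate: if the post-synthesis number is just logic delay plus a fixed routing allowance, two netlists with the same logic delay get the same estimated slack, even though their routed slack can differ. All names and numbers below are hypothetical.

```python
def estimated_slack_ns(period_ns, logic_delay_ns, routing_factor=1.0):
    """Toy post-synthesis estimate: routing is guessed as a fixed multiple of logic delay.

    routing_factor=1.0 corresponds to assuming routing is half the total path delay.
    """
    return period_ns - logic_delay_ns * (1.0 + routing_factor)

def routed_slack_ns(period_ns, logic_delay_ns, routing_delay_ns):
    """Slack once the real routing delay of the path is known."""
    return period_ns - (logic_delay_ns + routing_delay_ns)

period = 5.0  # clock period in ns
logic = 2.2   # logic delay on the critical path in ns, same for both netlists

# Two netlists with identical logic delay get identical estimates...
est_a = estimated_slack_ns(period, logic)
est_b = estimated_slack_ns(period, logic)

# ...but different routing delays after place and route.
routed_a = routed_slack_ns(period, logic, routing_delay_ns=2.0)
routed_b = routed_slack_ns(period, logic, routing_delay_ns=3.1)

print(est_a == est_b, routed_a > 0, routed_b > 0)
```

Under this model, identical post-synthesis slack across many option sets says little about which one will route best.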


>> I worked on a project once where the timing analysis tools were
>> broken saying the project met timing when it didn't. The design
>> would fail on the bench until we hit it with cold spray.

> This is hilarious!

This was also some time ago, using the Altera Max+II tools when Quartus
was the "current" tool. Trouble was Altera didn't support the older
devices with the new Quartus tool. We were adding features to an
existing product so we didn't have the luxury of using the new tools
with new parts. Eventually they relented and did support the older
parts with Quartus, but it was well after our project was done. I
expect we weren't the only customer to want support for older products.


>> Is there a particular problem you are having with the results? Is
>> the design larger than you need? If you haven't done a place-route
>> I guess it can't be that it is too slow. If you are just trying to
>> "optimize" I suggest you don't bother and just move on to the place
>> and route. See what sorts of results you get before you spend time
>> trying to optimize a design that may be perfectly good.

> I have done place-route a couple of times and it takes around 8
> hours. (1 hour for synthesis) I tried different directives as well
> and it gave me a variety of results.

Must be a large project. The project we were on would load up multiple
runs on many CPUs overnight. This would give us many trials to sort
through the next day. Best if this is done on a design that has passed
all logic checks and even runs in the board with a reduced clock or cold
spray.


> I understand how I am approaching this may not be practical in the
> grand scheme of things. BUT I got curious when I read in the V design
> methodology that if you get -300ps after post-synthesis, you can
> definitely meet timing. I also vaguely remember an illustration
> showing synthesis has a 10x effect on end results. I wonder how and
> who did these estimations.

I'm not sure what a "10x effect" means. But sure, a bad synthesis will
give you a bad timing result. On large projects it is hard to deal with
timing issues sometimes. You might try breaking the project down to
smaller pieces to see if they will meet timing separately. Perhaps you
will find a given module that is a problem and you can focus on code
changes to improve the synthesis? I don't think you can do tons just
using tweaks to tool parameters.

Are your modules partitioned in a way that lets each one be checked for
timing without lots of paths that cross?

--

Rick
 
On Sun, 02 Aug 2015 03:28:07 -0700, kt8128 wrote:

> On Friday, July 31, 2015 at 6:41:47 PM UTC+8, Brian Drummond wrote:

>> While Vivado is relatively new, ISE has been heavily tuned across the
>> years and I wouldn't be surprised to find it sometimes gives better
>> results.

> Yes, I am intending to try it on ISE. The latest (and last!) ISE version
> 14.7 works on one of the older V7 devices. I will try that and see what
> is the result, although I am not so sure if it gives estimated timing
> scores after synthesis. Need to look into it.

It does. If you can't see what you want in the summary, read the .syr
(Synth report) file.

-- Brian
 
On Thu, 30 Jul 2015 20:23:12 -0700, James07 wrote:

> Out of curiosity, I wrote a script to explore with different options in
> the Vivado software (2014.4), especially on the synthesis options under
> SYNTH_DESIGN, like FSM_extraction, MAX_BRAM etc. The script stops after
> synthesis, just enough to get the timing estimate. I explore everything
> except the directive because it seems like you use the directive, you
> cannot manually set the options

> My goal is to see if it will give me a better result before I move on to
> implementation. However, out of the 50 different results I see that a
> lot of the estimated worst slacks and timing scores are the same. About
> 40% of the results report the same values. I ran on 3 sample designs and
> it gave me the same thing.

> So my question is, is there a way to differentiate what is a better
> synthesis result? What should I look at in the report?

It is possible that you are giving the tools test cases that are too
simple. Try giving them something complicated, like big designs with a
great deal of interconnectivity that also need to be fast, and see how
they fare then.
 
