Efficient handling of flat data in SKILL

layoutDesign

I am working with flat data (because the XL connectivity down the
hierarchy is not there). More specifically, I am working with
cv~>shapes (where cv is the current cellView). I want to create a list
of a subset of those shapes, keeping only the shapes that are rectangles
on a particular layerName. Here is a snippet of my current code.

shapes = cv~>shapes
foreach(shape shapes
  if( shape~>objType == "rect" && shape~>layerName == "VIA2"
      newShapeList = cons(shape newShapeList)
  )
)

This is taking way too long. Maybe I could cut my time in half
by iterating/checking through cv~>shapes once? I suspect that
creating this massive "shapes" list is a big problem. Any other tips
on this situation and/or on dealing with flat/massive
amounts of data would be great. Thanks :)
 
On Feb 5, 5:47 am, layoutDesign <ford...@gmail.com> wrote:
<snip>

Hi,

newShapeList = setof(shape cv~>shapes
    shape~>objType == "rect" && shape~>layerName == "VIA2")

This should be somewhat faster than the foreach loop.
If the run time is still too long, you might have to look into an
ITKDB-based C-level implementation.
Another possible improvement: of the two conditions, put the one
that is more likely to fail first.
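
As a rough sketch of that reordering: && in SKILL stops evaluating as soon as one operand is nil, so the test placed first is the only one evaluated for shapes that fail it. The version below assumes that most shapes in the cellview are on layers other than VIA2, so the layer test rejects more shapes and goes first. If your data is mostly non-rectangles, the original order is the better one.

```skill
; Assumption: the layerName test fails for most shapes, so it goes first.
; && short-circuits, so objType is only read for shapes already on VIA2.
newShapeList = setof(shape cv~>shapes
    shape~>layerName == "VIA2" && shape~>objType == "rect")
```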

Regards,
Suresh
 
Suresh Jeevanandam wrote, on 02/05/09 06:43:
<snip>

In many cases the difference between using setof and a foreach which
conses a list is pretty small. I just did a quick test on a database with 250000
shapes, and the foreach method took 0.54s versus 0.47s for the setof version.
This is _after_ I'd solved the major speed problem which is the amount of time
it spends allocating the list cells and dbobjects. With my example, the return
list was something like 17000 elements, to give a feel for how much work it was
having to do inside the if in the first function.

The best approach is generally to use the SKILL profiler (in the SKILL
Development Toolbox). With this you can see the amount of time it spends in each
function and the critical path (hit the Tree icon first). You can also profile
memory, rather than time.

In this example, it took 15 seconds in gc (the garbage collector) initially. The
second run is much quicker, because it spends less time having to allocate
blocks of space (note, I'm talking IC5141 here; there have been a number of
improvements in IC61 in this area, and we also allocate bigger chunks to start
off with, since machines don't have tiddly amounts of RAM any more, so assuming
you only have a few Kbytes is a bit silly).

gc is called both when there is garbage to be cleaned, and also when you need
more memory - the first thing it does is to try to collect garbage, rather than
allocate more space.

What I did here (and this was a little overkill) was to use the needNCells()
function to pre-allocate more list cells and dbobjects:

needNCells('list 1000000)
needNCells('dbobject 1000000)

(a million is too many, but didn't do any real harm).

You can use the gcsummary() function to see what has been allocated for each type.

With the needNCells() call first, the run time goes from 15.5 seconds down to
0.54 seconds - as I mentioned above.
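
Put together, the sequence might look like the sketch below. It assumes cv is already bound to the cellview, and the cell counts are placeholders for illustration, not recommended values; size them from what gcsummary() reports in your own session.

```skill
; Hypothetical counts: choose them from your own gcsummary() output.
needNCells('list 300000)      ; pre-allocate list cells
needNCells('dbobject 300000)  ; pre-allocate dbobject cells

; Build the filtered list as before, reading cv~>shapes only once.
newShapeList = nil
foreach(shape cv~>shapes
  when(shape~>objType == "rect" && shape~>layerName == "VIA2"
    newShapeList = cons(shape newShapeList)
  )
)

gcsummary()   ; check how much of each type ended up allocated
```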

In general you want to avoid evaluating things like cv~>shapes over and over
again, because each time it has to rebuild the list.

The setof approach may help, but sometimes iterating over the list once and
having a condition inside the loop is better than iterating multiple times - for
example, if I were building three lists (e.g. Via1 rects, Via2 rects and Via3
rects), I would need three setof calls, or one foreach.
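
A minimal sketch of the single-pass version for that three-list case. The procedure name MYcollectViaRects and the layer names "VIA1", "VIA2" and "VIA3" are assumptions for illustration; substitute whatever your technology uses.

```skill
; Hypothetical helper: one pass over the shapes, three result lists.
procedure(MYcollectViaRects(cv)
  let((via1Rects via2Rects via3Rects)
    ; cv~>shapes is evaluated once, so the shape list is built only once
    foreach(shape cv~>shapes
      when(shape~>objType == "rect"
        ; case compares with equal, so string keys work
        case(shape~>layerName
          ("VIA1" via1Rects = cons(shape via1Rects))
          ("VIA2" via2Rects = cons(shape via2Rects))
          ("VIA3" via3Rects = cons(shape via3Rects))
        )
      )
    )
    ; return the three lists in layer order
    list(via1Rects via2Rects via3Rects)
  )
)

; Usage: pick out the VIA2 rectangles from the result
vias = MYcollectViaRects(cv)
via2Rects = cadr(vias)
```

With three setof calls the shape list would be walked three times; here each shape's objType and layerName are read at most once.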

As I said, the best thing is to profile, because guessing where the bottleneck
is, is not always that easy!

Regards,

Andrew.
 
Suresh Jeevanandam wrote, on 02/05/09 06:43:
<snip>
newShapeList = setof(shape cv~>shapes shape~>objType == "rect" &&
shape~>layerName == "VIA2")

Thanks Suresh for the feedback.

On Feb 5, 4:24 pm, Andrew Beckett <andr...@DcEaLdEeTnEcTe.HcIoSm>
wrote:
<snip>
The best approach is generally to use the SKILL profiler (in the SKILL
Development Toolbox). With this you can see the amount of time it spends in each
function, the critical path (hit the Tree icon first). You can also profile
memory, rather than time.

Wow! This is very nice, and much more accurate than the digital
minute hand I was using.

<snip>
The setof approach may help, but sometimes iterating over the list once and
having a condition inside the loop is better than iterating multiple times - for
example, if I were building three lists (e.g. Via1 rects, Via2 rects and Via3
rects), I would need three setof calls, or one foreach.

I suspected this list thing was killing me on time. I see your point,
and the profiler I just ran confirms it. Iterating through once and
using "conditionals" is going to cut down the time
significantly. Thanks Andrew for the new tip on using the profiler, this is
great, and also for the suggestion. Much appreciated.
 
