Now, back to the original topic at hand, before Stuart and Ales so
rudely created a gschem and gEDA turf war over schematics not being
part of PCB's charter -- I was unaware that gEDA had the right to
dictate design to the PCB project, which existed long before gEDA. I'll
let DJ, Harry, Stuart, and Ales work their turf war out ... leave me out
of it, please.
I strongly considered adding schematics into PCB some four years back,
while Harry was still maintaining it himself out of JHU. When Harry
dropped off the face of the earth for a while, I even considered
starting a PCB project at sf.net, until one day I saw Harry had created
one. Harry believed strongly that schematics do not belong in PCB,
and as chief maintainer of the sf.net project, that was his choice.
Before starting FpgaC on sf.net, I was strongly tempted to pick up and
continue to support an Xaw version, and add schematics in, as an sf.net
project called PCB2 ... fork the project, since Harry and I have very
different goals and expectations about UIs and the types of designs
PCB should support/produce. I asked very clearly on the PCB user forum
whether Xaw was dead, trying to get a clear idea of whether the project
would remain only a crippled Gtk version ... and got no answer. I was
actually surprised to find DJ had done a Lesstif variation, and had
deviated strongly (forked) from the old/Gtk UI.
In the end I decided I would be more useful digging TMCC out of the
grave, and bringing it forward a decade to be useful with today's FPGA
products.
TMCC/FpgaC suffers badly from the same working set problems I described
for PCB. Very small changes in a project's code can push FpgaC compile
times from a few minutes to hours ... and in one case from 45 minutes
to over a day and a half, simply by exceeding the working set size of
the L2 cache. Interestingly enough, the same C code does the same thing
to GCC, at a slightly different boundary point.
Student and other toy projects frequently contain simple algorithms
that are fine inside a typical processor's L2 cache these days ... but
when the data set grows just slightly, they fail horribly,
performance-wise. In this case, linearly searching a linked list works
fine up to about 90-95% of the L2 cache size. When you exceed that
threshold, performance drops and run times increase roughly 10X or
more, because of the nature of LRU or pseudo-LRU cache replacement
policies.
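This cliff is easy to see on a given machine with a toy benchmark: walk
a linked list repeatedly while its total footprint is swept past the L2
size, and the time per node visit jumps once the list no longer fits.
The sketch below is only an illustration (the node size, footprints, and
visit counts are arbitrary choices, not FpgaC's real structures):

/* Sketch only: time repeated sequential walks of a linked list as its
 * total footprint sweeps past a typical L2 size.  Node size, footprints,
 * and visit counts are arbitrary illustrations, not FpgaC's structures. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

struct node { struct node *next; char pad[56]; };   /* roughly 64 bytes */

static double walk_seconds(struct node *head, long passes)
{
    struct timespec t0, t1;
    volatile long touched = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long p = 0; p < passes; p++)
        for (struct node *n = head; n; n = n->next)
            touched += n->pad[0];
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
}

int main(void)
{
    for (long kb = 64; kb <= 2048; kb *= 2) {       /* 64K .. 2M footprints */
        long count = kb * 1024 / (long)sizeof(struct node);
        long passes = 50000000L / count;            /* ~50M node visits each */
        struct node *head = NULL;
        for (long i = 0; i < count; i++) {
            struct node *n = malloc(sizeof *n);
            n->pad[0] = (char)i;
            n->next = head;
            head = n;
        }
        printf("%5ldK footprint: %.3f s for ~50M node visits\n",
               kb, walk_seconds(head, passes));
        while (head) { struct node *d = head; head = head->next; free(d); }
    }
    return 0;
}

On most machines the per-visit time stays roughly flat until the
footprint passes the L2 size, then jumps sharply.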
Consider, for example, a small cache of 4 "bins" of marbles taken from a
bowl of 300 marbles. If we first reference a certain red marble, it's
taken from the bowl and placed in a cache bin after searching the 300
marbles for it. We keep using and replacing the red marble, avoiding
the search of the bowl. Later we also use a green, blue, and yellow
marble, which take the three remaining bins in the cache. Because of
the nature of the task, we always use red, green, blue, and yellow in
that order, always taking from the cache, and replacing in the cache.
When our working set expands to five marbles, we have a cache failure,
which goes like this. We access the red, green, blue, and yellow marbles
in order from the cache, then we need a white marble. The red marble is
least recently used, so it's removed from the cache and replaced with
the white marble. We then repeat our cycle, next needing the red marble,
which is no longer in the cache, so we must fetch it from the bowl and,
due to the LRU algorithm, replace the green marble with the red
marble. However, next we need the green marble, which forces the blue out
of the cache. Next we need the blue marble, forcing the yellow out of
the cache. Next we need the yellow, forcing the white out of the cache
... and so on, with every cache reference faulting, requiring a lengthy
access and search of the bowl.
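The marble analogy is simple enough to simulate directly. A minimal
sketch of a 4-slot LRU cache, accessed cyclically, shows the behavior:
with 4 distinct items nearly every access hits after warm-up, while
with 5 items every access misses (this is a toy model, not real cache
hardware):

/* Toy simulation of the marble example: a 4-slot LRU cache accessed
 * cyclically.  With 4 distinct items every access hits after warm-up;
 * with 5, LRU evicts exactly the item needed next, so every access misses. */
#include <stdio.h>

#define SLOTS 4

static int cache[SLOTS];      /* item id held in each slot, -1 = empty */
static int age[SLOTS];        /* larger = more recently used */
static int now;               /* global "time" for recency stamps */

static int access_item(int item)
{
    int lru = 0;
    for (int i = 0; i < SLOTS; i++) {
        if (cache[i] == item) {          /* hit: refresh recency and return */
            age[i] = ++now;
            return 1;
        }
        if (age[i] < age[lru])           /* remember least recently used slot */
            lru = i;
    }
    cache[lru] = item;                   /* miss: evict LRU, load new item */
    age[lru] = ++now;
    return 0;
}

static void run(int distinct, int rounds)
{
    int hits = 0;
    for (int i = 0; i < SLOTS; i++) { cache[i] = -1; age[i] = 0; }
    for (int r = 0; r < rounds; r++)
        for (int item = 0; item < distinct; item++)
            hits += access_item(item);
    printf("%d distinct items: %d hits of %d accesses\n",
           distinct, hits, rounds * distinct);
}

int main(void)
{
    run(4, 100);   /* fits the cache: 396 hits of 400 (only the cold misses) */
    run(5, 100);   /* one item too many: 0 hits of 500 */
    return 0;
}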
LRU algorithms fail horribly with sequential searches of the cached
working set, resulting in a very sharp reduction in performance as the
working set is exceeded. In FpgaC's case, the primary data structures
are linked lists, which are frequently searched completely to verify the
lack of duplicate entries when a new item is created. When the working
set under these linked lists exceeds the processor's L2 cache size, run
times jump by more than a factor of 10 on many machines these days ...
the ratio of L2 cache performance to memory performance. Thus,
depending on the host processor's L2/L3 cache size, there are critical
points for FpgaC where the run time to compile an incrementally small
increase in program size jumps dramatically. The fix for this is
relatively simple, and will occur soon: replace the linear searches
with tree or hash searches, so that the entire working set is no longer
referenced and the LRU replacement failure mode is never invoked.
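As a sketch of what that fix looks like (the names and sizes here are
hypothetical, not FpgaC's actual symbol-table code), a hashed duplicate
check walks only one short bucket chain instead of the whole list, so
each lookup touches a tiny fraction of the working set:

/* Sketch of replacing the full linear scan with a hashed duplicate check.
 * Names, sizes, and the table itself are illustrative assumptions, not
 * FpgaC's actual symbol-table code. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 4096

struct sym {
    struct sym *next;                  /* chain within one bucket */
    char *name;
};

static struct sym *buckets[NBUCKETS];

static unsigned hash_name(const char *s)
{
    unsigned h = 5381;                 /* djb2 string hash */
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Return 1 if name was newly inserted, 0 if it already existed.  Only one
 * short bucket chain is walked, so the per-lookup working set stays small
 * even when the whole table is far larger than L2. */
static int insert_unique(const char *name)
{
    unsigned h = hash_name(name);
    for (struct sym *s = buckets[h]; s; s = s->next)
        if (strcmp(s->name, name) == 0)
            return 0;                  /* duplicate: nothing added */
    struct sym *s = malloc(sizeof *s);
    s->name = strdup(name);
    s->next = buckets[h];
    buckets[h] = s;
    return 1;
}

int main(void)
{
    printf("%d\n", insert_unique("net_clk"));   /* 1: new entry */
    printf("%d\n", insert_unique("net_clk"));   /* 0: duplicate found */
    return 0;
}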
Similar problems exist at several levels in the responsiveness of PCB.
Any event which forces a search of the design space will require the
working set to hold all the objects that must be searched. When that
working set grows past the various cache sizes, noticeable increases in
latency will result, to the point that they are visible in the UI ...
that point will vary depending on the particular machine being used
(L1/L2/L3 cache sizes, and the relative latency of a reference that
faults). Developers who only use and test on fast processors, with
large caches and fast native memory, will not notice the extremely
jerky performance that someone using a P2 333 Celeron (128K cache)
with a 66MHz processor bus and fast page mode memory will encounter.
Faster 4GHz processors with 512K caches will fail equally noticeably,
with designs some 4-10 times larger.
Certain operations will fail harder: those which invoke a series of X
output, as they also incur the memory overhead of part of the kernel,
some shared libraries, the X server, and the display driver in the
"working set" for those operations. While a 512K cache is four times
larger, the available working set is the cache size minus the X working
set, meaning that for small cache sizes there might not be much working
set left at all, while doubling the cache size may actually increase the
usable working set by 10X or more (for example, if the X side of those
operations occupies roughly 112K, a 128K cache leaves only about 16K for
PCB's own data, while a 256K cache leaves about 144K, nearly 10X as
much).
Just taking a guess, PCB + kernel + Xaw + X server probably has a
working set somewhere around or slightly larger than 128K for very
basic PCB tasks. Thus we will see cache flushing and reloading between
X calls, with locally OK cache performance at both ends. As the L2/L3
cache grows to 512K this is probably less of a problem.
What does become a problem is when the PCB<==>X working set gets
continually flushed by every call, such as making a long series of
little calls to the X server, faulting all the way to the X server and
faulting all the way back ... calling performance drops like a rock ...
a factor of 10 or more. This happens when the task at either, or both,
of the PCB or X server ends requires a slightly larger working set,
pushing the total working set for LRU into the worst-case failure mode.
I suspect that the Gtk failure modes do this, by including the Gtk
overhead in the working set, such that every PCB-to-Gtk-to-X-server
call faults round trip, and runs at native memory performance. The
reason I believe this is that in my testing a 550MHz PIII machine with
SDRAM is only about twice as slow as a 2GHz P4 machine with DDR SDRAM
in this failure mode ... rather than the 4-6X normal computation
difference when running at CPU speed from L1 cache, or even L2 cache.
With synchronous calls to Gtk and the X server, it's difficult for PCB
to keep its event processing in real time.
I have a several-day class I used to teach regularly that discusses in
detail designing for the hysteresis problems that occur with step
discontinuities in the processor load vs. throughput function, which is
quite useful for recognizing and designing architectural solutions to
problems of this class.
So ... application architecture with respect to working set sizes is a
critical performance issue. Algorithm choices which conserve working
set, and avoid sequential LRU faulting, are a critical design issue.
And carefully managing data set representation for compactness, even at
the cost of a number of CPU cycles for packing/unpacking, can greatly
push off the working set failure threshold with careful design (a small
packing sketch follows below). Consolidating processes to minimize the
frequency and location of long-working-set calls to external processes
(including them as threads if necessary) is critical, so that they
align with places in the UI interaction where latency hiding is
transparent, rather than highly visible.
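As a purely illustrative example of trading cycles for working set (the
field widths and names below are hypothetical, not PCB's actual data
representation), two coordinates and a flag byte known to fit in
20+20+8 bits can be packed into a single 64-bit word instead of a
12-byte struct; unpacking costs a few shifts and masks per access, but
more objects then fit in the same cache:

/* Illustrative only: trading a few shifts and masks per access for a
 * smaller per-object footprint.  Field widths and names are hypothetical,
 * not PCB's actual data representation. */
#include <stdint.h>
#include <stdio.h>

/* Unpacked: two 32-bit coordinates plus 32-bit flags = 12 bytes/object. */
struct point_unpacked { int32_t x, y; uint32_t flags; };

/* Packed: 20-bit x, 20-bit y, 8-bit flags in one 64-bit word = 8 bytes,
 * one third less per object, so 50% more objects fit in the same cache. */
typedef uint64_t point_packed;

static point_packed pack(uint32_t x, uint32_t y, uint32_t flags)
{
    return ((uint64_t)(x & 0xFFFFF))
         | ((uint64_t)(y & 0xFFFFF) << 20)
         | ((uint64_t)(flags & 0xFF) << 40);
}

static void unpack(point_packed p, uint32_t *x, uint32_t *y, uint32_t *flags)
{
    *x     = (uint32_t)(p & 0xFFFFF);          /* a handful of shift/mask   */
    *y     = (uint32_t)((p >> 20) & 0xFFFFF);  /* cycles per access, in     */
    *flags = (uint32_t)((p >> 40) & 0xFF);     /* exchange for fitting more */
}                                              /* objects in the cache      */

int main(void)
{
    uint32_t x, y, f;
    point_packed p = pack(123456, 654321, 0x5A);
    unpack(p, &x, &y, &f);
    printf("x=%u y=%u flags=0x%X  (%zu vs %zu bytes per object)\n",
           x, y, f, sizeof(point_packed), sizeof(struct point_unpacked));
    return 0;
}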
fpga_toys@yahoo.com wrote:
DJ Delorie wrote:
fpga_toys@yahoo.com writes:
It really needs to be the same tool,
Or at least *seem* like it's the same tool. Otherwise, I agree.
On larger designs, memory is being pushed to maintain the lists and
objects already instantiated. Paging severely cuts into performance.
When running as a separate application, there is substantial page
replication introduced for every data page across a long list of shared
library instances, plus replication of the netlists. Likewise,
performance is critically tied to working set; having a second
application running concurrently with an equally large working set will
provoke substantial cache thrashing, which will show up as
memory-latency-induced jerkiness in the UI as the cache is flushed out
and reloaded between contexts. While these may seem like parameters of
the application architecture that can be ignored, perceived UI
performance is heavily dependent on them. Similarly, communication
between separate applications results in context switches, which cause
additional cache thrashing by including large sections of the kernel in
the working set. Consider that the processor is some 20-100 times
faster than L2/L3 cache these days, and the cache is frequently another
10-50 times or more faster than memory. Exceeding cache working sets
effectively turns the machine into a 50MHz processor again.
There are substantial performance reasons suggesting that it should be
the same application (just a different thread at most) to conserve
memory resources and improve performance. While they may not be
critical for toy student projects, for many real-life projects, which
are much larger, they become critical UI problems. The sample
ProofOfConcept design I sent you is about 1/5 the size of several
production designs I have done using PCB.
When the typical desktop CPU comes standard with 10MB or more of L2
cache, these issues might go away. Last time I checked, that was only
available on high-end Itanium processors, well outside the reach of
most mortals in cost (or me, right now).