Branch prediction using counters.

Skybuck Flying wrote:
Hi,

Some time ago I had this idea of inserting counters into if statements
to predict which branch is most likely to be executed. At the lower
levels this would mean inserting counters into jump instructions and
recording which branch is executed the most by incrementing, or maybe
even decrementing, a counter every time the branch gets executed.

Then the execute-ahead logic would execute the branches with the
highest counters. Also, an execute-ahead flag could be inserted into
the jump instruction to indicate whether execute-ahead is allowed. For
example, a compiler could use this flag to indicate that execute-ahead
is not possible.
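
As an aside, something close to such a flag already exists at the
compiler level: GCC's __builtin_expect lets the programmer (or a
profiler) say which way a branch usually goes, and the compiler lays
the code out so the expected path falls through. A small sketch, using
the usual likely/unlikely wrapper macros:

/* GCC/Clang extension: hint which way the branch usually goes, so the
   compiler puts the common path on the straight-line fall-through. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

int process(const int *p)
{
    if (unlikely(p == 0))     /* rare error case */
        return -1;
    return *p + 1;            /* common case */
}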

Of course branch prediction is nothing new, and processors nowadays
have it. So I was browsing through some patents (not all, because that
would take way too much time... it would take a day, maybe even a few
days, to browse through them all ;)).

So far I have seen one patent which mentions using a "taken/not taken" bit
which indicates if a branch was taken or not.

So I have a few questions about branch prediction.

1. Is it common to use counters as described above to do branch
prediction, or is this something novel? ;)

2. Suppose it's not novel... then why use only 1 bit to do branch
prediction?

Some reasons I can think of:

1. It requires less memory than bigger counters.
2. Counters might take too long to change. For example, branch A might
be executed many times, then suddenly branch B has to be executed many
times. Using large counters might make the prediction take too long to
catch up ;) (see the sketch below).
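
To make the trade-off concrete, here is a rough C model of a 2-bit
saturating counter (my own sketch, not taken from any real CPU): it
needs two wrong guesses in a row before it flips its prediction, so it
ignores a one-off exception but still catches up quickly when the
pattern really changes.

#include <stdbool.h>
#include <stdio.h>

/* 2-bit saturating counter: 0,1 predict not-taken; 2,3 predict taken. */
static unsigned counter = 2;                  /* start "weakly taken" */

static bool predict(void) { return counter >= 2; }

static void update(bool taken)
{
    if (taken) { if (counter < 3) counter++; }
    else       { if (counter > 0) counter--; }
}

int main(void)
{
    /* Branch taken many times, then its behaviour suddenly flips. */
    bool outcomes[] = { true, true, true, true, false, false, false, false };
    int n = sizeof outcomes / sizeof outcomes[0];
    int hits = 0;

    for (int i = 0; i < n; i++) {
        bool guess = predict();
        hits += (guess == outcomes[i]);
        update(outcomes[i]);
    }
    printf("%d of %d predictions correct\n", hits, n);
    return 0;
}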

What are your thoughts on this? ;)

Any references to patents or other information about branch prediction?

Bye,
Skybuck.
 
In article <dd756m$38b$1@news3.zwoll1.ov.home.nl>,
Skybuck Flying <nospam@hotmail.com> wrote:
Hi,

Some time ago I had this idea of inserting counters into if statements
to predict which branch is most likely to be executed. At the lower
levels this would mean inserting counters into jump instructions and
recording which branch is executed the most by incrementing, or maybe
even decrementing, a counter every time the branch gets executed.
[...]
1. Is it common to use counters as described above to do branch
prediction, or is this something novel? ;)
No, it is not common. For that matter, I've never heard of the idea
being used. There are problems with the counter: it should not be part
of the instruction, because that would mean that code space has to be
written into as the program runs. This will slow things down.

Many processors have caches on them. The cache will contain the code
that was executed on the last pass around the loop, and as a result
repeating a loop the same way is faster than taking some new path.

Many pipelined processors perform an instruction or two after the jump
instruction even when the jump is taken (a branch delay slot). A good
compiler will take advantage of this if it can.

Some processors have a conditional "skip the next instruction", and
some have conditional (predicated) instructions. These allow logic to
be done without taking a branch. This increases the speed of things
like "if X<0 then X := 0".
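
In plain C the same trick looks roughly like this (a sketch, not tied
to any particular instruction set; many compilers turn either form into
a conditional move):

/* Branchy form: the compiler may emit a conditional jump. */
int clamp_branchy(int x)
{
    if (x < 0)
        x = 0;
    return x;
}

/* Value form: on machines with conditional moves or predicated
   instructions this typically becomes straight-line code with no jump
   at all, so there is nothing to predict. */
int clamp_branchless(int x)
{
    return x < 0 ? 0 : x;
}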

Some processors have multiple pipelines. In many cases this is one for
integers and one for floating point. This allows instructions that are
ready to execute to move past those that have to wait for a memory
transfer.

--
--
kensmith@rahul.net forging knowledge
 
"Ken Smith" <kensmith@green.rahul.net> wrote in message
news:dd7q2v$ns0$4@blue.rahul.net...
In article <dd756m$38b$1@news3.zwoll1.ov.home.nl>,
Skybuck Flying <nospam@hotmail.com> wrote:
Hi,

Some time ago I had this idea of inserting counters into if statements
to predict which branch is most likely to be executed. At the lower
levels this would mean inserting counters into jump instructions and
recording which branch is executed the most by incrementing, or maybe
even decrementing, a counter every time the branch gets executed.
[...]
1. Is it common to use counters as described above to do branch
prediction, or is this something novel? ;)

No, it is not common. For that matter, I've never heard of the idea
being used. There are problems with the counter: it should not be part
of the instruction, because that would mean that code space has to be
written into as the program runs. This will slow things down.
If it slows down the system, then that would mean the bottleneck is in
the bandwidth/speed of the bus (or whatever) used to read/write
instructions from/to main memory.

I know games and stuff like that use tremendous amounts of bandwidth,
but that goes via special busses like AGP or PCI Express.

How much memory bandwidth/speed is available to transfer instructions
from/to main memory, where the program code is located? ;) And how much
of that bandwidth is generally used by applications, or maybe even
games? ;)

In other words... is bandwidth from main memory to the CPU and back
truly the limiting factor? ;)

<other optimizations snipped ;)>

Bye,
Skybuck
 
"Skybuck Flying" <nospam@hotmail.com> wrote in message
news:dd7sjv$1vs$1@news1.zwoll1.ov.home.nl...
"Ken Smith" <kensmith@green.rahul.net> wrote in message
news:dd7q2v$ns0$4@blue.rahul.net...
In article <dd756m$38b$1@news3.zwoll1.ov.home.nl>,
Skybuck Flying <nospam@hotmail.com> wrote:
Hi,

Some time ago I had this idea of inserting counters into if statements
to predict which branch is most likely to be executed. At the lower
levels this would mean inserting counters into jump instructions and
recording which branch is executed the most by incrementing, or maybe
even decrementing, a counter every time the branch gets executed.
[...]
1. Is it common to use counters as described above to do branch
prediction, or is this something novel? ;)

No, it is not common. For that matter, I've never heard of the idea
being used. There are problems with the counter: it should not be part
of the instruction, because that would mean that code space has to be
written into as the program runs. This will slow things down.

If it slows down the system, then that would mean the bottleneck is in
the bandwidth/speed of the bus (or whatever) used to read/write
instructions from/to main memory.
You are missing an important point. There typically is no need to
write to the code space at all. Instruction caches have no
write-to-memory circuitry. To write the counters to memory you would
have to add an additional path, and if you were going to write the
updated value on every branch, you would add a huge amount of memory
traffic (studies show that typically a branch occurs once every 6-10
instructions on average). If you are not going to write the counter
back every time, when are you? And how are you going to associate the
multiple counters you then must keep with their branches? There are a
lot of problems you would have to solve for what is probably a small
gain (hint: current branch predictors do pretty well already).
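
To make that concrete, here is a rough C model of where such counters
usually live: a small table inside the CPU, indexed by a few bits of
the branch address, so nothing is ever written back to code space or
main memory. The table size and indexing are invented for illustration.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy bimodal predictor: 4096 two-bit saturating counters indexed by
   low bits of the branch address. The whole table lives inside the
   predictor; the code in memory is never modified. */
#define PRED_BITS 12
#define PRED_SIZE (1u << PRED_BITS)

static uint8_t pht[PRED_SIZE];                   /* each entry is 0..3 */

static size_t pht_index(uint64_t branch_pc)
{
    return (size_t)((branch_pc >> 2) & (PRED_SIZE - 1));
}

bool predict_taken(uint64_t branch_pc)
{
    return pht[pht_index(branch_pc)] >= 2;       /* 2,3 = predict taken */
}

void train(uint64_t branch_pc, bool taken)
{
    uint8_t *c = &pht[pht_index(branch_pc)];
    if (taken) { if (*c < 3) ++*c; }
    else       { if (*c > 0) --*c; }
}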

You really need to do some more basic study. You have deprecated the
MIT on-line courses, calling them "for dummies". First of all, I doubt
that any MIT course is "for dummies". Second, while you may not be a
dummy in general, in this area you are, simply because you lack the
basic knowledge. I admire your enthusiasm, but you need to channel it
better so as not to appear as foolish as you currently do to members of
the group who have lots of experience.

--
- Stephen Fuld
e-mail address disguised to prevent spam
 
"Stephen Fuld" <s.fuld@PleaseRemove.att.net> wrote in message
news:q1LJe.78310$5N3.61411@bgtnsc05-news.ops.worldnet.att.net...
"Skybuck Flying" <nospam@hotmail.com> wrote in message
news:dd7sjv$1vs$1@news1.zwoll1.ov.home.nl...

"Ken Smith" <kensmith@green.rahul.net> wrote in message
news:dd7q2v$ns0$4@blue.rahul.net...
In article <dd756m$38b$1@news3.zwoll1.ov.home.nl>,
Skybuck Flying <nospam@hotmail.com> wrote:
Hi,

Some time ago I had this idea of inserting counters into if statements
to predict which branch is most likely to be executed. At the lower
levels this would mean inserting counters into jump instructions and
recording which branch is executed the most by incrementing, or maybe
even decrementing, a counter every time the branch gets executed.
[...]
1. Is it common to use counters as described above to do branch
prediction, or is this something novel? ;)

No, it is not common. For that matter, I've never heard of the idea
being used. There are problems with the counter: it should not be part
of the instruction, because that would mean that code space has to be
written into as the program runs. This will slow things down.

If it slows down the system, then that would mean the bottleneck is in
the bandwidth/speed of the bus (or whatever) used to read/write
instructions from/to main memory.

You are missing an important point. There typically is no need to
write to the code space at all. Instruction caches have no
write-to-memory circuitry. To write the counters to memory you would
have to add an additional path, and if you were going to write the
updated value on every branch, you would add a huge amount of memory
traffic (studies show that typically a branch occurs once every 6-10
instructions on average). If you are not going to write the counter
back every time, when are you? And how are you going to associate the
multiple counters you then must keep with their branches? There are a
lot of problems you would have to solve for what is probably a small
gain (hint: current branch predictors do pretty well already).

You really need to do some more basic study. You have deprecated the
MIT on-line courses, calling them "for dummies". First of all, I doubt
that any MIT course is "for dummies". Second, while you may not be a
dummy in general, in this area you are, simply because you lack the
basic knowledge. I admire your enthusiasm, but you need to channel it
better so as not to appear as foolish as you currently do to members of
the group who have lots of experience.
I could deprecate you as a dummy ;) for not seeing the obvious
solutions to this idea :p

But I guess you didn't put much thought into it... but I have, a little
bit ;)

However, I do not see a need to share my wisdom to make dummies rich :)

In fact you didn't answer any of my questions ;)

Bye,
Skybuck.
 
"Stephen Fuld" <s.fuld@PleaseRemove.att.net> schreef in bericht
news:q1LJe.78310$5N3.61411@bgtnsc05-news.ops.worldnet.att.net...
"Skybuck Flying" <nospam@hotmail.com> wrote in message
news:dd7sjv$1vs$1@news1.zwoll1.ov.home.nl...

"Ken Smith" <kensmith@green.rahul.net> wrote in message
news:dd7q2v$ns0$4@blue.rahul.net...
In article <dd756m$38b$1@news3.zwoll1.ov.home.nl>,
Skybuck Flying <nospam@hotmail.com> wrote:
Hi,
<snip>

If it slows down the system, then that would mean the bottleneck is in
the bandwidth/speed of the bus (or whatever) used to read/write
instructions from/to main memory.

You really need to do some more basic study. You have deprecated the
MIT on-line courses, calling them "for dummies". First of all, I doubt
that any MIT course is "for dummies". Second, while you may not be a
dummy in general, in this area you are, simply because you lack the
basic knowledge. I admire your enthusiasm, but you need to channel it
better so as not to appear as foolish as you currently do to members of
the group who have lots of experience.
Do a search for his name in Google newsgroups to see some interesting
threads ;)
 
In article <dd756m$38b$1@news3.zwoll1.ov.home.nl>,
Skybuck Flying <nospam@hotmail.com> wrote:
Some time ago I had this idea of inserting counters into if statements
to predict which branch is most likely to be executed. At the lower
levels this would mean inserting counters into jump instructions and
recording which branch is executed the most by incrementing, or maybe
even decrementing, a counter every time the branch gets executed.
Actually this is sort of analogous to runtime profiling feedback to
compilers, which is used on a number of systems.

The problem is that it is often impossible to predict an event until
the outcome is observed (think quantum mechanics).

If we knew the outcome of each branch, we could simply eliminate the
branch in most cases and still have a reasonable approximation.

Inserting counters into production code typically comes at a great
price in terms of overhead.

We typically do this in development in order to optimize the most
costly sections of code for the most common outcomes, but trying
to put some sort of low level adaptive optimization metrics into
production code would consume more in overhead than the cost of
doing the task to begin with.
 
"Colonel Forbin" <forbin@dev.nul> wrote in message
news:3ULJe.50331$B52.10566@tornado.ohiordc.rr.com...
In article <dd756m$38b$1@news3.zwoll1.ov.home.nl>,
Skybuck Flying <nospam@hotmail.com> wrote:

Some time ago I had this idea of inserting counters into if statements
to predict which branch is most likely to be executed. At the lower
levels this would mean inserting counters into jump instructions and
recording which branch is executed the most by incrementing, or maybe
even decrementing, a counter every time the branch gets executed.

Actually this is sort of analogous to runtime profiling feedback to
compilers, which is used on a number of systems.
I have used profilers; they use procedure calls to do their thing...
and also write to the hard disk, making everything
slllllllowwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww.

The problem is that it is often impossible to predict an event until
the outcome is observed (think quantum mechanics).
Often the same branch goes the same way repeatedly, so prediction can help.

If we knew the outcome of each branch, we could simply eliminate the
branch in most cases and still have a reasonable approximation.
No, you cannot; you still have to execute the rare case sometimes as well.

Inserting counters into production code typically comes at a great
price in terms of overhead.
Define overhead ;)

We typically do this in development in order to optimize the most
costly sections of code for the most common outcomes, but trying
to put some sort of low level adaptive optimization metrics into
production code would consume more in overhead than the cost of
doing the task to begin with.
Screw profiling, try it at lower levels and see what happens.

Instructions have to be loaded anyway... they sit in the cache, and the
counters would sit there too, so not much overhead.

Then, after a jump, the counter increment and other logic has to be
done, which could be handled by additional gates, so no real overhead
there either.

Also, these updated instructions/jumps have to go back to main
memory... which in my opinion isn't that much... but that's the part I
don't know about... if the bandwidth is available, I would say use
it! ;)

Of course there are other problems to solve, like compiler changes, and
some things can't be pre-computed, etc... and in the end the counters
might even be counter-productive ;)

Since the feedback on my questions is running low, I have only one last
thing to say to you people:

Screw you and fuck you too lol if it doesn't work lol...

Bye,
Skybuck :D
 
In article <dd83ru$4ec$1@news1.zwoll1.ov.home.nl>,
Skybuck Flying <nospam@hotmail.com> wrote:
Screw you and fuck you too lol if doesn't work lol...
Thank you for that insightful contribution to our collective knowledge.

I am sure it will inspire many lurkers on these groups to come forth and
bestow you with their knowledge and to spend many hours of independent
research on their own time to answer your future questions.

The amusing thing is that we might have some useful conversations
peripherally related to issues you raised.

Aol to you, too.
 
"Colonel Forbin" <forbin@dev.nul> wrote in message
news:EMMJe.52504$zY4.12162@tornado.ohiordc.rr.com...
In article <dd83ru$4ec$1@news1.zwoll1.ov.home.nl>,
Skybuck Flying <nospam@hotmail.com> wrote:

Screw you and fuck you too lol if doesn't work lol...

Thank you for that insightful contribution to our collective knowledge.

I am sure it will inspire many lurkers on these groups to come forth and
bestow you with their knowledge and to spend many hours of independent
research on their own time to answer your future questions.

The amusing thing is that we might have some useful conversations
peripherally related to issues you raised.

Aol to you, too.
Exactly motherfucker... so go spend all your time discussing my filthy
worthless ideas and then... write nice reports about how much they rule
or suck... meanwhile I have influenced the worrrrrrlldddddddd heheheh.

The fact of the matter is that I might be tooooooooooo lazzzzyyy to do
it myself... so go ahead and use it, abuse it, test it, analyze it
etc......

But in the end I tooooooooooooooo will learn if it was useful or not...
:pPPPPPPPPP******* hihihihihihihihihihihihihihihi

Bye,
Skybuck. ohhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh hehe.
 
In article <dd881i$dh1$1@news5.zwoll1.ov.home.nl>,
Skybuck Flying <nospam@hotmail.com> wrote:
Exactly motherfucker... so go spent all your time on discussing my filthy
wortless ideas and then... write nice reports about them how much they rule
or suck... meanwhile I have influenced the worrrrrrlldddddddd heheheh.

The fact of the matter is that I might be tooooooooooo lazzzzyyy to do it
myself... so go ahead and use it, abuse it, test it, analyze it etc......

But in the end I tooooooooooooooo will learn if it was usefull or not...
:pPPPPPPPPP******* hihihihihihihihihihihihihihihi

Bye,
Skybuck. ohhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh hehe.
I think I was wrong about the crystal meth. This guy needs to take his
Haldol. His probation officer is worried...
 
Ken Smith wrote:
In article <dd756m$38b$1@news3.zwoll1.ov.home.nl>,
Skybuck Flying <nospam@hotmail.com> wrote:

Hi,

Some time ago I had this idea of inserting counters into if statements
to predict which branch is most likely to be executed. At the lower
levels this would mean inserting counters into jump instructions and
recording which branch is executed the most by incrementing, or maybe
even decrementing, a counter every time the branch gets executed.

[...]

1. Is it common to use counters as described above to do branch prediction
or is this something novel ? ;)


No, it is not common. For that matter, I've never heard of the idea
being used. There are problems with the counter: it should not be part
of the instruction, because that would mean that code space has to be
written into as the program runs. This will slow things down.
GCC has a profiling mode that allows one to capture this information,
for optimizing MIPS output (where it is quite useful, since MIPS has
alternate opcodes depending on which branch is more likely). As I
recall (I've never used it), you run your code for a while, then get a
dump of the counters, which you feed back into GCC.
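
For reference, the feedback loop looks roughly like this with current
GCC (flag names as documented; details vary by release), with the
counters kept in a separate data file rather than in the executed code:

/* branchy.c -- toy program with one heavily biased branch.
 *
 * A typical profile-guided build:
 *
 *   gcc -O2 -fprofile-generate branchy.c -o branchy
 *   ./branchy                  # run on representative input
 *   gcc -O2 -fprofile-use branchy.c -o branchy
 *
 * The recorded branch counts go into a separate .gcda data file; the
 * second compile reads them and arranges the branches to suit.
 */
#include <stdio.h>

int main(void)
{
    long long sum = 0;
    for (int i = 0; i < 1000000; i++) {
        if (i % 100 != 0)      /* taken about 99% of the time */
            sum += i;
    }
    printf("%lld\n", sum);
    return 0;
}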

--
Regards,
Bob Monsen

If a little knowledge is dangerous, where is the man who has
so much as to be out of danger?
Thomas Henry Huxley, 1877
 
"Colonel Forbin" <forbin@dev.nul> wrote in message
news:3HNJe.52896$zY4.35348@tornado.ohiordc.rr.com...
In article <dd881i$dh1$1@news5.zwoll1.ov.home.nl>,
Skybuck Flying <nospam@hotmail.com> wrote:

Exactly motherfucker... so go spent all your time on discussing my filthy
wortless ideas and then... write nice reports about them how much they
rule
or suck... meanwhile I have influenced the worrrrrrlldddddddd heheheh.

The fact of the matter is that I might be tooooooooooo lazzzzyyy to do it
myself... so go ahead and use it, abuse it, test it, analyze it etc......

But in the end I tooooooooooooooo will learn if it was usefull or not...
:pPPPPPPPPP******* hihihihihihihihihihihihihihihi

Bye,
Skybuck. ohhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
hehe.

I think I was wrong about the crystal meth. This guy needs to take his
Haldol. His probation officer is worried...
You speak in difficult medication terms... I need a manual...

 
"Bob Monsen" <rcsurname@comcast.net> wrote in message
news:dfydncj3QpTKOmrfRVn-oA@comcast.com...
Ken Smith wrote:
In article <dd756m$38b$1@news3.zwoll1.ov.home.nl>,
Skybuck Flying <nospam@hotmail.com> wrote:

Hi,

Some time ago I had this idea of inserting counters into if statements
to predict which branch is most likely to be executed. At the lower
levels this would mean inserting counters into jump instructions and
recording which branch is executed the most by incrementing, or maybe
even decrementing, a counter every time the branch gets executed.

[...]

1. Is it common to use counters as described above to do branch
prediction, or is this something novel? ;)


No, it is not common. For that matter, I've never heard of the idea
being used. There are problems with the counter: it should not be part
of the instruction, because that would mean that code space has to be
written into as the program runs. This will slow things down.


GCC has a profiling mode that will allow one to capture this
information, for optimizing MIPS output (where it is quite useful, since
they have alternate opcodes depending on which branch is more likely).
As I recall (I've never used it) you run your code for a while, and then
get a dump of the counters, which you then feed back into GCC.
Ohhhhh that's really fancy pancy.

--
Regards,
Bob Monsen

If a little knowledge is dangerous, where is the man who has
so much as to be out of danger?
Thomas Henry Huxley, 1877
 
In comp.arch,sci.electronics.design, On Mon, 08 Aug 2005 16:38:23 GMT,
forbin@dev.nul (Colonel Forbin) wrote:

In article <dd756m$38b$1@news3.zwoll1.ov.home.nl>,
Skybuck Flying <nospam@hotmail.com> wrote:

Some time ago I had this idea of inserting counters into if statements
to predict which branch is most likely to be executed. At the lower
levels this would mean inserting counters into jump instructions and
recording which branch is executed the most by incrementing, or maybe
even decrementing, a counter every time the branch gets executed.

Actually this is sort of analogous to runtime profiling feedback to
compilers, which is used on a number of systems.

The problem is that it is often impossible to predict an event until
the outcome is observed (think quantum mechanics).

If we knew the outcome of each branch, we could simply eliminate the
branch in most cases and still have a reasonable approximation.

Inserting counters into production code typically comes at a great
price in terms of overhead.

We typically do this in development in order to optimize the most
costly sections of code for the most common outcomes, but trying
to put some sort of low level adaptive optimization metrics into
production code would consume more in overhead than the cost of
doing the task to begin with.
I recall reading of a processor that does pipelining through
branches (I don't remember which, but it might be a TI DSP) by using
several execution units. One executes the instruction(s) where the
branch is taken, the other executes the code where the branch is not
taken. When the branch is decided, the results from the corresponding
execution unit are taken and execution continues from there, and the
other results are 'thrown away' unused. This method takes a lot more
silicon space for the processor, but results in a speed increase that
apparently can't be done otherwise.
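
In software terms the data flow is something like the following C
caricature (real hardware does it with duplicated functional units, of
course, not source code):

/* Both candidate results are computed unconditionally; the branch
   outcome then merely selects which one to keep. This mirrors
   both-paths (predicated) execution: no pipeline flush, at the cost of
   doing work that gets thrown away. */
static int select_path(int x)
{
    int if_taken     = x * 2 + 1;   /* work for the taken path     */
    int if_not_taken = x - 3;       /* work for the not-taken path */
    int taken        = (x > 0);     /* the "branch condition"      */
    return taken ? if_taken : if_not_taken;
}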

-----
http://www.mindspring.com/~benbradley
 
"Ben Bradley" <ben_nospam_bradley@frontiernet.net> wrote in message
news:jicff1hrkkkucio1ljkdfugtrm4sfkuvld@4ax.com...
In comp.arch,sci.electronics.design, On Mon, 08 Aug 2005 16:38:23 GMT,
forbin@dev.nul (Colonel Forbin) wrote:

In article <dd756m$38b$1@news3.zwoll1.ov.home.nl>,
Skybuck Flying <nospam@hotmail.com> wrote:

Some time ago I had this idea of inserting counters into if statements
to predict which branch is most likely to be executed. At the lower
levels this would mean inserting counters into jump instructions and
recording which branch is executed the most by incrementing, or maybe
even decrementing, a counter every time the branch gets executed.

Actually this is sort of analogous to runtime profiling feedback to
compilers, which is used on a number of systems.

The problem is that it is often impossible to predict an event until
the outcome is observed (think quantum mechanics).

If we knew the outcome of each branch, we could simply eliminate the
branch in most cases and still have a reasonable approximation.

Inserting counters into production code typically comes at a great
price in terms of overhead.

We typically do this in development in order to optimize the most
costly sections of code for the most common outcomes, but trying
to put some sort of low level adaptive optimization metrics into
production code would consume more in overhead than the cost of
doing the task to begin with.

I recall reading of a processor that does pipelining through
branches (I don't remember which, but it might be a TI DSP) by using
several execution units. One executes the instruction(s) where the
branch is taken, the other executes the code where the branch is not
taken. When the branch is decided, the results from the corresponding
execution unit are taken and execution continues from there, and the
other results are 'thrown away' unused. This method takes a lot more
silicon space for the processor, but results in a speed increase that
apparently can't be done otherwise.

This is usually referred to as "speculative execution"; running both
paths and discarding one is sometimes called eager or multipath
execution.



dk
 
"Skybuck Flying" <nospam@hotmail.com> wrote in message
news:dd89al$p17$1@news5.zwoll1.ov.home.nl...
"Colonel Forbin" <forbin@dev.nul> wrote in message
news:3HNJe.52896$zY4.35348@tornado.ohiordc.rr.com...
In article <dd881i$dh1$1@news5.zwoll1.ov.home.nl>,
Skybuck Flying <nospam@hotmail.com> wrote:

Exactly motherfucker... so go spent all your time on discussing my
filthy
wortless ideas and then... write nice reports about them how much they
rule
or suck... meanwhile I have influenced the worrrrrrlldddddddd heheheh.

The fact of the matter is that I might be tooooooooooo lazzzzyyy to do
it
myself... so go ahead and use it, abuse it, test it, analyze it
etc......

But in the end I tooooooooooooooo will learn if it was usefull or not...
:pPPPPPPPPP******* hihihihihihihihihihihihihihihi

Bye,
Skybuck. ohhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
hehe.

I think I was wrong about the crystal meth. This guy needs to take his
Haldol. His probation officer is worried...

You speak in difficult medication terms... I need a manual...

You should try an automatic first.

Easier to control when one doesn't
know what one is doing.



dk
 
In article <dd7sjv$1vs$1@news1.zwoll1.ov.home.nl>,
Skybuck Flying <nospam@hotmail.com> wrote:
[... me ..]
No, it is not common. For that matter, I've never heard of the idea
being used. There are problems with the counter: it should not be part
of the instruction, because that would mean that code space has to be
written into as the program runs. This will slow things down.

If it slows down the system then that would mean the bottleneck is in the
bandwidth/speed of the bus/whatever used to read/write instructions from/to
main memory.
It may not be the bus itself that is the limiting factor. The memory
device that holds the code may have a longer write time than read time.
This may be a physical limitation of the storage technology. To give an
extreme example, if the code is stored in a drum memory, this will
reduce the speed to one instruction per rotation.

Also, a counter is by its nature a slow operation, because the highest
bit depends on all the bits below it. There are ways around this.
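
One classic way around the ripple carry, for example, is to "count"
with a linear feedback shift register, which steps through a fixed
sequence of states using only a shift and an XOR, so no bit waits on a
carry from all the bits below it. A C model of a common 16-bit
maximal-length variant:

#include <stdint.h>

/* Galois LFSR "counter": visits all 2^16 - 1 nonzero states in a fixed
   order using only a shift and a conditional XOR -- no carry chain.
   0xB400 is a commonly used maximal-length tap mask for 16 bits. */
static uint16_t lfsr_next(uint16_t s)
{
    uint16_t lsb = s & 1u;
    s >>= 1;
    if (lsb)
        s ^= 0xB400u;
    return s;
}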


How much memory bandwidth/speed is available to transfer instruction from/to
main memory where program code is located ?
Never enough.
;) and how much of that
bandwidth is generally used for application or maybe even games ? ;)
All of it.

In other words... is bandwidth from main memory to cpu and back, truely the
limiting factor ? ;)
No, it is but one limiting factor among many.

BTW: If the code in question is a Booth's divide routine, the branch
prediction you suggest may actually slow the operation down instead of
speeding it up.

--
--
kensmith@rahul.net forging knowledge
 
Ken Smith wrote:
In article <dfydncj3QpTKOmrfRVn-oA@comcast.com>,
Bob Monsen <rcsurname@comcast.net> wrote:
[...]

GCC has a profiling mode that will allow one to capture this
information, for optimizing MIPS output (where it is quite useful, since
they have alternate opcodes depending on which branch is more likely).
As I recall (I've never used it) you run your code for a while, and then
get a dump of the counters, which you then feed back into GCC.


Profiling can be useful on nearly any modern processor if you can't see
by inspection which way the conditionals land. If you are hand-tuning
some ASM code for maximum speed, knowing which jumps are likely helps
to minimize the out-of-line jumps.

BTW: You can pulse one port bit high if you jump and another if you
don't, and use a frequency counter to find the odds. This works well on
microcontrollers.
When I am coding assembler, I find it's almost always obvious which way
is more likely. Unfortunately, compilers don't usually understand what
you are trying to do all that well, and thus often get it wrong. The
profile-feedback scheme might be useful because the compiler can then
do the proper optimization by itself. However, any time I care that
much, I profile the code and rewrite any expensive chunks in assembler.
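
As for the port-bit trick above, on a microcontroller it is only a
couple of lines; the GPIO names here are made up, so substitute
whatever your part actually provides:

/* Stand-ins for real GPIO registers -- replace with your part's own. */
static volatile unsigned char PORT_TAKEN;
static volatile unsigned char PORT_NOT_TAKEN;

void branch_probe(int x)
{
    if (x < 0) {
        PORT_TAKEN = 1;  PORT_TAKEN = 0;          /* pulse: taken     */
        /* ... rare path ... */
    } else {
        PORT_NOT_TAKEN = 1;  PORT_NOT_TAKEN = 0;  /* pulse: not taken */
        /* ... common path ... */
    }
    /* A frequency counter on each pin then reads out how often each
       side runs, with only a couple of instructions of overhead. */
}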

--
Regards,
Bob Monsen

If a little knowledge is dangerous, where is the man who has
so much as to be out of danger?
Thomas Henry Huxley, 1877
 
In article <q1LJe.78310$5N3.61411@bgtnsc05-news.ops.worldnet.att.net>,
Stephen Fuld <s.fuld@PleaseRemove.att.net> wrote:
[...]
You are mising an important point. There typically is no need to write to
the code space at all. Instruction caches have no write to memory
circuitry.
You are jumping to the wild assumption that the machine has caches at all.
The CPU could be tightly coupled to the system RAM. We could make life
even more interesting for the OP by making it one of those old bit-slicy
things where the code comes from a few dozen fast ROM chips and the data
is in static RAM.

To write the counters to memory you would have to add an
additional path, and if you were going to write the updated value on every
branch, you are going to add a huge amount of memory traffic
We could always make a special code store with a counter for each location
built into it. That way the extra transfer would not be needed at the
mere cost of making it about 10 times as expensive.

[...]
probably a small gain (hint, current branch predictors do pretty well
already).
... and CPUs without them are fairly fast anyway.

--
--
kensmith@rahul.net forging knowledge
 
