EDK : FSL macros defined by Xilinx are wrong

Gero,

OK, then I am correct, you not only need a good system synchronous clock
to each FPGA, but you also must use clock forwarding to send/receive
data between devices.

Austin

Geronimo Stempovski wrote:
Okay, let's be more precise: the clock frequencies I'd like to distribute
are in the range of 180-300 MHz, so it is a challenging task. The signal
busses between the FPGAs should carry signals in that range, too. Data is
exchanged synchronously, so there is not much room for synchronization,
I think...!?

Gero
 
"Symon" <symon_brewer@hotmail.com> schrieb im Newsbeitrag
news:f7805h$1sk$1@aioe.org...
You might also consider source synchronous busses to get data between
FPGAs in addition to the star topology.
Sounds interesting. What do you mean by that? Could you please go into a
little more detail? I heard about "source synchronous busses" a while ago,
but unfortunately I don't know what it is... sorry!

Thanks.

Gero
 
"Geronimo Stempovski" <geronimo.stempovski@arcor.de> wrote in message
news:46978d8c$0$20999$9b4e6d93@newsspool1.arcor-online.net...
"Symon" <symon_brewer@hotmail.com> schrieb im Newsbeitrag
news:f7805h$1sk$1@aioe.org...

You might also consider source synchronous busses to get data between
FPGAs in addition to the star topology.


Sounds interesting. What do you mean by that? Could you please go into a
little more detail? I heard about "source synchronous busses" a while ago,
but unfortunately I don't know what it is... sorry!

Thanks.

Gero
Hi Gero,
The source of the data provides a clock along with the data. You should be
able to find stuff on the FPGA manufacturers' websites to help you along...
HTH, Syms.
 
"austin" <austin@xilinx.com> schrieb im Newsbeitrag
news:f78203$f1e2@cnn.xilinx.com...
Gero,

OK, then I am correct, you not only need a good system synchronous clock
to each FPGA, but you also must use clock forwarding to send/receive
data between devices.
I'm sorry, Austin! I am not so familiar with the topic. What do you mean by
"system synchronous clock"? Isn't that just the description for "identical
clock to every FPGA" or am I wrong? And how should clock forwarding be done?
If the data arrives at every FPGA with its own clock, but the FPGA is
clocked by another skewed (or even jittered) clock, how can synchronization
be achieved, anyway? I appreciate your comment. Thanks in advance.

Gero
 
Sounds interesting. What do you mean by that? Could you please go into a
little more detail? I heard about "source synchronous busses" a while ago,
but unfortunately I don't know what it is... sorry!
Wikipedia has a good article:

http://en.wikipedia.org/wiki/Source-synchronous

Basically, when A sends stuff to B:

- A and B can be clocked by the same global clock (the good old way, with
clock skew etc)
- B can generate a clock and send it to A (then you get problems if
traces are long)
- A can generate a clock and send it to B

The last case means that A (the source) sends a clock that is perfectly
synchronized with the transmitted data. Hence, if clock and data go through
the same trace length, B receives a well-synchronized clock and data, so B's
flip-flops are happy. Of course, A's clock will not have the same phase as
B's internal clock, so B needs some way of putting them back in phase
(a FIFO, etc.).

One could argue that links like SATA or Ethernet are source synchronous,
too, even though there is no separate clock, as it is embedded in the data.
It would not be possible for the controller to clock the hard disk when
the frequency is so high that the length of the cable contains several
different bits propagating along the transmission line!
 
Gero,

As noted by others, you need a high quality clock to each FPGA. They do
not need to be phase matched, or equal length. Just a good, very low
jitter clock to each one from a clock distribution device (like the IDT
quad LVDS clock buffer + jitter-cleanup PLL -- a good choice I use --
the ICS8745B):
http://www1.idt.com/products/files/18378197/ics8745b.pdf?CFID=5561739&CFTOKEN=91108912

or equivalent. The nice thing about some of these parts is that you may
use a lower-frequency, less expensive oscillator, and the part will
multiply the frequency and remove jitter, too (as opposed to buying a
much more expensive higher-frequency oscillator).


This then is the basis for all timing in each FPGA.

To get from one FPGA to another, or from one FPGA to an ASIC/ASSP:

http://www.xilinx.com/publications/prod_mktg/pn0010778.pdf

You would use a source-synchronous interface: one where you send the
data, and a clock, from one device to the other. Since the data jitter
will be exactly the clock jitter (they came through the same paths,
close to each other), system jitter and clock jitter can almost be
ignored. The receive side uses the forwarded clock to register the
incoming data. Often the DCM is used to phase shift the sample point to
exactly the center of the "eye" so you have the best timing margin. The
data paths and clocks must all be delay matched (all signals must leave
and arrive in sync with each other). We have tables of the delay from
the silicon to the pad, and you need to have your PCB designer take this
into account.
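
As a rough illustration of the receive side described above, here is a
minimal VHDL sketch (entity and signal names are invented here, not taken
from a Xilinx reference design) that simply registers an incoming bus on
the forwarded clock. In a real design the forwarded clock would typically
pass through a BUFG and a DCM so the sample point can be phase shifted into
the center of the eye, and the captured data would then cross into the
local system-synchronous domain through a small FIFO.

library ieee;
use ieee.std_logic_1164.all;

-- Sketch only: capture a source-synchronous bus with its forwarded clock.
entity ss_rx is
  port (
    rx_clk_fwd : in  std_logic;                     -- clock forwarded by the transmitter
    rx_data    : in  std_logic_vector(7 downto 0);  -- data, delay-matched to the clock
    q          : out std_logic_vector(7 downto 0)   -- captured data, rx_clk_fwd domain
  );
end ss_rx;

architecture rtl of ss_rx is
begin
  process (rx_clk_fwd)
  begin
    if rx_clk_fwd'event and rx_clk_fwd = '1' then
      -- Data and clock saw the same board and package delays, so the
      -- clock-to-data relationship at this flip-flop is well controlled.
      q <= rx_data;
    end if;
  end process;
end rtl;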

There is also a source synchronous IO block built into V4, and you can
do things like use one forwarded clock to frame 4 bits on each wire (the
forwarded clock is running at 1/4 the system clock). The SSIO block
takes care of the serialization and parallel conversion of each IO.

http://direct.xilinx.com/bvdocs/userguides/ug070.pdf

Chapter 8.

Also look at the SPI-4.2 (POS-PHY Level 4) IP core, and how it is
specified for inter-chip communications (this is an industry standard for
wide, fast, DDR buses).

So, in review: system synchronous (one clock to all chips) for each
chip to use for all of its internals, and generating forwarded clocks;
AND source synchronous (a forwarded clock) for each data bus between
devices.

Searching on "source synchronous" and "spi pos 4.2" and reading the
user's guides on the SSIO blocks will get you up to speed.

Austin
 
I just did some statistical analysis of a bunch of existing Spartan 3
bit streams. If you treat them as bytes and histogram the byte codes,
there are some impressive stats, with code 0x00 of course dominant, then
0xFF, and the next 16 most common codes huge, tapering off pretty
hard after that. That makes sense, since LUT bits are usually zero, so
simple codes like 0x01...0x80 and simple 2-bit combos are most common.
Block RAM is usually 0 in our designs, too. So a fixed dictionary should
work pretty well.

So it looks like a byte stream would do, with byte codes that say
stuff like...

00000000 end of file
001nnnnn make N zero bytes, N=1 to 31
010nnnnn make N 0xFF bytes, ditto
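
A minimal VHDL sketch of what an expander for the three codes above might
look like (entity, port and signal names are invented here; any byte other
than the three codes defined above is passed through as a literal, purely
as a placeholder for the rest of the fixed dictionary):

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Sketch only: expand the byte codes listed above.
--   00000000  end of stream
--   001nnnnn  emit N zero bytes, N = 1..31
--   010nnnnn  emit N 0xFF bytes, N = 1..31
-- Anything else is emitted as a literal byte (placeholder for the rest
-- of the dictionary).
entity bitstream_expander is
  port (
    clk      : in  std_logic;
    rst      : in  std_logic;
    code_in  : in  std_logic_vector(7 downto 0);  -- one compressed byte
    code_vld : in  std_logic;
    code_rdy : out std_logic;                     -- ready for the next code
    dout     : out std_logic_vector(7 downto 0);  -- expanded config bytes
    dout_vld : out std_logic;
    done     : out std_logic
  );
end bitstream_expander;

architecture rtl of bitstream_expander is
  signal run_cnt : unsigned(4 downto 0) := (others => '0');
  signal fill    : std_logic_vector(7 downto 0) := (others => '0');
begin
  code_rdy <= '1' when run_cnt = 0 else '0';

  process (clk)
  begin
    if clk'event and clk = '1' then
      dout_vld <= '0';
      done     <= '0';
      if rst = '1' then
        run_cnt <= (others => '0');
      elsif run_cnt /= 0 then                      -- in the middle of a run
        dout     <= fill;
        dout_vld <= '1';
        run_cnt  <= run_cnt - 1;
      elsif code_vld = '1' then
        if code_in = x"00" then                    -- end of stream
          done <= '1';
        elsif code_in(7 downto 5) = "001" then     -- run of 0x00
          fill    <= x"00";
          run_cnt <= unsigned(code_in(4 downto 0));
        elsif code_in(7 downto 5) = "010" then     -- run of 0xFF
          fill    <= x"FF";
          run_cnt <= unsigned(code_in(4 downto 0));
        else                                       -- literal placeholder
          dout     <= code_in;
          dout_vld <= '1';
        end if;
      end if;
    end if;
  end process;
end rtl;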
Are you reinventing Huffman coding?

http://en.wikipedia.org/wiki/Huffman_coding

If you have a CPU somewhere to do the decompression, check miniLZO.
Decompression is fast and the code is very small!
Plus, it's an open source library that works, and you don't have to code
it. Be lazy...

http://www.oberhumer.com/opensource/lzo/
 
John Larkin wrote:

On Sun, 15 Jul 2007 16:17:33 +1200, Jim Granville
Here's a byte histogram of a typical Spartan 3 config file. Most of
the config bits are zero. I read one Xilinx appnote about the rad
hardness of SRAM-based FPGAs, and they got the upset rate calculation
down by noting that most config bits actually don't matter!
On this one, that's approx. 70% zeroes.
It could be interesting to see the 'RunCount' values - maybe something like
the 2 smallest, 2 highest, and 7 most common run lengths, in 11 more
columns? - to help design the best algorithm.
I also notice that a single HI bit is very common, so
even if that does not repeat, specific compression coverage for those
would cover 38.5% of the non-zero bytes?

-jg



80,413 NONZERO BYTES


============== SORTED =================

Rank  Byte(dec)  Byte(hex)  Frequency

0 0 0 181,476
1 255 FF 5,809
2 1 1 5,390
3 128 80 3,991
4 4 4 3,867
5 8 8 3,815
6 16 10 3,785
7 32 20 3,653
8 2 2 3,364
9 48 30 3,282
10 64 40 3,139
11 12 C 1,941
12 3 3 1,587
13 192 C0 1,472
14 36 24 1,038
15 112 70 1,009
16 20 14 936
17 10 A 824
18 144 90 706
19 40 28 699
20 24 18 678
21 9 9 631
22 80 50 626
23 160 A0 602
24 5 5 573
25 96 60 571
26 6 6 558
27 14 E 558
28 56 38 503
29 18 12 498
30 176 B0 487
31 19 13 480
32 60 3C 461
33 28 1C 454
34 129 81 449
35 153 99 449
36 136 88 441
37 13 D 411
38 34 22 408
39 33 21 405
40 195 C3 399
41 68 44 389
42 224 E0 383
43 204 CC 381
44 44 2C 366
45 130 82 358
46 170 AA 354
47 72 48 351
48 7 7 345
49 132 84 345
50 17 11 324
51 52 34 312
52 140 8C 308
53 65 41 299
54 240 F0 294
55 200 C8 290
56 120 78 279
57 66 42 274
58 29 1D 271
59 131 83 267
60 139 8B 242
61 49 31 236
62 30 1E 218
63 254 FE 218
64 157 9D 217
65 58 3A 216
66 45 2D 212
67 196 C4 207
68 138 8A 194
69 199 C7 191
70 51 33 191
71 227 E3 188
72 74 4A 188
73 26 1A 182
74 168 A8 181
75 50 32 180
76 37 25 173
77 122 7A 173
78 152 98 173
79 187 BB 172
80 85 55 166
81 252 FC 164
82 184 B8 161
83 156 9C 152
84 165 A5 146
85 22 16 146
86 193 C1 144
87 238 EE 139
88 42 2A 139
89 148 94 139
90 104 68 137
91 76 4C 135
92 88 58 134
93 137 89 133
94 59 3B 132
95 92 5C 130
96 11 B 129
97 84 54 122
98 102 66 119
99 208 D0 118
100 124 7C 115
101 71 47 114
102 15 F 112
103 250 FA 108
104 35 23 106
105 62 3E 103
106 41 29 101
107 251 FB 98
108 25 19 96
109 188 BC 96
110 21 15 95
111 54 36 93
112 61 3D 92
113 207 CF 91
114 116 74 89
115 55 37 88
116 143 8F 88
117 57 39 86
118 126 7E 85
119 90 5A 83
120 236 EC 83
121 226 E2 80
122 100 64 80
123 211 D3 77
124 146 92 77
125 221 DD 76
126 241 F1 76
127 82 52 75
128 203 CB 72
129 145 91 72
130 243 F3 70
131 194 C2 69
132 81 51 66
133 142 8E 66
134 46 2E 63
135 180 B4 63
136 67 43 61
137 239 EF 60
138 27 1B 60
139 70 46 60
140 253 FD 59
141 97 61 59
142 154 9A 58
143 186 BA 57
144 197 C5 55
145 134 86 55
146 53 35 54
147 108 6C 54
148 69 45 53
149 191 BF 52
150 99 63 52
151 38 26 52
152 158 9E 52
153 98 62 52
154 172 AC 52
155 247 F7 50
156 179 B3 49
157 190 BE 48
158 113 71 47
159 248 F8 47
160 141 8D 46
161 175 AF 45
162 115 73 45
163 201 C9 45
164 225 E1 45
165 78 4E 45
166 114 72 44
167 162 A2 44
168 245 F5 43
169 189 BD 43
170 73 49 43
171 228 E4 43
172 135 87 42
173 133 85 42
174 216 D8 42
175 223 DF 41
176 202 CA 39
177 161 A1 38
178 87 57 37
179 185 B9 37
180 94 5E 36
181 23 17 34
182 31 1F 34
183 147 93 34
184 118 76 34
185 163 A3 33
186 77 4D 32
187 125 7D 32
188 174 AE 32
189 178 B2 32
190 86 56 31
191 106 6A 31
192 149 95 30
193 234 EA 30
194 198 C6 29
195 206 CE 29
196 242 F2 29
197 232 E8 28
198 164 A4 27
199 220 DC 27
200 244 F4 26
201 39 27 25
202 93 5D 24
203 177 B1 24
204 121 79 23
205 150 96 23
206 105 69 22
207 83 53 21
208 89 59 20
209 182 B6 20
210 110 6E 20
211 119 77 18
212 95 5F 18
213 63 3F 18
214 181 B5 17
215 117 75 16
216 212 D4 16
217 219 DB 15
218 101 65 15
219 230 E6 15
220 218 DA 15
221 43 2B 14
222 205 CD 14
223 209 D1 14
224 171 AB 13
225 75 4B 12
226 235 EB 11
227 169 A9 11
228 222 DE 11
229 79 4F 10
230 127 7F 10
231 123 7B 10
232 151 97 8
233 215 D7 8
234 47 2F 8
235 109 6D 8
236 173 AD 8
237 214 D6 8
238 166 A6 8
239 167 A7 7
240 213 D5 7
241 103 67 6
242 217 D9 6
243 233 E9 5
244 246 F6 5
245 111 6F 4
246 229 E5 4
247 210 D2 4
248 155 9B 3
249 91 5B 3
250 249 F9 3
251 231 E7 2
252 183 B7 2
253 159 9F 2
254 237 ED 1
255 107 6B 1
 
Jim Granville <no.spam@designtools.maps.co.nz> wrote:

John Larkin wrote:

On Sun, 15 Jul 2007 16:17:33 +1200, Jim Granville
Here's a byte histogram of a typical Spartan 3 config file. Most of
the config bits are zero. I read one Xilinx appnote about the rad
hardness of SRAM-based FPGAs, and they got the upset rate calculation
down by noting that most config bits actually don't matter!

On this one, that's approx. 70% zeroes.
It could be interesting to see the 'RunCount' values - maybe something like
the 2 smallest, 2 highest, and 7 most common run lengths, in 11 more
columns? - to help design the best algorithm.
I also notice that a single HI bit is very common, so
even if that does not repeat, specific compression coverage for those
would cover 38.5% of the non-zero bytes?
Another optimisation may come from compressing the frame headers and
deleting unused areas. Xilinx configuration data (which also includes
commands BTW) is organised in frames with a CRC, preamble, location
and postamble. There are appnotes on partial (re)configuration from
Xilinx explaining these in more detail. The initialisation frames of
blockrams may be omitted completely if they are not initialised, which
saves several kB.

--
Reply to nico@nctdevpuntnl (punt=.)
Companies and shops can be found at www.adresboekje.nl
 
Austin, thank you very much for your detailed answer!

It has become quite clear to me now. Just one more question: how do I
practically generate the forwarded clocks? Just a toggling 1-bit signal
(1..0..1..0..)? Do I have to use dedicated clock I/Os? I think so...!?

Regards, Gero
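
A minimal sketch of the usual way to do this, assuming a Virtex-4 style
ODDR primitive from the unisim library (Spartan-3 uses ODDR2, with slightly
different ports): the forwarded clock is not a toggling signal generated in
fabric logic; instead, a DDR output register in the IOB is fed constant '1'
and '0', so the clock copy leaves the chip through the same output path as
the data. A regular output pin is normally fine on the transmit side; it is
on the receiving FPGA that the forwarded clock should arrive on a
clock-capable pin so it can drive a DCM/BUFG.

library ieee;
use ieee.std_logic_1164.all;
library unisim;
use unisim.vcomponents.all;

-- Sketch only: forward a copy of clk off-chip through an IOB DDR register.
entity clk_forward is
  port (
    clk         : in  std_logic;   -- local system-synchronous clock
    clk_fwd_out : out std_logic    -- forwarded copy, on a length-matched trace
  );
end clk_forward;

architecture rtl of clk_forward is
begin
  fwd_ff : ODDR
    generic map (DDR_CLK_EDGE => "SAME_EDGE")
    port map (
      Q  => clk_fwd_out,  -- to the output buffer / pin
      C  => clk,
      CE => '1',
      D1 => '1',          -- '1' launched on the rising edge ...
      D2 => '0',          -- ... '0' on the falling edge: a clean copy of clk
      R  => '0',
      S  => '0'
    );
end rtl;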
 
On Jul 11, 7:26 pm, kumar <mailsati...@gmail.com> wrote:
Hello everyone,
I do not know if this is the right forum to post this issue, but I
thought that the people of this group would have knowledge of JTAG.
We have designed a board which contains an Altera FPGA and an EPROM
(EPC2LC20). The design is based on some existing reference schematics.
We have populated our PCB so as to get power to all the components,
and we have placed the components making up the JTAG circuitry along
with the FPGA and EPROM.
Altera's ByteBlaster is being used, which is working fine. We have
checked all the circuitry, which seems to be all right, but we keep
getting the error "Unable to access the JTAG chain". We have also
tried the steps in Altera's troubleshooters.
Do we need to proceed in a different way when programming the
chip/EPROM for the first time? And what is the passive serial mode in
the Quartus tool for programming? Any help would be highly
appreciated. Thanks in advance.
comp.arch.fpga would be a more appropriate group.

I don't understand "seems to be alright" and "unable to access" in the
same sentence. Sounds like your JTAG chain is broken.

Have you followed Altera's guidelines?
<http://www.altera.com/literature/hb/cfg/cfg_cf52008.pdf>

G.
 
devices wrote:

I chose the simple following example (perhaps
too simple or too specialized). Here's a flip
flop with an enable pin:

process (clk)
begin
if (clk'event and clk='1') then

if (ce = '1') then
q <= d;
end if;

end if;
end process;
For most target libraries this will result in a D flip-flop and a
multiplexer controlled by ce. The multiplexer selects d or q as the input
to the D-FF.


- FIGURE B

+---------------------+
| _ |
| | \ +--------+ |
+--|0 | | | |
| |---|In Out|--+-> q
d--|1 | | |
| / | FF |
- +-|> |
| | +--------+
| |
ce clk
That is correct.


1) If Figure B holds true, does it ALWAYS
go like this?
Yes. Because you did not model a latch.

A latch is a storage element. A D-latch has an enable signal and a data
input.

process(en,data)
begin
if (en='1') then
latch<=data;
end if;
end process;


The process you have written first could be changed into:

process (clk)
begin
if (clk'event and clk='1') then
q<=q_next;
end if;
end process;

process(ce,d,q)
begin
if (ce = '1') then
q_next<=d;
else q_next<=q;
end if;
end process;

This describes the same behavior. As you can see, there is always a
value assigned to q_next. This means: no storage needed.


Ralf
 
Ralf Hildebrandt wrote:
devices wrote:

I chose the simple following example (perhaps
too simple or too specialized). Here's a flip
flop with an enable pin:

process (clk)
begin
if (clk'event and clk='1') then

if (ce = '1') then
q <= d;
end if;

end if;
end process;

For most target libraries this will result in a D flip-flop and a
multiplexer controlled by ce. The multiplexer selects d or q as the input
to the D-FF.


- FIGURE B

+---------------------+
| _ |
| | \ +--------+ |
+--|0 | | | |
| |---|In Out|--+-> q
d--|1 | | |
| / | FF |
- +-|> |
| | +--------+
| |
ce clk

That is correct.


1) If Figure B holds true, does it ALWAYS
go like this?

Yes. Because you did not model a latch.

A latch is a storage element. A D-latch has an enable signal and a data
input.

process(en,data)
begin
if (en='1') then
latch<=data;
end if;
end process;


The process you have written first could be changed into:

process (clk)
begin
if (clk'event and clk='1') then
q<=q_next;
end if;
end process;

process(ce,d,q)
begin
if (ce = '1') then
q_next<=d;
else q_next<=q;
end if;
end process;

This describes the same behavior. As you can see, there is always a
value assigned to q_next. This means: no storage needed.
Yes, and thanks for fixing the font.

The combined structure could
also be viewed as a latch
with a synchronous delay
rather than a wire in the feedback loop.

This example is also the limit case
in the one vs two process debate.

-- Mike Treseler
 
"Ralf Hildebrandt" <Ralf-Hildebrandt@gmx.de> wrote in message
news:5g6nfuF3d3hceU1@mid.individual.net...
devices wrote:

I chose the simple following example (perhaps
too simple or too specialized). Here's a flip
flop with an enable pin:

process (clk)
begin
if (clk'event and clk='1') then

if (ce = '1') then
q <= d;
end if;

end if;
end process;

For most target libraries this will result in a D flip-flop and a
multiplexer controlled by ce. The multiplexer selects d or q as the input
to the D-FF.


- FIGURE B

+---------------------+
| _ |
| | \ +--------+ |
+--|0 | | | |
| |---|In Out|--+-> q
d--|1 | | |
| / | FF |
- +-|> |
| | +--------+
| |
ce clk

That is correct.


1) If Figure B holds true, does it ALWAYS
go like this?

Yes. Because you did not model a latch.

A latch is a storage element. A D-latch has an enable signal and a data
input.

process(en,data)
begin
if (en='1') then
latch<=data;
end if;
end process;


The process you have written first could be changed into:

process (clk)
begin
if (clk'event and clk='1') then
q<=q_next;
end if;
end process;

process(ce,d,q)
begin
if (ce = '1') then
q_next<=d;
else q_next<=q;
end if;
end process;

This describes the same behavior. As you can see, there is always a
value assigned to q_next. This means: no storage needed.
Thanks for your reply. In any case, avoiding the double
branch is what I was looking for. I wanted to omit the
"else" and safely hand data to a flip-flop.
 
On Jul 18, 9:47 am, Ralf Hildebrandt <Ralf-Hildebra...@gmx.de> wrote:
devices wrote:

I chose the simple following example (perhaps
too simple or too specialized). Here's a flip
flop with an enable pin:
process (clk)
begin
if (clk'event and clk='1') then

if (ce = '1') then
q <= d;
end if;

end if;
end process;

For most target libraries this will result in a D flip-flop and a
multiplexer controlled by ce. The multiplexer selects d or q as the input
to the D-FF.

- FIGURE B

+---------------------+
| _ |
| | \ +--------+ |
+--|0 | | | |
| |---|In Out|--+-> q
d--|1 | | |
| / | FF |
- +-|> |
| | +--------+
| |
ce clk

That is correct.

1) If Figure B holds true, does it ALWAYS
go like this?

Yes. Because you did not model a latch.

A latch is a storage element. A D-latch has an enable signal and a data
input.

process(en,data)
begin
if (en='1') then
latch<=data;
end if;
end process;

The process you have written first could be changed into:

process (clk)
begin
if (clk'event and clk='1') then
q<=q_next;
end if;
end process;

process(ce,d,q)
begin
if (ce = '1') then
q_next<=d;
else q_next<=q;
end if;
end process;

This describes the same behavior. As you can see, there is always a
value assigned to q_next. This means: no storage needed.

Ralf
Actually, most FPGA synthesizers will implement the register and mux
as a clock enabled register (which incorporates the mux and the
register as you stated, but is a primitive that does not consume a LUT
for the mux).

I really wish the advice "don't use an if without an else" in
combinatorial logic had never been given. There are a couple of much
more effective and verifiable ways to eliminate the possibility of
inferring a latch:

First, use clocked processes for everything, and include the logic in
the clocked process. No combinatorial processes, no latches. There are
lots of ways to avoid using combinatorial processes in virtually any
circumstance (except when an entity has a completely combinatorial
path from input to output), and using variables helps a lot.

Second, if you HAVE to use a combinatorial process, assign a default
value to every signal driven by the process, right up front in the
process, before anything else happens. It is MUCH easier to see and
verify that every driven signal is in that default list, than it is to
verify that every if contains an else AND that anything assigned in
the if is also assigned in the else.

Example:

process(ce,d,q)
begin
q_next <= q; -- default assignment
if ce = '1' then
q_next <= d;
end if;
end process;

Andy
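
As a small sketch of the first suggestion (using the signal names from the
earlier example; this is an illustration added here, not code from the
thread): the enable logic can live entirely inside the clocked process by
computing the next value in a variable, so there is no combinational
process and no way to infer a latch.

-- Sketch only: one clocked process, combinational result held in a variable.
process (clk)
  variable next_q : std_logic;
begin
  if clk'event and clk = '1' then
    next_q := q;          -- default: hold the current value
    if ce = '1' then
      next_q := d;        -- the "mux" is computed in the variable
    end if;
    q <= next_q;          -- single registered assignment
  end if;
end process;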
 
On Jul 19, 12:36 pm, Ben Jackson <b...@ben.com> wrote:
On 2007-07-19, Xilinx User <anonym...@net.com> wrote:

Is it including `i' in * and then complaining about it?
Presumably it is including 'memory' in * and then complaining about
it. An entire memory/array is not legal in an event control, so there
is justification for complaining.

The LRM does not specify how this issue is supposed to be handled.
There was plenty of discussion in committee about it, but nothing was
ever done about it. Many tools do appear to handle it the way you
would want for combinational logic, by making the block sensitive to
the entire memory.
 
jjohnson@cs.ucf.edu wrote:
That appears to be related to the number of processors inside one box.
If a single CPU is just hyperthreaded, the processor takes care of
instruction distribution unrelated to a variable like number_of_cpus,
right?
No. Hyperthreading means that the hardware is only virtually doubled.
The CPU maintains the state and the register set of two independent
threads and tries to utilize all its function units. If one thread has
to wait for data from memory, some instructions of the other thread
can be issued to the function units. Likewise, if one thread spends its
time in the FPU, the other thread can use the remaining function units.
If both threads execute the same type of instructions, a hyperthreaded
CPU rarely has an advantage.

Running on a hyperthreaded CPU, the operating system sees two cores and
has to schedule its workload as if there were two physical cores to gain
any benefit. If your software has only one thread, hyperthreading, like
multiple cores, won't speed it up.

And if there are two single-core processors in a box, obviously
it will utilize "number_of_cpus=2" as expected. Does anyone know how
that works with dual-core CPUs? I.e., if I have two quad-core CPUs in
one box, will setting "number_of_cpus=7" make optimal use of 7 cores
while leaving me one to work in a shell or window?
I don't know how Quartus makes use of the available CPUs but basically
as seen from software there is no difference between two single cores
and one dual-core.

In 32-bit Windows, is that 3GB limit for everything running at one
time? i.e., is 4GB a waste on a Windows machine? Can it run multiple
2GB processes and go beyond 3 or 4GB? Or is 3GB an absolute O/S limit,
and 2GB an absolute process limit in Windows?
3 GB is a practical limit because the PCI bus and other memory-mapped
devices typically occupy some hundred megabytes of address space. So you
can't use this memory space to access RAM. There are techniques to map
memory to other address regions beyond the 4 GB border but you need
special chipsets and proper operating system support.

Andreas
 
On Jul 28, 10:51 pm, Andreas Hofmann <ahn...@gmx.net> wrote:

3 GB is a practical limit because the PCI bus and other memory-mapped
devices typically occupy some hundred megabytes of address space. So you
can't use this memory space to access RAM.
These are usually not mapped into the address space of a user process.

Kolja Sulimma
 
3 GB is a practical limit because the PCI bus and other memory-mapped
devices typically occupy some hundred megabytes of address space. So you
can't use this memory space to access RAM.

These are usually not mapped into the address space of a user process.
Nope, but the (32-bit) kernel needs to see the mmap'ed peripherals + the
userspace RAM if implementation of stuff like file reading, etc is to be
efficient (without juggling with pages)...
 
PFC wrote:
3 GB is a practical limit because the PCI bus and other memory-mapped
devices typically occupy some hundred megabytes of address space. So you
can't use this memory space to access RAM.

These are usually not mapped into the address space of a user process.

Nope, but the (32-bit) kernel needs to see the mmap'ed peripherals +
the userspace RAM if implementation of stuff like file reading, etc is
to be efficient (without juggling with pages)...
Anandtech ran an article which does quite a good job of explaining the
2 GB and 3 GB barriers.
http://www.anandtech.com/gadgets/showdoc.aspx?i=3034
 
