For years, microcomputer programming has been dominated by small amounts
of memory. On the PC, for example, a lot of programming was oriented toward
working within or around the 640K barrier. Now in more and more places,
these barriers are lifting. With the widespread use of Windows Enhanced
mode, the standard microcomputer has megabytes of readily accessible memory,
a larger pool of virtual memory, and even a flat-memory model.
Yet, in the same way that a citizen of the former Soviet Union might still
hang on to a Lenin pin, programmers still cling to the old ways. In Windows
programming circles, for example, the little discussion of performance one
finds seems to be dominated by memory-management considerations that don't
make much sense anymore. Windows Enhanced mode, OS/2 2.0, and Win32/NT are
all demand-paged virtual-memory systems. Yet the majority of Windows programming
books are still filled with dire predictions of what will happen if you
don't keep your segments discardable, movable, and small.
The PC world isn't 640K anymore, and in more and more places it's not chopped
into 64K pieces anymore, either. It's really time to wake up, smell the
coffee, and throw out all the old baggage.
Oddly enough, though, we don't really need any "new ideas." In
fact, with the increasing popularity of flat-memory models and demand-paged
virtual memory, it's time to dust off your old college textbooks. Why? Because
PC systems are finally starting to resemble the way we were taught computers
are supposed to work!
Except that, rather than dusting off your old textbooks, I would suggest
picking up a new one.
About halfway through writing an article on demand-paged virtual memory
in Windows Enhanced mode for Microsoft Systems Journal, I realized that
if the article was going to have any substance at all, it would have to
discuss (or at least be based on some awareness of) virtual memory in general,
not just the way it happens to be implemented in one mode of one version
of one Microsoft product.
So I started going through my book collection, looking for background reading
on demand-paged virtual memory. Many of the books Ray Duncan and I have
reviewed in "Programmer's Bookshelf"--Dewar and Smosna's Microprocessors:
A Programmer's View (reviewed in DDJ, September 1990), Hennessy and
Patterson's Computer Architecture: A Quantitative Approach (October
1990), and Tanenbaum's Modern Operating Systems (May and June 1992)--discuss
virtual memory. There's a ton of literature available on this subject.
The nicest discussion of the subject, though, and the most useful to a programmer
rather than a chip designer, was the 25-page section on virtual memory I
found in a book from 1990 that we somehow haven't reviewed here yet: Harold
Stone's High-Performance Computer Architecture. It may be a little
strange to examine this book now, particularly since Hennessy and Patterson's
book seems to have blown away everything else in this field. But I was surprised
to find Stone's discussion of virtual memory and other topics--including
cache memory, pipelining, and multiprocessors--gave me more of what I as
a programmer actually needed to know than Hennessy and Patterson's wonderful,
definitive work.
Why should a programmer care about this stuff in the first place? After
all, demand-paged virtual memory is supposed to be transparent! You can
dereference a pointer to access a page of memory, even if that page is currently
on disk; a page-fault handler within the operating system will take care
of loading the page without you being aware of it. Obviously, the presence
of virtual memory is no more relevant to your average programmer than is,
say, the presence of an instruction prefetch queue on the processor.
Unfortunately, disk access is several orders of magnitude slower than access
to main memory. Consequently, there is one overwhelming reason why programmers
must understand the workings of "transparent" virtual memory:
performance. The reason for most programmers to study virtual memory, then,
is so they can understand its performance implications for their software.
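To put rough numbers on "several orders of magnitude," here is a minimal sketch of the usual effective-access-time arithmetic: the average cost of a memory reference is a blend of the in-memory cost and the page-fault cost, weighted by the fault rate. The function name and the two timing figures (100 ns for main memory, 20 ms to service a fault from disk) are my own illustrative assumptions, not measurements and not figures from Stone's book.

```python
def effective_access_time(fault_rate, memory_ns=100.0, fault_ns=20_000_000.0):
    """Average cost of one memory reference, in nanoseconds.

    memory_ns and fault_ns are assumed, illustrative figures: about
    100 ns for main memory and 20 ms to service a page fault from disk.
    """
    return (1.0 - fault_rate) * memory_ns + fault_rate * fault_ns


if __name__ == "__main__":
    baseline = effective_access_time(0.0)
    for rate in (0.0, 1e-6, 1e-4, 1e-2):
        slowdown = effective_access_time(rate) / baseline
        print(f"fault rate {rate:g}: about {slowdown:.1f}x the all-in-memory cost")
```

Under these assumed figures, even one fault per 10,000 references makes the average reference many times slower than it would be with everything in memory, which is why a "transparent" mechanism still deserves a programmer's attention.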
Stone's High-Performance Computer Architecture does a great job of drawing
out just these implications. Rather than merely describing how virtual memory
works, he presents a detailed performance model in terms understandable
to applications programmers as well as systems designers. The very simple
"working set" concept is key here, and is of course discussed
in every other book on the subject, but somehow Stone manages to convey
this concept in a way that is genuinely helpful rather than merely informative.
For a few years, programmers working with systems such as Windows and OS/2
were worrying themselves silly with rules about segment sizes and segment
attributes. One venerable Microsoft University lecturer pounded into the
minds of an entire generation of Windows programmers the need to, "Keep
your segments as small as possible, as discardable as possible, and as unlocked
as possible."
Well, in Windows Enhanced mode, in OS/2 2.0, and in Win32/NT, all this pretty
much goes out the window. What replaces these old, outmoded ideas of how
to get good performance? The even older notion of "working set."
We had a period of a few years in which we had to do memory management and
the like in very odd ways, and that period is now thankfully coming to a
close.
"Working set" is really just the 90/10 rule expressed in a different
way. The 90/10 rule states that you will spend 90 percent of your time working
over 10 percent of your code. But it also states that 90 percent of the
software's running time occurs in only 10 percent of the code. This is the
whole basis for virtual memory: Potentially, a program can run at full speed
with only 10 percent of itself--or whatever the working set is--loaded into
memory at any given time. Unlike that nasty segment stuff, the programmer
does not specify any of this in advance. The operating system "discovers"
a program's working set on the fly, through page faults.
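For readers who want the idea pinned down, here is a small sketch of the textbook definition: the working set at time t is simply the set of distinct pages touched by the most recent references. The function name, the window size, and the page trace are all hypothetical, made up for illustration; a real operating system only approximates this, and I am not claiming this is how Stone or any particular system computes it.

```python
def working_set(references, t, window):
    """Pages touched by the `window` most recent references ending at index t."""
    start = max(0, t - window + 1)
    return set(references[start:t + 1])


# Hypothetical trace of page numbers: a tight loop over pages 0-2,
# with a single stray reference to page 7.
trace = [0, 1, 2, 0, 1, 2, 0, 1, 7, 0, 1, 2]

if __name__ == "__main__":
    for t in (7, 11):
        pages = sorted(working_set(trace, t, window=5))
        print(f"t={t}: working set {pages} ({len(pages)} pages)")
```

While the program stays inside its loop, the working set is just the three loop pages; the stray reference briefly enlarges it. A program whose working set fits in physical memory runs at close to full speed no matter how large the program itself is.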
As Stone shows, paged virtual memory depends on the fact that all programs
have reasonably sized working sets or "footprints;" that is, that
all programs can run for a while with only discrete, page-sized bits of
themselves in memory at any given time. All programs?! Well, that's the
problem: A virtual-memory operating system can't know in advance how all
programs, or even how one program, will behave. All it knows is the probable
behavior of the average program. The average program will behave well under
virtual memory.
Your program may not, however, if the way it accesses memory doesn't correspond
to the model that virtual memory is based on. The model is simple: If your
program accessed x[i] at time t, then it is very likely to refer to x[i+1]
or x[i-1] at time t + 1.
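To see what happens when a program breaks this model, here is a toy simulation that counts page faults for two ways of visiting every element of the same array: once in order, and once in large strides. Everything in it is an assumption made for illustration: the function names, the 4-byte element size, the 16 available page frames, and the strict least-recently-used replacement policy (real systems only approximate LRU). The 4K page size matches the 386.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # bytes per page (the 386 page size)


def count_faults(addresses, frames, page_size=PAGE_SIZE):
    """Count page faults for a sequence of byte addresses.

    Assumes strict LRU replacement with `frames` physical page frames.
    """
    resident = OrderedDict()  # resident page numbers, least recently used first
    faults = 0
    for addr in addresses:
        page = addr // page_size
        if page in resident:
            resident.move_to_end(page)        # mark as most recently used
        else:
            faults += 1
            if len(resident) >= frames:
                resident.popitem(last=False)  # evict the least recently used page
            resident[page] = None
    return faults


def sequential_walk(n, elem_size=4):
    """Addresses of x[0], x[1], x[2], ... in order."""
    for i in range(n):
        yield i * elem_size


def strided_walk(n, stride, elem_size=4):
    """Visit every element exactly once, but jump `stride` elements each step."""
    for start in range(stride):
        for i in range(start, n, stride):
            yield i * elem_size


if __name__ == "__main__":
    n = 64 * 1024  # 65,536 four-byte elements, spanning 64 pages
    print("sequential:", count_faults(sequential_walk(n), frames=16))
    print("strided:   ", count_faults(strided_walk(n, stride=1024), frames=16))
```

Both walks touch exactly the same 65,536 elements. The sequential walk faults once per page, 64 times in all. The strided walk cycles through all 64 pages with only 16 frames to hold them, so in this model every single reference faults: 65,536 faults for the same work.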
What began as a simple statistical description of the behavior of programs
has now been turned into a prescription: Your software had better behave
this way. Hence the great relevance of the section on virtual memory in
Stone's book, such as his discussion of "Improving Program Locality,"
to programmers interested in getting decent performance in Windows, OS/2,
or any other demand-paged system.
I've focused here on virtual memory, but this is just one section of High-Performance
Computer Architecture. The 100-page chapter on memory-system design
is actually largely devoted to a detailed analysis of cache memory: cache
analysis, cache writes, replacement policies, performance metrics, and so
on. A lot of the cache-memory discussion sounds just like the virtual memory
discussion, except for one small thing: Virtual memory involves hitting
the disk, and disks are very slow. This one point makes virtual memory and
cache memory fundamentally different. Other chapters in the book discuss
pipelining, numerics, vector computers, multiprocessing, and multiprocessor
algorithms. The chapters on multiprocessing are noteworthy for their sensible
position that, until the communication and synchronization overhead of multiprocessing
is reduced, multiprocessor systems are likely to involve just a handful
of processors, not the 1000-processor behemoths one might imagine.
I read the second edition of this book when it first came out in 1990 and,
frankly, I didn't get much out of it at the time. Yet, as I've tried to
indicate here, when I picked it up again in late 1992, much of it seemed
amazingly relevant to my daily work. Material that once would have seemed
unfortunately irrelevant to daily PC programming practice is becoming more
important every day. Why? Because the 32-bit Intel architecture and the
operating systems sold on top of it are becoming more and more like other
32-bit systems every day.
In fact, Intel might even come to regret the day it started pushing 32 bits,
because 32-bit code is portable to other architectures in a way that segmented
16-bit code never was. Well, that is pretty unlikely, but certainly PC programmers
will increasingly be able to benefit from the lessons learned on other processors
and other operating systems, and from textbooks such as High-Performance
Computer Architecture.