A year ago, in the May 1990 issue of DDJ, I wrote a round-up of books
on the C++ programming language. The bottom line was that, if you planned
on reading only one C++ book, then Stanley B. Lippman's C++ Primer,
based on C++ 2.0, was the book to read. That's still true.
Several excellent books on C++ have appeared in the past year, reflecting the
growing maturity of the language. Some of them assume that the reader is already
familiar with C++: we have entered the second generation of C++ usage. The
best of the new C++ books is Jonathan S. Shapiro's A C++ Toolkit.
This book can be read in one evening, and is an enjoyable, brief introduction
to software reuse. Shapiro's goal is to prod you into thinking about using
C++ for "reusable programming."
In a discussion of "The Failure of Libraries," Shapiro cites the
example of two different Unix library routines for handling regular expressions.
"No interesting program has ever used either of these library packages!"
(p. 3). Something better than libraries is needed if we are to have a "software-components
subindustry"; object-oriented programming languages such as C++ are
an attempt to solve this problem.
We've all heard this one before. But simple things, such as using the example
of linked lists rather than the tired complex-numbers example, make this
a convincing argument for using C++.
Computer science has the interesting property that the vast majority of
problems are solved with a very small number of fundamental data structures.
These data structures are used so often that they have achieved the status
of koans. The most common by far is the linked list. If you have been programming
for more than a year or two, you can probably write them in your sleep.
Initially, I thought linked lists were too basic a topic for this book.
A recent project changed my mind.
I found myself working on a project that needed linked lists in several
places. Having written several hundred linked list data structures in my
career, I threw one together without bothering to build a class. No sooner
had I completed the first list than I needed to build a second, and cranked
out the code for that one, too. Does this sound familiar? Alarm bells went
off in my head and I decided that this chapter was worth including.
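Shapiro's point is that a list written once as a class beats a list rewritten by hand each time. The following is a minimal sketch of that idea, not Shapiro's code: the names Link, LinkList, and IntNode are hypothetical. Because C++ does not yet have templates, it uses the pre-template "intrusive" style, in which any class derived from Link can be stored on a LinkList (the default member initializers are modern syntax).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names throughout; this is not the book's implementation.
// Any class that wants to live on a list derives from Link.
struct Link {
    Link* next = nullptr;
};

// Written once, reused for every list in the program.
class LinkList {
public:
    bool empty() const { return head_ == nullptr; }
    std::size_t size() const { return count_; }

    // Insert at the front in constant time.
    void push_front(Link* item) {
        item->next = head_;
        head_ = item;
        ++count_;
    }

    // Remove and return the front element, or nullptr if the list is empty.
    Link* pop_front() {
        Link* item = head_;
        if (item != nullptr) {
            head_ = item->next;
            item->next = nullptr;
            --count_;
        }
        return item;
    }

private:
    Link* head_ = nullptr;
    std::size_t count_ = 0;
};

// Example element type: an int that can be stored on a LinkList.
struct IntNode : Link {
    explicit IntNode(int v) : value(v) {}
    int value;
};
```

The cost of doing without templates shows up at the point of use: getting an element back requires a cast from Link* to the derived type, which the compiler cannot check.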
In addition to the pleasure of reading some decent prose for a change, Shapiro's
book provides a fresh view of the major classical data structures. There
are chapters on bit sets, lists, arrays, dynamic arrays, binary trees, hash
tables, and atoms. Shapiro uses C++ to say something new and interesting
about these structures.
But all is not wine and roses. The section "Coping with Compiler Brain
Death" (pp. 76-7) explains what happens when your compiler can't inline
a function that has been declared inline. "An Implementation Note About
Virtual Functions" (pp. 86-7) says that if you have a class with virtual
functions, but without any noninline member functions, then your compiler
is likely to emit tons o'vtbls.
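The usual cure for the vtbl problem, in cfront and many later compilers, is to give the class at least one virtual function that is not inline; the vtbl is then typically emitted only in the source file that defines that function. Here is a minimal sketch with hypothetical class names; the exact emission rule varies from compiler to compiler.

```cpp
#include <cassert>

// Hypothetical class. Every virtual function is inline, so the compiler
// has no single "home" file for the vtbl and may emit a copy in every
// translation unit that uses the class.
class InlineShape {
public:
    virtual ~InlineShape() {}
    virtual double area() const { return 0.0; }
};

// Remedy: declare one virtual function (here the destructor) in the class,
// and define it out of line in exactly one source file.
class Shape {
public:
    virtual ~Shape();
    virtual double area() const { return 0.0; }
};

Shape::~Shape() {}  // the single noninline definition anchors the vtbl

class Square : public Shape {
public:
    explicit Square(double side) : side_(side) {}
    double area() const override { return side_ * side_; }
private:
    double side_;
};
```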
Chapter 15, on memory management, makes this point: "C++ programs,
much more than C programs, take advantage of the heap. As a result, C++
objects are more frequently allocated in the heap than their C counterparts.
Careful memory management is a crucial aspect of C++ performance. As compilers
get better, it will very likely become the dominant issue in tuning C++
applications" (p. 161).
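One common way C++ programmers take control of heap performance is to give a small, frequently allocated class its own operator new and operator delete, recycling freed objects through a free list. The following is a minimal, single-threaded sketch of that technique, not the code from Shapiro's Chapter 15; Node and freeCount() are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical class with a class-specific allocator backed by a free list.
class Node {
public:
    explicit Node(int v) : value(v), next(nullptr) {}
    int value;
    Node* next;

    // Reuse a previously freed Node if one is available.
    static void* operator new(std::size_t size) {
        if (size == sizeof(Node) && freeList_ != nullptr) {
            FreeBlock* block = freeList_;
            freeList_ = block->next;
            --freeCount_;
            return block;
        }
        return ::operator new(size);  // derived classes or empty free list
    }

    // Push freed Node storage onto the free list instead of the heap.
    static void operator delete(void* p, std::size_t size) {
        if (p == nullptr) return;
        if (size != sizeof(Node)) {   // allocated by ::operator new above
            ::operator delete(p);
            return;
        }
        freeList_ = ::new (p) FreeBlock{freeList_};
        ++freeCount_;
    }

    static std::size_t freeCount() { return freeCount_; }

private:
    struct FreeBlock { FreeBlock* next; };
    static FreeBlock* freeList_;
    static std::size_t freeCount_;
};

Node::FreeBlock* Node::freeList_ = nullptr;
std::size_t Node::freeCount_ = 0;

static_assert(sizeof(Node) >= sizeof(void*),
              "a Node must be large enough to hold a free-list link");
```

Freed blocks are kept for reuse and never returned to the global heap, a trade-off that suits programs which allocate and free many objects of one size.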
At the beginning of the book, Shapiro makes a point that seems to sum up
the big problem with C++, a problem that has no solution because it stems
from C++'s greatest asset: its strong tie to C. "There are
places where the need to support C features prevents C++ from supporting
object-oriented features as well as one might like, and a surprising number
of programs will run up against these problems in one way or another"
(p. ix).
That remark leads straight into this month's next C++ book, the long-awaited
Data Abstraction and Object-Oriented Programming in C++ by Keith
Gorlen, Sanford Orlow, and Perry Plexico. Gorlen et al. discuss how to stretch
C++ as far as it will go in the direction of object-oriented languages such
as Smalltalk, and away from the language's machine-oriented C heritage.
This book is based on the NIH Class Library, a Smalltalk-like class
library for C++, which the authors developed as part of a project involving
biomedical research on Unix-based workstations at the National Institutes
of Health (NIH).
The NIH Class Library addresses a very real problem: C++ compilers do not
come with extensive class libraries. If you need a LinkedList class, you
write it (or borrow the one from Shapiro's book!). If you want the += operator
to signify concatenation when applied to a string, then you have to write
a String class with an operator+=( ) member function. C++ gives you the
mechanisms, but after that you're on your own. Fundamentally, C++ is still
C. C++ programmers can be jealous of programmers using object-oriented languages
such as Smalltalk, which come with extensive class libraries. When you buy
Digitalk Smalltalk/V, you get a massive class hierarchy. When you buy a
C++ compiler, you get iostream.h. I am convinced that this Spartan approach,
remaining true to the language's C origins, is precisely why C++ has succeeded.
It is a compromise between C on the one hand and object-oriented programming
on the other.
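Here is roughly what "you have to write a String class" means in practice: a minimal sketch of a hypothetical String class, not taken from either book, whose operator+=() performs concatenation. Because it manages its own memory, it also needs a copy constructor, an assignment operator, and a destructor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>

// Hypothetical minimal String class: the compiler doesn't supply one.
class String {
public:
    String(const char* s = "") : len_(std::strlen(s)), data_(new char[len_ + 1]) {
        std::memcpy(data_, s, len_ + 1);
    }
    String(const String& other) : len_(other.len_), data_(new char[len_ + 1]) {
        std::memcpy(data_, other.data_, len_ + 1);
    }
    String& operator=(String other) {  // copy-and-swap
        swap(other);
        return *this;
    }
    ~String() { delete[] data_; }

    // += means concatenation because this class says so.
    String& operator+=(const String& rhs) {
        char* buf = new char[len_ + rhs.len_ + 1];
        std::memcpy(buf, data_, len_);
        std::memcpy(buf + len_, rhs.data_, rhs.len_ + 1);  // copies the '\0'
        delete[] data_;
        data_ = buf;
        len_ += rhs.len_;
        return *this;
    }

    const char* c_str() const { return data_; }
    std::size_t length() const { return len_; }

private:
    void swap(String& other) {
        std::swap(len_, other.len_);
        std::swap(data_, other.data_);
    }
    std::size_t len_;
    char* data_;
};
```

Self-concatenation (s += s) works because the new buffer is filled before the old one is deleted.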
But that doesn't change the fact that you need a class library. The NIH
Class Library brings some of the flavor of Smalltalk to C++; its class hierarchy
has Object at the top, a Collection class underneath that, a Bag class underneath
that, and so on. If you have Turbo C++ or the newer Borland C++, note that
the sample CLASSLIB is a scaled-down implementation of this same idea.
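To give the flavor of a Smalltalk-style hierarchy, here is a greatly simplified sketch with Object at the root, an abstract Collection below it, and a Bag below that. It is not the NIH Class Library's actual interface; the member functions, the Integer leaf class, and the use of std::vector and dynamic_cast are assumptions made to keep the example short and self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified hierarchy in the spirit of the NIH Class Library.
class Object {
public:
    virtual ~Object() {}
    virtual const char* className() const = 0;
    // Default equality is identity; subclasses may refine it.
    virtual bool isEqual(const Object& other) const { return this == &other; }
};

class Collection : public Object {
public:
    virtual void add(Object& item) = 0;
    virtual std::size_t size() const = 0;
    virtual std::size_t occurrencesOf(const Object& item) const = 0;
};

// A Bag is an unordered collection that allows duplicates.
class Bag : public Collection {
public:
    const char* className() const override { return "Bag"; }
    void add(Object& item) override { items_.push_back(&item); }
    std::size_t size() const override { return items_.size(); }
    std::size_t occurrencesOf(const Object& item) const override {
        std::size_t n = 0;
        for (const Object* p : items_)
            if (p->isEqual(item)) ++n;
        return n;
    }
private:
    std::vector<Object*> items_;  // stores pointers; does not own elements
};

// A leaf class: an int wrapped as an Object.
class Integer : public Object {
public:
    explicit Integer(int v) : value_(v) {}
    const char* className() const override { return "Integer"; }
    bool isEqual(const Object& other) const override {
        const Integer* o = dynamic_cast<const Integer*>(&other);
        return o != nullptr && o->value_ == value_;
    }
private:
    int value_;
};
```

Bag's occurrencesOf() works for any element type because it relies only on the virtual isEqual() declared in Object, which is the Smalltalk way of organizing a library.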
From the guided tour of the NIH Class Library given by Gorlen et al., I
got the sense that C++ provides just enough object-oriented features to
be tempting, but not enough to really work. How could it? C++ is still C.
For example, the constructor for a BigInt class must, nonintuitively, take
a string of digits because "this is the only way we can legally write
very large integer constants in C++" (p. 34). You can't write BigInt
n = 18446744073709551615, because that 20-digit number (2^64 - 1) far
exceeds the range of a 32-bit long and is therefore not a legal integer
constant in C--I mean, C++. Nor can you overload operator^() to mean
exponentiation and check if (n == 2^64 - 1), because in C--I mean in C++,
^ is the binary exclusive-or operator, and overloading cannot change its
precedence: ^ binds more loosely than both - and ==, so the expression
parses as (n == 2) ^ (64 - 1).
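The precedence problem is easy to demonstrate. In this sketch, the Num type and its operators are hypothetical: operator^ is overloaded to mean exponentiation, yet it still parses with the precedence of exclusive-or, below binary minus. The built-in int function shows how the expression from the text is actually read; compilers may warn about it.

```cpp
#include <cassert>

// Hypothetical wrapper type, used only to show how overloaded ^ parses.
struct Num {
    unsigned long long v;
};

// ^ overloaded as exponentiation (no overflow checking in this sketch).
inline Num operator^(Num base, Num exp) {
    unsigned long long result = 1;
    for (unsigned long long i = 0; i < exp.v; ++i) result *= base.v;
    return Num{result};
}

inline Num operator-(Num a, Num b) { return Num{a.v - b.v}; }

// With built-in ints, the text's expression n == 2^64 - 1 is read as
// (n == 2) ^ (64 - 1): a bitwise exclusive-or, not a comparison.
inline int parsedLikeText(int n) { return n == 2 ^ 64 - 1; }
```

Even given an operator== for Num, writing the comparison as n == two ^ sixtyfour - one would not compile: it parses as (n == two) ^ (sixtyfour - one), and there is no ^ that takes a bool and a Num.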
This sort of restriction means that the promises of C++ often can't be fulfilled.
One promise is that with operator overloading, we can give "an easily
readable, 'mathematical' appearance" to mathematics programs (p. 96).
I believe that the NIH Class Library comes as close as possible to this
goal, but it can't succeed, because C++ lets you overload only its existing
operators, with their existing precedence and associativity, and never lets
you define new ones.
C++ seems to hold out the promise of working at a higher level, only to
pull you up short at the last minute with a stern reminder that this is
still C. Restrictions of this sort are necessary if C++ is to remain a serious
tool for developing commercial software. The authors of the NIH Class Library
show what can be done within these restrictions. In addition to reading
their book, you can get the NIH Class Library source code, either from the
publisher (an additional $16.95) or by downloading it from BIX (listings
area c.plus.plus; files nih30.zip, nih30.inf, and cppoops.zip). This is
probably the largest collection of public C++ source code available, and
is well worth examining.
One final note on this book. For years, I have been expecting to see the
phrase "switch statement considered harmful" in print. One of
the chief benefits of C++ is that its virtual functions (dynamic binding)
can eliminate the need for switch statements. Anyone who has seen one of
the 14-page "switch statements from hell" that regularly appear
in Microsoft Windows source code cannot doubt that the switch statement
should nearly always be replaced by some sort of table (of function pointers,
for instance). Anyhow, I was glad to read the brief note, "The switch
statement is considered harmful" (p. 104).
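The following sketch contrasts the three approaches on a hypothetical two-shape example: a switch on a type tag, the same dispatch done through virtual functions, and a table of function pointers. All names are illustrative, not taken from the book.

```cpp
#include <cassert>

const double kPi = 3.14159265358979;

double circleArea(double r) { return kPi * r * r; }
double squareArea(double s) { return s * s; }

// Version 1: a switch on a type tag, edited for every new kind.
enum Kind { CIRCLE, SQUARE };
struct TaggedShape { Kind kind; double size; };

double taggedArea(const TaggedShape& s) {
    switch (s.kind) {
    case CIRCLE: return circleArea(s.size);
    case SQUARE: return squareArea(s.size);
    }
    return 0.0;
}

// Version 2: virtual functions; each class knows its own area.
struct Shape {
    virtual ~Shape() {}
    virtual double area() const = 0;
};
struct Circle : Shape {
    explicit Circle(double r) : r_(r) {}
    double area() const override { return circleArea(r_); }
    double r_;
};
struct Square : Shape {
    explicit Square(double s) : s_(s) {}
    double area() const override { return squareArea(s_); }
    double s_;
};

// Version 3: the switch replaced by a table of function pointers,
// indexed by Kind (order must match the enum).
typedef double (*AreaFn)(double);
const AreaFn areaTable[] = { circleArea, squareArea };

double tableArea(const TaggedShape& s) { return areaTable[s.kind](s.size); }
```

Adding a third shape means editing taggedArea() in the first version and extending the table in the third; in the virtual-function version it means writing one new class and touching nothing else.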
Our final book is Margaret A. Ellis and Bjarne Stroustrup's The Annotated
C++ Reference Manual. Its 447 pages expand and update the 70-page Reference
Manual that appeared at the back of Stroustrup's 1986 book, The C++
Programming Language.
The new Ellis and Stroustrup book is nearly as unreadable as the original
Stroustrup book, and if you are doing anything with C++, it's just as essential.
Besides its approval as the base document for the ANSI standardization of C++
(the cover is stamped "ANSI Base Document"), Ellis and Stroustrup's
book contains many annotations and commentaries that clarify points in the
original reference manual, plus lengthy discussions of the many new features
added since 1986.
Opening the book to a random chapter, we find 22 pages of in-depth Talmudic
commentary on the following topics: Single Inheritance, Multiple Inheritance,
Multiple Inheritance and Casting, Multiple Inheritance and Implicit Conversion,
Virtual Base Classes, Virtual Base Classes and Casting, Single Inheritance
and Virtual Functions, Multiple Inheritance and Virtual Functions, Virtual
Function Tables, Instantiation of Virtual Functions, Virtual Base Classes
with Virtual Functions, and Renaming.
I came away from Ellis and Stroustrup's book with very grave worries about
the complexity of C++. It starts on p. 22 with the remark that a certain
variable "may not be eliminated even if it appears to be unused."
The reason is that the constructor or destructor for the variable's class
may have side effects. You may say that no one should write a class where
the mere creation of an "unused" variable changes the program's
behavior, but there are several important C++ applications for just this
sort of nonintuitive behavior. On the same page, Ellis and Stroustrup provide
a beautiful example of a Tracer class. The importance of such "unused"
variables also comes out in static initializers for modules.
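The Tracer idea can be sketched as follows. This is not Ellis and Stroustrup's code; the class writes to a hypothetical in-memory traceLog rather than to a terminal so that its behavior can be checked. The local variable t is never used after its declaration, but deleting it would silently change the program's output.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical log; a real tracer would more likely write to cerr or a file.
std::ostringstream traceLog;

// Constructor and destructor exist purely for their side effects.
class Tracer {
public:
    explicit Tracer(const char* name) : name_(name) {
        traceLog << "enter " << name_ << '\n';
    }
    ~Tracer() { traceLog << "exit " << name_ << '\n'; }
private:
    const char* name_;
};

int work(int x) {
    Tracer t("work");  // "unused", yet it must not be eliminated
    return x * 2;      // t's destructor logs "exit work" on return
}
```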
The point is simply that some of the nicest applications of C++ also reveal
its innate complexity: Here we have a language in which you simply cannot
look at a line of code and know what it's doing. An assembly-language programmer
might say the same thing about C, but to me there is a difference when we
are talking about a language in which deleting an unused variable might
break the program!
C++ is soon going to become even more complex. All three books discuss the
two major forthcoming features of C++: templates (parameterized types) and
try/catch/throw (exception handling). These much-needed features will undoubtedly
interact in many interesting ways with all of the language's existing features.
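To illustrate what the two features look like together, here is a small bounded stack written as a template that throws exceptions on overflow and underflow. It uses the syntax that was eventually standardized, including std::overflow_error and std::underflow_error from <stdexcept>, which may differ from what early implementations accept; treat it as an illustrative sketch with hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical bounded stack: a template that reports errors by throwing.
template <class T, std::size_t N>
class Stack {
public:
    void push(const T& item) {
        if (count_ == N) throw std::overflow_error("Stack::push: full");
        items_[count_] = item;  // copy first...
        ++count_;               // ...then commit, so a throwing copy changes nothing
    }
    T pop() {
        if (count_ == 0) throw std::underflow_error("Stack::pop: empty");
        return items_[--count_];  // element is lost if copying it throws
    }
    bool empty() const { return count_ == 0; }
private:
    T items_[N];
    std::size_t count_ = 0;
};
```

Even this small class shows the two features interacting: push() increments count_ only after the copy succeeds, so a T whose assignment throws leaves the stack unchanged, while pop() can lose an element if copying the return value throws. That problem is often cited as the reason the later standard library's std::stack separates top() from pop().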