When it comes down to it, PANIC! UNIX System Crash Dump Analysis Handbook, by Chris Drake and Kimberley Brown, should really be titled Kernel Debugging Techniques for Solaris, since it is so Solaris oriented. If you use another variant of UNIX, you may still find the book useful for picking up a specific range of techniques, provided you can map the Solaris-specific material (such as SPARC assembler) to the equivalents on your platform. Although the authors claim to have attempted to generalize the text for any UNIX variant, you will need to do a lot of reading between the lines before applying it to platforms other than Sun's. There are also many references to Sun's documentation, such as the standard reference manuals and Sun's online documentation (AnswerBooks), and to some very SPARC-specific features (such as the boot PROMs and FORTH command interpreter).
Still, PANIC! is well organized, the index is accurate and well structured, and the layout is well done. The included CD-ROM contains some debugger macros from the book, and ctags output from two different versions of UNIX (read Solaris). However, the CD-ROM does not appear to be referenced anywhere in the text other than on the back cover. The diagrams and crash dumps are well laid out, and the general text of the book is written in an easy-to-read style. Having both worked for Sun as support engineers, the authors definitely know the subject matter from experience.
PANIC! consists of three parts. Part One leads off with general background information, including how to deliberately crash a Solaris system, how to use a kernel debugger, and how to work with kernel-level include files and symbol tables. Part Two, "Advanced Studies," gives an overview of assembler and of the kernel features that come to the fore in a crash, such as kernel stack unwinding, virtual memory, scheduling, file systems, device drivers, interprocess communication, traps, watchdog resets, interrupts, and multiprocessor kernels. Part Three presents case studies.
The information contained in Part One is not much more than what you would expect to find in the working notes of a kernel-level developer. A lot of space is devoted to writing debugger macros. This is as close to programming as the book gets, and anyone who uses a tool for long enough is likely to develop a library of shortcuts and customizations for it anyway.
Part Two is a recap of basic operating-system concepts covered in most computer science curricula. The VME bus gets a quick mention while the authors are on the subject of interrupts, and a general discussion of vectored interrupts and priority levels follows. A short chapter on multiprocessor kernels covers multiaccess data protection, semaphores, and mutexes. Books devoted to these topics are a far better introduction. Throughout the book, some of the discussion seems to target an audience that is less computer literate than even the least skilled programmers and system administrators. For example, the book includes sections titled "What is a header file?" and "High level vs. low level languages."
Part Three is one of the most interesting parts of the book because it includes real-world examples. More can be learned from real problems and the techniques used to solve them than from any hard and fast rules laid down anywhere. Often, examples of lateral thinking are enough to get people to approach things from a different angle. The crashes discussed include network trouble, stomped-on modules, hanging instead of swapping, pipe problems, third-party modules, genuine hardware faults, and disk problems.
One such problem is a "panic" that was due to a deadlock caused by a third-party loadable STREAMS kernel module. It is not only device-driver programmers who can bring down a system, but anyone who messes with the kernel. One interesting technique from the real-world examples is searching the operating-system vendor's update documentation for strings from the panic message to find out which patch to apply. The lesson here is not to waste your time solving problems that other people have already looked into.
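The patch-search technique can be sketched in a few lines of Python. This is only an illustration and is not taken from the book: the function names, the patch identifiers, and the patch-note text are all made up, and a real search would run against whatever update documentation your vendor actually ships.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line wrapping does not hide a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_matching_patches(panic_message: str, patch_notes: dict[str, str]) -> list[str]:
    """Return the IDs of patch notes whose text contains the panic string."""
    needle = normalize(panic_message)
    if not needle:
        # An empty search string would match every note, so return nothing.
        return []
    return sorted(
        patch_id
        for patch_id, notes in patch_notes.items()
        if needle in normalize(notes)
    )


# Hypothetical patch notes, invented purely for this example.
PATCH_NOTES = {
    "patch-A": "Fixes panic: example deadlock\n in stream module under load.",
    "patch-B": "Updates time zone tables.",
}

matches = find_matching_patches(
    "panic: example  deadlock in stream module", PATCH_NOTES
)
print(matches)
```

Normalizing case and whitespace on both sides is a deliberate choice here, since a panic string copied from a console log rarely wraps the same way as the text in a patch description.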
Once a problem is found, the solution is often easy to implement. Simply finding the bug causes the most headaches. Considering that it sometimes seems that 95 percent of device-driver development is debugging, PANIC! would be useful for device-driver programmers, UNIX fanatics, and hardcore system administrators. And finally, this book should be required reading for all of Sun's help desk staff and support engineers.
-- Regan Russell