Writing Localized Software
Review by Charles Pfefferkorn
Copyright (C) Dr. Dobb's Journal, January, 1996
International markets for software are big and getting bigger. Companies
like Microsoft, in fact, earn over half of their revenues outside the United
States. At one time, international markets were happy receiving the previous
version of newly released U.S. software. Today, however, these markets want
the latest version now. The participants in these markets read the most
recent editions of U.S. computer magazines and actively search the Internet
for the most up-to-date information. As a result, U.S. companies are forced
to reduce the time between U.S. and localized release dates, and many are
releasing them simultaneously. To participate in the international market,
you need to understand both its business and technical aspects.
The three books I'll examine here will help you design and develop international
software. While there is some overlap, they focus on different aspects of
the process. Software Internationalization and Localization: An Introduction
provides an overview, covering several platforms and providing information
about International Standards and various business issues. Understanding
Japanese Information Processing focuses on processing Japanese text.
It includes C code for converting between various Japanese character-set
encoding methods and special functions for repairing Japanese text damaged
by e-mail programs and newsgroup readers. It also provides access to online
information about Japanese. The third book is Developing International
Software for Windows 95 and Windows NT. It includes code samples, tables,
figures, checklists, and troubleshooting guides. All three books provide
glossaries, references to additional documents, and numerous appendices.
Software Localization
Software Internationalization and Localization, by Emmanuel Uren,
Robert Howard, and Tiziana Perinotti, discusses the creation of products
for international markets. The book lists over 40 separate engineering issues,
including different languages, character sets, writing systems, currencies,
currency formats, measurement systems, number formats, calendars, date formats,
standards, legal systems, and cultures.
Western European languages, for instance, use diacritical characters and
additional non-English letters. Eastern European languages include Cyrillic
script and Greek letters. Asian languages use thousands of ideographic characters
derived from traditional Chinese characters. Arabic and Hebrew are written
right to left, so text becomes bidirectional when English words are included. Different languages have
different capitalization, hyphenation, spelling, and grammar rules, and
call for different typography. Even number formats vary: In the U.S., the decimal
separator is a period and the thousands separator is a comma. In most Western
European nations, the roles are reversed. A less obvious difference is numeric
rounding rules. Even colors, symbols, and sounds (like those of emergency
vehicles and telephones) vary from culture to culture.
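To make the separator difference concrete, here is a minimal Python sketch
of my own (it is not taken from any of the books). The function name and
the two-entry convention table are assumptions for illustration only;
production code should ask the operating system or a locale library for
these settings rather than hard-code them.

```python
def format_number(value: float, decimal_sep: str, group_sep: str,
                  places: int = 2) -> str:
    """Format value with caller-supplied decimal and thousands separators."""
    # Start from Python's built-in U.S.-style output, e.g. "1,234,567.89".
    us_style = f"{value:,.{places}f}"
    # Swap the separators through a placeholder so they cannot collide.
    return (us_style.replace(",", "\0")
                    .replace(".", decimal_sep)
                    .replace("\0", group_sep))


# Hypothetical table: (decimal separator, thousands separator) per region.
CONVENTIONS = {"United States": (".", ","), "Germany": (",", ".")}

for region, (dec, grp) in CONVENTIONS.items():
    print(region, format_number(1234567.891, dec, grp))
```

The same digits therefore read as two different strings depending on the
region, which is why a number that has been formatted for display should
never be parsed back without knowing which convention produced it.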
The book also describes issues associated with translation. Translated text
is often longer than the original. Words that are different in one language
may translate into the same word in another.
Software Internationalization focuses on using IBM PCs with either
DOS or Windows 3.x, but the book also provides UNIX and Macintosh information.
The technical information is descriptive and references are included; source
code, however, is not, and some of the technical information is becoming
dated.
The book also includes a chapter on non-Western European languages and a
chapter on International Standards and International Standards Organizations.
Software Internationalization concludes with a discussion of international
business issues: development models, business relationships, distribution
channels, legal issues, logistics, government regulations, customs duties
and taxes, repatriating funds, and the cost of doing international business.
There is also a chapter on developing products in Europe and marketing them
in the United States.
Japanese Information Processing
Japanese text is written using four types of characters: romaji (Roman characters),
hiragana, katakana, and kanji. Romaji includes the standard English alphabet
and numerals. Hiragana and katakana are syllabaries for Japanese sounds.
Hiragana is used for grammatical words, inflectional endings for verbs and
adjectives, and some nouns. Katakana is used for words of foreign origin
and for emphasis. Kanji includes the characters borrowed from the Chinese
over 1500 years ago.
In Understanding Japanese Information Processing, Ken Lunde carefully
describes the evolution of Japanese character-set standards and their relationship
to ISO character-set standards. The primary Japanese character-set standards
are JIS X 0208-1990 and JIS X 0212-1990. JIS X 0208-1990 contains 6879 characters,
of which 6355 are kanji, divided into two groups: 2965 in Level 1 and 3390
in Level 2. JIS X 0212-1990 contains 6067 characters, of which 5801 are
supplemental kanji. Lunde also describes other Asian language standards
and international character sets including ISO 10646 and its subset, Unicode.
In Unicode, 121,403 characters of Chinese origin (Chinese, Japanese, and
Korean) are mapped into 20,902 unique characters using Han Unification rules.
Separate from, but related to, the character sets are the encoding methods. The
three major Japanese encoding methods are JIS, Shift-JIS, and EUC. JIS
is a modal system for encoding various character sets, including JIS X 0208-1990
and JIS X 0212-1990. It is used primarily for passing information between
computing systems. Shift-JIS is a nonmodal modification developed by Microsoft
and used by many other platforms, including Japanese PCs and KanjiTalk (the
Japanese Macintosh OS). Shift-JIS supports faster internal processing, but
does not support the supplemental kanji of JIS X 0212-1990. EUC (Extended UNIX Code)
is the internal coding system used by most UNIX workstations and is defined
by ISO 2022-1993. The appendices of Lunde's book also include information
about Japanese corporate character sets and encoding methods.
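The difference between the modal and nonmodal methods is easiest to see in
the bytes themselves. The sketch below is mine, not Lunde's, and it leans
on the codecs that ship with Python (iso2022_jp, shift_jis, and euc_jp) as
stand-ins for JIS, Shift-JIS, and EUC; I am assuming those codecs are
close enough to the standards for a three-character demonstration.

```python
text = "日本語"

jis = text.encode("iso2022_jp")    # modal: escape sequences switch character sets
sjis = text.encode("shift_jis")    # nonmodal: lead bytes mark two-byte characters
euc = text.encode("euc_jp")        # nonmodal: two-byte characters have the high bit set

for name, data in (("JIS", jis), ("Shift-JIS", sjis), ("EUC", euc)):
    print(f"{name:10}", data.hex(" "))

# The JIS stream is framed by escapes: ESC $ B selects JIS X 0208 and
# ESC ( B returns to ASCII. Strip both three-byte sequences to get the payload.
payload = jis[3:-3]

# For JIS X 0208 characters, EUC is the same pair of bytes with the high bit set.
euc_from_jis = bytes(b | 0x80 for b in payload)
print(euc_from_jis == euc)
```

Because the JIS payload consists entirely of 7-bit printable bytes, it
passes through 7-bit mail gateways, which is one reason it suits
information interchange, while the two nonmodal forms are easier to index
into during internal processing.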
Since the major Asian character sets are extremely large, entering characters
is difficult. While kanji tablets with thousands of keys exist, other input
methods for Asian languages have been developed that use combinations of
software and hardware. Lunde examines these options and describes typography
issues.
For some developers, the most important part of the book will be the algorithms
(presented in C) for converting between different encodings, handling text
streams, automatically detecting the Japanese encoding used for a text file,
and repairing JIS-encoded files. These algorithms are included in a set
of tools that the author provides via the Internet.
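To give a flavor of what such a conversion involves, here is a short
Python sketch of the widely known arithmetic for turning a two-byte JIS X
0208 code into Shift-JIS. It is my own rendering, not the book's C code,
and the function names are hypothetical. It checks itself against Python's
built-in codecs, which I am assuming agree with the standards for the
sample text.

```python
from typing import Tuple


def jis_to_sjis(j1: int, j2: int) -> Tuple[int, int]:
    """Convert one two-byte JIS X 0208 code (each byte 0x21-0x7E) to Shift-JIS."""
    # Two JIS rows share one Shift-JIS lead byte; rows from 0x5F upward
    # jump to the second lead-byte range, which starts at 0xE0.
    s1 = ((j1 + 1) >> 1) + (0x70 if j1 < 0x5F else 0xB0)
    if j1 % 2:
        # Odd rows fill the lower half of the trail-byte range, skipping 0x7F.
        s2 = j2 + (0x1F if j2 < 0x60 else 0x20)
    else:
        # Even rows fill the upper half.
        s2 = j2 + 0x7E
    return s1, s2


def convert_payload(payload: bytes) -> bytes:
    """Convert a run of two-byte JIS codes (escape sequences removed)."""
    out = bytearray()
    for i in range(0, len(payload), 2):
        out += bytes(jis_to_sjis(payload[i], payload[i + 1]))
    return bytes(out)


text = "漢字とカタカナ"
payload = text.encode("iso2022_jp")[3:-3]   # drop ESC $ B and ESC ( B
print(convert_payload(payload) == text.encode("shift_jis"))
```

The sketch handles only the two-byte portion of a stream. A real converter
must also track escape sequences, pass ASCII through unchanged, and deal
with half-width katakana.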
Lunde devotes an entire chapter to Japanese text-processing tools, including
operating systems, text editors, word processors, page-layout software,
online dictionaries, machine-translation software, and terminal software.
The chapter on using Japanese e-mail and newsgroups includes advice on how
to repair files damaged by network mail programs and newsgroup readers.
In the appendices, he lists professional organizations, mailing lists, and
FTP sites for additional software and documents.
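As an illustration of the repair problem, the Python sketch below handles
one kind of damage: a JIS-encoded message whose escape bytes have been
stripped in transit, leaving the printable remainder of each escape
sequence behind. That damage model is my assumption, and the code is my
own heuristic rather than the author's tool. In particular, it will
misfire on plain ASCII text that happens to contain "$B" or "$@".

```python
ESC = b"\x1b"
TWO_BYTE_IN = (b"$B", b"$@")   # shift into JIS X 0208 (1983/1990) or JIS C 6226-1978
ONE_BYTE_IN = (b"(B", b"(J")   # shift back to ASCII or JIS-Roman


def repair_jis(data: bytes) -> bytes:
    """Reinsert ESC bytes that were stripped from a JIS-encoded stream."""
    out = bytearray()
    i = 0
    two_byte = False               # current mode of the stream
    while i < len(data):
        if data[i:i + 1] == ESC:
            # Intact escape sequence: copy it and update the mode.
            seq = data[i + 1:i + 3]
            out += data[i:i + 3]
            if seq in TWO_BYTE_IN:
                two_byte = True
            elif seq in ONE_BYTE_IN:
                two_byte = False
            i += 3
            continue
        pair = data[i:i + 2]
        if not two_byte and pair in TWO_BYTE_IN:
            out += ESC + pair      # orphaned shift-in: restore ESC
            two_byte = True
            i += 2
        elif two_byte and pair in ONE_BYTE_IN:
            out += ESC + pair      # orphaned shift-out: restore ESC
            two_byte = False
            i += 2
        elif two_byte:
            out += pair            # ordinary two-byte character
            i += 2
        else:
            out += data[i:i + 1]   # ordinary one-byte character
            i += 1
    return bytes(out)


original = "日本語 text".encode("iso2022_jp")
damaged = original.replace(ESC, b"")
print(repair_jis(damaged).decode("iso2022_jp"))
```

Tracking the mode matters: inside a two-byte run the bytes "$B" can be a
legitimate character, so the sketch looks for shift-in sequences only in
one-byte mode and for shift-out sequences only on two-byte boundaries.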
Developing International Software
Developing International Software for Windows 95 and Windows NT,
by Nadine Kano, concentrates on those two Microsoft platforms.
The early chapters discuss general issues associated
with internationalizing and localizing software. Kano stresses the importance
of planning and having written specifications that define localization requirements.
She also describes Microsoft's experience in developing international software
using a single team for both the domestic and international versions. Finally,
Kano discusses the trade-offs Microsoft made in developing Windows 95.
Other issues covered include designing an international user interface,
researching legal issues, setting up a development environment, testing,
assisting translators, and coding practices.
Chapter 3 covers encoding character sets. Windows 95 uses a code-page model.
For Japanese, Windows 95 uses code page 932, a Shift-JIS encoding; Windows
NT uses Unicode. To produce a single code base for both Windows 95 and Windows
NT, you must use generic prototypes and compiler switches. Every Win32 API
function that takes string parameters has two entry points: one for traditional
code-page strings and one for Unicode strings.
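The two string forms really are different at the byte level, which is why
code written for one cannot simply be handed the other. The Python sketch
below is my own illustration, not an example from the book. It encodes the
same text with Python's cp932 and utf-16-le codecs, which I am using as
stand-ins for the Windows 95 and Windows NT representations, and it steps
through the code-page form using the code page 932 lead-byte ranges. The
function names are hypothetical.

```python
def is_cp932_lead_byte(byte: int) -> bool:
    """True if byte starts a two-byte character in code page 932."""
    return 0x81 <= byte <= 0x9F or 0xE0 <= byte <= 0xFC


def count_chars_cp932(data: bytes) -> int:
    """Count characters by stepping over one- and two-byte sequences."""
    count = 0
    i = 0
    while i < len(data):
        i += 2 if is_cp932_lead_byte(data[i]) else 1
        count += 1
    return count


text = "Windows 日本語"
ansi = text.encode("cp932")        # code-page form: 1 or 2 bytes per character
wide = text.encode("utf-16-le")    # Unicode form: 2 bytes per character here

print("characters:", len(text))
print("code page 932 bytes:", len(ansi), "characters found:", count_chars_cp932(ansi))
print("UTF-16 bytes:", len(wide), "code units:", len(wide) // 2)
```

In the code-page form you cannot index to the nth character or step
backward without scanning from a known boundary. In the Unicode form every
character in this sample occupies one 16-bit unit.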
To localize the user interface, use resource files to define pictures, strings,
messages, menus, dialog boxes, and version information. Chapter 4 describes
how to organize these resources and link them to your source code. Chapter
5 describes how to use Microsoft Win32 NLSAPI to support linguistic and
cultural conventions such as date, time, calendar, number, and currency
formats. This API also provides sorting and character-type information.
Like the rest of the Win32 API, NLSAPI exists in two forms (-A APIs and -W APIs).
On Windows NT you can use either form, but on Windows 95 you can use only
the -A forms.
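To show the kind of variation such an API hides, here is a small Python
sketch of locale-dependent short-date formatting. It is my own example and
does not call NLSAPI. The locale names and patterns in the table are
assumptions chosen for illustration; real software should take them from
the operating system rather than hard-code them.

```python
from datetime import date

# Hypothetical short-date patterns per locale (illustrative assumptions).
SHORT_DATE = {
    "en-US": "{m}/{d}/{y}",
    "de-DE": "{d:02}.{m:02}.{y}",
    "ja-JP": "{y}/{m:02}/{d:02}",
}


def format_short_date(day: date, locale_name: str) -> str:
    """Render a date with the short-date pattern of the given locale."""
    pattern = SHORT_DATE[locale_name]
    return pattern.format(y=day.year, m=day.month, d=day.day)


release = date(1996, 1, 5)
for name in SHORT_DATE:
    print(name, format_short_date(release, name))
```

Keeping the patterns in a table rather than in the code that uses them is
the point of the exercise: the calling code never assumes a field order,
so a string such as 1/5/1996 is never mistaken for the first of May.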
Chapter 6 covers multilingual input, fonts, and multilingual text layout.
Chapter 7 covers processing of Far Eastern writing systems (Chinese, Japanese,
and Korean), including the use of Input Method Editors (IMEs) supported
by Windows NT and Windows 95. On Windows NT 3.5, the interface to the IMEs
depends on the target language. A unified API is provided by Windows NT
3.51 and Windows 95.
While many coding examples are included, you will still need to use other
Microsoft reference materials, including reference manuals for Windows NT
and Windows 95 and the appropriate SDKs.
Conclusion
Both Software Internationalization and Localization and Developing
International Software for Windows 95 and Windows NT cover the basics
of developing international products. Windows 95 and Windows NT developers
will prefer the latter. Developers using other platforms will probably prefer
the former, as will those interested in an introduction to the business
aspects of developing international software. Both books are excellent.
Ken Lunde's Understanding Japanese Information Processing is an essential
reference book for developers processing Japanese text. It will also appeal
to individuals interested in the Japanese language.
Software Internationalization and Localization: An Introduction
Emmanuel Uren, Robert Howard, and Tiziana Perinotti
Van Nostrand Reinhold, 1992
300 pp., $39.95
ISBN 0-442-01498-8
Understanding Japanese Information Processing
Ken Lunde
O'Reilly & Associates, 1993
470 pp., $29.95
ISBN 1-56592-043-0
Developing International Software for Windows 95 and Windows NT
Nadine Kano
Microsoft Press, 1995
800 pp., $35.00
ISBN 1-55615-840-8
Electronic Review of Computer Books
Created 5/1/96 / Last modified 6/7/96 / webmaster@ercb.com