Tuesday, April 8, 2008

Unicode Consortium Announces Release of Unicode Standard Version 5.1

The Unicode Consortium has announced the release of Unicode Version 5.1,
which contains over 100,000 characters and provides significant
additions and improvements that extend text processing for software
worldwide.

Key features include increased security in data exchange, significant
character additions for Indic and Southeast Asian scripts, expanded
identifier specifications for Indic and Arabic scripts, improvements in
the processing of Tamil and other Indic scripts, relaxed linebreaking
conformance for HTML and other protocols, strengthened normalization
stability, and new case pair stability, among others described below.

The Version 5.1.0 data files and documentation are final and posted on
the Unicode site. In addition to updated versions of existing files,
implementers will find new test data files (for example, for
linebreaking) and new XML data files that encapsulate all of the Unicode
character properties.

A major feature of Unicode 5.1.0 is the enabling of ideographic
variation sequences. These sequences allow standardized representation
of glyphic variants needed for Japanese, Chinese, and Korean text.

Unicode 5.1 contains significant changes to properties and behavioral
specifications. Several important property definitions were extended,
improving linebreaking for Polish and Portuguese hyphenation.
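
To make the ideographic variation sequences mentioned above concrete:
an ideographic variation sequence is a unified ideograph followed by one
of the variation selectors VS17 to VS256 (U+E0100..U+E01EF). The Python
sketch below only checks that shape. It does not check whether a
sequence is actually registered in the Ideographic Variation Database.
The function names are hypothetical, and the example pair (U+845B
followed by U+E0100) is used only to illustrate the shape.

```python
import unicodedata

# Variation Selectors Supplement: VS17..VS256, the selectors used by IVSes.
IVS_SELECTOR_FIRST = 0xE0100
IVS_SELECTOR_LAST = 0xE01EF


def is_ivs_selector(ch: str) -> bool:
    """True if ch is one of VS17..VS256."""
    return IVS_SELECTOR_FIRST <= ord(ch) <= IVS_SELECTOR_LAST


def is_unified_ideograph(ch: str) -> bool:
    """True if ch is a CJK unified ideograph known to this Python's Unicode data.

    Ideograph names are derived algorithmically, e.g. "CJK UNIFIED IDEOGRAPH-845B".
    """
    return unicodedata.name(ch, "").startswith("CJK UNIFIED IDEOGRAPH-")


def find_ivs_candidates(text: str) -> list[tuple[int, str]]:
    """Return (index, sequence) for each ideograph + VS17..VS256 pair.

    Only the shape of the sequence is checked, not its registration.
    Python strings index by code point, so a supplementary selector
    such as U+E0100 counts as a single character here.
    """
    found = []
    for i in range(len(text) - 1):
        base, selector = text[i], text[i + 1]
        if is_unified_ideograph(base) and is_ivs_selector(selector):
            found.append((i, base + selector))
    return found


if __name__ == "__main__":
    sample = "\u845B\U000E0100 and plain \u845B"
    print(find_ivs_candidates(sample))
```

A renderer that does not support a given sequence is expected to fall
back to displaying the base ideograph. For that reason, the sketch
treats the selector as an annotation on the preceding character rather
than as an independent character.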

The Unicode Text Segmentation algorithms, which cover user-perceived
characters (grapheme clusters), words, and sentences, were substantially
enhanced to improve the processing of Tamil and other Indic languages.
The Unicode Normalization Algorithm now defines stabilized strings and
provides guidelines for buffering.
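
As a rough illustration of the stabilized-string idea, the sketch
assumes our reading of the normalization stability policy: a string
that is already in NFC and contains no unassigned code points should
remain in NFC under later versions of the standard. This is not the
specification's own definition, and buffering is not covered. The
function name is hypothetical. The check uses whatever Unicode version
ships with the running Python (`unicodedata.unidata_version`), not
5.1 specifically.

```python
import unicodedata


def is_stabilized_nfc(s: str) -> bool:
    """Hedged check: s is in NFC and contains no unassigned code points.

    Assumption: General_Category "Cn" is used as the test for
    "unassigned". That category also includes noncharacters
    such as U+FFFF.
    """
    if not unicodedata.is_normalized("NFC", s):  # Python 3.8+
        return False
    return all(unicodedata.category(ch) != "Cn" for ch in s)


if __name__ == "__main__":
    composed = "\u00E9"     # precomposed e with acute
    decomposed = "e\u0301"  # e + combining acute accent (canonically equivalent)
    print(unicodedata.normalize("NFC", decomposed) == composed)
    print(is_stabilized_nfc(composed))
    print(is_stabilized_nfc(decomposed))
    print(is_stabilized_nfc("a\uFFFF"))
```

The two spellings of "é" are canonically equivalent, so NFC maps both
to the same code point sequence. Only the precomposed form passes the
check, because the decomposed form is not yet normalized.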
Standardized named sequences have been added for Lithuanian, and
provisional named sequences for Tamil.

Unicode 5.1.0 adds 1,624 newly encoded characters. These additions
include characters required for Malayalam and Myanmar, as well as
important individual characters such as the Latin capital sharp s for
German. Version 5.1 extends support for languages in Africa, India,
Indonesia, Myanmar, and Vietnam with the addition of the Cham, Lepcha,
Ol Chiki, Rejang, Saurashtra, Sundanese, and Vai scripts.

The Unicode Collation Algorithm (UCA), the core standard for sorting all
text, is being updated at the same time. The major changes in UCA
include coverage of all Unicode 5.1 characters, tightened conformance
for canonical equivalence, clearer definitions of internationalized
search and matching, specification of parameters for customizing
collation, and definitions of collation folding.

The next version of the Unicode locale project (CLDR, the Common Locale
Data Repository) is also being prepared on the basis of Unicode 5.1 and
is now open for public data submission.
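
The Latin capital sharp s mentioned above is U+1E9E. The short Python
sketch below shows how it behaves under the case operations in the
Unicode data bundled with Python 3. Lowercasing it gives the familiar
"ß" (U+00DF). Uppercasing "ß" still gives "SS" by default rather than
U+1E9E, so the mapping is not a simple round trip. The constant names
are illustrative only.

```python
import unicodedata

CAPITAL_SHARP_S = "\u1E9E"  # added in Unicode 5.1
SMALL_SHARP_S = "\u00DF"

if __name__ == "__main__":
    print(unicodedata.name(CAPITAL_SHARP_S))        # LATIN CAPITAL LETTER SHARP S
    print(CAPITAL_SHARP_S.lower() == SMALL_SHARP_S)  # True: simple lowercase mapping
    print(SMALL_SHARP_S.upper())                     # SS: full (special-casing) uppercase
    print(CAPITAL_SHARP_S.casefold())                # ss: full case folding
```

Case-insensitive comparisons that use full case folding therefore treat
"ẞ", "ß", and "ss" as equal. A naive upper/lower round trip does not.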

