Friday, December 21, 2007

XML Entity Definitions for Characters

W3C announced the release of a First Public Working Draft for the
specification "XML Entity Definitions for Characters." The document has
been produced by members of the W3C Math Working Group as part of the
W3C Math Activity; it is one of three drafts relevant to MathML published
on 2007-12-14. The document defines several sets of names which are
assigned to Unicode characters; these names may be used for entity
references in SGML/XML-based markup languages. Notation and symbols
have proved very important for scientific documents, especially in
mathematics. In the majority of cases it is preferable to store
characters directly as Unicode character data or as XML numeric character
references. However, in some environments it is more convenient to use
the ASCII input mechanism provided by XML entity references. Many entity
names are in common use, and this specification aims to provide standard
mappings to Unicode for each of these names. In the Working Draft, two
tables listing the combined sets are presented, first in Unicode order
and then in alphabetic order; then tables documenting each of the entity
sets are provided. Each set has a link to the DTD entity declaration
for the corresponding entity set, and also a link to an XSLT2 stylesheet
that will implement a reverse mapping from characters to entity names.
In addition to the stylesheets and entity files corresponding to each
individual entity set, a combined stylesheet is provided, as well as
two combined sets of DTD entity declarations. The first is a small file
which includes all the other entity files via parameter entity references;
the second is a larger file that directly contains a definition of each
entity, with all duplicates removed.
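
As a rough illustration of the three ways of writing a character mentioned above (an entity reference, a numeric character reference, and the character itself), here is a minimal Python sketch. It is not taken from the Working Draft. The two entity declarations are hand-written in an internal DTD subset rather than loaded from the W3C entity files, and the names `DOC`, `NAMES` and `to_entity_refs` are hypothetical. The reverse-mapping function is only a stand-in for the idea behind the XSLT2 stylesheets, not a port of them.

```python
import xml.etree.ElementTree as ET

# Hand-written declarations for two commonly used names (assumption:
# alpha = U+03B1, rarr = U+2192). Illustrative only; these lines are not
# copied from the W3C entity files.
DOC = """<!DOCTYPE doc [
  <!ENTITY alpha "&#x3B1;">
  <!ENTITY rarr "&#x2192;">
]>
<doc><a>&alpha;&rarr;</a><b>&#x3B1;&#x2192;</b><c>\u03b1\u2192</c></doc>"""

# The parser expands entity references and numeric character references,
# so all three child elements end up with the same character data.
root = ET.fromstring(DOC)
texts = [child.text for child in root]

# Hypothetical reverse mapping from characters to entity names.
NAMES = {"\u03b1": "alpha", "\u2192": "rarr"}


def to_entity_refs(text):
    """Rewrite non-ASCII characters as entity or numeric references.

    Named characters become &name; and any other non-ASCII character
    becomes a hexadecimal numeric character reference. Markup characters
    such as & and < are not escaped here.
    """
    out = []
    for ch in text:
        if ch in NAMES:
            out.append(f"&{NAMES[ch]};")
        elif ord(ch) > 127:
            out.append(f"&#x{ord(ch):X};")
        else:
            out.append(ch)
    return "".join(out)


print(texts)
print(to_entity_refs("x \u2192 \u03b1"))
```

Once the document is parsed, the three spellings are indistinguishable, which is why the choice between them is mostly a matter of authoring convenience.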

Example sets include: [1] C0 Controls and Basic Latin, C1 Controls and
Latin-1 Supplement; [2] Latin Extended-A, Latin Extended-B; [3] IPA
Extensions, Spacing Modifier Letters; [4] Combining Diacritical Marks,
Greek and Coptic; [5] Cyrillic; [6] General Punctuation, Superscripts
and Subscripts, Currency Symbols, Combining Diacritical Marks for
Symbols; [7] Letterlike Symbols, Number Forms, Arrows... The editor notes:
It is hoped that the entity sets defined by this specification may form
the basis of an update to "ISO 9573-13-1991". However, pressure of other
commitments has currently prevented this document being processed by
the relevant ISO committee, thus the entity sets are being presented with
Formal Public identifiers of the form "-//W3C//..." rather than "ISO...."
It is hoped that an update to TR 9573-13 may be made later. The present
version of TR 9573-13 defines the sets of names, but does not give
mappings to Unicode. TR 9573-13 is maintained by ISO/IEC JTC 1/SC 34/WG 1
(Markup Languages). An Outgoing Liaison Statement from SC34 was recently
communicated to the W3C MathML WG regarding cancellation of the project
for TR 9573-13, Second Edition [Revision of TR 9573-13, SGML support
facilities -- Techniques for using SGML - Part 13: Public entity sets for
SGML for mathematics and science], in accordance with Resolution 13
adopted at the SC 34 plenary meeting held in Kyoto, Japan, 2007-12-08/11.
