Wednesday, August 29, 2007

Fast Incremental Updates of XML Records

"XML is a very popular format for exchanging collections of records.
You can export these records from a relational database, or they can
be in formats such as Atom, which is structured around a collection of
entry elements. A common architectural pattern is to synchronize data
sets by having one system export a set of records to another; this
export is often in the form of a large XML file that contains the
entire record set. Such systems have some common efficiency problems:
[1] The XML exports can be so large that they use up excessive
bandwidth in transmission; [2] For large files, the processing needed
to validate and import the XML takes a long time. In this article, I
suggest a simple batch of techniques to address these problems. You
should always be quick to look to several decades of experience when
solving such problems. The crux of the techniques presented in this
article follows the lines of the age-old diff and patch utilities
well known in UNIX. diff is a utility that compares two files (or sets
of files) and reports the differences in a standard format. patch can
read this standard format and apply the represented updates to some
other file... I focus on XML with particular characteristics: (1) The
root element serves as an envelope whose children are a series of
record elements; (2) Each record element has a unique ID attribute
or child element; (3) Within each record is a consistent order of
elements. The last requirement might seem stringent, but it doesn't
necessarily mean that your schema must mandate the order. In practice,
incremental updates usually involve comparison of successive export
files from the same process, and in such scenarios, matters such as
the order of elements within records tend to be consistent. In the
worst case, if the schema allows arbitrary order, and you don't want
to rely on the order in the actual exports, you can process the XML
to impose an order..." CHECK HERE
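
To make the idea in the excerpt concrete, here is a minimal Python sketch of the diff half of the approach. It is not the article's own code. It assumes the three characteristics listed above: an envelope root, record children, and a unique ID per record. The function names and the `id` attribute name are hypothetical, chosen only for illustration. The sketch also handles the worst case mentioned in the excerpt by sorting child elements, so records that differ only in element order compare as equal.

```python
import xml.etree.ElementTree as ET


def canonical(elem):
    """Return an order-independent, whitespace-insensitive form of an element.

    Assumption: the order of child elements carries no meaning, so children
    are sorted. Drop the sorted() call if order matters in your data.
    """
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),
        (elem.text or "").strip(),
        tuple(sorted(canonical(child) for child in elem)),
    )


def index_records(xml_text, id_attr="id"):
    """Map each record's ID to its canonical form.

    Assumption: every child of the root is a record whose unique ID
    is stored in the attribute named by id_attr.
    """
    root = ET.fromstring(xml_text)
    return {record.get(id_attr): canonical(record) for record in root}


def diff_records(old_xml, new_xml, id_attr="id"):
    """Return sorted lists of the added, removed, and changed record IDs."""
    old = index_records(old_xml, id_attr)
    new = index_records(new_xml, id_attr)
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return added, removed, changed


old_export = (
    '<records>'
    '<r id="1"><a>x</a><b>y</b></r>'
    '<r id="2"><a>q</a></r>'
    '</records>'
)
new_export = (
    '<records>'
    '<r id="1"><b>y</b><a>x</a></r>'
    '<r id="2"><a>changed</a></r>'
    '<r id="3"><a>n</a></r>'
    '</records>'
)

added, removed, changed = diff_records(old_export, new_export)
print("added:", added, "removed:", removed, "changed:", changed)
```

Only the added and changed records (plus the IDs of removed ones) would need to be sent to the receiving system, which is where the bandwidth and processing savings would come from. Note that this sketch parses both exports fully into memory; for very large files a streaming parser would be a better fit.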