This article explains how to put GRDDL-enabled agents to the task of
extracting valuable information from machine-processable metadata
embedded in documents -- courtesy of prevailing semantic web standards.
HTML and XHTML traditionally have had only modest support for metadata
tags. The World Wide Web Consortium (W3C) is working on including richer
metadata support in HTML/XHTML with emerging standards such as RDF in
attributes (RDFa), embedded RDF (eRDF), and so on. These standards allow
more specific metadata to be attached to different structural and
presentation elements, which provides a unified information resource.
Gleaning Resource Descriptions from Dialects of Languages (GRDDL,
pronounced griddle) offers a solution to the embedded metadata problem
in a flexible, inclusive, and forward-compatible way. It allows the
extraction of standard forms of metadata (RDF) from a variety of sources
within a document. People usually associate XHTML with GRDDL, but it is
worth noting that GRDDL is useful for extracting standardized RDF
metadata from other XML structures as well. In principle, GRDDL supports
a series of naming conventions and standard transformations, but it does
not require everyone to agree on particular markup strategies. It allows
you to normalize metadata extraction from documents using RDFa,
microformats, eRDF, or even custom markup schemes. The trick is to
identify the document as a GRDDL-aware source by specifying an HTML
metadata profile on its 'head' element. The profile tells any GRDDL-aware
agent that GRDDL conventions apply to the document. Anyone wishing to
extract metadata from the document should find the 'link' tags whose
'rel' attribute is 'transformation' and apply the transformation each one
references (typically an XSLT stylesheet) to the document itself. This
approach avoids the conventional problem of screen scraping, where the
client has to figure out how to extract information. With GRDDL, the
publisher indicates a simple, reusable mechanism to extract relevant
information. While there is currently no direct support for GRDDL in
any major browser, that situation is likely to change in the near future.
Until then, it is not at all difficult to put a GRDDL-aware proxy
between your browser and GRDDL-enabled pages, as the Piggy Bank
Firefox extension from MIT's SIMILE Project does.
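
The discovery step described above (check the profile, then collect the
'link' tags marked as transformations) can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not a
complete GRDDL agent: the function name find_grddl_transformations, the
sample page, and the example.org stylesheet URL are all made up for the
example, and the profile URI used is the one given in the W3C GRDDL
specification. Actually applying the stylesheet is left out, since it
needs an XSLT processor that the Python standard library does not provide.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Profile URI from the W3C GRDDL specification.
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"


def find_grddl_transformations(xhtml_text):
    """Return the href of every GRDDL transformation link in an XHTML page.

    Hypothetical helper: returns an empty list when the page does not
    declare the GRDDL profile on its head element.
    """
    root = ET.fromstring(xhtml_text)
    head = root.find(f"{{{XHTML_NS}}}head")
    if head is None:
        return []

    # The profile attribute holds a space-separated list of URIs.
    if GRDDL_PROFILE not in head.get("profile", "").split():
        return []

    hrefs = []
    for link in head.iter(f"{{{XHTML_NS}}}link"):
        # rel is also a space-separated list of tokens.
        rel_tokens = link.get("rel", "").split()
        href = link.get("href")
        if "transformation" in rel_tokens and href:
            hrefs.append(href)
    return hrefs


# Made-up sample page; the stylesheet URL is a placeholder.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Sample page</title>
    <link rel="transformation" href="http://example.org/extract-rdf.xsl" />
    <link rel="stylesheet" href="style.css" />
  </head>
  <body><p>Hello</p></body>
</html>"""

if __name__ == "__main__":
    print(find_grddl_transformations(SAMPLE))
```

A real agent would then fetch each returned stylesheet and run it over
the page with an XSLT processor to obtain RDF. Pages that do not declare
the profile yield an empty list, so the agent never has to guess how to
extract anything.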