
Thursday, March 18, 2010

Public Data: Translating Existing Models to RDF

"As we encourage linked data adoption within the UK public sector, something we run into again and again is that (unsurprisingly) particular domain areas have pre-existing standard ways of thinking about the data that they care about. There are existing models, often with multiple serialisations, such as in XML and a text-based form, that are supported by existing tool chains. In contrast, if there is existing RDF in that domain area, it's usually been designed by people who are more interested in the RDF than in the domain area, and is thus generally more focused on the goals of the typical casual data re-user rather than the professionals in the area...
To give an example, the international statistics community uses SDMX for representing and exchanging statistics... SDMX includes a well-thought through model for statistical datasets and the observations within them, as well as standard concepts for things like gender, age, unit multipliers and so on. By comparison, SCOVO, the main RDF model for representing statistics, barely scratches the surface in comparison. This isn't the only example: the INSPIRE Directive defines how geographic information must be made available. GEMINI defines the kind of geospatial metadata that that community cares about. The Open Provenance Model is the result of many contributors from multiple fields, and again has a number of serialisations.
You could view this as a challenge: experts in their domains already have models and serialisations for the data that they care about; how can we persuade them to adopt an RDF model and serialisations instead? But that's totally the wrong question. Linked data doesn't, can't and won't replace existing ways of handling data. The question is really about how to enable people to reap these benefits; the answer, because HTTP-based addressing and typed linkage is usually hard to introduce into existing formats, is usually to publish data using an RDF-based model alongside existing formats. This might be done by generating an RDF-based format (such as RDF/XML or Turtle) as an alternative to the standard XML or HTML, accessible via content negotiation, or by providing a GRDDL transformation that maps an XML format into RDF/XML...
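To make the "alongside existing formats" idea concrete, here is a minimal sketch of the content-negotiation half: the server looks at the client's Accept header and picks one of several serialisations of the same dataset, defaulting to the existing XML format so current tool chains keep working. The list of offered media types and the function names are assumptions for illustration only, and the Accept parsing is simplified (it ignores the HTTP specification's precedence rules for more specific media ranges).

```python
# Minimal content-negotiation sketch. Assumption: one dataset is published
# in several serialisations; the media types offered here are illustrative.

# Offered representations, in server preference order (the existing XML
# format first, since that is what current tool chains expect).
OFFERED = ["application/xml", "text/turtle", "application/rdf+xml"]


def parse_accept(header):
    """Split an Accept header into (media_range, q) pairs, highest q first."""
    ranges = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media = pieces[0].lower()
        if not media:
            continue
        q = 1.0
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media, q))
    # sorted() is stable, so equal q values keep the client's order.
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def matches(media_range, media_type):
    """True if a range such as */*, text/* or text/turtle covers media_type."""
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return media_type.startswith(media_range[:-1])
    return media_range == media_type


def negotiate(accept_header):
    """Return the media type to serve, or None (a server would send 406)."""
    if not accept_header:
        return OFFERED[0]
    ranges = parse_accept(accept_header)
    # Concrete types the client explicitly refuses with q=0.
    refused = {m for m, q in ranges if q <= 0 and "*" not in m}
    for media_range, q in ranges:
        if q <= 0:
            continue
        for media_type in OFFERED:
            if media_type not in refused and matches(media_range, media_type):
                return media_type
    return None


print(negotiate("application/xml;q=0.9, text/turtle"))  # a linked-data client
print(negotiate(""))                                    # no preference given
```

A linked-data client asking for Turtle gets Turtle, while a client that sends no preference still gets the standard XML. A real server should also send a `Vary: Accept` response header so that caches keep the different representations apart.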
Modelling is a complex design activity, and you're best off avoiding doing it if you can. That means reusing conceptual models that have been built up for a domain as much as possible and reusing existing vocabularies wherever you can. But you can't and shouldn't try to avoid doing design when mapping from a conceptual model to a particular modelling paradigm such as a relational, object-oriented, XML or RDF model. If you're mapping to RDF, remember to take advantage of what it's good at such as web-scale addressing and extensibility, and always bear in mind how easy or difficult your data will be to query. There is no point publishing linked data if it is unusable..."
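The other route mentioned above is a transformation that maps an existing XML format into RDF. GRDDL would normally point at an XSLT stylesheet for this; the sketch below is a Python stand-in that shows the kind of design decisions the quote is talking about. Everything in it is hypothetical: the XML is a made-up, simplified statistics format (it is not real SDMX-ML), and the `example.org` URI patterns and `ex:` vocabulary terms are placeholders. The choices it illustrates are giving every observation a stable, pattern-based URI (web-scale addressing), turning the format's code list into vocabulary terms rather than bare strings, and keeping one flat observation per resource so the result is easy to query.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified XML in the spirit of a statistics format.
# It is NOT real SDMX-ML; element and attribute names are made up.
SAMPLE = """<DataSet id="population">
  <Obs area="E06000001" year="2009" sex="F" value="46200"/>
  <Obs area="E06000001" year="2009" sex="M" value="44100"/>
</DataSet>"""

DATA = "http://example.org/data/"  # hypothetical URI space for datasets/observations
ID = "http://example.org/id/"      # hypothetical URI space for areas
VOCAB = "http://example.org/def/"  # hypothetical vocabulary namespace (prefix ex:)

# Map the format's code list onto (hypothetical) vocabulary terms.
SEX = {"F": "ex:female", "M": "ex:male"}


def xml_to_turtle(xml_text):
    """Map each <Obs> element to a Turtle description of one observation.

    Assumes codes are URI-safe and values are integers; no escaping is done.
    """
    root = ET.fromstring(xml_text)
    ds_id = root.get("id")
    dataset = f"<{DATA}{ds_id}>"
    out = [
        f"@prefix ex: <{VOCAB}> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "",
        f"{dataset} a ex:DataSet .",
    ]
    for obs in root.iter("Obs"):
        area, year, sex = obs.get("area"), obs.get("year"), obs.get("sex")
        # Pattern-based URI: every observation gets a stable, linkable address.
        subject = f"<{DATA}{ds_id}/{area}/{year}/{sex}>"
        out += [
            "",
            f"{subject} a ex:Observation ;",
            f"    ex:dataSet {dataset} ;",
            f"    ex:area <{ID}area/{area}> ;",
            f'    ex:year "{year}"^^xsd:gYear ;',
            f"    ex:sex {SEX[sex]} ;",
            f"    ex:value {int(obs.get('value'))} .",
        ]
    return "\n".join(out) + "\n"


print(xml_to_turtle(SAMPLE))
```

Because the area is a URI rather than a string, the same observation can be linked to anything else published about that area. A production mapping would reuse the domain's own concepts and an established vocabulary wherever one exists instead of inventing `ex:` terms.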
http://www.jenitennison.com/blog/node/142

See also Linked Data: http://www.w3.org/standards/semanticweb/data
