
Thursday, January 3, 2008

Can IBM Bring the Semantic Web to Notes and Outlook?

OmniFind Personal Email Search tries to extract useful information like
addresses or phone numbers from inboxes, and lets organizations
customize semantic tagging to avoid irrelevant results. While email
search itself isn't new (Google's Desktop Search will happily index your
inbox along with the rest of your hard drive), the IBM software is
slightly different. Rather than finding a specific email message or
thread, OmniFind is aimed at searching unstructured data: the
information buried within an inbox. And it looks like one of the first
genuinely useful desktop applications based on the Semantic Web -- an
idea that has been somewhat eclipsed by Web services and Web 2.0, but
which could eventually unite them with SOA.

The key technology in the tool is UIMA (Unstructured Information
Management Architecture), an IBM-led open-source framework for analyzing
text and other unstructured data. This is essentially pattern
recognition: a series of ten digits with hyphens, brackets or spaces in
the right places is a phone number, two letters followed by five digits
is a state abbreviation and zip code, etc. The tool uses this to
generate semantic XML tags automatically, overcoming what has been the
biggest barrier to the Semantic Web: that people don't have the time
or inclination to add metadata to documents manually...

OmniFind also
lets users edit the default tags or create their own, using regular
expressions to represent search patterns. IBM suggests that these be
used to customize the search to a specific organization, finding
information like employee IDs or package tracking numbers. Custom tags
could also be used to weed out irrelevant search results, most of which
are caused by the one-size-fits-all approach that public search engines
must take.

The long-term goal of UIMA is to apply the same automated pattern
recognition to other kinds of data, which will likely be harder.
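
The pattern-based tagging described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not OmniFind's or UIMA's actual code: the tag names, the regular expressions, the `tag_text` helper and the `EMP-` employee ID format are all assumptions made up for the example.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical default patterns (assumed for illustration only).
DEFAULT_PATTERNS = {
    # Ten digits with optional brackets, hyphens, dots or spaces.
    "phone": r"\(?\b\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}\b",
    # Two capital letters, a space, then five digits.
    "state_zip": r"\b[A-Z]{2} \d{5}\b",
}


def tag_text(text, patterns):
    """Wrap every pattern match in an XML tag named after its pattern."""
    # One named group per pattern, so m.lastgroup tells us which one matched.
    combined = re.compile(
        "|".join(f"(?P<{name}>{rx})" for name, rx in patterns.items())
    )
    out, pos = [], 0
    for m in combined.finditer(text):
        out.append(escape(text[pos:m.start()]))  # untagged text before the match
        name = m.lastgroup
        out.append(f"<{name}>{escape(m.group())}</{name}>")
        pos = m.end()
    out.append(escape(text[pos:]))  # untagged text after the last match
    return "".join(out)


# An organization-specific tag added on top of the defaults
# (the EMP-###### format is invented for this sketch).
CUSTOM_PATTERNS = {**DEFAULT_PATTERNS, "employee_id": r"\bEMP-\d{6}\b"}

sample = "Call (555) 123-4567 or write to Armonk, NY 10504. Badge EMP-004217."
default_tagged = tag_text(sample, DEFAULT_PATTERNS)
custom_tagged = tag_text(sample, CUSTOM_PATTERNS)
```

With only the default patterns, the badge number is left as plain text. Adding the custom pattern tags it as well, which is the kind of per-organization customization the article describes.
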
Email is in some senses the low-hanging fruit, as it isn't entirely
unstructured: There are the formal fields like "To", plus the informal
structure of salutations and signatures that it inherited from regular
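
To make the point about email's partial structure concrete, here is a rough Python sketch that separates the formal fields from the informal ones. It relies on the standard library's `email` package for headers. The salutation test and the use of a `--` line as a signature marker are simplifying assumptions, and `split_email` is a hypothetical helper, not part of any IBM tool.

```python
import re
from email import message_from_string
from email.utils import getaddresses

# Assumed heuristic: a first line starting with a common greeting is a salutation.
SALUTATION = re.compile(r"^(hi|hello|dear)\b", re.IGNORECASE)


def split_email(raw):
    """Split a plain-text message into recipients, salutation, body and signature."""
    msg = message_from_string(raw)
    # Formal structure: the "To" header parses reliably.
    recipients = [addr for _name, addr in getaddresses(msg.get_all("To", []))]
    lines = msg.get_payload().strip().splitlines()
    salutation = lines[0] if lines and SALUTATION.match(lines[0]) else None
    signature = []
    for i, line in enumerate(lines):
        # Assumed convention: a line holding only "--" starts the signature.
        if line.rstrip() == "--":
            signature = lines[i + 1:]
            lines = lines[:i]
            break
    if salutation is not None:
        lines = lines[1:]
    return {
        "to": recipients,
        "salutation": salutation,
        "body": "\n".join(lines).strip(),
        "signature": signature,
    }


raw = (
    "From: alice@example.com\n"
    "To: Bob <bob@example.com>, carol@example.com\n"
    "Subject: Shipment\n"
    "\n"
    "Hi Bob,\n"
    "\n"
    "The package went out today.\n"
    "\n"
    "-- \n"
    "Alice Smith\n"
    "(555) 123-4567\n"
)
parts = split_email(raw)
```

The header lookup is dependable because the format is standardized, while the salutation and signature rules are guesses that real mail will often break. That difference is why the informal structure is the harder part to mine.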
