Wednesday, April 16, 2008

The Spirit of Schematron in Test Driven Development (TDD)

Test Driven Development is a relatively popular methodology nowadays
and I think XML tools can play a crucial role in better testing. Testing
frameworks are more than capable of using and testing XML based
applications, but just in case you have ever had trouble, here are a
few tips. XSLT makes for an excellent transformation tool for massaging
XML data. This means it also can be a helpful tool to reduce large XML
data sets to something manageable, whether it is XML or not. For example
[see the] simple XSLT stylesheet that will return content on errors
checking an Atom Feed, which is exceptionally simple, but hopefully
it makes the point. In the example, you'll also notice that the output
was not contained in an XML element. Sometimes it is easier to just parse
a simple text file line by line, so this might be that situation. Likewise,
having a designated set of test elements could be helpful (think reports
transformed to HTML). That said, the goal is not to create some enormous
test framework in XML and XSLT. The real goal is to use a great tool for
transforming XML to something you can use easily. I wouldn't necessarily
suggest trying to validate the content of an element or do complex string
parsing. XSLT 1.0 isn't really the easiest language for string parsing
or complex math without a little help. You can always add your own
extension functions to help out, but hopefully keeping things simple by
massaging the data gets you 80% of the way. The idea here is to make things
palatable to your own tastes... I like XML, but I hate XML Schema and
DTDs. RELAX NG is a slightly better option, but when you just want to make
sure some value is present, the above methods can be a simpler solution.
The essence of the above suggestions comes from Schematron, an excellent
validation tool that is as simple as knowing XPath. Schematron in fact
has been implemented using XSLT, so adding it to your existing test
framework should be relatively simple. There are times when XML seems
to present a subtle problem within the world of object oriented languages.
It's not a hard problem on a technical level. Working with XML is
relatively simple with many examples and resources. Things get hard when
you don't have good tools to help you along the way. The XML landscape
to your programming language of choice when XML has more than enough
tools to seamlessly integrate testing your XML along side your models,
views, controllers and integrations.
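
The idea of reducing a feed to a few plain-text error lines can be
sketched outside XSLT as well. The following is a minimal sketch in
Python, not the stylesheet referred to above: it pairs a path that must
match with a message to print when it does not, which is roughly the
shape of a Schematron assertion. The rule list, the function names and
the sample feed are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Each rule is (path that must match, message when it does not),
# loosely mirroring a Schematron assert. The rule set is illustrative.
RULES = [
    ("atom:id", "missing atom:id"),
    ("atom:title", "missing atom:title"),
    ("atom:updated", "missing atom:updated"),
]


def failed(element, label):
    """Return one text line per rule that the element fails."""
    return [
        "%s: %s" % (label, message)
        for path, message in RULES
        if element.find(path, ATOM) is None
    ]


def check_feed(xml_text):
    """Reduce an Atom feed to plain-text error lines (empty if clean)."""
    feed = ET.fromstring(xml_text)
    errors = failed(feed, "feed")
    for number, entry in enumerate(feed.findall("atom:entry", ATOM), 1):
        errors.extend(failed(entry, "entry %d" % number))
    return errors


# Hypothetical sample: the single entry has no atom:id.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example</title>
  <id>urn:example:feed</id>
  <updated>2008-04-16T00:00:00Z</updated>
  <entry>
    <title>First</title>
    <updated>2008-04-16T00:00:00Z</updated>
  </entry>
</feed>"""

if __name__ == "__main__":
    for line in check_feed(SAMPLE):
        print(line)  # prints "entry 1: missing atom:id"
```

A test can then assert that the returned list is empty, or compare it
line by line against the errors it expects.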

Don't Be Surprised By E-Discovery

E-discovery requires government agencies to know what electronic
documents they have and be able to find them quickly if someone requests
them for a court case. That's no small task considering the enormous
volume of electronic documents created by the typical organization.
Email messages and attachments represent a good chunk of the problem,
but word-processing documents, PDFs and other digital information also
contribute to the management challenge. The amended Federal Rules of
Civil Procedure, which have heightened awareness of e-discovery, cover
a wide range of data types under the umbrella of electronically stored
information... E-discovery experts recommend establishing a taxonomy
and creating metadata tags for electronic information. The taxonomy
provides a general way to classify information, and metadata provides
detail on information to make searches more fruitful. The Electronic
Discovery Reference Model project devised an Extensible Markup Language
(XML) schema to consistently describe electronic information. [Penny]
Quirk said EDRM created the XML e-discovery standard to ensure that
consistent and common nomenclature is used for business records during
the e-discovery process; the project is scheduled for completion in this
year's second quarter... Electronic documents culled in e-discovery and
used in litigation demand special treatment: documents compiled in
significant cases at the Justice Department are kept as permanent records
of the government. Records in garden-variety cases in federal court are
considered temporary, but they might still be housed for a number of
years at one of the National Archives' Federal Records Centers. The
National Archives tapped Lockheed Martin in 2005 to build an Electronic
Records Archives system that will help the agency ingest electronic
records flagged for permanent storage; the aim now is to accept
government records in any format, encapsulating each electronic document
in an XML metadata wrapper.
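
To make the idea of an XML metadata wrapper concrete, here is a minimal
sketch in Python. It is not the EDRM schema or the Electronic Records
Archives format; the element names (record, metadata, filename,
mediaType, tag, content) and the function name are hypothetical, chosen
only to show a document of any format carried as an encoded payload
next to searchable metadata tags.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_document(payload, filename, media_type, tags):
    """Wrap raw document bytes in a small XML envelope with metadata.

    All element names here are hypothetical, not taken from any standard.
    """
    record = ET.Element("record")
    meta = ET.SubElement(record, "metadata")
    ET.SubElement(meta, "filename").text = filename
    ET.SubElement(meta, "mediaType").text = media_type
    for tag in tags:
        # Taxonomy terms or other classification tags used for searching.
        ET.SubElement(meta, "tag").text = tag
    # Base64 keeps arbitrary binary formats safe inside XML text.
    content = ET.SubElement(record, "content", encoding="base64")
    content.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(record, encoding="unicode")


if __name__ == "__main__":
    print(wrap_document(b"hello", "memo.txt", "text/plain", ["litigation"]))
```

Because the payload is opaque, searches run against the metadata alone,
which is where a shared taxonomy pays off.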

Proposal for IETF NETCONF Data Modeling Language Working Group

The IESG Secretary announced that a new IETF working group has been
proposed in the Operations and Management Area, described in a draft
NETMOD Charter. The NETCONF Working Group has completed a base protocol
to be used for configuration management. However, the NETCONF protocol
does not include a standard content layer. The specifications do not
include a modeling language or accompanying rules that can be used to
model the management information that is to be configured using NETCONF.
This has resulted in inconsistent syntax and interoperability problems.
The purpose of NETMOD is to support the ongoing development of IETF
and vendor-defined data models for NETCONF. The WG will define a
"human-friendly" modeling language defining the semantics of operational
data, configuration data, notifications, and operations. This language
will focus on readability and ease of use. This language must be able
to serve as the normative description of NETCONF data models. The WG
will use YANG as its starting point for this language. Language
abstractions that facilitate model extensibility and reuse have been
identified as a work area and will be considered as a work item or
may be integrated into the YANG document based on WG consensus. The
WG will define a canonical mapping of this language to NETCONF XML
instance documents, the on-the-wire format of YANG-defined XML content.
Only data models defined in YANG will have to adhere to this on-the-wire
format. In order to leverage existing XML tools for validating NETCONF
data in various contexts and also facilitate exchange of data models,
the WG will define mapping rules from YANG to the DSDL data modeling
framework (ISO/IEC 19757) with additional annotations
to preserve semantics. The initial YANG mapping rules specifications
are expressly defined for NETCONF modeling. However, there may be
future areas of applicability beyond NETCONF, and the WG must provide
suitable language extensibility mechanisms to allow for such future
work. The NETMOD WG will only address modeling NETCONF devices and the
language extensibility mechanisms... Initial deliverables: (1) An
architecture document explaining the relationship between YANG and
its inputs and outputs; (2) The YANG data modeling language and
semantics; (3) Mapping rules of YANG to XML instance data in NETCONF;
(4) YIN, a semantically equivalent fully reversible mapping to an
XML-based syntax for YANG. YIN is simply the data model in an XML
syntax that can be manipulated using existing XML tools (e.g., XSLT);
(5) Mapping rules of YANG to DSDL data modeling framework (ISO/IEC 19757),
including annotations for DSDL to preserve top-level semantics during
translation; (6) A standard type library for use by YANG. The IESG
has not made any determination as yet; please send your comments to
the IESG mailing list by April 22, 2008.
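
As a rough illustration of deliverable (4), the sketch below treats a
YIN document as ordinary XML and lists its leaves with a stock XML
parser. The module content is invented for the example, and the element
and attribute names and the namespace follow the YIN syntax as it was
later published in RFC 6020, which is an assumption here: the drafts
under discussion in 2008 may differ.

```python
import xml.etree.ElementTree as ET

# Namespace as given in RFC 6020; assumed, not taken from the charter.
YIN_NS = "urn:ietf:params:xml:ns:yang:yin:1"
YIN = {"yin": YIN_NS}

# Hypothetical data model with one container and two string leaves.
YIN_MODULE = """<module name="example-system"
        xmlns="urn:ietf:params:xml:ns:yang:yin:1">
  <namespace uri="http://example.com/system"/>
  <prefix value="sys"/>
  <container name="system">
    <leaf name="hostname">
      <type name="string"/>
    </leaf>
    <leaf name="location">
      <type name="string"/>
    </leaf>
  </container>
</module>"""


def list_leaves(yin_text):
    """Return (leaf name, type name) pairs found anywhere in the module."""
    module = ET.fromstring(yin_text)
    leaves = []
    for leaf in module.iter("{%s}leaf" % YIN_NS):
        type_el = leaf.find("yin:type", YIN)
        type_name = type_el.get("name") if type_el is not None else None
        leaves.append((leaf.get("name"), type_name))
    return leaves


if __name__ == "__main__":
    for name, type_name in list_leaves(YIN_MODULE):
        print(name, type_name)
```

The same document could equally be fed to XSLT or an XML validator,
which is the point of having an XML syntax alongside YANG.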

W3C Invites Public Comment on Content Transformation Guidelines 1.0

W3C announced that the Mobile Web Best Practices Working Group has
published the First Public Working Draft for "Content Transformation
Guidelines 1.0." This document provides guidance to managers of content
transformation proxies and to content providers for how to coordinate
when delivering Web content. Content transformation techniques diverge
widely on the web, with many non-standard HTTP implications, and no
well-understood means either of identifying the presence of such
transforming proxies or of controlling their actions. From the point
of view of this document, Content Transformation is the manipulation in
various ways, by proxies, of requests made to and content delivered by
an origin server with a view to making it more suitable for mobile
presentation. The W3C MWI BPWG neither approves nor disapproves of
Content Transformation, but recognizes that it is being deployed widely
across mobile data access networks. The deployments diverge widely from
one another, with many non-standard HTTP implications, and no
well-understood means either of identifying the presence of such
transforming proxies or of controlling their actions. This document
establishes a framework to allow that to happen.

Use HATS to Generate Atom Feeds for Mainframe Applications

Nowadays, content distributors deliver all content, including news and
site updates, as feeds. Most enterprise applications use feeds for
various purposes, including to monitor an application and check the
status of a project. Content providers publish a feed link on their site
that users register with a feed reader. The feed reader checks for
updates to the registered feeds at regular intervals. When it detects
an update in the content, the feed reader requests the updated content
from the content provider. The feeds contain only a summary of the content,
but they provide a link to the detailed content. Atom Syndication Format
and RSS are the most common specifications of feeds. We're using Atom
feeds in this article, but you can easily change to RSS feeds with a
little modification. This article leverages a product called IBM
WebSphere Host Access Transformation Services (HATS), which converts
any given green-screen, character-based 3270 or 5250 host application
into a Web application (HTML) or rich-client application. HATS also allows
programmatic interfaces to convert the identified content in these host
applications into any other format. We take a step-by-step approach to
show you how to write a HATS program that converts the host application
content into Atom feeds... Delivering data as Atom feeds in mainframes
opens a new world of possibilities for enterprise applications.
Organizations can use mashup editors to extract data from companies with
external or internal feeds and create new applications or information.
For example, call centers can take advantage of mashups by passing a
calling customer's ZIP code information to Google Maps to identify the
location of the customer. This can help the call center employees
personalize the conversation by enquiring about the weather from the
customer's location, and so on. The delivery of data as Atom feeds in
mainframe servers is one of the fundamental building blocks that enables
an organization to embrace Web 2.0.
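
The feed-building step can be sketched independently of HATS. The
following Python sketch does not use any HATS API; it assumes the host
screen content has already been extracted into simple rows, and the row
fields, identifiers and URLs are hypothetical. Each entry carries only
a summary plus a link to the detailed content, as described above.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def atom(tag):
    """Qualify a tag name with the Atom namespace."""
    return "{%s}%s" % (ATOM_NS, tag)


def build_feed(feed_id, title, author, updated, rows):
    """Build an Atom feed string from a list of row dictionaries."""
    # Serialize Atom as the default namespace rather than ns0:.
    ET.register_namespace("", ATOM_NS)
    feed = ET.Element(atom("feed"))
    ET.SubElement(feed, atom("id")).text = feed_id
    ET.SubElement(feed, atom("title")).text = title
    ET.SubElement(feed, atom("updated")).text = updated
    name = ET.SubElement(ET.SubElement(feed, atom("author")), atom("name"))
    name.text = author
    for row in rows:
        entry = ET.SubElement(feed, atom("entry"))
        ET.SubElement(entry, atom("id")).text = row["id"]
        ET.SubElement(entry, atom("title")).text = row["title"]
        ET.SubElement(entry, atom("updated")).text = row["updated"]
        # Only a summary travels in the feed; the link leads to the detail.
        ET.SubElement(entry, atom("summary")).text = row["summary"]
        ET.SubElement(entry, atom("link"), href=row["link"])
    return ET.tostring(feed, encoding="unicode")


# Hypothetical rows, standing in for fields read from a host screen.
ROWS = [
    {
        "id": "urn:example:order:1001",
        "title": "Order 1001",
        "updated": "2008-04-16T09:00:00Z",
        "summary": "Status changed to SHIPPED",
        "link": "http://example.com/orders/1001",
    },
]

if __name__ == "__main__":
    print(build_feed("urn:example:orders", "Open orders", "host-app",
                     "2008-04-16T09:00:00Z", ROWS))
```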

Apache Abdera: Atom, AtomPub, and Java

The Apache Abdera project, an open source Atom Syndication and Atom
Publication Protocol implementation currently still in its incubation
phase, has recently reached its 0.40 milestone, an important step towards
graduation [as an Apache project]. Snell: "While Atom and AtomPub
certainly began life as a way of syndicating and publishing Weblog
content, it has proven useful for a much broader range of applications.
I've seen Atom being used for contacts, calendaring, file management,
discussion forums, profiles, bookmarks, wikis, photo sharing, podcasting,
distribution of Common Alerting Protocol alerts, and many other cases.
Atom is relevant to any application that involves publishing and managing
collections of content of any type... Abdera is an open source
implementation of the Atom Syndication Format and Atom Publishing Protocol.
It began life as a project within IBM's WebAhead group and was donated to
the Apache Incubator in June 2006. Since then, it has evolved into the
most comprehensive open-source, Java-based implementation of the Atom
standards... Abdera has been part of the Apache Incubator for long enough.
While there are still some details to work out, I would very much like
to see Abdera graduate to its own Top Level Project at Apache, and become
host to a broad range of Atom-based applications." Diephouse: "Look to
some of the public services out there: most of the APIs for Google are
based on AtomPub. Microsoft is moving toward it for web APIs too. These
services are all going beyond just blogs. AtomPub goes beyond public web
APIs as well -- I've noticed that many enterprises are starting to use
AtomPub for some of their internal services as well. Both AtomPub and
SOAP/WSDL give you a way to build a service for others to use. But AtomPub
takes a fundamentally different approach to helping users implement
services. It implements constraints which give new types of freedom.
Because the data format is constrained -- every entry has a title, entry,
id, and content/summary -- I can use an Atom feed from any type of
application and get some useful information out of it... Abdera includes
support for developing/consuming AtomPub services, an IRI library, a URI
template library, unicode normalization, extensions for things like XML
signature/encryption, GData, GeoRSS, OAuth, JSON and more. One of the
cool new things in the latest release are a set of 'adapters' which allow
you to have an AtomPub service without any coding by storing entries in
JDBC, JCR or the filesystem..."
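
The point about constraints can be shown with a short sketch: because
entries share a small set of elements, one generic function can pull
something useful out of a feed without knowing which application
produced it. This is plain Python with the standard library, not
Abdera's API, and the sample feed and function name are hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def summarize_entries(feed_xml):
    """Return (id, title, text) for every entry, whatever its source."""
    feed = ET.fromstring(feed_xml)
    rows = []
    for entry in feed.findall("atom:entry", ATOM):
        # Prefer the summary and fall back to the content element.
        text = entry.findtext("atom:summary", default=None, namespaces=ATOM)
        if text is None:
            text = entry.findtext("atom:content", default="", namespaces=ATOM)
        rows.append((
            entry.findtext("atom:id", default="", namespaces=ATOM),
            entry.findtext("atom:title", default="", namespaces=ATOM),
            text,
        ))
    return rows


# Hypothetical feed mixing entries from two unrelated applications.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Mixed sources</title>
  <id>urn:example:mixed</id>
  <updated>2008-04-16T00:00:00Z</updated>
  <entry>
    <id>urn:example:contact:1</id>
    <title>Contact: A. Example</title>
    <updated>2008-04-15T00:00:00Z</updated>
    <summary>Phone number changed</summary>
  </entry>
  <entry>
    <id>urn:example:photo:7</id>
    <title>Photo: harbor</title>
    <updated>2008-04-16T00:00:00Z</updated>
    <content type="text">Uploaded to the shared album</content>
  </entry>
</feed>"""

if __name__ == "__main__":
    for row in summarize_entries(FEED):
        print(" | ".join(row))
```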

Sunday, April 13, 2008

Who Trumps bin Laden as a Cyberthreat? Look in the Mirror.

From the San Francisco RSA 2008 Conference: "It turns out al-Qaida's
leader and his cohorts aren't the biggest threat to our cybersecurity.
You are... Security gurus have long urged the business world to turn
network security into part of the corporate DNA. The message is not
fully getting through. And now we're seeing the predictable results.
In years past, [Symantec CEO John] Thompson and other computer security
executives have pushed the idea of making cyber-security as familiar
to most people as the fire prevention campaign underwritten by the
government in the 1960s and 1970s. Considering the amount of money
Uncle Sam is spending on cyber-security these days, that's a pipedream.
Department of Homeland Security Secretary Michael Chertoff, who also
presented a keynote on Tuesday, offered little indication Washington
was about to ride to the rescue. In remarks during his prepared speech
and subsequent press conference, Chertoff offered a dutiful recitation
of what he described as the President's interest in shoring up the
nation's digital security. Give Chertoff credit for being candid about
where DHS has come up short. He said the government needs to reduce
its (literally) thousands of network access points to around 50. At
the same time, Chertoff wants his department to detect and analyze
computer anomalies faster. A big part of that will involve a revamp
of U.S. CERT's early warning system... In the end, however, money
talks and you-know-what walks. The feds only have a $115 million budget
to work with. Chertoff's department has requested $192 million for
the new fiscal year but that's still doing it on the cheap. By
comparison, we spend $720 million in Iraq each day" [actually their own money, joke of the day].

More Information