
XML Prague 2009, Day 1

Night Falls on Old Prague

I am in Prague for the XML Prague conference, and for a week of meetings of ISO/IEC JTC 1 SC 34. Here is a running report of day 1 of the conference ...

Day 1 kicked off, after a welcome from Mohamed Zergaoui, with a presentation from Mike Kay (zOMG - no beard!) on the state of XML Schema 1.1. Mike gave a lucid tour of XML Schema's acknowledged faults, but maintained these must not distract us too much from the technology's industrial usefulness. XML Schema 1.1 looks to me mostly like a modest revamp: some tidying and clarification under the hood. One notable new feature is, however, to be introduced: assertions - a cut-down version of the construct made popular by Schematron. Mike drew something of a collective intake of breath when he claimed it was to XML Schema 1.1's advantage that it was incorporating multiple kinds of validation, and that it was "ludicrous" to validate using multiple schema technologies.
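Mike's own examples aren't reproduced here, but a minimal hand-rolled sketch gives the flavour of assertions: an XSD 1.1 declaration using xs:assert to enforce a cross-attribute constraint that the 1.0 grammar alone cannot express.

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="interval">
    <xs:complexType>
      <xs:attribute name="min" type="xs:integer"/>
      <xs:attribute name="max" type="xs:integer"/>
      <!-- XSD 1.1 assertion: an XPath 2.0 test evaluated against
           the element, inexpressible in the XSD 1.0 grammar -->
      <xs:assert test="@min le @max"/>
    </xs:complexType>
  </xs:element>
</xs:schema>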

A counterpoint to this view came in the next presentation, from MURATA Makoto. Murata-san demonstrated the use of NVDL to validate Atom feeds which contain extensions, claiming NVDL was the only technology that allows this to be done without manually re-editing the core schemas every time a new extension is used.
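To give a flavour of the approach (my own sketch, not Murata-san's actual rules), an NVDL script for this job might validate the Atom namespace against a RELAX NG schema while letting any other namespace pass through unvalidated:

<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0">
  <!-- validate Atom markup against a (locally held) Atom schema -->
  <namespace ns="http://www.w3.org/2005/Atom">
    <validate schema="atom.rng"/>
  </namespace>
  <!-- markup from any other namespace (i.e. extensions) is allowed -->
  <anyNamespace>
    <allow/>
  </anyNamespace>
</rules>

New extension namespaces then require no schema edits at all.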

After coffee, Ken Holman presented on "code lists" - a sort of Cinderella topic within XML validation, but an important one, as code lists play a vital role in the validity of most real-world XML documents of any substance. Ken outlined a thorough mechanism for validating documents that use code lists, based on Genericode and Schematron.
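Ken's mechanism generates the checking rules from Genericode code list files; the end result has roughly the flavour of this hand-written Schematron fragment (the document structure and currency codes here are invented for illustration):

<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
            queryBinding="xslt2">
  <sch:pattern>
    <!-- in the real mechanism this value list would be generated
         from a Genericode file, not hard-coded -->
    <sch:rule context="Invoice/CurrencyCode">
      <sch:assert test=". = ('EUR', 'GBP', 'USD')">
        The currency code must come from the agreed code list.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>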

Before lunch, Tony Graham took a look at "Testing XSLT" and gave an interesting tour of some of the key technologies in this space. One of his key conclusions, and one which certainly struck a chord with me, was the assertion that ultimately the services of our own eyes are necessary for a complete test to have taken place.

Continuing the theme, Jeni Tennison introduced a new XSLT testing framework of her invention: XSpec. I sort of hope I will never have to write substantial XSLTs which merit testing, but if I do then Jeni's framework certainly looks like TDB for TDD!
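XSpec tests are themselves written in XML; a minimal scenario (my own sketch, targeting an imaginary stylesheet) looks something like this:

<x:description xmlns:x="http://www.jenitennison.com/xslt/xspec"
               stylesheet="para-to-p.xsl">
  <x:scenario label="transforming a para element">
    <x:context>
      <para>Hello, Prague!</para>
    </x:context>
    <!-- the scenario passes if the transform's output matches -->
    <x:expect label="yields an HTML p element">
      <p>Hello, Prague!</p>
    </x:expect>
  </x:scenario>
</x:description>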

Next, Priscilla Walmsley took the podium to talk about FunctX, a useful-looking general-purpose library of XPath 2.0 (and therefore XQuery) functions. Priscilla's talk nicely helped to confirm a theme that has been emerging today: getting real stuff done. This is not to say there is not a certain geeky intellectualism in the air - but it's to a purpose.
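Because FunctX is written in plain XPath 2.0, using it from XQuery is just a module import away; a small sketch (the module file name here is an assumption - check your local copy):

import module namespace functx = "http://www.functx.com"
  at "functx.xqy";

(: one of the simpler FunctX functions :)
functx:capitalize-first("prague")  (: returns "Prague" :)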

After tea, Robin Berjon gave an amusing tour of certain XML antipatterns. Maybe because his views largely coincided with mine I thought it a presentation of great taste and insight. Largely, but not entirely :-)

Next up, Ari Nordström gave a presentation on "Practical Reuse in XML". His talk was notable for promoting XLink, which had been a target of Robin Berjon's scorn in the previous session (though not without some contrary views from the floor). URNs were also proposed as an underpinning for identification purposes - a proposal which drew some protests from the ambient digiverse.

To round off the day's proceedings, George Cristian Bina gave a demo of some upcoming features in the next version of the excellent oXygen XML Editor. This is software I am very familiar with, as I use it almost daily for my XML work. George's demo concentrated on the recent authoring mode for oXygen, which allows creation of markup in a more user-friendly wordprocessor-like environment. I've sort of used this on occasion, and sort of felt I enjoyed it at the time. But somehow I always find myself gravitating back to good old pointy-bracket mode. Maybe I am just an unreconstructed markup geek ...


Breakfast Geek-out

At Westminster

I had a few minutes before a meeting yesterday – and so a chance to take a photo of Big Ben. The only lens I was carrying was the (small and light) Nikon E Series 50mm which, thanks to the 1.5× DX crop factor, gives a slightly unusual short-telephoto focal length of 75mm (in real money) when used on a DX camera.

So, a trip over Westminster Bridge was necessary to get much into the frame.

I'm pleased with the painterly light/haze effect in this picture. That may be accounted for by the layer of dust I noticed, later in the day, had built up on the front element; or by the software smudging caused by the slightly-misaligned frames which make up this 3-exposure HDR ...

The Tree by King's College Bridge

I seem to have got into the habit of taking HDR pictures into the sun; I like the result, and the way that this "justifies" the use of HDR (a normal exposure just wouldn't work here).

Real Conformance for ODF?

There has been quite a lot of hubbub recently about ODF conformance, in particular about how conformance to the forthcoming ODF 1.2 specification should be defined.

A New Conformance Clause

Earlier versions of ODF (including ISO/IEC 26300) already defined conformance - it was simply a question of obeying the schema. So in ODF 1.1, for example, we had this text:

Conforming applications [...] shall read documents that are valid against the OpenDocument schema if all foreign elements and attributes are removed before validation takes place [...] (1.5)

and that was the simple essence of ODF conformance.
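That "remove the foreign stuff first" step is easily mechanised. As a rough sketch (simplified, since real ODF documents also legitimately use a few non-opendocument namespaces such as Dublin Core and XLink, which this does not allow for), an XSLT 1.0 filter might read:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- identity: copy ODF markup through unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- drop elements outside the OpenDocument namespace family -->
  <xsl:template match="*[not(starts-with(namespace-uri(),
      'urn:oasis:names:tc:opendocument:xmlns:'))]"/>
  <!-- drop namespaced foreign attributes likewise -->
  <xsl:template match="@*[namespace-uri() != '' and
      not(starts-with(namespace-uri(),
      'urn:oasis:names:tc:opendocument:xmlns:'))]"/>
</xsl:stylesheet>

The filtered result is what then gets checked against the OpenDocument schema.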

This is now up for reconsideration. The impetus for altering the existing conformance criteria appears to have come from a change in OASIS's procedures, which now require that specifications have “a set of numbered conformance clauses”, a requirement which seems sensible enough.

However, the freshly-drafted proposal which the OASIS TC has been considering goes further than just introducing numbered clauses: it now defines two categories of conformance:

  1. “Conforming OpenDocument Document” conformance
  2. “Conforming OpenDocument Extended Document” conformance

As shorthand, we might like to characterise these as the “pure” and “buggered-up” versions of ODF respectively.

The difference is that the “pure” version now forbids the use of foreign elements and attributes (i.e. those not declared by the ODF schema), while the “buggered-up” version permits them.

Ructions

The proposal caused much debate. In support of the new conformance clause, IBM's Rob Weir described foreign elements (formerly so welcome in ODF) as proprietary extensions that are “evil” and as a “nuclear death ray gun”. Questioning the proposal, KOffice's Thomas Zander wrote that he was “worried that we are trying to remove a core feature that I depend on in both KOffice and Qt”. Meanwhile Microsoft's Doug Mahugh made a counter-proposal suggesting that ODF might adopt the Markup Compatibility and Extensibility mechanisms from ISO/IEC 29500 (OOXML).

Things came to a head in a 9-2-2 split vote last week which saw the new conformance text adopted in the new ODF committee specification by will of the majority. Following this there was some traffic in the blogosphere with IBM's Rob Weir commenting and Microsoft's Doug Mahugh counter-commenting on the vote and the circumstances surrounding it.

Shadow Play

What is to be made of all this? Maybe Sun, whose corporate memory still smarts from Microsoft's “extend and embrace” Java attempts, thinks this is a way to prevent a repeat of similar stunts for ODF. Or perhaps this is a way to carve out a niche for OpenOffice to enjoy “pure” status while competitor applications are relegated to the “buggered-up” bin. Maybe it is envisaged that governments might be encouraged to procure only systems that deal in “pure” ODF. Maybe foreign elements really are the harbinger of nuclear death.

Who knows?

Whatever the reasons behind the reasons, there is clearly an “absent presence” in all these discussions: Microsoft Office, and in particular the forthcoming Microsoft Office 2007 SP2 with its ODF support. It is never mentioned, except in an occasional nudge-nudge wink-wink sort of way.

This controversy is most bemusing. This is partly because the “Microsoft factor” appears not to be a factor anyway, since MS Office will (we are told) not use foreign elements for its ODF 1.1 support. But the main reason why this is bemusing is that this discussion (whether or not to permit foreign elements) is completely unreal. There seems to be an assumption that it matters – that conformance as defined in the ODF spec means something important when it comes to real users, real procurement, real development or real interoperability.

It doesn't mean anything real - and here's why...

Making an ODF-conformant Office Application

Let us consider the procurement rules of an imaginary country (Vulgaria, say). Let us further imagine that Vulgaria's government wants to standardize on using ODF for all its many departments. After many hours of meetings, and the expenditure of many Vulgarian Dollars on consultancy fees, the decision is finally made and an official draws up procurement rules to stipulate this:

Any office application software procured by the Government of Vulgaria must support ODF (ISO/IEC 26300), and must conform to the 'pure' conformance class defined in clause x.y of that Standard, reading and emitting only ODF documents that are so conformant.

Sorted, they think.

Now imagine a software company that has its eye on making a big sale of software licenses to Vulgaria. Unfortunately, its office application does not meet the ODF conformance criterion set out by the procurement officer. The marketing department is duly sad. But one day a bright young developer gets to hear of the problem and proposes a solution. He boldly proclaims “I can make our format ODF-conformant today!”, and proceeds to show how.

First he gets a template ODF document, like this:

<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    office:version="1.0">
  <office:body>
    <office:text>
      <text:p></text:p>
    </office:text>
  </office:body>
</office:document-content>

This document (he points out) meets the “pure” conformance criteria. Our young hacker then does a curious thing: he takes an existing (non-ODF) file from their office software, BASE-64 encodes it, and inserts the resulting text string into the empty <text:p> element of the template document.

<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    office:version="1.0">
  <office:body>
    <office:text>
      <text:p><!-- several MBs of BASE-64 encoded content here --></text:p>
    </office:text>
  </office:body>
</office:document-content>

There, he proudly proclaims. All we need to do is to wrap our current documents with the ODF wrapper when we save, and unwrap when we load – I can have a fresh build for you tomorrow.

The rest of the story is not so happy: the software company makes the sale, and after installation the government of Vulgaria finds that none of the files it produces will interoperate with ODF files from other sources, despite the software company having met the procurement rules to the letter.

Far fetched?

Okay, that story makes an extreme example – but it nevertheless illustrates the point. It is possible for a smart developer to represent pretty much anything as a “pure” ODF document; any differences and incompatibilities can ever-so-easily be shoehorned into conformant ODF documents. That some software deals only in such pure ODF means precisely zero in the real world of interoperability.

The central consideration here is that ODF conformance only ever was (and is only projected to be) stated in terms of XML, and XML is (in)famously “all syntax and no semantics”. The semantics of an ODF document (broadly, all the narrative text in the specification) play no part in conformance and can remain unimplemented in a conformant processor. An ODF developer can safely use just the schema and never read much else. All those descriptions of element behaviour can be ignored for the purposes of achieving ODF conformance. [N.B. mistakes in this para corrected following comment from Rob Weir, below]

So my question is: what is the current debate on ODF conformance really about? It looks to me like mis-directed effort.

What ODF might usefully do is to look at the “application description” feature introduced into OOXML. This describes several types of application, including a type called “full”. Such applications have “a semantic understanding of every feature within [their] conformance class”, and:

“Semantic understanding” is to be interpreted that an application shall treat the information in Office Open XML documents in a manner consistent with the semantic definitions given in this Specification.

In other words, it is possible to specify in OOXML procurement that the processor should heed the narrative description within that Standard (not just the XML grammar). ODF currently lacks this. In my view if there is to be any connection between a definition of ODF conformance and the experience of users in the real world, then something like OOXML's “application description” feature is urgently needed. And it might be better done now, than hastily inserted during a JTC 1 BRM ...

SC 34 Meetings, Okinawa – Day 5 and Summary

Okinawan Entertainment
Singer Azusa Miyagi from the Okinawan pop duo Tink Tink

Day 5 found us all attending a joint session of the working groups to sort out more administrative details and share the recommendations made by WG 5. Anybody interested in seeing the state of play in WG 4’s work can consult the WG 4 web site, where progress on defect correction can be tracked.

Overall this has been, I think, a successful meeting: the two new working groups are up and running and their work is well underway. There was perhaps the occasional trace of residual angst left over from last year: NBs are keen to assert their sovereignty in the decision-making process, and the Ecma delegates are keen to be assured that the JTC 1 processes can deliver the mechanisms and timeliness needed to keep IS 29500 in shape. In general, however, there has been a decided “unclenching” as delegates warmed to the (let's face it) occasional drudgery of maintaining XML document formats. This was all helped by the exceptional hospitality shown by JISC and ITSCJ in hosting the meetings, and in particular by the efforts of WG 4 convenor Murata Makoto. Whenever one wanted to know where to eat, what to drink, or where the prettiest singers could be found, Murata-san was your man!

It was also great to work with Jesper Lund Stocholm, who has been covering these meetings on his blog. It would be better still if more countries followed Denmark's lead, and more companies followed the shining example of CIBER, in supplying experts to assist in this important work.

It was something of a shock coming from the 23°C sunshine of Okinawa to freezing snow-bound Britain. And also a shock to review the amount of standards work piling up to be done before the next SC 34 meetings in Prague: defects to be filed, maintenance agreements to be hammered out, agendas to write, ballots to vote on and proposals to draft. I am expecting the Prague meeting to be particularly vibrant, not least since it is preceded by XML Prague 2009. I have not been to an XML Prague before, but have heard only good things about it. Certainly, the programme looks fascinating (though I make no claims for my own presentation). It certainly seems that Prague is going to be the centre of the world for XML-heads everywhere in late March …