SC 34 Meetings, Prague, Days 2, 3 & 4

What a Difference a Day Makes
MURATA Makoto, WG 4 convenor,
seems pleased with progress on the maintenance of OOXML

I had been intending to write a day-by-day account of these meetings but as it turned out there simply was not time for blogging (tweeting, on the other hand …). Another activity which suffered was photography: I had wanted to take a load of pictures of über-photogenic Prague – but somehow the work seemed to expand into all my notionally free time too. What I did manage to snap is here. And it is also well worth checking out Doug Mahugh’s photos for more.

WG 1 met again on Tuesday to finish its business (you can read the meeting report here) and then my attention turned mostly to the activities in WG 4 (for OOXML maintenance) and WG 5 (which is for OOXML / ODF interop).

OOXML One Year On

Overall, WG 4 is making excellent progress. To date, 169 defects have been collected for OOXML (check out the defect log) and the majority of these have either been closed, or a resolution agreed. Amendments were started for 3 of the 4 parts of OOXML to allow a bunch of small corrections to be put in place, and the even-more-minor problems will be fixed by publishing technical corrigenda. Overall, I think stakeholders in OOXML can feel pretty confident that the Standard is being sensibly and efficiently maintained.

I was personally very pleased to see National Bodies well-represented (the minutes are here) – to the extent that I’d now ideally like to see some more big vendors coming to the table so their views can be heard. Microsoft (of course) was there; but where (for example) are Apple, Oracle and the other vendors who participated in Ecma TC 45 while OOXML was being drafted? To them – and to anybody who wants to get involved in this important work – I say: participate!

Over at Rick Jelliffe’s blog Rick has been carrying out something of an exposé of the unfortunate imbalance in the stakeholders represented in the maintenance of ODF at OASIS (something which will become even more acute if Sun is, in the end, snapped-up by IBM). Personally I think Rick is right that it is vitally important to have a good mix of voices at the standardisation table: big vendors, small vendors, altruistic experts, users, government representatives, etc. WG 4 is getting there, but it too has some way to go.

Tougher Issues

Some more controversial topics were however not resolved during the meeting, and I think it may be worth exploring these in more detail:

The Namespace Problem

One issue in particular has proved thorny: whether to change Namespaces in the OOXML schemas. This topic took a good slice of Tuesday, and then segued into a bar session afterwards; this then carried on over supper and by the time we broke it was midnight. And still no conclusion has been reached. WG 4 has issued a document outlining the state of discussions to date …

I have already expressed my own views on this; but during these Prague meetings some important new considerations were brought to bear. Go over to Jesper Lund Stocholm’s blog to get the thorny details.

Personally, I think the “strict” format is a new format, and that changing the Namespace is only part of the solution. I would like to see the media type changed and for OOXML to recommend a new file extension (.dociso anyone?) to reduce the chances that users suffer the (to me) unacceptable fate of silent data loss that Jesper highlights.

Whither Transitional?

Another hot topic of discussion is what the “transitional” version of OOXML is really for. One interesting and slightly surprising fact that emerged after the BRM is that the strict schemas are a true subset of the transitional schemas. Should this nice link be preserved? Microsoft are keen for new features introduced in the “strict” version of OOXML to be mirrored in the “transitional” version – presumably, in part, because Office 14 will use transitional features.

Openness

When the maintenance of OOXML was being planned, one of the principles agreed on by the National Bodies was that the process should be as open as possible, consistent with JTC 1 rules. One aspect of this is the question of whether WG 4’s mailing list archive should be open to the public. Some NBs were a little nervous of this because their committee members might be less free to post candid comments if those comments were open to public scrutiny, with possible repercussions from their boss and/or the tinfoil brigade in the blogosphere. There was also the troubling precedent of the mailing list of the U.S. INCITS V1 committee, which opened its archive to public view during the DIS 29500 balloting period, only to see it die completely as contributors refused to post in public view.

The issue will now be put to public ballot, and I am hopeful that mechanisms have been put in place which will allow NBs to support opening of the archive. With public standards, public meeting reports, public discussion documents and a public mailing list archive I think WG 4 will demonstrate that an excellent degree of openness is indeed possible even within the constraints of the current JTC 1 Directives.

Overturning BRM decisions

The UK proposed an interesting new defect during the Prague meetings, which centred on one of the decisions made at the BRM.

Nature of the Defect:

As a result of changes made at the BRM, a number of existing Ecma-376 documents were unintentionally made invalid against the IS29500 transitional schema. It was strongly expressed as an opinion at the BRM by many countries that the transitional schema should accurately reflect the existing Ecma-376 documents.

However, at the BRM, the ST_OnOff type was changed from supporting 0, 1, On, Off, True, False to supporting only 0, 1, True, False (i.e. the xs:boolean type). Although this fits with the detail of the amendments made at the BRM, it is against the spirit of the desired changes for many countries, and we believe that due to time limitations at the BRM, this change was made without sufficient examination of the consequences, was made in error by the BRM (in which error the UK played a part), and should be fixed.

Solution Proposed by the Submitter

Change the ST_OnOff type to support 0, 1, On, Off, True and False in the Transitional schemas only.

The result of the BRM decision being addressed here was apparent in a blog entry I wrote last year, which attracted rather a lot of attention.

Simply put, the UK is now suggesting the BRM made a mistake here, and things should be rectified so that existing MS Office documents “snap back” into being in conformance with 29500 transitional.
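To make the effect of that change concrete, here is a minimal sketch in Python of the two readings of ST_OnOff described in the defect report. It is an illustration only, not the schema itself: the function names are my own, and I am assuming the lowercase spellings used by xs:boolean's lexical space.

```python
# Literals accepted after the BRM change (the xs:boolean lexical space).
# Assumption: lowercase spellings throughout.
XS_BOOLEAN = {"0", "1", "true", "false"}

# Literals accepted before the change: the same set plus on/off.
PRE_BRM_ON_OFF = XS_BOOLEAN | {"on", "off"}


def is_valid_on_off(value: str, allow_on_off: bool) -> bool:
    """Return True if value is an acceptable on/off literal.

    allow_on_off=True models the older, wider type (and the UK proposal
    for the Transitional schemas); False models the narrowed type.
    """
    allowed = PRE_BRM_ON_OFF if allow_on_off else XS_BOOLEAN
    return value in allowed


def invalidated_values(values):
    """Return the values that were valid before the change but not after."""
    return [
        v for v in values
        if is_valid_on_off(v, allow_on_off=True)
        and not is_valid_on_off(v, allow_on_off=False)
    ]


if __name__ == "__main__":
    sample = ["1", "on", "false", "off", "true"]
    print(invalidated_values(sample))  # prints ['on', 'off']
```

Any existing document using "on" or "off" for such an attribute is what falls out of conformance, and restoring those two literals in the Transitional schemas only is exactly what the proposed fix amounts to.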

This proposal caused some angst. Who were we (some asked) to overturn decisions made at the BRM? My own view is less cautious: this was an obvious blunder, the BRM got it wrong (as it did many things, I think). So let’s fix it.

Whither WG 5?

In parallel with WG 4, WG 5 (the group responsible for ODF/OOXML interoperability) also met. One of the substantive things it achieved was to water down the title of the ongoing report being prepared on this topic, changing it from:

OpenDocument Format (ISO/IEC 26300) / Office Open XML (ISO/IEC 29500) Translation

to:

OpenDocument Format (ISO/IEC 26300) / Office Open XML (ISO/IEC 29500) Translation Guidelines

Adding the word “guidelines” to the title should make it clear to anybody noticing this project that it is not an “answer” to ODF/OOXML interoperability, merely a discursive document. For myself, I have doubts about the ultimate usefulness of such a document.

It is disappointing to see the poor rate of progress on meaningful interoperability and harmonisation work. Of course these things are motherhood and apple pie in discussion – but when the time comes to find volunteers to actually help, few hands go up. In my view, the only hope of achieving any meaningful harmonisation work is to get Another Big Vendor interested in backing it, and I know some behind-the-scenes work will be taking place to beat the undergrowth and see if just such a vendor can be found.

SC 34 Meetings, Prague - Day 1

Digs

SC 34 have a week of meetings in Prague. Today only WG 1 was meeting and I – for the first time – was convening it; an honour, and a slightly daunting one at that.

It was, though, very reassuring to feel that, for the first time since the OOXML days, things were returning to normal, and that the structural changes SC 34 has put in place have allowed WG 1 to return to its true purpose for XML infrastructure technologies: principally schema languages.

It was also great to see wide international participation, with experts in attendance from Canada, China, the Czech Republic, France, Japan, South Africa and the United Kingdom.

We had a full agenda and the meeting day varied from some in-the-trenches technical work (principally on 19757-8 - DSRL) to some more strategic topics. A couple of these are worth a special mention.

XML 1.0 Fifth Edition

The first is the issue of what to do about XML 1.0 Fifth Edition. This particular revision has caused consternation in some parts of the XML community, by breaking compatibility with earlier editions of XML 1.0. XML titans such as Tim Bray, James Clark and David Carlisle have lined up to condemn the move, and Elliotte Rusty Harold has gone so far as to write that "The W3C Core Working group has broken faith with the XML community", and that,

Perhaps the time has come to say that the W3C has outlived its usefulness. Really, has there been any important W3C spec in this millennium that's worth the paper it isn't printed on? [...] I think we might all be better off if the W3C had declared victory and closed up shop in 2001.

Which, if nothing else, shows that when standards get passed which people don't like, the poor standards bodies get it in the neck — a phenomenon regular readers of this blog will have come across before.

So, the practical question is: what do we do about this in SC 34? If we have some Standards which refer to the Fifth edition, and others which refer to earlier editions, then there is a danger those standards are not interoperable, which flies in the face of JTC 1 requirements.

The initial mood around the table seemed to be that politics could be avoided by adopting an approach of "user beware". We would allow standards to mix references to the different versions and if implementations blew up on users then they'd know who to blame: the W3C.

On further reflection, however, consensus seems to be homing in on the idea that it would be better to keep all of our references pointing to XML 1.0 Fourth Edition for now, and to wait until the XML technologies around the Fifth Edition matured (W3C has some work to do making XML 5Ed compatible with other W3C technologies). Then we (and thus users) would be able to embrace 5Ed more enthusiastically; for amid the turmoil it does provide some features (such as a bigger repertoire of name characters) that are wanted by some of our non-Western users.
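For anyone wondering what that bigger repertoire looks like in practice, here is a small Python sketch of the Fifth Edition rule for the first character of an XML name. The ranges are the NameStartChar ranges from the Fifth Edition grammar; the function name is my own, and this checks only the first character of a name, not a whole name.

```python
# Code point ranges allowed as the first character of a name
# in XML 1.0 Fifth Edition (the NameStartChar production).
NAME_START_RANGES = [
    (0x3A, 0x3A),        # ":"
    (0x41, 0x5A),        # A-Z
    (0x5F, 0x5F),        # "_"
    (0x61, 0x7A),        # a-z
    (0xC0, 0xD6),
    (0xD8, 0xF6),
    (0xF8, 0x2FF),
    (0x370, 0x37D),
    (0x37F, 0x1FFF),
    (0x200C, 0x200D),
    (0x2070, 0x218F),
    (0x2C00, 0x2FEF),
    (0x3001, 0xD7FF),
    (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD),
    (0x10000, 0xEFFFF),
]


def is_name_start_char_5ed(ch: str) -> bool:
    """Return True if ch may begin an XML name under the Fifth Edition."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in NAME_START_RANGES)


if __name__ == "__main__":
    # U+1200 is an Ethiopic syllable; "1" is an ASCII digit.
    for ch in ["a", "_", "\u1200", "1"]:
        print(f"U+{ord(ch):04X}", is_name_start_char_5ed(ch))
```

The Ethiopic case is the interesting one. Earlier editions tied their name rules to explicit character lists drawn from Unicode 2.0, so scripts added to Unicode later could not be used in element names; the Fifth Edition's broad ranges admit them, which is also why a Fourth Edition parser may reject a document that a Fifth Edition parser accepts.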

Schema Copyright

Another interesting issue surrounded schema copyright. When a user downloads a free ISO or IEC standard from ITTF's list, they are bound by a EULA which, inter alia, stipulates:

Under no circumstances may the electronic file you are licensing be copied, transferred, or placed on a network of any sort without the authorization of the copyright owner.

Now this raises a number of questions, but the immediate one facing WG 1 is the issue of schemas. When a standard contains a schema, it is perfectly reasonable for a user to want to extract that and use it for validation – which in most scenarios definitely will require it to be "transferred, or placed on a network".

Following an exchange with Geneva it became apparent that what we should be doing is to include a separate licence with the schema, which derogates from the EULA to grant the necessary permissions. Geneva proposed a BSD-esque licence but suggested SC 34 should sensibly innovate around it:

The following permission notice and disclaimer shall be included in all copies of this XML schema ("the Schema"), and derivations of the Schema:
 
Permission is hereby granted, free of charge in perpetuity, to any person obtaining a copy of the Schema, to use, copy, modify, merge and distribute free of charge, copies of the Schema for the purposes of developing, implementing, installing and using software based on the Schema, and to permit persons to whom the Schema is furnished to do so, subject to the following conditions:
 
THE SCHEMA IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SCHEMA OR THE USE OR OTHER DEALINGS IN THE SCHEMA.
 
In addition, any modified copy of the Schema shall include the following notice:

THIS SCHEMA HAS BEEN MODIFIED FROM THE SCHEMA DEFINED IN ISO xxxxx-y, AND SHOULD NOT BE INTERPRETED AS COMPLYING WITH THAT STANDARD.
 

Already the experts are starting to hack this around and one well-supported thought was to have it submitted to the OSI to ensure it was compatible with any conceivable FOSS scenario. If any reader has expertise in this area, I'd be very interested to hear from them...

XML Prague 2009, Day 1

Night Falls on Old Prague

I am in Prague for the XML Prague conference, and for a week of meetings of ISO/IEC JTC 1 SC 34. Here is a running report of day 1 of the conference ...

Day 1 kicked off, after a welcome from Mohamed Zergaoui, with a presentation from Mike Kay (zOMG - no beard!) on the state of XML Schema 1.1. Mike gave a lucid tour of XML Schema's acknowledged faults, but maintained these must not distract us too much from the technology's industrial usefulness. XML Schema 1.1 looks to me mostly like a modest revamp: some tidying and clarification under the hood. One notable new feature is however to be introduced: assertions - a cut-down version of the construct made popular by Schematron. Mike drew something of a collective intake of breath when he claimed it was to XML Schema 1.1's advantage that it was incorporating multiple kinds of validation, and that it was "ludicrous" to validate using multiple schema technologies.

A counterpoint to this view came in the next presentation from MURATA Makoto. Murata-san demonstrated the use of NVDL to validate Atom feeds which contain extensions, claiming NVDL was the only technology that allows this to be done without manually re-editing the core schemas every time a new extension is used.

After coffee, Ken Holman presented on "code lists" - a sort of Cinderella topic within XML validation but an important one, as code lists play a vital role in document validity in most real world XML documents of any substance. Ken outlined a thorough mechanism for validation of documents using code lists based on Genericode and Schematron.
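To give a flavour of why code lists matter for validity, here is a minimal Python sketch of the basic idea: checking attribute values in a document against an externally maintained list of permitted codes. This is emphatically not Ken's mechanism (no Genericode or Schematron is involved); the document, the attribute name and the code list are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical code list. In a real system this would be maintained
# separately from the schema, e.g. loaded from an external file.
CURRENCY_CODES = {"CZK", "EUR", "GBP", "USD"}

# Hypothetical document: structurally fine, but one code is not in the list.
DOC = """<invoice>
  <amount currency="CZK">1200</amount>
  <amount currency="XXK">15</amount>
</invoice>"""


def find_bad_codes(xml_text, attr, allowed):
    """Return the values of attr that are not in the allowed code list."""
    root = ET.fromstring(xml_text)
    return [
        el.get(attr) for el in root.iter()
        if el.get(attr) is not None and el.get(attr) not in allowed
    ]


if __name__ == "__main__":
    print(find_bad_codes(DOC, "currency", CURRENCY_CODES))  # prints ['XXK']
```

The point of keeping the list outside the structural schema is that the set of codes can change without the schema having to be reissued.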

Before lunch, Tony Graham took a look at "Testing XSLT" and gave an interesting tour of some of the key technologies in this space. One of his key conclusions, and one which certainly struck a chord with me, was the assertion that ultimately the services of our own eyes are necessary for a complete test to have taken place.

Continuing the theme, Jeni Tennison introduced a new XSLT testing framework of her invention: XSpec. I sort of hope I will never have to write substantial XSLTs which merit testing, but if I do then Jeni's framework certainly looks like TDB for TDD!

Next, Priscilla Walmsley took the podium to talk about FunctX, a useful-looking general-purpose library of XPath 2.0 (and therefore XQuery) functions. Priscilla's talk nicely helped to confirm a theme that has been emerging today, of getting real stuff done. This is not to say there is not a certain geeky intellectualism in the air - but: it's to a purpose.

After tea, Robin Berjon gave an amusing tour of certain XML antipatterns. Maybe because his views largely coincided with mine I thought it a presentation of great taste and insight. Largely, but not entirely :-)

Next up, Ari Nordström gave a presentation on "Practical Reuse in XML". His talk was notable for promoting XLink, which had been a target of Robin Berjon's scorn in the previous session (though not without some contrary views from the floor). Also URNs were proposed as an underpinning for identification purposes - a proposal which drew some protests from the ambient digiverse.

To round off the day's proceedings, George Cristian Bina gave a demo of some upcoming features in the next version of the excellent oXygen XML Editor. This is software I am very familiar with, as I use it almost daily for my XML work. George's demo concentrated on the recent authoring mode for oXygen, which allows creation of markup in a more user-friendly wordprocessor-like environment. I've sort of used this on occasion, and sort of felt I've enjoyed it at the time. But somehow I always find myself gravitating back to good old pointy-bracket mode. Maybe I am just an unreconstructed markup geek ...


Breakfast Geek-out