SC 34 WG meetings in Paris last week

The croissants of AFNOR

Last week I was in Paris for a stimulating week of meetings of ISO/IEC JTC 1/SC 34 WGs, and as the year draws to a close it seems an opportune time to take the temperature of our XML standards space and look ahead to where we may be going next.

WG 1 (Schema languages)

WG 1 can be thought of as tending to the foundations upon which other SC 34 Standards are built - and of these foundations perhaps none is more important than RELAX NG, the schema language of many key XML technologies including ODF, DocBook and the forthcoming MathML 3.0 language. WG 1 discussed a number of potential enhancements to RELAX NG, settling on a modest but useful set which will enhance the language in response to user feedback.

A proposed new schema language for cross-reference validation (including ID/IDREF checking) was also discussed; the question here is whether to have something simple and quick (that addresses ID/IDREF validation in RELAX NG, say), or whether to develop a more fully-featured language capable of meeting challenges like cross-document cross-reference checking in an OOXML or ODF package. It seems as if WG 1 is strongly inclining towards the latter.
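To make the "simple and quick" end of that spectrum concrete, here is a minimal sketch in Python of the kind of check involved: every identifier unique, every reference resolving to an identifier in the same document. It is only an illustration, not anything WG 1 has proposed; the attribute names `id` and `ref` and the sample document are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET


def check_id_refs(xml_text, id_attr="id", ref_attr="ref"):
    """Return (duplicate_ids, dangling_refs) for a single document.

    The attribute names are hypothetical defaults; a real schema would
    declare which attributes carry identifiers and references.
    """
    root = ET.fromstring(xml_text)

    seen = set()
    duplicates = []
    for element in root.iter():
        value = element.get(id_attr)
        if value is None:
            continue
        if value in seen:
            duplicates.append(value)
        seen.add(value)

    # A reference is dangling when no element declares that identifier.
    dangling = []
    for element in root.iter():
        target = element.get(ref_attr)
        if target is not None and target not in seen:
            dangling.append(target)

    return duplicates, dangling


SAMPLE = """<doc>
  <section id="s1"/>
  <section id="s2"/>
  <section id="s1"/>
  <link ref="s2"/>
  <link ref="s9"/>
</doc>"""

duplicates, dangling = check_id_refs(SAMPLE)
print("duplicate ids:", duplicates)
print("dangling refs:", dangling)
```

The fully-featured alternative is harder precisely because the set of known identifiers would have to be gathered across every part of a package before any reference could be judged dangling.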

Other work centred on proposing changes for cleaning up the unreasonable licensing restrictions which apply to "freely-available" ISO/IEC standards made available by the ITTF: the click through license here is obviously out-of-date, and text is required to attach to schemas so that they can be used on more liberal, FOSS-friendly terms. (I mentioned this before in this blog entry).

WG 4 (OOXML)

WG 4 had a full agenda. One item of business requiring immediate attention was the resolution of comments accompanying the just-voted-on set of DCOR ballots. These had received wide support from the National Bodies though it was disappointing to see that the two NBs who had voted to disapprove had not sent delegates to the meeting. P-members are obliged both to vote on ballots and attend meetings in SCs and so these nations (Brazil and Malaysia are the countries in question) are not properly honouring their obligation as laid down in the JTC 1 Directives:

3.1.1 P-members of JTC 1 and its SCs have an obligation to take an active part in the work of JTC 1 or the SC and to attend meetings.

I note with approval the hard line taken by the ITTF, who have just forcibly demoted 18 JTC 1 P-members who had become inactive.

Nevertheless, all comments received were resolved and the set of corrigenda will now go forward to publication, making a significant start to cleaning up the OOXML standard.

S/T

The other big topic facing WG 4 was the thorny problem of what has come to be called the issue of "Strict v Transitional". In other words, deciding on some strategy for dealing with these two variants of the 29500 Standard.

The UK has a clear consensus on the purpose of the two formats. Transitional (aka "T") is (in the UK view) a format for representing the existing legacy of documents in the field (and those which continue to be created by many systems today); no more, and no less. Strict (aka "S") is viewed as the proper place for future innovation around OOXML.

Progress on this topic is (for me) frustratingly slow – ah! the perils of the consensus forming process – but some pathways are beginning to become visible in the swirling mists. In particular it seems there is a mood to issue a statement that the core schemas of T are to be frozen, and that any dangerous features (such as the date representation option blogged about by WG 4 experts Gareth Horton and Jesper Lund Stocholm) are removed from T.

This will go some way to clarify for users what to expect when dealing with a 29500-conformant document. However, I foresee needed work ahead to clarify this still further since within the two variants (Strict and Transitional) there are many sub-variants which users will need to know about. In particular the extensibility mechanism of OOXML (MCE) allows for additional structures to be introduced into documents. And so, is a "Transitional" (or "Strict") document:

  • Unextended ?
  • Extended, but with only standardized extensions ?
  • Extended, but with proprietary extensions ?
  • Extended in a backwards-compatible way relative to the core Standard ?
  • Extended in a backwards-incompatible way ?

I expect WG 4 will need to work on conformance classes and content labelling mechanisms (a logo programme?) to enable implementers to convey with precision what kind of OOXML documents they can consume and emit, and for procurers to specify with precision what they want to procure.

WG 5 (Document interop)

WG 5 continues its work with TR 29166, Open Document Format (ISO/IEC 26300) / Office Open XML (ISO/IEC 29500) Translation Guidelines, setting out the high-level differences between the ISO versions of the OOXML and ODF formats. I attended to hear about a Korean idea for a new work item focussed on the use of the clipboard as an interchange mechanism.

This is interesting because the clipboard presents some particular challenges for implementers. What happens (for example) when a user selects content for copying which does not correspond to well-formed XML (from the middle of one paragraph to the middle of another)? I am interested in seeing exactly what work the Koreans will propose in this space ...
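A small Python sketch shows why such a selection is awkward. The sample markup and the `fragment` wrapper element are invented for illustration and are not taken from either format or from the Korean proposal.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-paragraph document, standing in for real office markup.
DOC = "<body><p>First paragraph here.</p><p>Second paragraph here.</p></body>"


def is_well_formed(text):
    """True if the text parses as a single well-formed XML element."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Select from the middle of the first paragraph to the middle of the second.
start = DOC.index("paragraph here.</p>")
end = DOC.index("Second paragraph") + len("Second")
raw_selection = DOC[start:end]

# One possible repair: re-open and re-close the cut paragraphs and wrap the
# result in a single container so that there is exactly one root element.
repaired = "<fragment><p>" + raw_selection + "</p></fragment>"

print(repr(raw_selection), is_well_formed(raw_selection))
print(repr(repaired), is_well_formed(repaired))
```

The raw selection starts with character data and contains an unmatched end tag, so an application placing it on the clipboard has to decide how to repair it, and two applications may well decide differently.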

WG 6 (ODF)

Although I had registered for the WG 6 meeting, I had to take the Eurostar home on Thursday and so attempted to participate in Friday's WG 6 meeting by Skype (as much as rather intermittent wi-fi connectivity would allow).

From what I heard of it, the meeting was constructive and business-like, sorting out various items of administrivia and turning attention to the ongoing work of maintaining ISO/IEC 26300 (the International Standard version of ODF).

To this end, it is heartening to see the wheels finally creak into motion:

  • The first ever set of corrigenda to ISO/IEC 26300 has now gone to ballot
  • A second set is on the way, once a mechanism has been agreed for re-wording those bits of the Standard which are unimplementable
  • A new defect report from the UK was considered (many of these comments have already been addressed within OASIS, and so fixes are known)

Most significant of all is the work to align the ISO version of ODF with the current OASIS standard so that ISO/IEC 26300 and ODF 1.1 are technically equivalent. The National Bodies present reiterated a consensus that this was desirable (better, by far, than withdrawing ISO/IEC 26300 as a defunct standard) and are looking forward to the amendment project. The world will, then, have an ISO/IEC version of ODF which is relevant to the marketplace while waiting for a possible ISO/IEC version of ODF 1.2 – as even with a fair wind this is still around two years away from being published as an International Standard.

Records

I'll update this entry with links to documents as they become available. To start with, here are some informal records: :-)

 

Nightcap

SC 34 WG meetings in Paris next week

Once again I feel that bubbling up of almost schoolboy fervour that presages a set of SC 34 meetings. In Paris no less (AFNOR shall be our hosts): the city of love, fine art and rognons à la moutarde.

What is tasty on SC 34’s menu? Well four working groups are meeting next week:

  • WG 1 (which I convene) will be carrying forward its work on foundation standards – particularly the schema languages of DSDL. We have two new (/proposed) projects to discuss: one a schema language focussed on cross-reference validation; one on associating schemas with documents using processing instructions. Probably our most successful schema language, RELAX NG, is due for an update and several new features are up for discussion: keep it coming!
  • WG 4 (OOXML) will continue its intensive maintenance on ISO/IEC 29500 – not least in handling a new set of approved corrigenda (just voted on) and dealing with the day-to-day grind of correction and improvement. There are larger questions to answer too, in particular those which concern the relationship between the Strict and Transitional forms of OOXML. I have led the preparation of a background paper on this which (thanks to the newly open WG 4 mail archive) can be accessed as a public document (PDF). I predict some lively discussion!
  • WG 5 (OOXML/ODF interop) will continue its work examining how (or not) the two formats may be used by systems which hope to interoperate. TR 29166 - dedicated to this topic - continues to take shape ahead of its projected finishing date in 2011.
  • WG 6 is the newly-created WG dedicated to the JTC 1 side of maintenance of ISO/IEC 26300:2006 (aka ODF v1.0). As a newly-created group there will no doubt be a certain amount of administrivia to be got through but there are more substantial issues looming too: defect reports to be advanced and the longer-term project of amending ISO/IEC 26300 to bring it into alignment with ODF 1.1 – there is general agreement that it makes sense to reduce marketplace confusion by reducing the confusing number of standard (and non-standard) “ODF” variants out there, and aligning versions between standards organisations.

Stay tuned (and follow hashtag #sc34 for real-time updates) …

SC 34 Meetings, Seattle - Day 1

At the invitation of ANSI, SC 34 is in sunny Bellevue for five jam-packed days of Standards meetings (Sunday-Thursday). This is a full and busy event, with around 60 delegates registered from 14 countries (Belgium, Brazil, Canada, China, Denmark, Finland, France, Germany, India, Korea, Norway, South Africa, UK, and the USA) and 4 liaison organisations (Ecma, OASIS, W3C and the XML Guild).

Maybe by the end of it a number of momentous questions will have been answered, including:

  • Whether the world needs a standardised way to associate XML documents with schemas
  • Whether OOXML Transitional should be evolved
  • How ISO/IEC 26300 shall be maintained within SC 34
  • How standard schemas should be licensed to users
  • How MIME types should best be used for identifying document formats

Stay tuned ...


SC 34 meetings, Copenhagen

This week I attended 4 days of meetings of SC 34 working groups: WG 4 (OOXML maintenance) and WG 5 (OOXML/ODF interoperability). Last year I predicted that OOXML would get boring and, on the whole, the content of these meetings fulfilled that prophecy (while noting, of course, that for us markup geeks and standards wonks “boring” is actually rather exciting). There was, however, one hot issue, which I’ll come to later …

Defects

Since the publication of OOXML in November last year, the work of WG 4 has been almost exclusively focussed on defect correction. To date over 200 defects have been submitted (the UK leading the way having submitted 38% of them). Anybody interested in what kinds of thing are being fixed can consult the material on the WG 4 web site for a quick overview. During the Copenhagen meeting WG 4 closed 53 issues, meaning that 71% of all submitted defects have now been resolved. By JTC 1 standards that is impressively rapid. The defects will now go forward to be approved by JTC 1 National Bodies before they can become official Amendments and Corrigenda to the base Standard. Among the many more minor fixes, a couple of agreed changes are noteworthy:

  • In response to a defect report from Switzerland, for the Strict version (only) of IS 29500, the XML Namespace has been changed, so that consumers can know unambiguously whether they are consuming a Strict or Transitional document without any risk of silent data loss. This is (editorially) a lot of work, but the results will be I think worthwhile.
  • As I wrote following the Prague meeting, there was a move afoot to re-instate the values “on” and “off” as permissible Boolean values (alongside “true”, “1”, “false” and “0”) so that Transitional documents would accurately reflect the existing corpus of Office documents, in accord with the stated scope of the standard. This change has now been agreed by the WG.
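The practical benefit of the Namespace change in the first item can be sketched in a few lines of Python: a consumer looks at the namespace of the root element and knows at once which variant it has been handed. The two URIs below are hypothetical placeholders chosen for illustration, not the values in the Standard, and the `classify` helper is likewise an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholder URIs; the real values are defined by the Standard.
TRANSITIONAL_NS = "urn:example:ooxml:transitional"
STRICT_NS = "urn:example:ooxml:strict"


def classify(xml_text):
    """Report which variant a document part claims to be, from its root."""
    root = ET.fromstring(xml_text)
    # ElementTree writes qualified names as "{namespace-uri}local-name".
    if root.tag.startswith("{"):
        namespace = root.tag[1:].split("}", 1)[0]
    else:
        namespace = ""
    if namespace == STRICT_NS:
        return "strict"
    if namespace == TRANSITIONAL_NS:
        return "transitional"
    return "unknown"


strict_part = '<w:document xmlns:w="urn:example:ooxml:strict"/>'
transitional_part = '<w:document xmlns:w="urn:example:ooxml:transitional"/>'

print(classify(strict_part))
print(classify(transitional_part))
```

A consumer that only understands one variant can then refuse the other outright, which is the point of the change: an explicit failure in place of silent data loss.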

The ISO Date Issue

The “hot issue” I referred to earlier is ISO dates. What better way to illustrate the problem we face than by using one of Denmark’s most famous inventions, the LEGO® brick …


OOXML Transitional imagined as a LEGO® brick

More precisely, the problem is about date representation in spreadsheet cells. One of the innovations of the BRM was to introduce ISO 8601 date representation into OOXML. However the details of how this has been done are problematic:

  1. Despite the text of the original resolution declaring that ISO 8601 dates should live alongside serial dates (for compatibility with older documents), one possible reading of the text today is that all spreadsheet cell dates have to be in ISO 8601 format
  2. Having any such dates in ISO 8601 format is particularly problematic for Transitional OOXML, which is meant to represent the existing corpus of office documents. Since none of these existing documents contain ISO 8601 dates having them here makes no sense
  3. Even more seriously, if people start creating “Transitional” OOXML documents which contain 8601 dates, then a huge installed base of software expecting Ecma-376 documents will silently corrupt that data and/or produce surprising results. (My concern here is more for things like big ERP and Financial systems, rather than desktop apps like MS Office). Hence the odd LEGO brick above: like those bricks, such files would embody an interoperability disaster
  4. Even the idea of using ISO 8601 is pretty daft unless it is profiled (currently it is not). ISO 8601 is a big, complex standard with many options irrelevant to office documents: it would be far more sensible for OOXML to target a tiny subset of ISO 8601, such as that specified by W3C XML Schema Definition Language (XSD) 1.1 Part 2: Datatypes
  5. Many date/time functions declared in spreadsheetML appear to be predicated on date/time values being represented as serial values and not ISO 8601 dates. It is not clear if the Standard as written makes any sense when ISO 8601 dates are used.
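The mismatch between the two representations in points 1 to 3 can be made concrete with a minimal Python sketch. It assumes the familiar 1900 date system, in which the cell holds a count of days, and it uses an epoch of 1899-12-30, which is only valid for serial values of 61 and above because that system counts a non-existent 29 February 1900. The function names and the toy "legacy" reader are assumptions for illustration, not anything defined by the Standard.

```python
from datetime import date, timedelta

# Assumed epoch for the 1900 date system; only valid for serials >= 61,
# since the system counts a 29 February 1900 that never existed.
EPOCH = date(1899, 12, 30)


def serial_to_iso(serial):
    """Convert a whole-day serial value to an ISO 8601 calendar date."""
    if serial < 61:
        raise ValueError("this sketch does not handle serials before 61")
    return (EPOCH + timedelta(days=serial)).isoformat()


def legacy_read_cell(text):
    """A consumer written on the assumption that dates are serial numbers."""
    return float(text)


print(serial_to_iso(39448))       # the serial form of 1 January 2008
print(legacy_read_cell("39448"))  # what older consumers expect to find

# The same date written in ISO 8601 form is not a number at all.
try:
    legacy_read_cell("2008-01-01")
    outcome = "parsed"
except ValueError:
    outcome = "rejected"
print(outcome)
```

This toy reader at least fails loudly. The worry in point 3 is about consumers that do not: ones that treat the unparseable value as empty or zero and carry on.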

The solution?

Opinions vary about how the ISO date problem might best be solved. My own strong preference would be to see the Standard clarified so that users were sure that Transitional documents were guaranteed to correspond to the existing document reality – in other words that Transitional documents only contain serial date representations, and not “ISO 8601” date representations. In my view the ISO dates should be for use in the Strict variant of OOXML only.

If a major implementation (Excel 2010, say) appears which starts pumping incompatible, ISO 8601-flavoured Transitional documents into circulation, then that would be an interop disaster. The standards process would be rightly criticised for producing a dangerous format; users would be at risk of corrupted data; and guilty vendors too would be blamed accordingly.

It is imperative that this problem is fixed, and fixed soon.

Impressions of Copenhagen

Our meetings took place at the height of midsummer, and every day was glorious (all meetings started with a sad closing of the curtains). Something of a pall was cast over proceedings by thefts, in two separate incidents, of delegates’ laptop computers; but there is no doubt Copenhagen is a wonderful city blessed with excellent food, tasty beer, and an abundance of good-looking women. Dansk Standard provided most civilised meeting facilities, and entertainment chief Jesper Lund Stocholm worked hard to ensure everyone enjoyed Copenhagen to the full, especially the midsummer night witch-burning festivities! Next up is the SC 34 Plenary in Seattle in September; I’m sure there will be many more defect reports on OOXML to consider by then, and that WG 4's tireless convenor Murata-san will be keeping the pressure on to maintain the pace of fixes …


Jesper Among the Beers
Jesper Lund Stocholm

SC 34 Meetings, Prague, Days 2, 3 & 4

What a Difference a Day Makes
MURATA Makoto, WG 4 convenor,
seems pleased with progress on the maintenance of OOXML

I had been intending to write a day-by-day account of these meetings but as it turned out there simply was not time for blogging (tweeting, on the other hand …). Another activity which suffered was photography: I had wanted to take a load of pictures of über-photogenic Prague – but somehow the work seemed to expand into all my notionally free time too. What I did manage to snap is here. And it is also well worth checking out Doug Mahugh’s photos for more.

WG 1 met again on Tuesday to finish its business (you can read the meeting report here) and then my attention turned mostly to the activities in WG 4 (for OOXML maintenance) and WG 5 (which is for OOXML / ODF interop).

OOXML One Year On

Overall, WG 4 is making excellent progress. To date, 169 defects have been collected for OOXML (check out the defect log) and the majority of these have either been closed, or a resolution agreed. Amendments were started for 3 of the 4 parts of OOXML to allow a bunch of small corrections to be put in place, and the even-more-minor problems will be fixed by publishing technical corrigenda. Overall, I think stakeholders in OOXML can feel pretty confident that the Standard is being sensibly and efficiently maintained.

I was personally very pleased to see National Bodies well-represented (the minutes are here) – to the extent that I’d now ideally like to see some more big vendors coming to the table so their views can be heard. Microsoft (of course) was; but where (for example) are Apple, Oracle and the other vendors who participated in Ecma TC 45 while OOXML was being drafted? To them – and to anybody who wants to get involved in this important work – I say: participate!

Over at Rick Jelliffe’s blog Rick has been carrying out something of an exposé of the unfortunate imbalance in the stakeholders represented in the maintenance of ODF at OASIS (something which will become even more acute if Sun is, in the end, snapped-up by IBM). Personally I think Rick is right that it is vitally important to have a good mix of voices at the standardisation table: big vendors, small vendors, altruistic experts, users, government representatives, etc. WG 4 is getting there, but it too has some way to go.

Tougher Issues

Some more controversial topics were however not resolved during the meeting, and I think it may be worth exploring these in more detail:

The Namespace Problem

One issue in particular has proved thorny: whether to change Namespaces in the OOXML schemas. This topic took a good slice of Tuesday, and then segued into a bar session afterwards; this then carried on over supper and by the time we broke it was midnight. And still no conclusion has been reached. WG 4 has issued a document outlining the state of discussions to date …

I have already expressed my own views on this; but during these Prague meetings some important new considerations were brought to bear. Go over to Jesper Lund Stocholm’s blog to get the thorny details.

Personally, I think the “strict” format is a new format, and that changing the Namespace is only part of the solution. I would like to see the media type changed and for OOXML to recommend a new file extension (.dociso anyone?) to reduce the chances that users suffer the (to me) unacceptable fate of silent data loss that Jesper highlights.

Whither Transitional?

Another hot topic of discussion is what the “transitional” version of OOXML is really for. One interesting and slightly surprising fact that emerged after the BRM is that the strict schemas are a true subset of the transitional schemas. Should this nice link be preserved? Microsoft are keen for new features introduced in the “strict” version of OOXML to be mirrored in the “transitional” version – presumably, in part, because Office 14 will use transitional features.

Openness

When the maintenance of OOXML was being planned, one of the principles agreed on by the National Bodies was that the process should be as open as possible, consistent with JTC 1 rules. One aspect of this is the question of whether WG 4’s mailing list archive should be open to the public. Some NBs were a little nervous of this for the reason that their committee members might be less free to post candid comments if they were open to public scrutiny, and possible repercussions with their boss and/or the tinfoil brigade in the blogosphere. There was also the troubling precedent of the mailing list of the U.S. INCITS V1 committee, which opened its archive to public view during the DIS 29500 balloting period, only to see it die completely as contributors refused to post in public view.

The issue will now be put to public ballot, and I am hopeful that mechanisms have been put in place which will allow NBs to support opening of the archive. With public standards, public meeting reports, public discussion documents and a public mailing list archive I think WG 4 will demonstrate that an excellent degree of openness is indeed possible even within the constraints of the current JTC 1 Directives.

Overturning BRM decisions

The UK proposed an interesting new defect during the Prague meetings, which centred on one of the decisions made at the BRM.

Nature of the Defect:

As a result of changes made at the BRM, a number of existing Ecma-376 documents were unintentionally made invalid against the IS29500 transitional schema. It was strongly expressed as an opinion at the BRM by many countries that the transitional schema should accurately reflect the existing Ecma-376 documents.

However, at the BRM, the ST_OnOff type was changed from supporting 0, 1, On, Off, True, False to supporting only 0, 1, True, False (i.e. the xs:boolean type). Although this fits with the detail of the amendments made at the BRM, it is against the spirit of the desired changes for many countries, and we believe that due to time limitations at the BRM, this change was made without sufficient examination of the consequences, was made in error by the BRM (in which error the UK played a part), and should be fixed.

Solution Proposed by the Submitter

Change the ST_OnOff type to support 0, 1, On, Off, True and False in the Transitional schemas only.
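In code terms the proposal amounts to something like the following Python sketch: two lookup tables, with the Transitional one a superset of the xs:boolean values. The lower-case spellings and the function name are assumptions made for illustration; the exact lexical forms are whatever the schemas define.

```python
# Values of xs:boolean, which is all that ST_OnOff allowed after the BRM.
STRICT_VALUES = {"true": True, "1": True, "false": False, "0": False}

# The proposal: Transitional additionally accepts the legacy on/off values.
# Lower-case spellings are an assumption made for this sketch.
TRANSITIONAL_VALUES = {**STRICT_VALUES, "on": True, "off": False}


def parse_on_off(text, transitional):
    """Interpret an ST_OnOff attribute value for the given variant."""
    table = TRANSITIONAL_VALUES if transitional else STRICT_VALUES
    if text not in table:
        raise ValueError("not a permitted ST_OnOff value: " + repr(text))
    return table[text]


print(parse_on_off("on", transitional=True))
print(parse_on_off("0", transitional=False))

# The same legacy value is still rejected when read as Strict.
try:
    parse_on_off("on", transitional=False)
    strict_accepts_on = True
except ValueError:
    strict_accepts_on = False
print(strict_accepts_on)
```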

The result of the BRM decision being addressed here was apparent in a blog entry I wrote last year, which attracted rather a lot of attention.

Simply put, the UK is now suggesting the BRM made a mistake here, and things should be rectified so that existing MS Office documents “snap back” into being in conformance with 29500 transitional.

This proposal caused some angst. Who were we (some asked) to overturn decisions made at the BRM? My own view is less cautious: this was an obvious blunder, the BRM got it wrong (as it did many things, I think). So let’s fix it.

Whither WG 5?

In parallel with WG 4, WG 5 (the group responsible for ODF/OOXML interoperability) also met. One of the substantive things it achieved was to water down the title of the ongoing report being prepared on this topic, changing it from:

OpenDocument Format (ISO/IEC 26300) / Office Open XML (ISO/IEC 29500) Translation

to:

OpenDocument Format (ISO/IEC 26300) / Office Open XML (ISO/IEC 29500) Translation Guidelines

Adding the word “guidelines” to the title should make it clear to anybody noticing this project that it is not an “answer” to ODF/OOXML interoperability, merely a discursive document. For myself, I have doubts about the ultimate usefulness of such a document.

It is disappointing to see the poor rate of progress on meaningful interoperability and harmonisation work. Of course these things are motherhood and apple pie in discussion – but when the time comes to find volunteers to actually help, few hands go up. In my view, the only hope of achieving any meaningful harmonisation work is to get Another Big Vendor interested in backing it, and I know some behind-the-scenes work will be taking place to beat the undergrowth and see if just such a vendor can be found.