Where is there an end of it? | SC 34 meetings, Copenhagen

SC 34 meetings, Copenhagen

This week I attended 4 days of meetings of two SC 34 working groups: WG 4 (OOXML maintenance) and WG 5 (OOXML/ODF interoperability). Last year I predicted that OOXML would get boring and, on the whole, the content of these meetings fulfilled that prophecy (while noting, of course, that for us markup geeks and standards wonks “boring” is actually rather exciting). There was, however, one hot issue, which I’ll come to later …

Defects

Since the publication of OOXML in November last year, the work of WG 4 has been almost exclusively focussed on defect correction. To date over 200 defects have been submitted (the UK leading the way, having submitted 38% of them). Anybody interested in what kinds of thing are being fixed can consult the material on the WG 4 web site for a quick overview. During the Copenhagen meeting WG 4 closed 53 issues, meaning that 71% of all submitted defects have now been resolved. By JTC 1 standards that is impressively rapid. The defects will now go forward to be approved by JTC 1 National Bodies before they can become official Amendments and Corrigenda to the base Standard. Among the many more minor fixes, a couple of agreed changes are noteworthy:

  • In response to a defect report from Switzerland, for the Strict version (only) of IS 29500, the XML Namespace has been changed, so that consumers can know unambiguously whether they are consuming a Strict or Transitional document without any risk of silent data loss. This is (editorially) a lot of work, but the results will, I think, be worthwhile.
  • As I wrote following the Prague meeting, there was a move afoot to re-instate the values “on” and “off” as permissible Boolean values (alongside “true”, “1”, “false” and “0”) so that Transitional documents would accurately reflect the existing corpus of Office documents, in accord with the stated scope of the standard. This change has now been agreed by the WG.
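
From a consumer's point of view, the two changes above might look something like the following minimal sketch. The function names are hypothetical, and the namespace URIs are an assumption: they are the SpreadsheetML ones used in later editions of the standard, not something stated in this post. Treating the Boolean tokens as case-sensitive is also an assumption.

```python
import xml.etree.ElementTree as ET

# Assumption: URIs as used in later editions of the standard (SpreadsheetML
# main part); the post itself does not give them.
TRANSITIONAL_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
STRICT_NS = "http://purl.oclc.org/ooxml/spreadsheetml/main"

# The six Boolean spellings the post says Transitional will accept.
_TRUE = {"true", "1", "on"}
_FALSE = {"false", "0", "off"}


def parse_transitional_bool(text: str) -> bool:
    """Map a Transitional Boolean attribute value to a Python bool."""
    value = text.strip()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"not a recognised Boolean value: {text!r}")


def conformance_class(xml_text: str) -> str:
    """Guess Strict vs Transitional from the root element's namespace."""
    root = ET.fromstring(xml_text)
    # ElementTree spells a qualified tag as "{namespace}localname".
    namespace = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else ""
    if namespace == STRICT_NS:
        return "strict"
    if namespace == TRANSITIONAL_NS:
        return "transitional"
    return "unknown"


print(parse_transitional_bool("on"))   # True
print(parse_transitional_bool("0"))    # False
print(conformance_class(f'<worksheet xmlns="{STRICT_NS}"/>'))  # strict
```

The point of the namespace check is that a consumer can refuse, or route differently, a document whose conformance class it does not support, instead of reading it and quietly mishandling the parts that differ.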

The ISO Date Issue

The “hot issue” I referred to earlier is ISO dates. What better way to illustrate the problem we face than by using one of Denmark’s most famous inventions, the LEGO® brick …


f*cked-up lego brick
OOXML Transitional imagined as a LEGO® brick

More precisely, the problem is about date representation in spreadsheet cells. One of the innovations of the BRM was to introduce ISO 8601 date representation into OOXML. However, the details of how this has been done are problematic:

  1. Despite the text of the original resolution declaring that ISO 8601 dates should live alongside serial dates (for compatibility with older documents), one possible reading of the text today is that all spreadsheet cell dates have to be in ISO 8601 format
  2. Having any such dates in ISO 8601 format is particularly problematic for Transitional OOXML, which is meant to represent the existing corpus of office documents. Since none of these existing documents contain ISO 8601 dates, having them here makes no sense
  3. Even more seriously, if people start creating “Transitional” OOXML documents which contain 8601 dates, then a huge installed base of software expecting Ecma-376 documents will silently corrupt that data and/or produce surprising results. (My concern here is more for things like big ERP and Financial systems, rather than desktop apps like MS Office). Hence the odd LEGO brick above: like those bricks, such files would embody an interoperability disaster
  4. Even the idea of using ISO 8601 is pretty daft unless it is profiled (currently it is not). ISO 8601 is a big, complex standard with many options irrelevant to office documents: it would be far more sensible for OOXML to target a tiny subset of ISO 8601, such as that specified by W3C XML Schema Definition Language (XSD) 1.1 Part 2: Datatypes
  5. Many date/time functions declared in spreadsheetML appear to be predicated on date/time values being represented as serial values and not ISO 8601 dates. It is not clear if the Standard as written makes any sense when ISO 8601 dates are used.
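
To make the third point concrete, here is a rough sketch of how a date cell could go wrong in software written against serial dates. Everything in it is illustrative: `legacy_read` is a hypothetical consumer that parses the cell value the way a C-style `strtod` would (leading digits only), the sample cells are invented, and the conversion assumes the 1900 date system and ignores serials below 61, where the fictitious 29 February 1900 complicates matters.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
V_TAG = f"{{{NS}}}v"
# Day zero that makes 1900-system serials from 61 upwards come out right.
EPOCH_1900 = datetime(1899, 12, 30)

# The same date, first as a serial value, then as an ISO 8601 cell (t="d").
SERIAL_CELL = f'<c xmlns="{NS}" r="A1"><v>40000</v></c>'
ISO_CELL = f'<c xmlns="{NS}" r="A1" t="d"><v>2009-07-06T00:00:00</v></c>'


def serial_to_datetime(serial: float) -> datetime:
    """Convert a 1900-system serial date (61 or later) to a datetime."""
    if serial < 61:
        raise ValueError("serials below 61 are not handled in this sketch")
    return EPOCH_1900 + timedelta(days=serial)


def lenient_number(text: str) -> float:
    """Mimic strtod: read the leading numeric prefix and ignore the rest."""
    match = re.match(r"\s*[-+]?\d+(\.\d+)?", text)
    return float(match.group(0)) if match else 0.0


def legacy_read(cell_xml: str) -> datetime:
    """Hypothetical pre-ISO consumer: every date cell is assumed to be a serial."""
    cell = ET.fromstring(cell_xml)
    return serial_to_datetime(lenient_number(cell.find(V_TAG).text))


def date_aware_read(cell_xml: str) -> datetime:
    """Consumer that checks the cell type before interpreting the value."""
    cell = ET.fromstring(cell_xml)
    text = cell.find(V_TAG).text
    if cell.get("t") == "d":
        return datetime.fromisoformat(text)
    return serial_to_datetime(float(text))


print(legacy_read(SERIAL_CELL))    # 2009-07-06 00:00:00
print(date_aware_read(ISO_CELL))   # 2009-07-06 00:00:00
print(legacy_read(ISO_CELL))       # 1905-07-01 00:00:00
```

The last line is the worrying case: the legacy reader raises no error, it simply takes "2009" as a serial number and reports a date in 1905. A stricter legacy parser would fail loudly instead, which is less dangerous but still breaks the system.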

The solution?

Opinions vary about how the ISO date problem might best be solved. My own strong preference would be to see the Standard clarified so that users were sure that Transitional documents were guaranteed to correspond to the existing document reality – in other words that Transitional documents only contain serial date representations, and not “ISO 8601” date representations. In my view the ISO dates should be for use in the Strict variant of OOXML only.

If a major implementation (Excel 2010, say) appears which starts pumping incompatible, ISO 8601-flavoured Transitional documents into circulation, then that would be an interop disaster. The standards process would be rightly criticised for producing a dangerous format; users would be at risk of corrupted data; and guilty vendors too would be blamed accordingly.

It is imperative that this problem is fixed, and fixed soon.

Impressions of Copenhagen

Our meetings took place at the height of midsummer, and every day was glorious (all meetings started with a sad closing of the curtains). Something of a pall was cast over proceedings by thefts, in two separate incidents, of delegates’ laptop computers; but there is no doubt Copenhagen is a wonderful city blessed with excellent food, tasty beer, and an abundance of good-looking women. Dansk Standard provided most civilised meeting facilities, and entertainment chief Jesper Lund Stocholm worked hard to ensure everyone enjoyed Copenhagen to the full, especially the midsummer night witch-burning festivities! Next up is the SC 34 Plenary in Seattle in September; I’m sure there will be many more defect reports on OOXML to consider by then, and that WG 4's tireless convenor Murata-san will be keeping the pressure on to maintain the pace of fixes …


Jesper Among the Beers
Jesper Lund Stocholm

Comments (8) -

  • orcmid

    6/28/2009 6:17:16 AM |

    Nice one.  Great summary of the ISO 8601 in spreadsheet-cells problem.

    It looks like you have an odd case where strict is not a subset of transitional (and the namespace situation will help with that).  I vote taking cell type "d" out of transitional, and then making it work in strict.  Uh, wait, OK, I recommend taking cell type "d" etc.

  • Alan Bell

    6/29/2009 6:17:20 PM |

    If the standard (albeit transitional flavour) is "guaranteed to correspond to the existing document reality" then there is a closed proprietary reference implementation. Reference implementations are banned I seem to recall you saying. If I am understanding you correctly you are concerned about new versions of applications continuing to spew out transitional documents (a situation with no end in sight), and your solution to this is to guarantee that whatever they do is in lock-step with Microsoft.
    This does not seem to be a step forward.

  • Alex

    6/29/2009 6:54:04 PM |

    @Alan

    There are no reference implementations for International Standards; the *text* is the law, and that is why it is imperative that it is got right.

    I don't think there is a question of there being "a" reference implementation here - if you mean MS Office. There are however, legacy *documents* and legacy systems which process those documents -- and that is what the Nations explicitly sought to keep compatibility with in inventing Transitional 29500.

    As I said, it's not really a problem for MS Office, as the Redmond developers can develop and deploy interceptor code to make that interoperate with ISO date documents. The problem lies with software *already written* by others: the kind of software that tracks military movements, runs payrolls and drives accounting systems using the OOXML Excel format, and which developers other than Microsoft have written. That is what we're hearing from users: "don't give us documents which claim to be compatible with the legacy, but which in reality will break our systems".

  • Alan Bell

    6/30/2009 4:19:31 AM |

    Legacy documents don't contain ISO8601 dates. This is a non-issue for the corpus of documents currently sat on disk. It may be an issue for new documents created in legacy formats, and possibly edits to existing documents. The transitional 29500 basically says implementations SHALL NOT deviate from what Microsoft Office does/did. This is wrapped up in a veneer of ISO respectability.
    I have to think that it isn't ISO's problem that some people have written systems that don't work with a published ISO standard (even when one was available . . .). This is a practical demonstration of the value of ISO standards and the danger of putting your data in a format controlled by a single vendor that does not give you software freedom. This is a bit of a hard-line viewpoint but I can't imagine the cost of shoehorning in a new date parser would be that significant compared to the cost of the system in the first place, or for that matter adding support for the published version of OOXML (or ODF).
    I do agree that robust versioning is an issue and documents claiming to be a version they are not will break systems.

  • hAl

    6/30/2009 5:03:28 AM |

    Will the issue of the On Off datatype be solved in time for Office 2010 as well ?  

  • Alex

    6/30/2009 3:18:29 PM |

    @Alan

    For a Standard which is meant to represent a legacy base, fidelity is important. So if a fault in ISO PDF emerged that meant it was out of sync with all existing PDF documents, do you really think it's okay to say (in effect) "tough" ?

    Granted, there may be some "sick cases" where something horrid was allowed, and that feature was exploited by a minority of documents -- but generally NBs need to be responsible about making sure a standard "does what it says on the tin".

  • Alex

    6/30/2009 5:02:25 PM |

    @hAl

    My understanding is that MS plan to have MS Office understand all boolean variants, but not to *emit* "on" and "off" any more (liberal in what you receive yadda yadda).

    As I say above, the reintroduction of "on" and "off" for the Transitional schema has been agreed and will appear in the first set of corrigenda for IS 29500.

  • hAl

    7/14/2009 10:23:27 PM |

    Btw, I forgot.
    How is the proposed versioning of OOXML going ?

Comments are closed