Where is there an end of it? | All posts tagged 'microsoft'

Office Document Interop

Later this week I will be participating in one of Microsoft’s DII (Document Interoperability Initiative) workshops in Redmond (full disclosure: my company will be reclaiming travel and accommodation expenses from Microsoft). I am looking forward to assessing some of the work in this area not as a standards person, but as a (less circumscribed) free agent.

Now, “interoperability” has become a slightly contentious word. Sun’s Simon Phipps, for example, while speaking at The 2nd International ODF User Workshop, showed a slide in which he argued for “Substitutability not Interoperability”. I am not sure there is a meaningful distinction between the two terms, but it is true that “substitutability” does convey forcibly what (I take it) the user requirement is in this area: to be able to take advantage of open standard formats to process data with a choice of software, and to avoid application lock-in.

The word “interoperability” also no doubt has significance for Microsoft in that it is at the centre of the questions currently being asked by the European Commission, not least in the ongoing anti-trust investigations. Microsoft have responded by conspicuously committing to an interoperability programme and have made noises of the right kind in press releases, which have drawn a very guarded response from the EU:

The Commission would welcome any move towards genuine interoperability. Nonetheless, the Commission notes that today's announcement follows at least four similar statements by Microsoft in the past on the importance of interoperability.

- there is definitely a sense that Microsoft are drinking at the last-chance saloon here.

Meanwhile, Microsoft’s competitors are keen for Microsoft to fail. The game being played has four possible outcomes, determined by two measures: whether Microsoft achieves interoperability in practice, and whether it is perceived to have done so.

A. Microsoft fails to achieve interoperability, and is correctly perceived to have failed.

B. Microsoft does achieve interoperability, but is incorrectly perceived to have failed.

C. Microsoft fails to achieve interoperability, but is incorrectly perceived to have succeeded.

D. Microsoft does achieve interoperability, and is correctly perceived to have done so.

The optimal outcome for Microsoft (I assume it is a sane corporate entity, and so tends to monopoly) is box C. They effectively get to maintain barriers to market entry but are judged to have played well: the EU threat goes away.

The optimal outcome for Microsoft’s competitors (I assume they are also sane) is box B. In this outcome, the market becomes more open to them as a result of Microsoft’s initiatives, but Microsoft is unjustly hammered for its efforts and has to deal with the regulatory and PR fallout.

As usual we poor users (and we poor standards people) are stuck in between. The best outcome for us is either box A or box D (with a preference for box D). We end up with the freer market that interoperability promises either through Microsoft’s cooperation (D), or - less efficiently - through punitive action taken by a regulatory system (A).
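For readers who prefer to see the four boxes laid out mechanically, here is a minimal sketch in Python. The names (`OUTCOMES`, `PREFERRED`, `perception_matches_reality`) are mine, invented purely for illustration; the preferences are simply those stated above.

```python
# Each box is keyed by (achieved_in_practice, perceived_to_have_achieved).
OUTCOMES = {
    "A": (False, False),  # fails, and is perceived to have failed
    "B": (True, False),   # achieves, but is perceived to have failed
    "C": (False, True),   # fails, but is perceived to have succeeded
    "D": (True, True),    # achieves, and is perceived to have done so
}

# Preferred boxes for each party, as argued in the text (illustrative labels).
PREFERRED = {
    "Microsoft": ["C"],
    "Microsoft's competitors": ["B"],
    "users": ["D", "A"],  # D preferred over A
}


def perception_matches_reality(box):
    """True when what is perceived agrees with what actually happened."""
    achieved, perceived = OUTCOMES[box]
    return achieved == perceived


def summarise():
    """Map each party to whether all of its preferred boxes are 'honest' ones."""
    return {
        party: all(perception_matches_reality(box) for box in boxes)
        for party, boxes in PREFERRED.items()
    }


if __name__ == "__main__":
    for party, honest in summarise().items():
        print(f"{party}: perception matches reality = {honest}")
```

Nothing here is more than bookkeeping, but it does make it easy to see which parties' preferred boxes lie on the diagonal of the grid and which do not.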

Astute readers will note that the optimal outcomes for the corporations (both Microsoft and its competitors) have a defining property: they both rely on a mismatch between actuality and perception - misinformation. It is for this reason, I believe, that there has been such an all-out assault on the blogosphere by some of Microsoft’s competitors (and, to a lesser extent, by Microsoft itself). Have no doubt about it, this is war - as IBM’s Rob Weir put it, "It’s called a 'standards war'. Look it up. Whining about it won't make it go away".

And so, as in any war, truth is at a premium. Fortunately, when it comes to Office Document interoperability we do not need to rely on bitter blog exchanges, the warm words of press releases, or even on the success (or not) of workshops in Redmond. Questions of interoperability will be decided by the cold hard fact that certain bytes arranged in certain ways will betoken good behaviour; other arrangements will betoken bad behaviour. We, the users, can measure which is which, and we, the users, can improve the tests by making the standards that govern office documents more thorough. If we deal honestly and standardise well, the optimal outcome for us is within reach.

Reviewing the pieces

My particular interest in interoperability is between Office document formats – word processing documents in particular. The international standardisation of ODF 1.0 (ISO/IEC 26300:2006) and OOXML (ISO/IEC 29500:2008) has sparked off plenty of commentary, and also plenty of opportunities for the spreading of misinformation. I think, in the interests of “honest dealing”, it will be useful to have a review of the current state of play: exactly what specifications do we have on the table?

“ODF”

“ODF” is a term that is often used very casually. However, when examining interoperability, details and exactness matter so let us enumerate the 4 main variants (in 5 documents) that we have:

  1. The OpenDocument v1.0 specification was approved as an OASIS Standard on 1 May 2005. It is not an ISO/IEC standard.
  2. ISO/IEC 26300:2006 Open Document Format for Office Applications (OpenDocument) v1.0 is a version of the OASIS 1.0 standard with substantive revisions, as applied through the JTC 1 standardisation process. It is an ISO/IEC standard, but not an OASIS standard.
  3. The OpenDocument v1.0 (Second Edition) specification was approved as an OASIS Committee Specification on 19 July 2006. It is practically identical to ISO/IEC 26300:2006. It is not an OASIS standard.
     (Note that the above three “1.0” variants are, practically, obsolete – as a co-chair of the OASIS ODF TC put it, “No one supports ODF 1.0 today”.)

  4. ODF 1.1 was approved as OASIS Standard on 2 February 2007. It is a small but significant update of its predecessor version. It is an OASIS standard, but not an ISO/IEC standard.
  5. “ODF 1.2” is being drafted. It is not an OASIS standard. It is not an ISO/IEC standard. Nobody knows for sure what it will end up containing. It does not formally exist.

Estimates vary about when 1.2 will be finished and standardised. A reasonable guesstimate would be that it will be published as an OASIS standard in Q1 2009, and – if it is submitted to, and passes, a JTC 1 process – become an International Standard in Q1 2010. But these are my guesstimates based on conversations with OASIS and ISO/IEC people.

Now practically speaking there are some key points to note about these versions:

  • The variant most in use currently is (4) – “ODF 1.1”. This is the variant to which the output of most “ODF” applications most nearly approximates.
  • The variant that many countries and big international organisations have signed up for is (2) – that is because it has the all-important ISO/IEC (not OASIS) imprimatur.
  • The recently released OpenOffice.org 3.0 application promises support for what is called “features of the upcoming version 1.2 of the ISO standard OpenDocument Format (ODF)”. Not quite right – and the kind of casual wearing of the “ISO” (actually ISO/IEC) brand that is causing deep disquiet among its custodians. The format that OpenOffice.org 3.0 uses would be more properly labelled “OpenOffice.org 3.0 format”, and described as a guess by that product’s developers at what ODF 1.2 might eventually end up being.
  • I have seen mentions in one or two anti-MS places that Microsoft “should be adopting ODF 1.2” in their upcoming version of Office. Absolutely not! The last thing we should be doing is encouraging Microsoft to target a non-existent format, thereby allowing their developers to improvise (just as OpenOffice.org’s have). It’s not hard to see how that would almost certainly lead to an interoperability disaster.

“OOXML”

The situation for OOXML is simpler, in part because it is a newer standard.

But first, the name. “Office Open XML File Formats” is what it is called. At the SC 34 Jeju plenary, when for the third time somebody had mistakenly drafted projected text mentioning “Open Office” rather than “Office Open”, I turned in exasperation to the delegate next to me and whispered “whoever came up with that name deserves to be shot”. The reply? “Errr, it was me.” The responsible Microsoftie shall remain nameless.

Anyway, if there is one thing worse than calling it “Office Open XML”, it is the abbreviated form “Open XML”. That poor word “open” has had such a battering – and this standard really does not need a nickname. No, in my book “OOXML” or (better) “twenty-nine-five-hundred” it has to be.

There are two main variants of this specification:

  1. Ecma 376. This was the specification prepared by Ecma International as an input into the Fast Track JTC 1 process. It also represents the format currently used by Microsoft Office 2007. I have heard mutterings that Microsoft Office fails to consume and emit conformant Ecma 376, but no test I have carried out or heard of suggests that these mutterings are true.
  2. ISO/IEC 29500:2008. This is the standardised, but yet-to-be-published version of OOXML that has been very substantially revised by the nations participating in the JTC 1 Fast Track standardisation process. The publication text has already been distributed to the standardisation nations for their information, and the standard will appear later (guesstimate: December 2008). So, strictly speaking this standard (like ODF 1.2) does not formally exist yet. Note that this specification has become a multi-part standard – and so is effectively now four standards:
    • Part 1: Fundamentals and Markup Language Reference. The core of the standard.
    • Part 2: Open Packaging Conventions. A small specification describing a ZIP-based packaging mechanism. This is generally rather liked and there has been some talk of future versions of ODF borrowing it.
    • Part 3: Markup Compatibility and Extensibility. A small specification describing how the markup underlying OOXML may be extended.
    • Part 4: Transitional Migration Features. This is the “toxic dump” – all those nasty features like the infamous autoSpaceLikeWord95 are confined to this section, which may be ignored by implementations pursuing "strict" conformance.
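
Since both families of format are delivered as ZIP packages, a first, very rough look at “which bytes are arranged in which ways” needs nothing more than a ZIP reader. The following is a minimal sketch in Python using only the standard library. The function names (`guess_package_kind`, `make_package`) are my own, and the check is a heuristic on entry names only: an OPC package carries a `[Content_Types].xml` entry at its root, while an ODF package carries a `mimetype` entry and a `META-INF/manifest.xml`. It says nothing whatever about conformance to any of the specifications listed above.

```python
import io
import zipfile


def guess_package_kind(data):
    """Guess 'opc', 'odf' or 'unknown' from the entry names of a ZIP package.

    This inspects names only; it does not validate any content.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = set(zf.namelist())
    if "[Content_Types].xml" in names:
        return "opc"
    if "mimetype" in names and "META-INF/manifest.xml" in names:
        return "odf"
    return "unknown"


def make_package(entry_names):
    """Build a throwaway in-memory ZIP with empty entries (for demonstration)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in entry_names:
            zf.writestr(name, b"")
    return buf.getvalue()


if __name__ == "__main__":
    # Illustrative entry names only; real documents contain many more parts.
    opc_like = make_package(["[Content_Types].xml", "_rels/.rels"])
    odf_like = make_package(["mimetype", "META-INF/manifest.xml", "content.xml"])
    print(guess_package_kind(opc_like))
    print(guess_package_kind(odf_like))
```

To try it on a real file, read the file's bytes (for example with `open(path, "rb").read()`) and pass them to `guess_package_kind`. Telling the variants apart within each family is a matter for the markup inside the package, and is well beyond a sketch like this.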

There is currently no implementation of ISO/IEC 29500. The current version of Microsoft Office does not emit ISO/IEC 29500 compliant documents because of some minor lexical differences between Ecma 376 and the ISO/IEC evolution of the standard (using, of course, its Part 4 - Transitional Migration Features).

Looking forward

So we have reviewed the set of specifications we are concerned with when it comes to testing interoperability. In my next post I will describe a simple technical test (which I will be taking to Redmond) that we can use with some of the major implementations to get some evidence on what the state of “interoperability” really is at the moment.