Where is there an end of it? | All posts tagged 'DF'

The Maintenance of ODF – an Aide-mémoire

There is some inaccurate information swirling around on the web about the maintenance of ISO/IEC 26300:2006 – Open Document Format for Office Applications (OpenDocument) v1.0.

For those following the story of document format standardisation, this blog entry sets out the current situation ahead of the upcoming JTC 1 plenary in Nara, Japan, where this very topic is likely to be discussed and, one hopes, get debugged.

Background

The diagram above illustrates the current and planned major variants of the ODF standard.

The topmost is the OASIS standard 1.0, published by OASIS after its approval in May 2005.

This OASIS standard was submitted by OASIS to JTC 1 for PAS transposition in October 2005. It passed its ballot with no dissent in May 2006, although a number of countries requested substantive fixes and improvements.

Because there had been no negative votes (only approvals and abstentions) in the ballot, the ballot resolution meeting (BRM) for the new standard was cancelled. (The UK objected to this decision at the May 2006 SC 34 plenary meeting in Seoul.)

Based on the comments from NBs, some substantive fixes and improvements were duly made to ODF, and ITTF incorporated these into the text of ISO/IEC 26300:2006, published in November 2006.

An equivalent text, an OASIS Committee Specification (not a standard, N.B.) called “OpenDocument v1.0 (Second Edition)”, had been published by OASIS in July 2006.

OASIS subsequently authored and published a new OASIS standard, ODF 1.1. This was published three months after ISO/IEC 26300:2006, i.e. in February 2007. OASIS did not seek cooperation in this from any part of ISO/IEC, nor did it submit the revised specification to JTC 1.

OASIS then began work on ODF 1.2, again without any ISO/IEC involvement.

In July 2008 the co-chair of the OASIS ODF TC announced in a blog entry: “[n]o one supports ODF 1.0 today. All of the major vendors have moved on to ODF 1.1, and will be moving on to ODF 1.2 soon.”

Throughout 2007 Japan, who were translating ISO/IEC 26300 into Japanese, fed reports of defects to OASIS via an OASIS mailing list. A formal set of Defect Reports was submitted by the Japanese National Body in December 2007 and circulated to SC 34 members and liaisons (including OASIS). The JTC 1 Directives state that the Project Editor must respond to a Defect Report for a JTC 1 standard within two months. SC 34 received no response until August 2008, when it was informed by the OASIS ODF TC that a register of errata in the OASIS standard had been published.

OASIS have produced an errata document which applies corrections for some of the defects that have been reported. Note, however, that OASIS cannot amend the text which is the basis of ISO/IEC 26300, as this text has only the status of “Committee Specification” within OASIS. Hence they propose amending the defective OASIS 1.0 (“1st Ed”) Standard, creating a new fork of the ODF specification. SC 34 are expected to cross-apply these fixes to their corresponding locations within ISO/IEC 26300.

It is unclear whether the reported defects which also apply to ODF 1.1 are to be applied in any way.

Communications from OASIS make it clear that OASIS believes it has entered into an agreement with JTC 1 which allows it to maintain ISO/IEC 26300 in a way which exempts it from the maintenance provisions of the JTC 1 Directives.

Problems

OASIS’s continually restated intention in its communications with JTC 1 is to prevent divergence of ODF versions. This goal has clearly not been realised, with a proliferation of versions of ODF inside OASIS and pronounced marketplace confusion.

For example, it should be of concern to JTC 1 members that the OpenOffice.org product is promoted as supporting “features of the upcoming version 1.2 of the ISO standard OpenDocument Format (ODF)”.

OASIS’s continually restated intention in its communications with JTC 1 is to maintain a collaborative relationship. However there has not always been evidence of collaboration. Input from the ISO/IEC members has not been sought. Where input has been provided, it has sometimes been met with delay and dismissiveness.

The agreement that JTC 1 has reached with OASIS appears to be being operated in a way which breaches the JTC 1 Directives. The relevant portions of the Directives are given below (all emphasis mine):

Maintenance for a transposed PAS is also negotiated in the Explanatory Report. JTC 1's intention for maintenance is to avoid any divergence between the current JTC 1 revision of a transposed PAS and the current revision of the original specification published by the PAS submitter. Therefore, the Explanatory Report should contain a description of how the submitting organisation will work cooperatively with JTC 1 on maintenance of the standard. While JTC 1 is responsible for maintenance of the standard, this does not mean that JTC 1 itself must perform the maintenance function. JTC 1 may negotiate with the submitter the option of maintenance handled by the submitter as long as there is provision for participation of JTC 1 experts, i.e. the submitter's group responsible for maintenance is designated as the JTC 1 maintenance group. (Directives, 14.4.2)
For the maintenance of an International Standard of whatever origin normal JTC 1 rules apply. Such rules distinguish between correction of defects and revisions of or amendments to existing Standards. Note: The JTC 1 rules for maintenance are found in clause 15 of the JTC 1 Directives. For the correction of defects, JTC 1 provides for the installation of an editing group. Active participation of the submitter in such an editing group is expected and strongly encouraged. Depending on the degree of openness of the PAS submitter, JTC 1 will determine its specific approach. (Directives, M6.1.5)

Therefore it is clear that while maintenance may (in the lax wording of the Directives) be “handled” by the submitter, it is not possible for the submitter to exempt themselves from normal JTC 1 rules, as “for the maintenance of an International Standard of whatever origin normal JTC 1 rules apply”. From this it follows that a submitter’s “handling” of maintenance is limited, and that the decision-making procedures and time periods specified by the JTC 1 Directives must apply.

Remedies?

Obviously this is all an enormous mess and while it is tempting to blame lawyerly over-cleverness on the part of OASIS, or insufficient alertness on the part of JTC 1, in negotiating their so-called maintenance agreement, the true culprit is, in my view, the JTC 1 Directives – such an impenetrable document has, evidently, led to a completely different understanding of the situation from the several parties involved. This procedural mishap is, I argue, further evidence of the need to scrap and re-write the JTC 1 Directives as a short, clear and professionally drafted document. Already this year we have seen that when tough questions get asked, the Directives are not fit for purpose; we are seeing the same thing again now.

The immediate problem faced is, however, the future of ODF in JTC 1. This is not a matter for SC 34, or for the ODF TC (both of which groups are full of excellent technical experts wanting nothing more than to produce good standards) – this is something that must be resolved at a higher level between JTC 1 and OASIS. In the usual way of things, the developers are being hampered by the management.

The essence of the problem is that a central principle is being missed now: that only a standard that has a truly international dimension to its control should benefit from the ISO, IEC or JTC 1 “brand”. Some immediate remedies might include some mix of the following:

  • Since there seems to be general agreement that ISO/IEC 26300 is an obsolete version of ODF, perhaps it should be withdrawn as an IS – maybe in parallel with a PAS submission of ODF 1.1. That would at least give the world an IS that was widely used and a veil could be drawn over the 1.0 standardisation mess.
  • SC 34 has already stated it is open to suggestions as to how future maintenance should be arranged in a genuinely collaborative manner. Patrick Durusau (the ODF editor) has drafted a proposed agreement in that spirit. Also, OASIS might well have a thing or two to learn by looking at how Ecma has managed to enter into a collaborative arrangement for the maintenance of ISO/IEC 29500 within JTC 1.
  • The immediate defects in ISO/IEC 26300:2006 could be resolved by the formation of an editing group in SC 34. Indeed OASIS itself seemed to expect this in the explanatory report which accompanied their initial PAS submission which stated: “OASIS requests that any corrections of defects or errata from the JTC1 process be re-presented to the OASIS Technical Committee.” Per the Directives, OASIS TC members should be encouraged to participate in any such group.

Ultimately, it is for the nations participating in JTC 1 to decide how this matter can be resolved. The current situation sells out the nations by allowing their brand (“international”) to be perpetuated in a process from which they are effectively excluded. This is “standardisation by corporation” through the back door. Whatever is decided, this must not go on.

OOXML Gets Boring

2008 has been an exciting year for document format standards. 2009 will, I predict, be rather more boring.

This at least is the conclusion I reached after attending the recent DII workshop organised by Microsoft – and if I say the event was boring I merely mean that we can confidently expect document formats to stop being at the centre of a spectator sport, and start returning to the land of techies and standards wonks. Boring, but reassuringly so; for while the more slashdotty spectators may prefer the ya-boo exchanges that characterised 2008, for us techies and standards wonks, boring is good – even … exciting.

Boring includes discussion of such topics as:

  • The effect of hyphenation dictionaries and justification algorithms on line breaking, and the impact of these considerations on achieving reproducible documents across implementations
  • How to decide what the chief document archetypes were for spreadsheets, word-processing documents and presentations
  • The distinction between an erratum and an amendment for an IEC/ISO standard
  • How to assemble and administer a collection of representative documents for assessing implementation conformance
  • How to validate the semantic constraints inherent in an OOXML document
  • The trade-off between format-specific and generic document APIs
  • How to facilitate server-side document generation
  • The trade-off between user convenience and standards adherence
  • The quirks of string sharing in Excel
  • How to document the implementation decisions an application makes which impose further constraints on the underlying XML
  • How to re-purpose legacy Authorware training materials into OOXML

All good solid stuff, laying the groundwork for the people who really matter in this process (and who perhaps have too often been overlooked) – the end users. Doug Mahugh has a further write-up and links to the presentations on his blog.

Granted, a few eyebrows were raised during one presentation (which has not appeared among the others) which gave a startlingly frank overview of the challenges Microsoft are anticipating in implementing ISO/IEC 29500, from sucky performance in the deserialisation code in PowerPoint, to dumb mistakes in Ecma 376, to coping with the fact that under certain circumstances Office 2007 emits content which is invalid against the Ecma 376 schemas.

I found this last revelation truly heartening – Microsoft did not need to make it, and to date (so far as I am aware) nobody has “caught” Office 2007 emitting invalid XML content. Yet here was MS ’fessing up and asking about ways they could stop it happening in future. All software companies (not just Microsoft) need to have a plain-dealing up-front approach to publicising problems. Such an approach has benefitted the security landscape and will have big benefits for document processing and conformance (and yes, for interoperability too). It is good to see the seeds of such a mature approach – I look forward to seeing Microsoft make this information public soon, and to something equivalent starting up for non-Microsoft ODF implementers too, bearing in mind that (with apologies to Alexander Pope):

Whoever thinks a bug-free app to see,
Thinks what ne'er was, nor is, nor e'er shall be.

So, Talking of Bugs and ODF …

ODF Table Test

I had prepared a moderately hard table rendering test to take to Redmond, reasoning that table rendering is a fair indicator of the state of a layout engine beyond a basic “text and headings” level. To create this test document at home perform the following steps:

  1. Create a 5x5 Table
  2. Number the cells starting at the top left and moving left-to-right, top-to-bottom until you reach number 25 at the bottom right
  3. Merge consecutive cells to achieve the result below (note I have also coloured the merged cells to make it easier to see what has happened).
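For readers who would rather generate the test than click it together, here is a minimal Python sketch that emits the table markup for a numbered 5x5 grid. It is an illustration under stated assumptions, not a complete ODF document: it produces only the table:table element (no office:document-content wrapper, no styles, no office:value-type attributes), and since the full merge pattern appears only in the figure, it reproduces just the one merge discussed below, where cells 4, 9 and 14 are combined by spanning downward from cell 4. The function name build_table and its vertical_merges parameter are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for the ODF table and text vocabularies
TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
ET.register_namespace("table", TABLE_NS)
ET.register_namespace("text", TEXT_NS)


def t(name):
    """Qualify a local name with the ODF table namespace."""
    return f"{{{TABLE_NS}}}{name}"


def build_table(size=5, vertical_merges=None):
    """Build a size x size numbered table (hypothetical helper).

    vertical_merges maps an anchor cell number to how many rows it spans
    downward, e.g. {4: 3} merges cells 4, 9 and 14 in a 5x5 grid.
    """
    vertical_merges = vertical_merges or {}
    # Work out which cell numbers are swallowed by a span from above
    covered = set()
    for anchor, rows in vertical_merges.items():
        for k in range(1, rows):
            covered.add(anchor + k * size)

    table = ET.Element(t("table"), {t("name"): "Table1"})
    ET.SubElement(table, t("table-column"),
                  {t("number-columns-repeated"): str(size)})
    for r in range(size):
        row = ET.SubElement(table, t("table-row"))
        for c in range(size):
            n = r * size + c + 1
            if n in covered:
                # Spanned-into slots are still present, as covered cells
                ET.SubElement(row, t("covered-table-cell"))
                continue
            rows = vertical_merges.get(n, 1)
            attrs = {t("number-rows-spanned"): str(rows)} if rows > 1 else {}
            cell = ET.SubElement(row, t("table-cell"), attrs)
            p = ET.SubElement(cell, f"{{{TEXT_NS}}}p")
            # A merged cell lists all the numbers it absorbed, e.g. "4 9 14"
            p.text = " ".join(str(n + k * size) for k in range(rows))
    return table


if __name__ == "__main__":
    print(ET.tostring(build_table(5, {4: 3}), encoding="unicode"))
```

Note that every row still contains five children: the merge is expressed by one cell carrying table:number-rows-spanned plus placeholder covered cells in the rows below, rather than by dropping cells.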

Et voila, a table rendering test. Here is that table displayed in OpenOffice 2.4:

Now, let us see how OpenOffice’s version of the table opens with the SP2 beta for Word 2007:

Good – on the face of it, a mini triumph for interoperability. For comparison, I also tried to open the document with Google Docs:

Hmm – notice the different rendering here. Most obviously, the yellow cell which combines the original cells 4, 9 and 14 does not span downward, whereas that is what we wanted to achieve when we created the table.

Looking at the ODF source, everything appears to be in order. The top-most spanned cell is marked up as follows:

<table:table-cell table:style-name="Table1.A1"
    table:number-rows-spanned="3" office:value-type="string">
  <text:p text:style-name="Table_20_Contents">4 9 14</text:p>
</table:table-cell>

The number-rows-spanned="3" attribute specifies the row-spanning correctly, and the spanned-into cells (not shown) are properly marked up with <table:covered-table-cell/> elements as the ODF spec suggests. (Interesting note: at no point is the ODF spec explicit that row spanning operations apply downwards – I have come across XML table models – Arbortext’s for example – which specify that spans apply upwards, and it is theoretically open for an ODF implementation to choose to do that too. So much for interoperability!)
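To make the point concrete, here is a minimal Python sketch of how a consumer might resolve an ODF table into a grid of slots, each mapped to the cell that owns it. It assumes, as OpenOffice.org and Word evidently do, that spans extend downward (and rightward, for table:number-columns-spanned) from the anchor cell; an implementation reading spans upward would build a different grid from exactly the same markup. For brevity it ignores table:number-columns-repeated and table:number-rows-repeated, and resolve_grid is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
T = "{%s}" % TABLE_NS


def resolve_grid(table):
    """Map each (row, col) slot to the (row, col) of the cell that owns it.

    Assumption: spans extend downward and rightward from the anchor cell.
    Repeated rows/columns are not expanded in this sketch.
    """
    owner = {}
    for r, row in enumerate(table.findall(T + "table-row")):
        c = 0
        for cell in row:
            if cell.tag == T + "covered-table-cell":
                # A covered slot must already be claimed by a span above or to the left
                if (r, c) not in owner:
                    raise ValueError(f"covered cell at {(r, c)} has no spanning owner")
                c += 1
                continue
            if cell.tag != T + "table-cell":
                continue
            rows = int(cell.get(T + "number-rows-spanned", "1"))
            cols = int(cell.get(T + "number-columns-spanned", "1"))
            for dr in range(rows):
                for dc in range(cols):
                    # setdefault keeps the first claimant if spans ever overlap
                    owner.setdefault((r + dr, c + dc), (r, c))
            c += 1
    return owner
```

Under this reading the yellow cell in the test owns slots (0, 3), (1, 3) and (2, 3); a renderer that drops or misplaces the span, as Google Docs and OpenOffice.org 1.1.5 appear to, is effectively building a different grid.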

So I think here we can reasonably point the finger at Google Docs and say that its table renderer is faulty – these guys need to catch up with OpenOffice and Microsoft.

Curiously, opening this test file with an early version of OpenOffice.org (1.1.5) gives the same rendering error as today’s Google Docs:

And so we have seen a minor failure of interoperability. Of course it might not be so minor if these documents contained more important information (financial or medical data, e.g.) and not just pretty colours. This test however, just scratches the surface – for a more thoroughgoing examination of the poor state of ODF interoperability, readers can turn to the recent study by Shah and Kesan (and no, OOXML does not emerge hugely better from this either).

Looking Forward

Achieving interoperability appears to be the new focus for both the developers and standardisers working on document formats. For ODF, OASIS has the new Open Document Format Interoperability and Conformance (OIC) TC to advance work in this area. Microsoft is not currently represented here, and it is to be hoped they might soon overcome their shyness and participate. After all, when Office 2007 SP2 ships, Microsoft Office will quickly become the predominant ODF implementation, and it is important everybody works together to ensure they improve conformance and interoperability, and that where the ODF specification is insufficient for this, feedback is returned to ODF’s custodians.

Microsoft too are evidently thinking about interoperability, and several presentations at the DII workshop were concerned with work to build a repository of representative Office documents to provide input into conformance and interoperability testing processes. Such an initiative is useful, though ideally it should take place under the aegis of a standards committee (e.g. OASIS or SC 34 / WG 4), to parallel the activities taking place for ODF.

There was, indeed, plenty of corridor discussion about how the future standards arrangements for ODF and OOXML might best be organised. I was particularly pleased to meet Dennis Hamilton (aka Orcmid), a member of the OASIS ODF TC and secretary to the OIC TC there – and we had plenty of constructive discussions about how the current impasse in ODF maintenance might best be negotiated. However, the immediate solution to that particular problem now lies above the reach of mere committee members like us; it is between the lawyers and officials of OASIS and JTC 1.

R&R

After the workshop finished, there was time before my midnight flight for some R&R. Doug Mahugh was good enough to indulge my confession to being a Frasier fan and give me a tour of Seattle (where he grew up). It was good too to take a break from document formats and discuss less controversial topics like the US election, the identity of Mini-Microsoft, and Kirk/Spock porn!

 

Alex and Lunar Orbiter: me being shown Seattle culture (photo: Doug Mahugh)

[UPDATE: Jesper Lund Stocholm has also just blogged on the topic of ODF support in MS Office — highly recommended.]

[UPDATE 2: wifely perspective.]

Standards News …

… Big …

It doesn’t get much bigger than this: 30 years after first joining, China has become the sixth permanent member of the ISO Council, joining the original 5: the United States, Germany, Britain, France and Japan. More here.

… and Small …

Zooming from the world’s most populous nation to the level of the individual, Patrick Durusau has written a new piece entitled “My Standards Education” which (characteristically) presents a positive message, encouraging people who care to get involved in standards. Anyone who has the wherewithal and is not currently contributing could do a lot worse than take up one of his suggestions and join OASIS – perhaps to contribute to the vitally important work of the new ODF Interoperability and Conformance TC. More about that particular effort from Rob Weir here.

Come on people – as Patrick would happily observe, we won’t get better standards just by writing blogs …

[Update: Patrick has just issued another post, on one aspect of the different approaches ODF and OOXML take to markup.]

Office Document Interop

Later this week I will be participating in one of Microsoft’s DII (Digital Interoperability Initiative) workshops in Redmond (full disclosure: my company will be reclaiming travel and accommodation expenses from Microsoft) and am looking forward to assessing some of the work in this area not as a standards person, but as a (less circumscribed) free agent.

Now, “interoperability” has become a slightly contentious word. Sun’s Simon Phipps, for example, while speaking at The 2nd International ODF User Workshop showed a slide in which he argued for “Substitutability not Interoperability”. I am not sure there is a meaningful distinction between the two terms but it is true “substitutability” does convey forcibly what (I take it) the user requirement is in this area: to be able to take advantage of open standard formats to process data with a choice of software, and to avoid application lock-in.

The word “interoperability” also no doubt has significance for Microsoft in that it is at the centre of the questions currently being asked by the European Commission, not least in the ongoing anti-trust investigations. Microsoft have responded by conspicuously committing to an interoperability programme and have made noises of the right kind in press releases, which have drawn a very guarded response from the EU:

The Commission would welcome any move towards genuine interoperability. Nonetheless, the Commission notes that today's announcement follows at least four similar statements by Microsoft in the past on the importance of interoperability.

- there is definitely a sense that Microsoft are drinking at the last-chance saloon here.

Meanwhile, Microsoft’s competitors are keen for Microsoft to fail. The game being played has several outcomes in which the key measures are whether Microsoft achieves interoperability in practice, and whether it is perceived to have done so.

A. Microsoft fails to achieve interoperability, and is correctly perceived to have failed.

B. Microsoft does achieve interoperability, but is incorrectly perceived to have failed.

C. Microsoft fails to achieve interoperability, but is incorrectly perceived to have succeeded.

D. Microsoft does achieve interoperability, and is correctly perceived to have done so.

The optimal outcome for Microsoft (I assume it is a sane corporate entity, and so tends to monopoly) is box C. They effectively get to maintain barriers to market entry but are judged to have played well: the EU threat goes away.

The optimal outcome for Microsoft’s competitors (I assume they are also sane) is box B. In this outcome, the market becomes more open to them as a result of Microsoft’s initiatives, but Microsoft is unjustly hammered for its efforts and has to deal with the regulatory and PR fallout.

As usual we poor users (and we poor standards people) are stuck in between. The best outcome for us is either box A or box D (with a preference for box D). We end up with the freer market that interoperability promises either through Microsoft’s cooperation (D), or – less efficiently – through punitive action taken by a regulatory system (A).
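The four boxes form a simple two-by-two table, and it can help to lay it out explicitly. The Python sketch below keys each box by whether Microsoft actually achieves interoperability and whether it is perceived to have done so, and records each party's preferred box as argued above. The BOXES and OPTIMUM tables and the mismatched_boxes helper are a hypothetical encoding for illustration, not a formal game-theoretic model.

```python
from itertools import product

# Key: (actually achieves interoperability, is perceived to have achieved it)
BOXES = {
    (False, False): "A",
    (True, False): "B",
    (False, True): "C",
    (True, True): "D",
}

# Each party's preferred box, as argued in the text
OPTIMUM = {"Microsoft": "C", "competitors": "B", "users": "D"}


def mismatched_boxes():
    """Return the boxes in which perception differs from reality."""
    return sorted(
        BOXES[(actual, perceived)]
        for actual, perceived in product([False, True], repeat=2)
        if actual != perceived
    )


if __name__ == "__main__":
    mismatch = mismatched_boxes()
    for party, box in OPTIMUM.items():
        verdict = "relies on a perception gap" if box in mismatch else "does not"
        print(party, box, verdict)
```

Run as a script, it reports that both corporate optima (C and B) fall in the mismatched boxes, while the users' preferred box D does not.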

Astute readers will note that the optimal outcomes for the corporations (both Microsoft and its competitors) have a defining property: they both rely on a mismatch between actuality and perception - misinformation. It is for this reason, I believe, that there has been such an all-out assault on the blogosphere by some of Microsoft’s competitors (and, to a lesser extent, by Microsoft itself). Have no doubt about it, this is war - as IBM’s Rob Weir put it, "It’s called a 'standards war'. Look it up. Whining about it won't make it go away".

And so, as in any war, truth is at a premium. Fortunately, when it comes to Office document interoperability we do not need to rely on bitter blog exchanges, the warm words of press releases, or even on the success (or not) of workshops in Redmond. Questions of interoperability will be decided by the cold hard fact that certain bytes arranged in certain ways will betoken good behaviour; other arrangements will betoken bad behaviour. We, the users, can measure which is which, and we, the users, can improve the tests by making the standards that govern office documents more thorough. If we deal honestly and standardise well, the optimal outcome for us is within reach.

Reviewing the pieces

My particular interest in interoperability is between Office document formats – word processing documents in particular. The international standardisation of ODF 1.0 (IEC/ISO 26300:2006) and OOXML (IEC/ISO 29500:2008) has sparked off plenty of commentary, and also plenty of opportunities for the spreading of misinformation. I think, in the interests of “honest dealing” it will be useful to have a review of the current state of play: exactly what specifications do we have on the table?

“ODF”

“ODF” is a term that is often used very casually. However, when examining interoperability, details and exactness matter so let us enumerate the 4 main variants (in 5 documents) that we have:

  1. The OpenDocument v1.0 specification was approved as an OASIS Standard on 1 May 2005. It is not an ISO/IEC standard.
  2. ISO/IEC 26300:2006 Open Document Format for Office Applications (OpenDocument) v1.0 is a version of the OASIS 1.0 standard with substantive revisions, as applied through the JTC 1 standardisation process. It is an ISO/IEC standard, but not an OASIS standard.
  3. The OpenDocument v1.0 (Second Edition) specification was approved as an OASIS Committee Specification on 19 July 2006. It is practically identical to ISO/IEC 26300:2006. It is not an OASIS standard.

  (Note: the above three “1.0” variants are, practically, obsolete – as a co-chair of the OASIS ODF TC put it, “No one supports ODF 1.0 today”.)

  4. ODF 1.1 was approved as an OASIS Standard on 2 February 2007. It is a small but significant update of its predecessor version. It is an OASIS standard, but not an ISO/IEC standard.
  5. “ODF 1.2” is being drafted. It is not an OASIS standard. It is not an ISO/IEC standard. Nobody knows for sure what it will end up containing. It does not formally exist.

Estimates vary about when 1.2 will be finished and standardised. A reasonable guesstimate would be that it will be published as an OASIS standard in Q1 2009, and – if it is submitted to, and passes, a JTC 1 process – become an International Standard in Q1 2010. But these are my guesstimates based on conversations with OASIS and ISO/IEC people.

Now practically speaking there are some key points to note about these versions:

  • The variant most in use currently is (4) – “ODF 1.1”. This is the variant to which the output of most “ODF” applications most nearly approximates.
  • The variant that many countries and big international organisations have signed up for is (2) – that is because it has the all-important ISO/IEC (not OASIS) imprimatur.
  • The recently released OpenOffice.org 3.0 application promises support for what is called “features of the upcoming version 1.2 of the ISO standard OpenDocument Format (ODF)”. Not quite right – and the kind of casual wearing of the “ISO” (actually ISO/IEC) brand that is causing deep disquiet among its custodians. The format that OpenOffice.org 3.0 uses would more properly be labelled “OpenOffice.org 3.0 format”, and described as a guess by that product’s developers at what ODF 1.2 might eventually end up being.
  • I have seen mentions in one or two anti-MS places that Microsoft “should be adopting ODF 1.2” in their upcoming version of Office. Absolutely not! The last thing we should be doing is encouraging Microsoft to target a non-existent format, thereby allowing their developers to improvise (just as OpenOffice.org’s have). It’s not hard to see how that would almost certainly lead to an interoperability disaster.

“OOXML”

The situation for OOXML is simpler, in part because it is a newer standard.

But first, the name. “Office Open XML File Formats” is what it is called. At the SC 34 Jeju plenary, when for the third time somebody had mistakenly drafted projected text mentioning “Open Office” rather than “Office Open”, I turned in exasperation to the delegate next to me and whispered “whoever came up with that name deserves to be shot”. The reply? “Errr, it was me.” The responsible Microsoftie shall remain nameless.

Anyway, if there is one thing worse than calling it “Office Open XML” it is the abbreviated form “Open XML”. That poor word “open” has had such a battering – and this standard really does not need a nickname. No, in my book “OOXML” or (better) “twenty-nine-five-hundred” it has to be.

There are two main variants of this specification:

  1. Ecma 376. This was the specification prepared by Ecma International as an input into the Fast Track JTC 1 process. It also represents the format currently used by Microsoft Office 2007. I have heard mutterings that Microsoft Office fails to consume and emit conformant Ecma 376, but no test I have carried out or heard of suggests that these mutterings are true.
  2. ISO/IEC 29500:2008. This is the standardised, but yet-to-be-published version of OOXML that has been very substantially revised by the nations participating in the JTC 1 Fast Track standardisation process. The publication text has already been distributed to the standardisation nations for their information, and the standard will appear later (guesstimate: December 2008). So, strictly speaking this standard (like ODF 1.2) does not formally exist yet. Note that this specification has become a multi-part standard – and so is effectively now four standards:
    • Part 1: Fundamentals and Markup Language Reference. The core of the standard.
    • Part 2: Open Packaging Conventions. A small specification describing a ZIP-based packaging mechanism. This is generally rather liked and there has been some talk of future versions of ODF borrowing it.
    • Part 3: Markup Compatibility and Extensibility. A small specification describing how the markup underlying OOXML may be extended.
    • Part 4: Transitional Migration Features. This is the “toxic dump” – all those nasty features like the infamous autoSpaceLikeWord95 are confined to this section, which may be ignored by implementations pursuing "strict" conformance.
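Since both ODF and OOXML files are ZIP archives, the packaging layer is also the quickest place to tell the two families apart when setting up interoperability tests. The following Python sketch, using only the standard library, guesses which kind of package it has been given: ODF packages conventionally carry a mimetype entry naming the document's media type, while OPC packages carry a [Content_Types].xml part at their root. The function name sniff_package is hypothetical, and the check is deliberately shallow: it does not verify that mimetype is the first, uncompressed entry, nor validate anything inside the package.

```python
import io
import zipfile

ODF_TEXT_MIME = "application/vnd.oasis.opendocument.text"


def sniff_package(data: bytes) -> str:
    """Guess whether ZIP bytes hold an ODF package or an OPC (OOXML) package."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = zf.namelist()
        if "mimetype" in names:
            # ODF packages record their media type in a 'mimetype' entry
            return "ODF: " + zf.read("mimetype").decode("ascii").strip()
        if "[Content_Types].xml" in names:
            # OPC packages declare part content types in this root part
            return "OPC"
        return "unknown"


if __name__ == "__main__":
    # Build a tiny in-memory stand-in for an ODF text document
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", ODF_TEXT_MIME)
    print(sniff_package(buf.getvalue()))
```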

There is currently no implementation of ISO/IEC 29500. The current version of Microsoft Office does not emit ISO/IEC 29500-compliant documents, even when judged against the standard's Part 4 Transitional Migration Features (its obvious target), because of some minor lexical differences between Ecma 376 and the ISO/IEC evolution of the standard.

Looking forward

So we have reviewed the set of specifications we are concerned with when it comes to testing interoperability. In my next post I will describe a simple technical test (which I will be taking to Redmond) that we can use with some of the major implementations to get some evidence on what the state of “interoperability” really is at the moment.