Real Conformance for ODF?

by Alex Brown 22. February 2009 17:12

There has been quite a lot of hubbub recently about ODF conformance, in particular about how conformance to the forthcoming ODF 1.2 specification should be defined.

A New Conformance Clause

Earlier versions of ODF (including ISO/IEC 26300) already defined conformance - it was simply a question of obeying the schema. So in ODF 1.1, for example, we had this text:

Conforming applications [...] shall read documents that are valid against the OpenDocument schema if all foreign elements and attributes are removed before validation takes place [...] (1.5)

and that was the simple essence of ODF conformance.
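
To make the old rule concrete, here is a minimal sketch in Python of the "remove foreign elements and attributes" step that comes before validation. It is an illustration under stated assumptions, not anything taken from the specification: the names SCHEMA_NAMESPACES, is_foreign and strip_foreign are hypothetical, and the namespace set is deliberately incomplete (it lists only two ODF namespaces, whereas the real schema declares many more). Validating the stripped result against the ODF schema is left to whatever validator is to hand.

```python
import xml.etree.ElementTree as ET

# Assumption: an illustrative, incomplete set of schema-declared namespaces.
SCHEMA_NAMESPACES = frozenset({
    "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
})


def is_foreign(qname, allowed=SCHEMA_NAMESPACES):
    """Return True if a '{namespace}local' name lies outside the allowed set."""
    if not qname.startswith("{"):
        return False  # simplification: un-namespaced names are left alone
    namespace = qname[1:].split("}", 1)[0]
    return namespace not in allowed


def strip_foreign(elem, allowed=SCHEMA_NAMESPACES):
    """Remove foreign attributes and foreign child elements, in place."""
    for name in [n for n in elem.attrib if is_foreign(n, allowed)]:
        del elem.attrib[name]
    for child in list(elem):
        if is_foreign(child.tag, allowed):
            elem.remove(child)  # any text trailing the element goes with it
        else:
            strip_foreign(child, allowed)
```

Whatever survives strip_foreign is what the schema gets to see; under the ODF 1.1 wording, anything removed by this step has no bearing on conformance.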

This is now up for reconsideration. The impetus for altering the existing conformance criteria appears to have come from a change in OASIS's procedures, which now require that specifications have “a set of numbered conformance clauses”, a requirement which seems sensible enough.

However, the freshly-drafted proposal which the OASIS TC has been considering goes further than just introducing numbered clauses: it now defines two categories of conformance:

  1. “Conforming OpenDocument Document” conformance
  2. “Conforming OpenDocument Extended Document” conformance

As shorthand, we might like to characterise these as the “pure” and “buggered-up” versions of ODF respectively.

The difference is that the “pure” version now forbids the use of foreign elements and attributes (i.e. those not declared by the ODF schema), while the “buggered-up” version permits them.

Ructions

The proposal caused much debate. In support of the new conformance clause, IBM's Rob Weir described foreign elements (formerly so welcome in ODF) as proprietary extensions that are “evil” and as a “nuclear death ray gun”. Questioning the proposal, KOffice's Thomas Zander wrote that he was “worried that we are trying to remove a core feature that I depend on in both KOffice and Qt”. Meanwhile Microsoft's Doug Mahugh made a counter-proposal suggesting that ODF might adopt the Markup Compatibility and Extensibility mechanisms from ISO/IEC 29500 (OOXML).

Things came to a head in a 9-2-2 split vote last week which saw the new conformance text adopted in the new ODF committee specification by will of the majority. Following this there was some traffic in the blogosphere with IBM's Rob Weir commenting and Microsoft's Doug Mahugh counter-commenting on the vote and the circumstances surrounding it.

Shadow Play

What is to be made of all this? Maybe Sun, whose corporate memory still smarts from Microsoft's “extend and embrace” Java attempts, thinks this is a way to prevent a repeat of similar stunts for ODF. Or perhaps this is a way to carve out a niche for OpenOffice to enjoy “pure” status while competitor applications are relegated to the “buggered-up” bin. Maybe it is envisaged that governments might be encouraged to procure only systems that deal in “pure” ODF. Maybe foreign elements really are the harbinger of nuclear death.

Who knows?

Whatever the reasons behind the reasons, there is clearly an “absent presence” in all these discussions: Microsoft Office, and in particular the forthcoming Microsoft Office 2007 SP2 with its ODF support. It is never mentioned, except in an occasional nudge-nudge wink-wink sort of way.

This controversy is most bemusing. This is in part because the “Microsoft factor” appears not to be a factor anyway, since MS Office will (we are told) not use foreign elements for its ODF 1.1 support. But the main reason why this is bemusing is that this discussion (whether or not to permit foreign elements) is completely unreal. There seems to be an assumption that it matters – that conformance as defined in the ODF spec means something important when it comes to real users, real procurement, real development or real interoperability.

It doesn't mean anything real - and here's why...

Making an ODF-conformant Office Application

Let us consider the procurement rules of an imaginary country (Vulgaria, say). Let us further imagine that Vulgaria's government wants to standardize on using ODF for all its many departments. After many hours of meetings, and the expenditure of many Vulgarian Dollars on consultancy fees, the decision is finally made and an official draws up procurement rules to stipulate this:

Any office application software procured by the Government of Vulgaria must support ODF (ISO/IEC 26300), and must conform to the 'pure' conformance class defined in clause x.y of that Standard, reading and emitting only ODF documents that are so conformant.

Sorted, they think.

Now imagine a software company that has its eye on making a big sale of software licenses to Vulgaria. Unfortunately, its office application does not meet the ODF conformance criterion set out by the procurement officer. The marketing department is duly sad. But one day a bright young developer gets to hear of the problem and proposes a solution. He boldly proclaims “I can make our format ODF-conformant today!”, and proceeds to show how.

First he gets a template ODF document, like this:

<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
office:version="1.0">
<office:body>
<office:text>
<text:p></text:p>
</office:text>
</office:body>
</office:document-content>

This document (he points out) meets the “pure” conformance criteria. Our young hacker then does a curious thing: he takes an existing (non-ODF) file from their office software, BASE-64 encodes it, and inserts the resulting text string into the <text:p> element in the template document.

<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
office:version="1.0">
<office:body>
<office:text>
<text:p><!-- several MBs of BASE-64 encoded content here --></text:p>
</office:text>
</office:body>
</office:document-content>

“There,” he proudly proclaims. “All we need to do is to wrap our current documents with the ODF wrapper when we save, and unwrap when we load – I can have a fresh build for you tomorrow.”
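
For the sceptical, the wrap-and-unwrap trick can be sketched in a few lines of Python. This is a minimal illustration, not anybody's actual product code: the function names wrap and unwrap are hypothetical, and the sketch deals only with the bare content document shown above rather than a complete zipped ODF package.

```python
import base64
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Serialise with the customary "office" and "text" prefixes.
ET.register_namespace("office", OFFICE_NS)
ET.register_namespace("text", TEXT_NS)


def wrap(native_bytes: bytes) -> bytes:
    """Hide an arbitrary native file inside the template content document."""
    root = ET.Element(f"{{{OFFICE_NS}}}document-content",
                      {f"{{{OFFICE_NS}}}version": "1.0"})
    body = ET.SubElement(root, f"{{{OFFICE_NS}}}body")
    text = ET.SubElement(body, f"{{{OFFICE_NS}}}text")
    para = ET.SubElement(text, f"{{{TEXT_NS}}}p")
    # The entire payload becomes the character content of one paragraph.
    para.text = base64.b64encode(native_bytes).decode("ascii")
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def unwrap(odf_bytes: bytes) -> bytes:
    """Recover the native file from the first text:p of a wrapped document."""
    root = ET.fromstring(odf_bytes)
    para = root.find(f".//{{{TEXT_NS}}}p")
    return base64.b64decode(para.text or "")
```

Calling unwrap(wrap(data)) hands back the original bytes untouched, while everything in between contains only elements and attributes in the ODF namespaces.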

The rest of the story is not so happy: the software company makes the sale and the government of Vulgaria finds after installation that none of the files from it will interoperate with any other ODF files from other sources, despite the software company having met its procurement rules to the letter.

Far fetched?

Okay, that story makes an extreme example – but it nevertheless illustrates the point. It is possible for a smart developer to represent pretty much anything as a “pure” ODF document; any differences and incompatibilities can ever-so-easily be shoehorned into conformant ODF documents. That some software deals only in such pure ODF means precisely zero in the real world of interoperability.

The central consideration here is that ODF conformance only ever was (and is only projected to be) stated in terms of XML, and XML is (in)famously “all syntax and no semantics”. The semantics of an ODF document (broadly, all the narrative text in the specification) play no part in conformance and can remain unimplemented in a conformant processor. An ODF developer can safely use just the schema and never read much else. All those descriptions of element behaviour can be ignored for the purposes of achieving ODF conformance. [N.B. mistakes in this para corrected following comment from Rob Weir, below]

So my question is: what is the current debate on ODF conformance really about? It looks to me like mis-directed effort.

What ODF might usefully do is to look at the “application description” feature introduced into OOXML. This describes several types of applications, including a type called “full”. Such applications have “a semantic understanding of every feature within [their] conformance class”, and

“Semantic understanding” is to be interpreted that an application shall treat the information in Office Open XML documents in a manner consistent with the semantic definitions given in this Specification.

In other words, it is possible to specify in OOXML procurement that the processor should heed the narrative description within that Standard (not just the XML grammar). ODF currently lacks this. In my view if there is to be any connection between a definition of ODF conformance and the experience of users in the real world, then something like OOXML's “application description” feature is urgently needed. And it might be better done now, than hastily inserted during a JTC 1 BRM ...

Comments

2/22/2009 9:30:19 PM #

Rob Weir

Alex, you seem to have entirely missed the requirement for ODF Consumers as stated in the latest approved draft: "(P1.1)It shall interpret those elements and attributes it does interpret consistent with the semantics defined for the element or attribute by this specification."

How do you reconcile that with your statement, "The semantics of an ODF document (broadly, all the narrative text in the specification) play no part in conformance"?  

Curiously, you also recommend that we copy language from OOXML that we in fact already have in the draft.  You should really read the draft.  You might find that you like it.

Of course, there are various well-known steganographic techniques that can be applied to encode additional messages and instructions in any formats, including PNG images or even plain ASCII text. This is not a defect in any standard. It is in the nature of data. The problem with such techniques is that they tend to break under editing, since their constraints are not specified. Since ODF is a format specifically for office document editors, we cannot ignore the interaction of extensions and editing.

Rob Weir United States |

2/23/2009 8:11:05 AM #

Alex

Rob hi

Are we looking at the same draft ("Committee Draft 01, February 16, 2009")?

In the P1.1 I'm looking at I have "[a conformant consumer] need not interpret the semantics of all elements, attributes and attribute values"; the text you quote is in P1.3 in the version I am looking at.

But you are right, and I am wrong ... taken together these two new clauses do take a big step towards specifying semantic conformance better.

The feature introduced into OOXML for "application descriptions" would be better still, as it defines what level of semantic conformance users can expect. As it stands for ODF 1.2 an implementation can be "conformant" yet a user has no clue from that label how full its feature support is (a lot? a little? none at all?).

On the "steganographic techniques" problem, I wonder if it's possible (if feeling paranoid) to introduce a clause which forbids the semantics of extensions from "contradicting" the semantics of native elements? That does leave things rather subjective though ... hmmm ...

- Alex.

Alex United Kingdom |

2/23/2009 4:46:23 PM #

Rob Weir

I'm not sure prohibiting contradictions gets you anywhere, since we now know that a vendor can simply say, "in a way which is fully compatible with legacy documents" to magically make anything non contradictory.  I'd rather have a requirement that is technically verifiable in a consistent manner.

In any case, lack of contradiction is not the only or even the primary concern here. There is a vocal constituency of ODF users who want no extensions, regardless of whether they are contradictory or not. It is not for me to argue that their concerns are not real. Certainly other ISO standards, like PDF/A, have taken this approach. It is a legitimate approach. Some vendors want extensions. OK. It is not for me to argue that their needs are not real. So we ended up with two conformance classes, one which allows extensions and one that doesn't.

My main remaining concern is whether the extensibility mechanisms we do allow can be used in an interoperable fashion. Of course, we can't force a vendor to make their extensions be interoperable, but we can certainly add the technical apparatus to the standard that facilitates interoperability if a vendor chooses to avail themselves of it.

Rob Weir United States |

2/23/2009 5:26:00 PM #

Alex

@Rob

I can see both sides of the extensions argument, and agree different constituencies might want different things. Personally, I have no strong preference on this.

What seems to have caused disquiet is that in what is otherwise a fairly modest semantic revision of ODF for 1.2, something so fundamental as a conformance refactoring has taken place. It is of course perfectly possible for the "non-extensionist" group to specify "no foreign elements" in their procurement contracts, if they so wish; there's no requirement for this to be embodied in a new conformance clause.

What I was hoping to convey in my piece was that this wrangling is pretty pointless. As you observe, if somebody wants to "get around" the conformance clause with fancy techniques, there will always be a way. And when it comes to interoperability, there are rather bigger questions in play than these conformance definitions (hence the OIC effort) ...

- Alex.

Alex United Kingdom |

2/23/2009 5:50:53 PM #

Rob Weir

And similarly, we could adopt a default no-extensions conformance definition and those who wish could specify their requirements as "additional extensions permitted". But you see where that logic leads you. You don't need standards at all if you are willing to write arbitrarily verbose procurement requirements. I know of one company that opposes on principle any requirement for any standard, arguing that functional requirements should be negotiated on a case-by-case basis without any preference given to any standards.

In any case, as Chair I must not confuse volume with votes. A 9-2-2 vote is quite a strong statement regardless of how loudly the lone company that voted 'No' complains.

I disagree with you that this is a pointless discussion.  Certainly defining the interoperability aspects of extensions is not a sufficient action to ensure a high degree of interoperability with ODF documents, but it is a necessary one.  Work such as being done in the OIC TC can help us refine our presentational and behavioral interoperability.  But to the extent we explicitly allow arbitrary extensions in ODF documents, you would be correct in arguing, if you chose to, that the work of the OIC was for naught, since the arbitrary extensions were entirely undefined and in fact undefinable and untestable by any means available to us.

In other words, we can in the ODF standard improve conformance by making more explicit the presentational and behavioral requirements for an ODF Consumer.  We're not there yet, but we're moving in the right direction.  But with arbitrary extensions, that level of interoperability is unachievable, even in principle.  It isn't an either/or question.  We need to do both.  But an argument that we shouldn't plug a leak in the boat because there is already water in the hold is disingenuous.

That isn't to say that there are not some pointless posts in this debate on the TC list, and even in some blogs.  But the overall topic is important and the time spent discussing it is not wasted.

Rob Weir United States |

2/23/2009 6:51:35 PM #

Doug Mahugh

Yes, a 9-2-2 vote is quite a strong statement, even if 5 of those votes came from IBM and Sun alone: there is consensus on the TC to approve the committee draft and move forward with the process of creating ODF 1.2.

Let's not confuse that vote with consensus on anything related to conformance, however: on that topic, there has never been a vote taken by the ODF TC.  And based on comments on the TC mailing list (up through and including today), it's not clear that there is any consensus on this topic.

Doug Mahugh |

2/25/2009 1:09:23 AM #

Rick Jelliffe

Alex: You mention Microsoft's failed attempt to 'embrace and extend' Java in J++ graphics, but you don't mention Eclipse's *successful* attempt in SWT.

Why the bias? Do you think that open source projects are necessarily immune from NIH? I don't think it has anything to do with open or closed-source, nor the corporate affiliations. When a standard gets too monolithic, people will only implement chunks and replace the other parts with suitable technologies from their own technology stream.

P.S. Actually, neither J++ nor SWT were 'embrace, extend and extinguish'; they were downright replacement of a major component. And I would argue that the fact that two major competitors found that Swing was too far from their legacy base to be regarded as a step forwards shows that Java was underlayered, and that in fact the monolithic nature of WORA meant that pure Java only addressed one set of needs, not the full market range. Java has ended up fragmenting (Java ME, Java, Java+SWT, etc.) regardless of WORA.

P.P.S. I am interested in what Rob meant by "such techniques tend to break under editing". Most extensions that an application did not understand would be stripped out: I don't get how these would break anything. Being stripped out cannot be considered "breaking", surely?

Rick Jelliffe United States |

2/25/2009 7:12:23 AM #

Alex

@Rick

> Why the bias?

;-) Well, you know what an IBM apologist I am.

Actually, I wasn't necessarily passing judgement here ... just trying to understand the thinking that might be behind the current positions.

I'd forgotten about SWT; has it really become entrenched? I sort of feel ambivalent towards SWT since it made (for me) Java capable of sporting an acceptably responsive UI (in Eclipse).

- Alex.

Alex United Kingdom |

2/25/2009 11:18:28 AM #

Alan Bell

Extend is an emotive word, being part 2 of 3 when we are still waiting for Embrace to happen. Putting that aside and concentrating on the substance of the issue, though, it seems to me that one way to get an objective meaning for conformance in the context of an extendible standard would be for the extension not to be permitted to break if the rest of the document changes around it. Thus an application could read an ODF document, see there is an ugly mess embedded (which might not be particularly evil, like KOffice using MusicML), it could still allow the user to make arbitrary changes to the sensible parts then save the document along with the ugly mess that the application does not understand. It should then open up just fine in an ODF consuming application that does understand the ugly mess. If the ugly mess gets broken because it interacts with the rest of the document then that ugly mess was not a conforming extension. If the ugly mess is no longer in the document or turns up in a place that the user does not expect it to be then the application is not behaving correctly.
An alternative strategy that looks tempting is the MIME approach where multiple renderings of the content can be included and the consumer reads the one they prefer, generally emails with both HTML and Text parts. This doesn't work very well for documents that need to be edited and sent back, unless that segment is protected from editing. For example a bit of musicML might be represented as text with letters for notes like AACC#BBD (no idea what that sounds like) and also a graphical representation of staves and notes drawn with SVG and also the musicML which can be played through a synthesiser. It would not be possible to edit this component without an application that understood all the multipart elements and could keep them all in sync, although it would be possible for an application to make a best effort at displaying the content for the user. I really don't like this strategy very much as it introduces a lot of redundancy and decreases interoperability of editing.

Alan Bell United Kingdom |

2/26/2009 9:29:15 AM #

Alan Bell

Changed my mind. I think that the standard should not be extensible. If applications want to push out the boundaries of what they can do then they are not using the standard, they are perhaps proposing a new feature for it. If an application wants to include MusicXML (not MusicML which is something a bit different I think) then they can propose it for the next version of the standard. (and they can do some development work to prepare for it and have the application save in OpenDocumentFormat-BuggeredUp format.) The standard should lead if it is to have any point for enhancing interoperability.
I think the real point for debate is what to call the buggered up documents.
“Conforming OpenDocument Extended Document” doesn't really do it for me. "Buggered up" or "bastardised" is a bit more clear, but is not typical ISO vocabulary. Users should be aware that they are not using a released version of the standard, in software terms something more akin to a beta or alpha of the standard. Or even a nightly build of a branch that has not and might never be committed to the main repo.

Alan Bell United Kingdom |

2/26/2009 1:32:40 PM #

Rick Jelliffe

Alan wrote: "If applications want to push out the boundaries of what they can do then they are not using the standard, they are perhaps proposing a new feature for it."

Huh? Isn't that circular? If the standard allowed extensibility, then they are using the standard even when they extend it. If the standard disallows extensibility, they are not using it when they extend.

You are assuming utterly generic applications, aren't you? For example, let's say I have a DOCBOOK editor that saves as ODF but with an extra attribute to say which docbook element was originally used to mark the corresponding ODF up. So that if the ODF was re-opened that information would be intact, but if it were opened by some other editor, that information would be ignored and even stripped.

Such an attribute is not something that you would ever expect OASIS ODF TC to add. But why would you want to block it?

If the aim is to make ODF as universal a format as possible, it means that it needs to be adoptable by as many different applications as possible, not just office suites. And that means it needs to be able round-trip metadata from other domain areas too.

I have a blog item at
broadcast.oreilly.com/.../...m-open-documents.html
on what I think is an approach that would be workable and meet the kinds of goals I have heard expressed.

(And, this is coming from the POV that I don't mind the conformance requirements in the current 1.2 committee draft because the problem would be if conforming ODF applications were allowed to reject documents with extensions: that would be a real problem. But I certainly agree with Alex that this looks like a misdirected effort, which does nothing but raise suspicion.)

Rick Jelliffe Australia |

2/27/2009 6:42:42 AM #

Alan Bell

Rick Jelliffe wrote: "Huh? Isn't that circular? If the standard allowed extensibility, then they are using the standard even when they extend it. If the standard disallows extensibility, they are not using it when they extend."
No, I don't think it is circular, and I am thinking of fairly generic applications: one of my use cases is a document management system that can dip into the documents it manages to pull out interesting bits of content (maybe a value in a particular cell of a spreadsheet) for workflow routing etc.
Round tripping metadata doesn't sound that bad to me, but may require more thought. A receiving application must be able to edit the document without invalidating the metadata; I am not quite sure if this is always the case with your docbook example.
Interestingly as you brought up Docbook I noticed this is an already thought about and solved issue:
www.docbook.org/tdg/en/html/ch05.html#s-notdocbook

Alan Bell United Kingdom |


Legal

The author's views contained in this weblog are his, and not necessarily of any organisation. Third-party contributions are the responsibility of the contributor.

This weblog’s written content is governed by a Creative Commons Licence.
