A Bing v Google Moment

by Alex Brown 15. September 2010 16:24

Trying to download the latest version of OpenOffice.org™ I typed "openoffice" into Bing, and was surprised to get back a page of results which did not contain the official OpenOffice.org site. Google however, returned it as the top result.

Hmm, maybe not specific enough. I try entering "openoffice.org" into Bing. Same thing. Google again returns the official OpenOffice.org site as the top result.

Curious now, I enter "free office download" into Google and again get the OpenOffice.org site. Performing this with Bing I'm given a page for Microsoft® Office™ downloads and add-ins.

Search neutrality? pah - these two search engines have a very different view of the web it seems!

Microsoft Fails the Standards Test

by Alex Brown 31. March 2010 14:47

The second anniversary of the approval of ISO/IEC 29500 (aka OOXML) is upon us. The initial version of OOXML (Ecma 376 1st Edition) was rejected by ISO and IEC members in September 2007, and it was only after extensive revisions and a bitter standards war in the following months that a revised format was finally approved on April 2, 2008.

The key breakthrough of the revision process was the splitting of the specification into two variant versions, called “Strict” and “Transitional”. The National Bodies confined all the technologies they found unacceptable to the Transitional format and dictated text to be included in the standard intended to prohibit its further use:

“The intent […] is to enable a transitional period during which existing binary documents being migrated to DIS 29500 can make use of legacy features to preserve their fidelity, while noting that new documents should not use them. […]

This annex is normative for the current edition of the Standard, but not guaranteed to be part of the Standard in future revisions. The intent is to enable the future DIS 29500 maintenance group to choose, at a later date, to remove this set of features from a revised version of DIS 29500.”

I was convinced at the time, and remain convinced today, that the division of OOXML into Strict and Transitional variants was the innovation which allowed the Standard to pass. Enough National Bodies could then vote in good conscience for OOXML knowing that their preferred, Strict, variant would be under their control into the future while the Transitional variant (which – remember – they had effectively rejected in 2007) would remain purely for the purpose of accurately specifying old documents: a useful aim in itself.

Promises and reality

Just before the final votes were counted, Microsoft made some commitments. Mr Chris Capossela (Senior VP, Microsoft Office) wrote an open letter promising what would happen if the Standard passed. Two years on, we can fill out a report card for a couple of these promises and determine how well Microsoft is doing …

Microsoft's promise on standards support in products


“We've listened to the global community and learned a lot, and we are committed to supporting the Open XML specification that is approved by ISO/IEC in our products.”

On this count Microsoft seems set for failure. In its pre-release form Office™ 2010 supports not the approved Strict variant of OOXML, but the very format the global community rejected in September 2007, and subsequently marked as not for use in new documents – the Transitional variant. Microsoft are behaving as if the JTC 1 standardisation process never happened, and using technologies (like VML) in a new product which even the text of the Standard itself describes as “deprecated” and “included […] for legacy reasons only” (see ISO/IEC 29500-1:2008, clause M.5.1).

Knowledgeable experts present at the Ballot Resolution Meeting, knowing what Microsoft planned, have publicly repeated the International consensus position in alarm. XML Standards guru Rick Jelliffe (an Australian delegate at the meeting) wrote:

“If [Microsoft’s] default format is OOXML Transitional, then they have abandoned support for an Open Standards process: OOXML was only made a standard because of the changes that were made at the BRM. The original ECMA version of OOXML (which is the basis of Transitional) was soundly rejected, let no-one forget.”

And Danish expert and BRM delegate Jesper Lund Stocholm, running an analysis of Office 2010 files wrote:

“It has been the fear of many that Microsoft will never, ever care at all about the strict conformance clause of ISO/IEC 29500, and my tests clearly [are] a sign that they were right.”

Microsoft, however, takes a different view from the independent experts. Their representatives will argue (with some justification) that terms like “legacy”, “deprecated”, and “new document” are tricky to define, but then this argument extends to the bizarre assertion that the Strict variant need never be supported. I believe, however, countries expect a more reasonable, plain-dealing approach to their clearly expressed intent – not this kind of wheedling sophistry. Mr Capossela writes that Microsoft has “learned a lot”; but on the evidence before us now, this was wishful thinking.

Microsoft's promise on standards maintenance


“We are committed to the healthy maintenance of the standard once ratification takes place so that it will continue to be useful and relevant to the rapidly growing number of implementers and users around the world.”

It all started so well – defect reports came in from many national bodies and (via Ecma) from Microsoft themselves. A number of useful improvements were made to the text correcting obvious defects, and (in the Transitional variant) fixing some of the evident mismatches between what the Standard said, and what legacy documents actually contained.

But as time has gone on, the situation has deteriorated. At the recent Stockholm meetings corrections agreed at the February 2008 Ballot Resolution Meeting were still being implemented, and while fixes which were evidently required for Office 2010’s headline conformance behaviour have been given the red carpet treatment, some other reports from National Bodies have been left to languish. Unusually, in Stockholm one of SC 34’s working groups (WG 2) recommended to the plenary that the OOXML maintenance group (WG 4) be reminded to answer overdue defect reports – in the ISO world that counts as a diplomatic incident!

Most worrying of all, it appears that Ecma have ceased any proactive attempt to improve the text, leaving just a handful of national experts wrestling with this activity. It seems to me that Microsoft/Ecma believe 95% of the work has been done to ensure the standard is “useful and relevant”. Looking at the text, I reckon it is more like 95% that remains to be done, as it is still lousy with defects.

Ironically, the failure to resource maintenance properly is only going to damage Microsoft Office in the longer term. The simple validators developed by me (Office-o-tron) and by Jesper Lund Stocholm (ISO/IEC 29500 Validator) reveal, to Microsoft's dismay, that the output documents of the Office 2010 Beta are non-conformant, and that this is in large part due to glaring uncorrected problems in the text (e.g. contradictory provisions). It is also a worrying commentary on the standards-savviness of the Office developers that the first amateur attempts of part-time outsiders find problems with documents which Redmond’s internal QA processes have missed. I confidently predict that fuller validation of Office documents is likely to reveal many problems, both with those documents and with the Standard itself, over the coming years.

So – while maintenance is happening, I think calling it “healthy maintenance” would be over-optimistic given the current circumstances.

Someone has blundered?

Microsoft has many enemies who will no doubt see the current state of affairs as proof that Microsoft never even intended to be good standards citizens. Indeed standards and XML veteran Tim Bray, writing shortly after the standard’s approval, made a prediction which could now seem impressively prophetic:

“It’s Kind of Sad • The coverage suggests that future enhancements to 29500, as worked through by a subcommittee of a subcommittee of a standards committee, are actually going to have some influence on Microsoft. Um, maybe there’s an alternate universe in which Redmond-based program managers and developers are interested in the opinions of a subgroup of ISO/IEC JTC 1/SC 34, but this isn’t it.

I suppose they’ll probably show up to the meetings and try to act interested, but it’s going to be a sideline and nobody important will be there. What Microsoft really wanted was that ISO stamp of approval to use as a marketing tool. And just like your mother told you, when they get what they want and have their way with you, they’re probably not gonna call you in the morning.”

For me, the puzzle of it is that in many respects, Microsoft does appear to get it. Senior management seems to want standards conformance, as Mr Capossela’s letter demonstrates – indeed strategically, playing fair by standards has always seemed like the most obvious way for the corporation to extract itself from the regulatory thickets that have entangled it over the past decade. Microsoft employs many eminent and standards-aware people of unimpeachable record – they also obviously “get it”. And on the ground in the standards committees there are many delightful, talented and diligent people who seem fully-signed up to a standards-aware (dare I say “non-evil”?) approach—as the SC 34 meetings in Stockholm again recently evidenced.

And if we look elsewhere within Microsoft we can see – for example from their engagement with HTML 5 and work on MSIE – that they can move in the right direction when the will is there.

So why – given the awareness Microsoft has at the top, at the bottom, and round the edges – does it still manage to behave as it does? Something, perhaps, is wrong at the centre — some kind of corporate dysfunction caused by a failure of executive oversight.

But whether Microsoft senior management have directed the company to behave badly, or whether they have failed to control a bad corporate impulse, is ultimately of no interest or concern to the National Bodies engaged in Standardization: for them, the effect is the same. Some responses will, however, be necessary.

Moving forward

If Microsoft ship Office 2010 to handle only the Transitional variant of ISO/IEC 29500 they should expect to be roundly condemned for breaking faith with the International Standards community. This is not the format “approved by ISO/IEC”, it is the format that was rejected.

However, it is foolish to believe they won’t ship it as is – and before long the world will be faced with responding to that release. In my view moving forward from there will be difficult …

  • Governments, corporations, other large entities – in fact, anyone – procuring office systems with a requirement for standards-conformance need to have their eyes very wide open about precisely what they will be getting with systems that create new documents in extended Transitional ISO/IEC 29500.
  • Microsoft Office 2007 (the current version) reads and emits unextended Transitional ISO/IEC 29500, and so – strangely – may represent a high-water mark of Microsoft Office standards conformance. Anybody wanting to work just with documents which (modulo defects) are fully specified by Standards wholly under International control may want to stick with this version of the software.
  • Microsoft should make a public open commitment to support OOXML Strict fully. A service pack bringing this support to Office should be developed as a priority.
  • JTC 1 explicitly created the Transitional variant with the intention they would “at a later date, […] remove this set of features”. Now is the time to start the wheels in motion for this removal (the text will of course remain available for the perfectly good reason that the legacy needs to be documented).
  • Any assurances Microsoft has given to regulatory bodies (such as the EU Commission) about standards conformance must be looked at very carefully, giving full consideration to the circumstances of this release.
  • Ecma need to commit adequate resources to standards maintenance and pro-actively seek to improve the text, working together with SC 34, if there is any appetite to improve the Standard to the point where it can be a trouble-free, or even good, basis for interoperable office applications.

In short, we find ourselves at a crossroads, and it seems to me that without a change of direction the entire OOXML project is now surely heading for failure.

OOXML and Microsoft Office 2007 Conformance: a Smoke Test

by Alex Brown 28. March 2010 18:40

This is one in a series of popular blog articles I am re-publishing from the old Griffin Brown blog which is now closed down. This article is from April 2008. It is the same content as the original (except for some hyperlink freshening).

At the time of posting this entry caused quite a furore, even though its results were – to me anyway – as expected. Looking back I think what I wrote was largely correct, except I probably underestimated the difficulty of converting Microsoft Office to use the Strict variant of OOXML — this would require more than surgery just to the de-serialisation code!



I was excited to receive from Murata Makoto a set of the RELAX NG schemas for the (post-BRM) revision of OOXML, and thought it would be interesting to validate some real-world content against them, to get a rough idea of how non-conformant the standardisation of 29500 had made MS Office 2007.

Not having Office 2007 installed at work (our clients aren't using it – yet), the first problem is actually getting a reasonable sample for testing. Fortunately, the Ecma 376 specification itself is available for download from Ecma as a .docx file, and this hefty document is a reasonable basis for a smoke test ...

The main document ("document.xml") content for Part 4 of Ecma 376 weighs in at approx. 60MB of XML. Looking at it ... I'm sorry, but I'm not working on that size of document when it's spread across only two lines. Pretty-printing the thing makes it rather more usable, but pushes the file size up to around 100MB.
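The pretty-printing step can be sketched in a few lines of Python; a toy input stands in here for the real 60MB file, which would really call for a streaming approach:

```python
import xml.dom.minidom as minidom

# A document serialised on a single enormous line, as Office emits it
# (a tiny stand-in for the 60MB document.xml).
flat = '<w:document xmlns:w="ns"><w:body><w:p/></w:body></w:document>'

# Re-indent so the file becomes tractable in an editor; note the size grows,
# since every level of nesting now costs indentation whitespace.
pretty = minidom.parseString(flat).toprettyxml(indent="  ")
print(pretty)
```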

So we have a document and a RELAX NG schema. All that's necessary now is to use jing (or similar) and we can validate ...
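The same check jing performs can be sketched with lxml's RELAX NG support; the one-element schema below is a stand-in of my own, not the real OOXML grammars:

```python
from lxml import etree

# Stand-in RELAX NG schema: a single element with one required attribute.
# The real run validates document.xml against the (post-BRM) OOXML schemas.
rng = etree.RelaxNG(etree.fromstring("""\
<element name="doc" xmlns="http://relaxng.org/ns/structure/1.0">
  <attribute name="id"/>
</element>"""))

ok = rng.validate(etree.fromstring('<doc id="1"/>'))
bad = rng.validate(etree.fromstring('<doc unknown="x"/>'))  # unrecognised attribute
print(ok, bad)  # True False
```

Each failed `validate()` call leaves messages in `rng.error_log`, which is what produced the 17MB of diagnostics described below.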

Validating against the STRICT model

The STRICT conformance model is quite a bit different from Ecma 376, essentially because most of that format's most notorious features (non-ISO dates, compatibility settings like autospacewotnot, VML, etc.) have been removed. Thus the expectation is that existing Office 2007 documents might be some distance away from being valid according to the strict schemas.

Sure enough, jing emitted 17MB of invalidity messages (around 122,000 of them) when validating in this scenario. Most of them seem to involve unrecognised attributes or attribute values: I would expect a document which exercised a wider range of features to generate a more diverse set of error messages.

Validating against the TRANSITIONAL model

The TRANSITIONAL conformance model is quite a bit closer to the original Ecma 376. Countries at the BRM (rather more than Ecma, as it happened) were very keen to keep compatibility with Ecma 376 and to preserve XML structures at which legacy Office features could be targeted. The expectation is therefore that an MS Office 2007 document should be pretty close to valid according to the TRANSITIONAL schema.

Sure enough (again) the result is as expected: relatively few messages (84) are emitted, and they are all of the same type, complaining e.g. of the element:

<m:degHide m:val="on"/>

since the allowed attribute values for val are now "true", "false", etc. — this was one of the many tidying-up exercises performed at the BRM.
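A converter bringing legacy output into line with the revised schema would perform exactly this kind of value mapping, sketched here as a minimal lookup (the function name is mine, not from any shipping tool):

```python
# Post-BRM OOXML booleans are "true"/"false"/"1"/"0"; the legacy Ecma 376
# values "on"/"off" are the ones that now need mapping.
LEGACY_BOOL = {"on": "true", "off": "false"}

def normalise_bool(val: str) -> str:
    """Map a legacy boolean attribute value to its post-BRM equivalent."""
    return LEGACY_BOOL.get(val, val)

print(normalise_bool("on"), normalise_bool("1"))  # true 1
```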

Conclusions?

Such a test is only indicative, of course, but a few tentative conclusions can be drawn:

  • Word documents generated by today's version of MS Office 2007 do not conform to ISO/IEC 29500
  • Making them conform to the STRICT schema is going to require some surgery to the (de)serialisation code of the application
  • Making them conform to the TRANSITIONAL schema will require less of the same sort of surgery (since they're quite close to conformant as-is)

Given Microsoft's proven ability to tinker with the Office XML file format between service packs, I am hoping that MS Office will shortly be brought into line with the 29500 specification, and will stay that way. Indeed, a strong motivation for approving 29500 as an ISO/IEC standard was to discourage Microsoft from this kind of file format rug-pulling stunt in future.

What's next?

To repeat the exercise with ISO/IEC 26300:2006 (ODF 1.0) and a popular implementation of OpenDocument. Will anybody be brave enough to predict what kind of result that exercise will have?

Document Format Standards and Patents

by Alex Brown 9. March 2010 12:21

This post is part of an ongoing series. It expands on item 9 of Reforming Standardisation in JTC 1.

Background

Historically, patents have been a fraught topic with an uneasy co-existence with standards. Perhaps (within JTC 1) one of the most notorious recent examples surrounded the JPEG Standard and, in part prompted by such problems, there are certainly many people of good will wanting better management of IP in standards. Judging by some recent developments in document format standardisation, it seems probable that this will be the area where progress can next be made …

Most recently, the Fast Track standardisation of ISO/IEC 29500 in 2007/8 saw much interest in the IPR regime surrounding that text, with much dark suspicion surrounding Microsoft's motives. However, the big development in this space – when it came – was from an unexpected direction …

The i4i Patent

Back in the SGML days I remember touring the floor of trade shows and noticing the S4-Desktop product from Canadian company Infrastructures for Information, Inc (i4i). Like a number of other products at the time (including Microsoft's own long-forgotten SGML Author for Word, or Interleaf’s BladeRunner) it attempted to make Word™ a structure-aware authoring environment, based on the (accurate) belief that while many companies wanted structured data they didn't want to have to grapple with pointy brackets.

Keen to avoid the phenomenon that Rob Weir describes whereby

There is perhaps no occasion where one can observe such profound ignorance, coupled with reckless profligacy, as when a software patent is discussed on the web.

I will avoid any punditry about the ongoing legal course of this patent. Those interested would do well to read IP lawyer Andy Updegrove's post (and follow-up) on the legalities of this matter.

On the technical merits of the patent, though, there appears to me to be unanimity among disinterested experts qualified to judge. For example Jim Mason (for 22 years the chair of the ISO committee responsible for all-things-markup) commented:

[T]his technique did not originate with i4i. It was already established in other commercial products and was, in effect, standardized in ISO/IEC 8613, Office Document Architecture. ODA essentially described a binary format for word-processor document representation, which worked by pointers into a byte stream. Its original interchange format, ODIF, started as a representation of that structure, but it was extended to have an alternative SGML stream, exported by a process similar to that described in the i4i patent. So there was prior art, specifically prior art described in public standards.

This point was expanded on by markup veteran Rick Jelliffe, who concluded:

By the end of the judgment I was left thinking "what interactive XML system with any links wouldn't be included in this?" which is utterly ridiculous.

I was creating SGML systems from 1989, and the i4i patent is just as obvious then as it is now.

In a Guardian Interview i4i chairman Loudon Owen seemed to make it clear that the patent would not be licensed on a reasonable and non-discriminatory (RAND) basis (at least – or especially – where Microsoft are concerned):

On licensing to Microsoft, Owen sounds on the edge of anger: "No. No. This is our property. We are going to build our business. There's no right for Microsoft to use it and go forward." But i4i could license it at some humungous, eye-watering price that Microsoft might have to pay, surely? No, says Owen.

The Wider Context

As part of its amicus brief (PDF) in the Bilski case pending before the Supreme Court, IBM offered what might be termed the orthodox pro-patent position. In a section headed “Software Patent Protection Provides Significant Economic, Technological, and Societal Benefits” we thus find a footnote quoting this text:

Given the reality that software source code is human readable, and object code can be reverse engineered, it is difficult for software developers to resort to secrecy. Thus, without patent protection, the incentives to innovate in the field of software are significantly reduced. Patent protection has promoted the free sharing of source code on a patentee’s terms—which has fueled the explosive growth of open source software development.

While it is somewhat surprising to learn here of the affinity between FOSS and patents, the point is of course that the idea of patents is not wholly without foundation: that a state-sanctioned restraint of trade (for such is a patent) is justified in allowing innovators to monetize their inventions. However, increasingly when we listen to the voices of actual FOSS (and non-FOSS) people the view seems to be that any advantages are outweighed by the problems of patents. For example Mike Kay (developer of the superb Saxon family of XSLT, XQuery, and XML Schema processing products) in an open letter to his MP argues against software patenting in a piece which is well worth reading in its entirety:

The software business does not need incentives to innovate. If you don't innovate, you die. [...] [I]n the software business, patenting of ideas benefits no-one: certainly, it does not benefit society or the economy at large, which is the only possible justification for governments to interfere with the market and grant one company a monopoly over an idea.

And, in specific reference to the i4i patent:

recently an otherwise unsuccessful company has been awarded a similar [i.e. 9-figure] sum against Microsoft, for an idea which most people in the industry considered completely trivial and obvious.

More colourfully Tim Bray lists some horror-story cases (again well worth reading) and opines that the whole patent system is "too broken to be fixed". He also addresses the question of whether patent activity benefits society, and comes down firmly against:

And here are a few words for the huge community of legal professionals who make their living pursuing patent law: You’re actively damaging society. Look in the mirror and find something better to do.

The Myth of Unencumbered Technology

Given the situation we are evidently in, it is clear that no technology is safe. The brazen claims of corporations, the lack of diligence by the US Patent Office, and the capriciousness of courts means that any technology, at any time, may suddenly become patent encumbered. Technical people - being logical and reasonable - often make the mistake of thinking the system is bound by logic and reason; they assume that because they can see 'obvious' prior art, then it will apply; however as the case of the i4i patent vividly illustrates, this is simply not so.

Turning to document format standards, we can see there most certainly are known and suspected patents in play. For example:

  • the i4i patent mentioned above (which, in his Guardian interview, the i4i Chairman refuses to rule out as applying to ODF)
  • 45 unspecified patents which Microsoft has claimed OpenOffice.org infringes, some number of which may relate to the ODF specification (and which Sun and Microsoft agreed a cease-fire over until 2014 - at least as far as Sun is/was concerned)
  • an unknown number of unspecified patents which have led IBM to include ODF under its Interoperability Specifications Pledge
  • an unknown number of unspecified patents which have led Microsoft to include OOXML under its Open Specification Promise (though presumably clear OOXML-specific patents such as US Patent 7,676,746 are in scope here)

Now, as is clear from the above, large corporations have a preferred means of neutralising their IP stake in standards: by "promises", "covenants" and the like.

The question for standardizers remains: is the current situation acceptable? and if not, what can be done to improve it?

The ISO Rules (and Are They Followed?)

Since 2007 the "big three" International SDOs (ISO, IEC and ITU-T) have operated a common patent policy predicated on the wholly reasonable premise that standards should be "accessible to everybody without undue constraints". The policy is implemented in detail by JTC 1 (which joins the forces of ISO and IEC) and which – as we know – governs the International Standards ODF and OOXML.

The Policy as implemented in the Directives has several aspects, which I would categorise as falling under the following headings …

Personal Disclosure

Anybody aware of an IPR issue has a duty to speak out:

any party participating in the work of the Organizations should, from the outset, draw their [sic] attention to any known patent or to any known pending patent application, either their own or of other organizations. (ISO Directives Part 1, Clause 3)

And indeed committee secretaries and chairs are routinely reminded by Geneva to issue a request for IPR disclosure at meetings, to jog people's memory.

Formal Disclosure in Standards

Readers of Standards can expect to have the IPR/patent situation made explicit in the text before them, and accordingly there are many textual items mandated for Standards to which patents apply. In particular it is stated, "[a] published document for which patent rights have been identified during the preparation thereof, shall include the following notice in the introduction:"

The International Organization for Standardization (ISO) [and/or] International Electrotechnical Commission (IEC) draws attention to the fact that it is claimed that compliance with this document may involve the use of a patent concerning (…subject matter…) given in (…subclause…).

Centralised Record-keeping

A JTC 1 "patent database" (served as a huge HTML document) is maintained in Geneva which gathers together all the patents applying to published standards, and the terms under which patent holders have agreed to make licenses available.

Clear Access Rights

Patent Holders who have signed the licensing declaration to ISO, IEC or ITU-T agree to license their patents under a clear regime: either RAND, ZRAND (i.e. RAND with a free-of-charge license), or – exceptionally – on a per-case commercial basis. Anybody accessing the patent database is able to see this and, by referring to the ISO/IEC governing documents, know what it means, not least because no deviations from Geneva's wording are permitted:

the patent holder has to provide a written statement to be filed at ITU-TSB, ITU-BR or the offices of the CEOs of ISO or IEC, respectively, using the appropriate "Patent Statement and Licensing Declaration" Form. This statement must not include additional provisions, conditions, or any other exclusion clauses in excess of what is provided for each case in the corresponding boxes of the form.

Problem Handling

And if things go wrong:

2.14.3 Should it be revealed after publication of a document that licences under patent rights, which appear to cover items included in the document, cannot be obtained under reasonable and non-discriminatory terms and conditions, the document shall be referred back to the relevant committee for further consideration.

Unfortunately, when we hold up the big two document standards of ODF and OOXML against the goals set out, we see there is work still to be done …

Moving Forward

While the "broken stack" of patents is beyond repair by any single standards body, at the very least the correct application of the rules can make the situation for users of document format standards more transparent and certain. In the interests of making progress in this direction, it seems a number of points need addressing now.

  • Users should be aware that the various covenants and promises being pointed to by the US vendors need not be relevant to them as regards standards use. Done properly, International Standardization can give a clearer and stronger guarantee of license availability – without the caveats, interpretable points and exit strategies these vendors' documents invariably have.
  • In particular it should be of concern to NBs that there is no entry in JTC 1's patent database for OOXML (there is for DIS 29500, its precursor text, a ZRAND promise from Microsoft); there is no entry whatsoever for ODF. I would expect there to be declarations from the big US vendors who profess patent interests in these standards, and I would expect this to be addressed as a matter of urgency (perhaps in parallel with the publication of these standards' forthcoming amendments)
  • In the case of the i4i patent, one implementer has already commented that implementing CustomXML in its entirety may run the risk of infringement (and this is probably, after all, why Microsoft patched Word in the field to remove some aspects of its CustomXML support). OOXML needs to be referred back to its committee (this may be JTC 1, not SC 34) for a decision on what happens next. My personal guess is that CustomXML will be left in OOXML Transitional (patent-encumbrance will be just one more of the many warning stickers on this best-avoided variant), and modified in, or removed from, OOXML Strict
  • When declaring their patents to JTC 1, patent holders are given an option whether to make a general declaration about the patents that apply to a standard, or to make a particular declaration about each and every itemized patent which applies. I believe NBs should be insisting that patent holders enumerate precisely the patents they hold which they claim apply to ODF or OOXML, as this will give greater transparency about what is (or is not) covered and prevent the vague threat ("there may be patents but we're not saying what") which seems to apply at the moment.

There is obviously much to do, and I am hoping that at the forthcoming SC 34 meetings in Stockholm this work can begin. Certainly, anybody reading this blog post now knows there are outstanding IPR issues which we as standardizers have a duty to raise …

SC 34 meetings, Copenhagen

by Alex Brown 27. June 2009 14:57

This week I attended 4 days of meetings of SC 34 working groups: WG 4 (OOXML maintenance) and WG 5 (OOXML/ODF interoperability). Last year I predicted that OOXML would get boring and, on the whole, the content of these meetings fulfilled that prophecy (while noting, of course, that for us markup geeks and standards wonks “boring” is actually rather exciting). There was however, one hot issue, which I’ll come to later …

Defects

Since the publication of OOXML in November last year, the work of WG 4 has been almost exclusively focussed on defect correction. To date over 200 defects have been submitted (the UK leading the way, having submitted 38% of them). Anybody interested in what kinds of thing are being fixed can consult the material on the WG 4 web site for a quick overview. During the Copenhagen meeting WG 4 closed 53 issues, meaning that 71% of all submitted defects have now been resolved. By JTC 1 standards that is impressively rapid. The defects will now go forward to be approved by JTC 1 National Bodies before they can become official Amendments and Corrigenda to the base Standard. Among the many more minor fixes, a couple of agreed changes are noteworthy:

  • In response to a defect report from Switzerland, for the Strict version (only) of IS 29500, the XML Namespace has been changed, so that consumers can know unambiguously whether they are consuming a Strict or Transitional document without any risk of silent data loss. This is (editorially) a lot of work, but the results will, I think, be worthwhile.
  • As I wrote following the Prague meeting, there was a move afoot to re-instate the values “on” and “off” as permissible Boolean values (alongside “true”, “1”, “false” and “0”) so that Transitional documents would accurately reflect the existing corpus of Office documents, in accord with the stated scope of the standard. This change has now been agreed by the WG.
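The practical payoff of the Strict namespace change is that a consumer can classify a document before interpreting any of its content. A minimal sketch in Python of such namespace sniffing (the Transitional URI is the original Ecma-376 one; the Strict URI shown is my assumption of the agreed new value, and only WordprocessingML is checked):

```python
import xml.etree.ElementTree as ET

# Transitional keeps the original Ecma-376 namespace; the Strict URI below
# is an assumption for this sketch.
TRANSITIONAL_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
STRICT_NS = "http://purl.oclc.org/ooxml/wordprocessingml/main"

def detect_variant(document_xml: str) -> str:
    """Classify a WordprocessingML document part by its root namespace."""
    root = ET.fromstring(document_xml)
    # ElementTree encodes the namespace as "{uri}localname" in the tag.
    ns = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else ""
    if ns == STRICT_NS:
        return "strict"
    if ns == TRANSITIONAL_NS:
        return "transitional"
    return "unknown"
```

A consumer that sniffs this way can route or refuse a Strict document rather than silently misreading it as Transitional, which is precisely the risk the Swiss defect report was aimed at.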

The ISO Date Issue

The “hot issue” I referred to earlier is ISO dates. What better way to illustrate the problem we face than by using one of Denmark’s most famous inventions, the LEGO® brick …


f*cked-up lego brick
OOXML Transitional imagined as a LEGO® brick

More precisely, the problem is about date representation in spreadsheet cells. One of the innovations of the BRM was to introduce ISO 8601 date representation into OOXML. However, the details of how this has been done are problematic:

  1. Despite the text of the original resolution declaring that ISO 8601 dates should live alongside serial dates (for compatibility with older documents), one possible reading of the text today is that all spreadsheet cell dates have to be in ISO 8601 format.
  2. Having any such dates in ISO 8601 format is particularly problematic for Transitional OOXML, which is meant to represent the existing corpus of office documents. Since none of these existing documents contain ISO 8601 dates, having them here makes no sense.
  3. Even more seriously, if people start creating “Transitional” OOXML documents which contain ISO 8601 dates, then a huge installed base of software expecting Ecma-376 documents will silently corrupt that data and/or produce surprising results. (My concern here is more for things like big ERP and financial systems than for desktop apps like MS Office.) Hence the odd LEGO brick above: like those bricks, such files would embody an interoperability disaster.
  4. Even the idea of using ISO 8601 is pretty daft unless it is profiled (currently it is not). ISO 8601 is a big, complex standard with many options irrelevant to office documents: it would be far more sensible for OOXML to target a tiny subset of ISO 8601, such as that specified by W3C XML Schema Definition Language (XSD) 1.1 Part 2: Datatypes.
  5. Many date/time functions declared in SpreadsheetML appear to be predicated on date/time values being represented as serial values and not ISO 8601 dates. It is not clear whether the Standard as written makes any sense when ISO 8601 dates are used.
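To see why point 3 bites, recall how legacy spreadsheet dates work: a date is just a count of days from an epoch, stored in the cell as a plain number. A minimal sketch (assuming the 1899-12-30 epoch commonly used to reproduce Excel-style serials while absorbing the fictitious 1900-02-29 of the original 1900 date system):

```python
from datetime import date, timedelta

# Assumed epoch for Excel-compatible serial dates (valid for serials >= 61).
EPOCH = date(1899, 12, 30)

def serial_to_iso(serial: int) -> str:
    """Serial day-count -> ISO 8601 calendar date string."""
    return (EPOCH + timedelta(days=serial)).isoformat()

def iso_to_serial(iso: str) -> int:
    """ISO 8601 calendar date string -> serial day-count."""
    return (date.fromisoformat(iso) - EPOCH).days

print(serial_to_iso(40000))           # → 2009-07-06
print(iso_to_serial("2009-07-06"))    # → 40000
```

An Ecma-376 consumer expects the number 40000 in the cell; handed the string "2009-07-06" instead, it will either fail outright or coerce the text and carry on, which is exactly the silent-corruption scenario described above.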

The solution?

Opinions vary about how the ISO date problem might best be solved. My own strong preference would be to see the Standard clarified so that users were sure that Transitional documents were guaranteed to correspond to the existing document reality – in other words that Transitional documents only contain serial date representations, and not “ISO 8601” date representations. In my view the ISO dates should be for use in the Strict variant of OOXML only.

If a major implementation (Excel 2010, say) appears which starts pumping incompatible, ISO 8601-flavoured Transitional documents into circulation, then that would be an interop disaster. The standards process would be rightly criticised for producing a dangerous format; users would be at risk of corrupted data; and guilty vendors too would be blamed accordingly.

It is imperative that this problem is fixed, and fixed soon.

Impressions of Copenhagen

Our meetings took place at the height of midsummer, and every day was glorious (all meetings started with a sad closing of the curtains). Something of a pall was cast over proceedings by the thefts, in two separate incidents, of delegates’ laptop computers; but there is no doubt Copenhagen is a wonderful city blessed with excellent food, tasty beer, and an abundance of good-looking women. Dansk Standard provided most civilised meeting facilities, and entertainment chief Jesper Lund Stocholm worked hard to ensure everyone enjoyed Copenhagen to the full, especially the midsummer night witch-burning festivities! Next up is the SC 34 Plenary in Seattle in September; I’m sure there will be many more defect reports on OOXML to consider by then, and that WG 4's tireless convenor Murata-san will be keeping the pressure on to maintain the pace of fixes …


Jesper Among the Beers
Jesper Lund Stocholm

Notes on Document Conformance and Portability #3

by Alex Brown 10. May 2009 13:20

Now that the furore about Microsoft’s implementation of ODF spreadsheet formulas in Office SP2 has died down a little, it is perhaps worth taking a little time to have a calm look at some of the issues involved.

Clearly, this is an area where strong commercial interests are in play, not to mention an element of sometimes irrational zeal from those who consider themselves pro or anti (mostly anti) Microsoft.

One question is whether Microsoft did “The Right Thing” by users in choosing to implement formulas the way they did. This is certainly a fair question and one over which we can expect there to be some argument.

The fact is that Microsoft’s implementation decision means that, on the face of it, they have produced an implementation of ODF which does not interoperate with other available implementations. Thus IBM blogger Rob Weir can produce a simple (possibly simplistic) spreadsheet, “Maya’s Wedding Planner”, and use it to illustrate, with helpful red boxes for the slow-witted, that Microsoft’s implementation is a “FAIL” attributable to “malice or incompetence”. For good measure he also takes a side-swipe at Sun for their non-interoperable implementation. In this view, interoperability aligning with IBM’s Symphony implementation is – unsurprisingly – presented as optimal (in fact, you can hear the sales pitch from IBM now: “well, Mr government procurement officer, looks like Sun and MS are not interoperable, you won’t want these other small-fry implementations, and Google’s web-based approach isn’t suitable – so looks like Symphony is the only choice …”)

Microsoft have argued back, of course, most strikingly in Doug Mahugh’s 1 + 2 = 1? blog posting, which appears to present some real problems with basic spreadsheet interoperability among ODF products using undocumented extensions. The MS argument is that practical ODF interoperability is a myth anyway, and so supporting it meaningfully is not possible (in fact, you can hear the sales pitch from MS now: “well, Mr government procurement officer, looks like ODF is dangerously non-interoperable: here, let me show you how IBM and Sun can’t even agree on basic features; but look, we’ve implemented ISO standard formulas, so we alone avoid that – and you can assess whether we’re doing what we claim – looks like MS Office is the only choice …”)

Personally, I think MS have been disappointingly petty in abandoning the “convention” that the other suites more or less use. I accept that these ODF implementations have limited interoperability and are unsafe for any mission-critical data, but for the benefit of the “Maya’s Wedding Planner” type of scenario, where ODF implementations can actually cut it, I think MS should have included this legacy support as an option, even if they did have to qualify that support with warning dialogs about data loss and interoperability issues.

But - vendors are vendors; it is their very purpose to compete in order to maximise their long-term profits. Users don’t always benefit from this. We really shouldn’t be surprised that we have IBM, Sun and Microsoft in disagreement at this point.

What we should be surprised about is how this interoperability fiasco has been allowed to happen within the context of a standard. To borrow Rick Jelliffe’s colourfully reported words, the whole purpose of shoving an international standard up a vendor’s backside is to get them to behave better in the interests of the users. What has gone wrong here is in the nature of the standard itself. ODF offers an extremely weak promise of interoperability, and the omission of a spreadsheet formula specification in ODF 1.1 is merely one of the more glaring facets of this problem. As XML guru James Clark wrote in 2005:

I really hope I'm missing something, because, frankly, I'm speechless. You cannot be serious. You have virtually zero interoperability for spreadsheet documents.

To put this spec out as is would be a bit like putting out the XSLT spec without doing the XPath spec. How useful would that be?

It is essential that in all contexts that allow expressions the spec precisely define the syntax and semantics of the allowed expressions.

These words were prophetic, for we do now indeed face exactly that zero-interoperability reality.

The good news is that work is underway to fix this problem: ODF 1.2 promises, when it eventually appears, to specify formulas using the new OpenFormula specification. When that is published vendors will cease to have an excuse to create non-interoperable implementations, at least in this area.

Is SP2 conformant?

Whether Microsoft’s approach to ODF was the wisest is something over which people may disagree in good faith. Whether their approach conforms to ODF should be a neutral fact we can determine with certainty.

In a follow-up posting to his initial blast, Rob Weir sets out to show that Microsoft’s approach is non-conformant, in support of his earlier statement that “SP2's implementation of ODF spreadsheets does not, in fact, conform to the requirements of the ODF standard”. After quoting a few selected extracts from the standard, a list is presented showing how various implementations represent a formula:

  • Symphony 1.3: =[.E12]+[.C13]-[.D13]
  • Microsoft/CleverAge 3.0: =[.E12]+[.C13]-[.D13]
  • KSpread 1.6.3: =[.E12]+[.C13]-[.D13]
  • Google Spreadsheets: =[.E12]+[.C13]-[.D13]
  • OpenOffice 3.01: =[.E12]+[.C13]-[.D13]
  • Sun Plugin 3.0: [.E12]+[.C13]-[.D13]
  • Excel 2007 SP2: =E12+C13-D13

Rob writes, “I'll leave it as an exercise to the reader to determine which one of these seven is wrong and does not conform to the ODF 1.1 standard.”

Again, this is clearly aimed at the slow-witted. One can imagine even the most hesitant pupil raising their hand: “please Mr Weir, is it Excel 2007 SP2?” Rob, however, is too smart to answer the question himself, for anybody who knows anything of ODF will know that this is, in fact, a tricky question.
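Part of what makes the situation galling is that the syntactic gap in the list above is mechanically tiny. A naive sketch of bridging it (assuming only simple same-sheet A1-style references, and ignoring sheet names, ranges and string literals):

```python
import re

# Sketch only: real ODF formulas also allow sheet names, ranges and string
# literals, none of which are handled here.
_ODF_REF = re.compile(r"\[\.([A-Z]+[0-9]+)\]")

def odf_to_a1(formula: str) -> str:
    """Rewrite ODF-convention cell references like [.E12] into bare A1 style."""
    return _ODF_REF.sub(r"\1", formula)

def a1_to_odf(formula: str) -> str:
    """The reverse direction, equally naive."""
    return re.sub(r"\b([A-Z]+[0-9]+)\b", r"[.\1]", formula)

print(odf_to_a1("=[.E12]+[.C13]-[.D13]"))  # → =E12+C13-D13
```

That the transformation is trivial is rather the point: the question is not whether Excel's syntax could have matched the others', but whether the standard's text actually requires any particular syntax at all.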

Accordingly, Dennis Hamilton (ODF TC member and secretary of the ODF Interoperability and Conformance TC) soon chipped in among the blog comments to point out that ODF’s description of formulas is governed by the word “Typically”, rendering it arguably just a guideline. And, as I pointed out in my last post, it is certainly possible to read ODF as a whole as nothing more than a guideline.

(I am glad to be able to report that the word “typically” has been stripped from the draft of ODF 1.2, indicating its existence was problematic.)

Curious readers might like to look for themselves at the (normative) schema for further guidance. Here, we find the formal schema definition for formulas, with a telling comment:

<define name="formula">
  <!-- A formula should start with a namespace prefix, -->
  <!-- but has no restrictions -->
  <data type="string"/>
</define>

This is yet another confirmation that there are no certain rules about formulas in ODF.

So I believe Rob’s statement that “SP2's implementation of ODF spreadsheets does not, in fact, conform to the requirements of the ODF standard” is mistaken on this point. This might be his personal interpretation of the standard, but it is based on an ingenious reading (argued around the placement of a comma, and privileging certain statements over others), and should certainly give no grounds for complacency about the sufficiency of the ODF specification.

As an ODF supporter I am keen to see defects, such as conformance loopholes, fixed in the next published ODF standard. I urge all other true supporters to read the drafts and give feedback to make ODF better for the benefit of everyone, next time around.

SC 34 Meetings, Prague, Days 2, 3 & 4

by Alex Brown 9. April 2009 09:12
What a Difference a Day Makes
MURATA Makoto, WG 4 convenor,
seems pleased with progress on the maintenance of OOXML

I had been intending to write a day-by-day account of these meetings but as it turned out there simply was not time for blogging (tweeting, on the other hand …). Another activity which suffered was photography: I had wanted to take a load of pictures of über-photogenic Prague – but somehow the work seemed to expand into all my notionally free time too. What I did manage to snap is here. And it is also well worth checking out Doug Mahugh’s photos for more.

WG 1 met again on Tuesday to finish its business (you can read the meeting report here) and then my attention turned mostly to the activities in WG 4 (for OOXML maintenance) and WG 5 (for OOXML/ODF interop).

OOXML One Year On

Overall, WG 4 is making excellent progress. To date, 169 defects have been collected for OOXML (check out the defect log) and the majority of these have either been closed, or a resolution agreed. Amendments were started for 3 of the 4 parts of OOXML to allow a bunch of small corrections to be put in place, and the even-more-minor problems will be fixed by publishing technical corrigenda. Overall, I think stakeholders in OOXML can feel pretty confident that the Standard is being sensibly and efficiently maintained.

I was personally very pleased to see National Bodies well-represented (the minutes are here) – to the extent that I’d now ideally like to see some more big vendors coming to the table so their views can be heard. Microsoft (of course) was there; but where (for example) are Apple, Oracle and the other vendors who participated in Ecma TC 45 while OOXML was being drafted? To them – and to anybody who wants to get involved in this important work – I say: participate!

Over at Rick Jelliffe’s blog Rick has been carrying out something of an exposé of the unfortunate imbalance in the stakeholders represented in the maintenance of ODF at OASIS (something which will become even more acute if Sun is, in the end, snapped-up by IBM). Personally I think Rick is right that it is vitally important to have a good mix of voices at the standardisation table: big vendors, small vendors, altruistic experts, users, government representatives, etc. WG 4 is getting there, but it too has some way to go.

Tougher Issues

Some more controversial topics were however not resolved during the meeting, and I think it may be worth exploring these in more detail:

The Namespace Problem

One issue in particular has proved thorny: whether to change the Namespaces in the OOXML schemas. This topic took a good slice of Tuesday, and then segued into a bar session afterwards; this then carried on over supper and by the time we broke it was midnight. And still no conclusion has been reached. WG 4 has issued a document outlining the state of discussions to date …

I have already expressed my own views on this; but during these Prague meetings some important new considerations were brought to bear. Go over to Jesper Lund Stocholm’s blog to get the thorny details.

Personally, I think the “strict” format is a new format, and that changing the Namespace is only part of the solution. I would like to see the media type changed, and for OOXML to recommend a new file extension (.dociso anyone?) to reduce the chances that users suffer the (to me) unacceptable fate of silent data loss that Jesper highlights.

Whither Transitional?

Another hot topic of discussion is what the “transitional” version of OOXML is really for. One interesting and slightly surprising fact that emerged after the BRM is that the strict schemas are a true subset of the transitional schemas. Should this nice link be preserved? Microsoft are keen for new features introduced in the “strict” version of OOXML to be mirrored in the “transitional” version – presumably, in part, because Office 14 will use transitional features.

Openness

When the maintenance of OOXML was being planned, one of the principles agreed on by the National Bodies was that the process should be as open as possible, consistent with JTC 1 rules. One aspect of this is the question of whether WG 4’s mailing list archive should be open to the public. Some NBs were a little nervous of this for the reason that their committee members might be less free to post candid comments if they were open to public scrutiny, and possible repercussions with their boss and/or the tinfoil brigade in the blogosphere. There was also the troubling precedent of the mailing list of the U.S. INCITS V1 committee, which opened its archive to public view during the DIS 29500 balloting period, only to see it die completely as contributors refused to post in public view.

The issue will now be put to public ballot, and I am hopeful that mechanisms have been put in place which will allow NBs to support opening of the archive. With public standards, public meeting reports, public discussion documents and a public mailing list archive I think WG 4 will demonstrate that an excellent degree of openness is indeed possible even within the constraints of the current JTC 1 Directives.

Overturning BRM decisions

The UK proposed an interesting new defect during the Prague meetings, which centred on one of the decisions made at the BRM.

Nature of the Defect:

As a result of changes made at the BRM, a number of existing Ecma-376 documents were unintentionally made invalid against the IS29500 transitional schema. It was strongly expressed as an opinion at the BRM by many countries that the transitional schema should accurately reflect the existing Ecma-376 documents.

However, at the BRM, the ST_OnOff type was changed from supporting 0, 1, On, Off, True, False to supporting only 0, 1, True, False (i.e. the xs:boolean type). Although this fits with the detail of the amendments made at the BRM, it is against the spirit of the desired changes for many countries, and we believe that due to time limitations at the BRM, this change was made without sufficient examination of the consequences, was made in error by the BRM (in which error the UK played a part), and should be fixed.

Solution Proposed by the Submitter

Change the ST_OnOff type to support 0, 1, On, Off, True and False in the Transitional schemas only.
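For illustration, the pre- and post-BRM value spaces can be compared with a small parser sketch (lowercase tokens are assumed here, as attribute values in documents are typically lowercase; the token sets themselves are as given in the defect report):

```python
# Post-BRM value space: xs:boolean only.
XS_BOOLEAN = {"true": True, "1": True, "false": False, "0": False}
# Ecma-376 value space, which the UK proposes restoring for Transitional.
ECMA_376 = {**XS_BOOLEAN, "on": True, "off": False}

def parse_on_off(value: str, legacy: bool = True) -> bool:
    """Read an ST_OnOff attribute value against either value space."""
    tokens = ECMA_376 if legacy else XS_BOOLEAN
    if value not in tokens:
        raise ValueError(f"not a valid ST_OnOff value: {value!r}")
    return tokens[value]
```

Under the post-BRM schema (legacy=False) an existing document carrying val="on" is simply invalid; restoring the two extra tokens in the Transitional schemas makes it valid again without touching Strict.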

The result of the BRM decision being addressed here was apparent in a blog entry I wrote last year, which attracted rather a lot of attention.

Simply put, the UK is now suggesting the BRM made a mistake here, and things should be rectified so that existing MS Office documents “snap back” into being in conformance with 29500 transitional.

This proposal caused some angst. Who were we (some asked) to overturn decisions made at the BRM? My own view is less cautious: this was an obvious blunder, the BRM got it wrong (as it did many things, I think). So let’s fix it.

Whither WG 5?

In parallel with WG 4, WG 5 (the group responsible for ODF/OOXML interoperability) also met. One of the substantive things it achieved was to water down the title of the ongoing report being prepared on this topic, changing it from:

OpenDocument Format (ISO/IEC 26300) / Office Open XML (ISO/IEC 29500) Translation

to:

OpenDocument Format (ISO/IEC 26300) / Office Open XML (ISO/IEC 29500) Translation Guidelines

Adding the word “guidelines” to the title should make it clear to anybody noticing this project that it is not an “answer” to ODF/OOXML interoperability, merely a discursive document. For myself, I have doubts about the ultimate usefulness of such a document.

It is disappointing to see the poor rate of progress on meaningful interoperability and harmonisation work. Of course these things are motherhood and apple pie in discussion – but when the time comes to find volunteers to actually help, few hands go up. In my view, the only hope of achieving any meaningful harmonisation work is to get Another Big Vendor interested in backing it, and I know some behind-the-scenes work will be taking place to beat the undergrowth and see if just such a vendor can be found.

SC 34 Meetings, Okinawa - Days 3 & 4

by Alex Brown 29. January 2009 09:15
Hotel Walkway at Sunset

Two days of hard grind. A lot of administrivia to sort (meeting dates, etc.); many paragraphs of the Directives to read; many defect reports on OOXML to address; and some vigorous discussion to be had about interoperability.

Some concrete progress was made, notably:

  • The first defect reports on ISO/IEC 29500 (aka OOXML) were addressed, and fixes agreed
  • Some principles were established for how updates (as opposed to fixes) to OOXML might be processed
  • Some useful discussions in WG 5 clarified the scope of the ongoing work drafting a technical report giving guidance on how 29500 (OOXML) and 26300 (ODF) can interoperate

From my perspective, the most exciting discussion during these meetings centred on a presentation from the ODF editor, Patrick Durusau, on what he called “true” interoperability. Patrick (betraying his Topic Maps background) set out a suggestion that a PSI (Published Subject Identifier) might be created to identify each of the document constructs described by the two document format Standards, and that each PSI might in turn be associated with metadata and documentation related to that construct. Essentially, this approach views the “problem” of interoperability between ODF and OOXML as a problem of documentation — though Patrick also pointed out that the interoperability problem had already been solved by corporations (maybe he meant Microsoft, for example) and that these corporations were, perhaps churlishly, keeping the information to themselves.

I see the establishment of such rich descriptive material as being a first important step on a road which leads to the dissolving of what we currently see as meaningful differences between the document formats. Perhaps in time the rest of the world will come to realise too that when we talk of a preference for ODF and OOXML we are, in the main, expressing a preference for syntax, and that the juvenile “OOXML vs ODF” arguments – however much they are loaded with corporate agendas masquerading as moral superiority – will achieve precisely nothing for those who matter: the end users.

About the author

Alex Brown


Legal

The author's views contained in this weblog are his, and not necessarily those of any organisation. Third-party contributions are the responsibility of the contributor.

This weblog’s written content is governed by a Creative Commons Licence.



