Microsoft Fails the Standards Test

The second anniversary of the approval of ISO/IEC 29500 (aka OOXML) is upon us. The initial version of OOXML (Ecma 376 1st Edition) was rejected by ISO and IEC members in September 2007, and it was only after extensive revisions and a bitter standards war in the following months that a revised format was finally approved on April 2, 2008.

The key breakthrough of the revision process was the splitting of the specification into two variant versions, called “Strict” and “Transitional”. The National Bodies confined all the technologies they found unacceptable to the Transitional format and dictated text to be included in the standard intended to prohibit its further use:

“The intent […] is to enable a transitional period during which existing binary documents being migrated to DIS 29500 can make use of legacy features to preserve their fidelity, while noting that new documents should not use them. […]

This annex is normative for the current edition of the Standard, but not guaranteed to be part of the Standard in future revisions. The intent is to enable the future DIS 29500 maintenance group to choose, at a later date, to remove this set of features from a revised version of DIS 29500.”

I was convinced at the time, and remain convinced today, that the division of OOXML into Strict and Transitional variants was the innovation which allowed the Standard to pass. Enough National Bodies could then vote in good conscience for OOXML knowing that their preferred, Strict, variant would be under their control into the future while the Transitional variant (which – remember – they had effectively rejected in 2007) would remain purely for the purpose of accurately specifying old documents: a useful aim in itself.

Promises and reality

Just before the final votes were counted, Microsoft made some commitments. Mr Chris Capossela (Senior VP, Microsoft Office) wrote an open letter promising what would happen if the Standard passed. Two years on, we can fill out a report card for a couple of these promises and determine how well Microsoft is doing …

Microsoft's promise on standards support in products


“We've listened to the global community and learned a lot, and we are committed to supporting the Open XML specification that is approved by ISO/IEC in our products.”

On this count Microsoft seems set for failure. In its pre-release form Office™ 2010 supports not the approved Strict variant of OOXML, but the very format the global community rejected in September 2007, and subsequently marked as not for use in new documents – the Transitional variant. Microsoft are behaving as if the JTC 1 standardisation process never happened, and using technologies (like VML) in a new product which even the text of the Standard itself describes as “deprecated” and “included […] for legacy reasons only” (see ISO/IEC 29500-1:2008, clause M.5.1).

Knowledgeable experts present at the Ballot Resolution Meeting, knowing what Microsoft planned, have publicly repeated the International consensus position in alarm. XML Standards guru Rick Jelliffe (an Australian delegate at the meeting) wrote:

“If [Microsoft’s] default format is OOXML Transitional, then they have abandoned support for an Open Standards process: OOXML was only made a standard because of the changes that were made at the BRM. The original ECMA version of OOXML (which is the basis of Transitional) was soundly rejected, let no-one forget.”

And Danish expert and BRM delegate Jesper Lund Stocholm, running an analysis of Office 2010 files wrote:

“It has been the fear of many that Microsoft will never, ever care at all about the strict conformance clause of ISO/IEC 29500, and my tests clearly [are] a sign that they were right.”

Microsoft, however, takes a different view to the independent experts. Their representatives will argue (with some justification) that terms like “legacy”, “deprecated”, and “new document” are tricky to define, but then this argument extends to the bizarre assertion that the Strict variant need never be supported. I believe, however, countries expect a more reasonable, plain-dealing approach to their clearly expressed intent – not this kind of wheedling sophistry. Mr Capossela writes that Microsoft has “learned a lot”; but on the evidence before us now, this was wishful thinking.

Microsoft's promise on standards maintenance


“We are committed to the healthy maintenance of the standard once ratification takes place so that it will continue to be useful and relevant to the rapidly growing number of implementers and users around the world.”

It all started so well – defect reports came in from many national bodies and (via Ecma) from Microsoft themselves. A number of useful improvements were made to the text correcting obvious defects, and (in the Transitional variant) fixing some of the evident mismatches between what the Standard said, and what legacy documents actually contained.

But as time has gone on, the situation has deteriorated. At the recent Stockholm meetings corrections agreed at the February 2007 Ballot Resolution Meeting were still being implemented, and while fixes which were evidently required for Office 2010’s headline conformance behaviour have been given the red carpet treatment, some other reports from National Bodies have been left to languish. Unusually, in Stockholm one of SC 34’s working groups (WG 2) recommended to the plenary that the OOXML maintenance group (WG 4) be reminded to answer overdue defect reports – in the ISO world that counts as a diplomatic incident!

Most worrying of all, it appears that Ecma have ceased any proactive attempt to improve the text, leaving just a handful of national experts wrestling with this activity. It seems to me that Microsoft/Ecma believe 95% of the work has been done to ensure the standard is “useful and relevant”. Looking at the text, I reckon it is more like 95% that remains to be done, as it is still lousy with defects.

Ironically, the failure to resource maintenance properly is only going to damage Microsoft Office in the longer term. The simple validators developed by me (Office-o-tron) and by Jesper Lund Stocholm (ISO/IEC 29500 Validator) reveal, to Microsoft's dismay, that the output documents of the Office 2010 Beta are non-conformant, and that this is in large part due to glaring uncorrected problems in the text (e.g. contradictory provisions). It is also a worrying commentary on the standards-savvyness of the Office developers that the first amateur attempts of part-time outsiders find problems with documents which Redmond’s internal QA processes have missed. I confidently predict that fuller validation of Office document is likely to reveal many problems both with those documents, and with the Standard itself, over the coming years.

So – while maintenance is happening, I think calling it “healthy maintenance” would be over-optimistic given the current circumstances.

Someone has blundered?

Microsoft has many enemies who will no doubt see the current state of affairs as proof that Microsoft never even intended to be good standards citizens. Indeed standards and XML veteran Tim Bray, writing shortly after the standard’s approval, made a prediction which could now seem impressively prophetic:

“It’s Kind of Sad • The coverage suggests that future enhancements to 29500, as worked through by a subcommittee of a subcommittee of a standards committee, are actually going to have some influence on Microsoft. Um, maybe there’s an alternate universe in which Redmond-based program managers and developers are interested in the opinions of a subgroup of ISO/IEC JTC 1/SC 34, but this isn’t it.

I suppose they’ll probably show up to the meetings and try to act interested, but it’s going to be a sideline and nobody important will be there. What Microsoft really wanted was that ISO stamp of approval to use as a marketing tool. And just like your mother told you, when they get what they want and have their way with you, they’re probably not gonna call you in the morning.”

For me, the puzzle of it is that in many respects, Microsoft does appear to get it. Senior management seems to want standards conformance, as Mr Capossela’s letter demonstrates – indeed strategically, playing fair by standards has always seemed like the most obvious way for the corporation to extract itself from the regulatory thickets that have entangled it over the past decade. Microsoft employs many eminent and standards-aware people of unimpeachable record – they also obviously “get it”. And on the ground in the standards committees there are many delightful, talented and diligent people who seem fully-signed up to a standards-aware (dare I say “non-evil”?) approach—as the SC 34 meetings in Stockholm again recently evidenced.

And if we look elsewhere within Microsoft we can see – for example from their engagement with HTML 5 and work on MSIE – that they can move in the right direction when the will is there.

So why – given the awareness Microsoft has at the top, at the bottom, and round the edges – does it still manage to behave as it does? Something, perhaps, is wrong at the centre — some kind of corporate dysfunction caused by a failure of executive oversight.

But whether Microsoft senior management have directed the company to behave badly, or whether they have failed to control a bad corporate impulse, is ultimately of no interest or concern to the National Bodies engaged in Standardization: for them, the effect is the same. Some responses will, however, be necessary.

Moving forward

If Microsoft ship Office 2010 to handle only the Transitional variant of ISO/IEC 29500 they should expect to be roundly condemned for breaking faith with the International Standards community. This is not the format “approved by ISO/IEC”, it is the format that was rejected.

However, it is foolish to believe they won’t ship it as is – and before long the world will be faced with responding to that release. In my view moving forward from there will be difficult …

  • Governments, corporations, other large entities – in fact, anyone – procuring office systems with a requirement for standards-conformance need to have their eyes very wide open about what precisely they will be getting with systems which create new documents which are extended Transitional ISO/IEC 29500.
  • Microsoft Office 2007 (the current version) reads and emits unextended Transitional ISO/IEC 29500, and so – strangely – may represent a high-water mark of Microsoft Office standards conformance. Anybody wanting to work just with documents which (modulo defects) are fully specified by Standards wholly under International control may want to stick with this version of the software.
  • Microsoft should make a public open commitment to support OOXML Strict fully. A service pack bringing this support to Office should be developed as a priority.
  • JTC 1 explicitly created the Transitional variant with the intention they would “at a later date, […] remove this set of features”. Now is the time to start the wheels in motion for this removal (the text will of course remain available for the perfectly good reason that the legacy needs to be documented).
  • Any assurances Microsoft has given to regulatory bodies (such as the EU Commission) about standards conformance must be looked at very carefully giving full consideration to the circumstances of this release.
  • Ecma need to commit adequate resources to standards maintenance and pro-actively seek to improve the text, working together with SC 34, if there is any appetite to improve the Standard to the point where it can be a trouble-free, or even good, basis for interoperable office applications.

In short, we find ourselves at a crossroads, and it seems to me that without a change of direction the entire OOXML project is now surely heading for failure.

Defect-o-tron ?

At recent SC 34 standards meetings, I've been thinking it would be useful to have XML models for all the documents we habitually have to deal with: agendas, meeting reports, proposals, defect reports and so on.

One particular focus for me at the moment is defect reports. Currently, I prepare these as XML documents conforming to the following home-brewed schema:

namespace xh = "http://www.w3.org/1999/xhtml"

start = comment-batch
any-xhtml =
  (element xh:* { any-xhtml }
   | attribute * { text }
   | text)+
comment-batch =
  element comment-batch {
    (attribute xml:lang { token }
     & attribute xml:id { token }),
    spec,
    submitter,
    comment+
  }
spec = element spec { text }
submitter =
  element submitter {
    attribute type { token },
    text
  }
comment =
  element comment {
    attribute xml:id { token },
    type,
    document,
    (clause | corrigendum),
    page?,
    problem,
    proposal?
  }
type = element type { "G" | "T" | "E" | "C" }
clause = element clause { text }
corrigendum = element corrigendum { text }
problem = element problem { any-xhtml }
proposal = element proposal { any-xhtml }
document = element document { text }
page = element page { text }

As you can see, it's pretty simple and leverages XHTML for the meat of the content (validation is left as an exercise for the reader). Using this, a comment on a Standard looks like this:

<comment xml:id="GB-AB-25" xmlns:xh="http://www.w3.org/1999/xhtml">
<type>T</type>
<document>ISO/IEC 29500-1:2008</document>
<clause>2.4</clause>
<page>3</page>
<problem>
  <xh:p>The first and last items in the bulleted list are contradictory,
    since a document which uses the extensibility mechanisms of Part 3 cannot
    be valid to the Part 1 (or 4) schema.</xh:p>
  <xh:p>Furthermore, Part 1 need not mention the Part 4 schema here.</xh:p>
</problem>
<proposal>
  <xh:p>Qualify the validity requirement with a caveat that such
    documents are pre-processed with the behaviour specified in Part 3
    (once that behaviour is described normatively).</xh:p>
</proposal>
</comment>

Batches of comments are processing into XHTML documents (with XSLT) for review and correction, and – once approved – can then be forwarded for processing with the normal maintenance mechanisms. In the case of WG 4 (OOXML) there is a web form available for submission — but getting my nice XML into the form’s fields requires much tedious copy and pasting.

I suppose this bit can be automated with a programmatic HTTP client. But thinking about this also made me think the defect generation process can be (partly) automated. There are so many formulaic problems with the OOXML text that it should be possible to write programs which feed errors in the text directly into the maintenance process. Well, it would if it wasn’t for the various bureaucratic intervening processes. Hmmm, defect-o-tron anybody?

OOXML and Microsoft Office 2007 Conformance: a Smoke Test


This is one in a series of popular blog articles I am re-publishing from the old Griffin Brown blog which is now closed down. This article is from April 2008. It is the same content as the original (except for some hyperlink freshening).

At the time of posting this entry caused quite a furore, even though its results were – to me anyway – as expected. Looking back I think what I wrote was largely correct, except I probably underestimated the difficulty of converting Microsoft Office to use the Strict variant of OOXML — this would require more than surgery just to the de-serialisation code!


 

I was excited to receive from Murata Makoto a set of the RELAX NG schemas for the (post-BRM) revision of OOXML, and thought it would be interesting to validate some real-world content against them, to get a rough idea of how non-conformant the standardisation of 29500 had made MS Office 2007.

Not having Office 2007 installed at work (our clients aren't using it – yet), the first problem is actually getting a reasonable sample for testing. Fortunately, the Ecma 376 specification itself is available for download from Ecma as a .docx file, and this hefty document is a reasonable basis for a smoke test ...

The main document ("document.xml") content for Part 4 of Ecma 376 weighs in at approx. 60MB of XML. Looking at it ... I'm sorry, but I'm not working on that size of document when it's spread across only two lines. Pretty-printing the thing makes it rather more usable, but pushes the file size up to around 100MB.

So we have a document and a RELAX NG schema. All that's necessary now it to use jing (or similar) and we can validate ...

Validating against the STRICT model

The STRICT conformance model is quite a bit different from Ecma 376, essentially because most of that format's most notorious features (non ISO dates, compatibility settings like autospacewotnot, VML, etc.) have been removed. Thus the expectation is that existing Office 2007 documents might be some distance away from being valid according to the strict schemas.

Sure enough, jing emitted 17MB (around 122,000) of invalidity messages when validating in this scenario. Most of them seem to involve unrecognised attributes or attribute values: I would expect a document which exercised a wider range of features to generate a more diverse set of error message.

Validating against the TRANSITIONAL model

The TRANSITIONAL conformance model is quite a bit closer to the original Ecma 376. Countries at the BRM (rather more than Ecma, as it happened) were very keen to keep compatibilty with Ecma 376 and to preserve XML structures at which legacy Office features could be targetted. The expectation is therefore that an MS Office 2007 document should be pretty close to valid according to the TRANSITIONAL schema.

Sure enough (again) the result is as expected: relatively few messages (84) are emitted and they are all of the same type complaining e.g. of the element:

<m:degHide m:val="on"/>

since the allowed attribute values for val are now "true", "false", etc. — this was one of the many tidying-up exercices performed at the BRM.

Conclusions?

Such a test is only indicative, of course, but a few tentative conclusions can be drawn:

  • Word documents generated by today's version of MS Office 2007 do not conform to ISO/IEC 29500
  • Making them conform to the STRICT schema is going to require some surgery to the (de)serialisation code of the application
  • Making them conform to the TRANSITIONAL will require less of the same sort of surgery (since they're quite close to conformant as-is)

Given Microsoft's proven ability to tinker with the Office XML file format between service packs, I am hoping that MS Office will shortly be brought into line with the 29500 specification, and will stay that way. Indeed, a strong motivation for approving 29500 as an ISO/IEC standard was to discourage Microsoft from this kind of file format rug-pulling stunt in future.

What's next?

To repeat the exercise with ISO/IEC 26300:2006 (ODF 1.0) and a popular implementation of OpenDocument. Will anybody be brave enough to predict what kind of result that exercise will have?

SC 34 meetings in Stockholm last week


Stockholm dawn
The weather was mostly grey, and the days busy, limiting photo taking time …

 

I have just returned from a week of meetings of ISO/IEC JTC 1/SC 34 in Stockholm, Sweden. Here's an update on what happened ...

WG 1

WG 1 met on the Monday (as usual, we "blaze the trail", working out how the public transport works, locating nearby bars, etc).

Our main project over the last 8 years has been the DSDL project, a multi-part standard for XML schema languages. The final parts of this jigsaw are falling into place: at this meeting we agreed to ballot DSDL Part 11, a specification being co-developed with W3C for Schema Association. Jirka Kosek – a project editor – is the driving force behind this work in WG 1. Rick Jelliffe has also completed a draft of a revision to DSDL Part 3 (more commonly known as Schematron), which will now be sent to ballot to collect National Body feedback.

ISO Zip

The need for an "ISO Zip" standard has long been recognized; I remember this topic first being raised in SC 34 during four years ago at our Seoul meeting. No progress has been made since then, but there have been some problems caused by the lack of a standard — not least the sudden disappearance of a document that ODF relies on!

As the number of XML-in-Zip formats mushrooms, WG 1 has now decided to grasp this nettle and propose a project for "Document Packaging" which will aim to deliver a minimal (yet compatible) file format specification particularly suitable for XML-in-Zip document formats. If successful the new standard will be usable as a drop-in replacement for the currently non-standard references to an over-featured (for their purposes) ZIP specification used by such formats as OOXML, ODF and EPUB.

The New Work Item Proposal was presented to the SC 34 plenary where it was agreed (without dissent) that it should be balloted. National Bodies will comment over the next three months and their responses considered at the Tokyo meetings in September.

An unpleasant surprise

Over the last few years I have been editing a project for Extensible Datatypes (ISO/IEC 19757-5, currently a FCD). As is usual with experts in SC 34, the text is prepared using schema-governed XML and processed into XSL-FO for rendering to PDF according to ISO's layout specification — we have a specification, TR 9573-11 AMD, specifically for that purpose.

This work on my text is nearly done, so imagine my surprise to learn that, all of a sudden, ITTF is now only accepting work that is in "Word format" (on inquiry, this means Microsoft® Word™ 2007). The decision has caused dismay among many SC 34 experts and reeks more of the short-term commercial interests of NBs' commercial publishing wings, than of any concern for document quality, adaptability or long-term preservation. It is a shame that ownership of Windows and MS Office is now apparently a prerequisite for being an JTC 1 Project Editor, and I can imagine more than a few eyebrows being raised if any future International Standard version of ODF needs to prepared using Microsoft Word!

WG 4

WG 4 (OOXML - ISO/IEC 29500 - maintenance) met for two and a half days. There are so many strands of activity here deserving of comment that I will write a separate blog post on this later this week. Stand by!

WG 6

WG 6 (ODF - ISO/IEC 26300 - maintenance) met on Thursday afternoon and Friday morning and was well-attended. The main work of this group is production of an International Standard version of ODF 1.1 that rolls in errata to date, and excellent progress is being made on this. Special praise must go to Dennis Hamilton for pulling an all-nighter (remotely participating from Seattle) and addressing some of the gnarlier problems of text production!

In general, it is good to see the suspicions of the past years now firmly set aside and all participants pulling together in the right direction for the good of ODF. It is also especially good to see the process now working better (if not perfectly) and admitting an International dimension to ODF maintenance; this success is due in no small part to the diligence and diplomacy of the WG 6 Convenor, Francis Cave who, it was jokingly suggested at the Plenary, should be appointed for a term of 30 years as convenor, rather than the normal three!

St Francis
Francis Cave, WG 6 convenor

The CJK lobby

As is usual at these meetings, various kinds of lobbying for various kinds of thing were taking place. Perhaps the prize for most effective operators must go to the CJK (China, Japan, Korea) participants who are working hard to raise awareness of their requirements for page layout and typography. Japan is promoting a project for specifying support for KIHONHANMEN in OOXML and ODF extensions, and plans are being made for a 'Workshop on CJK Issues related to OOXML and ODF extensions' in May (details will appear on the SC 34 web site when available). I wish them well ...

 

CJK
Experts from China, Japan and Korea in discussion

Rolleiflexes

Rolleiflexes

My father-in-law, possibly amused by watching me dick around with a DSLR and laptop over the weekend, decided to dig his camera equipment out of storage

These are the cameras he used, over three decades, to take the pictures for his magnum opus (so becoming the first non-Russian to be awarded the Russian Academy of Fine Arts’ gold medal). He asserted he'd always been pleased with Rolleiflex ...

The episode has a useful pay-off ... it established with my wife a new baseline for the number of cameras it is reasonable for a man to own :-)