Australia and OOXML

by Alex Brown 20. January 2011 09:57
Somewhere too early

 

There have been some poor decisions of late in Australia. Not playing Hauritz and persisting too long with the out-of-form Clarke and Ponting probably cost Australia the Ashes and has led to terrible self-flagelation. While it’s generally not done to take pleasure in the discomfort of others, I do think an exception can be made in the case of the Australian cricket team.

From various recent blogs and tweets I’ve noticed a fuss surrounding the decision by the Australian Government Information Management Office (AGIMO) to recommend the use of OOXML as a document format, and from the tenor of the comments it would seem this is being treated as similar calamity for Australia. However, there appears to be some misunderstanding and misinformation flying around which is worth a comment …

Leaving aside the merits of the decision itself, one particular theme in the commentary is that AGIMO have somehow picked a “non-ISO” version of OOXML. I can’t find any evidence of this. By specifying Ecma 376 without an edition number the convention is that the latest version of that standard is intended; and though I do think there is a danger of over-reading this particular citation, the current version of Ecma 376 is the second edition, which is the version of OOXML that was approved by ISO and IEC members in April 2008. The Ecma and ISO/IEC versions are in lock-step, with the Ecma text only ever mirroring the ISO/IEC text. And although (as now) there are inevitably some bureaucratic and administrative delays in the Ecma version rolling in all changes made in JTC 1 prior to publication, to cite one is, effectively, equivalent to citing the other.

[UPDATE: John Sheridan from AGIMO comments below that Ecma 376 1st Edition was intended, and I respond]

SC 34 Meetings in Tokyo

by Alex Brown 16. September 2010 18:06
Tokyo Tower
Tokyo Tower

Last week I attended a busy week of meetings of ISO/IEC JTC 1/SC 34 in Tokyo. Here’s an update on what is going on in the International Standard document format scene …

(For reference, the meeting resolutions are here).

DSDL

The DSDL project is drawing to a conclusion, and one of its final parts, Part 11 (“Schema Association”) is now ready to progress to DIS (Draft International Standard) stage, following its passage (without either dissent or comment) through the CD (Committee Draft) stage: congratulations to Project Editor Jirka Kosek!

One of the interesting things about this project is procedural: it is a standard being developed in parallel with the W3C (you can see their version of it here). I encourage everybody to take a look and report any comments to our mailing list, dsdl-comment@dsdl.org.

EPUB

e-book markets are taking off worldwide, and the dominant format is emerging to be EPUB, standardized by the International Digital Publishing Forum (IDPF) – a consortium. Although there is an International Standard in this space - IEC 62448 - it is fair to say, I think, that in the market, EPUB has clearly won. IEC 62448 is up for revision but technically it appears to lack some key features National Bodies expect to see in International Standards – notably comprehensive support for BIDI (bi-directional text) as used by many Middle Eastern writing systems. I am sure National Bodies (including the UK’s) will be monitoring this space very closely: in 2010 it is surely not acceptable to produce International (International!) Standards which ignore large portions of this planet’s population.

Following some behind-the-scenes discussion between IDPF and SC 34 people, it has been agreed that it could be worthwhile exploring whether and how the EPUB format could undergo International Standardization in SC 34. EPUB builds on some SC 34 technologies (notably DSDL), makes use of ZIP, and can find a suitable group of experts in SC 34 with a broad range of documentation and publishing experience. To progress matters, a resolution was agreed to ballot whether an exploratory study should be initiated:

SC 34 establishes an ad hoc group on EPUB of IDPF with the following terms of reference:
  • to discuss with IDPF whether their EPUB work should be standardized at SC 34 and to present a plan for such standardization and any other recommendations to the next SC 34 Plenary in 2011 March.
  • to determine the major stakeholders concerned with EPUB standardization and propose additional liaisons that would enable these stakeholders to be represented in any standardization process.
Membership is open to ISO/IEC JTC 1/SC 34 P and O members, liaison organizations, and subgroup representatives.

SC 34 appoints Dr. Makoto MURATA (Japan) and Dr. Yong-Sang CHO (Korea) as the Co-Conveners of this ad hoc group.

Personally, I think that since IDPF has done such a good job on EPUB already it would be a shame to do anything that would risk that ongoing goodness, and that some kind of parallel standardization activity with SC 34 would be appropriate – perhaps in the same kind of way that SC 2 and the Unicode Consortium keep ISO/IEC 10646 and Unicode in parallel. We shall see …

ISO “Zip”

Back in March I blogged about how SC 34 had opened a ballot on standardizing aspects of the “Zip” format. The proposal failed, with 10 countries voting against starting work (11 countries were in favour on this question and 10 abstained – so the necessary super-majority was not obtained).

Nevertheless in discussion in Tokyo it emerged that National Bodies were broadly in favour of an eventual “Zip” standard (or at least a standards-compatible specification) of some kind, and that it was more the nature of the proposal, rather than its aim, that was in question. Since I was the person who drafted the proposal, this is in large part mea culpa!

One of the chief reasons why the ballot failed was U.S. corporate influence. With IBM spearheading the effort, and the likes of Oracle and Microsoft generally going along with it, committee members in several countries were – I hear – energized to oppose the proposal. One particular concern was that PKWare, Inc. had had no opportunity to have input in the process. This concern is certainly reasonable, and there is no doubt that PKWare is quite a major stakeholder. But “Zip” is bigger even than that since it sits at the heart of some many specifications (e.g. ODF, OOXML and EPUB) and is in the foundations of many other technologies (e.g. Java, Linux … and Windows). So, to gather the widest possible stakeholder input going forward to a possible second try at standardizing “Zip”, SC 34 resolved to embark upon a study period:

SC 34 accepts the WG 1 recommendation contained in SC 34 N 1494 to initiate a study period with aim of establishing a firmer rationale for standardization of aspects of the “Zip” format.

SC 34 asks WG 1 that a report be submitted in time for consideration at the SC 34 meetings in Prague in 2011-03 and that time be allocated to this activity during the WG 1 meeting in Beijing in 2010-12.

SC 34 instructs its Secretariat to issue a call for participation in this Study Period to SC 34 national and liaison member bodies.

SC 34 requests the SC 34 chairman bring this activity to the attention of members of JTC 1, and other SC chairs, at the upcoming JTC 1 Plenary in Belfast.

One delegate told me the “Zip” vote had engendered some very heated arguments, more intense than those even of the OOXML wars – and many experts in SC 34 take the high politics already in evidence as an indication that there may be “unknown unknowns” surrounding this format. In my view, “Zip” is simply too important for it to have continuing IPR/technical uncertainties and I would like to see those put to rest by the standardization process.

OOXML

From Monday to Wednesday in WG 4 work continued on the OOXML Standard: primarily fixing defects (and in particular fixing problems with date handling – a fix important enough to warrant its own amendments). One positive development is that some more large-scale thinking about textual reform has started, with drafting of a possible revised version of Part 3 (“Markup Compatibility and Extensibility” aka MCE) underway. And for the gigantic Part 1 (5,572 pages) some experimentation removing redundant tables of elements has shown we can achieve a 10% decrease in size, and we can lose several hundred pages by moving the tutorial material of the “Primer” out of the text. Ultimately it is this kind of thoroughgoing re-organisation and re-writing (currently still just an experiment) that will redeem the text.

Can there be redemption for Microsoft, whose Office 2010 product has now hit the shelves using the deprecated transitional variant of OOXML and a load of Microsoft extensions? Well, in time, maybe …

There has been much discussion in WG 4 how to standardize Microsoft’s extensions which – although they use the extension mechanisms described by the IS 29500 – are not themselves described in any standard. They’re currently documented on MSDN. How should they be standardized? In a multi-part Standard? in a registry? or what? Ultimately WG 4 concluded we should do nothing – we are not hearing any market demand for standardizing Microsoft’s extensions and so we will wait. Of course this means that as Microsoft adds more and more extensions to subsequent versions of Office the proportion of it described by the text of IS 29500 will diminish. We shall have to wait and see what the market thinks about that. Personally, I feel it is critical that procurers of OOXML-based suites pay careful attention to this aspect of MS Office and (I have written this before) know that MS Office 2007 – not 2010 – is the only version which (modulo bugs/defects) conforms to OOXML unextended. It is my guess that future large-scale procurers of MS Office may want to specify which extensions they want (maybe none), and I would like to see the conformance language of OOXML beefed-up making such procurement specifications easier.

Following the announcement of Microsoft’s plans that future versions of MS Office will use OOXML Strict, it was interesting to speculate how this leap was going to be achieved. One particularly horrid suggestion was that MS Office should effectively continue to use OOXML Transitional (but within a Strict wrapper) by relying on some new “extensions” that contained deprecated Transitional markup. On the other hand, using such extensions purely to provide graceful degradation for legacy applications seemed like an excellent idea. One of the key benefits of a move to OOXML Strict is that developers targeting future MS Office versions will be able (if it supports Strict properly) to ignore the 1,500 pages of nastiness that is the Transitional format. That would be a definite win.

For fun, during the week, I used SoftMaker’s office suite, which claims support for OOXML. Rather than lengthen even further an already over-long posting and save a report on this software for a later posting …

ODF

WG 6 (concerned with ODF) met for a teleconference at 7am on Thursday and the convenor reported on this, and other ODF-related matters, later that day.

The chief activity in WG 6 at the moment is the creation of an amendment to ISO/IEC 26300 (the International Standard version of ODF) so that it aligns properly with ODF 1.1. The result of this activity will be that the latest version of ODF will be in sync between JTC 1 and OASIS. The drafting work for this is nearly done, with some final tests and tweaks being made to ensure everything has been squared between the variants.

An interesting issue has arisen with the submission to WG 6 by the Dutch of a proposed amendment to ODF which would improve its change tracking to the point where they judged it could be acceptable for government use. Now, the flaky/incomplete nature of change tracking in ODF and its office suites has been something of an elephant in the room ever since Microsoft’s Doug Mahugh blogged about it, and I hear that Microsoft have used some nifty demonstrations of OpenOffice’s change tracking to quell the enthusiasm of large procuring bodies who were considering stepping out of line and switching away from MS Office. Indeed, I believe OASIS’s own Open Document Format Interoperability and Conformance (OIC) TC has a test-case which demonstrates interop problems with change tracking.

So one would have thought that the ODF TC – if they had not done it already – would have leapt at the Dutch proposal as a way to close an important competitive gap … But no, from the minutes of a recent meeting we learn that “the change tracking proposals is a topic for ODF-Next rather than ODF 1.2”; and so it has been deferred to ODF-Next (i.e. two versions of ODF into the future). As I understand it, the ODF TC wishes to meet a certain deadline for release of ODF 1.2, which has already been a fair while in drafting. In a little Twitter exchange I had with a co-chair of the ODF TC (Rob Weir) on this topic he tweeted that ODF perfection was not required. I leave it to readers of this blog to judge whether a desire for solid comprehensive change-tracking is really the same as an unrealistic demand for “perfection” (which, I agree, would be unreasonable).

I can certainly imagine ISO and IEC members being very reluctant to pass a version of ODF (should 1.2 ever come to JTC 1) that does not have a convincing story to tell about change-tracking.

UOML

The week of meetings finished with a 1½ day BRM (ballot resolution meeting) for DIS 14297, also known as “UOML (Unstructured Operation Markup Language) Part 1 Version 1.0” (get it here). Since the process used was an accelerated procedure (called PAS, which is practically the same as a Fast Track) and since the DIS ballot had failed, the meeting had many potential parallels with the OOXML BRM of 2008, and so I was particularly interested to attend – not this time as convenor, but out of the firing-line as a member of the UK delegation.

The problem with UOML 1.0 is, in nearly every aspect, very poor quality – it is almost gibberish, and obviously so to anybody who cares even to glance at it. It makes OOXML look like something written by Bertrand Russell. How the OASIS members could have approved it as a standard boggles the mind; how the OASIS Board could have okayed it for PAS submission boggles the mind; and how it nearly passed its JTC 1 ballot boggles the mind. Fortunately, just enough National Bodies voted against the DIS to make it fail its ballot, and so on Friday afternoon a group of us found ourselves in a room with 139 comments to resolve. After triage we found that left us with 10 minutes per comment – compared to the OOXML BRM this was luxury!

Many aspects of the UOML BRM were familiar: voting on comments in batches, NBs feeling grumpy about having compromised (rather than really good) dispositions; other NBs feeling grumpy about lack of time. Credit must go to the new consulting project editor Joel Marcey, who had valiantly deployed his skills to make very substantial improvements to the text prior to the meeting. And credit must go to Paul Cotton (“The convenor’s convenor”) whose in-the-trenches experiences of chairmanship (SQL, HTML 5, ...) were in evidence to ensure a productive and good-natured meeting.

But in the end it just was not enough. All the dispositions were resolved one way or the other, but a number of NBs had rejected proposals and there was a general perception (by my reckoning) that the standard remained unimplementable in a conformant manner.

So yet again we have an instance of a poorly-drafted standard coming into a process that finds it very difficult to take the strain when there are a reasonably large number of NB comments. I can imagine accelerated standardization working well when a standard is small and perfectly formed but when a standard is large and/or buggy (and when NBs bother to read it), trouble is almost bound to ensue. Although the JTC 1 Directives have been recently revised they do very little to address this particular problem of the BRM being potentially a crazy-time in which major changes are carried out in a very compressed time period. Reform in this area is still badly needed, I believe.

And away from the committees …

Tokyo was quite an experience: extremely humid and hot most of the time, but with one day of torrential rain as we experienced the outer bands of a passing typhoon. For once I was glad to be able to spend most of the time in air-conditioned meeting rooms, even if this meant there were disappointingly few opportunities for photography.

Food was a highlight, especially since Murata-san guided us to a number of truly outstanding eating experiences including a sashimi banquet, a more rustic meal starring yakitori chicken, and a meal which built to a climax of tuna shabu-shabu. All this was washed down with a variety of rice-based, sugar cane-based and potato-based wine and spirits. Yumsk! (and, hic!)


Hello Sake
Hello sake!


[Update: my pictures from the week are here.]

Tags: , , , , , , , , ,

Microsoft Fails the Standards Test

by Alex Brown 31. March 2010 14:47

The second anniversary of the approval of ISO/IEC 29500 (aka OOXML) is upon us. The initial version of OOXML (Ecma 376 1st Edition) was rejected by ISO and IEC members in September 2007, and it was only after extensive revisions and a bitter standards war in the following months that a revised format was finally approved on April 2, 2008.

The key breakthrough of the revision process was the splitting of the specification into two variant versions, called “Strict” and “Transitional”. The National Bodies confined all the technologies they found unacceptable to the Transitional format and dictated text to be included in the standard intended to prohibit its further use:

“The intent […] is to enable a transitional period during which existing binary documents being migrated to DIS 29500 can make use of legacy features to preserve their fidelity, while noting that new documents should not use them. […]

This annex is normative for the current edition of the Standard, but not guaranteed to be part of the Standard in future revisions. The intent is to enable the future DIS 29500 maintenance group to choose, at a later date, to remove this set of features from a revised version of DIS 29500.”

I was convinced at the time, and remain convinced today, that the division of OOXML into Strict and Transitional variants was the innovation which allowed the Standard to pass. Enough National Bodies could then vote in good conscience for OOXML knowing that their preferred, Strict, variant would be under their control into the future while the Transitional variant (which – remember – they had effectively rejected in 2007) would remain purely for the purpose of accurately specifying old documents: a useful aim in itself.

Promises and reality

Just before the final votes were counted, Microsoft made some commitments. Mr Chris Capossela (Senior VP, Microsoft Office) wrote an open letter promising what would happen if the Standard passed. Two years on, we can fill out a report card for a couple of these promises and determine how well Microsoft is doing …

Microsoft's promise on standards support in products


“We've listened to the global community and learned a lot, and we are committed to supporting the Open XML specification that is approved by ISO/IEC in our products.”

On this count Microsoft seems set for failure. In its pre-release form Office™ 2010 supports not the approved Strict variant of OOXML, but the very format the global community rejected in September 2007, and subsequently marked as not for use in new documents – the Transitional variant. Microsoft are behaving as if the JTC 1 standardisation process never happened, and using technologies (like VML) in a new product which even the text of the Standard itself describes as “deprecated” and “included […] for legacy reasons only” (see ISO/IEC 29500-1:2008, clause M.5.1).

Knowledgeable experts present at the Ballot Resolution Meeting, knowing what Microsoft planned, have publicly repeated the International consensus position in alarm. XML Standards guru Rick Jelliffe (an Australian delegate at the meeting) wrote:

“If [Microsoft’s] default format is OOXML Transitional, then they have abandoned support for an Open Standards process: OOXML was only made a standard because of the changes that were made at the BRM. The original ECMA version of OOXML (which is the basis of Transitional) was soundly rejected, let no-one forget.”

And Danish expert and BRM delegate Jesper Lund Stocholm, running an analysis of Office 2010 files wrote:

“It has been the fear of many that Microsoft will never, ever care at all about the strict conformance clause of ISO/IEC 29500, and my tests clearly [are] a sign that they were right.”

Microsoft, however, takes a different view to the independent experts. Their representatives will argue (with some justification) that terms like “legacy”, “deprecated”, and “new document” are tricky to define, but then this argument extends to the bizarre assertion that the Strict variant need never be supported. I believe, however, countries expect a more reasonable, plain-dealing approach to their clearly expressed intent – not this kind of wheedling sophistry. Mr Capossela writes that Microsoft has “learned a lot”; but on the evidence before us now, this was wishful thinking.

Microsoft's promise on standards maintenance


“We are committed to the healthy maintenance of the standard once ratification takes place so that it will continue to be useful and relevant to the rapidly growing number of implementers and users around the world.”

It all started so well – defect reports came in from many national bodies and (via Ecma) from Microsoft themselves. A number of useful improvements were made to the text correcting obvious defects, and (in the Transitional variant) fixing some of the evident mismatches between what the Standard said, and what legacy documents actually contained.

But as time has gone on, the situation has deteriorated. At the recent Stockholm meetings corrections agreed at the February 2007 Ballot Resolution Meeting were still being implemented, and while fixes which were evidently required for Office 2010’s headline conformance behaviour have been given the red carpet treatment, some other reports from National Bodies have been left to languish. Unusually, in Stockholm one of SC 34’s working groups (WG 2) recommended to the plenary that the OOXML maintenance group (WG 4) be reminded to answer overdue defect reports – in the ISO world that counts as a diplomatic incident!

Most worrying of all, it appears that Ecma have ceased any proactive attempt to improve the text, leaving just a handful of national experts wrestling with this activity. It seems to me that Microsoft/Ecma believe 95% of the work has been done to ensure the standard is “useful and relevant”. Looking at the text, I reckon it is more like 95% that remains to be done, as it is still lousy with defects.

Ironically, the failure to resource maintenance properly is only going to damage Microsoft Office in the longer term. The simple validators developed by me (Office-o-tron) and by Jesper Lund Stocholm (ISO/IEC 29500 Validator) reveal, to Microsoft's dismay, that the output documents of the Office 2010 Beta are non-conformant, and that this is in large part due to glaring uncorrected problems in the text (e.g. contradictory provisions). It is also a worrying commentary on the standards-savvyness of the Office developers that the first amateur attempts of part-time outsiders find problems with documents which Redmond’s internal QA processes have missed. I confidently predict that fuller validation of Office document is likely to reveal many problems both with those documents, and with the Standard itself, over the coming years.

So – while maintenance is happening, I think calling it “healthy maintenance” would be over-optimistic given the current circumstances.

Someone has blundered?

Microsoft has many enemies who will no doubt see the current state of affairs as proof that Microsoft never even intended to be good standards citizens. Indeed standards and XML veteran Tim Bray, writing shortly after the standard’s approval, made a prediction which could now seem impressively prophetic:

“It’s Kind of Sad • The coverage suggests that future enhancements to 29500, as worked through by a subcommittee of a subcommittee of a standards committee, are actually going to have some influence on Microsoft. Um, maybe there’s an alternate universe in which Redmond-based program managers and developers are interested in the opinions of a subgroup of ISO/IEC JTC 1/SC 34, but this isn’t it.

I suppose they’ll probably show up to the meetings and try to act interested, but it’s going to be a sideline and nobody important will be there. What Microsoft really wanted was that ISO stamp of approval to use as a marketing tool. And just like your mother told you, when they get what they want and have their way with you, they’re probably not gonna call you in the morning.”

For me, the puzzle of it is that in many respects, Microsoft does appear to get it. Senior management seems to want standards conformance, as Mr Capossela’s letter demonstrates – indeed strategically, playing fair by standards has always seemed like the most obvious way for the corporation to extract itself from the regulatory thickets that have entangled it over the past decade. Microsoft employs many eminent and standards-aware people of unimpeachable record – they also obviously “get it”. And on the ground in the standards committees there are many delightful, talented and diligent people who seem fully-signed up to a standards-aware (dare I say “non-evil”?) approach—as the SC 34 meetings in Stockholm again recently evidenced.

And if we look elsewhere within Microsoft we can see – for example from their engagement with HTML 5 and work on MSIE – that they can move in the right direction when the will is there.

So why – given the awareness Microsoft has at the top, at the bottom, and round the edges – does it still manage to behave as it does? Something, perhaps, is wrong at the centre — some kind of corporate dysfunction caused by a failure of executive oversight.

But whether Microsoft senior management have directed the company to behave badly, or whether they have failed to control a bad corporate impulse, is ultimately of no interest or concern to the National Bodies engaged in Standardization: for them, the effect is the same. Some responses will, however, be necessary.

Moving forward

If Microsoft ship Office 2010 to handle only the Transitional variant of ISO/IEC 29500 they should expect to be roundly condemned for breaking faith with the International Standards community. This is not the format “approved by ISO/IEC”, it is the format that was rejected.

However, it is foolish to believe they won’t ship it as is – and before long the world will be faced with responding to that release. In my view moving forward from there will be difficult …

  • Governments, corporations, other large entities – in fact, anyone – procuring office systems with a requirement for standards-conformance need to have their eyes very wide open about what precisely they will be getting with systems which create new documents which are extended Transitional ISO/IEC 29500.
  • Microsoft Office 2007 (the current version) reads and emits unextended Transitional ISO/IEC 29500, and so – strangely – may represent a high-water mark of Microsoft Office standards conformance. Anybody wanting to work just with documents which (modulo defects) are fully specified by Standards wholly under International control may want to stick with this version of the software.
  • Microsoft should make a public open commitment to support OOXML Strict fully. A service pack bringing this support to Office should be developed as a priority.
  • JTC 1 explicitly created the Transitional variant with the intention they would “at a later date, […] remove this set of features”. Now is the time to start the wheels in motion for this removal (the text will of course remain available for the perfectly good reason that the legacy needs to be documented).
  • Any assurances Microsoft has given to regulatory bodies (such as the EU Commission) about standards conformance must be looked at very carefully giving full consideration to the circumstances of this release.
  • Ecma need to commit adequate resources to standards maintenance and pro-actively seek to improve the text, working together with SC 34, if there is any appetite to improve the Standard to the point where it can be a trouble-free, or even good, basis for interoperable office applications.

In short, we find ourselves at a crossroads, and it seems to me that without a change of direction the entire OOXML project is now surely heading for failure.

Defect-o-tron ?

by Alex Brown 30. March 2010 12:23

At recent SC 34 standards meetings, I've been thinking it would be useful to have XML models for all the documents we habitually have to deal with: agendas, meeting reports, proposals, defect reports and so on.

One particular focus for me at the moment is defect reports. Currently, I prepare these as XML documents conforming to the following home-brewed schema:

namespace xh = "http://www.w3.org/1999/xhtml"

start = comment-batch
any-xhtml =
  (element xh:* { any-xhtml }
   | attribute * { text }
   | text)+
comment-batch =
  element comment-batch {
    (attribute xml:lang { token }
     & attribute xml:id { token }),
    spec,
    submitter,
    comment+
  }
spec = element spec { text }
submitter =
  element submitter {
    attribute type { token },
    text
  }
comment =
  element comment {
    attribute xml:id { token },
    type,
    document,
    (clause | corrigendum),
    page?,
    problem,
    proposal?
  }
type = element type { "G" | "T" | "E" | "C" }
clause = element clause { text }
corrigendum = element corrigendum { text }
problem = element problem { any-xhtml }
proposal = element proposal { any-xhtml }
document = element document { text }
page = element page { text }

As you can see, it's pretty simple and leverages XHTML for the meat of the content (validation is left as an exercise for the reader). Using this, a comment on a Standard looks like this:

<comment xml:id="GB-AB-25" xmlns:xh="http://www.w3.org/1999/xhtml">
<type>T</type>
<document>ISO/IEC 29500-1:2008</document>
<clause>2.4</clause>
<page>3</page>
<problem>
  <xh:p>The first and last items in the bulleted list are contradictory,
    since a document which uses the extensibility mechanisms of Part 3 cannot
    be valid to the Part 1 (or 4) schema.</xh:p>
  <xh:p>Furthermore, Part 1 need not mention the Part 4 schema here.</xh:p>
</problem>
<proposal>
  <xh:p>Qualify the validity requirement with a caveat that such
    documents are pre-processed with the behaviour specified in Part 3
    (once that behaviour is described normatively).</xh:p>
</proposal>
</comment>

Batches of comments are processing into XHTML documents (with XSLT) for review and correction, and – once approved – can then be forwarded for processing with the normal maintenance mechanisms. In the case of WG 4 (OOXML) there is a web form available for submission — but getting my nice XML into the form’s fields requires much tedious copy and pasting.

I suppose this bit can be automated with a programmatic HTTP client. But thinking about this also made me think the defect generation process can be (partly) automated. There are so many formulaic problems with the OOXML text that it should be possible to write programs which feed errors in the text directly into the maintenance process. Well, it would if it wasn’t for the various bureaucratic intervening processes. Hmmm, defect-o-tron anybody?

Tags: , , , , ,

SC 34 meetings in Stockholm last week

by Alex Brown 28. March 2010 10:46

Stockholm dawn
The weather was mostly grey, and the days busy, limiting photo taking time …

 

I have just returned from a week of meetings of ISO/IEC JTC 1/SC 34 in Stockholm, Sweden. Here's an update on what happened ...

WG 1

WG 1 met on the Monday (as usual, we "blaze the trail", working out how the public transport works, locating nearby bars, etc).

Our main project over the last 8 years has been the DSDL project, a multi-part standard for XML schema languages. The final parts of this jigsaw are falling into place: at this meeting we agreed to ballot DSDL Part 11, a specification being co-developed with W3C for Schema Association. Jirka Kosek – a project editor – is the driving force behind this work in WG 1. Rick Jelliffe has also completed a draft of a revision to DSDL Part 3 (more commonly known as Schematron), which will now be sent to ballot to collect National Body feedback.

ISO Zip

The need for an "ISO Zip" standard has long been recognized; I remember this topic first being raised in SC 34 during four years ago at our Seoul meeting. No progress has been made since then, but there have been some problems caused by the lack of a standard — not least the sudden disappearance of a document that ODF relies on!

As the number of XML-in-Zip formats mushrooms, WG 1 has now decided to grasp this nettle and propose a project for "Document Packaging" which will aim to deliver a minimal (yet compatible) file format specification particularly suitable for XML-in-Zip document formats. If successful the new standard will be usable as a drop-in replacement for the currently non-standard references to an over-featured (for their purposes) ZIP specification used by such formats as OOXML, ODF and EPUB.

The New Work Item Proposal was presented to the SC 34 plenary where it was agreed (without dissent) that it should be balloted. National Bodies will comment over the next three months and their responses considered at the Tokyo meetings in September.

An unpleasant surprise

Over the last few years I have been editing a project for Extensible Datatypes (ISO/IEC 19757-5, currently a FCD). As is usual with experts in SC 34, the text is prepared using schema-governed XML and processed into XSL-FO for rendering to PDF according to ISO's layout specification — we have a specification, TR 9573-11 AMD, specifically for that purpose.

This work on my text is nearly done, so imagine my surprise to learn that, all of a sudden, ITTF is now only accepting work that is in "Word format" (on inquiry, this means Microsoft® Word™ 2007). The decision has caused dismay among many SC 34 experts and reeks more of the short-term commercial interests of NBs' commercial publishing wings, than of any concern for document quality, adaptability or long-term preservation. It is a shame that ownership of Windows and MS Office is now apparently a prerequisite for being an JTC 1 Project Editor, and I can imagine more than a few eyebrows being raised if any future International Standard version of ODF needs to prepared using Microsoft Word!

WG 4

WG 4 (OOXML - ISO/IEC 29500 - maintenance) met for two and a half days. There are so many strands of activity here deserving of comment that I will write a separate blog post on this later this week. Stand by!

WG 6

WG 6 (ODF - ISO/IEC 26300 - maintenance) met on Thursday afternoon and Friday morning and was well-attended. The main work of this group is production of an International Standard version of ODF 1.1 that rolls in errata to date, and excellent progress is being made on this. Special praise must go to Dennis Hamilton for pulling an all-nighter (remotely participating from Seattle) and addressing some of the gnarlier problems of text production!

In general, it is good to see the suspicions of the past years now firmly set aside and all participants pulling together in the right direction for the good of ODF. It is also especially good to see the process now working better (if not perfectly) and admitting an International dimension to ODF maintenance; this success is due in no small part to the diligence and diplomacy of the WG 6 Convenor, Francis Cave who, it was jokingly suggested at the Plenary, should be appointed for a term of 30 years as convenor, rather than the normal three!

St Francis
Francis Cave, WG 6 convenor

The CJK lobby

As is usual at these meetings, various kinds of lobbying for various kinds of thing were taking place. Perhaps the prize for most effective operators must go to the CJK (China, Japan, Korea) participants who are working hard to raise awareness of their requirements for page layout and typography. Japan is promoting a project for specifying support for KIHONHANMEN in OOXML and ODF extensions, and plans are being made for a 'Workshop on CJK Issues related to OOXML and ODF extensions' in May (details will appear on the SC 34 web site when available). I wish them well ...

 

CJK
Experts from China, Japan and Korea in discussion

Tags: , , , , , , ,

Document Format Standards and Patents

by Alex Brown 9. March 2010 12:21

This post is part of an ongoing series. It expands on item 9 of Reforming Standardisation in JTC 1.

Background

Historically, patents have been a fraught topic with an uneasy co-existence with standards. Perhaps (within JTC 1) one of the most notorious recent examples surrounded the JPEG Standard and, in part prompted by such problems there are certainly many people of good will wanting better management of IP in standards. Judging by some recent development in document format standardisation, it seems probable that this will be the area where progress can next be made …

Most recently, the Fast Track standardisation of ISO/IEC 29500 in 2007/8 saw much interest in the IPR regime surrounding that text, with much dark suspicion surrounding Microsoft's motives. However, the big development in this space – when it came – was from an unexpected direction …

The i4i Patent

Back in the SGML days I remember touring the floor of trade shows and noticing the S4-Desktop product from Candian company Infrastructures for Information, Inc (i4i). Like a number of other products at the time (including Microsoft's own long-forgotten SGML Author for Word, or Interleaf’s BladeRunner) it attempted to make Word™ a structure-aware authoring environment, based on the (accurate) belief that while many companies wanted structured data they didn't want to have to grapple with pointy brackets.

Keen to avoid the phenomenon that Rob Weir describes whereby

There is perhaps no occasion where one can observe such profound ignorance, coupled with reckless profligacy, as when a software patent is discussed on the web.

I will avoid any punditry about the ongoing legal course of this patent. Those interested would do well to read IP lawyer Andy Updegrove's post (and follow-up) on the legalities of this matter.

On the technical merit of the standard though, there appears to me to be unanimity among disinterested experts qualified to judge. For example Jim Mason (for 22 years the chair of the ISO committee responsible for all-things-markup) commented:

[T]his technique did not originate with i4i. It was already established in other commercial products and was, in effect, standardized in ISO/IEC 8613, Office Document Architecture. ODA essentially described a binary format for word-processor document representation, which worked by pointers into a byte stream. Its original interchange format, ODIF, started as a representation of that structure, but it was extended to have an alternative SGML stream, exported by a process similar to that described in the i4i patent. So there was prior art, specifically prior art described in public standards.

This point was expanded on by markup veteran Rick Jelliffe, who concluded:

By the end of the judgment I was left thinking "what interactive XML system with any links wouldn't be included in this?" which is utterly ridiculous.

I was creating SGML systems from 1989, and the i4i patent is just as obvious then as it is now.

In a Guardian Interview i4i chairman Loudon Owen seemed to make it clear that the patent would not be licensed on a reasonable and non-discriminatory (RAND) basis (at least – or especially – where Microsoft are concerned):

On licensing to Microsoft, Owen sounds on the edge of anger: "No. No. This is our property. We are going to build our business. There's no right for Microsoft to use it and go forward." But i4i could license it at some humungous, eye-watering price that Microsoft might have to pay, surely? No, says Owen.

The Wider Context

As part of its amicus brief (PDF) in the Bilski case pending before the Supreme Court, IBM offered what might be termed the orthodox pro-patent position. In a section headed “Software Patent Protection Provides Significant Economic, Technological, and Societal Benefits” we thus find a footnote quoting this text:

Given the reality that software source code is human readable, and object code can be reverse engineered, it is difficult for software developers to resort to secrecy. Thus, without patent protection, the incentives to innovate in the field of software are significantly reduced. Patent protection has promoted the free sharing of source code on a patentee’s terms—which has fueled the explosive growth of open source software development.

While it is somewhat surpising to learn here of the affinity between FOSS and patents, the point is of course that the idea of patents is not wholly without foundation: that a state-sanctioned restraint of trade (for such is a patent) is justified in allowing innovators to monetize their inventions. However, increasingly when we listen to the voices of actual FOSS (and non-FOSS) people the view seems to be that any advantages are outweighed by the problems of patents. For example Mike Kay (developer of the superb Saxon family of XSLT, XQuery, and XML Schema processing products) in an open letter to his MP argues against software patenting in a piece which is well-worth reading in its entirety:

The software business does not need incentives to innovate. If you don't innovate, you die. [...] [I]n the software business, patenting of ideas benefits no-one: certainly, it does not benefit society or the economy at large, which is the only possible justification for governments to interfere with the market and grant one company a monopoly over an idea.

And, in specific reference to the i4i patent:

recently an otherwise unsuccessful company has been awarded a similar [i.e. 9-figure] sum against Microsoft, for an idea which most people in the industry considered completely trivial and obvious.

More colourfully Tim Bray lists some horror-story cases (again well worth reading) and opines that the whole patent system is "too broken to be fixed". He also addresses the question of whether patent activity benefits society, and comes down firmly against:

And here are a few words for the huge community of legal professionals who make their living pursuing patent law: You’re actively damaging society. Look in the mirror and find something better to do.

The Myth of Unencumbered Technology

Given the situation we are evidently in, it is clear that no technology is safe. The brazen claims of corporations, the lack of diligence by the US Patent Office, and the capriciousness of courts means that any technology, at any time, may suddenly become patent encumbered. Technical people - being logical and reasonable - often make the mistake of thinking the system is bound by logic and reason; they assume that because they can see 'obvious' prior art, then it will apply; however as the case of the i4i patent vividly illustrates, this is simply not so.

Turning to document format standards, we can see there most certainly are known and suspected patents in play. For example:

  • the i4i patent mentioned above (which, in his Guardian interview, the i4i Chairman refuses to rule out as applying to ODF)
  • 45 unspecified patents which Microsoft has claimed OpenOffice.org infringes, some number of which may relate to the ODF specification (and which Sun and Microsoft agreed a cease-fire over until 2014 - at least as far as Sun is/was concerned)
  • an unknown number of unspecified patents which have led IBM to include ODF under its Interoperability Specifications Pledge
  • an unknown number of unspecified patents which have led Microsoft to include OOXML under its Open Specification Promise (though presumably clear OOXML-specific patents such as US Patent 7,676,746 are in scope here)

Now, as is clear from the above, large corporations have a preferred means of neutralising their IP stake in standards: by "promises", "covenants" and the like.

The question for standardizers remains: is the current situation acceptable? and if not, what can be done to improve it?

The ISO Rules (and Are They Followed?)

Since 2007 the "big three" International SDOs (ISO, IEC and ITU-T) have operated a common patent policy predicated on the wholly reasonable premise that standards should be "accessible to everybody without undue constraints". The policy is implemented in detail by JTC 1 (which joins the forces of ISO and IEC) and which – as we know – governs the International Standards ODF and OOXML.

The Policy as implemented in the Directives has several aspects, which I would categorise as falling under the following headings …

Personal Disclosure

Anybody aware of an IPR issue has a duty to speak out:

any party participating in the work of the Organizations should, from the outset, draw their [sic] attention to any known patent or to any known pending patent application, either their own or of other organizations. (ISO Directives Part 1, Clause 3)

And indeed committee secretaries and chairs are routinely reminded by Geneva to issue a request for IPR disclosure at meetings, to jog people's memory.

Formal Disclosure in Standards

Readers of Standards can expect to have the IPR/patent situation made explicit in the text before them, and accordingly there are may textual items mandated for Standards to which patents apply. In particular it is stated, "[a] published document for which patent rights have been identified during the preparation thereof, shall include the following notice in the introduction:"

The International Organization for Standardization (ISO) [and/or] International Electrotechnical Commission (IEC) draws attention to the fact that it is claimed that compliance with this document may involve the use of a patent concerning (…subject matter…) given in (…subclause…).

Centralised Record-keeping

A JTC 1 "patent database" (served as a huge HTML document) is maintained in Geneva which gathers together all the patents applying to published standards, and the terms under which patent holders have agreed to make licenses available.

Clear Access Rights

Patent Holders who have signed the licensing declaration to ISO, IEC or ITU-T agree to license their patents under a clear regime: either RAND, ZRAND (i.e. RAND with a free-of-charge license), or – exceptionally – on a per-case commercial basis. Anybody accessing the patent database is able to see this and, by referring to the ISO/IEC governing documents, know what it means, not least because no deviations from Geneva's wording are permitted:

the patent holder has to provide a written statement to be filed at ITU-TSB, ITU-BR or the offices of the CEOs of ISO or IEC, respectively, using the appropriate "Patent Statement and Licensing Declaration" Form. This statement must not include additional provisions, conditions, or any other exclusion clauses in excess of what is provided for each case in the corresponding boxes of the form.

Problem Handling

And if things go wrong:

2.14.3 Should it be revealed after publication of a document that licences under patent rights, which appear to cover items included in the document, cannot be obtained under reasonable and non-discriminatory terms and conditions, the document shall be referred back to the relevant committee for further consideration.

Unfortunately, when we hold up the big two document standards of ODF and OOXML against the goals set out, we see there is work still to be done …

Moving Forward

While the "broken stack" of patents is beyond repair by any single standards body, at the very least the correct application of the rules can make the situation for users of document format standards more transparent and certain. In the interests of making progess in this direction, it seems a number of points need addressing now.

  • Users should be aware that the various covenants and promises being pointed-to by the US vendors need not be relevant to them as regards standards use. Done properly, International Standardization can give a clearer and stronger guarantee of license availability – without the caveats, interpretable points and exit strategies these vendors' documents invariably have.
  • In particular it should be of concern to NBs that there is no entry in JTC 1's patent database for OOXML (there is for DIS 29500, its precursor text, a ZRAND promise from Microsoft); there is no entry whatsoever for ODF. I would expect there to be declarations from the big US vendors who profess patent interests in these standards, and I would expect this to be addressed as a matter of urgency (perhaps in parallel with the publication of these standards' forthcoming amendments)
  • In the case of the i4i patent, one implementer has already commented that implementing CustomXML in its entirety may run the risk of infringement (and this is probably, after all, why Microsoft patched Word in the field to remove some aspects of its CustomXML support). OOXML needs to be referred back to its committee (this may be JTC 1, not SC 34) for a decision on what happens next. My personal guess is that CustomXML will be left in OOXML Transitional (patent-encumbrance will be just one more of the many warning stickers on this best-avoided variant), and modified in, or removed from, OOXML Strict
  • When declaring their patents to JTC 1, patent holders are given an option whether to make a general declaration about the patents that apply to a standard, or to make a particular declaration about each and every itemized patent which applies. I believe NBs should be insisting that patent holder enumerate precisely the patents they hold which they claim apply to ODF or OOXML, as this will give greater transparency about what is (or is not) covered and prevent the vague threat ("there may be patents but we're not saying what") which seems to apply at the moment.

There is obviously much to do, and I am hoping that at the forthcoming SC 34 meetings in Stockholm this work can begin. Certainly, anybody reading this blog post now knows there are outstanding IPR issues which we as standardizers have a duty to raise …

SC 34 WG meetings in Paris next week

by Alex Brown 25. November 2009 20:18

Once again I feel that bubbling up of almost schoolboy fervour that presages a set of SC 34 meetings. In Paris no less (AFNOR shall be our hosts): the city of love, fine art and rognons à la moutarde.

What is tasty on SC 34’s menu? Well four working groups are meeting next week:

  • WG 1 (which I convene) will be carrying forward its work on foundation standards – particularly the schema languages of DSDL. We have two new (/proposed) projects to discuss: one a schema language focussed on cross-reference validation; one on associating schemas with documents using processing instructions. Probably our most successful schema language, RELAX NG, is due for an update and several new features are up for discussion: keep it coming!
  • WG 4 (OOXML) will continue its intensive maintenance on ISO/IEC 29500 – not least in handling a new set of approved corrigenda (just voted on) and dealing with the day-to-day grind of correction and improvement. There are larger questions to answer too, in particular those which concern the relationship between the Strict and Transitional forms of OOXML. I have led the preparation of a background paper on this which (thanks to the newly open WG 4 mail archive) can be accessed as a public document (PDF). I predict some lively discussion!
  • WG 5 (OOXML/ODF interop) will continue its work examing how (or not) the two formats may be used by systems which hope to interoperate. TR 29166 - dedicated to this topic - continues to take shape ahead of its projected finishing date in 2011.
  • WG 6 is the newly-created WG dedicated to the JTC 1 side of maintenance of ISO/IEC 26300:2006 (aka ODF v1.0). As a newly-created group there will no doubt be a certain amount of adminstrivia to be got through but there are more substantial issues looming too: defect reports to be advanced and the longer-term project of amending ISO/IEC 26300 to bring it into alignment with ODF 1.1 – there is general agreement that it makes sense to reduce marketplace confusion by reducing the confusing number of standard (and non-standard) “ODF” variants out there, and aligning versions between standards organisations.

Stay tuned (and follow hashtag #sc34 for real-time updates) …

Tags: , , , , , ,

SC 34 Meetings, Seattle - Day 1

by Alex Brown 13. September 2009 16:03

At the invitation of ANSI, SC 34 is in sunny Bellevue for five jam-packed days of Standards meetings (Sunday-Thursday). This is a full and busy event, with around 60 delegates registered from 14 countries (Belgium, Brazil, Canada, China, Denmark, Finland, France, Germany, India, Korea, Norway, South Africa, UK, and the USA) and 4 liaison organisations (Ecma, OASIS, W3C and the XML Guild).

Maybe by the end of it a number of momentous questions will have been answered, including:

  • Whether the world needs a standardised way to associate XML documents with schemas
  • Whether OOXML Transitional should be evolved
  • How ISO/IEC 26300 shall be maintained within SC 34
  • How standard schemas should be licensed to users
  • How MIME types should best be used for identifying document formats

Stay tuned ...


About the author

Alex Brown


Links

Legal

The author's views contained in this weblog are his, and not necessarily of any organisation. Third-party contributions are the responsibility of the contributor.

This weblog’s written content is governed by a Creative Commons Licence.

Creative Commons License     


Bling

Use OpenDNS  

profile for alexbrn at Stack Overflow, Q&A for professional and enthusiast programmers

Quotable

Note that everyone directly involved in the development of ISO standards is a volunteer or funded by outside sponsors. The editors, technical experts, etc., get none of this money. Of course, we must also consider the considerable expense of maintaining offices and executive staff in Geneva. Individual National Bodies are also permitted to sell ISO standards and this money is used to fund their own national standards activities, e.g., pay for offices and executive staff in their capital. But none of this money seems to flow down to the people who makes the standards.

Rob Weir

RecentComments

Comment RSS