Mastodon
Where is there an end of it? | All posts tagged 'schema'

Notes on Document Conformance and Portability #3

Now that the furore about Microsoft’s implementation of ODF spreadsheet formulas in Office SP2 has died down a little, it is perhaps worth taking a little time to have a calm look at some of the issues involved.

Clearly, this is an area where strong commercial interests are in play, not to mention an element of sometimes irrational zeal from those who consider themselves pro or anti (mostly anti) Microsoft.

One question is whether Microsoft did “The Right Thing” by users in choosing to implement formulas the way they did. This is certainly a fair question and one over which we can expect there to be some argument.

The fact is that Microsoft’s implementation decision means that, on the face of it, they have produced an implementation of ODF which does not interoperate with other available implementations. Thus IBM blogger Rob Weir can produce a simple (possibly simplistic) spreadsheet, “Maya’s Wedding Planner” and used it to illustrate, with helpful red boxes for the slow-witted, that Microsoft’s implementation is a “FAIL” attributable to “malice or incompetence”. For good measure he also takes a side-swipe at Sun for their non-interoperable implementation. In this view, interoperability aligning with IBM’s Symphony implementation is – unsurprisingly – presented as optimal (in fact, you can hear the sales pitch from IBM now: “well, Mr government procurement officer, looks like Sun and MS are not interoperable, you won’t want these other small-fry implementations, and Google’s web-based approach isn’t suitable – so looks like Symphony is the only choice …”)

Microsoft have argued back, of course, most strikingly in Doug Mahugh’s 1 + 2 = 1? blog posting, which appears to present some real problems with basic spreadsheet interoperability among ODF products using undocumented extensions. The MS argument is that practical ODF interoperability is a myth anyway, and so supporting it meaningfully is not possible (in fact, you can hear the sales pitch from MS now: “well, Mr government procurement officer, looks like ODF is dangerously non-interoperable: here, let me show you how IBM and Sun can’t even agree on basic features; but look, we’ve implemented ISO standard formulas, so we alone avoid that – and you can assess whether we’re doing what we claim – looks like MS Office is the only choice …”)

Personally, I think MS have been disappointingly petty in abandoning the “convention” that the other suites more or less use. I accept that these ODF implementations have limited interoperability and are unsafe for any mission-critical data, but for the benefit of the “Maya’s Wedding Planner” type of scenario, where ODF implementations can actually cut it, I think MS should have included this legacy support as an option, even if they did have to qualify that support with warning dialogs about data loss and interoperability issues.

But - vendors are vendors; it is their very purpose to compete in order to maximise their long-term profits. Users don’t always benefit from this. We really shouldn’t be surprised that we have IBM, Sun and Microsoft in disagreement at this point.

What we should be surprised about is how this interoperability fiasco has been allowed to happen within the context of a standard. To borrow Rick Jelliffe’s colourfully reported words, the whole purpose of shoving an international standard up a vendor’s backside it to get them to behave better in the interests of the users. What has gone wrong here is in the nature of the standard itself. ODF offers an extremely weak promise of interoperability, and the omission of a spreadsheet formula specification in ODF 1.1 is merely one of the more glaring facets of this problem. As XML guru James Clark wrote in 2005:

I really hope I'm missing something, because, frankly, I'm speechless. You cannot be serious. You have virtually zero interoperability for spreadsheet documents.

To put this spec out as is would be a bit like putting out the XSLT spec without doing the XPath spec. How useful would that be?

It is essential that in all contexts that allow expressions the spec precisely define the syntax and semantics of the allowed expressions.

These words were prophetic, for now we do indeed face a present zero interoperability reality.

The good news is that work is underway to fix this problem: ODF 1.2 promises, when it eventually appears, to specify formulas using the new OpenFormula specification. When that is published vendors will cease to have an excuse to create non-interoperable implementations, at least in this area.

Is SP2 conformant?

Whether Microsoft’s approach to ODF was the wisest is something over which people may disagree in good faith. Whether their approach conforms to ODF should be a neutral fact we can determine with certainty.

In a follow-up posting to his initial blast, Rob Weir sets out to show that Microsoft’s approach is non-conformant, subsequent to his previous statement that “SP2's implementation of ODF spreadsheets does not, in fact, conform to the requirements of the ODF standard”. After quoting a few selected extracts from the standard, a list is presented showing how various implementations represent a formula:

  • Symphony 1.3: =[.E12]+[.C13]-[.D13]
  • Microsoft/CleverAge 3.0: =[.E12]+[.C13]-[.D13]
  • KSpread 1.6.3: =[.E12]+[.C13]-[.D13]
  • Google Spreadsheets: =[.E12]+[.C13]-[.D13]
  • OpenOffice 3.01: =[.E12]+[.C13]-[.D13]
  • Sun Plugin 3.0: [.E12]+[.C13]-[.D13]
  • Excel 2007 SP2: =E12+C13-D13

Rob writes, “I'll leave it as an exercise to the reader to determine which one of these seven is wrong and does not conform to the ODF 1.1 standard.”

Again, this is clearly aimed at the slow witted. One can imagine even the most hesitant pupil raising their hand, “please Mr Weir, is it Excel 2007 SP2?” Rob however, is too smart to avoid answering the question himself, and anybody who knows anything of ODF will know that, in fact, this is a tricky question.

Accordingly, Dennis Hamilton (ODF TC member and secretary of the ODF Interoperability and Conformance TC) soon chipped in among the blog comments to point out that ODF’s description of formulas is governed by the word “Typically”, rendering it arguably just a guideline. And, as I pointed out in my last post, it is certainly possible to read ODF as a whole as nothing more than a guideline.

(I am glad to be able to report that the word “typically” has been stripped from the draft of ODF 1.2, indicating its existence was problematic.)

Curious readers might like to look for themselves at the (normative) schema for further guidance. Here, we find the formal schema definition for formulas, with a telling comment:

<define name="formula">
  <!-- A formula should start with a namespace prefix, -->
  <!-- but has no restrictions-->
  <data type="string"/>
</define>

Which is yet another confirmation that there are no certain rules about formulas in ODF.

So I believe Rob’s statement that “SP2's implementation of ODF spreadsheets does not, in fact, conform to the requirements of the ODF standard” is mistaken on this point. This might be his personal interpretation of the standard, but it is based on an ingenious reading (argued around the meaning of comma placement, and privileging certain statements over other), and should certainly give no grounds for complacency about the sufficiency of the ODF specification.

As an ODF supporter I am keen to see defects, such as conformance loopholes, fixed in the next published ODF standard. I urge all other true supporters to read the drafts and give feedback to make ODF better for the benefit of everyone, next time around.

XML Prague 2009, Day 1

Night Falls on Old Prague

I am in Prague for the XML Prague conference, and for a week of meetings of ISO/IEC JTC 1 SC 34. Here is a running report of day 1 of the conference ...

Day 1 kicked off, after a welcome from Mohamed Zergaoui, with a presentation from Mike Kay (zOMG - no beard!) on the state of XML Schema 1.1. Mike gave a lucid tour of XML Schema's acknowledged faults, but maintained these must not distract us too much from the technology's industrial usefulness. XML Schema 1.1 looks to me mostly like a modest revamp: some tidying and clarification under the hood. One notable new feature is however to be introduced: assertions - a cut down version of the construct made popular by Schematron. Mike drew something of collective intake of breath when he claimed it was to XML Schema 1.1's advantage that it was incorprating multiple kinds of validation, and that it was "ludicrous" to validate using multiple schema technologies.

A counterpoint to this view came in the next presentation from MURATA Makoto. Murata-san demonstrated the use of NVDL to validate Atom feed which contain extension, claiming NVDL was the only technology that allows this to be done without manually re-editing the core schemas every time a new extension is used.

After coffee, Ken Holman presented on "code lists" - a sort of cinderalla topic within XML validation but an important one, as code lists play a vital role in document validity in most real world XML documents of any substance. Ken outlined a thorough mechanism for validation of documents using code lists based on Genericode and Schematron.

Before Lunch,  Tony Graham took a look at "Testing XSLT" and gave an interesting tour of some of the key technologies in this space. One of his key conclusions, and one which certainly struck a chord with me, was the assertion that ultimately the services of our own eyes are necessary for a complete test to have taken place

Continuing the theme, Jeni Tennison introduced a new XSLT testing framework of her invention: XSpec. I sort of hope I will never have to write substantial XSLTs which merit testing, but if I do then Jeni's framework certainly looks like TDB for TDD!

Next, Priscilla Walmsley took the podium to talk about FunctX a useful-looking general-purpose library of XPath 2.0 (and therefore XQuery) function. Priscilla's talk nicely helped to confirm a theme that has been emerging today, of getting real stuff done. This is not to say there is not a certain geeky intellectualism in the air - but: it's to a purpose.

After tea, Robin Berjon gave an amusing tour of certain XML antipatterns. Maybe because his views largely coincided with mine I thought it a presentation of great taste and insight. Largely, but not entirely :-)

Next up, Ari Nordström gave a presentation on "Practical Reuse in XML". His talk was notable for promoting XLink, which had been a target of Robin Berjon's scorn in the previous session (though now without some contrary views from the floor). Also URNs were proposed as an underpinning for identification purposes - a proposal which drew some protests from the ambient digiverse

To round off the day's proceedings, George Cristian Bina gave a demo of some upcoming features in the next version of the excellent oXygen XML Editor. This is software I am very familiar with, as I use it almost daily for my XML work. George's demo concentrated on the recent authoring mode for oXygen, which allows creation of markup in a more user-friendly wordprocessor-like environment. I've sort of used this on occasion, and sort of felt I've enjoyed it at the time. But somehow I always find myself gravitating back to good old pointy-bracket mode. Maybe I am just an unreconstructed markup geek ...


Breakfast Geek-out
Breakfast Geek-out