Notes on Document Conformance and Portability #1

Richard Gillam’s handy book, Unicode Demystified: A Practical Programmer’s Guide to the Encoding Standard, contains an example of right-to-left text embedded in a prevailing left-to-right writing direction:

Avram said “מזל טוב.‏” and smiled.

Whether you see what you are meant to see here will depend on your browser's Unicode support and on whether you have Hebrew fonts installed. Properly rendered, it will look something like this:

In reading order, the first character after “said” is the “מ” character to the left of the closing quotation mark. The text then runs from right to left until the full stop, and then resumes with “and smiled”. In Unicode, text is represented not in rendering order but in reading (logical) order: it is up to the renderer to make space and reverse direction at the correct points. Here is the text represented as XML in a paragraph in an ODF document (get the document here):

<text:p>Avram said “&#x5de;&#x5d6;&#x5dc; &#x5d8;&#x5d5;&#x5d1;.&#x200f;” and smiled.</text:p>

One of the great things about XML is its solid basis in Unicode and therefore its use of the Universal Character Set (ISO/IEC 10646). XML supports several encodings of this character set (every XML processor must accept UTF-8 and UTF-16), and any character can also be written as a numeric character reference, which is the mechanism used for the Hebrew characters above. Notice, immediately after the full stop, the use of U+200F RIGHT-TO-LEFT MARK, which specifies that the full stop is part of the right-to-left character sequence.
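To make the point about logical order concrete, here is a small Python sketch that parses the paragraph above and lists each character of the Hebrew quotation with its bidirectional class, taken from Python's standard unicodedata module. The namespace declaration is added only so the fragment parses on its own, and the helper names are hypothetical. The parser resolves the numeric character references, the characters come out in reading order rather than display order, and U+200F carries the strong right-to-left class R. The full stop (class CS) is effectively treated as a neutral here, and a neutral flanked by two strong right-to-left characters resolves to right-to-left, which is why the mark keeps it inside the Hebrew run.

```python
import unicodedata
import xml.etree.ElementTree as ET

# The ODF text namespace is declared here only so the fragment parses standalone.
ODF_TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
FRAGMENT = (
    f'<text:p xmlns:text="{ODF_TEXT_NS}">'
    "Avram said \u201c&#x5de;&#x5d6;&#x5dc; &#x5d8;&#x5d5;&#x5d1;.&#x200f;\u201d and smiled."
    "</text:p>"
)


def paragraph_text(fragment: str) -> str:
    """Parse the fragment; the XML parser resolves numeric character references."""
    return ET.fromstring(fragment).text or ""


def bidi_classes(text: str) -> list[tuple[str, str, str]]:
    """Return (code point, character name, bidi class) per character, in logical order."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), unicodedata.bidirectional(ch))
        for ch in text
    ]


if __name__ == "__main__":
    text = paragraph_text(FRAGMENT)
    # Slice out the quotation, from the opening to the closing quotation mark.
    quote = text[text.index("\u201c"): text.index("\u201d") + 1]
    for code_point, name, bidi in bidi_classes(quote):
        print(code_point, bidi, name)
```

Running it lists the quotation marks as ON (other neutral), the Hebrew letters and U+200F as R, the space as WS and the full stop as CS, in exactly the order they appear in the markup.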

Viewing this document in three ODF applications (OpenOffice 3, Google Docs with Firefox, and the new MS Office 2007 SP2) gives the correct result every time. That is good news.

And if, in some ODF application, the character sequence did not appear correctly (if, say, the full stop were out of place), we would be able to say unequivocally that the application was faulty, and we would be able to point to the place in the Unicode specification where the correct behaviour is described. We (the users) would be able to bang the table and demand the bug be fixed.

This kind of process is one of the pillars of conformance testing (application conformance testing, to be exact). Where we have a solid spec and observable behaviour, we can compare the two and make a judgement.

Where we don't have a solid spec, things get trickier. From the standardiser's viewpoint, and if it's not too highfalutin (I claim a Cambridge resident's special rights, anyway), we might quote Wittgenstein on such occasions: "Whereof one cannot speak, thereof one must be silent".

Nikon D300 Woes

Nikon D300: Dreaded F0 Problem

The problem first surfaced in Prague and has happened a couple of times since. The display shows F0 (as shown), the lens is stopped fully down and autofocus stops working. From poking around the web, this seems to be a far from uncommon problem with the Nikon D300 (see for example here, here, or here).

Now, the fault is intermittent: it generally appears after a good few minutes or hours of shooting, and then mysteriously clears several hours later. So, when it last struck, I took the above picture. Today I had a chance to lug the D300 into Cambridge to return it to Jessops. Knowing a bit about the Sale of Goods Act, I was expecting to get a new unit, or an equivalent loaner while the D300 was repaired. With a family holiday coming up, I don’t want to be without a camera!

Sure enough, the fault refused to show itself at the camera shop. Lucky I took a picture, then, I say. But there’s no serial number in it to verify it’s this camera, they say. Jessops insisted they would have to send the camera away to verify for themselves that it was faulty. And no, they weren’t going to replace it; and no, they don’t ever loan replacement cameras. What about my consumer rights? Jessops seem to think it is okay to take the unit away for independent testing before those rights come into play, and that a photograph of the fault isn't sufficient evidence: that is company policy. Needless to say, as somebody who has spent large sums of money with Jessops over the years, I found this “no can do” attitude distinctly irritating, and I made my displeasure felt before taking my £1,000 of new, faulty camera away with me, hoping it won't misbehave too badly on holiday.

What next?

Well, to anybody contemplating buying a Nikon D300, I say – be aware of this potential problem.

To anybody contemplating buying photo equipment from Jessops I say – cross your fingers it doesn’t develop an intermittent fault that you can’t prove beyond doubt; otherwise you’re going to find yourself, like me, in an unhappy place.

Meanwhile I have contacted Jessops' Customer Liaison; let's see what they say ...

Naxos Music Library

For the past few months I have been trying the Naxos Music Library, a service which, for €25 a month, allows you to listen to as much streaming classical music (and jazz) as you can take. That might seem a bit steep, but consider that it gives you access to the whole Naxos catalogue (which, for non-US residents, includes a large number of out-of-copyright classic recordings), and to the catalogues of other great labels like BIS, Chandos, Hungaroton, Music & Arts, Ondine and more …

Quality is fine (256kb/s if you've the bandwidth), and I've really enjoyed listening to, for example:

But…

The technologies available to stream the music are Flash, Silverlight and Windows Media Player, and all of them share a huge problem: they cause pauses between tracks. On my fast (20 Mb/s) connection the gap is just a couple of seconds, but the effect can be ruinous. Today (it being Easter time) I am listening to the famous 1951 Knappertsbusch/Bayreuth recording of Parsifal. At the end of the weightiest, most yearning Vorspiel one can imagine, as the woodwind sing out into the Bayreuth gloom and the strings disappear into the stratosphere, we are suddenly plunged into inky digital silence before the first scene begins. The spell is broken and part of my soul is chipped away.

This is a serious problem, and it really rules this service out of contention for music lovers who want to listen to opera or other through-composed pieces where gapless playback is essential. If nothing else, a reasonable stop-gap (hah!) would be to offer entire discs ripped as single streamable items, as an alternative way of getting the bits.

SC 34 Meetings, Prague, Days 2, 3 & 4

What a Difference a Day Makes
MURATA Makoto, WG 4 convenor, seems pleased with progress on the maintenance of OOXML

I had intended to write a day-by-day account of these meetings, but as it turned out there simply was not time for blogging (tweeting, on the other hand …). Photography suffered too: I had wanted to take a load of pictures of über-photogenic Prague, but somehow the work expanded into all my notionally free time as well. What I did manage to snap is here, and Doug Mahugh’s photos are also well worth checking out.

WG 1 met again on Tuesday to finish its business (you can read the meeting report here), and then my attention turned mostly to the activities in WG 4 (for OOXML maintenance) and WG 5 (for OOXML / ODF interop).

OOXML One Year On

Overall, WG 4 is making excellent progress. To date, 169 defects have been collected for OOXML (check out the defect log), and most of these have either been closed or had a resolution agreed. Amendments were started for 3 of the 4 parts of OOXML to allow a bunch of small corrections to be put in place, and the even more minor problems will be fixed by publishing technical corrigenda. All in all, I think stakeholders in OOXML can feel pretty confident that the Standard is being sensibly and efficiently maintained.

I was personally very pleased to see National Bodies well represented (the minutes are here), to the extent that I’d now ideally like to see some more big vendors coming to the table so their views can be heard. Microsoft (of course) was there; but where (for example) are Apple, Oracle and the other vendors who participated in Ecma TC 45 while OOXML was being drafted? To them, and to anybody who wants to get involved in this important work, I say: participate!

Over at his blog, Rick Jelliffe has been carrying out something of an exposé of the unfortunate imbalance among the stakeholders represented in the maintenance of ODF at OASIS (an imbalance that will become even more acute if Sun is, in the end, snapped up by IBM). Personally I think Rick is right that it is vitally important to have a good mix of voices at the standardisation table: big vendors, small vendors, altruistic experts, users, government representatives, etc. WG 4 is getting there, but it too has some way to go.

Tougher Issues

Some more controversial topics, however, were not resolved during the meeting, and I think they are worth exploring in more detail:

The Namespace Problem

One issue in particular has proved thorny: whether to change the namespaces in the OOXML schemas. This topic took a good slice of Tuesday and then segued into a bar session, which carried on over supper; by the time we broke it was midnight, and still no conclusion had been reached. WG 4 has issued a document outlining the state of discussions to date …

I have already expressed my own views on this, but during these Prague meetings some important new considerations were brought to bear. Go over to Jesper Lund Stocholm’s blog to get the thorny details.

Personally, I think the “strict” format is a new format, and that changing the namespace is only part of the solution. I would like to see the media types change too, and OOXML recommend a new file extension (.dociso, anyone?), to reduce the chances that users suffer the (to me) unacceptable fate of silent data loss that Jesper highlights.

Whither Transitional?

Another hot topic of discussion is what the “transitional” version of OOXML is really for. One interesting and slightly surprising fact that emerged after the BRM is that the strict schemas are a true subset of the transitional schemas. Should this nice link be preserved? Microsoft are keen for new features introduced in the “strict” version of OOXML to be mirrored in the “transitional” version – presumably, in part, because Office 14 will use transitional features.

Openness

When the maintenance of OOXML was being planned, one of the principles agreed on by the National Bodies was that the process should be as open as possible, consistent with JTC 1 rules. One aspect of this is whether WG 4’s mailing list archive should be open to the public. Some NBs were a little nervous of this, fearing that their committee members might feel less free to post candid comments if those comments were open to public scrutiny, with possible repercussions from their bosses and/or the tinfoil brigade in the blogosphere. There was also the troubling precedent of the mailing list of the U.S. INCITS V1 committee, which opened its archive to public view during the DIS 29500 balloting period, only to see it die completely as contributors refused to post in public view.

The issue will now be put to public ballot, and I am hopeful that mechanisms have been put in place which will allow NBs to support opening of the archive. With public standards, public meeting reports, public discussion documents and a public mailing list archive I think WG 4 will demonstrate that an excellent degree of openness is indeed possible even within the constraints of the current JTC 1 Directives.

Overturning BRM decisions

The UK proposed an interesting new defect during the Prague meetings, which centred on one of the decisions made at the BRM.

Nature of the Defect:

As a result of changes made at the BRM, a number of existing Ecma-376 documents were unintentionally made invalid against the IS29500 transitional schema. It was strongly expressed as an opinion at the BRM by many countries that the transitional schema should accurately reflect the existing Ecma-376 documents.

However, at the BRM, the ST_OnOff type was changed from supporting 0, 1, On, Off, True, False to supporting only 0, 1, True, False (i.e. the xs:boolean type). Although this fits with the detail of the amendments made at the BRM, it is against the spirit of the desired changes for many countries, and we believe that due to time limitations at the BRM, this change was made without sufficient examination of the consequences, was made in error by the BRM (in which error the UK played a part), and should be fixed.

Solution Proposed by the Submitter

Change the ST_OnOff type to support 0, 1, On, Off, True and False in the Transitional schemas only.
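The practical effect of the BRM change can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a schema validator: it assumes the lowercase lexical forms on, off, true and false rather than the capitalised forms quoted in the defect text, it ignores XML Schema whitespace handling, and the function names are hypothetical. Any on or off value that an Ecma-376 document was allowed to contain fails the xs:boolean-based type, while the proposed transitional type (the original six values) accepts all of them again.

```python
# Assumed lexical forms (lowercase), for illustration only.
ECMA_376_ST_ONOFF = frozenset({"0", "1", "on", "off", "true", "false"})
XS_BOOLEAN = frozenset({"0", "1", "true", "false"})  # ST_OnOff after the BRM change
PROPOSED_TRANSITIONAL = ECMA_376_ST_ONOFF  # the UK proposal restores all six values


def is_valid(value: str, lexical_space: frozenset) -> bool:
    """Exact membership test; XSD whitespace handling is deliberately ignored."""
    return value in lexical_space


def newly_invalid(values, new_space=XS_BOOLEAN):
    """Return the values valid under Ecma-376 ST_OnOff that the new type rejects."""
    return [v for v in values
            if is_valid(v, ECMA_376_ST_ONOFF) and not is_valid(v, new_space)]


if __name__ == "__main__":
    # Hypothetical w:val attribute values harvested from existing documents.
    sample = ["true", "on", "0", "off", "1"]
    print(newly_invalid(sample))                         # ['on', 'off']
    print(newly_invalid(sample, PROPOSED_TRANSITIONAL))  # []
```

The point of the comparison is that nothing in the documents changed: only the type they are validated against did, which is why restoring the wider value set in the transitional schemas alone would let those documents conform again.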

The result of the BRM decision being addressed here was apparent in a blog entry I wrote last year, which attracted rather a lot of attention.

Simply put, the UK is now suggesting that the BRM made a mistake here, and that things should be rectified so that existing MS Office documents “snap back” into conformance with 29500 transitional.

This proposal caused some angst. Who were we (some asked) to overturn decisions made at the BRM? My own view is less cautious: this was an obvious blunder; the BRM got it wrong (as, I think, it did many things). So let’s fix it.

Whither WG 5?

In parallel with WG 4, WG 5 (the group responsible for ODF/OOXML interoperability) also met. One of the substantive things it achieved was to water down the title of the ongoing report being prepared on this topic, changing it from:

OpenDocument Format (ISO/IEC 26300) / Office Open XML (ISO/IEC 29500) Translation

to:

OpenDocument Format (ISO/IEC 26300) / Office Open XML (ISO/IEC 29500) Translation Guidelines

Adding the word “guidelines” to the title should make it clear to anybody noticing this project that it is not an “answer” to ODF/OOXML interoperability, merely a discursive document. For myself, I have doubts about the ultimate usefulness of such a document.

It is disappointing to see the poor rate of progress on meaningful interoperability and harmonisation work. Of course these things are motherhood and apple pie in discussion – but when the time comes to find volunteers to actually help, few hands go up. In my view, the only hope of achieving any meaningful harmonisation work is to get Another Big Vendor interested in backing it, and I know some behind-the-scenes work will be taking place to beat the undergrowth and see if just such a vendor can be found.

De Gustibus

Sarah has persuaded me to submit an entry for the normblog Posterity Collection poll. It is:

1. Poet - Chaucer
2. Playwright - Shakespeare
3. Novelist - E.M. Forster
4. Composer - Beethoven, Bruckner, Mozart, Schubert
5. Jazz musician - Milt Jackson
6. Rock or pop star/group - ABBA
7. Country music ditto
8. Movie director - Michael Powell
9. Painter - Vermeer
10. Photographer
11. Sculptor - Richard Serra
12. Architect

E.M. Forster - really? Strange what this exercise makes you discover about yourself ...

(Actually, I'm distracted by the fact that there is no Unicode character for the backwards B in ABBA.)