Arguing About Archiving
A few people have been discussing the concept of archival data formats and whether ODF and MS/ECMA-OOXML fit the bill. Brian Jones of Microsoft is an honest person in a tough position, so we must be gentle with him. He writes:
I think as we see more and more applications pop up that support OpenXML (besides those built by Microsoft), you’ll start to see the anti-OpenXML folks calm down a bit. The ideal in any archival format is that it allows for long term access with as little disturbance as possible. That was the whole point of OpenXML. Give the world a fully documented format and pass the ownership of that format over to a standards body for safe keeping and future development. OpenXML allows anyone to build tools that read and write the formats, and at the same time is designed to cause the least amount of disruption possible. You can move all your existing documents into OpenXML, and you won’t lose a thing.
The problem is that it isn't true. When those with legal training describe Microsoft's patent pledge as "quicksand of unknown depth," it is clear that Microsoft has not opened MS/ECMA-OOXML up enough to make it safe for anyone else to implement. Furthermore, Microsoft refused to make the changes suggested by people like Ben Langhinrichs, who works with Microsoft to solve customers' needs. If Microsoft will not accommodate its own partners, how does it expect those who are not its partners to use MS/ECMA-OOXML?
Jones was responding to an article by James Governor, who does not see why the need for document archiving should necessarily favor ODF over other formats. Governor writes:
But what I really take issue with is the idea that Microsoft’s OpenXML format could become uninspectable in future. I just don’t buy it. Today I saw that WordPress can render Microsoft Office documents using Thinkfree. I also learned that the Powerpoint equivalent Google just acquired allowed the authoring of Powerpoint slides without using Powerpoint. It begins to put me in mind of Mark Twain – the only things sure in life are death, taxes, and third parties (reverse) engineering around Microsoft Office formats.
That is all well and good in theory. But when you are the person who has to retrieve and access older documents that your software provider's current software no longer supports, as I am, you quickly find that you want a fully open, cross-vendor format for all of your documents: a format that allows you, if necessary, to write your own read and write filters and get at your data that way. I do not see MS/ECMA-OOXML as being that format. I do see ODF as one of a select few formats that will meet the needs of users, including government agencies, for vendor-independent file formats.
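To illustrate what "write your own read filter" can mean for an open format, here is a minimal sketch in Python 3, assuming an ODF text document (.odt). It relies only on two documented properties of ODF: the file is a ZIP package, and the body text lives in content.xml, with paragraphs and headings stored as text:p and text:h elements in the urn:oasis:names:tc:opendocument:xmlns:text:1.0 namespace. The function name extract_odf_paragraphs is hypothetical, and this is nowhere near a complete filter: it ignores styles, table structure, frames, change tracking, and the special elements ODF uses for repeated spaces and tabs.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URI for ODF text elements such as text:p and text:h.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
PARAGRAPH_TAGS = (f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h")


def extract_odf_paragraphs(source):
    """Return the plain text of each paragraph and heading in an ODF file.

    `source` may be a path or a binary file-like object. Hypothetical
    helper, a sketch only: formatting is discarded, text:s (repeated
    spaces) and text:tab are not expanded, and text inside frames nested
    in a paragraph may be reported twice.
    """
    with zipfile.ZipFile(source) as package:
        # The document body is stored in content.xml inside the package.
        root = ET.fromstring(package.read("content.xml"))

    paragraphs = []
    for elem in root.iter():
        if elem.tag in PARAGRAPH_TAGS:
            # itertext() joins text from nested spans, links, and so on.
            paragraphs.append("".join(elem.itertext()))
    return paragraphs
```

Called as extract_odf_paragraphs("report.odt") (a hypothetical file name), it returns a list of strings, one per paragraph or heading. The point is not the code itself but that only standard ZIP and XML tools are needed; no vendor's software is involved.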
The truth is, the legacy binary formats are not fully understood outside of Microsoft. I was working on a complex document (an IT training and standards manual) as part of a special-projects group at my workplace. I found that OpenOffice.org's handling of .doc files was good enough for simpler documents, but documents with complex layouts and embedded screenshots exposed the limits of reverse engineering. I had to use MS Office 2003 for this project because I was working with several others who were using it.
I would also note that the announced support covers only limited subsets of the older formats, not full coverage of both the old and new formats. When you look at the ECMA specification, do you really think the average field IT support person could throw together something that handles those formats? That is what archiving is about. It is not just about whether vendor X supports the format; it is about the field IT person who will have to work with it twenty years from now, after the vendors have found something else to attract their attention.
Nor do I see the need for all of the legacy garbage encased in MS/ECMA-OOXML. Everyone knows that opening and manipulating data in old file formats is mostly a matter of filters, assuming that the underlying functionality is available. I would urge Microsoft to listen to someone who both uses office document files and supports other people who use them, including files from older versions of WordPerfect and from Microsoft formats going back to about version 6. The thing to do is to openly specify the legacy behaviors and then sit down with OASIS and China to merge the formats into a single, vendor-independent file format.
Users do not need a replay of VHS vs. Betamax. They already have that situation in the Blu-ray vs. HD DVD format battle, so there is no need to repeat it in the office applications space. Users want a choice, but they want a choice of applications and vendors, not of file formats. Who wants more rounds of "I cannot open your document; can you send it in XYZ format?"
It’s true that as long as you have all the bits that constitute your document, you can theoretically retrieve all the information in it, even if that is a very difficult task in practice without the document creator’s help. True, it is possible to reverse engineer the format. As Governor points out, Google did it, and so did OpenOffice.org. It’s difficult and filled with legal landmines that would not be there with a free and open format. Can any single company use legal instruments such as IP laws to stop governments from viewing documents in the future? No. Governments have the right to take these legal instruments away. Don’t believe me? See what happened to AIDS drugs in developing countries. Moreover, the backlash generated by such a move would probably bury any company that dared. However, as with AIDS drugs, the most likely outcome is that governments or private companies would have to pay a small sum to the document creator to view the document. Some call this a tax and believe it is unacceptable.
So says CyberTech Rambler. In the Third World, that may be likely. In the US, our government is too beholden to the commercial interests of large companies to even consider something like that. Our government is more likely to buy full licenses and grumble about the terms than to actually do something about it.
Do you disagree? Look at what happened recently in Florida, Minnesota, and California, where out-of-state commercial interests have sought to stymie efforts by the states' own residents to adopt a fully open, vendor-neutral format for their state documents.
As I noted above, none of Google Docs, OpenOffice.org, or the other office applications (whether online or local) has fully decoded the older file formats yet. Therefore, any argument that reverse engineering will solve the interoperability puzzle is contradicted by the evidence.
Finally, the argument that MS/ECMA-OOXML will “introduce competition back into the document market” rests on a misunderstanding of the purposes behind the formats. Microsoft has used its file formats as a way to ensure monopoly status for many years. Does anyone seriously think that this is not why Microsoft pushed ahead with its own format when a simpler, more flexible, more interoperable format was available in ODF? Is this the reason for the Men in Black?