Posts filed under ‘XML’
Texas is once again considering legislation that would select open standard data formats (and software that could use them) for state-produced documents.
Michael Coté, an Austin-based technology analyst specializing in open-source software, said tying government documents to a proprietary vendor creates the risk that those files may be unreadable in the future as software evolves and companies go out of business. Open-source formats such as OpenDocument are “vendor-neutral,” meaning they work with multiple programs and can more likely be accessed in the future, Coté said.
“If the Constitution was in WordPerfect 5.1 format, it would probably be difficult to read right now,” Coté said, referring to an obsolete but once widely used word processing format.
Veasey filed a similar bill in 2007. It got national attention from technology journalists and bloggers but went nowhere largely because of aggressive lobbying by Microsoft, he said.
At a hearing on the bill then, Microsoft national technology officer Stuart McKee described it as anti-competitive and warned that it could be the equivalent of the state “picking Betamax when everyone else goes with VHS.”
Passage is far from assured, of course. Many people freak out at the thought that they might lose the ability to use the leading proprietary office suite. On top of that, the leading proprietary vendor has an accomplished lobbying team that seeks to crush any move that could threaten its dominance. I also think that framing the debate in terms of licensing costs is the wrong approach. The state will want to buy StarOffice or WordPerfect for the support, rather than just downloading OpenOffice. So licensing costs may go down,
Personally, I think the bill’s opponents are missing the point. A Microsoft that was totally committed to open standard data formats / file formats (not "open source" file formats) and network protocols would indeed face more competition. They might find it difficult to maintain some of their pricing and lose some market share. But this would not only benefit consumers and the states, it would make Microsoft a leaner, stronger, more nimble competitor. For me, at least, supporting such bills is not about trying to hurt Microsoft, because my ideal world has MSFT being one of a group of leading competitors. Ideally, my work environment would have two or three different vendors' applications, so that when a user got too frustrated with one product, we could just switch him/her to a different one.
I realize that a level playing field is scary to Redmondites. But this fear will be replaced by the same kind of thrill that athletes feel in the midst of a game. If I were Mr. Ballmer, I would come to Texas and say, “We’re planning to support ODF anyway. Go ahead and pass this. We think we can make our products good enough that you will choose us most of the time anyway.” Unless, of course, he really doesn’t believe his company's products are that good.
Ten years’ difference in age makes a big difference. I don’t hear from BT all the time, but I know he’s thinking about me (and waiting for my next visit to see his daughter, my granddaughter). In fact, I found out he’d moved and changed telephone numbers by sending e-mail about planning a visit. MJ, on the other hand, rarely misses a daily telephone call, and often does not know about things I’ve communicated via e-mail.
But this isn’t a book about raising young men. BT shared a link about a new Firefox plugin (for Windows and Linux, sorry Mac users) that enables users to view OOXML files in Firefox like a Web page. I have not yet installed it or tested it, but I wanted to let the readers know about it. The plugin is downloadable on Microsoft’s CodePlex site.
I should also mention OpenItOnline, which also works as a Firefox plugin. According to the site, “Open IT Online supports the following file types: *.doc, *.rtf, *.odt, *.sxw, *.xls, *.csv, *.ods, *.sxc, *.ppt, *.pps, *.odp, *.sxi, *.jpg, *.gif, *.png.”
In either case, try it out and see how it works for your needs.
By the way, I’m proud of both of these guys.
An ODF file is essentially zipped XML. This article from August 2007 shows how to use Python’s XML tools to get at the data contained in ODF files.
Thanks to Carol Geyer and OASIS for the link!
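To make the "zipped XML" point concrete, here is a minimal sketch using only Python's standard library. It is not the code from the article mentioned above. The function names (odf_paragraphs, make_sample_odt) are my own, and the sample archive is a stripped-down stand-in for a real .odt file, which would also carry a mimetype entry, styles, and a manifest. It assumes the document body lives in content.xml inside the archive and that paragraphs are text:p elements.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs used by ODF for office and text elements.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def odf_paragraphs(source):
    """Return the text of every <text:p> paragraph in an ODF text document.

    `source` may be a file path or a file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    return ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]


def make_sample_odt():
    """Build a tiny, hypothetical ODF-like archive in memory for the demo."""
    content = (
        f'<office:document-content xmlns:office="{OFFICE_NS}" '
        f'xmlns:text="{TEXT_NS}">'
        "<office:body><office:text>"
        "<text:p>Hello, ODF.</text:p>"
        "<text:p>Zipped <text:span>XML</text:span> inside.</text:p>"
        "</office:text></office:body></office:document-content>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("content.xml", content)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(odf_paragraphs(make_sample_odt()))
```

With a real document, passing its path to odf_paragraphs should work the same way, as long as it is an ODF text document.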
This is good news, because it means that even in the Microsoft-centric .Net world, ODF is gaining some traction. If you have a similar library for Java, REBOL, REXX, Python, Ruby, or even a C or C++ .dll or .so library, I would like to hear about it. In fact, if you have such a library written in any language, I would like to hear about it.
Apparently, someone had suggested that Malaysia adopt UOF instead of ODF as its XML-based open format. This article concludes that supporting ODF brings UOF support along with it, and that therefore no explicit selection of UOF is needed.
ISO 26300 documents can be translated to, and interoperate well with, UOF documents today, so by adopting ISO 26300, Malaysia will also have UOF support. It is not necessary to consider UOF today, because the efforts of harmonizing these two formats will guarantee future interoperability. Already in existence are free third party tools to translate between the two formats with good documentation on the differences.
Rob Weir continues his examples of areas where the Microsoft / Ecma International OOXML "standard" is designed to be too obscure for any competitor to implement it. Says Rob, "The Ecma Office Open XML (OOXML) specification seems to presuppose the existence of a Universal Translator of sorts." He then goes on to quote a part of the specification.
An alternative format import part allows content specified in an alternate format (HTML, MHTML, RTF, earlier versions of WordprocessingML, or plain text) to be embedded directly in a WordprocessingML document in order to allow that content to be migrated to the WordprocessingML format.
He notes that there are many different versions of each of these formats, but the standard does not specify which versions. An OOXML-compliant application needs to read all of the above-mentioned formats, without any knowledge of which versions to accept. Conspicuously missing from the list are standard formats like XHTML, DocBook, TeX, or ODF.
Andy Updegrove has begun listing some of the places where OOXML conflicts with existing standards. This is the time for standards groups to point out potential problems in the hope that they will be corrected in the final spec or (if it is not salvageable) the spec rejected as unacceptable.
Earlier today, I saw a post in which someone asked whether he should use ODF as a document interchange format. It was referenced from an AbiWord blog. I note that Ryan recommends RTF instead of ODF for document interchange, and Dom (the leader of the AbiWord project) says that RTF is just as good for that purpose as ODF. Having been in environments where multiple word processors were used and RTF was the supposed interchange format, I see problems there. Maybe the applications supported different versions of RTF, but whatever the cause, there were significant and unexpected differences in the results. I like the words Rob Weir used to describe RTF:
RTF – Rich Text Format is a proprietary document format occasionally updated by Microsoft. As one wag quipped, "RTF is defined as whatever Microsoft Word exports when it exports to RTF".
Dom, Ryan, I have to hand it to you. If AbiWord were my project, it would probably be sitting with all my other partially-completed projects. You have really succeeded in producing a small (light on the resource requirements), multiplatform word processor. You are very much respected, including by me. But in this case, I still disagree with you. ODF was designed to be used across applications and platforms, plus it has the advantages of being XML-based and having an open specification that is not controlled by any one vendor.
Today, I may send a draft of a document to a co-worker. Tomorrow, he may want to transform it into XHTML + CSS to put up on our Intranet. The next day, someone may want it in PDF format. The day after that, it may be used (with the proper transformations) with our to-do list / time management system. An XML-based format is designed to be easy for software tools (such as Apache Cocoon) to manipulate, while still remaining human-readable. RTF, as you know, is a maze of backslash-quoted codes that is sure to deter most humans from trying to read the file's contents directly.
I agree that for most of the purposes for which people exchange word processing documents, they would be better off exchanging either plain text or PDF. Even so, that isn't what people do, at least not yet.
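As a rough illustration of that kind of transformation, the sketch below turns ODF-style headings and paragraphs into a bare XHTML page using Python's standard XML tools. The function name odf_text_to_xhtml and the sample markup are hypothetical. It handles only text:h and text:p elements and ignores styles entirely, so treat it as a toy rather than a converter.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def odf_text_to_xhtml(content_xml):
    """Map ODF headings and paragraphs to XHTML <hN> and <p> elements."""
    source = ET.fromstring(content_xml)
    html = ET.Element("html", {"xmlns": XHTML_NS})
    body = ET.SubElement(html, "body")
    for node in source.iter():
        if node.tag == f"{{{TEXT_NS}}}h":
            # ODF stores the heading depth in text:outline-level.
            level = int(node.get(f"{{{TEXT_NS}}}outline-level", "1"))
            out = ET.SubElement(body, f"h{max(1, min(level, 6))}")
        elif node.tag == f"{{{TEXT_NS}}}p":
            out = ET.SubElement(body, "p")
        else:
            continue
        # Flatten any inline spans to plain text for this sketch.
        out.text = "".join(node.itertext())
    return ET.tostring(html, encoding="unicode")


# Hypothetical sample standing in for the body of a content.xml file.
SAMPLE = (
    f'<office:text xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">'
    '<text:h text:outline-level="2">Status report</text:h>'
    "<text:p>All is <text:span>well</text:span>.</text:p>"
    "</office:text>"
)

if __name__ == "__main__":
    print(odf_text_to_xhtml(SAMPLE))
```

A real pipeline would more likely use XSLT or a framework built for the job, but the point stands: because the input is XML, a few lines with ordinary tools are enough to get started.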
Bryan Smith is writing a history of office suites, including a little bit about the current ODF vs OOXML controversy. I do not know whether he is just recording his recollections, or if he is actually researching the things he writes. In any case, it is good to have more input into what has happened in the past and what is happening now. I suggest checking back over time, so that we can all see it as it progresses.
In a nutshell, XML was the answer the World Wide Web Consortium (W3C) gave to vendors who constantly complained about the slowness of their standardization process. XML is essentially a set of standards for creating vendor standards for their implementations. XML has nothing to do with interoperability. It merely says how you should format your tags, how you should document your definitions, how you will define your schema, etc… Most “complete XML” documents are those that have a base “content” block (typically its own file), with one or more style blocks (or possible style templates), another block for modifications to the schema/templates, and then any base support definitions, templates, schema, etc… (which may or may not be included)
I very much disagree. XML was the W3C's answer to the "tag-soup" of HTML. Each browser maker had its own non-standard tags that were implemented in various sites. There were some sites where users had to either be using browser X or go away. In addition, sites were designed without separating the content and the presentation (the part that determines how the site looks), whereas the intent of HTML had always been that markup would be semantic, or based on the actual meaning of the content. They had tried with HTML 4 to begin reinforcing the idea of separation of presentation from content, but HTML was too forgiving of things like improperly nested tags and proprietary tags. In response, XML was designed to be strict in rejecting non-standard tags for the specific vocabulary (that is, markup language) that it defined, and to forcibly keep presentation out of the semantic markup of the document. In a roundabout way, XML was designed to make it easy to create XHTML.
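The strictness is easy to see with any XML parser. The small sketch below uses Python's standard library and a helper name of my own (is_well_formed). It checks only well-formedness, such as proper nesting. Rejecting tags that are not part of a given vocabulary is a separate step that needs a validating parser and a DTD or schema, which this example does not attempt.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the string parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    # Properly nested: <i> closes before <b>.
    print(is_well_formed("<p><b>bold <i>both</i></b></p>"))
    # Overlapping tags of the sort old browsers tolerated in HTML.
    print(is_well_formed("<p><b>bold <i>both</b></i></p>"))
```

The first call reports a well-formed document and the second does not, where a forgiving HTML browser of the time would have rendered both.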
These things are not just my own ideas or recollections. The W3C's own documents support this interpretation.
The design goals for XML are:
- XML shall be straightforwardly usable over the Internet.
- XML shall support a wide variety of applications.
- XML shall be compatible with SGML.
- It shall be easy to write programs which process XML documents.
- The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
- XML documents should be human-legible and reasonably clear.
This specification, together with associated standards (Unicode and ISO/IEC 10646 for characters, Internet RFC 3066 for language identification tags, ISO 639 for language name codes, and ISO 3166 for country name codes), provides all the information necessary to understand XML Version 1.0 and construct computer programs to process it.
This does not explicitly say that interoperability is the purpose, but that is still clear to anyone reading the text. If it is to be easy to write tools that process (read, write, modify) XML documents, and easy for humans to read them, then XML documents (and documents in XML-derived languages or formats such as MathML, SVG, XHTML, and ODF) must be readable and writable using a wide variety of tools, by a wide variety of vendors. In my opinion, this was one of the reasons for adopting namespaces. With namespaces, a document from the (X)HTML namespace can contain subsections that belong to the SVG and MathML namespaces. A file in ODF format makes use of namespaces in order to reuse such standards as (XML-based) SVG and MathML, so that multiple vendors can implement it. This is in accordance with the express and implied intent of XML. On the other hand, misusing XML to block off access by outside vendors, as OOXML seems to be doing, means that such non-standards as VML are used instead of standards. OOXML also uses binary blobs and all sorts of "legacy" features, most of which preserve mistakes that were made in older versions of Microsoft's office applications.
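For readers who have not seen mixed namespaces in practice, here is a small sketch. The sample document and the helper name namespaces_used are my own, written for illustration. It parses an XHTML page that embeds SVG and MathML, then lists the namespace each element belongs to, which is how a generic tool can tell the three vocabularies apart.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML page embedding an SVG drawing and a MathML fraction.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>A circle and a fraction:</p>
    <svg xmlns="http://www.w3.org/2000/svg" width="20" height="20">
      <circle cx="10" cy="10" r="8"/>
    </svg>
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <mfrac><mn>1</mn><mn>2</mn></mfrac>
    </math>
  </body>
</html>"""


def namespaces_used(markup):
    """Return the namespace URIs found in the document, in first-seen order."""
    seen = []
    for element in ET.fromstring(markup).iter():
        # ElementTree spells qualified names as "{namespace-uri}local-name".
        if element.tag.startswith("{"):
            uri = element.tag[1:].split("}", 1)[0]
            if uri not in seen:
                seen.append(uri)
    return seen


if __name__ == "__main__":
    for uri in namespaces_used(DOC):
        print(uri)
```

Each vocabulary keeps its own published namespace, so an SVG-aware tool and a MathML-aware tool from different vendors can each pick out their part of the same file.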