Why Give Up Word?

Friday, 2007-January-26 at 09:58

Mark Eli Kalderon continues his series of articles discussing why he now does most of his writing in specially formatted plain text files (LaTeX and Markdown). In October, he discussed changing proprietary file formats and data corruption.

A Word document is a proprietary binary. Moreover, the proprietary format is subject to change over time thus allowing Word to add new features. It is possible to convert older forms of Word documents to newer forms, but the conversion is lossy and binary files are subject to corruption. Eventually, I reached the point where I was losing data. Thus, for example, I no longer have a copy of my dissertation (completed in 1995). Clearly, Word is not an archival format.

He follows up by discussing the power of plain text.

Plain text files are less prone to corruption than binary files. Moreover, a corrupt binary file easily results in total data loss. A corrupt plain text file can be recovered at least in part. Moreover, a particular binary format associated with a particular application can change over time, also resulting in data loss.

In November, he continued his series of reasons for using plain text files for most of his work, using a quote to point out that complex documents are often difficult to create in office software but comparatively easy with tools like LaTeX. Finally, yesterday, he wrapped up by discussing office application file formats. He concludes, "While I no longer word-process, if I did ODF would be the way to go. Almost every major word-processor now supports this open standard, and many that don’t are planning to implement it."

I will be checking back occasionally, because he seems to understand that the office application is often not the appropriate tool. Where I work, it is common to see highly paid engineers spend all day trying to make a document fit on one page in Word or Excel while still conveying the salient points of their analyses. This is simply a misuse of the tools (office applications are not made for such uses), but that is the form in which the documents must be turned in. One of the irritating things about the modern office-suite paradigm is that we spend most of our time "tweaking" our documents to get the right appearance, while cheating the content (the important part) of our attention.

One thing that is not really being trumpeted here: converting a document from a binary Word format to any other format, even another Word format, is almost always a lossy process. Some loss of features and formatting is built into the process, and it will happen whether your files are converted to RTF, OOXML, or even ODF. Such losses can hopefully be minimized, but you have to accept that they will happen. Converting all of your data into OpenDocument Format (ISO 26300, OASIS ODF) means that these losses need only happen once. Using a proprietary format (even one that pretends to be open, such as OOXML) means that you risk losing more information each time you are forced to convert again (for instance, when competitors come too close to fully implementing the format and Microsoft switches to something else).

With tools such as TeX/LaTeX, Markdown, and (not surprisingly) ODF, documents and files can easily be transformed into a multitude of output formats as needed. In many cases, these tools are available free of charge.

Changing file formats from binary .doc, .xls, and .ppt to XML-based .odt, .ods, and .odp will not by itself solve the problem of tool misuse. However, simple XML-based formats make it easier for a chain of tools to extract the relevant information for massaging and processing outside the traditional hierarchical office review cycle. (Worker bee creates a document and forwards it to the supervisor bee for review, who returns it with some suggested changes. Worker bee makes the changes and sends the document back to the supervisor bee, who forwards it to the manager bee, and so on.)
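To make the tool-chain point concrete: an .odt file is a ZIP archive whose body text lives in an XML member named content.xml, so a few lines of standard-library Python can pull the prose out without any office suite at all. This is a minimal sketch, not a full ODF reader; the function name `extract_paragraphs` is hypothetical, and the code deliberately ignores styles, table structure, and ODF's special whitespace elements.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URI used by ODF (ISO 26300) for text elements such as text:p and text:h.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
PARAGRAPH_TAGS = (f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h")


def extract_paragraphs(odf_file):
    """Return the text of each heading and paragraph in an ODF text document.

    `odf_file` may be a path or a binary file-like object; zipfile accepts both.
    The document body is stored in the archive member `content.xml`.
    """
    with zipfile.ZipFile(odf_file) as archive:
        root = ET.fromstring(archive.read("content.xml"))

    paragraphs = []
    for element in root.iter():
        if element.tag in PARAGRAPH_TAGS:
            # itertext() joins the text of nested spans in document order.
            # Simplification: text:s / text:tab carry no character data, so
            # extra spaces and tabs they represent are not reproduced, and a
            # paragraph nested inside another (e.g. in a note) appears twice.
            paragraphs.append("".join(element.itertext()))
    return paragraphs
```

Calling `extract_paragraphs("report.odt")` on a (hypothetical) file would return a plain list of strings that could then be fed to grep, a database loader, or whatever tool comes next in the chain. Nothing here depends on the application that wrote the file, which is exactly the property a binary .doc lacks.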

I guess this is the beginning of a discussion about using the right tools for the job, but let us return to the topic of immediate importance. Word (or Excel, etc.) is not sacred or infallible. It comes out of the Seattle area, not the Vatican. It is entirely possible, especially for larger enterprises, to implement attachment stripping and plain-text-only e-mail, so that clients and suppliers begin to extract the relevant data and place it (in plain text form) directly into e-mail messages. This may be done to help relieve your overloaded virus scanners, for instance, but it has the additional benefit that many of the tasks your enterprise currently uses Word (Excel, etc.) to perform will no longer be needed.

In any case, if you begin by moving to ODF (ISO 26300) file formats instead of OOXML or the current binary formats, you will help make your data longer-lasting, so that one day you will be able to go back, review contracts signed a decade earlier, and see them in their original format. If you take that first step, your company will be free to move in any direction it chooses in the future.

Entry filed under: ODF.
