Document wars: A new hope and an old friend
Genealogists and historians are well aware of the problems with finding and translating old documents. When information technology is added to the mix, the problems become harder because of the number of different possible file formats. Unfortunately, in the IT world “old documents” may be as little as fifteen years old, often less than the lifetime of a manufactured product. Fifteen years ago, Microsoft Windows 3.0 was the most common operating system, WordPerfect was a commonly used document format, and 3
Because of these format problems, IT departments constantly struggle to update archived files that are still needed, or that may be needed for patent or other legal purposes. These files often include manufacturing documents such as recipes, production records, and material and personnel tracking information. The IT industry is addressing document conversion problems through formatting standards. IT fights over standards are often public and noisy, but the latest fight over document standards is setting a new high mark, and it is taking place in the ISO/IEC standards arena. The contenders are the Open Document Format (ODF), Microsoft's Office Open XML (OOXML), and Adobe's Portable Document Format (PDF). All three are winding their way through the standards communities.
ODF is currently defined in the ISO/IEC 26300 standard, released in 2006. It defines file formats for word processing documents, spreadsheets, presentations, graphics, and mathematical equations. Several open-source efforts support ODF, and Sun Microsystems is a key contributor. Despite the effort to make it an ISO/IEC standard, ODF has not yet become a widely used format.
Microsoft submitted the 6,000-page OOXML specification in January 2007 to an ISO/IEC Joint Technical Committee for consideration as ISO/IEC 29500. OOXML defines a document container for specialized XML-based documents that roughly correspond to the file types available in Microsoft Office (such as documents, spreadsheets, presentations, and graphics). The specification was submitted under the fast-track procedure, which led several national committees to seriously doubt that so large a document could be adequately reviewed in the time available. Overlap with the ODF standard was also raised as a serious issue.
While having an editable format for archived documents may be important, it is often less important than having a printable and searchable format. This is why PDF, developed by Adobe Systems, should be included as a possible format standard. PDF is already the de facto standard for saving printable and searchable files, and it is also in the process of becoming an ISO standard. The PDF specification has been publicly available since 1993, and the format is stable and well tested.
With all of these options, deciding on a document-archiving format can be difficult, but three simple rules should give you the best chance of being able to read and search documents far into the future. Rule 1: if the document contains only text, archive it as a text file. Rule 2: if the document must remain editable, archive it in the latest format version available from the tool that created it. Rule 3: otherwise, archive the document as a PDF. The ODF and OOXML formats have not yet shown the staying power of text and PDF. They may allow you to save editable documents, but for archived documents that is often less important than being able to print and search them.
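For an IT department that wants to apply the three rules to a large batch of files, the decision can be written as a small function. The following Python sketch is only an illustration: the `Document` record, its `text_only` and `must_stay_editable` flags, and the `choose_archive_format` function are hypothetical names, and it assumes the rules are checked in the order given, so a text-only document is archived as text even if it will be edited later.

```python
from dataclasses import dataclass
from enum import Enum


class ArchiveFormat(Enum):
    """The three archive targets named in the rules (labels are illustrative)."""
    TEXT = "plain text"
    NATIVE_LATEST = "latest native format of the authoring tool"
    PDF = "PDF"


@dataclass
class Document:
    """Hypothetical description of a file waiting to be archived."""
    name: str
    text_only: bool            # the document contains nothing but text
    must_stay_editable: bool   # the document will need editing after archiving


def choose_archive_format(doc: Document) -> ArchiveFormat:
    """Apply the three rules in order (the ordering is an assumption)."""
    if doc.text_only:                 # Rule 1: text-only documents
        return ArchiveFormat.TEXT
    if doc.must_stay_editable:        # Rule 2: documents that must be edited
        return ArchiveFormat.NATIVE_LATEST
    return ArchiveFormat.PDF          # Rule 3: everything else


if __name__ == "__main__":
    examples = [
        Document("operator notes", text_only=True, must_stay_editable=False),
        Document("master recipe", text_only=False, must_stay_editable=True),
        Document("production record", text_only=False, must_stay_editable=False),
    ]
    for doc in examples:
        print(f"{doc.name}: {choose_archive_format(doc).value}")
```

The example documents are invented; in practice the two flags would have to come from whoever owns the document, since neither can be reliably inferred from the file alone.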
Dennis Brandl is president of BR&L Consulting in Cary, NC, firstname.lastname@example.org