The State of Confusion

XML v. i4i v. Microsoft

A lot of misleading information about i4i v. Microsoft has recently appeared in the news. Many self-proclaimed experts have merely skimmed the original i4i patent claims and concluded that the patent should never have been approved because it covers both Extensible Markup Language (XML) and its predecessor, the Standard Generalized Markup Language (SGML). In fact, the i4i patent application clearly describes the differences between SGML and the i4i solution, and XML did not exist when the patent application claims were written. Moreover, the i4i patent contains a detailed description of a specific method that is actually used in the MS Office 2007 data format. Therefore, for software patent proponents, this patent and the recent court decision against Microsoft are well grounded.

XML v. i4i

The XML specification produced by the World Wide Web Consortium (W3C) describes a method for marking up structured documents using markup tags that specify the content’s purpose rather than its formatting. A formatting description for the XML content is optional and does not have to exist at all. If present, it is stored separately from the structured content. Otherwise, the document interpretation is based solely on the XML markup tags that mark the content’s purpose.

Therefore, the XML document content is structured and interleaved with purpose-descriptive markup tags, while the i4i patented method explicitly keeps document content in a “totally unstructured” raw form. This characteristic clearly differentiates the i4i patented method from the XML method. The only similarity between XML and the i4i patented method is that both separate the formatting description from the document content.
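
As a rough illustration of what “interleaved with purpose-descriptive markup tags” means, here is a minimal Python sketch. The element names and the sample data are made up for this example and do not come from any real schema. Note that the tags sit inside the same text stream as the content and say what each piece of content is, not how it should look.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the tag names are invented for illustration.
# The markup is embedded in the content stream and names each item's purpose.
DOC = """<invoice>
  <customer>Acme Corp.</customer>
  <amount currency="USD">120.50</amount>
</invoice>"""

root = ET.fromstring(DOC)

# With no formatting description at all, the document can still be
# interpreted from the purpose tags alone.
purposes = {child.tag: child.text for child in root}
currency = root.find("amount").get("currency")
```

Nothing in this document says which font or layout to use; a separate stylesheet could supply that, or it could be missing entirely.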

i4i Patent v. Microsoft’s “Custom XML”

Back in 2005, Brian Jones, Microsoft Office Lead Program Manager, wrote an article titled “Integrating with business data: Store custom XML in the Office XML formats.” This article described a means to “truly integrate your documents with business processes and business data,” and explained the role of the XML Data Store in documents created with MS Office 2007 (formerly known as MS Office 12).

The “Custom XML” term used in Microsoft literature does not mean a customized XML element tag that marks up a block of text. For Microsoft, “Custom XML” is just another name for what Jones calls the “XML Data Store”: a file added to an MS Office document.

According to Jones, you can put a data file into the XML Data Store, which “means that you now have a place to store any data your solution may need. The data will travel with the document, but will always be stored as a separate XML part in the ZIP package.” To utilize the data file, he explained, “all you need to do is create a relationship from the main document part to your XML part.”
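
The arrangement Jones describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the part names and the relationship markup follow the general Office Open XML pattern but are simplified, the business data is invented, and the resulting package is not a complete, valid Word document (it lacks, for example, the content-types part).

```python
import io
import zipfile

# Simplified part names modelled on the Office Open XML layout (assumption:
# illustrative only, not a complete or valid .docx package).
MAIN_PART = "word/document.xml"
RELS_PART = "word/_rels/document.xml.rels"
DATA_PART = "customXml/item1.xml"

# Hypothetical business data that "travels with the document".
business_data = "<order><id>42</id><total>120.50</total></order>"

# The relationship from the main document part to the separate XML part.
relationships = (
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" '
    'Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/customXml" '
    'Target="../customXml/item1.xml"/>'
    "</Relationships>"
)

# Build the ZIP package in memory: main part, relationship part, data part.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr(MAIN_PART, "<document>Main text</document>")
    package.writestr(RELS_PART, relationships)
    package.writestr(DATA_PART, business_data)

# Read the package back: the data file is stored unchanged as its own part.
with zipfile.ZipFile(buffer) as package:
    part_names = package.namelist()
    stored_data = package.read(DATA_PART).decode("utf-8")
```

The point of the sketch is that the data file is never merged into the main document part; only the relationship entry ties the two together.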

This new Microsoft “invention” is very similar to a more than decade-old technology patented by the Infrastructures for Information Corporation (i4i) from Toronto, Ontario, Canada. This company filed its patent application on June 2, 1994, and US Patent № 5,787,449, “Method and system for manipulating the architecture and the content of a document separately from each other,” was granted on July 28, 1998.

The i4i patent describes a document format and method of encoding where the document content — stored in the “raw content area” — “is totally unstructured and has no embedded metacodes in the data stream.” The i4i patent further states that the document structure’s definition is described in a separate “metacode map” where “for each metacode applied to the content, an entry in the metacode map is created which describes the metacode and gives its position.”
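
To make the quoted description more concrete, here is a minimal Python sketch of the general idea of a raw content area plus a metacode map. It is not the patented implementation: the data layout, the tag names, and the `apply_metacodes` helper are all assumptions made for this illustration.

```python
# Raw content area: plain text with no embedded metacodes.
raw_content = "Acme Corp.120.50"

# Metacode map: one entry per metacode, giving its position in the raw
# content. The tag names are hypothetical.
metacode_map = [
    (0, "<customer>"),
    (10, "</customer>"),
    (10, "<amount>"),
    (16, "</amount>"),
]


def apply_metacodes(content, mapping):
    """Combine raw content with a position-sorted metacode map."""
    pieces = []
    cursor = 0
    for position, metacode in mapping:
        pieces.append(content[cursor:position])  # content up to the metacode
        pieces.append(metacode)                  # the metacode itself
        cursor = position
    pieces.append(content[cursor:])              # any trailing content
    return "".join(pieces)


combined = apply_metacodes(raw_content, metacode_map)
```

Here the structure lives entirely in `metacode_map`; the content can be edited, or a different map applied, without touching the other.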

Jones’ description of the MS Office document XML Data Store basically equates to the i4i patent’s description of the “raw content area.” Similarly, his description of the “relationship from the main document” to the data stored in the XML Data Store relates to the i4i patent’s metacode mapping.

The terminology used in the Microsoft literature only adds to the confusion. The current Microsoft name for the XML Data Store is Custom XML. That is — according to the MSDN documentation — “a custom XML document part (file)” stored in an Office Open XML package or, in other words, in the MS Office 2007 document format.

To summarize, MS Office 2007 will let you include a file — any raw file — inside the main Office document. The file will stay in its original form and will not be changed by MS Office applications. Parts of the file may be displayed within the main document if you create an appropriate “relationship from the main document.” This is equivalent to the i4i patented mechanism, which explicitly describes the “raw content” and the “metacode map.”

These equivalent elements led the US District Court for the Eastern District of Texas to draw this conclusion: “Defendant Microsoft is found to have unlawfully infringed U.S. Patent № 5,787,449.” The court injunction explains that it “does not apply” to “infringing and future Word products” which “upon opening an XML file, applies a custom transform that removes all custom XML elements.”

The meaning the court has applied to the “custom XML” term is clearly taken from Microsoft’s MSDN literature.

Creative Commons License