A lot of misleading information about i4i v. Microsoft has recently appeared in the news. Many self-proclaimed experts have merely skimmed the original i4i patent claims and concluded that the patent covers both Extensible Markup Language (XML) and Standard Generalized Markup Language (SGML), and that it should never have been granted because it covers prior art. Those “experts” are not quite right.
The i4i patent application clearly describes the differences between SGML and the i4i solution; in fact, the patent contains a focused description of a specific method that has nothing to do with XML. This patented method is actually used in one specific part of the MS Office 2007 data format. Microsoft misleadingly uses the term “Custom XML” for this document part (as described in an MSDN reference article), even though the part is not XML at all. It is just a file, any raw file, that can be stored within the XML-based MS Office 2007 file format in addition to the main document parts. A detailed analysis of the root cause of this confusion follows.
The XML specification produced by the World Wide Web Consortium (W3C) describes a method for marking up structured documents with tags that identify the purpose of the content rather than its formatting. A formatting description (such as a stylesheet) is optional and need not exist at all; when one is present, it is stored separately from the structured content. Otherwise, the document is interpreted solely from the markup tags that identify the content’s purpose.
In an XML document, therefore, the content is structured and interleaved with purpose-descriptive markup tags, whereas the i4i patented method explicitly keeps document content in a “totally unstructured” raw form. This characteristic clearly differentiates the i4i patented method from XML. The only similarity between the two is that both separate the formatting description from the document content.
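To make the contrast concrete, here is a minimal Python sketch using the standard-library `xml.etree.ElementTree` module. The invoice document and the helper names `purpose_of_content` and `plain_text` are hypothetical, chosen only for illustration. The point is that in XML the purpose-describing tags sit inside the same character stream as the content, so the content can only be recovered by parsing the tags out of it.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document: the tags that identify each piece of
# content's purpose are embedded directly in the character stream.
XML_DOC = "<invoice><customer>Acme Corp</customer> owes <amount>120.00</amount></invoice>"


def purpose_of_content(xml_text):
    """Return (tag, text) pairs for every element that directly holds text."""
    root = ET.fromstring(xml_text)
    return [(el.tag, el.text) for el in root.iter() if el.text]


def plain_text(xml_text):
    """Concatenate all character data, discarding the embedded tags."""
    return "".join(ET.fromstring(xml_text).itertext())


if __name__ == "__main__":
    print(purpose_of_content(XML_DOC))
    print(plain_text(XML_DOC))
```

Note that even the plain text has to be reconstructed by a parser: the words “Acme Corp owes 120.00” never exist on their own, free of markup, anywhere in the XML stream.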
Back in 2005, Brian Jones, Microsoft Office Lead Program Manager, wrote an article titled “Integrating with business data: Store custom XML in the Office XML formats.” The article described a means to “truly integrate your documents with business processes and business data,” and explained the role of the XML Data Store in MS Office 2007 documents (MS Office 2007 was then code-named Office 12).
According to Jones, putting a data file into the XML Data Store “means that you now have a place to store any data your solution may need. The data will travel with the document, but will always be stored as a separate XML part in the ZIP package.” To make use of the data file, he explained, “all you need to do is create a relationship from the main document part to your XML part.”
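The mechanism Jones describes can be sketched with Python’s standard `zipfile` module: a data part is written into the ZIP package unchanged, and a separate relationships part links the main document part to it. This is a simplified illustration under stated assumptions, not a complete, openable Word file (it has no `[Content_Types].xml` and no real document body). The part names and relationship URIs follow the Open Packaging Conventions naming seen in .docx files; the functions `build_package` and `read_related_part` are hypothetical.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
CUSTOM_XML_REL = ("http://schemas.openxmlformats.org/officeDocument/2006/"
                  "relationships/customXml")


def build_package(data_bytes):
    """Write a simplified ZIP package: a placeholder main part, a separate
    data part stored byte-for-byte, and a relationships part linking them."""
    rels = (
        f'<Relationships xmlns="{RELS_NS}">'
        f'<Relationship Id="rId1" Type="{CUSTOM_XML_REL}" '
        'Target="../customXml/item1.xml"/>'
        "</Relationships>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as pkg:
        pkg.writestr("word/document.xml", "<document/>")  # placeholder main part
        pkg.writestr("customXml/item1.xml", data_bytes)   # data kept as-is
        pkg.writestr("word/_rels/document.xml.rels", rels)
    return buf.getvalue()


def read_related_part(package_bytes):
    """Follow the relationship from the main part to the data part."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as pkg:
        rels = ET.fromstring(pkg.read("word/_rels/document.xml.rels"))
        rel = rels.find(f"{{{RELS_NS}}}Relationship")
        # The target is relative to the main part's folder ("word/").
        target = posixpath.normpath(posixpath.join("word", rel.get("Target")))
        return pkg.read(target)


if __name__ == "__main__":
    data = b"<order id='42'><qty>3</qty></order>"
    print(read_related_part(build_package(data)))
```

The data part comes back exactly as it was written; the package itself never interprets it. Only the relationship entry tells a consumer where the data lives.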
This “new” Microsoft “invention” is very similar to a technology, more than a decade old, patented by the Infrastructures for Information Corporation (i4i) of Toronto, Ontario, Canada. The company filed its patent application on June 2, 1994, and US Patent № 5,787,449, “Method and system for manipulating the architecture and the content of a document separately from each other,” was granted on July 28, 1998.
The i4i patent describes a document format and encoding method in which the document content, stored in the “raw content area,” “is totally unstructured and has no embedded metacodes in the data stream.” The patent further states that the document’s structure is defined in a separate “metacode map” where “for each metacode applied to the content, an entry in the metacode map is created which describes the metacode and gives its position.”
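A minimal Python sketch can illustrate the separation the patent describes: raw content with no embedded codes, plus a separate map whose entries record each metacode and its position. This is not i4i’s implementation. The `<...>` tag syntax, the `MetacodeEntry` class, and the `separate`/`merge` functions are hypothetical and chosen only to show the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetacodeEntry:
    """Hypothetical metacode-map entry: the metacode and its position."""
    position: int   # character offset into the raw content
    metacode: str   # e.g. "<title>" or "</title>"


def separate(marked_up):
    """Split text with embedded <...> tags into (raw_content, metacode_map)."""
    raw_chars = []
    metacode_map = []
    i = 0
    while i < len(marked_up):
        if marked_up[i] == "<":
            end = marked_up.index(">", i)  # assumes well-formed "<...>" tags
            metacode_map.append(MetacodeEntry(len(raw_chars), marked_up[i:end + 1]))
            i = end + 1
        else:
            raw_chars.append(marked_up[i])
            i += 1
    return "".join(raw_chars), metacode_map


def merge(raw_content, metacode_map):
    """Re-insert metacodes at their recorded positions (inverse of separate)."""
    out = []
    prev = 0
    # sorted() is stable, so entries sharing a position keep their order.
    for entry in sorted(metacode_map, key=lambda e: e.position):
        out.append(raw_content[prev:entry.position])
        out.append(entry.metacode)
        prev = entry.position
    out.append(raw_content[prev:])
    return "".join(out)


if __name__ == "__main__":
    raw, mmap = separate("<title>Report</title> for <b>Q3</b>")
    print(repr(raw))  # 'Report for Q3'
    for entry in mmap:
        print(entry)
    print(merge(raw, mmap))
```

Because the raw content never contains a metacode, the map can be edited or replaced (for example, swapping `<b>` for a different code) without touching a single character of the content, which is the “separately from each other” idea in the patent’s title.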
Jones’ description of the MS Office XML Data Store essentially corresponds to the i4i patent’s description of the “raw content area.” Similarly, his “relationship from the main document” to the data stored in the XML Data Store corresponds to the i4i patent’s metacode map.
To summarize, MS Office 2007 lets you include a file, any raw file, inside the main Office document. The file stays in its original form and is not changed by MS Office applications. Parts of the file can be displayed within the main document if you create an appropriate “relationship from the main document.” This is equivalent to the i4i patented mechanism, which explicitly describes the “raw content” and the “metacode map.”
These equivalent elements led the US District Court for the Eastern District of Texas to the right conclusion: “Defendant Microsoft is found to have unlawfully infringed U.S. Patent № 5,787,449.” The court’s injunction specifies that it “does not apply” to “infringing and future Word products” which, “upon opening an XML file, applies a custom transform that removes all custom XML elements.”