Microsoft Word Document


A Microsoft Word Document is a paper-oriented document encoding format used by the Microsoft Word application. Microsoft Word files typically end with the extension .doc or .docx. Although the format is primarily paper-oriented, documents may also be displayed in modes in which the notion of pages is abstracted away.

The original binary .doc format has evolved substantially since the early versions of the word processor for Xenix and MS-DOS in 1983. The current family of formats uses the .docx extension and consists of a ZIP archive that contains XML files, among many other resources.
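The archive structure can be inspected with any ZIP tool. The following is a minimal Python sketch, using only the standard library, that lists the parts of a package. To stay self-contained it builds a toy archive in memory; the helper names (build_toy_package, list_docx_parts) are hypothetical, and the part contents are placeholders, so Word would not open the result.

```python
import io
import zipfile


def build_toy_package() -> bytes:
    """Build a tiny in-memory archive that uses typical .docx part names.

    The part contents are placeholders for illustration only; this is
    not a valid Word document.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", "<Types/>")
        package.writestr("_rels/.rels", "<Relationships/>")
        package.writestr("word/document.xml", "<document/>")
        package.writestr("word/_rels/document.xml.rels", "<Relationships/>")
    return buffer.getvalue()


def list_docx_parts(data: bytes) -> list[str]:
    """Return the sorted names of the parts stored in a .docx package."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        return sorted(package.namelist())


if __name__ == "__main__":
    for name in list_docx_parts(build_toy_package()):
        print(name)
```

To inspect a real file, the same function can be given the bytes of any .docx document, for example the result of reading it with pathlib.Path(...).read_bytes().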

Example

The example below shows a simplified and abridged version of word/document.xml, which encodes the main document within a .docx file:
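As a rough illustration of how such a file is organised, the Python sketch below embeds a small word/document.xml fragment and extracts its hyperlinks. The fragment is hand-written and hypothetical (it is not Pandoc output); only the rId20 identifier comes from the text, and the function name hyperlink_ids is an assumption made for this sketch.

```python
import xml.etree.ElementTree as ET

# Namespaces used by WordprocessingML parts.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"

# Hand-written, heavily abridged stand-in for word/document.xml.
DOCUMENT_XML = f"""<w:document xmlns:w="{W_NS}" xmlns:r="{R_NS}">
  <w:body>
    <w:p>
      <w:r><w:t>Some introductory text.</w:t></w:r>
    </w:p>
    <w:p>
      <w:hyperlink r:id="rId20">
        <w:r><w:t>Example link</w:t></w:r>
      </w:hyperlink>
    </w:p>
  </w:body>
</w:document>"""


def hyperlink_ids(document_xml: str) -> list[tuple[str, str]]:
    """Return (relationship id, visible text) for each hyperlink element."""
    root = ET.fromstring(document_xml)
    found = []
    for link in root.iter(f"{{{W_NS}}}hyperlink"):
        # The r:id attribute only names a relationship; the URL lives elsewhere.
        rel_id = link.get(f"{{{R_NS}}}id", "")
        text = "".join(node.text or "" for node in link.iter(f"{{{W_NS}}}t"))
        found.append((rel_id, text))
    return found


if __name__ == "__main__":
    print(hyperlink_ids(DOCUMENT_XML))
```

Note that the document part carries the link text and a relationship identifier, but not the URL itself.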

In the above example, the target of the link referenced by <hyperlink r:id="rId20"> is stored in a separate file, word/_rels/document.xml.rels, as follows:
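The lookup from relationship identifier to target can be sketched in Python as well. The relationships fragment below is hand-written and hypothetical: the target URL https://example.com/ and the function name relationship_targets are assumptions made for illustration, and only the rId20 identifier comes from the text.

```python
import xml.etree.ElementTree as ET

# Namespace of relationship parts, and the relationship type for hyperlinks.
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
HYPERLINK_TYPE = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink"
)

# Hand-written stand-in for word/_rels/document.xml.rels.
RELS_XML = f"""<Relationships xmlns="{PKG_REL_NS}">
  <Relationship Id="rId20" Type="{HYPERLINK_TYPE}"
                Target="https://example.com/" TargetMode="External"/>
</Relationships>"""


def relationship_targets(rels_xml: str) -> dict[str, str]:
    """Map each relationship Id to its Target attribute."""
    root = ET.fromstring(rels_xml)
    return {
        rel.get("Id", ""): rel.get("Target", "")
        for rel in root.iter(f"{{{PKG_REL_NS}}}Relationship")
    }


if __name__ == "__main__":
    # Resolve the identifier used by the hyperlink in the main document.
    print(relationship_targets(RELS_XML)["rId20"])
```

Keeping targets in a separate part means a URL can change without touching the document body, at the cost of a second lookup for every link.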

The .docx format is complex and hard to generate programmatically from scratch without the aid of a specialized library. From a DocOps perspective, Microsoft Word Documents are usually treated as a render target. That is to say, documents are authored in a different format and then rendered to Microsoft Word.

The example has been generated using Pandoc from the Markdown version.
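A render step of this kind might look as follows when driven from Python. This is a sketch under assumptions: the file names are hypothetical, the exact options used for the example are not stated in the text, and the conversion only runs if a pandoc executable is found on the PATH.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_command(source: Path, target: Path) -> list[str]:
    """Build the Pandoc invocation that renders Markdown to .docx."""
    return [
        "pandoc", str(source),
        "--from", "markdown",
        "--to", "docx",
        "--output", str(target),
    ]


def render_to_docx(source: Path, target: Path) -> bool:
    """Run Pandoc if it is installed; return False when it is missing."""
    if shutil.which("pandoc") is None:
        return False
    # check=True raises CalledProcessError if Pandoc exits with an error.
    subprocess.run(pandoc_command(source, target), check=True)
    return True


if __name__ == "__main__":
    # Hypothetical file names, used only to show the resulting command line.
    print(" ".join(pandoc_command(Path("article.md"), Path("article.docx"))))
```

Separating command construction from execution keeps the render step easy to check in a pipeline without requiring Pandoc on every machine.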


© 2022-2024 Ernesto Garbarino | Contact me at ernesto@garba.org