The documentation system tenet of automated content generation states that the documentation system should aim to produce content in a hands-free, mechanical fashion whenever possible. This tenet directly supports the principle of generative content by deriving content automatically from applications, files, and data.
This tenet shares similarities with the tenets of composability and of embedding and blending. The difference is that, rather than selecting document parts or querying an external source within the confines of a content component, we are able to generate brand new content, document views, or even entire bodies of documentation.
We can reason about generative content based on the nature of the source content, which gives us the following fundamental use cases:

- Document-centric generation, where the source is content already expressed in textual form.
- Data-centric generation, where the source is structured data held in databases or exposed via APIs.
- Code-centric generation, where the source is code or other semi-structured data.
- Multimedia-centric generation, where the source is audio, video, and their transcripts.

Let's look at each of them.
Document-centric content generation produces new content from content that is already expressed in textual form. A basic example is document merging and consolidation.
Say, for instance, that a large project produces a series of documents in various formats and by different people, outside the realm of a central documentation platform. We can create an automation process that selects and combines all of these documents so that all of the project's moving parts can be easily grasped. Such a process may involve skipping cover pages, renumbering headings, expanding business terms, and removing formatting, in order to adhere to many of the tenets we have elaborated upon, such as that of consistent layout.
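As a minimal sketch, assuming the source documents are Markdown files collected in a hypothetical docs/ folder, such a consolidation step might look like this:

```python
from pathlib import Path

# Minimal sketch: merge per-team Markdown files (hypothetical docs/ layout)
# into a single project overview, demoting each file's headings by one level
# so the combined document keeps a consistent outline.
def merge_documents(source_dir: str, output_file: str) -> None:
    merged = ["# Project Overview\n"]
    for path in sorted(Path(source_dir).glob("*.md")):
        # Use the file name as a section heading for the merged document.
        merged.append(f"\n## {path.stem.replace('-', ' ').title()}\n")
        for line in path.read_text(encoding="utf-8").splitlines():
            if line.startswith("#"):
                line = "#" + line  # demote the heading by one level
            merged.append(line)
    Path(output_file).write_text("\n".join(merged), encoding="utf-8")

merge_documents("docs", "project-overview.md")
```

A real pipeline would add the other steps mentioned above (skipping cover pages, expanding business terms), but the shape remains the same: read many sources, normalize, write one consolidated view.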
With the advent of LLM technology, content may also be generated by summarizing large bodies of text, or by generating content that fits predefined topics. For example, the top questions received by a customer care center can be formulated as LLM prompts to generate relevant content for training or scripting purposes.
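A rough sketch of this idea, using the OpenAI Python client as one possible backend, might look as follows; the question list, model name, and system prompt are placeholders:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical list of the most frequent customer care questions.
top_questions = [
    "How do I reset my password?",
    "Why was I charged twice this month?",
]

faq_sections = []
for question in top_questions:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "Write a concise, friendly knowledge-base answer."},
            {"role": "user", "content": question},
        ],
    )
    # Collect each generated answer under its question as a Markdown section.
    faq_sections.append(f"## {question}\n\n{response.choices[0].message.content}\n")

print("\n".join(faq_sections))
```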
Nearly all modern business applications store data in structured databases, and allow interaction via APIs. All such data can be used to generate documentation in a programmatic fashion. For example, a billing system's database (or API) can be queried to create documents that describe each of the tariffs and discounts defined in them, without having users maintain parallel "paper" versions of details that are already natively crystallized as structured data.
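A minimal sketch of this approach, assuming a hypothetical SQLite billing database with a tariff table, could be:

```python
import sqlite3

# Sketch only: assumes a hypothetical billing database whose "tariff" table
# holds name, monthly_fee, and discount_pct columns.
def render_tariff_docs(db_path: str) -> str:
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT name, monthly_fee, discount_pct FROM tariff ORDER BY name"
    ).fetchall()
    conn.close()

    # Render one section per tariff so the document mirrors the data.
    lines = ["# Tariffs and Discounts\n"]
    for name, fee, discount in rows:
        lines.append(f"## {name}")
        lines.append(f"* Monthly fee: {fee:.2f}")
        lines.append(f"* Discount: {discount}%\n")
    return "\n".join(lines)

print(render_tariff_docs("billing.db"))
```

Because the document is regenerated from the live data, it never drifts from what the billing system actually enforces.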
We may also purposefully set up structured data with content generation as the objective. This is the approach taken by business intelligence teams for the generation of business reports: a specific data structure is agreed upon, from which various tables and infographics are generated and then embedded in a dashboard or business report document.
While there are a number of tools, such as Matplotlib, to generate visualizations for statistical information, there is also a healthy ecosystem of tools to generate business analysis and software engineering artifacts such as flowcharts, class diagrams, cloud infrastructure architectures, and so on. For example, Graphviz can generate most "boxes and arrows" diagram types from data, avoiding the manual "drawing" of information. Such an example is presented at the beginning of this article.
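For instance, a small sketch using the Graphviz Python bindings might derive a component diagram from a plain dictionary of hypothetical service dependencies, rather than drawing the boxes and arrows by hand:

```python
import graphviz  # assumes the graphviz Python package and system binaries

# Hypothetical component dependencies expressed as data.
dependencies = {
    "web-frontend": ["billing-api", "auth-service"],
    "billing-api": ["billing-db"],
    "auth-service": ["user-db"],
}

diagram = graphviz.Digraph("components", format="svg")
for component, deps in dependencies.items():
    diagram.node(component, shape="box")
    for dep in deps:
        diagram.edge(component, dep)

diagram.render("components", cleanup=True)  # writes components.svg
```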
In the case of embedding and blending, we usually have a main body of text in which code snippets are embedded. This approach is suitable for tutorials or guides but not for reference documentation.
When documenting APIs, including web services, methods, functions, commands, and so on, it is preferable to generate the entire body of reference documentation from the relevant codebase.
Most programming languages have an associated tool to generate HTML-based documentation, and most APIs are documented using the OpenAPI specification, which is normally rendered as HTML using tools such as SmartBear SwaggerHub. However, we often want greater control over key aspects of the documentation generation process, especially if we want to adhere to the tenets expressed in this book.
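As an illustration of that extra control, a minimal sketch might walk a Python module's syntax tree and emit its docstrings in whatever layout we choose; the mylib.py module name is a placeholder:

```python
import ast
from pathlib import Path

# Sketch: build a minimal API reference from a module's docstrings,
# keeping full control over the output layout (hypothetical mylib.py).
def module_reference(source_file: str) -> str:
    tree = ast.parse(Path(source_file).read_text(encoding="utf-8"))
    lines = [f"# Reference: {source_file}\n"]
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            doc = ast.get_docstring(node) or "(undocumented)"
            lines.append(f"## {node.name}\n\n{doc}\n")
    return "\n".join(lines)

print(module_reference("mylib.py"))
```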
Now, it is worth noting that code and other forms of semi-structured data require more effort than simple structured data for the purposes of content generation. For example, extracting content from source code may require the use of an off-the-shelf parser, or writing our own in the case of an obscure language or file format. Naturally, we can also use vanilla text manipulation primitives (even the grep command) for simple use cases such as extracting comments.
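For example, a grep-like sketch in Python, assuming a hypothetical src/ tree, might pull single-line comments out of source files so they can feed a documentation build:

```python
import re
from pathlib import Path

# Sketch along the lines of grep: match single-line comments starting
# with "#" or "//" and tag each one with the file it came from.
COMMENT = re.compile(r"^\s*(?:#|//)\s?(.*)$")

def extract_comments(source_dir: str) -> list[str]:
    comments = []
    for path in Path(source_dir).rglob("*.*"):
        if path.suffix not in {".py", ".js", ".sh"}:
            continue
        for line in path.read_text(encoding="utf-8", errors="ignore").splitlines():
            match = COMMENT.match(line)
            if match:
                comments.append(f"{path.name}: {match.group(1)}")
    return comments

for comment in extract_comments("src"):
    print(comment)
```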
In a nutshell, for semi-structured data such as code we need to bring extra tooling to preprocess, parse, or decode the data source before it is in a sufficiently structured shape that facilitates the generation of documentation.
In the case of embedding and blending, we may embed a video, its transcript, or both in a document. We may also generate top-level documentation from a library of multimedia content. For example, when integrating with videoconferencing applications, we can organize the content by title, participants, time, and the topics derived from each transcript.
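A minimal sketch of such an index, with a hand-written recordings list standing in for whatever metadata a videoconferencing API would actually return, might be:

```python
from datetime import datetime

# Hypothetical recording metadata: titles, participants, times, and topics
# derived from each transcript.
recordings = [
    {
        "title": "Q3 Billing Review",
        "participants": ["Ana", "Raj"],
        "time": datetime(2024, 9, 12, 10, 0),
        "topics": ["tariff changes", "refund policy"],
    },
]

# Render a top-level index page, newest recording first.
lines = ["# Meeting Recordings\n"]
for rec in sorted(recordings, key=lambda r: r["time"], reverse=True):
    lines.append(f"## {rec['title']} ({rec['time']:%Y-%m-%d %H:%M})")
    lines.append(f"* Participants: {', '.join(rec['participants'])}")
    lines.append(f"* Topics: {', '.join(rec['topics'])}\n")

print("\n".join(lines))
```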