Design Patterns for Production NLP Systems

This post is an excerpt from the final chapter of our upcoming book on Deep Learning and NLP with PyTorch. The book is still a draft under review so your comments on this section are appreciated! 

Production NLP systems can be complex. When building an NLP system, it is important to remember that the system you are building is solving a task and is simply a means to that end. During system building, the engineers, researchers, designers, and product managers have several choices to make. While our book has focused mostly on techniques or foundational building blocks, putting those building blocks together to come up with complex structures to suit your needs will require some pattern thinking. Pattern thinking and a language to describe the patterns is a “method of describing good design practices or patterns of useful organization within a field of expertise”. This is popular in many disciplines (Alexander, 1979), including software engineering.  In this section, we will describe a few common design and deployment patterns of production NLP systems. These are choices or tradeoffs teams often have to make to align the product development with technical, business, strategic, and operational goals. We examine these design choices under six axes:

1. Online vs. Offline systems: Online systems are those where the model predictions need to be made in real-time or near real-time. Some tasks such as fighting spam, content moderation, etc. by their very nature require an online system. Offline systems, on the other hand, don’t need to run in real-time. They can be built to run efficiently on a batch of inputs at once and can take advantage of approaches like Transductive Learning. Some online systems can be reactive, and can even do the learning in an online fashion (aka online learning), but many online systems are built and deployed with a periodic offline model build that is pushed to production. Systems that are built using online learning should especially be sensitive to adversarial environments. A recent example of this was the (in)famous Twitter chatbot Tay which went astray and started learning from online trolls. As hindsight wisdom expected, Tay soon started responding with offensive Tweets and its parent company, Microsoft, had to shut down the service in less than a day after launch. A typical trajectory in system building is to first build an offline system, move that as an “online” system with a lot of engineering effort, and then make it an “online learning” system by adding a feedback loop and possibly changing the learning method. While such a path is organic in terms of complexity added to the code base, it can introduce blind spots like handling adversaries etc. Figure 9.2 shows the “Facebook Immune System” as an example of an online system that detects spam (Caveat: circa 2012. Not a reflection of current Facebook infrastructure). Notice how the online systems require a lot more engineering than a similar offline system.

Figure 9.2: Facebook Immune System, an example of an online system, deployed to fight spam and abuse. Figure courtesy: (Stein et al, 2012).

2. Interactive vs. Non-interactive systems: Most natural language systems are non-interactive in the sense that the predictions solely come from a model. In fact, many production NLP models are deeply embedded in the Transform step of “Extract-Transform-Load” (ETL) pipeline of data processing. In some situations, it might be helpful for a human to be involved in the loop of making predictions. Figure 9.3 shows an example of an interactive machine translation interface from Lilt Inc., where models and humans are jointly involved in prediction making in the so-called “Mixed-Initiative Models” (Green 2014). Interactive systems are hard to engineer but can achieve very high accuracies by bringing a human in the loop. 

Figure 9.3: A human-in-the-loop machine translation model in action. Figure courtesy: Lilt Inc.

3. Unimodal vs. Multimodal systems: In many situations, it might be helpful to incorporate more than one modality in the learning and prediction process. For instance, it is helpful for a news transcription system to not just use the audio stream but also use the video frames as input. For example, a recent work from Google, dubbed “Looking to Listen” (Ephrat et al 2018), uses multimodal inputs to the solve the hard problem of speaker source separation (aka The Cocktail Party problem). Multimodal systems are expensive to build and to deploy, but for hard problems combining inputs from more than one modality provides signals that would be otherwise impossible to achieve with any single modality alone. We see examples of this in NLP too. For example, in multimodal translation, we can improve translation quality by incorporating inputs from multiple source languages when available. When generating topics for web pages (topic modeling), one could incorporate features extracted from the images contained therein in addition to the text on the webpage.

Figure 9.4: A multimodal system that utilizes both audio and video features jointly to solve a hard problem like the Cocktail Party Problem. Figure courtesy: Erphat et al, 2018

4. End-to-end systems vs. Piecewise systems: Since the advent of Deep Learning, another choice point available to researchers and engineers is to build a complex NLP system either as a pipeline of different units or as a monolithic end-to-end system. An end-to-end design is appealing in many areas like machine translation, summarization, and speech recognition where carefully designed end-to-end systems can significantly decrease the implementation and deployment complexity, and certainly cut down the number of lines of code. Piecewise systems break down a complex NLP task into subtasks each of which is optimized separately, independent of the final task objective. Subtasks in the piecewise systems make it very modular and easy to “patch” a specific issue in production but usually comes with some technical debt.

Fig 9.5: Piecewise Systems: Training pipelines of some of the traditional machine translation systems that treated the task of MT a sequence of subtasks, each of which with its own model (Figure courtesy: Hoang et al, 2009). Compare this with the Neural MT model discussed in Section 8.5.


5. Closed domain vs. Open domain systems: A closed domain system is optimized explicitly for a singular purpose to perform well in that domain. For example, a machine translation system could be optimized explicitly to work with biomedical journals — this would involve more than just training on a biomedical parallel corpus. Open-domain systems, on the other hand, are intended for general purpose use (e.g. Google Translate). For another example, consider a document labeling system. If the system predicts only one of the many classes it is trained on (typical case), it will result in a closed-domain system. But, if the system is engineered to discover new classes as it is running, it is an open-domain system. In the context of translation and speech recognition systems, closed domain systems are also referred to as “limited vocabulary” systems.

6. Monolingual vs. Multilingual systems: NLP systems built to work with a single language is called a monolingual system. It is easy to build and optimize a monolingual system. Multilingual systems, in contrast, are equipped to handle multiple languages. They are expected to work out of the box when trained on a dataset for a different language. While building a multilingual system is attractive, focusing on a monolingual version has its advantages. Researchers and engineers can leverage widely available resources and domain expertise in that language to produce high-quality systems that would otherwise not be possible with a general multilingual system. For this reason, we often find many multilingual products implemented as a collection of individually optimized monolingual systems with a language identification component dispatching the inputs to them.


1.  Christopher Alexander, “The Timeless Way of Building”, Oxford University Press, 19792. Hieu Hoang, Philipp Koehn, and Adam Lopez, “A Unified Framework for Phrase-Based, Hierarchical, and Syntax-Based Statistical Machine Translation”, in Proceedings of IWSLT 2009.3. Tao Stein, Erdong Chen, Karan Mangla, “Facebook Immune System”, SNS 20114. Spence Green, “Mixed-Initiative Language Translation”, Ph.D. Thesis, Stanford University, 20145. Ariel Ephrat, Inbar Mosseri, Oran Lang, Tali Dekel, Kevin Wilson, Avinatan Hassidim, William T. Freeman, Michael Rubinstein, “Looking to Listen: A Speaker-Independent Audio-Visual Model for Speech Separation”, SIGGRAPH 2018