By Amir Feizpour, Royal Bank of Canada
This article is based on the transcript, with some augmentation, of the “Machine Learning in Production” panel discussion held at Toronto Machine Learning (micro)Summit I hosted in July 2018. The panelists were Inmar Givoni (Uber), Solmaz Shahalizadeh (Shopify), Vincent Wong (Dessa), and Amir Hajian (Thomson Reuters); the host was Amir Feizpour (Royal Bank of Canada).
Motivation
It is one thing to build machine learning models that are interesting on an experimental and exploratory level but it is another thing to build models that drive action and disrupt your business. Many teams use machine learning as decision support divisions: the data scientists analyze data and generate ad-hoc insights that a manager uses to manually make decisions. There are many challenges with this model, including lack of scalability and efficiency, and inability of data science teams to measure their generated value. Without a value measurement system, data scientists might not be able to justify their existence and, more importantly, won’t be able to self-correct and improve their practices. In order to disrupt business, machine learning models must adopt a product-focused approach, which is a much more significant undertaking.
End to End Pipeline
For a product-driven approach to use machine learning, it is important to think about the problem you are trying to solve from the beginning and to have some initial idea of how the machine learning solution might be used. “It can’t be the case that you say I’ve learned this really cool machine learning technique and I want to find where I can use it,” said Solmaz. The first step is to understand what pain points you are trying to tackle, and what kind of service-level agreement in terms of quality, availability and responsibility you need. This guides the overall structure of the product, what approaches you might take, and what constraints exist. “For example, you could decide what trade-offs you want to have in terms of reliability versus availability given that you are working with a machine learning problem,” said Solmaz. Based on the business needs and other requirements, the delivery pattern (batch, real-time, or streaming) and the infrastructure can be decided.
Other than the exploratory phase - feature engineering and machine learning model exploration and selection - the rest of the pipeline is no different from typical software products and therefore all the good practices like DevOps and cookie cutter project structures can be used and, to a large extent, automated. Once you know the scope and the constraints of the problem, you can start playing with the data. “Often you need to start by getting a sense of data and sometimes that means more boring tasks like building reports and dashboards, and then identifying features that would be used for say a predictive model that we prototype,” said Solmaz. The goal of this stage is to determine, as quickly as possible, if you understand the problem sufficiently and if you can come up with a solution. Jupyter notebooks are a great medium at this stage and you can do rapid prototyping before you have to think about everything else that needs to be taken into account. However, if any piece of the exploratory work becomes needed for the final product then “you need to enforce all of the software engineering practices like unit and integration testing and deployment and performance tests to ensure that the data pipelines are reliable and meet necessary standards as early as possible,” said Solmaz. At this stage, it is also important to know what exceptions might appear and how to handle them without a need for manual intervention.
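As a minimal sketch of what promoting exploratory work to tested code could look like, the example below wraps a feature-preparation step in a function and covers it with two unit tests, one of which checks that a known exception (a missing amount) is handled without manual intervention. The function name prepare_features, the field names, and the fallback rule are all hypothetical and chosen only for illustration; they are not from the panel.

```python
from datetime import datetime


def prepare_features(record):
    """Turn one raw order record (a dict) into model-ready features.

    Hypothetical feature logic: the field names and the fallback
    rule are assumptions made for this illustration.
    """
    amount = record.get("amount")
    if amount is None or amount < 0:
        # Known exception: fall back to a neutral value and flag the
        # row, so the pipeline keeps running without manual fixes.
        amount, amount_missing = 0.0, 1
    else:
        amount_missing = 0
    created = datetime.fromisoformat(record["created_at"])
    return {
        "amount": float(amount),
        "amount_missing": amount_missing,
        "is_weekend": int(created.weekday() >= 5),  # Saturday or Sunday
    }


def test_missing_amount_is_flagged_not_fatal():
    # 2018-07-14 was a Saturday.
    features = prepare_features({"amount": None, "created_at": "2018-07-14T10:00:00"})
    assert features == {"amount": 0.0, "amount_missing": 1, "is_weekend": 1}


def test_regular_weekday_order():
    # 2018-07-16 was a Monday.
    features = prepare_features({"amount": 25, "created_at": "2018-07-16T09:30:00"})
    assert features == {"amount": 25.0, "amount_missing": 0, "is_weekend": 0}
```

Tests written in this style can be collected by a runner such as pytest and executed on every change, so the same checks apply in the design, test, and production environments.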
Once the design stage is concluded, the ingestion, feature preparation, modeling scripts, and their corresponding tests can be promoted to a test/QA environment, before going to a production environment. Having everything run in production is the ultimate goal, and the limitations of that environment set the constraints for the other stages. For example, if the production environment does not support software packages or hardware that your model relies on, using those in the design stage is not warranted. In some cases, there is no need for re-training models in the production environment, which means only the requirements of the inference part need to be considered. A good practice is to obtain the production environment limitations at the beginning and limit all experimentation based on those. Using containerized solutions is one way to achieve consistency here (see Docker, Kubernetes, OpenShift).
Once the product is live, it is important to monitor the pipeline. This includes the machine learning model for problems like drift in the input data distribution or instability of inferred scores or clusters. Another important piece to monitor is the interaction of the end-user with the product. This allows you to measure the value generated and to self-correct the pipeline if needed.
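One way to watch for drift in the input data distribution is to compare a feature's values at training time with its recent values in production. The sketch below uses the population stability index (PSI) for that comparison. This is a swapped-in technique chosen for illustration, not something the panel prescribed. The function name, the number of bins, and the alert threshold of 0.2 (a commonly quoted rule of thumb) are assumptions.

```python
import math


def population_stability_index(expected, actual, n_bins=10):
    """PSI between a reference sample and a recent sample of one feature.

    Bins are equal-width over the range of the reference sample;
    values outside that range are clamped into the edge bins.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant feature

    def bin_fractions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # A small floor keeps the logarithm defined for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e = bin_fractions(expected)
    a = bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Illustrative data: the production sample is shifted toward higher values.
training_sample = [i / 100 for i in range(100)]
production_sample = [0.5 + i / 200 for i in range(100)]

DRIFT_THRESHOLD = 0.2  # assumed rule-of-thumb alert level
psi = population_stability_index(training_sample, production_sample)
if psi > DRIFT_THRESHOLD:
    print(f"Input drift suspected (PSI={psi:.2f}); consider retraining.")
```

The same check can be applied to the model's output scores to flag instability in the inferred values.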
Having the right team
In order to achieve a highly efficient end-to-end pipeline, you need to create a multi-faceted team and establish the right dynamics. “You need to have people who understand architectures, integration, production, testing and monitoring working alongside the people who focus on modeling,” said Inmar. Depending on the complexity of the projects and the availability of resources, one person might take all these roles, aka a full-stack data scientist, or they might be taken on by multiple people. In addition, you should have automated data collection about user behavior throughout your product. Solmaz believes that “data doesn’t always tell you why people are doing stuff and therefore you should have a UX research team to help you understand the user behaviour.” Finally, you need product owners and domain experts who have enough appreciation of the technical side of things but also can help you decide and refine what product features you need.
Product Design
The product features that you need, and therefore the data that is required, are guided by what problem you are trying to solve for the end user. This needs to be clarified in the problem definition stage through user interviews. Amir points out that “unlike the traditional way of building products where your R&D team is sitting somewhere far from the customers what you should do when you build a machine learning solution is to put the user in the center of your design.” This should include user interviews when you design your solution, but also frequent opportunities for the users to interact with your solution as it is being built, or you might end up building something that doesn’t solve any real problems. Sometimes talking to domain experts can be a good proxy for what users want, but it is crucial to talk to the end users directly as well. A typical mistake is to think that you are the user of a certain service or product and therefore you know what it should be like. However, it is important to realize that you are not the “typical user” in most cases and therefore still need to hear people out before you know what you need to build. Inmar said that you can’t say, “I buy books, therefore I understand people who are going to buy books.”
In many cases your data science product and the context it is placed in are very dynamic and you need to design accordingly. These dynamic designs need to be done “in a non-invasive way in order to avoid disturbing the workflow that the user is experiencing,” said Amir. As machine learning solutions are built on the patterns found in data, any change in the data, including a change in user behavior caused by the introduction of the product, means that you might have to retrain or redesign the model.
Another important design decision is determining how to collect the right data, especially when you are dealing with a cold start problem which can exist on many levels. For example, you might not have any data at all, or you have some data but no relevant/reliable labels, or you have all the data you need and have built a model and have evaluated it on historical data but need to know how it performs in real life. In fact, Amir thinks that “the people who build the most successful products (and get rich) are the ones who design the most creative ways to collect data combined with innovative machine learning models.” Amir introduced the concept of minimum viable machinery which is a pipeline that tackles the simplest relevant problem, often using a rule-based or simple linear model, which you can solve accurately as an approximation to the real problem you are trying to solve. Using that model, you can complete and test your end-to-end solution, and if you have done your homework in identifying a user problem, this solution can solve that problem to some extent. Once you have a product with some basic functionality in the hands of the users, you have the opportunity to gather explicit and implicit feedback from the users to improve your product. The design of the pipeline has to be flexible enough that you can substitute in more complex machine learning solutions as soon as you have enough information.
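The minimum viable machinery idea can be sketched as a pipeline that treats the model as a replaceable part and logs feedback from day one. In the example below, the rule, the feature names, the threshold, and the class and function names are all hypothetical. The point is only that the rule-based baseline and any later model share one interface, so the rest of the pipeline does not change when the model does.

```python
from typing import Callable, Dict, List

Features = Dict[str, float]
Model = Callable[[Features], float]  # any callable returning a score in [0, 1]


def rule_based_model(features: Features) -> float:
    """Hypothetical baseline: flag large orders from very new accounts."""
    risky = features["amount"] > 500 and features["account_age_days"] < 7
    return 1.0 if risky else 0.0


class Pipeline:
    """End-to-end skeleton: score, decide, and collect feedback."""

    def __init__(self, model: Model, threshold: float = 0.5):
        self.model = model
        self.threshold = threshold
        self.feedback_log: List[dict] = []

    def predict(self, features: Features) -> dict:
        score = self.model(features)
        return {"score": score, "flagged": score >= self.threshold}

    def record_feedback(self, features: Features, prediction: dict, outcome: bool) -> None:
        # Logged outcomes become the labels for a later, richer model.
        self.feedback_log.append(
            {"features": features, "prediction": prediction, "outcome": outcome}
        )


pipeline = Pipeline(rule_based_model)
order = {"amount": 900.0, "account_age_days": 2.0}
result = pipeline.predict(order)
pipeline.record_feedback(order, result, outcome=True)

# Later: substitute a more complex model without touching the pipeline.
pipeline.model = lambda f: min(f["amount"] / 1000.0, 1.0)  # stand-in for a trained model
```

Because the feedback log stores both the features and the observed outcome, the simple first version also starts addressing the cold start problem by producing labeled data.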
Taking the minimum viable machinery approach also allows you to establish a baseline for your work. Inmar believes that “usually the biggest bang for the buck could be achieved with the most basic algorithms within an end-to-end solution and then you can measure it using A/B testing or by evaluating the monetary implications of your model to make incremental improvements.” She also thinks that “establishing that baseline also informs your decision about the necessity of investment in more complex models given that they potentially would need more data and computational resources.” This can also help you establish trust with your stakeholders. Solmaz warned us that “in many organizations it is the first time that machine learning is being tried so there is this doubt if it actually works and it is important that those few first projects to make sure they get some sort of result.” Adding complexity to a baseline that is already providing value can be more straightforward, justified, and impactful.
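To make the A/B testing step concrete, here is a minimal sketch that compares the conversion rate of a baseline variant with that of a candidate model using a two-proportion z-test. The choice of test, the function name, and the counts are assumptions made for illustration. A real experiment would also need a sample size fixed in advance and a metric tied to the business outcome.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (z, two-sided p-value) for the difference in two rates."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided tail probability of a standard normal variable.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Illustrative counts: baseline (A) versus candidate model (B).
z, p_value = two_proportion_z_test(successes_a=100, n_a=1000, successes_b=130, n_b=1000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

If the improvement over the baseline is not distinguishable from noise, that is a signal to reconsider whether a more complex model is worth the extra data and compute.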
Scalability
Once you establish what product you need and have built it once, you need to think about making it scalable in various aspects. This requires the right operating model and has to address data, model, infrastructure, processes, and human levels.
The first step that one needs to think about is data preparation. “Once the pipeline is ready, new data is going to flow in and you need to maintain and store in an efficient way rather than the painful manual process that you had to do when you started,” said Inmar.
The next step is to think about the complexity of the model you are using. “Say you have a very big deep learning system; you need to spend a lot of time scaling up the training part and making it as streamlined as possible and also to scale up the inference part to ensure that you can output values within the required timescales,” said Inmar.
Another aspect is infrastructure and platform where you need very little to no friction between your design, test, and production environments so that you can spin up a sandbox for your exploratory work as fast as possible and push your insight into production with no manual work (see DevOps for Data Science, for example).
Then there’s the human aspect. Here, you need to think about the right roles and the division of labor among those roles so that everyone can do what they are strong at and the problem remains interesting and challenging for the different roles in the team. The team needs to have a mindset of scale and have best practices and processes to enable them to quickly overcome problems and get results. If there are multiple hand-off stages and many manual steps then you can only scale your impact linearly with the size of your team (at best). “It is a good practice to have machine learning researchers or data scientist and engineers part of the same team with division of responsibilities to avoid building things that can’t be used in production or having people spend time on things that are not their strong suit,” said Inmar.
One of the dangers of thinking about scale from the beginning is that you might end up over-engineering your product; “one thing to keep in mind from the beginning is having a system where you can do fast experimentation, where you can easily plug in new models, so that you do not lock yourself into a particular model or approach,” said Inmar.
Finally, you need to be continuously looking for the bottlenecks in your pipeline and thinking about prioritizing which to tackle and how.
Machine learning at scale and legacy
A lot of what we talked about so far applies to the situation where you have autonomy to decide the structure of the team and the infrastructure you use. Tackling implementation and scaling issues is more straightforward with all the features that public clouds offer. However, not all companies have that luxury, for example larger enterprises that have regulatory limitations on where their data can reside, or smaller ones that have limited capital and operational budgets. In those scenarios what you can do is limited by organizational and legacy considerations and platforms. Sometimes “you have to deal with clients that have their data in good shape but there’s a lot of fragmentation that comes from different silos and you have to engineer to get the datasets together,” said Vincent. Another aspect to consider is that these companies have been doing things in certain ways for a long time and the domain experts in these companies really know a lot about their subject matter. “You need to be extra transparent and extra humble about how you navigate these companies because you’re dealing with subject matter experts who are amazing at what they do and can complement your engineering and deep learning expertise,” said Vincent.
Sometimes managers at these companies are overly excited and expect you to do magic for them; you need the humility to listen to their problems and the transparency to do a careful feasibility analysis and give them the best recommendations. Educating technical and non-technical people in the company about the possibilities presented by machine learning can also provide the right context for successful work, without which good ideas face a lot of resistance and expectations are set at the wrong levels. “Education is both ways actually. For example, if you work with good product managers who know the domain and industry well and your first model can basically be understanding what’s going on in their brain and just automating that,” said Solmaz.
Amir Feizpour, I’m a senior manager of data science at RBC and part of a science team that works as a central data science consulting team across the enterprise. We build data science models and put them in production. I’m interested in figuring out ways that we can build machine learning pipelines that easily integrate into the business processes.
Inmar Givoni, I work as an engineering manager at Uber Advanced Technology Group which is the self-driving division within Uber. I did my PhD at the University of Toronto specializing in machine learning. Previously, I worked at Kobo on things like recommendations, search, NLP on the text of the books, experience optimization and so on. Afterwards I led the software and machine learning groups at Kindred which is a company that is building robotic solutions for operations in warehouses. Now at Uber my focus is on productionizing the algorithms that come out of a research team on the cars and the software stack.
Solmaz Shahalizadeh, I’m currently the vice-president of Data Science and Engineering at Shopify. My team is distributed across multiple data-driven products, such as order fraud detection, Shopify capital, forecasting future sales, and logistics and fulfillment. I completed a Masters in machine learning, and a Masters in bioinformatics where I used predictive models to predict the outcome of breast cancer. During my degrees, I got a sense of joy and frustration of real solutions that work outside of lab and outside of your own shell script. After that I worked at Morgan Stanley before joining Shopify in 2013.
Vincent Wong, my background is in finance and I have worked at most Canadian banks in technology solution roles. I am co-founder at Dessa and what we do is end-to-end machine learning solutioning for various enterprises. How we got into it was frustration over seeing things built and then getting thrown over the fence with no output or business value.
Amir Hajian, I’m an astrophysicist and I did research in different universities after I got my PhD. My official title is director of research at Thomson Reuters labs but in practice I am a full-stack data scientist where I’m involved in ideation all the way to actually seeing it in the hands of the users. Over the past two years I led three products from ideation all the way to production and I did at least a dozen more that did not go to production, from which we learned a lot about what to do when you build machine learning and what to avoid.