Everything is a Model

TLDR: I review a recent systems paper from Google, why it is a wake-up call to the industry, and the recipe it provides for nonlinear product thinking.

Here, I will be enumerating my main takeaways from a recent paper, “The Case for Learned Index Structures” by Tim Kraska, Alex Beutel, Ed Chi, Jeffrey Dean, and Neoklis Polyzotis that uses machine learning models as alternatives to well-known data structures like trees, hashmaps, and Bloom filters. The paper pleasantly surprised everyone in the community by showing that these model-based approaches provided an alternative that was not only faster but also memory efficient. My bet is this work will receive either the best paper award or a keynote invite at the upcoming VLDB, SIGMOD, and other database conferences, as it is waking up database researchers to whole new areas of database research.

I contend that the lessons from this paper reach beyond the fields of databases, and machine learning. On Twitter, I noted how this paper is an example of machine learning permeating into everything we know. Chris Manning said this paper “ate Algorithms in one large bite.” While these are valid observations, the truth needs more unpacking than what’s possible on Twitter. So here we go:

1. All data products can be optimized knowing statistical properties of their dataThis sounds like a no-brainer for the last decade of data-driven product building and optimization folks used to A/B testing, for example. But the real beauty of the Kraska et al. paper is taking this to the extreme and going inwards into the basics of computer science, and challenging assumptions.  As with any data-driven optimization, doing so on basic data structures themselves surprisingly yield better alternatives — one of the findings in this paper. Of course, this “better” is well qualified in the paper (see bullets 2 and 3).

2. Everything is a modelThe paper begins with a bold assertion: “Indexes are models” (btw, is it indices or indexes?). They further go on to say, “an index is a model which takes a key as an input and predicts the position of the record.” This comes from a simple yet beautiful idea that the position of an item in a sorted sequence is proportional the CDF of the data. So in effect, we can “predict” the position of an item from an index by learning its CDF. So indexes are models.  But I want to assert that “everything is a model.” Even in the paper, they show this idea extends beyond indexes to hashmaps and Bloom filters. Anywhere we are making a decision, like using a specific heuristic or setting the value of a constant, there is a role for learned models. In ML alone, we have seen this at play in other forms. The model specification is a heuristic. AutoML and other model discovery efforts replace this heuristic with the outcome of another model. All the work on hyperparameter optimization is an example of a heuristic (constant) being replaced with the outcome of a model.

3. Rules are models and models are rulesIt’s easy to get carried away with the “everything is a model” assumption. The best part of this paper, in my opinion, is authors are very careful in not saying “here’s our cool idea, here are our models, and here are our results.” Instead, there is a very careful dialectic on hardware/software interactions and when/why building a model makes sense. It is useful to look at when to “model” using rules and when to learn — it all depends on how hard the space you’re learning is and your hardware constraints. Section 3.3 on Hybrid Indexes is a good illustration of this. The space of keys is partitioned recursively, and each partition is handled by a progressively refined model. By acknowledging some parts of the input distribution could be hard to model, the authors fall back on the heuristics (a B-Tree in this case) and rightfully so.

4. New hardware like TPU V2 and Graphcore can bring in unforeseen disruption in the industryAt NIPS 2017, I saw a Graphcore demo showing how they performed 3x faster than Nvidia Volta. Google’s Tensor Processing Unit (TPU) powered AlphaGo but only did inference (prediction). The new TPUs (TPU V2) is built for both training and inference, giving an insane performance and driving down experimentation time, and shorter time-to-market. With these new devices, things that were once expensive to run, no longer are. As a consequence, these new hardware units are enabling new kinds of products that will cannibalize other products that came before them. According to the authors, “ML hardware is at its infancy. ” Building your machine learning infrastructure today might feel like walking on shifting ground. Walk quickly and be alert. Also, see bullet #9.

5. Piggyback on the software that works with new hardwareIf a new hardware makes something previously difficult or impossible possible, then you are losing to your competitors by not adopting these new hardware and software improvements. This has unforeseen second order effects. For instance, in the ongoing battle of the machine learning frameworks, it doesn’t matter how good/clean/flexible a framework is. What matters, for production/industry purposes, is if the framework supports new hardware that unlocks new possibilities. For instance, a single TPU Pod is currently beating the world’s 10th fastest supercomputer. If the only way to access that is through Tensorflow, then the inherent value of other frameworks do not matter as much, and product builders may have no choice but to opt to work with TensorFlow. Interestingly, this opens up all sorts of interesting business opportunities for folks who optimize the thin layer between hardware and software.

6. Forgo your assumptionsWe take somethings in Computer Science as sacrosanct. For example, for lookups, hashmaps are efficient. For set-membership testing, it is hard to beat Bloom filters, and so on. This paper is a reminder to question everything. We see this everywhere in the paper:

“Yet, we argue that none of these apparent obstacles are as problematic as they might seem.” “As a result, the high cost to execute a neural net might actually be negligible in the future.”

Whenever you say, “that’s expensive” or “that’s difficult” while making product decisions, ask yourself “why?” If you don’t, your competitor might be asking these questions. Are there new hardware, new datasets, and new research that invalidate your assumptions?

7. Back-of-the-envelope calculations should be done oftenA great question Kraska et al. ask in their paper is “What Model Complexity Can We Afford?” (c.f. Section 2.1). We rarely see this in ML research papers, but this line of thinking is crucial in product building as we operate under time and budget constraints.

8. ML is not pixie dustWhen taking ML bets today, machine learning is no longer an afterthought. It is a core computer science principle (like algorithms or distributed systems) that should influence your product development/thinking. Even today, enterprise leaders still think of machine learning as pixie dust they can sprinkle on their existing products by hiring a few ML engineers. This kind of thinking produces marginal gains. For exponential gains, you want to start with machine learning based thinking while building the product.

9. Develop for the hardware you have and have the hardware that meets your product/business goalsThis point is self-explanatory and was covered in bits and pieces in points 1-8, but I am writing it again to emphasize the product-software-hardware interactions in building modern machine learning products.

That’s it for now, but I hope you will read this paper. I am curious to hear your takeaways from this.