I’m a big fan of brainteasers and happen to find quirky, counterintuitive mathematical truisms fascinating. I’d be lying if I said I hadn’t spent a weekend figuring out why once you have 23 people in a room, you’re more likely than not to have at least one shared birthday among them. At the same time, I don’t consider myself particularly “good” at brainteasers, and can’t say I’d perform as well as Bruce Willis in that classic scene from Die Hard 3, even if Samuel L. were there to help me.
Luckily, there’s a fairly simple way to improve your skill at almost anything: practice it, just as you might jog up stairs or run for long distances to build up stamina. But imagine if instead of having regular old running shoes, you had turbo-charged, bionic shoes that made it easier for you to run farther and push your distance limits. You’d build up your stamina for sure, and you might even run to parts of your neighborhood or city that you’d never explored before and see something cool.
While these turbo-shoes don’t exist last time I checked, as a data scientist I can tell you that an analogue to them exists when it comes to generating insights about a wide array of complex systems, from physical and biological processes and financial investments to winning your favorite board game or solving a little brainteaser. What’s more, you already have everything you need to use this tool assuming you’re reading this post from a computer.
What is this mysterious method I speak of? Simulation!
In my humble opinion, one of the “killer apps” of taking a data science-y approach to problem solving is as follows: “let computers do your dirty work for you.” Setting up a quick simulation is one of the best examples of this principle at work: figure out what game you want to play, tell your computer the rules, and then have it play that game over and over again while aggregating the results in an easy-to-interpret format. This provides a way to tackle prohibitively complex problems and gain a good understanding of “long-run” phenomena, all while making a much better use of your time (unless you have nothing better to do than flip a coin 10,000 times).
In this post, I want to quickly demonstrate how trading the pen and paper for a little bit of Python helped me work out a brainteaser I like to call the “Crazy Plane” problem.
Imagine there are a 100 people in line to board a plane with 100 seats. For convenience, let’s say that person 1 has the ticket to seat 1, person 2 to seat 2, etc. Unfortunately, the first person in line is crazy and simply takes a random seat instead of looking at his ticket. Each person boarding after him will either take the “proper” seat, or will choose a random seat if hers is already occupied. What is the probability that the last person to board the plane will end up in the correct seat?
If you looked at the problem above for a little while, don’t instantly see a way to “math it out,” but don’t want to just google the answer, then there’s no need to ask your doctor about simulation because I already know it’s right for you! You don’t need a stack of statistical skillz to solve this problem; all you need is a function.
The Drawing Board
As is the case with many of the problems we deal with here at Datascope, my first plan of attack was to pick up a pen and draw. Before tackling the 100-seat plane specified in the problem, I sketched out some trials with smaller planes just to get a sense of the scenario we’re dealing with. Consider a tiny plane with just two seats and two passengers. When the first passenger chooses randomly, he’ll either sit in his assigned seat or in the target passenger’s seat, leaving the target passenger with a 50-50 shot of finding her seat unoccupied.
We can draw out similar scenarios for planes with three seats and four seats. Each row in these diagrams represents the possible seat choices that the first, second, or nth passenger can make, given the choices of previous passengers. Below is an artistic rendering of the ways a crazy plane with three seats could pan out…
…and here’s a sketch for the plane with four seats.
In looking at these base cases, a couple things stand out:
- The last passenger on the plane only ever ends up in her seat or in the crazy guy’s seat (seat number 1).* If anybody in the middle “undoes” the crazy passenger’s error by randomly selecting his seat, then everything else goes off without a hitch. Everything works out 50 percent of the time for each of these small planes. This gives us an inkling that the probability of the last passenger ending up in her assigned seat may not depend on the number of people boarding the plane.
We have an idea of what will happen if we ramp up from our “tiny planes” to our full 100-seat crazy plane, which is great. Unfortunately, drawing these diagrams for larger planes quickly becomes intractable. I’d be lying if I said I didn’t while away certain evenings drawing diagrams for five- and six-passenger planes, but I don’t even want to think about larger diagrams.[1].
Simulating a Crazy Plane
But here’s the good news: we don’t have to think about the 100-passenger diagram or calculate out all the ways in which passengers may be inconvenienced by prior randomness. Instead, we can lace up our metaphorical turbo-charged running shoes by plopping down in front of a computer (ironically) and letting a little bit of code do that work for us. Here’s just one way of turning this brainteaser into an easily repeatable function:
We simulate passengers boarding the crazy plane by creating two identical lists of the numbered passengers and their corresponding assigned seats. In the main loop, we board all passengers except the final one. As passengers take seats, corresponding numbers are removed from both lists. We know that the first passenger will always choose a seat randomly, so we make sure of that. Thereafter, each passenger tries to look up his seat, but will choose randomly if that seat has been removed from the list. Once all these choices have been made, we look at the last passenger and the last seat to see if there’s a match, returning 1 if there is and 0 if there isn’t (hereafter termed success and failure, respectively).
Voila! You can now simulate the crazy plane process for any size plane you like. Where do we go from here? Well, the crazy plane is chock full o’ randomness, meaning that it is unpredictable in the short run but predictable in the long run (just like rolling dice or flipping coins). That is to say, the more times we observe a random process like the crazy plane, the better we’ll understand it. This is the real “secret sauceâ€� of simulations: they generate much more information about the “long runâ€� in a much shorter amount of time.
To see what I mean, let’s simulate the long-run behavior of the 100-seat crazy plane. Each graph below shows the cumulative success rate (successful trials/total trials) of running the process 100 times.
Notice that to the leftmost region of each graph (where the number of trials is still small), the blue line representing the cumulative success rate fluctuates a lot, because random processes are “unpredictable in the short run.” As we get closer to the full 100 trials, that fluctuation decreases and our success rate converges on the very same 50 percent that we saw in our “tiny planeâ€� sketches. There’s some variation in success rates for the four “batches” above: we’re generally within five percentage points of 50 percent, but never hit it on the nose. How might this change if we bumped up the number of trials? Because we have code, this is an easy question to answer!
Once we increase the number of simulations in each batch, the results come even closer to 50 percent. You can see in the far-right portion of each graph how our cumulative success rates more or less flatline. Put in a more general way: as we increase the “length” of the long run, the impact of each individual experiment is minimized. This leaves us with a clearer picture of what’s actually going on at scale: that this seemingly complex boarding process is no different than flipping a coin!
Don’t Just Calculate…Simulate!
At this point, let’s reflect on what we haven’t done:
-
Opened a combinatorics textbook
-
Obtained an advanced math or stats degree
-
Filled blackboard upon blackboard with arcane equations a la A Beautiful Mind
Now, let’s reflect on what we have done:
-
Simulated our 100-seat crazy plane process thousands of times in a few seconds
-
Solved the problem, for all intents and purposes
-
Created something extensible: if we want more precision around our answer, we simply simulate more and tabulate new results.[2]
-
Created something customizable: the function we created will work for any number of seats. Want to investigate a crazy plane with 50 seats? 150 seats? Just change N in our function and you’re ready for lunatic lift-off!
In solving this brainteaser, we’ve also gotten a brief introduction to Monte Carlo methods, which is just a fancy term for simulating something a bunch of times to solve problems. If that description seems incredibly general, that’s because simulations are extensible and customizable (as we’ve already seen). Simulation-based solutions to problems pop up in disciplines that are about as varied as the day is long. Here are just a couple of examples of simulation at work:
-
If you wanted to estimate the value of π, you could do it with a one-by-one square, a circle inscribed within that square, and a uniform random number generator.
-
Monte Carlo simulations are often employed to program AI for various popular games..
-
If you happen to be an asset manager and want to understand how different portfolio allocations generate returns in various environments, a tool like this one may be for you.
-
There are shelves of textbooks devoted to simulation methods in the physical sciences, including but not limited to physics,fluid dynamics, quantum mechanics, and biology.
We demonstrated a simulation-based approach to solving a tidy little brainteaser. Now, if you weren’t as stubborn of a nerd as I am, you probably could have just googled this problem and found a litany of answers, explanations, and analytic solutions. That’s fine for a tidy little brainteaser. You may even have the math skillz to have solved this one without simulation, but any real problem will have many more degrees of complexity. In fact, it’s not that hard to “untidyâ€� our brainteaser a bit…
-
What if there were not just one crazy passenger, but two, three, or more crazy passengers?
-
What if those crazy passengers were dispersed randomly throughout the boarding line?
-
What if a passenger randomly choosing a seat were more likely to pick certain seats than others? Perhaps the initial crazy is less likely to run way to the back of the plane and take seat 100 than he is to take a seat nearby, or any passenger finding her seat taken will choose the nearest unoccupied seat…
Minor tweaks like this will be difficult or require a PhD thesis to work into an analytic solution to the problem. By comparison, it’s easy to tweak our code above to accommodate any or all of the scenarios above, run a couple hundred or couple thousand more trials, and see how our inference changes. Now that we’ve explored one particular framing of the “crazy plane” problem and have some basic code to simulate the process, we have all the raw materials needed to formulate and test new hypotheses. No advanced training in statistics or combinatorics required, just a computer and some curiosity!
Additionally, simulations can illuminate the path to an analytic solution, much like the back of the textbook can help guide you from half a solution to a whole one. Given what our simulations showed us about the crazy plane problem, we might be emboldened to pick up the pencil and paper again to derive a general formula about why the probability is 50 percent. It may seem strange in this scenario, but let’s say our simulations were integrated within a larger application that needed to work in real time. In that case, we may not have the time to let our computer churn through a million possible scenarios, and having a formula to calculate (or quickly approximate) the quantity we need would be our only option. In reality, calculations and simulations can reinforce each other. After all, we started with some quick sketches, and for a larger problem we might iterate on these stages many times: start at the “drawing board,� code up and run some simulations, go back to the drawing board to ask new questions, modify code, etc.
So the next time you see a problem that you’re not sure how to calculate (or perhaps just don’t want to calculate), see if you can simulate! And what’s more, simulate with pride, knowing that you’re taking advantage of one of the most powerful approaches in the data scientist’s tool belt.