Sensor Fusion Tutorial

What is this sensor fusion thing?

This blog post is about sensor fusion. You might think you donâ€™t know what that means, but donâ€™t worry, you do. Itâ€™s something you do all the time, as part of your daily life. In fact, a lot of it is done by your nervous system autonomously, so you might not even notice that itâ€™s there unless you look for it, or if something goes wrong.

Take, for example, riding a bicycle. If you know how to do it, you likely remember the frustrating days you spent getting on only to fall back down right away. In part, what made it so hard is that riding a bike requires several separate internal sensors to work in concert with one another at a high level of precision. Your brain must learn how to properly interpret and integrate visual cues with what your hands and feet perceive and with readings from your vestibular system (i.e., the â€œaccelerometersâ€� and â€œgyroscopesâ€� in your inner ear). Perhaps going in a straight line is doable without visual cues, but I am willing to bet that taking proper turns while blindfolded is impossible for most people. That is because the visual cues add information about the state of the bicycle that none of the other sensors can provide: what is the angle of the curve ahead, what angle am I currently turning at, and am I heading straight for a tree instead of the middle of the road? On the other hand, getting rid of the information from the vestibular system would be even more detrimental. The visual system, while very good at finding upcoming obstacles, is quite bad at determining small deviations from vertical, and it probably wonâ€™t know you are falling until you are already halfway down.

Ok, so both sensors are needed for riding a bike successfully, but why fuse the information? Couldnâ€™t your eyes make sure that youâ€™re not about to hit an open door, while your inner ear makes sure youâ€™re staying upright? Of course, they could, but it turns out that integrating the two sources of data goes a long way toward eliminating noise from the measurement. Even though the eyes are very bad at determining equilibrium, they still provide some useful information. Your brain could throw this information away, or it could spend a tiny bit of extra energy and use it to significantly improve the accuracy of the inner ear sensor.

Engineers have been quick to catch onto thisâ€”at least soon after electronic sensors became a thingâ€”and so have developed similar techniques to be used in control systems where the same variable can be measured in several different ways: the velocity of cars, the altitude of airplanes or the attitude of space rockets. In the rest of the blog post I’ll introduce some of these techniques while focusing on a relatively pedestrian end-goal: bettering my train commute.

Sensing the number of people in a train car

Here at Datascope most of us take the CTA (Chicago Transit Authority) to work and back. My particular commute is 45 minutes each way on the Red Line, which is the most crowded line in the city. As a result, it is sometimes hard to find a seat, and on busy days, even to get on the train. Sometimes, like on baseball game days, this is unavoidable, since there are more people in the station than there is space on the train. On most days, however, this is merely a problem of distribution: some cars might be full, while others are only half full. Without waiting for the whole train to pass by and looking in every one of 8 or so cars, there is no way to know.

So, keeping this in mind, what could we do if we could measure or estimate the number of people in each car, in real time? For one, we could display a distribution of car occupancy, even before the train arrives. People could wait outside the emptiest car, and not try and crowd into an already crowded one. This would not just make the rides more comfortable, but also speed up boarding, which is a major factor of train delays, at least according to the Chicago Transit Authority courtesy guide.

In addition, the occupancy estimate could solve a personal pet peeve of mine: the poop car. This is a Chicago phenomenon that happens in the winter: you think you scored a nice, relatively empty car, at a busy time too! It looks almost too good to be true, and it turns out it is. The car is filled with some stench or another, making it nearly impossible to breathe, so much so that, at the next stop almost all riders get off and switch to other, more crowded cars, while a new set of â€œsuckersâ€� walks into the trap. Broad City illustrated this phenomenon admirably:

Now, if we had a good estimate of the number of people in each car, we could easily see that one car is much emptier than the rest of the train, while the neighboring cars are fuller. For the Red Line in winter, that is a certain sign of either unbearable stench or some other, equally obnoxious problem.

Sensor models

A simple and cheap way to get a (very rough) headcount in an enclosed space is to measure the level of carbon dioxide in the space. Since people breathe out more CO2 than they breathe in, a higher concentration of CO2 usually correlates with more people. How much, however, depends a lot on the space, (and maybe on other things, like light, time of day, or HVAC dynamics). There is no master function that says â€œwhen your sensor reads 1000ppm of CO2, there are 15 people in your room,â€� and no such function can be calculated for a general case. This is why we need to build our own function, and weâ€™re going to do this by first running an experiment and using the data to build a sensor model. We install a cheap CO2 sensor in each train car, and lump all sources of error into generic â€œsensor noiseâ€�. In our experiment we take readings of both the CO2 level and the number of people present at that time in the train car, many times throughout a day. In code, it looks a little bit like this:

Letâ€™s say we get a figure like this:

The above figure helps us build a sensor model for our particular car. A simple model fitting a line to the points in the figure tells us, for every given reading, how many people we expect to find in the room.[1]

We could stop here, report that number of people, and call it done. But weâ€™re no fools, we see thereâ€™s a LOT of noise in the reading. And so, we also calculate the standard error for our fit, and get a rough estimate of how good our model is. In the above figures, you can see the uncertainty of our sensor model as a shaded gradient.

Letâ€™s talk about what that means for a minute. Say we check our sensor and get a reading of 1092ppm (a number I obtained by randomly mashing my keyboard). We see that this corresponds, on average, to there being approximately 46 people in the train car, but itâ€™s pretty much just as likely that there are 40 or 50 of them. The car could even be emptyâ€”that option is still within two standard deviations of the mean.[2]

Multiple readings between stations

Now that we have a sensor model, letâ€™s actually get some data! Letâ€™s start looking at whatâ€™s happening in the car between two stations, something we can determine easily, either from an accelerometer on the car or from the great train tracker app CTA provides. The nice thing about the train between stations is that very few people change cars at that time (it is in fact illegal to do so). So, we can assume that the number of people in the car stays constant between measurements. We already have a sensor reading, so letâ€™s grab one more.

This one is centered somewhere around 46 people and has, unsurprisingly, the same standard deviation as the first readingâ€”this is a side effect of our model. Now comes the fun part: we merge the two readings into one probability curve:

Mathematically, we multiply the two blue distributions and normalize the result as needed. Because our distributions are Gaussian this multiplication can be done analytically:

$\mu = \frac{\sigma_{p}^2\mu_{up}+\sigma_{up}^2\mu_{p}}{\sigma_{up}^2+\sigma_{p}^2}, \quad \sigma=\frac{\sigma_{up}^2\sigma_{p}^2}{\sigma_{up}^2+\sigma_{p}^2}$

In code, it looks something like this:

Multiple sensors

We can apply the process in the previous section to multiple sensors that measure the same thing. For example, letâ€™s say we want to install a temperature sensor, whose measurements also (loosely) correlate with the number of people in the room. We install the new sensor:

After doing the required experiments, the model for the temperature sensor looks like this:

Now, each measurement between the two stations could come from one sensor or the other. Since we know how much to trust each sensor, we use the same method as before to fuse them together. The math does not change, since weâ€™re still multiplying Gaussians, and the same equations apply.

The first temperature measurement, for example, would fuse with the other two measurements like this:

A 5 minute train ride might look like this, where the red line shows the true occupancy of the car:

To make things easier, I made some custom classes to abstract away measurements, and used them as following:

Moving targets

Until now we assumed the number of people in the car stays constant. Thatâ€™s a decent assumption between two stations, but if we consider a whole route, this is certainly not true. Fortunately, the previous approach can be easily adapted to moving targetsâ€”and by this I mean â€œchanging number of peopleâ€�, not moving people between stations.

In order to estimate changing number of people we need to introduce the concept of time between measurements, and also build a model of human behavior between measurements[3]. Letâ€™s assume that, at every station, a random number of people enter or leave, except at the last station, when everyone leaves the train. In mathematical terms this means that, every time we stop at a station, the error on our estimate of the number of people in the car will have to widen, and since we donâ€™t know how much weâ€™ll just set its sigma to infinity. In code, this is done by either creating a brand new Estimate or setting its sigma to None:

However, as the train is moving between the stations we will be getting measurements again, and our estimation will narrow. Over several stations our estimate will look something like this:

In the end we get a filtered version of the signal, translated from CO2 and temperature levels to number of people in the room. We also get a confidence interval at every point in time. Perhaps, more importantly, we obtained these estimates in real time. What this means is that we didnâ€™t have to wait until the train ride was over, or until we got to the next station in order to average all the data and get a good estimate. Instead, we included every single measurement as it came in, and we could present the people waiting at the next station a constantly updating estimate of how many people were in each car.

Better models

What we did in the previous sections is a well known and quite brilliant method called Kalman filtering. Itâ€™s one of the technological advances that helped Apollo 11 get to the moon, and variants of it are used nowadays on pretty much any non-trivial control system. An old joke says that every time the late Rudolf KÃ¡lmÃ¡n got on a plane he would state his name and ask to be shown â€œhis filterâ€�. A very similar approach, called Kalman smoothing can be taken if we do not need real time results, but instead can use the full data at once to determine the most likely outcome. In our case (due to the fact that we assume no movement between cars between stations ) a Kalman smoother would have resulted in the estimate right before each station being considered as the estimate for that whole leg of the trip. Because of its overall much better estimate, smoothing is usually preferred to filtering when real-time results are not necessary. The downside would be that we wouldnâ€™t be able to tell which car is the smelly car until the end of the line, and then itâ€™s no better than a quick in-person smell test of the whole train.

Improvements on this approach are numerous, and Iâ€™m sure some of you are already thinking about what they might be. Iâ€™ll list only a few:

non-linear sensor models: we fit a polynomial or exponential curve to the experimental data
non-constant sensor noise model: the standard deviation is different at different sensor readings, not constant across the range of readings
time-varying sensor model: the way the sensors behave might change with time of day, or other factors (e.g. air conditioning)
more complex human dynamics: the way people enter and leave a car is a lot more complicated than we assumed; we could, for example, take into account that people are more likely to enter or leave during rush hour; people might prefer to enter into emptier cars; or they might be actively leaving the poop car
some other stuff I havenâ€™t thought of, but you probably have: let us know on Twitter!