Analyzing Customer Churn – Competing Risks

Every survival analysis method I’ve talked about so far in this series has had one thing in common: we’ve only looked at one event in a customer lifetime (churn). In many cases, that’s a perfectly fine way to go about things… we want our customers to stick with us, so churn is the event of interest. So why would we ever need to think about competing risks?

You know, competing risks. Will you die by tornado, or by shark?

There’s actually a critical assumption undergirding most survival analysis methods for right-censored data - that censoring is non-informative. In other words, individuals who get censored are assumed to have the same likelihood of experiencing the event of interest as individuals who never get censored. If this assumption is violated, things like Kaplan-Meier estimators can become wildly inaccurate. (If you need a refresher on Kaplan-Meier curves and other concepts, take a look at my earlier post on basic survival analysis.)

The problem

I’ll give two examples of why this may be a problem. Suppose you were interested in analyzing churn for your basic service plan, and you had a significant number of users who had upgraded from your basic plan to your premium plan. You don’t want to exclude the upgraders from your analysis, since a significant portion of their lifetime was spent on the basic plan. Of course, including them has problems too, since a significant portion of their lifetime wasn’t spent on the basic plan.

So you decide to include them, but to end their “lifetime” at their upgrade date. But wait, do they churn at their upgrade date? Surely not - that would artificially inflate churn! So they must get censored at their upgrade date? That’s not ideal either. Users who upgrade are likely very happy with your service. If your premium plan didn’t exist, they may very well have stuck with your basic plan longer than the average user. So, your censored users would be less likely to churn than your non-censored users, and your fundamental assumption would be violated.

If you count folks who upgrade as churning, you get an obviously wrong curve. But even in this dummy data set I randomly generated, you get different results if you simply censor those who upgrade.

This is even easier to understand if we consider the reverse… Suppose you were interested in studying the cumulative incidence of customers upgrading from your basic plan to your premium plan. (A cumulative incidence curve is just 100% - the survival curve. So, you can say things like “10% of customers have upgraded by day 365” rather than “90% of customers have not upgraded by day 365.”)

A very similar dilemma would apply in this case. Certainly, you wouldn’t want to count churners as upgraders - you’d wildly over-inflate your upgrade rates. So do you count them as censored? Absolutely not! The average user who doesn’t churn has an average chance of upgrading. The average user who does churn has a 0% chance of upgrading! So, if you censored individuals at their churn date, you’d effectively be assuming that they continue upgrading at the standard rate, and you’d still be massively overstating your upgrade rate.

The solution

Thankfully, statisticians have solved this problem with “competing risks” survival models. (I won’t dive into the math in this post, but the NIH has a pretty easy-to-follow explanation of competing risks math.) These models essentially let you study more than one event type at once, and learn about the probability of each event occurring.

So, let’s get started coding this up in R! If you’d like to follow along, download my dummy competing risks data. It’s simply time-to-event data, like usual, except that there are 2 event types indicated by the “type” variable - we’ll say 1 for churn and 2 for upgrades.

We’ll start with basic setup… load the survival library, and load the survival data set.
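Something like the following (a minimal sketch - the file name and the column names time, event, and type are my assumptions about the dummy data set, so adjust them to match the actual download):

```r
# Load the survival package (install.packages("survival") if you don't have it)
library(survival)

# Read in the dummy competing risks data.
# "competing_risks_dummy.csv" is a placeholder file name.
df <- read.csv("competing_risks_dummy.csv")

# Expect columns like: time (days observed), event (0/1), type (1 = churn, 2 = upgrade)
head(df)
```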

Next, like usual, we’re going to create a survival variable in the data frame, except that this time we’ll pass in both the event indicator and the event type, and we’ll specify that this should be an “mstate” (for “multi-state”) survival object.
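Here’s a sketch of that step, under the same column-name assumptions as above: combine the event indicator and the event type into a single status factor whose first level means “censored,” then hand that to Surv() with type = "mstate".

```r
# 0 = censored, 1 = churn, 2 = upgrade. Surv() treats the first factor level
# as censoring when type = "mstate".
df$status <- factor(df$event * df$type,
                    levels = c(0, 1, 2),
                    labels = c("censored", "churn", "upgrade"))

# Multi-state survival object stored in the data frame, like usual
df$surv <- Surv(df$time, df$status, type = "mstate")
```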

That’s pretty much all there is to it! You can now fit a survival curve and plot it just as you would for a regular survival curve!
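For example (the colors, labels, and legend are just illustrative choices):

```r
# Fit the competing risks curves - no predictors, just an intercept
fit <- survfit(surv ~ 1, data = df)

# plot() on a multi-state fit draws one cumulative incidence curve per state
# (churn and upgrade here); the initial "still subscribed" state is omitted by default.
plot(fit, col = c("red", "blue"), lwd = 2,
     xlab = "Days since signup", ylab = "Cumulative incidence")
legend("topleft", legend = c("churn", "upgrade"),
       col = c("red", "blue"), lwd = 2)
```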

If you run this code, you’ll get a plot that looks something like this:

Accurate estimates of the true rate of upgrade and churn, with no assumptions violated.

You’ll notice that this looks a little bit different from the survival analysis plots you’re used to seeing… the lines go up and to the right, rather than down and to the right. That’s because the plot function defaults to plotting “cumulative incidence” curves for competing risks data. As discussed above, a cumulative incidence curve is simply the complement of a survival curve - the share of people who have experienced an event, rather than the share who haven’t.
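If you want the numbers behind the plot rather than just the picture, summary() works the same way it does for an ordinary fit - for instance, to pull the estimated share of customers who have churned or upgraded by day 365 (the time point here is just an example):

```r
# Estimated probability of being in each state at day 365,
# with standard errors and confidence intervals
summary(fit, times = 365)
```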

Conclusion

That’s it! You’ve now added another tool to your survival analysis arsenal. Of course, there are extensions of this basic version of competing risks analysis… competing risks regression, for example, is a way of looking at the effect of independent variables on survival rates, while accounting for competing risks. There’s a lot out there, but it’s beyond the scope of this post (though maybe there will be another later).

Please follow up if you notice any problems or have any questions!