Monotonicity constraints in machine learning

In practical machine learning and data science tasks, an ML model is often used to quantify a global, semantically meaningful relationship between two or more values. For example, a hotel chain might want to use ML to optimize their pricing strategy and use a model to estimate the likelihood of a room being booked at a given price and day of the week. For a relationship like this, the assumption is that, all other things being equal, a cheaper price is preferred by users, so demand is higher at a lower price. However, what can easily happen is that, upon building the model, the data scientist discovers that it behaves unexpectedly: for example, the model predicts that on Tuesdays clients would rather pay $110 than $100 for a room! The reason is that while there is an expected monotonic relationship between price and the likelihood of booking, the model is unable to (fully) capture it, due to noise and confounds in the data.

Too often, such constraints are ignored by practitioners, especially when non-linear models such as random forests, gradient boosted trees or neural networks are used. And while monotonicity constraints have been a topic of academic research for a long time (see, for example, the survey literature on monotonicity constraints for tree-based methods), there has been a lack of support from libraries, making the problem hard to tackle for practitioners.

Luckily, in recent years there has been a lot of progress in various ML libraries to allow setting monotonicity constraints for models, including in LightGBM and XGBoost, two of the most popular libraries for gradient boosted trees. Monotonicity constraints have also been built into TensorFlow Lattice, a library that implements a novel method for creating interpolated lookup tables.

Monotonicity constraints in LightGBM and XGBoost

For tree-based methods (decision trees, random forests, gradient boosted trees), monotonicity can be enforced during the model learning phase by not creating splits on monotonic features that would break the monotonicity constraint.
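To illustrate the idea with a simplified sketch (this is not the actual LightGBM or XGBoost implementation), a candidate split on a feature with an increasing constraint can be rejected whenever the left child's value would end up above the right child's:

```python
import numpy as np

def violates_increasing_constraint(y_left, y_right):
    """Return True if a candidate split would break an increasing constraint.

    For an increasing constraint, samples with smaller feature values
    (the left child) must not receive a larger predicted value than samples
    with larger feature values (the right child). A regression tree's leaf
    prediction is the mean of the targets that fall into the leaf.
    """
    return np.mean(y_left) > np.mean(y_right)

# This split would be rejected under an increasing constraint:
y_left = np.array([5.0, 6.0])    # targets for x <= threshold
y_right = np.array([2.0, 3.0])   # targets for x > threshold
print(violates_increasing_constraint(y_left, y_right))  # True
```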

In the following example, let's train two models using LightGBM on a toy dataset where we know the relationship between X and Y to be monotonic (but noisy), and compare the default model to one with a monotonicity constraint.
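The snippet below generates a toy dataset of this kind; the exact data-generating process (a linear trend plus Gaussian noise) is an assumption made for illustration:

```python
import numpy as np

rng = np.random.RandomState(0)

def make_data(n):
    # Y is, in expectation, a monotonically increasing function of X,
    # but individual observations are noisy.
    x = rng.uniform(0, 10, size=n)
    y = 5 * x + rng.normal(0, 10, size=n)
    return x.reshape(-1, 1), y

X_train, y_train = make_data(200)
X_test, y_test = make_data(200)
```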

Let’s fit a gradient boosted model on this data, setting min_child_samples to 5.
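A minimal sketch using LightGBM's scikit-learn interface (X_train and y_train refer to the toy data generated above):

```python
import lightgbm as lgb

# Default, unconstrained gradient boosted model; the small
# min_child_samples lets leaves fit very few samples, so the
# model can pick up noise in the data.
default_model = lgb.LGBMRegressor(min_child_samples=5)
default_model.fit(X_train, y_train)
```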

The model will slightly overfit (due to the small min_child_samples), which we can see by plotting the values of X against the predicted values of Y: the red line is not monotonic as we’d like it to be.
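With matplotlib, such a plot can be produced roughly as follows:

```python
import matplotlib.pyplot as plt

# Sort by X so the prediction curve is drawn from left to right.
order = np.argsort(X_train[:, 0])
plt.scatter(X_train[:, 0], y_train, s=10, alpha=0.5, label="data")
plt.plot(X_train[order, 0], default_model.predict(X_train[order]),
         color="red", label="default model predictions")
plt.xlabel("X")
plt.ylabel("Y")
plt.legend()
plt.show()
```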

Since we know that the relationship between X and Y should be monotonic, we can set this constraint when specifying the model.
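Reusing the setup from above, the constrained model can be specified like this:

```python
# Same model, but with an increasing monotonicity constraint
# on the (only) feature.
monotone_model = lgb.LGBMRegressor(min_child_samples=5,
                                   monotone_constraints="1")
monotone_model.fit(X_train, y_train)
```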

The parameter monotone_constraints="1" states that the output should be monotonically increasing with respect to the first feature (which in our case happens to be the only feature). After training the monotone model, we can see that the relationship is now strictly monotone.

And if we check the model performance, we can see that the monotonicity constraint not only provides a more natural fit, but the model generalizes better as well (as expected). Measuring the mean squared error on new test data, we see that the error is smaller for the monotone model.
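A sketch of how this comparison can be computed with scikit-learn (the exact numbers depend on the toy data used above):

```python
from sklearn.metrics import mean_squared_error

# Evaluate both models on held-out test data.
print("Default model mse", mean_squared_error(y_test, default_model.predict(X_test)))
print("Monotone model mse", mean_squared_error(y_test, monotone_model.predict(X_test)))
```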

Default model mse 37.61501106522855
Monotone model mse 32.283051723268265

Other methods for enforcing monotonicity

Tree-based methods are not the only option for enforcing monotonicity constraints. One recent development in the field is TensorFlow Lattice, which implements lattice-based models: essentially interpolated look-up tables that can approximate arbitrary input-output relationships in the data and that can optionally be made monotonic. There is a thorough tutorial on it in the TensorFlow GitHub repository.
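As a rough sketch only (assuming the Keras layer API of the tensorflow_lattice package; see the tutorial for full, authoritative usage), a single-feature model constrained to be increasing could look like this:

```python
import numpy as np
import tensorflow as tf
import tensorflow_lattice as tfl

# A piecewise-linear calibration layer whose output is constrained
# to be monotonically increasing in the input feature.
model = tf.keras.Sequential([
    tfl.layers.PWLCalibration(
        input_keypoints=np.linspace(0.0, 10.0, num=20),
        monotonicity="increasing",
    ),
])
model.compile(loss="mse", optimizer="adam")
# model.fit(X_train, y_train, epochs=100)  # using the toy data from above
```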

If the curve is already given, a monotonic spline can be fit to the data, for example using R’s splinefun function.
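As a Python alternative to R's splinefun (a different tool than the one mentioned above), scipy's PchipInterpolator fits a monotone piecewise-cubic interpolant, provided the given curve points are themselves monotone:

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# PCHIP interpolation preserves the monotonicity of the given points
# between the data points.
x = np.array([0.0, 1.0, 2.0, 4.0, 8.0])
y = np.array([0.0, 0.5, 0.9, 1.2, 1.3])   # monotonically increasing
monotone_curve = PchipInterpolator(x, y)

print(monotone_curve(3.0))  # interpolated value between the given points
```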