What is this?
This is an interactive demonstration and explanation of the gradient boosting algorithm applied to a classification problem. Boosting makes its decision (‘blue’ or ‘orange’) by iteratively building many simpler classifiers (decision trees in our case) and summing their predictions.
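For readers who want to see the mechanics behind the demo, here is a minimal sketch of gradient boosting for binary classification with log-loss. The function names and parameters are illustrative, not the demo's actual code, and scikit-learn's DecisionTreeRegressor stands in for the demo's trees.

```python
# Minimal sketch of gradient boosting for binary classification (log-loss),
# without the Newton-Raphson leaf update. Illustrative, not the demo's code.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbm(X, y, n_trees=100, learning_rate=0.1, max_depth=2):
    """y must contain 0/1 labels ('blue'/'orange')."""
    F = np.zeros(len(y))                         # current ensemble score (log-odds)
    trees = []
    for _ in range(n_trees):
        p = 1.0 / (1.0 + np.exp(-F))             # predicted probabilities
        residuals = y - p                         # negative gradient of log-loss
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        F += learning_rate * tree.predict(X)      # add the new tree's contribution
        trees.append(tree)
    return trees

def predict_proba(trees, X, learning_rate=0.1):
    F = sum(learning_rate * t.predict(X) for t in trees)
    return 1.0 / (1.0 + np.exp(-F))
```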
Challenges
Can you ...
- overfit the model? (from some point the test loss should increase; one way to track the test loss per iteration is sketched after this list)
- achieve test loss < 0.02 on the ‘xor’ dataset (the second one) using depth = 2? Same with depth = 1?
- achieve test loss < 0.1 on the spiral dataset using the minimal depth of trees?
- fit the ‘stripes’ dataset with trees of depth 1? Can you explain why this is possible?
- get the minimal possible loss on the dataset with several embedded circles?
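For the overfitting challenge, this is one way to watch the test loss per boosting iteration outside the demo. The dataset is an assumed stand-in (make_moons), since the demo's datasets are not available here.

```python
# Sketch (assumed setup, not the demo's code): tracking test loss after each
# added tree to find the point where the model starts to overfit.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=500, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gbm = GradientBoostingClassifier(n_estimators=300, learning_rate=0.5, max_depth=3)
gbm.fit(X_train, y_train)

# staged_predict_proba yields the ensemble's predictions after each added tree
test_losses = [log_loss(y_test, proba)
               for proba in gbm.staged_predict_proba(X_test)]
best_iteration = int(np.argmin(test_losses))
print(f"test loss is minimal after {best_iteration + 1} trees, then it grows")
```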
Comments
- each time you change some parameter, gradient boosting is recomputed from scratch (yes, the gradient boosting algorithm is quite fast)
- if the learning rate is small, the target doesn’t change much between iterations, so consecutive trees end up with a similar structure. To fight this, different randomizations are introduced (sketched below):
  - subsampling (each tree is trained on a random part of the data)
  - random subspaces (each tree is built on a random subset of features); there are only 2 features in the demo, so random rotations are used instead of random subspaces
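A sketch of how such per-tree randomization could look for a 2D problem: subsample the data and rotate the feature plane by a random angle before fitting each tree. Names and defaults are illustrative, not the demo's implementation.

```python
# Sketch: per-tree randomization via subsampling and a random 2D rotation.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def random_rotation_2d(rng):
    """A random 2x2 rotation matrix."""
    angle = rng.uniform(0, 2 * np.pi)
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

def fit_randomized_tree(X, residuals, subsample=0.5, max_depth=2, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    # subsampling: each tree sees only a random fraction of the training points
    idx = rng.choice(len(X), size=int(subsample * len(X)), replace=False)
    # random rotation: each tree sees the 2D features in a rotated coordinate system
    R = random_rotation_2d(rng)
    tree = DecisionTreeRegressor(max_depth=max_depth).fit(X[idx] @ R, residuals[idx])
    return tree, R   # keep R to rotate points the same way at prediction time
```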
- did you notice? If you set the learning rate high (without using the Newton-Raphson update), only the first several trees make a serious contribution, and the remaining trees are almost unused; one way to measure this is sketched below
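A self-contained sketch of this effect, under the same assumptions as the earlier loop (plain gradient step, no Newton-Raphson leaf update, stand-in dataset): with a deliberately large learning rate, the first trees absorb almost all of the signal and later trees barely move the score.

```python
# Sketch (assumed setup): measuring how much each tree changes the ensemble score
# when the learning rate is large and no Newton-Raphson update is used.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeRegressor

X, y = make_moons(n_samples=500, noise=0.3, random_state=0)
learning_rate, F = 5.0, np.zeros(len(y))   # deliberately large learning rate

contributions = []
for _ in range(20):
    p = 1.0 / (1.0 + np.exp(-F))                              # current probabilities
    tree = DecisionTreeRegressor(max_depth=2).fit(X, y - p)   # fit the negative gradient
    step = learning_rate * tree.predict(X)
    contributions.append(np.abs(step).mean())                 # how much this tree moves the score
    F += step

print(np.round(contributions, 3))   # first values are large, later ones shrink quickly
```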
There are many other things about GB you can find out from this demo. Enjoy!