This year my team at Yandex organized the MLHEP (Machine Learning in High Energy Physics) summer school in Lund, Sweden.
There were two tracks, basic and advanced, each lasting three days, followed by two days on neural networks for both tracks together.
The school was accompanied by two Kaggle challenges: one for both tracks and one for the advanced track. This is the most productive way to try out techniques and learn them in practice.
Just as a year ago, I gave the lectures for the basic track. The previous materials were enriched with new topics and more explanations.
I’ve also added many visualizations and animations compared to the previous year.
This 3-day course is the shortest course on machine learning, and it still gives a nice introduction to some advanced topics!
Day 1
- Introduction to machine learning terminology. Applications within High Energy Physics and outside HEP.
- Basic problems: classification and regression
- Nearest neighbours approach and spatial indices
- Overfitting (intro)
- Curse of dimensionality
- ROC curve, ROC AUC
- Bayes optimal classifier
- Density estimation: KDE and histograms
- Parametric density estimation
- Mixtures for density estimation and EM algorithm
- Linear decision rule, intro to logistic regression
- Linear regression
Day 2
- Linear models: logistic regression
- Polynomial decision rule and polynomial regression
- SVM (Support Vector Machine) and kernel trick
- Overfitting: two definitions
- Model selection
- Regularizations: L1, L2, elastic net
- Decision trees
- Splitting criteria for classification and regression
- Overfitting in trees: pre-stopping and post-pruning
- Instability of trees
- Feature importance
- RSM, subsampling, bagging
- Random Forest
Day 3
- Ensembles
- AdaBoost
- Gradient Boosting for regression
- Gradient Boosting for classification
- Second-order information
- Losses: regression, classification, ranking
- Ensembling
- Softmax modifications
- PCA
- LDA, CSP
- LLE
- Isomap
- ML-based approach
- Gaussian processes
Day 4, part 1
Slides by Tatiana Likhomanenko on non-trivial applications of boosting in High Energy Physics.
- All materials from the school are available in the MLHEP 2016 repository
- Official page on Indico
- Kaggle competitions for the school: exotic Higgs and triggers