MLHEP 2016 lecture slides

This year my team at Yandex organized the MLHEP (Machine Learning in High Energy Physics) summer school in Lund, Sweden.

There were two tracks, basic and advanced, each lasting three days, plus two days on neural networks for both tracks together.

The school was accompanied by two Kaggle challenges: one for both tracks and one for the advanced track. This is the most productive way to try out and learn techniques in practice.

Just as a year ago, I gave the lectures for the basic track. The previous materials were enriched with new topics and more explanations.

Also, I’ve added many visualizations and animations compared to the previous year.

This 3-day course is the shortest course on machine learning, and it still gives a nice introduction to some advanced topics!

Day 1

Introduction to machine learning terminology. Applications within High Energy Physics and outside HEP.

  • Basic problems: classification and regression.

  • Nearest neighbours approach and spatial indices

  • Overfitting (intro)

  • Curse of dimensionality

  • ROC curve, ROC AUC

  • Bayes optimal classifier

  • Density estimation: KDE and histograms

Parametric density estimation

  • Mixtures for density estimation and EM algorithm

  • Linear decision rule, intro to logistic regression

  • Linear regression

Day 2

  • Linear models: logistic regression

  • Polynomial decision rule and polynomial regression

  • SVM (Support Vector Machine) and kernel trick

  • Overfitting: two definitions

  • Model selection

  • Regularizations: L1, L2, elastic net.

Decision trees

  • Splitting criteria for classification and regression

  • Overfitting in trees: pre-stopping and post-pruning

  • Instability of trees

  • Feature importance

  • RSM, subsampling, bagging.

  • Random Forest

Day 3

Ensembles

  • AdaBoost

  • Gradient Boosting for regression

  • Gradient Boosting for classification

  • Second-order information

  • Losses: regression, classification, ranking

  • Ensembling

  • Softmax modifications

  • PCA

  • LDA, CSP

  • LLE

  • Isomap

  • ML-based approach

  • Gaussian processes

Day 4, part 1

Slides of Tatiana Likhomanenko on non-trivial applications of boosting in High Energy Physics.

  1. All materials from the school are available in the MLHEP 2016 repository

  2. Official page on Indico

  3. Kaggle competitions for the school: exotic Higgs and triggers