Recently we (Alex & Tatiana) were invited to give lectures about machine learning at GradDays, an event organized twice a year at Heidelberg University (Germany’s oldest university).
GradDays offers courses that broaden students’ physics knowledge and teach specialized, useful techniques.
The GradDays program included:
- Black Holes and Quantum Gravity Constraints on Field Theories
- Correlated Quantum Dynamics of Ultracold Few- to Many-Body Systems
- Quantum Simulation with Quantum Optical Systems
- The formation of structure in cosmology
- Solar System Exploration Missions and their Scientific Outcome
- Quantum Field Theory in Extreme Environments
as well as other sweet things.
Even so, our course “Machine learning and applications in Science and Industry” was the most popular. The focus of the course (heavily shaped by the time constraints: only 4 days) was to give a broad overview of useful machine learning models and their applications in very different areas, and it even included optional practice!
That’s why we included many interactive demonstrations of machine learning techniques!
We also tried to build a nice bridge between models and their real-life applications. Many of the examples came from particle physics, the area we work in: tracking, tagging, reweighting, uniform boosting, particle identification, simulation refinement, tuning of simulation parameters, etc. However, we also included some notable examples from other data-intensive areas: astronomy, neuroscience, medicine, climatology and biology.
Finally, many other interesting applications of machine learning were discussed: spam detection, search engines, visual recognition, Kinect and AlphaGo, recommender systems and news clustering.
The lecture of the first day gave an introduction to the problems, applications and basic notions of machine learning. Several simple models were discussed to get a first impression (a minimal kNN sketch follows the list):
- kNN and the search for neighbours
- density estimation techniques
- mixtures of distributions
- clustering methods
- linear models with regularization
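To make the first day a bit more concrete, here is a minimal k-nearest-neighbours sketch. It assumes scikit-learn is available; the blob dataset and the choice of k are illustrative, not taken from the course materials.

```python
# A minimal kNN sketch, assuming scikit-learn; data and k=5 are illustrative.
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic three-class problem: three gaussian blobs in the plane.
X, y = make_blobs(n_samples=300, centers=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Classify each point by a majority vote among its k closest training points.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print("test accuracy:", knn.score(X_test, y_test))
```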
On the second day we focused on tree-based techniques, especially boosting, which aren’t popular in research right now, but work very well in practice and are top performers in many examples with tabular data (see the short sketch after this list):
- decision trees for classification and regression
- Random Forest
- AdaBoost and Reweighter
- Gradient Boosting for classification, regression and ranking (ordering of items)
- uniform boosting
- applications: particle identification, triggers and search engines
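For flavour, a hedged sketch of gradient boosting on tabular data, again assuming scikit-learn; the synthetic dataset and hyperparameters are placeholders rather than the course’s actual examples.

```python
# A sketch of gradient boosting with scikit-learn; parameters are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each new shallow tree is fit to the gradient of the loss with respect to
# the current ensemble's predictions, then added with a small learning rate.
clf = GradientBoostingClassifier(n_estimators=200, max_depth=3,
                                 learning_rate=0.1, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```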
On the third day we got back to continuous optimization models, starting by revisiting linear and generalized linear models and then introducing more involved ones (a combined PCA and SVM sketch follows the list):
- linear models and their generalizations
- regularizations (again)
- SVM and the kernel trick
- spam detection and elements of visual recognition
- factorization models and recommender systems
- factorization machines
- unsupervised dimensionality reduction techniques (PCA, LLE and Isomap)
- supervised dimensionality reduction techniques: CSP and LDA
- artificial neural networks
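As a rough illustration of two items from this list, the sketch below chains unsupervised PCA with an RBF-kernel SVM in a scikit-learn pipeline; the digits dataset and the number of components are assumptions made for the example only.

```python
# PCA for dimensionality reduction, then an SVM using the RBF kernel trick.
# Assumes scikit-learn; the dataset and parameters are illustrative.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The RBF kernel implicitly maps inputs into a very high-dimensional space,
# where the SVM looks for a maximum-margin separating hyperplane.
model = make_pipeline(StandardScaler(), PCA(n_components=30),
                      SVC(kernel="rbf", gamma="scale"))
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```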
Finally, the last day was devoted mostly to deep learning: convolutional and recurrent neural networks, autoencoders, embeddings, GANs and others.
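As one hedged illustration of that day, here is a toy autoencoder in PyTorch (the framework is our assumption for this sketch, not necessarily what the lectures used), compressing 64-dimensional vectors down to 2 and reconstructing them:

```python
# A toy autoencoder sketch in PyTorch; the architecture and the random
# stand-in data are illustrative assumptions.
import torch
from torch import nn

encoder = nn.Sequential(nn.Linear(64, 16), nn.ReLU(), nn.Linear(16, 2))
decoder = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 64))
autoencoder = nn.Sequential(encoder, decoder)

optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

X = torch.randn(512, 64)  # stand-in data; a real dataset works the same way
for epoch in range(100):
    optimizer.zero_grad()
    loss = loss_fn(autoencoder(X), X)  # reconstruction error
    loss.backward()
    optimizer.step()

# After training, encoder(X) yields a 2-dimensional embedding of each sample.
```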
Also, active learning was demonstrated in combination with Gaussian processes (a minimal loop is sketched below).
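A minimal sketch of that combination, assuming scikit-learn’s Gaussian process regressor; the target function, kernel and query budget are invented for illustration.

```python
# Uncertainty-based active learning with a Gaussian process (scikit-learn).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def f(x):
    return np.sin(3 * x).ravel()  # the unknown function we want to learn

X_pool = np.linspace(0, 3, 200).reshape(-1, 1)  # candidate query points
X_train = X_pool[[0, -1]]                       # start with the two endpoints
y_train = f(X_train)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5))
for step in range(10):
    gp.fit(X_train, y_train)
    # Query the point where the GP posterior is most uncertain.
    _, std = gp.predict(X_pool, return_std=True)
    x_next = X_pool[[np.argmax(std)]]
    X_train = np.vstack([X_train, x_next])
    y_train = np.concatenate([y_train, f(x_next)])

print("labelled points used:", len(X_train))
```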