My biggest eRum 2018 highlights

From 14 to 16 May 2018, the European R Users Meeting (eRum) was held in Budapest. I took part as a speaker, giving a presentation on time series data mining. eRum 2018 was a very successful event, and I want to thank the organizers for doing a great job.

This blog post covers my biggest highlights of the eRum conference and also serves as a list of useful resources.

Workshops

eRum started with workshops split into 2 blocks of 7 parallel sessions (14 workshops in total). It was difficult to choose which 2 of the 14 to attend because there were so many interesting topics. I finally chose the DALEX and Keras workshops.

DALEX - Descriptive mAchine Learning EXplanations

A great workshop by Przemyslaw Biecek and Mateusz Staniak about tools for exploring, validating, and explaining complex machine learning models.

I learned many techniques for diagnosing machine learning models. The workshop presented techniques for explaining a trained model as a whole as well as its individual predictions. Workshop resources can be downloaded here:

Various packages were used for these purposes; the list follows:

Deep learning with Keras

The second workshop I attended, by Aimee Gott and Douglas Ashton, was about using Keras for deep learning. It was a nice workshop on the basic usage of the Keras library in R. We went through use cases with the Iris dataset and a time series dataset from an accelerometer (trained with a CNN). The materials can be downloaded from here:

Conference talks

The second and third days of the conference continued with keynote and invited talks, contributed talks, and lightning talks. It was really motivating and inspiring to see all the R enthusiasts speak about their projects. It gave me more confidence to contribute to the R ecosystem, and to the data science ecosystem in general. I will briefly mention the 6 talks that fascinated me most.

Edwin Thoen's talk covered the recipes package, which helps with preprocessing data and creating design (model) matrices. With recipes, you can build an effective preprocessing "pipeline" for your data.
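To make the "pipeline" idea concrete, here is a minimal sketch of a recipe. It is not taken from the talk: the iris dataset and the two steps (centering and scaling) are my own choices for illustration, and it uses the current recipes argument names.

```r
library(recipes)

# Declare the outcome and the predictors; nothing is computed yet.
rec <- recipe(Species ~ ., data = iris) %>%
  step_center(all_predictors()) %>%
  step_scale(all_predictors())

# Estimate the means and standard deviations from the training data.
prepped <- prep(rec, training = iris)

# Apply the estimated steps to obtain the processed data.
baked <- bake(prepped, new_data = iris)
head(baked)
```

The same prepared recipe can then be applied to new data with bake, so the test set is transformed with the parameters estimated on the training set.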

The bombshell by Florian Privé was his talk on working with large matrices in R. He created the bigstatsr package for fast, parallel manipulation of matrices that are larger than the available RAM.

The great keynote by Nathalie Vialaneix was about unsupervised learning for relational (dissimilarity) data. She presented various interesting use cases for her R packages adjclust and SOMbrero, which cluster relational data. The slides can be found here: slides_villavialaneix_ERUM2018.

Afterward, Erin LeDell from H2O talked about automated ensemble learning using the h2o package. The h2o.automl function offers various interesting options, for example limiting the time spent on training the ensemble.

The great machine learning session continued with a talk by Szilard Pafka. His benchmark repositories are well known in the ML community. He talked about gradient boosting frameworks (h2o.gbm, xgboost, LightGBM) and their pros and cons (see the GBM-perf repo).

The next day, the most interesting talk for me was the one by Henrik Bengtsson about parallel computing in R. His future package enables asynchronous, parallel (multiprocess) computing. It has many useful applications, for example in Shiny apps.
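A minimal sketch of what asynchronous evaluation with future looks like follows. It is my own illustration and not code from the talk; the number of workers and the toy computation are arbitrary assumptions.

```r
library(future)

# Evaluate futures in background R sessions (two workers is an arbitrary choice).
plan(multisession, workers = 2)

# future() returns immediately; the expression is evaluated in a worker.
f <- future({
  Sys.sleep(1)  # stand-in for a slow computation
  sum(1:10)
})

# The main session stays free for other work in the meantime.
message("Main session is not blocked")

# value() waits for the result if it is not ready yet.
result <- value(f)
print(result)

# Switch back to sequential processing and shut the workers down.
plan(sequential)
```

Because the code that creates the future does not depend on the chosen plan, the same script can run sequentially or in parallel by changing only the plan() call.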

TSrepr talk

As I mentioned at the beginning, I also gave a talk about my TSrepr package. I talked about how to use time series representations for better data mining in R. Slides are here:

The video of the talk:

You can read more about how to use time series representation methods in my previous blog posts:

All other talks can be seen on the Budapest Users of R Network channel!