From 14 to 16 May 2018, the European R Users Meeting (eRum) was held in Budapest. I took part as a speaker, giving a presentation about time series data mining. eRum 2018 was a very successful event, and I want to thank the organizers for running it so well.
This blog post covers my biggest highlights of the eRum conference and also serves as a list of useful resources.
Workshops
eRum started with workshops split into 2 blocks of 7 parallel sessions (14 workshops in total). It was difficult to choose only 2 of the 14 to attend because there were many interesting topics. I finally chose the DALEX and Keras workshops.
DALEX - Descriptive mAchine Learning EXplanations
A great workshop by Przemyslaw Biecek and Mateusz Staniak about tools for exploration, validation, and explanation of complex machine learning models.
I learned many techniques for diagnosing machine learning models. Techniques for explaining a trained model as a whole, its predictions, and single predictions were all presented here. Workshop resources can be downloaded here:
Various packages were used for these purposes; the list follows:
Deep learning with Keras
The second workshop I attended, by Aimee Gott and Douglas Ashton, was about using Keras for deep learning. It was a nice workshop about the basic usage of the Keras library in R. We went through use cases with the Iris dataset and a time series dataset from an accelerometer (a CNN was used for training). The materials can be downloaded from here:
Conference talks
The second and third days of the conference continued with keynote and invited talks, contributed talks, and lightning talks. It was really motivating and inspirational to see all the R enthusiasts speak about their projects. It gives me more confidence to contribute to the R ecosystem, or to the data science ecosystem in general. I will briefly mention the 6 talks that fascinated me most.
Edwin Thoen talked about the recipes package, which helps with preprocessing data and creating design (model) matrices. With recipes, you can create an effective preprocessing “pipeline” for your data.
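To make the idea of a preprocessing pipeline more concrete, here is a minimal sketch of my own (not taken from the talk). It assumes the recipes package is installed and R is version 4.1 or newer (for the native pipe), and it uses the built-in iris data with centering and scaling steps chosen purely for illustration.

```r
library(recipes)

# Describe the preprocessing: Species is the outcome, the rest are predictors
rec <- recipe(Species ~ ., data = iris) |>
  step_center(all_numeric()) |>  # subtract the mean of each numeric column
  step_scale(all_numeric())      # divide each numeric column by its standard deviation

# Estimate the means and standard deviations from the training data
trained <- prep(rec, training = iris)

# Apply the trained recipe to data (here the same data set, for brevity)
baked <- bake(trained, new_data = iris)
head(baked)
```

The useful part is that the trained recipe can be applied to new data later, so the test set is transformed with the statistics estimated on the training set.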
The bombshell by Florian Privé was about working with large matrices in R. He created the bigstatsr package for fast, parallel manipulation of matrices that are larger than the available RAM.
The great keynote speech by Nathalie Vialaneix was about using unsupervised learning for relational data (or dissimilarity data). She presented various interesting use cases for her R packages adjclust and SOMbrero for clustering relational data. The slides can be found here: slides_villavialaneix_ERUM2018.
Afterward, Erin LeDell from H2O talked about automated ensemble learning using the h2o package. The h2o.automl function allows various interesting things, for example limiting the training time allowed for building an ensemble.
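As a rough illustration of the time limit, here is a minimal sketch of my own (not code from the talk). It assumes the h2o package and a Java runtime are installed, and the iris data and the 30-second budget are arbitrary choices for the example.

```r
library(h2o)

# Start (or connect to) a local H2O cluster
h2o.init()

# Copy the built-in iris data into the H2O cluster
train <- as.h2o(iris)

# Run AutoML with a time budget: training stops after roughly 30 seconds
aml <- h2o.automl(
  y = "Species",
  training_frame = train,
  max_runtime_secs = 30
)

# Models trained within the budget, ranked by the default metric
print(aml@leaderboard)
```

A longer budget generally lets AutoML try more models before building the ensembles, so the value is a trade-off between waiting time and model quality.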
The great machine learning session continued with a talk by Szilard Pafka, whose benchmark repositories are well known in the ML community. He talked about gradient boosting frameworks (h2o.gbm, xgboost, lightGBM) and their pros and cons (see the GBM-perf repo).
The next day, the most interesting talk for me was by Henrik Bengtsson about parallel computing in R. His future package allows asynchronous, parallel computing. It has many useful applications, for example in shiny apps.
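To show what asynchronous evaluation looks like in practice, here is a minimal sketch of my own (not from the talk), assuming the future package is installed. The one-second sleep just stands in for a slow computation.

```r
library(future)

# Evaluate futures in separate background R sessions
plan(multisession)

# Start a slow computation without blocking the main session
f <- future({
  Sys.sleep(1)  # stand-in for expensive work
  sum(1:10)
})

# The main session is free to do other work here

# Collect the result; this blocks only if the future is not finished yet
result <- value(f)
print(result)

# Switch back to ordinary sequential evaluation
plan(sequential)
```

This pattern is what makes the package handy in shiny apps: a long-running task can be started as a future so that it does not block the R session serving the app.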
TSrepr talk
As I mentioned at the beginning, I also gave a talk about my TSrepr package. I talked about how to use time series representations to do better data mining in R. Slides are here:
The video of the talk:
You can read more about how to use time series representation methods in my previous blog posts:
All other talks can be seen on the Budapest Users of R Network channel!