PyDataBudapest and “Machine Learning Libraries You’d Wish You’d Known About”

I’m back at BudapestBI and this year it has its first PyDataBudapest track. Budapest is fun! I’ve had a second iteration talking on a slightly updated “Machine Learning Libraries You’d Wish You’d Known About” (updated from PyDataCardiff two weeks back). When I was here to give an opening keynote talk two years back the conference was a bit smaller, it has grown by +100 folk since then. There’s also a stronger emphasis on open source R and Python tools. As before, the quality of the members here is high – the conversations are great!

During my talk I used my Explaining Regression Predictions Notebook to cover:

  • Dask to speed up Pandas

  • TPOT to automate sklearn model building

  • Yellowbrick for sklearn model visualisation

  • ELI5 with Permutation Importance and model explanations

  • LIME for model explanations

Some audience members asked about co-linearity detection and explanation. Whilst I don’t have a good answer for identifying these relationships, I’ve added a seaborn pairplot, a correlation plot and the Pandas Profiling tool to the Notebook which help to show these effects.

Although it is complicated, I’m still pretty happy with this ELI5 plot that’s explaining feature contributions to a set of cheap-to-expensive houses from the Boston dataset:

I’m planning to do some training on these sort of topics next year, join my training list if that might be of use.