By George Seif, AI / Machine Learning Engineer
Introducing gratia
I use generalized additive models (GAMs) in my research work. I use them a lot! Simon Wood’s mgcv package is an excellent set of software for specifying, fitting, and visualizing GAMs for very large data sets. Although I have recently dabbled with brms, mgcv is still my go-to GAM package. The only downside to mgcv is that it is not very tidy-aware, and the ggplot-verse may as well not exist as far as it is concerned. This in itself is no bad thing, but as someone who uses mgcv a lot and also prefers to plot with ggplot2, I was starting to feel this lack of awareness. So, I started working on something to help bridge the gap between these two separate worlds that I inhabit. The fruit of that labour is gratia, and development has progressed to the stage where I am ready to talk a bit more about it.
How Can Autonomous Drones Help the Energy and Utilities Industry?
5 Steps to Prepare for a Data Science Job
A career in data science is hyped as the hottest job of the 21st century, but how do you become a data scientist? How should you, as an aspiring data scientist, or a student who aims at a data science job, prepare? What are the skills you need? What must you do? Fret not – this article will answer all your questions and give you links with which you can jump-start a new career in data science!
What's new on arXiv
Removing the influence of a group variable in high-dimensional predictive modelling
Document worth reading: “Attribute-aware Collaborative Filtering: Survey and Classification”
Attribute-aware CF models aim at rating prediction given not only the historical ratings from users to items, but also the information associated with users (e.g. age), items (e.g. price), or even ratings (e.g. rating time). This paper surveys work from the past decade on attribute-aware CF systems and finds that, mathematically, these systems can be classified into four different categories. We provide readers not only with a high-level mathematical interpretation of the existing work in this area but also with the mathematical insight behind each category of models. Finally, we provide in-depth experimental results comparing the effectiveness of the major works in each category. Attribute-aware Collaborative Filtering: Survey and Classification
Computer Vision for Model Assessment
One of the differences between statistical data scientists and machine learning engineers is that while the latter group is concerned primarily with the predictive performance of a model, the former group is also concerned with the fit of the model. A model that misses important structure in the data (for example, seasonal trends) or fits specific subgroups poorly is likely to be lacking important variables or features in the source data. You can try different machine learning techniques or adjust hyperparameters to your heart’s content, but you’re unlikely to discover problems like this without evaluating the model fit.
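As a rough illustration of the kind of fit check described above, here is a minimal sketch on made-up data. It is not taken from the linked post: the synthetic series, the straight-line model, and the helper names `fit_line` and `mean_residual_by_group` are all assumptions chosen for the example. The idea is that a model can look unbiased overall while its residuals, grouped by calendar month, reveal a seasonal pattern it has missed.

```python
import math
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def mean_residual_by_group(residuals, groups):
    """Average the residuals within each group label."""
    buckets = {}
    for r, g in zip(residuals, groups):
        buckets.setdefault(g, []).append(r)
    return {g: mean(rs) for g, rs in sorted(buckets.items())}


# Hypothetical data: four years of monthly values with a linear trend
# plus a 12-month seasonal cycle that the model below does not know about.
months = list(range(48))
y = [10 + 0.5 * t + 3 * math.sin(2 * math.pi * t / 12) for t in months]

# Fit a trend-only model and compute its residuals.
a, b = fit_line(months, y)
residuals = [yi - (a + b * t) for t, yi in zip(months, y)]

# Overall, the residuals average out to roughly zero ...
print(f"overall mean residual: {mean(residuals):.3f}")

# ... but grouped by calendar month they show systematic over- and
# under-prediction, which points to a missing seasonal feature.
by_month = mean_residual_by_group(residuals, [t % 12 for t in months])
for m, r in by_month.items():
    print(f"month {m:2d}: mean residual {r:+.2f}")
```

The same grouping can be applied to any categorical variable, such as a customer segment or region, to look for subgroups that the model fits poorly.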
What to think about this new study which says that you should limit your alcohol to 5 drinks a week?
Someone who wishes to remain anonymous points us to a recent article in the Lancet, “Risk thresholds for alcohol consumption: combined analysis of individual-participant data for 599 912 current drinkers in 83 prospective studies,” by Angela Wood et al., that’s received a lot of press coverage; for example:
Introduction to Active Learning
By Jennifer Prendki, VP of Machine Learning, Figure Eight
High school statistics class builds election prediction model
High school seniors, in the Political Statistics class at Montgomery Blair High School in Silver Spring, Maryland, built a prediction model for the upcoming elections: