Thomas Perneger points us to an amusing quiz on statistics terminology.
7 Best Practices for Machine Learning on a Data Lake
Why a data lake? Machine learning typically involves an iterative process that can drag down the performance of a traditional data warehouse. Data lakes are built for scale and experimentation, and they supply the ample, diverse training data that makes models more accurate and more dependable once they reach production.
Direct access to Amazon SageMaker notebooks from Amazon VPC by using an AWS PrivateLink endpoint
Amazon SageMaker now supports AWS PrivateLink for notebook instances. In this post, I will show you how to set up AWS PrivateLink to secure your connection to Amazon SageMaker notebooks.
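As a rough sketch, assuming placeholder VPC, subnet, and security group IDs, the core of the setup is creating an interface VPC endpoint for the SageMaker notebook service; in boto3 that looks roughly like this:

```python
# Sketch only: create an interface VPC endpoint for SageMaker notebooks.
# All resource IDs below are placeholders; the service name follows
# AWS's published pattern for SageMaker notebook PrivateLink endpoints.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",               # placeholder VPC ID
    ServiceName="aws.sagemaker.us-east-1.notebook",
    SubnetIds=["subnet-0123456789abcdef0"],      # placeholder subnet
    SecurityGroupIds=["sg-0123456789abcdef0"],   # placeholder security group
    PrivateDnsEnabled=True,  # resolve the notebook URL to private IPs
)
print(response["VpcEndpoint"]["VpcEndpointId"])
```

With private DNS enabled, the notebook URL resolves to the endpoint's private addresses, so traffic to the notebook stays inside the VPC.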
Causal mediation estimation measures the unobservable
I put together a series of demos for a group of epidemiology students who are studying causal mediation analysis. Since mediation analysis is not always clear or intuitive, I thought that walking through some examples of simulating data for this process could clarify things a bit.
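To make the "unobservable" part concrete, here is a minimal numpy sketch (the post's demos are in R, and the coefficients below are invented for illustration): simulation lets us generate both potential mediators for every subject, something no real dataset provides, and read off the natural direct and indirect effects by averaging.

```python
# Illustrative simulation for mediation analysis: exposure A raises the
# mediator M, and both A and M affect the outcome Y. All coefficients
# are made up for this sketch.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Potential mediators under exposure a = 0 and a = 1
M0 = 2.0 + rng.normal(0, 1, n)
M1 = 2.0 + 1.5 + rng.normal(0, 1, n)   # exposure shifts the mediator by 1.5

def outcome(a, m):
    # Outcome has a direct exposure effect (0.8*a) and a mediator effect (0.5*m)
    return 1.0 + 0.8 * a + 0.5 * m + rng.normal(0, 1, n)

Y0_M0 = outcome(0, M0)   # Y(0, M(0))
Y1_M0 = outcome(1, M0)   # Y(1, M(0)): flip exposure, hold mediator at its a=0 value
Y1_M1 = outcome(1, M1)   # Y(1, M(1))

print("Natural direct effect:  ", (Y1_M0 - Y0_M0).mean())  # ~0.8
print("Natural indirect effect:", (Y1_M1 - Y1_M0).mean())  # ~0.5 * 1.5 = 0.75
print("Total effect:           ", (Y1_M1 - Y0_M0).mean())  # ~1.55
```

The quantity Y(1, M(0)) can never be observed for a real subject, which is exactly why simulation is a useful teaching device here.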
New: Maintained Datasets
Can you trust the data you use on Kaggle? Is it licensed? Has it been updated recently?
Turbocharge Tech Transformation: Integrate AI Across Insurance
Sponsored post by Insurance Nexus.
Customize your notebook volume size, up to 16 TB, with Amazon SageMaker
Amazon SageMaker now allows you to customize the notebook storage volume when you need to store larger amounts of data.
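A minimal boto3 sketch, with a placeholder instance name and role ARN: the storage size is just the VolumeSizeInGB parameter at creation time, which now accepts values up to 16,384 GB.

```python
# Sketch: request a larger EBS volume when creating a notebook instance.
# The instance name and role ARN are placeholders.
import boto3

sm = boto3.client("sagemaker", region_name="us-east-1")

sm.create_notebook_instance(
    NotebookInstanceName="my-big-notebook",                     # placeholder
    InstanceType="ml.t3.medium",
    RoleArn="arn:aws:iam::123456789012:role/MySageMakerRole",   # placeholder
    VolumeSizeInGB=1024,  # 1 TB instead of the 5 GB default; max is 16384
)
```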
R plus Magento 2 REST API revisited: part 1 - authentication and universal search
Last year I wrote a post about getting Magento 2 data into R using the REST API. Now I provide more usage examples and a reusable wrapper over the API that makes pulling data from Magento 2 into R a bit more convenient.
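The post itself works in R; purely to illustrate the authentication flow against Magento 2's standard token endpoint, here is a hedged Python sketch (the store URL and credentials are placeholders):

```python
# Sketch of Magento 2 token authentication, not the post's R wrapper.
# BASE and the credentials are placeholders.
import requests

BASE = "https://example-store.com"  # placeholder store URL

# 1. Exchange admin credentials for a bearer token
token = requests.post(
    f"{BASE}/rest/V1/integration/admin/token",
    json={"username": "api_user", "password": "api_password"},
).json()

# 2. Use the token on subsequent REST calls, e.g. an order search
orders = requests.get(
    f"{BASE}/rest/V1/orders",
    params={"searchCriteria[pageSize]": 10},
    headers={"Authorization": f"Bearer {token}"},
).json()
print(orders["total_count"])
```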
What's new on arXiv
ADEPOS: Anomaly Detection based Power Saving for Predictive Maintenance using Edge Computing
Postdocs and Research fellows for combining probabilistic programming, simulators and interactive AI
Here’s a great opportunity for those interested in probabilistic programming and workflows for Bayesian data analysis: