I’m pleased to share Part I of my new book “Introduction to Reproducible Science in R”. The purpose of this book is to approach model development and software development holistically to help make science and research more reproducible. The need for such a book arose from the challenges I’ve seen while teaching graduate courses in natural language processing and machine learning, as well as training my own staff to become effective data scientists. While quantitative reasoning and mathematics are important, I often found that the primary obstacle to good data science was reproducibility and repeatability: it’s difficult to quickly reproduce someone else’s results. And this causes myriad headaches:
To get hired as a data scientist, don’t follow the herd
By Jeremie Harris, Co-founder @SharpestMindsAI (Y Combinator W18)
The Long Tail of Medical Data
By Thijs Kooi, Merantix
“Law professor Alan Dershowitz’s new book claims that political differences have lately been criminalized in the United States. He has it wrong. Instead, the orderly enforcement of the law has, ludicrously, been framed as political.”
This op-ed by Virginia Heffernan is about politics, but it reminded me of the politics of science.
Machine Learning Toronto Summit, Nov 20-21 – Special KDnuggets discount
KDnuggets is inviting you to celebrate Canada’s top AI research at the Toronto Machine Learning Summit (TMLS). Join us for food and drinks on both days, academic poster sessions and the Vectors Best Poster Award, the AI Career Fair and Expo (60+ AI start-ups), and the Women in AI evening. Not to mention talks across 3 streams and workshops. You can see some additional links below.
Amazon Polly adds Italian and Castilian Spanish voices, and Mexican Spanish language support
Amazon Polly is an AWS service that turns text into lifelike speech. This pre-trained service requires no machine learning skills to easily integrate AI into your applications.
Time Series and MCHT
Over the past few weeks I’ve published articles about my new package, MCHT, starting with an introduction, a further technical discussion, demonstrating maximized Monte Carlo (MMC) hypothesis testing, bootstrap hypothesis testing, and last week I showed how to handle multi-sample and multivariate data. This is the final article where I explain the capabilities of the package. I show how MCHT can handle time series data.
Healthcare Analytics Made Simple
Sponsored Post. By Vikas (Vik) Kumar
Document worth reading: “An Introduction to Probabilistic Programming”
This document is designed to be a first-year graduate-level introduction to probabilistic programming. It not only provides a thorough background for anyone wishing to use a probabilistic programming system, but also introduces the techniques needed to design and build these systems. It is aimed at people who have an undergraduate-level understanding of either or, ideally, both probabilistic machine learning and programming languages. We start with a discussion of model-based reasoning and explain why conditioning as a foundational computation is central to the fields of probabilistic machine learning and artificial intelligence. We then introduce a simple first-order probabilistic programming language (PPL) whose programs define static-computation-graph, finite-variable-cardinality models. In the context of this restricted PPL we introduce fundamental inference algorithms and describe how they can be implemented in the context of models denoted by probabilistic programs. In the second part of this document, we introduce a higher-order probabilistic programming language, with a functionality analogous to that of established programming languages. This affords the opportunity to define models with dynamic computation graphs, at the cost of requiring inference methods that generate samples by repeatedly executing the program. Foundational inference algorithms for this kind of probabilistic programming language are explained in the context of an interface between program executions and an inference controller. This document closes with a chapter on advanced topics which we believe to be, at the time of writing, interesting directions for probabilistic programming research; directions that point towards a tight integration with deep neural network research and the development of systems for next-generation artificial intelligence applications.
What’s new on arXiv
Path Space Cochains and Population Time Series Analysis