Sometimes in statistics, one knows certain facts with absolute certainty about a distribution. For example, let $t$ represent the time delay between an event occurring and the same event being detected. I don’t know very much about the distribution of times, but one thing I can say with certainty is that $t > 0$; an event can only be detected after it occurs.
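To illustrate the constraint (my example, not from the post): an exponential density is one simple model that respects it, placing no probability mass on non-positive delays:

$$p(t \mid \lambda) = \begin{cases} \lambda e^{-\lambda t} & t > 0 \\ 0 & t \le 0 \end{cases}$$

Any candidate model for $t$ should restrict its support to the positive reals in the same way.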
Django and Elastic Beanstalk, a perfect combination
This post gives a minimum working example so that you can launch your Django application on Amazon servers using Elastic Beanstalk. The only things you need are a Django application, Python 3, and an Amazon account. Before we start, make sure you have installed the Amazon CLI. Let’s start!
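As a taste of what such a deployment involves, here is a minimal sketch of the kind of settings change it typically needs; the project name `mysite` and the host pattern are illustrative assumptions, not taken from the post:

```python
# mysite/settings.py -- "mysite" is a hypothetical project name
# Elastic Beanstalk serves the app from an *.elasticbeanstalk.com domain,
# so Django must be told to accept requests addressed to that host.
ALLOWED_HOSTS = [".elasticbeanstalk.com"]
```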
Google F1 Server Reading Summary
A summary of Google’s published whitepaper, “F1: A Distributed SQL Database That Scales.”
Book Review: Computer Age Statistical Inference
A new book, Computer Age Statistical Inference: Algorithms, Evidence, and Data Science by Bradley Efron and Trevor Hastie, was released in July this year. I finished reading it a few weeks ago and this is a short review from the point of view of a machine learning researcher.
Grazing in a circular field
I’ve written a little about grazing before (How to calculate the optimal dimensions for a rectangular field using just straight fences). This week we’ll look at another classic grazing math problem, involving a goat grazing in a circular field. Old MacDonald, on his farm, has a large circular field. He also has a goat. He tethers the goat, using a rope, to a post on the circumference of the field. He wants the goat to be able to graze half of the grass in the field. Here’s the question: What length should he make the rope?
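A sketch of one way to pin down the answer numerically (my sketch; the post’s own solution may differ). Using the standard circle-to-circle lens-area formula, with field radius $a$, rope length $r$, and the tether post on the field’s circumference (so the two circle centers are $a$ apart), we can solve for the $r$ that grazes exactly half the field:

```python
import math
from scipy.optimize import brentq

def grazed_area(r, a=1.0):
    # Area of the lens where the goat's reach (circle of radius r, centered
    # on the tether post) overlaps the field (circle of radius a); the post
    # sits on the field's circumference, so the centers are a distance a apart.
    return (r**2 * math.acos(r / (2 * a))
            + a**2 * math.acos(1 - r**2 / (2 * a**2))
            - 0.5 * r * math.sqrt(4 * a**2 - r**2))

a = 1.0  # field radius (the answer scales linearly with it)
half_field = math.pi * a**2 / 2
rope = brentq(lambda r: grazed_area(r, a) - half_field, 0.01, 2 * a - 1e-9)
print(rope)  # about 1.1587: the rope is roughly 1.16 field radii long
```

There is no closed form; the transcendental equation has to be solved numerically, which is why the answer is usually quoted as $r \approx 1.1587\,a$.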
Docker and Kaggle with Ernie and Bert
This post is meant to serve as an introduction to what Docker is and why and how to use it for Kaggle. For simplicity, we will primarily speak about Sesame Street and cupcakes in lieu of computers and data.
Non-Zero Initial States for Recurrent Neural Networks
The default approach to initializing the state of an RNN is to use a zero state. This often works well, particularly for sequence-to-sequence tasks like language modeling, where the proportion of outputs that are significantly impacted by the initial state is small. In some cases, however, it makes sense to (1) train the initial state as a model parameter, (2) use a noisy initial state, or (3) both. This post briefly examines the rationale behind trained and noisy initial states, and presents drop-in Tensorflow implementations.
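As a rough illustration of both ideas (a sketch in TensorFlow; `make_initial_state` is a hypothetical helper name, not the post’s drop-in code):

```python
import tensorflow as tf

def make_initial_state(batch_size, state_size, trainable=True, noise_stddev=0.0):
    # (1) Trained initial state: a single learned vector (created once and
    #     reused every batch), tiled so each sequence starts from it.
    base = tf.Variable(tf.zeros([1, state_size]), trainable=trainable,
                       name="initial_state")
    state = tf.tile(base, [batch_size, 1])
    # (2) Noisy initial state: perturb each example's starting point with
    #     Gaussian noise so the network cannot over-rely on one exact state.
    if noise_stddev > 0.0:
        state = state + tf.random.normal([batch_size, state_size],
                                         stddev=noise_stddev)
    return state
```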
Data-Informed vs Data-Driven
A while back (July 2015), I was fortunate to speak at PyData in Seattle about the pitfalls of overreliance on data. You can find the slides here.
The Two Tribes of Language Researchers
TL;DR not-a-rant rant
T-Shirt Design Contest!
I’ve decided that I want to have Becoming a Data Scientist t-shirts to sell and to give out to podcast guests and contest winners, but I am not a graphic designer, so I need some help! That’s why I’m having a t-shirt design contest!