Convolutional Neural Networks are great: they recognize things, places and people in your personal photos, signs, people and lights in self-driving cars, crops, forests and traffic in aerial imagery, various anomalies in medical images and all kinds of other useful things. But once in a while these powerful visual recognition models can also be warped for distraction, fun and amusement. In this fun experiment we’re going to do just that: We’ll take a powerful, 140-million-parameter state-of-the-art Convolutional Neural Network, feed it 2 million selfies from the internet, and train it to distinguish good selfies from bad ones. Just because it’s easy and because we can. And in the process we might learn how to take better selfies :)
Reinforcement Learning - Monte Carlo Methods
Playing Blackjack with Monte Carlo Methods
Prototyping Long Term Time Series Storage with Kafka and Parquet
Another attempt to find better storage for time series data; this time it looks quite promising.
Data Science Tutorials Flipboard Magazine
I have been getting great feedback on my “Becoming a Data Scientist” Flipboard magazine, and I had this other set of articles bookmarked that didn’t quite fit into it. I want the Becoming a Data Scientist one to be the “best of the best” of articles I find on Twitter about data science, and to focus on understanding data science and related topics without getting too far into the “nitty gritty”. However, I often come across great data science tutorials on very specific topics that may not have broad appeal (and might look scary to beginners) but that I also wanted to share.
Dropout Ensembling In Neural Nets
As the size of our neural nets grows, it becomes especially easy to overfit. Suppose we have 20,000 neurons across 20 layers but only 500 training examples: it’s very easy for the net to memorize the shape of the data and come up with some crazy, high-dimensional model just to fit the data.
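To make the idea concrete, here is a minimal sketch of inverted dropout applied to one layer’s activations (the shapes and the `p` value are illustrative assumptions, not taken from the post):

```python
import numpy as np

def dropout(activations, p=0.5, train=True):
    """Inverted dropout: randomly zero units at train time and
    scale the survivors by 1/(1-p) so test time needs no change."""
    if not train:
        return activations  # test time: use the full ("averaged") net
    mask = (np.random.rand(*activations.shape) > p) / (1.0 - p)
    return activations * mask

# Example: dropout on one hidden layer's output
h = np.random.randn(4, 100)  # batch of 4 examples, 100 hidden units
h_dropped = dropout(h, p=0.5, train=True)
```

Each training step samples a different thinned sub-network, and the test-time forward pass approximates averaging over that exponentially large ensemble, which is where the “ensembling” framing comes from.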
Recurrent Neural Networks
So far on this blog, we’ve mostly looked at data in two forms – vectors in which each data point is defined by a fixed set of features, and graphs in which each data point is defined by its connections to other data points. For other forms of data, notably sequences such as text and sound, I described a few ways of transforming these into vectors, such as bag-of-words and n-grams. However, it turns out there are also ways to build machine learning models that use sequential data directly. In this post, I want to describe one such approach, called a recurrent neural network.
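As a rough sketch of the idea (the dimensions and weights below are made up for illustration), a vanilla recurrent network consumes a sequence one element at a time, threading a hidden state through the steps so each output can depend on everything seen so far:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One step of a vanilla RNN: the new hidden state depends on
    the current input AND the previous hidden state."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

# Illustrative dimensions: 10-dim inputs, 20-dim hidden state
rng = np.random.default_rng(0)
W_xh = rng.standard_normal((10, 20)) * 0.1
W_hh = rng.standard_normal((20, 20)) * 0.1
b_h = np.zeros(20)

h = np.zeros(20)
for x_t in rng.standard_normal((5, 10)):  # a sequence of 5 inputs
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)  # h accumulates sequence context
```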
Reinforcement Learning - Part 1
n-armed bandit problem
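For a feel of the setup before diving into the post, here is a minimal epsilon-greedy sketch of a 10-armed bandit (the reward distributions and the epsilon value are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
true_means = rng.normal(0, 1, size=10)  # hidden reward mean of each arm
Q = np.zeros(10)                        # estimated value of each arm
counts = np.zeros(10)
epsilon = 0.1

for t in range(1000):
    if rng.random() < epsilon:
        a = rng.integers(10)            # explore: pick a random arm
    else:
        a = int(np.argmax(Q))           # exploit: pick the current best arm
    reward = rng.normal(true_means[a], 1)
    counts[a] += 1
    Q[a] += (reward - Q[a]) / counts[a] # incremental sample-average update
```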
Theoretical Motivations for Deep Learning
This post is based on the lecture “Deep Learning: Theoretical Motivations” given by Dr. Yoshua Bengio at Deep Learning Summer School, Montreal 2015. I highly recommend the lecture for a deeper understanding of the topic.
Analyzing Pronto CycleShare Data with Python and Pandas
This week Pronto CycleShare, Seattle’s Bicycle Share system, turned one year old. To celebrate this, Pronto made available a large cache of data from the first year of operation and announced the Pronto Cycle Share’s Data Challenge, which offers prizes for different categories of analysis.
Bayes Primer
Ah - now this is interesting. We have strong data evidence of an unfair coin (since we generated the data we know it is unfair with p=.8), but our prior beliefs are telling us that coins are fair. How do we deal with this?
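One standard way to reconcile the two is a conjugate Beta-Binomial update, where the prior and the observed counts combine into a posterior. A quick sketch (the prior strength and flip counts here are invented for illustration):

```python
from scipy.stats import beta

# Prior belief that coins are fair: Beta(50, 50) is tightly peaked at 0.5
prior_a, prior_b = 50, 50

# Invented data: 80 heads in 100 flips, consistent with the p = 0.8 coin
heads, tails = 80, 20

# Conjugate update: just add the observed counts to the prior's parameters
post_a, post_b = prior_a + heads, prior_b + tails

print(post_a / (post_a + post_b))           # posterior mean: 0.65
print(beta.interval(0.95, post_a, post_b))  # 95% credible interval
```

The posterior mean of 0.65 shows the compromise: the data pulls the fair prior toward 0.8, and with more flips it would dominate entirely.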