Imagine that you are a nutritionist trying to explore the nutritional content of food. What is the best way to differentiate food items? By vitamin content? Protein levels? Or perhaps a combination of both?
Document Similarity with Word Mover’s Distance
The Power of IPython Notebook + Pandas + Scikit-learn
contact@andreykurenkov.com
IPython Notebook, Numpy, Pandas, MongoDB, R — for the better part of a year now, I have been trying out these technologies as part of Udacity’s Data Analyst Nanodegree. My undergrad education barely touched on data visualization or, more broadly, data science, so I figured being exposed to the aforementioned technologies would be fun. And fun it has been, with R’s powerful IDE-driven data munging and visualization techniques proving particularly revelatory. I learned enough R to create some complex visualizations, and was impressed by how easy it is to import data into its dataframe representation and then transform and visualize that data. I also found RStudio’s paradigm of continuously intermixed code editing and execution superior to my habitual workflow of endlessly cycling between tweaking and executing Python scripts.
Animate NBA shot events with Paper.js
Yuki Katoh (yukiegosapporo@gmail.com)
All the shots and FT attempts in one animation, made with NBA spatio-temporal data (maintained by neilmj) and Paper.js. The data is from the Golden State Warriors vs. Denver Nuggets game on January 13th, 2016.
Model-Free Prediction and Control
The problem with the methods covered earlier is that they require a model of the environment. Oftentimes, the agent does not know how the environment works and must figure it out on its own, purely from sampled experience.
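To make this concrete, here is a minimal sketch of model-free control using tabular Q-learning on a toy five-state chain. The environment, hyperparameters, and episode count are illustrative assumptions, not taken from the post; the point is that the update rule only uses sampled transitions, never the transition function itself.

```python
import random

# Toy 5-state chain: actions move left/right; reward 1 only at the right end.
N_STATES = 5
ACTIONS = [0, 1]  # 0 = left, 1 = right

def step(state, action):
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    done = nxt == N_STATES - 1
    return nxt, reward, done

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, eps = 0.1, 0.9, 0.1
random.seed(0)

for episode in range(500):
    s, done = 0, False
    while not done:
        # Epsilon-greedy action selection.
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r, done = step(s, a)
        # Q-learning update: uses only the sampled (s, a, r, s') tuple,
        # so no model of step() is needed.
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2

print(max(ACTIONS, key=lambda act: Q[(0, act)]))  # greedy action learned at the start state
```

After training, the greedy policy at the start state should be "move right", since that is the only path to reward.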
Translating W2v Embedding From One Space To Another
The Problem With Word Embeddings
A Guide to Gradient Boosted Trees with XGBoost in Python
A Gentle Introduction to Bloom Filter
Bloom filters are probabilistic, space-efficient data structures. They are similar to hash tables, but they are used exclusively to test membership in a set. They have a very powerful property: they let you trade off space against the false-positive rate of membership queries. Because of this trade-off between space and false positives, they are called probabilistic data structures.
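A minimal sketch of the idea, using Python's standard library (the class name, sizes, and the trick of deriving k hash functions by salting SHA-256 are my own illustrative choices, not from the post):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash positions over an m-bit array."""

    def __init__(self, m=1024, k=3):
        self.m = m                 # number of bits
        self.k = k                 # number of hash functions
        self.bits = [False] * m

    def _positions(self, item):
        # Derive k positions by salting a single hash function.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("apple")
print(bf.might_contain("apple"))   # True
print(bf.might_contain("banana"))  # almost certainly False
```

The trade-off is visible in the constructor: a larger bit array `m` (or a well-chosen `k`) lowers the false-positive rate at the cost of more memory, while a query for an absent item can only err by answering "possibly present", never "absent" for a present one.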
LSTMs
In past posts, I’ve described how Recurrent Neural Networks (RNNs) can be used to learn patterns in sequences of inputs, and how the idea of unrolling can be used to train them. It turns out that there are some significant limitations to the types of patterns that a typical RNN can learn, due to the way their weight matrices are used. As a result, there has been a lot of interest in a variant of RNNs called Long Short-Term Memory networks (LSTMs). As I’ll describe below, LSTMs have more control than typical RNNs over what they remember, which allows them to learn much more complex patterns.
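The extra control the paragraph describes comes from gates. As an illustration, here is a single LSTM cell step in NumPy in its common formulation (the weight shapes, random initialization, and toy input sequence are my own assumptions for the sketch, not code from the post):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 4, 8

# One weight matrix and bias per gate (input i, forget f, output o) and candidate g.
W = {g: rng.normal(scale=0.1, size=(n_hidden, n_in + n_hidden)) for g in "ifog"}
b = {g: np.zeros(n_hidden) for g in "ifog"}

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c):
    z = np.concatenate([x, h])
    i = sigmoid(W["i"] @ z + b["i"])   # input gate: how much new info to write
    f = sigmoid(W["f"] @ z + b["f"])   # forget gate: how much old memory to keep
    o = sigmoid(W["o"] @ z + b["o"])   # output gate: how much memory to expose
    g = np.tanh(W["g"] @ z + b["g"])   # candidate values to write
    c = f * c + i * g                  # gated update of the cell state (the "memory")
    h = o * np.tanh(c)                 # hidden state, read out through the output gate
    return h, c

h = np.zeros(n_hidden)
c = np.zeros(n_hidden)
for t in range(5):                     # unroll over a short random input sequence
    h, c = lstm_step(rng.normal(size=n_in), h, c)
print(h.shape)  # (8,)
```

The key line is `c = f * c + i * g`: because the forget gate `f` multiplies the old cell state rather than repeatedly squashing it through a weight matrix, the network can learn to preserve information over many steps, which is exactly the control a plain RNN lacks.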