This was a pretty busy week with a lot going on, but I finally seem to be settling into my new role!
The study for Aerial Robotics is almost over with a week to go. There hasn’t been much coding in this course, but that was to be expected since it was more about PID-Control Theory and quadrotor dynamics. I am particularly interested in the Capstone/’final’ project for this course, which would involve building an autonomous robot in Pi.
Anyway, on to the interesting tidbits from this week:
AlphaGo Zero
Google’s Deepmind recently announced a new version of their AI-based Go player, the AlphaGo Zero. What makes this one so special, is that it breaks the common notion of intelligent systems requiring a LOT of data to produce decent results. AlphaGo Zero was only provided the basic rules of Go, and it performed the rest of the learning all by playing against itself. Oh and BTW, AlphaGo Zero beats AlphaGo, the previous champion in the game. This is indeed a landmark in demonstrating the power of good-old RL.
Read this article for a basic overview, and their paper in Nature for a detailed explanation. Brushing up on Monte Carlo Tree Search would certainly help.
Word Mover’s Distance
Given an excellent embedding of words such as Word2Vec, it is not very difficult to compute the semantic distance between individual terms. However, when it comes to big blocks of text, a simple ‘average’ over term-embeddings isn’t good enough for computing their relative distances.
In such cases, the Word Mover’s Distance, inspired from Earth Mover’s Distance, provides a better solution. It figures out the semantically closest term(s) from one document to each term in another, and then the average effort required to ‘rephrase’ one text in words of another. Click on the article link for a detailed explanation.
Robots generalizing from simulations
OpenAI posted a blog article about how they trained a robot only through simulations. This means that the robot received no data from sensors during the training phase, but was able to perform basic tasks in deployment after some calibration.
During the simulations, they used dynamics randomization to alter basic traits of the environment. This data was then fed to an LSTM to understand the settings and goals. A key insight from this work is Hindsight Experience Replay. Quoting the article, “Hindsight Experience Replay (HER), allows agents to learn from a binary reward by pretending that a failure was what they wanted to do all along and learning from it accordingly. (By analogy, imagine looking for a gas station but ending up at a pizza shop. You still don’t know where to get gas, but you’ve now learned where to get pizza.)”
Concurrency in Go
If you are a Go Programmer, take a look at this old (but good) talk on concurrency patterns and constructs in the language.
Generalization Bounds in Machine Learning
The Generalization Gap for an ML system is defined as the difference between the training error and the generalization error. The Generalization Bound tries to put a bound on this value, based on probability theory. Read this article for a detailed mathematical explanation.
Like this:
Like Loading…