I thought I would end the year with a not-so-serious post about capturing the essence of machine learning. In the past, you have undoubtedly explored a variety of in-depth and semi-in-depth treatments of what machine learning is, and of its relationships to numerous other topics. Starting from some initial common point of reference is always a good idea when discussing complex concepts; the problem is that there exist innumerable such points of reference for a topic like machine learning.
Document worth reading: “Generalization in Machine Learning via Analytical Learning Theory”
This paper introduces a novel measure-theoretic learning theory to analyze generalization behaviors of practical interest. The proposed learning theory has the following abilities: 1) to utilize the qualities of each learned representation on the path from raw inputs to outputs in representation learning, 2) to guarantee good generalization errors possibly with arbitrarily rich hypothesis spaces (e.g., arbitrarily large capacity and Rademacher complexity) and non-stable/non-robust learning algorithms, and 3) to clearly distinguish each individual problem instance from the others. Our generalization bounds are relative to a representation of the data, and hold true even if the representation is learned. We discuss several consequences of our results for deep learning, one-shot learning and curriculum learning. Unlike statistical learning theory, the proposed learning theory analyzes each problem instance individually via measure theory, rather than a set of problem instances via statistics. Because of the differences in assumptions and objectives, the proposed learning theory is meant to be complementary to previous learning theory and is not designed to compete with it. Generalization in Machine Learning via Analytical Learning Theory
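For orientation, the quantity at stake is the classical generalization gap; below is the standard textbook formulation (background notation only, not the paper's measure-theoretic bound, which refines this per problem instance):

```latex
% Standard generalization gap for a hypothesis h trained on a sample S of n points.
% R(h): expected risk under the data distribution D; \hat{R}_S(h): empirical risk.
\[
R(h) - \hat{R}_S(h), \qquad
R(h) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\ell(h(x), y)\big], \qquad
\hat{R}_S(h) = \frac{1}{n}\sum_{i=1}^{n} \ell(h(x_i), y_i).
\]
```

Classical results control this gap uniformly over a hypothesis space (e.g., via Rademacher complexity), which is exactly the dependence the paper's per-instance analysis aims to sidestep.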
Synthetic Data Generation: A must-have skill for new data scientists
By Tirthajyoti Sarkar, ON Semiconductor
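To give a flavor of the skill, here is a minimal sketch of one common approach, generating a labeled classification dataset with scikit-learn's make_classification (an illustration of the technique, not code from the article; all parameter values below are arbitrary):

```python
# Generate a synthetic, labeled classification dataset with scikit-learn.
# Parameter choices here are illustrative placeholders.
from sklearn.datasets import make_classification

# 1,000 samples with 10 features, of which 5 are informative and 2 redundant
X, y = make_classification(
    n_samples=1000,
    n_features=10,
    n_informative=5,
    n_redundant=2,
    n_classes=2,
    random_state=42,
)
print(X.shape, y.shape)  # (1000, 10) (1000,)
```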
World’s Biggest Deep Learning Summit 3 weeks away
Deep Learning Summit, San Francisco, Jan 24 - 25
Christmas elves puzzle
Using the Economics Value Curve to Drive Digital Transformation
I’m missing my Thursday evening Big Data MBA classes at the University of San Francisco School of Management (though I expect my students are glad that ordeal is over). One of my biggest lessons from this semester was how to properly construct an actionable and measurable business hypothesis. A common mistake is starting with an overly simplified business objective such as:
Document worth reading: “The Gap of Semantic Parsing: A Survey on Automatic Math Word Problem Solvers”
Automatically solving mathematical word problems (MWPs) is challenging, primarily due to the semantic gap between human-readable words and machine-understandable logic. Despite a long history dating back to the 1960s, MWPs have regained intensive attention in the past few years with the advancement of Artificial Intelligence (AI). Solving MWPs successfully is considered a milestone towards general AI. Many systems have claimed promising results on self-crafted, small-scale datasets. However, when applied to large and diverse datasets, none of the proposed methods in the literature achieves high precision, revealing that current MWP solvers are still far from intelligent. This motivated us to present a comprehensive survey to deliver a clear and complete picture of automatic math problem solvers. In this survey, we emphasize algebraic word problems, summarize their extracted features and the techniques proposed to bridge the semantic gap, and compare their performance on publicly accessible datasets. We also cover automatic solvers for other types of math problems, such as geometric problems that require the understanding of diagrams. Finally, we identify several emerging research directions for readers with an interest in MWPs. The Gap of Semantic Parsing: A Survey on Automatic Math Word Problem Solvers
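To see the semantic gap in miniature, here is a deliberately brittle toy solver (my illustration, not from the survey): it handles only single-step, two-number problems via keyword heuristics, exactly the kind of shallow word-to-logic mapping that breaks down on large, diverse datasets:

```python
# Toy word-problem "solver": keyword matching in place of real semantic parsing.
# Illustrative only; not a method from the survey.
import re

def toy_solver(problem: str) -> float:
    """Solve single-step add/subtract word problems with two numbers."""
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", problem)]
    if len(numbers) != 2:
        raise ValueError("toy solver only handles two-number problems")
    a, b = numbers
    text = problem.lower()
    # Crude keyword heuristics stand in for understanding the problem
    if any(w in text for w in ("more", "buys", "gains", "total")):
        return a + b
    if any(w in text for w in ("loses", "gives", "left", "fewer")):
        return a - b
    raise ValueError("operation not recognized")

print(toy_solver("John has 5 apples and buys 3 more. How many does he have?"))  # 8.0
print(toy_solver("Mary had 9 pens and gives 4 away. How many are left?"))       # 5.0
```

Rephrase either problem slightly ("John's apples increase by 3") and the heuristics fail, which is the gap the surveyed systems try to bridge with feature extraction and semantic parsing.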
Some fun with {gganimate}
[Embedded {gganimate} animation video]
Clustering the Bible
During this time of year there is obviously a lot of talk about the Bible. As most people know, the New Testament comprises four different Gospels, written by anonymous authors 40 to 70 years after Jesus’ supposed crucifixion. Unfortunately we have lost all of the originals and have retained only copies of copies of copies (and so on), dating back hundreds of years after they were written, in all kinds of different versions (renowned Biblical scholar Professor Bart Ehrman states that there are more versions of the New Testament than there are words in the New Testament). Just as a fun fact: there are many more Gospels, but only those four were included in the official Bible.
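For readers curious what clustering texts like these involves mechanically, here is a generic sketch of the usual pipeline (not the post's own code): vectorize the documents, then group them. The stand-in verses below are placeholders for the actual texts:

```python
# Generic text-clustering sketch: TF-IDF features + k-means.
# Illustrative pipeline only; the post's own analysis may differ.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in documents for the Bible texts being clustered
docs = [
    "In the beginning God created the heaven and the earth.",
    "The kingdom of heaven is like a grain of mustard seed.",
    "Blessed are the meek, for they shall inherit the earth.",
    "And God said, Let there be light: and there was light.",
]

X = TfidfVectorizer(stop_words="english").fit_transform(docs)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # cluster assignment for each document
```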
French Mortality Poster
Based on the heatmaps I drew earlier this month, I made a poster of two centuries of data on mortality rates in France for males and females. It turned out reasonably well, I think. I will probably get it blown up to a nice large size and put it up on the wall. I’ve had very good results with PhD Posters for work like this over the years, by the way.
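For anyone wanting to reproduce the general idea, here is a minimal sketch of this kind of age-by-year mortality heatmap, using a synthetic surface in place of the actual French data:

```python
# Minimal age-by-year mortality heatmap on synthetic data.
# The year range and rate formula below are made up for illustration.
import numpy as np
import matplotlib.pyplot as plt

years = np.arange(1816, 2016)   # hypothetical two-century span
ages = np.arange(0, 101)        # ages 0-100
# Synthetic mortality surface: high in infancy and old age, declining over time
rate = (np.exp(-ages[:, None] / 15) + np.exp((ages[:, None] - 100) / 12)) \
       * np.exp(-(years[None, :] - 1816) / 150)

fig, ax = plt.subplots(figsize=(8, 4))
mesh = ax.pcolormesh(years, ages, np.log(rate), cmap="viridis", shading="auto")
ax.set_xlabel("Year")
ax.set_ylabel("Age")
fig.colorbar(mesh, label="log mortality rate (synthetic)")
plt.show()
```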