This one’s all about nostalgia. When I was about 7 or 8 years old back in Australia, my favorite TV show by far was The Curiosity Show, a weekly series hosted by Rob (Morrison) and Deane (Hutton) — they didn’t use their last names — featuring segments about science and practical experiments you could do at home. My favorite part was the experiments: I remember pestering my parents for all manner of straws, construction paper, watch batteries and wire to follow along with the TV show (and also the companion books, themed around Earth, Air, Fire and Water). So I was thrilled to find this week that the entire series is now available on YouTube. I still remember the thrill of building my own electric motor:
Magister Dixit
“Data sources such as social media text, messaging, log files (such as clickstream data), and machine data from sensors in the physical world present the opportunity to pick up where transaction systems leave off regarding underlying sentiment driving customer interactions; external events or trends impacting institutional financial or security risk; or adding more detail regarding the environment in which supply chains, transport or utility networks operate.” Tony Baer (2014)
What’s new on arXiv
Self-Attentive Sequential Recommendation
A Deep (But Jargon and Math Free) Dive Into Deep Learning
Deep learning has been around for a while, so why has it only become a buzz topic in the last 5 years? Well, deep learning returned to the headlines in 2016 when Google’s AlphaGo program crushed Lee Sedol, one of the highest-ranking Go players in the world. Before then there wasn’t a good way to train deep neural networks, but with advances in machine learning (ML) algorithms and deep learning chipsets, deep learning is now being implemented more widely. It is being applied in healthcare, finance, and retail, and the global deep learning market is expected to reach $10.2 billion by 2025.
But what is it?
Distilled News
Cheat Sheets for AI, Neural Networks, Machine Learning, Deep Learning & Big Data
Document worth reading: “PMLB: A Large Benchmark Suite for Machine Learning Evaluation and Comparison”
The selection, development, or comparison of machine learning methods in data mining can be a difficult task based on the target problem and goals of a particular study. Numerous publicly available real-world and simulated benchmark datasets have emerged from different sources, but their organization and adoption as standards have been inconsistent. As such, selecting and curating specific benchmarks remains an unnecessary burden on machine learning practitioners and data scientists. The present study introduces an accessible, curated, and developing public benchmark resource to facilitate identification of the strengths and weaknesses of different machine learning methodologies. We compare meta-features among the current set of benchmark datasets in this resource to characterize the diversity of available data. Finally, we apply a number of established machine learning methods to the entire benchmark suite and analyze how datasets and algorithms cluster in terms of performance. This work is an important first step towards understanding the limitations of popular benchmarking suites and developing a resource that connects existing benchmarking standards to more diverse and efficient standards in the future. PMLB: A Large Benchmark Suite for Machine Learning Evaluation and Comparison
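For a sense of how the resource is used in practice, here is a minimal sketch assuming the companion pmlb Python package (pip install pmlb) and its fetch_data and classification_dataset_names helpers; the dataset name and the scikit-learn baseline are illustrative choices, not something the paper prescribes.

    # Minimal sketch: pull one PMLB benchmark and score a baseline learner on it.
    # Assumes the `pmlb` package and scikit-learn are installed.
    from pmlb import fetch_data, classification_dataset_names
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    print(len(classification_dataset_names), "classification benchmarks available")

    # 'mushroom' is one of the curated classification datasets.
    X, y = fetch_data("mushroom", return_X_y=True)
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    print("5-fold CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())

Sweeping the whole suite, as the paper does for its clustering analysis, amounts to looping the same few lines over classification_dataset_names.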
Counting baseball cliches
Post-game sports interviews tend to sound similar. And when a player does say something out of pattern, talk shows and social media examine every word for hidden meaning. It’s no wonder athletes talk in cliches. The Washington Post, using natural language processing, counted the phrases and idioms that baseball players use.
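The general idea is easy to reproduce at a small scale. The sketch below is not the Post’s pipeline; the phrase list and the transcripts are made-up stand-ins, and it simply counts occurrences of stock phrases in lower-cased text.

    # Count how often stock phrases appear in interview transcripts (toy example).
    import re
    from collections import Counter

    CLICHES = ["one game at a time", "it is what it is",
               "give 110 percent", "we just have to execute"]

    transcripts = [
        "We're taking it one game at a time. It is what it is.",
        "We give 110 percent every night, and we just have to execute.",
    ]

    counts = Counter()
    for text in transcripts:
        text = text.lower()
        for phrase in CLICHES:
            counts[phrase] += len(re.findall(re.escape(phrase), text))

    for phrase, n in counts.most_common():
        print(f"{n:2d}  {phrase}")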
Guide to a high-performance, powerful R installation
R is an amazing language with 25 years of development behind it, but you can get the most out of R with additional components. An IDE makes developing in R more convenient; packages extend R’s capabilities; and multi-threaded libraries make computations faster.
If you did not already know
Efficient Neural Architecture Search (ENAS)
We propose Efficient Neural Architecture Search (ENAS), a fast and inexpensive approach for automatic model design. In ENAS, a controller learns to discover neural network architectures by searching for an optimal subgraph within a large computational graph. The controller is trained with policy gradient to select a subgraph that maximizes the expected reward on the validation set. Meanwhile the model corresponding to the selected subgraph is trained to minimize a canonical cross entropy loss. Thanks to parameter sharing between child models, ENAS is fast: it delivers strong empirical performances using much fewer GPU-hours than all existing automatic model design approaches, and notably, 1000x less expensive than standard Neural Architecture Search. On the Penn Treebank dataset, ENAS discovers a novel architecture that achieves a test perplexity of 55.8, establishing a new state-of-the-art among all methods without post-training processing. On the CIFAR-10 dataset, ENAS designs novel architectures that achieve a test error of 2.89%, which is on par with NASNet (Zoph et al., 2018), whose test error is 2.65%. …
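At its core, the method alternates between training the shared child-model weights and updating the controller with a policy gradient on a validation reward. The toy sketch below shows only the controller’s REINFORCE update over a made-up operation set, with a synthetic reward standing in for the validation accuracy of the shared-weight subgraph; none of the names come from the paper’s code.

    # Toy REINFORCE update for an ENAS-style controller (illustrative only).
    import numpy as np

    rng = np.random.default_rng(0)

    OPS = ["conv3x3", "conv5x5", "maxpool", "identity"]   # made-up operation set
    N_LAYERS = 4
    logits = np.zeros((N_LAYERS, len(OPS)))               # controller parameters

    def softmax(x):
        z = np.exp(x - x.max())
        return z / z.sum()

    def sample_architecture(logits):
        """Sample one operation index per layer from the controller."""
        return [rng.choice(len(OPS), p=softmax(row)) for row in logits]

    def reward(arch):
        """Stand-in for the validation accuracy of the shared-weight subgraph.
        Here a fixed 'good' architecture simply scores highest."""
        target = [0, 1, 2, 0]
        return sum(a == t for a, t in zip(arch, target)) / N_LAYERS

    baseline, lr = 0.0, 0.5
    for step in range(500):
        arch = sample_architecture(logits)
        r = reward(arch)
        baseline = 0.9 * baseline + 0.1 * r        # moving-average baseline
        advantage = r - baseline
        for layer, op in enumerate(arch):          # REINFORCE: lr * advantage * grad log pi
            p = softmax(logits[layer])
            grad_logp = -p
            grad_logp[op] += 1.0
            logits[layer] += lr * advantage * grad_logp

    print("most likely ops:", [OPS[i] for i in logits.argmax(axis=1)])

In the real algorithm the reward would come from running the sampled subgraph with the shared parameters on a validation batch, and the shared parameters themselves would be trained with cross entropy on the training set between controller updates.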
If you did not already know
Kleinberg’s Impossibility Theorem
Although the study of clustering is centered around an intuitively compelling goal, it has been very difficult to develop a unified framework for reasoning about it at a technical level, and profoundly diverse approaches to clustering abound in the research community. Here we suggest a formal perspective on the difficulty in finding such a unification, in the form of an impossibility theorem: for a set of three simple properties, we show that there is no clustering function satisfying all three. Relaxations of these properties expose some of the interesting (and unavoidable) trade-offs at work in well-studied clustering techniques such as single-linkage, sum-of-pairs, k-means, and k-median. …
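For reference, the three properties in Kleinberg’s paper are scale-invariance, richness, and consistency. A compact statement (the notation below is a paraphrase, not lifted from the excerpt), where a clustering function f maps a distance function d on a point set S to a partition of S:

    \begin{itemize}
      \item \emph{Scale-invariance:} $f(\alpha \cdot d) = f(d)$ for every distance function $d$ and every $\alpha > 0$.
      \item \emph{Richness:} for every partition $\Gamma$ of $S$, there is some $d$ with $f(d) = \Gamma$.
      \item \emph{Consistency:} if $d'$ shrinks distances within the clusters of $f(d)$ and expands distances between them, then $f(d') = f(d)$.
    \end{itemize}
    Theorem (Kleinberg, 2002): for $|S| \ge 2$, no clustering function $f$ satisfies all three properties.

The paper also shows that single-linkage with different stopping conditions can satisfy any two of the properties, so the trade-offs are tight.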