This one’s all about nostalgia. When I was about 7 or 8 years old back in Australia, my favorite TV show by far was The Curiosity Show, a weekly series hosted by Rob (Morrison) and Deane (Hutton) — they didn’t use their last names — featuring segments about science and practical experiments you could do at home. My favorite part was the experiments: I remember pestering my parents for all manner of straws, construction paper, watch batteries and wire to follow along with the TV show (and also the companion books, themed around Earth, Air, Fire and Water). So I was thrilled to find this week that the entire series is now available on YouTube. I still remember the thrill of building my own electric motor:
Magister Dixit
“Data sources such as social media text, messaging, log files (such as clickstream data), and machine data from sensors in the physical world present the opportunity to pick up where transaction systems leave off regarding underlying sentiment driving customer interactions; external events or trends impacting institutional financial or security risk; or adding more detail regarding the environment in which supply chains, transport or utility networks operate.” Tony Baer (2014)
What’s new on arXiv
Self-Attentive Sequential Recommendation
A Deep (But Jargon and Math Free) Dive Into Deep Learning
Deep learning has been around for a while, so why has it only become a buzz topic in the last 5 years? Well, deep learning returned to the headlines in 2016 when Google’s AlphaGo program crushed Lee Sedol, one of the highest-ranking Go players in the world. Before then there wasn’t a good way to train deep neural networks, but with advances in machine learning (ML) algorithms and deep learning chipsets, deep learning is now being implemented more widely. It is being applied in healthcare, finance, and retail, and the global deep learning market is expected to reach $10.2 billion by 2025.
But what is it?
Distilled News
Cheat Sheets for AI, Neural Networks, Machine Learning, Deep Learning & Big Data
Document worth reading: “PMLB: A Large Benchmark Suite for Machine Learning Evaluation and Comparison”
The selection, development, or comparison of machine learning methods in data mining can be a difficult task based on the target problem and goals of a particular study. Numerous publicly available real-world and simulated benchmark datasets have emerged from different sources, but their organization and adoption as standards have been inconsistent. As such, selecting and curating specific benchmarks remains an unnecessary burden on machine learning practitioners and data scientists. The present study introduces an accessible, curated, and developing public benchmark resource to facilitate identification of the strengths and weaknesses of different machine learning methodologies. We compare meta-features among the current set of benchmark datasets in this resource to characterize the diversity of available data. Finally, we apply a number of established machine learning methods to the entire benchmark suite and analyze how datasets and algorithms cluster in terms of performance. This work is an important first step towards understanding the limitations of popular benchmarking suites and developing a resource that connects existing benchmarking standards to more diverse and efficient standards in the future. PMLB: A Large Benchmark Suite for Machine Learning Evaluation and Comparison
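For a sense of how the resource is used in practice, here is a minimal sketch assuming the companion pmlb Python package (pip install pmlb) and its fetch_data and classification_dataset_names helpers; the dataset name and the scikit-learn baseline are illustrative choices, not something the paper prescribes.

    # Minimal sketch: pull one PMLB benchmark and score a baseline learner on it.
    # Assumes the `pmlb` package and scikit-learn are installed.
    from pmlb import fetch_data, classification_dataset_names
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    print(len(classification_dataset_names), "classification benchmarks available")

    # 'mushroom' is one of the curated classification datasets.
    X, y = fetch_data("mushroom", return_X_y=True)
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    print("5-fold CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())

Sweeping the whole suite, as the paper does for its clustering analysis, amounts to looping the same few lines over classification_dataset_names.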
Counting baseball cliches
Post-game sports interviews tend to sound similar. And when a player does say something out of pattern, talk shows and social media examine every word for hidden meaning. It’s no wonder athletes talk in cliches. The Washington Post, using natural language processing, counted the phrases and idioms that baseball players use.
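The general idea is easy to reproduce at a small scale. The sketch below is not the Post’s pipeline; the phrase list and the transcripts are made-up stand-ins, and it simply counts occurrences of stock phrases in lower-cased text.

    # Count how often stock phrases appear in interview transcripts (toy example).
    import re
    from collections import Counter

    CLICHES = ["one game at a time", "it is what it is",
               "give 110 percent", "we just have to execute"]

    transcripts = [
        "We're taking it one game at a time. It is what it is.",
        "We give 110 percent every night, and we just have to execute.",
    ]

    counts = Counter()
    for text in transcripts:
        text = text.lower()
        for phrase in CLICHES:
            counts[phrase] += len(re.findall(re.escape(phrase), text))

    for phrase, n in counts.most_common():
        print(f"{n:2d}  {phrase}")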
Guide to a high-performance, powerful R installation
R is an amazing language with 25 years of development behind it, but you can get the most out of R with additional components. An IDE makes developing in R more convenient; packages extend R’s capabilities; and multi-threaded libraries make computations faster.
If you did not already know
Efficient Neural Architecture Search (ENAS)
We propose Efficient Neural Architecture Search (ENAS), a fast and inexpensive approach for automatic model design. In ENAS, a controller learns to discover neural network architectures by searching for an optimal subgraph within a large computational graph. The controller is trained with policy gradient to select a subgraph that maximizes the expected reward on the validation set. Meanwhile the model corresponding to the selected subgraph is trained to minimize a canonical cross entropy loss. Thanks to parameter sharing between child models, ENAS is fast: it delivers strong empirical performances using much fewer GPU-hours than all existing automatic model design approaches, and notably, 1000x less expensive than standard Neural Architecture Search. On the Penn Treebank dataset, ENAS discovers a novel architecture that achieves a test perplexity of 55.8, establishing a new state-of-the-art among all methods without post-training processing. On the CIFAR-10 dataset, ENAS designs novel architectures that achieve a test error of 2.89%, which is on par with NASNet (Zoph et al., 2018), whose test error is 2.65%. …
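At its core, the method alternates between training the shared child-model weights and updating the controller with a policy gradient on a validation reward. The toy sketch below shows only the controller’s REINFORCE update over a made-up operation set, with a synthetic reward standing in for the validation accuracy of the shared-weight subgraph; none of the names come from the paper’s code.

    # Toy REINFORCE update for an ENAS-style controller (illustrative only).
    import numpy as np

    rng = np.random.default_rng(0)

    OPS = ["conv3x3", "conv5x5", "maxpool", "identity"]   # made-up operation set
    N_LAYERS = 4
    logits = np.zeros((N_LAYERS, len(OPS)))               # controller parameters

    def softmax(x):
        z = np.exp(x - x.max())
        return z / z.sum()

    def sample_architecture(logits):
        """Sample one operation index per layer from the controller."""
        return [rng.choice(len(OPS), p=softmax(row)) for row in logits]

    def reward(arch):
        """Stand-in for the validation accuracy of the shared-weight subgraph.
        Here a fixed 'good' architecture simply scores highest."""
        target = [0, 1, 2, 0]
        return sum(a == t for a, t in zip(arch, target)) / N_LAYERS

    baseline, lr = 0.0, 0.5
    for step in range(500):
        arch = sample_architecture(logits)
        r = reward(arch)
        baseline = 0.9 * baseline + 0.1 * r        # moving-average baseline
        advantage = r - baseline
        for layer, op in enumerate(arch):          # REINFORCE: lr * advantage * grad log pi
            p = softmax(logits[layer])
            grad_logp = -p
            grad_logp[op] += 1.0
            logits[layer] += lr * advantage * grad_logp

    print("most likely ops:", [OPS[i] for i in logits.argmax(axis=1)])

In the real algorithm the reward would come from running the sampled subgraph with the shared parameters on a validation batch, and the shared parameters themselves would be trained with cross entropy on the training set between controller updates.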
If you did not already know
Kleinberg’s Impossibility Theorem
Although the study of clustering is centered around an intuitively compelling goal, it has been very difficult to develop a unified framework for reasoning about it at a technical level, and profoundly diverse approaches to clustering abound in the research community. Here we suggest a formal perspective on the difficulty in finding such a unification, in the form of an impossibility theorem: for a set of three simple properties, we show that there is no clustering function satisfying all three. Relaxations of these properties expose some of the interesting (and unavoidable) trade-offs at work in well-studied clustering techniques such as single-linkage, sum-of-pairs, k-means, and k-median. …
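For reference, the three properties in Kleinberg’s paper are scale-invariance, richness, and consistency. A compact statement (the notation below is a paraphrase, not lifted from the excerpt), where a clustering function f maps a distance function d on a point set S to a partition of S:

    \begin{itemize}
      \item \emph{Scale-invariance:} $f(\alpha \cdot d) = f(d)$ for every distance function $d$ and every $\alpha > 0$.
      \item \emph{Richness:} for every partition $\Gamma$ of $S$, there is some $d$ with $f(d) = \Gamma$.
      \item \emph{Consistency:} if $d'$ shrinks distances within the clusters of $f(d)$ and expands distances between them, then $f(d') = f(d)$.
    \end{itemize}
    Theorem (Kleinberg, 2002): for $|S| \ge 2$, no clustering function $f$ satisfies all three properties.

The paper also shows that single-linkage with different stopping conditions can satisfy any two of the properties, so the trade-offs are tight.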