The 4th Annual TEXATA Summit is only 3 weeks away! Join us on Friday, October 19th in Austin, Texas to learn from and connect with fellow industry leaders discussing the latest trends and innovations in AI, advanced analytics, machine learning, and big data.
Modeling Multi-Category Outcomes With vtreat
vtreat is a powerful R package for preparing messy real-world data for machine learning. We have further extended the package with a number of features, including rquery/rqdatatable integration (allowing vtreat application at scale on Apache Spark or data.table!).
Chromebook Data Science - a free online data science program for anyone with a web browser.
Jeff Leek, 2018/10/01
A Right to Reasonable Inferences
By Dr. Sandra Wachter, Lawyer and Research Fellow (Asst. Prof.), University of Oxford
Import AI 114: Synthetic images take a big leap forward with BigGANs; US lawmakers call for national AI strategy; researchers probe language reasoning via HotpotQA
Getting hip to multi-hop reasoning with HotpotQA: …New dataset and benchmark designed to test common sense reasoning capabilities…
Researchers with Carnegie Mellon University, Stanford University, the Montreal Institute for Learning Algorithms, and Google AI have created a new dataset and associated competition designed to test the capabilities of question answering systems. The new dataset, HotpotQA, is far larger than many prior datasets designed for such tasks, and has been designed to require ‘multi-hop’ reasoning, thereby testing the growing sophistication of newer NLP systems at performing increasingly complex cognitive tasks. HotpotQA consists of ~113,000 Wikipedia-based question-answer pairs. Answering these questions correctly is designed to test for ‘multi-hop’ reasoning – the ability of systems to look at multiple documents and perform basic iterative problem-solving to come up with correct answers. These questions were “collected by crowdsourcing based on Wikipedia articles, where crowd workers are shown multiple supporting context documents and asked explicitly to come up with questions requiring reasoning about all of the documents”. These workers also provide the supporting facts they use to answer these questions, providing a strong supervised training set.
**It’s the data, stupid:** To develop HotpotQA the researchers needed to themselves create a kind of multi-hop pipeline to figure out which documents to give crowd workers to compose questions from. To do this, they mapped the Wikipedia hyperlink graph and used this information to build a directed graph, then detected correspondences between pairs of documents. They also created a hand-made list of categories to use to compare things of similar categories (e.g., basketball players).
**Testing:** HotpotQA can be used to test models’ capabilities in different ways, ranging from information retrieval to question answering.
The researchers train a system to give a baseline, and the results show that this (relatively strong) baseline performs significantly below a competent human across all tasks (with the exception of certain ‘supporting fact’ evaluations, in which it obtains performance on par with an average human).
**Why it matters:** Natural language processing research is currently going through what some have called an ‘ImageNet moment’ following recent algorithmic developments relating to the usage of memory and attention-based systems, which have demonstrated significantly higher performance across a range of reasoning tasks compared to prior techniques, while also being typically much simpler. As with ImageNet and its associated supervised classification systems, these new types of NLP approaches require larger datasets to be trained on and evaluated against, and as with ImageNet it is likely that scaling up techniques to take on challenges defined by datasets like HotpotQA will further accelerate progress in this domain.
**Caveat:** As with all datasets with an associated competitive leaderboard, it is feasible that HotpotQA could prove relatively easy, with systems exceeding human performance against it in a relatively short amount of time – this happened over the past year with the Stanford SQuAD dataset. Hopefully the relatively higher sophistication of HotpotQA will protect against this.
**Read more:** HotpotQA website with leaderboard and data (HotpotQA GitHub).
**Read more:** HOTPOTQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (arXiv).
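To make the kind of question-answering evaluation used on such leaderboards concrete, here is a minimal sketch of SQuAD-style token-level F1 between a predicted and a gold answer string. This is a simplified stand-in, not HotpotQA's official evaluation script (which additionally normalizes punctuation and articles, and also scores supporting facts); the function name and toy strings are illustrative assumptions.

```python
from collections import Counter

def token_f1(prediction: str, gold: str) -> float:
    """SQuAD-style token-level F1 (simplified sketch: whitespace
    tokens, no punctuation or article stripping)."""
    pred_toks = prediction.lower().split()
    gold_toks = gold.lower().split()
    # Multiset intersection counts shared tokens (with multiplicity).
    overlap = sum((Counter(pred_toks) & Counter(gold_toks)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

print(token_f1("the Apollo program", "Apollo program"))  # partial credit
```

Partial overlap yields partial credit, which is why F1 is reported alongside exact match: a near-miss answer still scores well above zero.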
Up your open source game with Hacktoberfest at Locke Data!
How awesome is open source software? Quite awesome, in our opinion! Locke Data maintains several open source repos on GitHub, in particular R packages, and we’d like you to join in the fun! This month, we’re taking part in Hacktoberfest and will do our best to mentor you through your first open source contributions if you wish!
A Review of the Neural History of Natural Language Processing
Disclaimer: This post tries to condense ~15 years’ worth of work into the eight milestones that are most relevant today, and thus omits many relevant and important developments. In particular, it is heavily skewed towards current neural approaches, which may give the false impression that no other methods were influential during this period. More importantly, many of the neural network models presented in this post build on non-neural milestones of the same era. In the final section of this post, we highlight such influential work that laid the foundations for later methods.
Reinforcement Learning: Super Mario, AlphaGo and beyond
Distilled News
Text Mining 101: A Stepwise Introduction to Topic Modeling using Latent Semantic Analysis (using Python)
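As a pointer to what the linked tutorial covers, here is a minimal LSA sketch: build a term-document matrix, take a truncated SVD, and read off document coordinates in the latent "topic" space. The toy corpus and the choice of k=2 are illustrative assumptions, and this uses plain NumPy rather than the tutorial's exact code.

```python
import numpy as np

# Toy corpus (illustrative assumption): two themes, space and baking.
docs = [
    "rocket launch orbit satellite",
    "satellite orbit rocket engine",
    "recipe bake oven flour",
    "flour oven bread butter",
]
vocab = sorted({w for d in docs for w in d.split()})
idx = {w: i for i, w in enumerate(vocab)}

# Term-document count matrix A (terms x documents).
A = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        A[idx[w], j] += 1

# LSA = truncated SVD of A, keeping k latent "topics".
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
doc_topics = (np.diag(s[:k]) @ Vt[:k]).T  # documents x k topic coordinates

# Documents about the same theme land close together in topic space.
def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(doc_topics[0], doc_topics[1]))  # high: both space documents
print(cosine(doc_topics[0], doc_topics[2]))  # near zero: different themes
```

A real pipeline would typically use TF-IDF weighting instead of raw counts and a sparse truncated SVD (e.g. scikit-learn's TruncatedSVD) rather than a dense full SVD.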
Bob Erikson on the 2018 Midterms
Donald Trump’s tumultuous presidency has sparked far more than the usual interest in the next midterm elections as a possible midcourse correction. Can the Democrats win back the House of Representatives and possibly even the Senate in 2018? This short essay presents some observations about midterm elections and congressional elections generally, followed by some considerations relevant to understanding the upcoming 2018 midterm verdict. Most of my [Erikson’s] remarks would be commonplace among seasoned congressional election scholars. Please note, however, that I tout a theory of ideological balancing in elections that remains controversial in some quarters.