SunJackson Blog

Using Clustering Algorithms to Analyze Golf Shots from the U.S. Open

Reposted from: https://bigishdata.com/2018/06/19/using-clustering-algorithms-to-analyze-golf-shots-from-the-u-s-open/

Jack Schultz

Posted on 2018-06-19

Cluster analysis can be considered one of the pillars of machine learning, and yet it’s one that’s difficult to talk about.

Profiling Top Kagglers: Martin Henze (AKA Heads or Tails), World's First Kernels Grandmaster

Reposted from: http://blog.kaggle.com/2018/06/19/tales-from-my-first-year-inside-the-head-of-a-recent-kaggle-addict/

Martin Henze

Posted on 2018-06-19

Let me begin by introducing myself: My name is Martin. I’m an astrophysics postdoc working on understanding exploding stars in nearby galaxies. From the very beginning of my studies, I was using data analysis to try to unveil the mysteries of the universe. From deep images taken with ground- and space-based telescopes, through time series measuring the heartbeats of extreme stars, to population correlations probing the fundamental physics behind incredibly powerful eruptions: learning the secrets of a complex cosmos requires all the tools you can get your hands on.

Sent2Vec: An unsupervised approach towards learning sentence embeddings

Reposted from: https://rare-technologies.com/sent2vec-an-unsupervised-approach-towards-learning-sentence-embeddings/

Prerna Kashyap

Posted on 2018-06-19

A comparison of sentence embedding techniques by Prerna Kashyap, our RARE Incubator student. As her graduation project, Prerna implemented sent2vec, a new document embedding model in Gensim, and compared it to existing models like doc2vec and fasttext.

Is it Time to Regulate Bitcoin?

Reposted from: http://datameetsmedia.com/is-it-time-to-regulate-bitcoin/

Pio Calderon

Posted on 2018-06-19

In the financial space, anything unregulated and unregistered tends to cause doubt and unease. In the case of cryptocurrencies such as bitcoin, financial regulators all over the world have started looking for ways to oversee the blockchain, the record of all cryptocurrency transactions, and to address the irregularities presented by these virtual currencies, which largely bypass financial firms, exchanges, and regulated banks. Bitcoin, the most popular of all cryptocurrencies, operates chiefly outside the conventions of the financial system, and this worries regulators because it can potentially be linked to money laundering, tax evasion, fraud, and terrorist funding.

Pivoted document length normalisation

Reposted from: https://rare-technologies.com/pivoted-document-length-normalisation/

Mohit Rathore

Posted on 2018-06-19

As part of the RARE Incubator program, my goal was to add two new features to the existing TF-IDF model in Gensim. One was implementing a SMART information retrieval system (smartirs) scheme [1], and the other was implementing pivoted document length normalization [2].
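For readers unfamiliar with the second feature, here is a minimal sketch of pivoted document length normalization in plain Python. It follows the general formulation in which a document's usual cosine norm is replaced by `(1 - slope) * pivot + slope * old_norm`. It does not use Gensim; the dict-of-weights input, using the corpus-average norm as the pivot, and the default `slope=0.25` are assumptions for illustration, not details taken from the post.

```python
import math


def l2_norm(weights):
    """Return the Euclidean (cosine) norm of a sparse term-weight dict."""
    return math.sqrt(sum(w * w for w in weights.values()))


def pivoted_normalize(weights, pivot, slope=0.25):
    """Divide every weight by a pivoted norm instead of the plain L2 norm.

    The pivoted norm interpolates between a corpus-wide pivot and the
    document's own L2 norm; slope=1.0 reduces to ordinary cosine normalization.
    """
    old_norm = l2_norm(weights)
    pivoted_norm = (1.0 - slope) * pivot + slope * old_norm
    return {term: w / pivoted_norm for term, w in weights.items()}


# Hypothetical toy corpus: each document is already a dict of tf-idf weights.
docs = [
    {"golf": 3.0, "open": 4.0},  # short document, L2 norm 5
    {"golf": 6.0, "open": 8.0},  # long document, L2 norm 10
]

# Assumption: use the average L2 norm over the corpus as the pivot (7.5 here).
pivot = sum(l2_norm(d) for d in docs) / len(docs)

for doc in docs:
    normalized = pivoted_normalize(doc, pivot)
    print(round(l2_norm(normalized), 3))  # prints 0.727, then 1.231
```

With plain cosine normalization both documents would end up with norm 1.0; the pivoted version lets the longer document keep a larger norm than the shorter one, which is how the technique offsets cosine normalization's tendency to favor short documents.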

AI Lab: Learn to Code with the Cutting-Edge Microsoft AI Platform

Reposted from: https://blogs.technet.microsoft.com/machinelearning/2018/06/19/ai-lab-learn-about-experience-code-with-the-cutting-edge-microsoft-ai-platform/

ML Blog Team

Posted on 2018-06-19

This post is authored by Tara Shankar Jana, Senior Technical Product Marketing Manager at Microsoft.

Import AI

Reposted from: https://jack-clark.net/2018/06/18/importai-99-using-ai-to-generate-phishing-urls-evidence-for-how-ai-is-influencing-the-economy-and-using-curiosity-for-self-imitation-learning/

Jack Clark

Posted on 2018-06-18

Auto-generating phishing URLs via AI components: …AI is an omni-use technology, so the same techniques used to spot phishing URLs can also be used to generate phishing URLs… Researchers with the Cyber Threat Analytics division of Cyxtera Technologies have written an analysis of how people might “use AI algorithms to bypass AI phishing detection systems” by creating their own system, called DeepPhish.

DeepPhish: DeepPhish works by taking in a list of fraudulent URLs that have worked successfully in the past, encoding them as one-hot representations, and then training a model to generate new synthetic URLs given a seed sentence. They found that DeepPhish could dramatically improve the chances of a fraudulent URL getting past automated phishing-detection systems, with effectiveness rising from 0.69% (without DeepPhish) to 20.90% (with DeepPhish).

Security people always have the best names: DeepPhish isn’t the only AI “weapon” system recently developed by researchers, the authors note; other tools include Honey-Phish, SNAP_R, and Deep DGA.

Why it matters: This research highlights how AI is an inherently omni-use technology, where the same basic components used to, for instance, train systems to spot potentially fraudulent URLs can also be used to generate plausible-seeming fraudulent URLs.

Read more: DeepPhish: Simulating Malicious AI (PDF).
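DeepPhish is described as one-hot encoding past fraudulent URLs before training a generator on them. As a rough illustration of that encoding step only, here is a minimal Python sketch of character-level one-hot encoding. The character alphabet, the lowercasing, the fixed length of 32, and the `seed_next_pairs` windowing helper are all assumptions made for illustration; none of them are taken from the DeepPhish paper, and no model is trained here.

```python
import string

# Assumed character vocabulary for URLs; the excerpt does not say which
# alphabet DeepPhish actually uses. 26 letters + 10 digits + 9 symbols = 45.
ALPHABET = string.ascii_lowercase + string.digits + ".-_/:?=&%"
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def one_hot_url(url, max_len=32):
    """Encode a URL as max_len rows, each a one-hot vector over ALPHABET.

    The URL is lowercased and truncated to max_len characters; unknown
    characters and padding positions become all-zero rows.
    """
    rows = []
    for ch in url.lower()[:max_len]:
        row = [0] * len(ALPHABET)
        idx = CHAR_TO_INDEX.get(ch)
        if idx is not None:
            row[idx] = 1
        rows.append(row)
    # Pad short URLs with all-zero rows so every sample has the same shape.
    rows.extend([0] * len(ALPHABET) for _ in range(max_len - len(rows)))
    return rows


def seed_next_pairs(url, seed_len=8):
    """Split a URL into (seed, next character) pairs for next-character training.

    Hypothetical windowing scheme illustrating "generate given a seed";
    not DeepPhish's documented procedure.
    """
    return [(url[i:i + seed_len], url[i + seed_len])
            for i in range(len(url) - seed_len)]


encoded = one_hot_url("http://example.com/login")
print(len(encoded), len(encoded[0]))  # prints 32 45
print(seed_next_pairs("http://example.com/login")[0])  # prints ('http://e', 'x')
```

A real generator would feed rows like these (or a learned embedding of them) into a sequence model that predicts the next character from each seed; that part is deliberately left out.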

The Role of Resources in Data Analysis

Reposted from: https://simplystatistics.org/2018/06/18/the-role-of-resources-in-data-analysis/

Roger Peng

Posted on 2018-06-18

BDD100K Blog Update

Reposted from: http://bair.berkeley.edu/blog/2018/06/18/bdd-update/

Unknown

Posted on 2018-06-18

We are pleased by the interest and excitement generated by our BDD100K dataset. Our data release and blog post were covered in an unsolicited article by the UC Berkeley newspaper, the Daily Cal, which was then picked up by other news services without our prompting or intervention. The paper describing this dataset is under review at the ECCV 2018 conference, and we followed that conference’s rules, as communicated to us by the Program Chairs in a prompt email response when we asked for clarification following the reporter’s request (the ECCV PCs replied that ECCV follows CVPR’s long-standing policy). We therefore declined to speak to the reporters after they reached out to us. We did not, and have not, communicated with any media outlets regarding this story.

Docstrings in open source Python

Reposted from: https://rare-technologies.com/docstrings-in-open-source-python/

Dmitry Berdov

Posted on 2018-06-18

Hi everyone, my name is Dmitry Berdov. I’m a graduate student at Ural Federal University, now working in QA test automation. I had no experience writing documentation before joining the RARE Incubator, where my task has been to refactor the Gensim docs and improve their poor state. Now, after several months of shooting myself hard in the foot, I would like to share my insights from this unforgettable process.
