I’m excited to announce that the University of Sheffield’s Department of Computer Science will make two appointments in machine learning this year.
Grazing and Calculus
Old MacDonald had a farm. On that farm was a big field.
Future Debates: This House Believes An Artificial Intelligence will Benefit Society
This evening I’m participating in a debate on AI. We get seven minutes each for our initial statement. This is the script for mine.
Discovering and understanding patterns in highly dimensional data
Dimensionality reduction is a critical component of any solution dealing with massive data collections. Being able to sift through a mountain of data efficiently in order to find the key descriptive, predictive, and explanatory features of the collection is a fundamental capability for coping with the Big Data avalanche. Identifying the most interesting dimensions of the data is especially valuable when visualizing high-dimensional (high-variety) big data.
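As a concrete illustration of finding the "most interesting dimensions" (my sketch, using PCA as one common choice of dimensionality-reduction technique; the post may cover others):

```python
import numpy as np

# Hypothetical example: project 50-dimensional data onto its two most
# descriptive directions with PCA. The data here is synthetic, with two
# dimensions given deliberately inflated variance.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
X[:, 0] *= 10.0  # this dimension carries most of the variance
X[:, 1] *= 5.0

Xc = X - X.mean(axis=0)                      # centre the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt[:2]                          # top-2 principal directions
X2 = Xc @ components.T                       # 2-D embedding for visualization

explained = S**2 / np.sum(S**2)              # variance explained per component
print(X2.shape, explained[:2])
```

The two-column embedding `X2` is what you would hand to a scatter plot when visualizing high-variety data.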
Histogram intersection for change detection
The need for anomaly and change detection pops up in almost any data-driven system or quality-monitoring application. Typically, there is a set of metrics that must be monitored and an alert raised if their values deviate from the expected range. Depending on the task at hand, this can happen at the level of individual data points (anomaly detection) or at the population level, where we want to know whether the underlying distribution has changed (change detection).
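A minimal sketch of the idea in the title (my code, with assumed bin counts and threshold): the intersection of two normalized histograms is 1.0 when the distributions match and shrinks toward 0.0 as they drift apart, so a low intersection can trigger a change alert.

```python
import numpy as np

def histogram_intersection(a, b, bins=20, value_range=(0.0, 1.0)):
    """Intersection of the normalized histograms of samples a and b."""
    ha, _ = np.histogram(a, bins=bins, range=value_range)
    hb, _ = np.histogram(b, bins=bins, range=value_range)
    ha = ha / ha.sum()               # normalize so each histogram sums to 1
    hb = hb / hb.sum()
    return np.minimum(ha, hb).sum()  # in [0, 1]; 1 means identical

rng = np.random.default_rng(1)
baseline = rng.beta(2, 5, size=5000)   # reference window of a metric
same     = rng.beta(2, 5, size=5000)   # new window, same distribution
shifted  = rng.beta(5, 2, size=5000)   # new window, distribution changed

score_same = histogram_intersection(baseline, same)
score_shifted = histogram_intersection(baseline, shifted)
print(score_same, score_shifted)  # high vs. low -> alert on the second
```

In practice the alert threshold (and the bin count) would be tuned per metric; those choices are not specified in the excerpt.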
A Variant on “Statistically Controlling for Confounding Constructs is Harder than you Think”
Yesterday, a coworker pointed me to a new paper by Jacob Westfall and Tal Yarkoni called “Statistically controlling for confounding constructs is harder than you think”. I quite like the paper, which describes some problems that arise when drawing conclusions about the relationships between theoretical constructs using only measurements of observables that are, at best, approximations to those theoretical constructs.
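The core problem can be shown in a toy simulation (my sketch, not the authors' code): a confounder drives both "treatment" and outcome, and controlling for a noisy *measurement* of that confounder leaves a spurious effect behind.

```python
import numpy as np

# z is the true confounding construct; x and y are both caused only by z,
# so x has no direct effect on y. z_obs is an unreliable proxy for z.
rng = np.random.default_rng(2)
n = 100_000
z = rng.normal(size=n)
x = z + rng.normal(size=n)
y = z + rng.normal(size=n)
z_obs = z + rng.normal(size=n)   # measurement error on the construct

def coef_of_x(control):
    """OLS of y on [1, x, control]; return the coefficient on x."""
    A = np.column_stack([np.ones(n), x, control])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[1]

print(coef_of_x(z))      # ~0: controlling for the true construct works
print(coef_of_x(z_obs))  # clearly positive: residual confounding remains
```

Under these unit-variance assumptions the noisy-proxy regression converges to an x coefficient of 1/3 rather than 0, which is exactly the kind of misleading "controlled" result the paper warns about.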
How to Code and Understand DeepMind's Neural Stack Machine
Summary: I learn best with toy code that I can play with. This tutorial teaches DeepMind’s Neural Stack machine via a very simple toy example: a short Python implementation. I will also explain my thought process along the way for reading and implementing research papers from scratch, which I hope you will find useful.
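To give a flavour of what the tutorial builds, here is a minimal NumPy sketch of the Neural Stack from Grefenstette et al. (2015), "Learning to Transduce with Unbounded Memory" (my simplification, not the tutorial's code): push and pop take *continuous* strengths, which is what makes the stack differentiable.

```python
import numpy as np

class NeuralStack:
    def __init__(self):
        self.V = []   # stored value vectors
        self.s = []   # their continuous strengths

    def step(self, v, d, u):
        """Pop with strength u, push vector v with strength d, then read."""
        # Pop: consume strength u from the top downwards, using the
        # pre-update strengths (the paper's update is simultaneous).
        old = list(self.s)
        for i in range(len(old)):
            above = sum(old[i + 1:])
            self.s[i] = max(0.0, old[i] - max(0.0, u - above))
        # Push the new value with strength d.
        self.V.append(np.asarray(v, dtype=float))
        self.s.append(d)
        # Read: weighted sum of the top ~1.0 units of strength.
        r = np.zeros_like(self.V[0])
        for i in range(len(self.V)):
            above = sum(self.s[i + 1:])
            r += min(self.s[i], max(0.0, 1.0 - above)) * self.V[i]
        return r

stack = NeuralStack()
stack.step([1.0, 0.0], d=1.0, u=0.0)        # push a
r = stack.step([0.0, 1.0], d=1.0, u=0.0)    # push b; read gives b
r2 = stack.step([0.0, 0.0], d=0.0, u=1.0)   # pop b; read gives a again
```

In the full model, `v`, `d`, and `u` are emitted by a recurrent controller and everything is trained end to end; this sketch only shows the stack mechanics.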
Oil Changes, Gas Mileage, and my Unreliable Gut
Kia recommends that I get the oil in my 2009 Rio changed every 7,500 miles. But, anecdotally, it seemed that I always got better gas mileage right after an oil change than right before the next one was due. So I got to wondering: if an oil change costs $20 but saves me a few MPG, is it cheaper overall to change my oil sooner than every 7,500 miles? If so, where’s the optimal point?
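The break-even question can be set up as a quick back-of-the-envelope model. All numbers below (gas price, fresh/stale MPG, linear decay) are my assumptions for illustration, not measurements from the post:

```python
import numpy as np

OIL_COST = 20.00        # dollars per oil change
GAS_PRICE = 3.50        # dollars per gallon (assumed)
MPG_FRESH = 32.0        # MPG right after an oil change (assumed)
MPG_DROP = 3.0          # MPG lost over the full 7,500 miles (assumed)
FULL_INTERVAL = 7500.0

def cost_per_mile(interval):
    """Fuel + oil cost per mile when changing oil every `interval` miles,
    assuming MPG decays linearly with miles since the last change."""
    avg_mpg = MPG_FRESH - MPG_DROP * (interval / FULL_INTERVAL) / 2
    return GAS_PRICE / avg_mpg + OIL_COST / interval

intervals = np.arange(1000, 7501, 250)
costs = [cost_per_mile(i) for i in intervals]
best = intervals[int(np.argmin(costs))]
print(best, cost_per_mile(best), cost_per_mile(FULL_INTERVAL))
```

Under these made-up numbers the optimum lands a bit sooner than 7,500 miles, but the per-mile saving is tiny, and the answer is very sensitive to how big the MPG drop really is, which is presumably what the post goes on to measure.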
Science Week Talk 2016
I’m doing a public event for Science Week this year, on 17th March in the Diamond building. More details and seat bookings are here.
Guide to an in-depth understanding of logistic regression
When faced with a new classification problem, machine learning practitioners have a dizzying array of algorithms from which to choose: Naive Bayes, decision trees, Random Forests, Support Vector Machines, and many others. Where do you start? For many practitioners, the first algorithm they reach for is one of the oldest in the field: logistic regression.
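For readers who have never seen it fitted by hand, here is a minimal logistic regression trained by gradient descent on synthetic data (a sketch of the classic algorithm, not any particular library's implementation or the guide's own code):

```python
import numpy as np

# Synthetic binary classification data: labels depend on a linear score.
rng = np.random.default_rng(3)
X = rng.normal(size=(500, 2))
true_w = np.array([2.0, -1.0])
y = (X @ true_w + rng.normal(scale=0.5, size=500) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(2)
b = 0.0
lr = 0.1
for _ in range(2000):
    p = sigmoid(X @ w + b)           # predicted P(y = 1 | x)
    grad_w = X.T @ (p - y) / len(y)  # gradient of the mean log-loss
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

accuracy = np.mean((sigmoid(X @ w + b) > 0.5) == (y == 1))
print(w, b, accuracy)
```

The recovered weights point in the same direction as `true_w` (logistic regression identifies the decision boundary, not the scale), which is the kind of interpretability that keeps it a popular first choice.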