Timothy Brathwaite sends along this wonderfully titled article (also here, and here’s the replication code), which begins:
If you did not already know
Manifold
Interpretation and diagnosis of machine learning models have gained renewed interest in recent years with breakthroughs in new approaches. We present Manifold, a framework that utilizes visual analysis techniques to support interpretation, debugging, and comparison of machine learning models in a more transparent and interactive manner. Conventional techniques usually focus on visualizing the internal logic of a specific model type (i.e., deep neural networks), lacking the ability to extend to a more complex scenario where different model types are integrated. To this end, Manifold is designed as a generic framework that does not rely on or access the internal logic of the model and solely observes the input (i.e., instances or features) and the output (i.e., the predicted result and probability distribution). We describe the workflow of Manifold as an iterative process consisting of three major phases that are commonly involved in the model development and diagnosis process: inspection (hypothesis), explanation (reasoning), and refinement (verification). The visual components supporting these tasks include a scatterplot-based visual summary that overviews the models’ outcome and a customizable tabular view that reveals feature discrimination. We demonstrate current applications of the framework on the classification and regression tasks and discuss other potential machine learning use scenarios where Manifold can be applied. …
What’s new on arXiv
Uncertainty Quantification for Kernel Methods
R Packages worth a look
Bayesian Online Changepoint Detection (ocp)
Implements the Bayesian online changepoint detection method by Adams and MacKay (2007).
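For a quick feel for the package, here is a minimal sketch; I am assuming `onlineCPD()` is the package’s main entry point, so verify against the package documentation before relying on it:

```r
# Hedged sketch: Bayesian online changepoint detection with ocp
# (assumes install.packages("ocp"); onlineCPD() is the entry point
# as I understand the package -- check its docs).
library(ocp)

set.seed(42)
x   <- c(rnorm(100, mean = 0), rnorm(100, mean = 5))  # mean shift at t = 101
res <- onlineCPD(x)
str(res)  # inspect the returned object, including the detected changepoints
```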
Part 5: Code corrections to optimism corrected bootstrapping series
The truth is out there, R readers, but often it is not what we have been led to believe. The previous post examined the strong positive-results bias in optimism corrected bootstrapping (a method of assessing a machine learning model’s predictive power) as p increases (with completely random features). Two implementations of the method were given: the first has a slight error, while the second seems fine. The trend is still the same with the corrected code; the problem with my code was that I did not set ‘replace=TRUE’ in the call to the ‘sample’ function.
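To illustrate why that flag matters (a generic base-R snippet, not the series’ own code): without `replace = TRUE`, `sample` merely permutes the indices, so every bootstrap “resample” is just the original data reshuffled.

```r
n <- 10
sample(n, n)                  # no replacement: a permutation of 1:n,
                              # i.e. the original data in a different order
sample(n, n, replace = TRUE)  # a true bootstrap resample: duplicates allowed,
                              # with roughly 37% of indices left out on average
```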
Document worth reading: “Learnable: Theory vs Applications”
Two different views on the machine learning problem, Applied learning (machine learning with business applications) and Agnostic PAC learning, are formalized and compared here. I show that, under some conditions, the theory of PAC learnability provides a way to solve the Applied learning problem. However, the theory requires training sets so large that it would make the learning practically useless. I suggest shedding some theoretical misconceptions about learning to make the theory more aligned with the needs and experience of practitioners. Learnable: Theory vs Applications
Deep Learning for Media Content
Machine learning continues to make its way into the arts, most recently in film and TV.
Manning Countdown to 2019 – Big Deals on AI, Data Science, Machine Learning books and videos
Sponsored Post. The end of the year represents a chance for new beginnings, and if you’re looking to start 2019 with your tech skills as good as they can be, then Manning Publications is happy to help.
Using emojis as scatterplot points
Recently I wanted to learn how to use emojis as points in a scatterplot. It seems like the emojifont package is a popular way to do it. However, I couldn’t seem to get it to work on my machine (perhaps I need to install the font manually?). The other package I found was emoGG; this post shows how to use this package. (For another example involving fire stations, see this script.)
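For reference, a minimal sketch along the lines of the emoGG README (assuming the package is installed from GitHub with devtools::install_github("dill/emoGG")):

```r
library(ggplot2)
library(emoGG)  # assumed installed via devtools::install_github("dill/emoGG")

# Replace the usual point glyphs with an emoji, specified by unicode code
ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_emoji(emoji = "1f337")  # tulip
```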
Part 4: Why does bias occur in optimism corrected bootstrapping?
In the previous parts of the series we demonstrated a positive-results bias in optimism corrected bootstrapping by simply adding random features to our labels. This problem is due to an ‘information leak’ in the algorithm: the training and test datasets are not kept separate when estimating the optimism. Because of this, the optimism can, under some conditions, be badly underestimated. Let’s analyse the code; it is pretty straightforward to understand, and then we can see where the problem originates.
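As a reference point, here is a minimal, self-contained sketch of the optimism-corrected bootstrap on purely random data (an illustration of the algorithm’s structure, not the series’ exact code): each bootstrap model is scored both on its own bootstrap sample and on the original data, and the mean difference is subtracted from the apparent performance.

```r
set.seed(1)
n <- 100; p <- 5
dat <- data.frame(y = rbinom(n, 1, 0.5),        # completely random labels
                  matrix(rnorm(n * p), n, p))   # completely random features

# classification accuracy of a fitted logistic model on data d
acc <- function(fit, d) mean((predict(fit, d, type = "response") > 0.5) == d$y)

fit0     <- glm(y ~ ., data = dat, family = binomial)
apparent <- acc(fit0, dat)                      # apparent (training) accuracy

optimism <- replicate(50, {
  b   <- dat[sample(n, n, replace = TRUE), ]    # note replace = TRUE
  fit <- glm(y ~ ., data = b, family = binomial)
  acc(fit, b) - acc(fit, dat)                   # bootstrap score minus original
})

apparent - mean(optimism)                       # optimism-corrected accuracy
```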