Recently at work I’ve been asked to help some clinicians understand why my risk model classifies specific patients as high risk. Just before this I stumbled across the work of some data scientists at the University of Washington called lime. LIME stands for “Local Interpretable Model-Agnostic Explanations”. The idea is that I can answer the questions I’m getting from clinicians about a specific patient by locally fitting a linear (aka “interpretable”) model in the feature space just around my data point. I decided to pursue lime as a solution, and for the last few months I’ve been focusing on implementing this explainer for my risk model. Happily, I also discovered an R package that implements this solution, which originated in Python.
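To make the "locally fitting a linear model" idea concrete, here is a minimal sketch in Python using only NumPy. It is not the lime package's API or its exact algorithm (which also handles interpretable representations and feature selection). The function `explain_locally`, the black box `toy_risk_model`, and the parameters `scale` and `kernel_width` are all hypothetical names chosen for illustration. The sketch samples points around one data point, weights them by proximity, and fits a weighted linear regression whose coefficients act as the local explanation.

```python
import numpy as np


def explain_locally(predict_fn, x, n_samples=2000, scale=0.5,
                    kernel_width=1.0, seed=0):
    """Fit a proximity-weighted linear surrogate to predict_fn around x.

    Simplified illustration of the idea behind LIME, not the lime package.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)

    # 1. Sample perturbed points in the neighbourhood of x.
    Z = x + rng.normal(0.0, scale, size=(n_samples, x.size))

    # 2. Ask the black-box model for predictions at those points.
    y = predict_fn(Z)

    # 3. Weight each sample by its closeness to x (exponential kernel).
    sq_dist = np.sum((Z - x) ** 2, axis=1)
    w = np.exp(-sq_dist / kernel_width ** 2)

    # 4. Weighted least squares on centred features plus an intercept.
    A = np.hstack([np.ones((n_samples, 1)), Z - x])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[0], coef[1:]


def toy_risk_model(X):
    """Hypothetical black box: a nonlinear 'risk score' in (0, 1)."""
    X = np.atleast_2d(X)
    return 1.0 / (1.0 + np.exp(-(2.0 * X[:, 0] - X[:, 1] ** 2)))


# Explain the prediction for one (made-up) patient with two features.
intercept, weights = explain_locally(toy_risk_model, [0.0, 1.0])
print("local intercept:", intercept)
print("local feature weights:", weights)
```

The signs and sizes of `weights` say which features push this particular prediction up or down near the chosen point. They describe the model only in that neighbourhood, not globally.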
Document worth reading: “A Tutorial on Modeling and Inference in Undirected Graphical Models for Hyperspectral Image Analysis”
Undirected graphical models have been successfully used to jointly model the spatial and the spectral dependencies in earth observing hyperspectral images. They produce less noisy, smooth, and spatially coherent land cover maps and give top accuracies on many datasets. Moreover, they can easily be combined with other state-of-the-art approaches, such as deep learning. This has made them an essential tool for remote sensing researchers and practitioners. However, graphical models have not been easily accessible to the larger remote sensing community as they are not discussed in standard remote sensing textbooks and not included in the popular remote sensing software and toolboxes. In this tutorial, we provide a theoretical introduction to Markov random fields and conditional random fields based spatial-spectral classification for land cover mapping along with a detailed step-by-step practical guide on applying these methods using freely available software. Furthermore, the discussed methods are benchmarked on four public hyperspectral datasets for a fair comparison among themselves and easy comparison with the vast number of methods in literature which use the same datasets. The source code necessary to reproduce all the results in the paper is published on-line to make it easier for the readers to apply these techniques to different remote sensing problems.
Document worth reading: “An Overview of Blockchain Integration with Robotics and Artificial Intelligence”
Blockchain technology is growing every day at a fast-paced rhythm and it’s possible to integrate it with many systems, namely Robotics with AI services. However, this is still a recent field and there isn’t yet a clear understanding of what it could potentially become. In this paper, we conduct an overview of many different methods and platforms that try to leverage the power of blockchain in robotic systems, to improve AI services or to solve problems that are present in the major blockchains, which can lead to the ability to create robotic systems with increased capabilities and security. We present an overview, discuss the methods and conclude the paper with our view on the future of the integration of these technologies.
If you did not already know
QuAC
We present QuAC, a dataset for Question Answering in Context that contains 14K information-seeking QA dialogs (100K questions in total). The interactions involve two crowd workers: (1) a student who poses a sequence of freeform questions to learn as much as possible about a hidden Wikipedia text, and (2) a teacher who answers the questions by providing short excerpts from the text. QuAC introduces challenges not found in existing machine comprehension datasets: its questions are often more open-ended, unanswerable, or only meaningful within the dialog context, as we show in a detailed qualitative evaluation. We also report results for a number of reference models, including a recently state-of-the-art reading comprehension architecture extended to model dialog context. Our best model underperforms humans by 20 F1, suggesting that there is significant room for future work on this data. Dataset, baseline, and leaderboard are available at quac.ai. …
Melanie Miller says, “As someone who has worked in A.I. for decades, I’ve witnessed the failure of similar predictions of imminent human-level A.I., and I’m certain these latest forecasts will fall short as well.”
Melanie Miller’s piece, Artificial Intelligence Hits the Barrier of Meaning (NY Times behind limited paywall), is spot-on regarding the hype surrounding the current A.I. boom. It’s soon to come out in book length from FSG, so I suspect I’ll hear about it again in the New Yorker.
AzureR: R packages to control Azure services
by Hong Ooi, senior data scientist, Microsoft Azure
Your Client Engagement Program Isn't Doing What You Think It Is.
Amazing products without engaged clients are bound to fail, and companies claiming to have found the single best solution to client engagement are only fooling themselves.
K-means clustering with Amazon SageMaker
Amazon SageMaker provides several built-in machine learning (ML) algorithms that you can use for a variety of problem types. These algorithms provide high-performance, scalable machine learning and are optimized for speed, scale, and accuracy. Using these algorithms you can train on petabyte-scale data. They are designed to provide up to 10x the performance of the other available implementations. In this blog post, we will explore k-means, which is an unsupervised learning algorithm for clustering. In addition, we’ll walk through the details of the Amazon SageMaker built-in k-means algorithm.
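For readers who have not met k-means before, the following is a minimal NumPy sketch of the textbook version (Lloyd's algorithm). It is not the Amazon SageMaker implementation and does not use the SageMaker API. The function name `kmeans` and the synthetic two-cluster data are assumptions made for illustration. The loop alternates between assigning each point to its nearest centre and moving each centre to the mean of its assigned points.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm: returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)

    # Initialise centres with k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]

    for _ in range(n_iter):
        # Assignment step: index of the nearest centre for every point.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)

        # Update step: each centre moves to the mean of its points
        # (an empty cluster keeps its previous centre).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])

        # Stop once the centres no longer move.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers

    return centers, labels


# Synthetic example: two well-separated blobs of 20 points each.
data_rng = np.random.default_rng(1)
blob_a = data_rng.normal(0.0, 0.1, size=(20, 2))
blob_b = data_rng.normal(5.0, 0.1, size=(20, 2))
points = np.vstack([blob_a, blob_b])

centers, labels = kmeans(points, k=2)
print(centers)
print(labels)
```

No labels are supplied at any point, which is what makes the method unsupervised: the grouping comes only from distances between the points themselves.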
Deep Learning Performance Cheat Sheet
By Chris Dossman, Machine Learning Person, Future asteroid miner.
Practical statistics books for software engineers
So you have read my (draft) book on evidence-based software engineering and want to learn more about the statistical techniques used, but are not interested in lots of detailed mathematics. What books do I suggest?