In statistical work in the age of big data we often get hung up on differences that are statistically significant (reliable enough to show up again and again in repeated measurements), but clinically insignificant (visible in aggregation, but too small to make any real difference to individuals).
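To make the distinction concrete, here is a minimal sketch using a two-sample z-test on summary statistics. The numbers (a 0.5-unit difference in means, a standard deviation of 15, the sample sizes) and the function names are hypothetical, chosen only for illustration. The same small difference goes undetected with 100 subjects per group but becomes overwhelmingly "significant" with a million, while the standardized effect size stays tiny.

```python
import math


def two_sample_z(delta, sigma, n):
    """Two-sided z-test for a difference in means between two groups.

    Assumes (for illustration) equal group sizes n and a known,
    shared standard deviation sigma.
    """
    standard_error = sigma * math.sqrt(2.0 / n)
    z = delta / standard_error
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


def cohens_d(delta, sigma):
    """Standardized effect size: difference in means over the standard deviation."""
    return delta / sigma


# Hypothetical numbers: a 0.5-unit difference on a scale with sd 15.
delta, sigma = 0.5, 15.0

z_small, p_small = two_sample_z(delta, sigma, n=100)
z_big, p_big = two_sample_z(delta, sigma, n=1_000_000)
effect = cohens_d(delta, sigma)

print(f"n=100:       z={z_small:.2f}, p={p_small:.3f}")
print(f"n=1,000,000: z={z_big:.2f}, p={p_big:.3g}")
print(f"Cohen's d = {effect:.3f} (unchanged by sample size)")
```

The p-value shrinks as the sample grows, but Cohen's d does not move, which is why effect size is usually the better guide to whether a difference matters to an individual.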
Six Dice Betting Game
Rules to Learn By
Longtime readers of this newsletter know that we follow the Fairness, Accountability, and Transparency in Machine Learning conversation closely (see here and here). These conversations address and attempt to mitigate the potential for technical systems to produce unfairness. Much of this unfairness arises from how algorithmic systems might perpetuate historical inequalities or otherwise produce discriminatory effects. This conversation is broader than could be encapsulated in any newsletter, but we want to point to some recommendations that have come out of this conversation to demonstrate how we think through the challenges of building models that don’t learn or perpetuate bias. We embrace these challenges not just because of an overriding ethical commitment to build safely, but also because addressing these challenges helps us build things that work better than they otherwise might.
Convolve all the things
While deep learning can be applied generally, much of the excitement around it has stemmed from significant breakthroughs in two main areas: computer vision and natural language processing. Practitioners have typically applied convolutional neural networks (CNNs) to spatial data (e.g. images) and recurrent neural networks (RNNs) to sequence data (e.g. text). However, a recent research paper has shown that convolutional neural networks are not only capable of performing well on sequential data tasks, but also have inherent advantages over recurrent networks and may be a better default starting point.
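For readers who have not seen a convolution applied to a sequence, the following is a minimal sketch of a causal (optionally dilated) 1D convolution in plain Python. This is a common way to apply convolutions to sequence data, but it is an assumption for illustration, not necessarily the architecture used in the paper; the function name `causal_conv1d` and the example values are hypothetical.

```python
def causal_conv1d(x, kernel, dilation=1):
    """Causal 1D convolution over a sequence.

    The output at step t depends only on x[t], x[t - dilation],
    x[t - 2 * dilation], ... so no information leaks from the future.
    Positions before the start of the sequence are treated as zeros.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = t - j * dilation
            if idx >= 0:  # implicit zero padding on the left
                acc += weight * x[idx]
        out.append(acc)
    return out


sequence = [1.0, 2.0, 3.0, 4.0]

# Kernel of width 2: each output sums the current and previous element.
plain = causal_conv1d(sequence, [1.0, 1.0])

# Dilation 2: each output sums the current element and the one two steps back.
dilated = causal_conv1d(sequence, [1.0, 1.0], dilation=2)

print(plain)    # [1.0, 3.0, 5.0, 7.0]
print(dilated)  # [1.0, 2.0, 4.0, 6.0]
```

Unlike a recurrent network, every output position here can be computed independently of the others, so the whole sequence can be processed in parallel. Stacking layers with growing dilation widens the span of history each output can see.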
Data Science in 30 Minutes: Deep Learning to Detect Fake News with Uber ATG Head of Data Science, Mike Tamir
This FREE webinar will take place LIVE online on August 21st at 5:30PM ET. Register below now, space is limited!
BDD100K: A Large-scale Diverse Driving Video Database
Update 06/18/2018: please also check our follow-up blog post after reading this.
The Data Incubator Unofficial Frequently Asked Questions
About a year ago I wrote a review of The Data Incubator (updated review is here). I always know when the Data Incubator application season is here because I always get a few people who have found my blog reaching out with questions about the process. I decided to put together a short list of some of the most common questions I get asked.
How to Overcome Imposter Syndrome For Good
Some updates
The blog has been eerily silent for most of 2018. Here is why:
Import AI:
Satellite imagery competition challenges systems to outline buildings, segment roads, and analyze land use patterns:
…DeepGlobe competition and associated datasets designed to speed progress on strategic domain…
Researchers with Facebook, DigitalGlobe, CosmiQ Works, Wageningen University, and the MIT Media Lab have revealed DeepGlobe 2018, a satellite imagery competition with three tasks and associated datasets. DeepGlobe is intended to yield improvements in the automated analysis of satellite images for disaster response, planning, and object detection. DeepGlobe 2018 has three tracks with linked datasets: road extraction (8,570 images), building detection (24,586 ‘scenes’, each equivalent to a 650×650 image), and land cover classification (1,146 satellite images).
Results: The researchers introduce baseline performance numbers for each task. For road extraction they used a modified version of DeepLab with a ResNet18 backbone and Focal Loss, obtaining an Intersection over Union (IoU) score of 0.545. For building detection they used the top-scoring solutions from a competition held on the same dataset in 2017, which obtain IoU scores as high as 0.88 on cities like Las Vegas and as low as 0.54 on Khartoum. For land cover classification they implemented a DeepLab system with a ResNet18 backbone and atrous spatial pyramid pooling (ASPP) to obtain an IoU score of 0.43.
Why it matters: AI will increase the automated analysis capabilities people and nations can wield over their satellite imagery repositories. Progress in this domain directly influences geopolitics by giving rise to new techniques that different nations can use in conjunction with satellite data to watch and react to the world.
Read more: DeepGlobe 2018: A Challenge to Parse the Earth through Satellite Images (Arxiv).
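As a reference for the IoU scores quoted above, here is a minimal sketch of Intersection over Union for two binary masks. The function name, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions for illustration; the competition's own scoring code may differ, for example by averaging over classes or matching individual building footprints.

```python
def intersection_over_union(pred_mask, true_mask):
    """IoU between two binary masks given as equally sized nested lists of 0/1.

    IoU = |pred AND true| / |pred OR true|.
    Returns 1.0 when both masks are empty (an assumed convention).
    """
    intersection = 0
    union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    return intersection / union if union else 1.0


# Toy 2x2 masks: the prediction marks the top row, the truth marks the left column.
predicted = [[1, 1],
             [0, 0]]
actual = [[1, 0],
          [1, 0]]

score = intersection_over_union(predicted, actual)
print(f"IoU = {score:.3f}")  # one shared pixel out of three marked pixels
```

A score of 1.0 means the predicted and true regions coincide exactly, and 0.0 means they do not overlap at all.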