At: Global Legal Entity Identifier Foundation (GLEIF)
Location: Frankfurt, Germany
Web: www.gleif.org/en/
Position: Data Analyst
Instance segmentation with OpenCV
Adrian Rosebrock
Posted on
What's new on arXiv
A Survey on Spark Ecosystem for Big Data Processing
R Packages worth a look
Variance Stabilizing Transformations for Single Cell UMI Data (sctransform)
A normalization method for single-cell UMI count data using a variance stabilizing transformation. The transformation is based on a negative binomial r …
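For intuition, here is a minimal Python sketch of the general idea behind such a transformation: Pearson residuals under a negative binomial model, whose variance is mu + mu^2/theta. The counts, mu, and theta below are invented for illustration; sctransform itself is an R package that estimates these parameters per gene.

```python
# Sketch of a negative-binomial Pearson residual, the rough shape of a
# variance stabilizing transformation for count data. All numbers here are
# made up; sctransform fits mu and theta per gene from real UMI counts.

import math

def nb_pearson_residual(count: float, mu: float, theta: float) -> float:
    """Pearson residual of a count under NB(mu, theta): var = mu + mu^2/theta."""
    variance = mu + mu ** 2 / theta
    return (count - mu) / math.sqrt(variance)

# Toy UMI counts for one gene across three cells, with assumed mu and theta.
counts = [0, 3, 12]
mu, theta = 4.0, 10.0
print([round(nb_pearson_residual(c, mu, theta), 2) for c in counts])
```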
Import AI: 122: Google obtains new ImageNet state-of-the-art with GPipe; drone learns to land more effectively than PD controller policy; and Facebook releases its ‘CherryPi’ StarCraft bot
Google obtains new ImageNet state-of-the-art accuracy with mammoth networks trained via ‘GPipe’ infrastructure:
…If you want to industrialize AI, you need to build infrastructure like GPipe…
**Parameter growth = performance growth:** The researchers note that the winner of the 2014 ImageNet competition had 4 million parameters in its model, while the winner of the 2017 challenge had 145.8 million parameters, a 36X increase in three years. GPipe, by comparison, can support models of up to almost 2 billion parameters across 8 accelerators.
**Pipeline parallelism via GPipe:** GPipe is a distributed ML library that uses synchronous mini-batch gradient descent for training. It is designed to spread workloads across heterogeneous hardware systems (multiple types of chips) and comes with a set of built-in features that let it scale up model training efficiently, with the researchers reporting a (very rare) near-linear speedup: “with 4 times more accelerators we can achieve a 3.5 times speedup for training giant neural networks [with GPipe],” they write.
**Results:** To test how effective GPipe is, the researchers trained ResNet and AmoebaNet (the previous ImageNet SOTA) networks on it, running the experiments on TPU-V2s, each of which has 8 accelerator cores and an aggregate memory of 64GB. Using this technique they were able to train a new ImageNet system with a state-of-the-art Top-1 accuracy of 84.3% (up from 82.7%) and a Top-5 accuracy of 97%.
**Why it matters:** “Our work validates the hypothesis that bigger models and more computation would lead to higher model quality,” write the researchers. This trend of research bifurcating into large-compute and small-compute domains has significant ramifications for the ability of smaller entities (for instance, startups) to compete effectively with organizations that have access to large computational infrastructure (eg, Google). A more troubling effect, with long-term negative consequences, is that at these compute scales it is almost impossible for academia to do research at the same scale as corporate research entities. I continue to worry that this will lead to a splitting of the AI research community and potentially the creation of the sort of factionalism and ‘us vs them’ attitude seen elsewhere in contemporary life. Companies will seek to ameliorate this inequality of compute by releasing the artifacts of compute (eg, pre-trained models). Though this will go some way to empowering researchers, it will fail to deal with the underlying problems, which are systemic and likely require a policy solution (aka, more money for academia, and so on).
**Read more:** GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism (Arxiv).
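To make the pipelining idea concrete, here is a minimal Python sketch of the scheduling arithmetic behind micro-batch pipeline parallelism. It is not GPipe’s actual API; the stage and micro-batch counts are assumptions chosen for illustration.

```python
# Sketch: why splitting a mini-batch into micro-batches keeps a pipeline of
# accelerators busy. Each stage is assumed to take one time step per
# micro-batch; real per-stage costs and GPipe's re-materialization are ignored.

def pipelined_steps(num_stages: int, num_microbatches: int) -> int:
    """Forward-pass time steps when stages run concurrently: the first
    micro-batch needs num_stages steps to fill the pipe, and each later
    micro-batch completes one step after the previous one."""
    return num_stages + num_microbatches - 1

def serial_steps(num_stages: int, num_microbatches: int) -> int:
    """Time steps if each micro-batch traverses all stages before the next
    one starts (no pipelining)."""
    return num_stages * num_microbatches

if __name__ == "__main__":
    stages, microbatches = 4, 32
    print("pipelined:", pipelined_steps(stages, microbatches))  # 35 steps
    print("serial:   ", serial_steps(stages, microbatches))     # 128 steps
```

With enough micro-batches the pipeline is full almost all of the time, which is the intuition behind the near-linear speedups the researchers report.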
You Can’t Do AI Without Augmented Analytics and AutoML
lynn.heidmann@dataiku.com (Lynn Heidmann)
Posted on
By now, the race to AI - and especially to using AI in the enterprise - is officially on. But if AutoML is not in your toolbox or if you’re not at least thinking about aligning the data organization around augmented analytics, you won’t get very far.
Cathy O’Neil discusses the current lack of fairness in artificial intelligence and much more.
Physics-Based Learned Design: Teaching a Microscope How to Image
3 Challenges for Companies Tackling Data Science
Sponsored Post. By Seth Deland, Product Marketing Manager of Data Analytics, MathWorks.
Building Blocks of Decision Tree
It’s been said that Data Scientist is the “sexiest job of the 21st century.” One main reason is the sheer amount of data available: we are producing data at an unprecedented rate. Alongside this access to data come sophisticated algorithms such as decision trees and random forests. With so much data available, the most intricate part is selecting the right algorithm for the problem. Each model has its own pros and cons and should be chosen based on the type of problem at hand and the data available.
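As an illustration of one such building block, the sketch below scores a candidate split by Gini impurity, the criterion trees such as CART use to choose splits. The labels and the split are toy assumptions, not taken from the post.

```python
# Sketch: choosing a decision-tree split by Gini impurity (lower is purer).

from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_impurity(left, right):
    """Size-weighted Gini impurity after a candidate split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

# Toy example: a split that isolates most of class "b" lowers impurity.
left, right = ["a", "a", "a"], ["b", "b", "a"]
print(round(gini(left + right), 3))           # impurity before: 0.444
print(round(split_impurity(left, right), 3))  # impurity after: 0.222
```

A tree grows by greedily picking, at each node, the feature and threshold whose split minimizes this weighted impurity.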