At: Global Legal Entity Identifier Foundation (GLEIF)
Location: Frankfurt, Germany
Web: www.gleif.org/en/
Position: Data Analyst
Instance segmentation with OpenCV
Adrian Rosebrock
Posted on
What's new on arXiv
A Survey on Spark Ecosystem for Big Data Processing
R Packages worth a look
Variance Stabilizing Transformations for Single Cell UMI Data (sctransform)
A normalization method for single-cell UMI count data using a variance stabilizing transformation. The transformation is based on a negative binomial r …
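For intuition, here is a minimal Python sketch of the general idea behind such a transformation: Pearson residuals under a negative binomial model, whose variance is mu + mu^2/theta. The counts, mu, and theta below are invented for illustration; sctransform itself is an R package that estimates these parameters per gene.

```python
# Sketch of a negative-binomial Pearson residual, the rough shape of a
# variance stabilizing transformation for count data. All numbers here are
# made up; sctransform fits mu and theta per gene from real UMI counts.

import math

def nb_pearson_residual(count: float, mu: float, theta: float) -> float:
    """Pearson residual of a count under NB(mu, theta): var = mu + mu^2/theta."""
    variance = mu + mu ** 2 / theta
    return (count - mu) / math.sqrt(variance)

# Toy UMI counts for one gene across three cells, with assumed mu and theta.
counts = [0, 3, 12]
mu, theta = 4.0, 10.0
print([round(nb_pearson_residual(c, mu, theta), 2) for c in counts])
```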
Import AI: 122: Google obtains new ImageNet state-of-the-art with GPipe; drone learns to land more effectively than PD controller policy; and Facebook releases its ‘CherryPi’ StarCraft bot
Google obtains new ImageNet state-of-the-art accuracy with mammoth networks trained via ‘GPipe’ infrastructure:
…If you want to industrialize AI, you need to build infrastructure like GPipe…
**Parameter growth = performance growth:** The researchers note that the winner of the 2014 ImageNet competition had 4 million parameters in its model, while the winner of the 2017 challenge had 145.8 million parameters, a 36X increase in three years. GPipe, by comparison, can support models of up to almost 2 billion parameters across 8 accelerators.
**Pipeline parallelism via GPipe:** GPipe is a distributed ML library that uses synchronous mini-batch gradient descent for training. It is designed to spread workloads across heterogeneous hardware systems (multiple types of chips) and comes with a set of built-in features that let it scale up model training efficiently, with the researchers reporting a (very rare) near-linear speedup: “with 4 times more accelerators we can achieve a 3.5 times speedup for training giant neural networks [with GPipe],” they write.
**Results:** To test how effective GPipe is, the researchers trained ResNet and AmoebaNet (the previous ImageNet SOTA) networks on it, running the experiments on TPU-V2s, each of which has 8 accelerator cores and an aggregate memory of 64GB. Using this technique they were able to train a new ImageNet system with a state-of-the-art Top-1 accuracy of 84.3% (up from 82.7%) and a Top-5 accuracy of 97%.
**Why it matters:** “Our work validates the hypothesis that bigger models and more computation would lead to higher model quality,” write the researchers. This trend of research bifurcating into large-compute and small-compute domains has significant ramifications for the ability of smaller entities (for instance, startups) to compete effectively with organizations that have access to large computational infrastructure (eg, Google). A more troubling effect, with long-term negative consequences, is that at these compute scales it is almost impossible for academia to do research at the same scale as corporate research entities. I continue to worry that this will lead to a splitting of the AI research community and potentially the creation of the sort of factionalism and ‘us vs them’ attitude seen elsewhere in contemporary life. Companies will seek to ameliorate this inequality of compute by releasing the artifacts of compute (eg, pre-trained models). Though this will go some way to empowering researchers, it will fail to deal with the underlying problems, which are systemic and likely require a policy solution (aka, more money for academia, and so on).
**Read more:** GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism (Arxiv).
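To make the pipelining idea concrete, here is a minimal Python sketch of the scheduling arithmetic behind micro-batch pipeline parallelism. It is not GPipe’s actual API; the stage and micro-batch counts are assumptions chosen for illustration.

```python
# Sketch: why splitting a mini-batch into micro-batches keeps a pipeline of
# accelerators busy. Each stage is assumed to take one time step per
# micro-batch; real per-stage costs and GPipe's re-materialization are ignored.

def pipelined_steps(num_stages: int, num_microbatches: int) -> int:
    """Forward-pass time steps when stages run concurrently: the first
    micro-batch needs num_stages steps to fill the pipe, and each later
    micro-batch completes one step after the previous one."""
    return num_stages + num_microbatches - 1

def serial_steps(num_stages: int, num_microbatches: int) -> int:
    """Time steps if each micro-batch traverses all stages before the next
    one starts (no pipelining)."""
    return num_stages * num_microbatches

if __name__ == "__main__":
    stages, microbatches = 4, 32
    print("pipelined:", pipelined_steps(stages, microbatches))  # 35 steps
    print("serial:   ", serial_steps(stages, microbatches))     # 128 steps
```

With enough micro-batches the pipeline is full almost all of the time, which is the intuition behind the near-linear speedups the researchers report.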
You Can’t Do AI Without Augmented Analytics and AutoML
lynn.heidmann@dataiku.com (Lynn Heidmann)
Posted on
By now, the race to AI - and especially to using AI in the enterprise - is officially on. But if AutoML is not in your toolbox or if you’re not at least thinking about aligning the data organization around augmented analytics, you won’t get very far.
Cathy O’Neil discusses the current lack of fairness in artificial intelligence and much more.
Physics-Based Learned Design: Teaching a Microscope How to Image
3 Challenges for Companies Tackling Data Science
Sponsored Post. By Seth Deland, Product Marketing Manager of Data Analytics, MathWorks.
Building Blocks of Decision Tree
It’s been said that Data Scientist is the “sexiest job of the 21st century.” One main reason is the sheer amount of data available: we are producing data at an unprecedented rate. Alongside this access to data come sophisticated algorithms such as decision trees and random forests. With so much data available, the most intricate part is selecting the right algorithm for the problem. Each model has its own pros and cons and should be chosen based on the type of problem at hand and the data available.
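As an illustration of one such building block, the sketch below scores a candidate split by Gini impurity, the criterion trees such as CART use to choose splits. The labels and the split are toy assumptions, not taken from the post.

```python
# Sketch: choosing a decision-tree split by Gini impurity (lower is purer).

from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_impurity(left, right):
    """Size-weighted Gini impurity after a candidate split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

# Toy example: a split that isolates most of class "b" lowers impurity.
left, right = ["a", "a", "a"], ["b", "b", "a"]
print(round(gini(left + right), 3))           # impurity before: 0.444
print(round(split_impurity(left, right), 3))  # impurity after: 0.222
```

A tree grows by greedily picking, at each node, the feature and threshold whose split minimizes this weighted impurity.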