SunJackson Blog


If you did not already know

Reposted from: https://advanceddataanalytics.net/2018/09/20/if-you-did-not-already-know-488/

Michael Laux


Posted on 2018-09-20

Quasi-KL Divergence (QKL): Dropout, a stochastic regularisation technique for training of neural networks, has recently been reinterpreted as a specific type of approximate inference algorithm for Bayesian neural networks. The main contribution of the reinterpretation is in providing a theoretical framework useful for analysing and extending the algorithm. We show that the proposed framework suffers from several issues; from undefined or pathological behaviour of the true posterior related to use of improper priors, to an ill-defined variational objective due to singularity of the approximating distribution relative to the true posterior. Our analysis of the improper log uniform prior used in variational Gaussian dropout suggests the pathologies are generally irredeemable, and that the algorithm still works only because the variational formulation annuls some of the pathologies. To address the singularity issue, we proffer Quasi-KL (QKL) divergence, a new approximate inference objective for approximation of high-dimensional distributions. We show that motivations for variational Bernoulli dropout based on discretisation and noise have QKL as a limit. Properties of QKL are studied both theoretically and on a simple practical example which shows that the QKL-optimal approximation of a full rank Gaussian with a degenerate one naturally leads to the Principal Component Analysis solution. …

Read more »

Magister Dixit

Reposted from: https://advanceddataanalytics.net/2018/09/20/magister-dixit-1350/

Michael Laux


Posted on 2018-09-20

“Analysts will need a proper understanding of math, statistics, algorithms, and other related sciences in order to deliver meaningful results. They must pair that theoretical knowledge with a firm grasp of the modern-day tools that make the analyses possible. That means having an ability to express queries in terms of MapReduce or some other distributed system, an understanding of how to model data storage across different NoSQL-style systems, and familiarity with libraries that implement common algorithms.” Q Ethan McCallum, Ken Gleason (2013)

Read more »

Learning Statistics Online for Data Science

Reposted from: https://dimensionless.in/learning-statistics-online-for-data-science/

Kartik Singh


Posted on 2018-09-20

Data science is one of the hottest topics of the 21st century because we are generating data at a much higher rate than we can actually process. Many business and tech firms are now gaining key advantages by harnessing data science. As a result, data science is booming right now.

Read more »

Judging connectedness of American communities, based on Facebook friendships

Reposted from: https://flowingdata.com/2018/09/20/judging-connectedness-of-american-communities-based-on-facebook-friendships/

Nathan Yau


Posted on 2018-09-20

We talk about geographic bubbles a lot these days. Some areas are isolated, in their own bubble. Other areas seem more connected. Emily Badger and Quoctrung Bui for The Upshot looked at this geographic connectedness through the lens of Facebook friendships.

Read more »

Document worth reading: “Automatic Language Identification in Texts: A Survey”

Reposted from: https://advanceddataanalytics.net/2018/09/21/document-worth-reading-automatic-language-identification-in-texts-a-survey/

Michael Laux


Posted on 2018-09-20

Language identification (LI) is the problem of determining the natural language that a document or part thereof is written in. Automatic LI has been extensively researched for over fifty years. Today, LI is a key part of many text processing pipelines, as text processing techniques generally assume that the language of the input text is known. Research in this area has recently been especially active. This article provides a brief history of LI research, and an extensive survey of the features and methods used so far in the LI literature. For describing the features and methods we introduce a unified notation. We discuss evaluation methods, applications of LI, as well as off-the-shelf LI systems that do not require training by the end user. Finally, we identify open issues, survey the work to date on each issue, and propose future directions for research in LI. Automatic Language Identification in Texts: A Survey

Read more »

Discovering and indexing podcast episodes using Amazon Transcribe and Amazon Comprehend

Reposted from: https://aws.amazon.com/blogs/machine-learning/discovering-and-indexing-podcast-episodes-using-amazon-transcribe-and-amazon-comprehend/

Angela Wang


Posted on 2018-09-20

As an avid podcast listener, I had always wished for an easy way to glance at the transcript of an episode to decide whether I should add it to my playlist (not all episode abstracts are equally helpful!). Another challenge with podcasts is that, although they contain a wealth of knowledge that is often not available in blogs and other text formats, there isn’t a readily available search engine like Google to index and search the content. What if we could build a tool that converts the audio to text and then builds a searchable index on all of our favorite podcast feeds so users could discover information that interests them, without having to listen to a full episode?

Read more »

R Packages worth a look

Reposted from: https://advanceddataanalytics.net/2018/09/20/r-packages-worth-a-look-1278/

Michael Laux


Posted on 2018-09-20

Shiny Applications Internationalization (shiny.i18n): It provides easy internationalization of Shiny applications. It can be used as a standalone translation package to translate reports, interactive visuali …

Read more »

How to graph a function of 4 variables using a grid

Reposted from: https://andrewgelman.com/2018/09/20/how-to-graph-a-function-of-4-variables-using-a-grid/

Andrew


Posted on 2018-09-20

This came up in response to a student’s question.

Read more »

PyConUK 2018

Reposted from: http://ianozsvald.com/2018/09/19/pyconuk-2018/

Ian


Posted on 2018-09-19

Last weekend we had another fine PyConUK (2018) conference. Each year the conference grows; this year the Django Girls group had 70 or so women learning Django (and, often, Python for the first time). The kids' hack day was a great success. The Pythonic-hardware demo session was fun.

Read more »

Distilled News

Reposted from: https://advanceddataanalytics.net/2018/09/20/distilled-news-864/

Michael Laux


Posted on 2018-09-19

Help! I can’t reproduce a machine learning project!

Read more »
© 2018 - 2019 SunJackson