Dyadic Data
Dyadic data refers to a domain with two finite sets of objects in which observations are made for dyads, i.e., pairs with one element from either set. This type of data arises naturally in many applications ranging from computational linguistics and information retrieval to preference analysis and computer vision. In this paper, we present a systematic, domain-independent framework of learning from dyadic data by statistical mixture models. Our approach covers different models with flat and hierarchical latent class structures. We propose an annealed version of the standard EM algorithm for model fitting which is empirically evaluated on a variety of data sets from different domains.
http://…/gonzalez-griffin-2012-dyadic-ch.pdf …
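As a concrete illustration of the flat latent-class idea, here is a minimal EM fit for an aspect model of dyadic counts, P(x, y) = Σ_k P(k) P(x|k) P(y|k). This is a plain (un-annealed) sketch under assumed conventions, not the paper's implementation, and all function names are illustrative.

```python
import random

def em_aspect_model(counts, K, iters=100, seed=0):
    """Fit P(x, y) = sum_k P(k) P(x|k) P(y|k) to a dyadic count table by EM."""
    rng = random.Random(seed)
    X, Y = len(counts), len(counts[0])

    def normalize(v):
        s = sum(v)
        return [u / s for u in v]

    # Random (strictly positive) initialization of the mixture parameters.
    pk = normalize([rng.random() + 0.5 for _ in range(K)])
    px = [normalize([rng.random() + 0.5 for _ in range(X)]) for _ in range(K)]
    py = [normalize([rng.random() + 0.5 for _ in range(Y)]) for _ in range(K)]

    for _ in range(iters):
        nk = [0.0] * K
        nx = [[0.0] * X for _ in range(K)]
        ny = [[0.0] * Y for _ in range(K)]
        for x in range(X):
            for y in range(Y):
                n = counts[x][y]
                if n == 0:
                    continue
                # E-step: posterior over the latent class k for this dyad.
                w = [pk[k] * px[k][x] * py[k][y] for k in range(K)]
                s = sum(w)
                for k in range(K):
                    r = n * w[k] / s  # expected count assigned to class k
                    nk[k] += r
                    nx[k][x] += r
                    ny[k][y] += r
        # M-step: re-estimate parameters from the expected counts.
        pk = normalize(nk)
        px = [[nx[k][x] / nk[k] for x in range(X)] for k in range(K)]
        py = [[ny[k][y] / nk[k] for y in range(Y)] for k in range(K)]
    return pk, px, py
```

On a block-structured count table (two disjoint groups of dyads), the fitted classes typically recover the blocks; the annealed variant discussed in the paper would additionally temper the E-step posteriors to avoid poor local optima.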
One Drink Per Day, Your Chances of Developing an Alcohol-Related Condition
The headline-grabbing conclusion of a recent study was that not just heavy drinking but all levels of alcohol consumption are bad for you, which of course directly contradicts previous studies suggesting that a glass of wine or a beer every now and then was good for your heart. What a bummer.
Document worth reading: “Data Innovation for International Development: An overview of natural language processing for qualitative data analysis”
The availability, collection, and accessibility of quantitative data, as well as its limitations, often make qualitative data the resource upon which development programs heavily rely. Both traditional interview data and social media analysis can provide rich contextual information and are essential for research, appraisal, monitoring, and evaluation. These data may be difficult to process and analyze both systematically and at scale. This, in turn, limits the ability to make timely, data-driven decisions, which is essential in fast-evolving, complex social systems. In this paper, we discuss the potential of using natural language processing to systematize analysis of qualitative data and to inform quick decision-making in the development context. We illustrate this with interview data generated in the format of micro-narratives for the UNDP Fragments of Impact project.
What's new on arXiv
Towards Differential Privacy for Symbolic Systems
Amazon SageMaker automatic model tuning produces better models, faster
Amazon SageMaker recently released a feature that allows you to automatically tune the hyperparameter values of your machine learning model to produce more accurate predictions. Hyperparameters are user-defined settings that dictate how an algorithm should behave during training. Examples include how large a decision tree should be grown, the number of clusters desired from a segmentation, or how much you should incrementally update neural network weights as you iterate through the data.
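The idea behind automatic tuning can be illustrated with a minimal random-search sketch. The objective below is a toy stand-in for a real training job, and all names and ranges are illustrative assumptions, not SageMaker's API.

```python
import math
import random

def toy_validation_score(learning_rate, num_layers):
    """Stand-in for a real training run: peaks at lr = 0.1, num_layers = 3."""
    return -((math.log10(learning_rate) + 1.0) ** 2) - 0.1 * (num_layers - 3) ** 2

def random_search(n_trials=50, seed=0):
    """Sample hyperparameter settings at random and keep the best one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {
            "learning_rate": 10 ** rng.uniform(-4, 0),  # log-uniform, a common choice
            "num_layers": rng.randint(1, 8),
        }
        score = toy_validation_score(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

SageMaker's tuner goes further than this sketch by using Bayesian optimization to pick each new candidate based on the results of earlier trials, rather than sampling blindly.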
Distilled News
Reinforcement Learning Guide: Solving the Multi-Armed Bandit Problem from Scratch in Python
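The multi-armed bandit problem named in this guide can be sketched with a minimal epsilon-greedy agent. This is an assumed, illustrative example, not the guide's own code.

```python
import random

def epsilon_greedy_bandit(arm_means, steps=2000, epsilon=0.1, seed=0):
    """Play a Bernoulli multi-armed bandit with epsilon-greedy action selection."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms    # pulls per arm
    values = [0.0] * n_arms  # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: random arm
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit: best so far
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
    return counts, values
```

With a clear gap between the arms' reward probabilities, the agent concentrates most of its pulls on the best arm while the epsilon fraction of random pulls keeps its estimates of the others honest.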
Distilled News
Free Book: Process Improvement Using Data
If you did not already know
Risk-Averse Imitation Learning (RAIL)
Imitation learning algorithms learn viable policies by imitating an expert’s behavior when reward signals are not available. Generative Adversarial Imitation Learning (GAIL) is a state-of-the-art algorithm for learning policies when the expert’s behavior is available as a fixed set of trajectories. Evaluating in terms of the expert’s cost function, we observe that the distribution of trajectory-costs is often more heavy-tailed for GAIL-agents than for the expert at a number of benchmark continuous-control tasks. Thus, high-cost trajectories, corresponding to tail-end events of catastrophic failure, are more likely to be encountered by the GAIL-agents than by the expert. This makes the reliability of GAIL-agents questionable when it comes to deployment in safety-critical applications like robotic surgery and autonomous driving. In this work, we aim to minimize the occurrence of tail-end events by minimizing tail-risk within the GAIL framework. We quantify tail-risk by the Conditional-Value-at-Risk (CVaR) of trajectories and develop the Risk-Averse Imitation Learning (RAIL) algorithm. We observe that the policies learned with RAIL show lower tail-end risk than those of vanilla GAIL. Thus the proposed RAIL algorithm appears as a potent alternative to GAIL for improved reliability in safety-critical applications. …
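The CVaR risk measure used in this abstract has a simple empirical form: the mean cost of the worst (1 − alpha) fraction of trajectories. A minimal sketch, with illustrative names (not the paper's code):

```python
def cvar(costs, alpha=0.9):
    """Empirical Conditional Value-at-Risk of a sample of trajectory costs.

    Returns the mean of the worst (1 - alpha) fraction of costs, where
    higher cost means a worse trajectory.
    """
    ranked = sorted(costs)
    tail_start = int(alpha * len(ranked))
    tail = ranked[tail_start:]
    return sum(tail) / len(tail)
```

Because CVaR averages only over the tail, it is always at least the overall mean cost; minimizing it, as RAIL does, directly targets the rare catastrophic-failure trajectories rather than average performance.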
R Packages worth a look
Multi-Data-Driven Sparse PLS Robust to Missing Samples (ddsPLS): Allows building Multi-Data-Driven Sparse PLS models; multi-block, high-dimensional settings are particularly well suited to this approach.
Don’t calculate post-hoc power using observed estimate of effect size
Aleksi Reito writes: