Dyadic Data
Dyadic data refers to a domain with two finite sets of objects in which observations are made for dyads, i.e., pairs with one element from either set. This type of data arises naturally in many applications ranging from computational linguistics and information retrieval to preference analysis and computer vision. In this paper, we present a systematic, domain-independent framework of learning from dyadic data by statistical mixture models. Our approach covers different models with flat and hierarchical latent class structures. We propose an annealed version of the standard EM algorithm for model fitting which is empirically evaluated on a variety of data sets from different domains.
http://…/gonzalez-griffin-2012-dyadic-ch.pdf …
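As a concrete illustration of the flat latent-class idea, here is a minimal EM fit for an aspect model of dyadic counts, P(x, y) = Σ_k P(k) P(x|k) P(y|k). This is a plain (un-annealed) sketch under assumed conventions, not the paper's implementation, and all function names are illustrative.

```python
import random

def em_aspect_model(counts, K, iters=100, seed=0):
    """Fit P(x, y) = sum_k P(k) P(x|k) P(y|k) to a dyadic count table by EM."""
    rng = random.Random(seed)
    X, Y = len(counts), len(counts[0])

    def normalize(v):
        s = sum(v)
        return [u / s for u in v]

    # Random (strictly positive) initialization of the mixture parameters.
    pk = normalize([rng.random() + 0.5 for _ in range(K)])
    px = [normalize([rng.random() + 0.5 for _ in range(X)]) for _ in range(K)]
    py = [normalize([rng.random() + 0.5 for _ in range(Y)]) for _ in range(K)]

    for _ in range(iters):
        nk = [0.0] * K
        nx = [[0.0] * X for _ in range(K)]
        ny = [[0.0] * Y for _ in range(K)]
        for x in range(X):
            for y in range(Y):
                n = counts[x][y]
                if n == 0:
                    continue
                # E-step: posterior over the latent class k for this dyad.
                w = [pk[k] * px[k][x] * py[k][y] for k in range(K)]
                s = sum(w)
                for k in range(K):
                    r = n * w[k] / s  # expected count assigned to class k
                    nk[k] += r
                    nx[k][x] += r
                    ny[k][y] += r
        # M-step: re-estimate parameters from the expected counts.
        pk = normalize(nk)
        px = [[nx[k][x] / nk[k] for x in range(X)] for k in range(K)]
        py = [[ny[k][y] / nk[k] for y in range(Y)] for k in range(K)]
    return pk, px, py
```

On a block-structured count table (two disjoint groups of dyads), the fitted classes typically recover the blocks; the annealed variant discussed in the paper would additionally temper the E-step posteriors to avoid poor local optima.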
One Drink Per Day, Your Chances of Developing an Alcohol-Related Condition
The headline-grabbing conclusion of a recent study was that not just heavy drinking but all levels of alcohol consumption are bad for you, which of course directly contradicts previous studies suggesting that a glass of wine or a beer every now and then was good for your heart. What a bummer.
Document worth reading: “Data Innovation for International Development: An overview of natural language processing for qualitative data analysis”
The availability, collection, and accessibility of quantitative data, as well as its limitations, often make qualitative data the resource upon which development programs heavily rely. Both traditional interview data and social media analysis can provide rich contextual information and are essential for research, appraisal, monitoring, and evaluation. These data may be difficult to process and analyze both systematically and at scale. This, in turn, limits the ability to make timely, data-driven decisions, which is essential in fast-evolving, complex social systems. In this paper, we discuss the potential of using natural language processing to systematize analysis of qualitative data and to inform quick decision-making in the development context. We illustrate this with interview data generated in the format of micro-narratives for the UNDP Fragments of Impact project.
What's new on arXiv
Towards Differential Privacy for Symbolic Systems
Amazon SageMaker automatic model tuning produces better models, faster
Amazon SageMaker recently released a feature that allows you to automatically tune the hyperparameter values of your machine learning model to produce more accurate predictions. Hyperparameters are user-defined settings that dictate how an algorithm should behave during training. Examples include how large a decision tree should be grown, the number of clusters desired from a segmentation, or how much you should incrementally update neural network weights as you iterate through the data.
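The idea behind automatic tuning can be illustrated with a minimal random-search sketch. The objective below is a toy stand-in for a real training job, and all names and ranges are illustrative assumptions, not SageMaker's API.

```python
import math
import random

def toy_validation_score(learning_rate, num_layers):
    """Stand-in for a real training run: peaks at lr = 0.1, num_layers = 3."""
    return -((math.log10(learning_rate) + 1.0) ** 2) - 0.1 * (num_layers - 3) ** 2

def random_search(n_trials=50, seed=0):
    """Sample hyperparameter settings at random and keep the best one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {
            "learning_rate": 10 ** rng.uniform(-4, 0),  # log-uniform, a common choice
            "num_layers": rng.randint(1, 8),
        }
        score = toy_validation_score(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

SageMaker's tuner goes further than this sketch by using Bayesian optimization to pick each new candidate based on the results of earlier trials, rather than sampling blindly.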
Distilled News
Reinforcement Learning Guide: Solving the Multi-Armed Bandit Problem from Scratch in Python
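The multi-armed bandit problem named in this guide can be sketched with a minimal epsilon-greedy agent. This is an assumed, illustrative example, not the guide's own code.

```python
import random

def epsilon_greedy_bandit(arm_means, steps=2000, epsilon=0.1, seed=0):
    """Play a Bernoulli multi-armed bandit with epsilon-greedy action selection."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms    # pulls per arm
    values = [0.0] * n_arms  # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: random arm
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit: best so far
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
    return counts, values
```

With a clear gap between the arms' reward probabilities, the agent concentrates most of its pulls on the best arm while the epsilon fraction of random pulls keeps its estimates of the others honest.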
Distilled News
Free Book: Process Improvement Using Data
If you did not already know
Risk-Averse Imitation Learning (RAIL)
Imitation learning algorithms learn viable policies by imitating an expert’s behavior when reward signals are not available. Generative Adversarial Imitation Learning (GAIL) is a state-of-the-art algorithm for learning policies when the expert’s behavior is available as a fixed set of trajectories. Evaluating in terms of the expert’s cost function, we observe that the distribution of trajectory-costs is often more heavy-tailed for GAIL-agents than for the expert at a number of benchmark continuous-control tasks. Thus, high-cost trajectories, corresponding to tail-end events of catastrophic failure, are more likely to be encountered by the GAIL-agents than by the expert. This makes the reliability of GAIL-agents questionable when it comes to deployment in safety-critical applications like robotic surgery and autonomous driving. In this work, we aim to minimize the occurrence of tail-end events by minimizing tail-risk within the GAIL framework. We quantify tail-risk by the Conditional-Value-at-Risk (CVaR) of trajectories and develop the Risk-Averse Imitation Learning (RAIL) algorithm. We observe that the policies learned with RAIL show lower tail-end risk than those of vanilla GAIL. Thus the proposed RAIL algorithm appears as a potent alternative to GAIL for improved reliability in safety-critical applications. …
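The CVaR risk measure used in this abstract has a simple empirical form: the mean cost of the worst (1 − alpha) fraction of trajectories. A minimal sketch, with illustrative names (not the paper's code):

```python
def cvar(costs, alpha=0.9):
    """Empirical Conditional Value-at-Risk of a sample of trajectory costs.

    Returns the mean of the worst (1 - alpha) fraction of costs, where
    higher cost means a worse trajectory.
    """
    ranked = sorted(costs)
    tail_start = int(alpha * len(ranked))
    tail = ranked[tail_start:]
    return sum(tail) / len(tail)
```

Because CVaR averages only over the tail, it is always at least the overall mean cost; minimizing it, as RAIL does, directly targets the rare catastrophic-failure trajectories rather than average performance.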
R Packages worth a look
Multi-Data-Driven Sparse PLS Robust to Missing Samples (ddsPLS): Allows building Multi-Data-Driven Sparse PLS models; multi-block, high-dimensional settings are particularly well suited to this approach.
Don’t calculate post-hoc power using observed estimate of effect size
Aleksi Reito writes: