What's new on arXiv

Robust Counterfactual Inferences using Feature Learning and their Applications

In a wide variety of applications, including personalization, we want to measure the difference in outcome due to an intervention and thus have to deal with counterfactual inference. The feedback from a customer in any of these situations is only ‘bandit feedback’ – that is, partial feedback based on whether we chose to intervene or not. Typically, randomized experiments are carried out to understand whether an intervention is overall better than no intervention. Here we present a feature learning algorithm that learns from a randomized experiment where the intervention in consideration is most effective and where it is least effective, rather than focusing only on the overall impact, thus adding context to our learning mechanism and extracting more information. From the randomized experiment, we learn the feature representations which divide the population into subpopulations where we observe a statistically significant difference in average customer feedback between those who were subjected to the intervention and those who were not, with a level of significance l, where l is a configurable parameter in our model. We use this information to derive the value of the intervention in consideration for each instance in the population. With experiments, we show that using this additional learning, in future interventions, the context for each instance could be leveraged to decide whether to intervene or not.
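As a rough illustration of the per-subpopulation comparison described above, the sketch below tests whether mean feedback differs between treated and untreated customers within a single subpopulation. It is not the paper's feature learning algorithm: the function name, the two-sample z-test under a normal approximation, and the sample values are all assumptions made for illustration. The `level` argument plays the role of the significance level l.

```python
import math
from statistics import mean, variance


def differs_significantly(treated, control, level=0.05):
    """Two-sided two-sample z-test (normal approximation) on mean feedback.

    Hypothetical helper: the abstract does not say which test the authors use.
    """
    diff = mean(treated) - mean(control)
    # Standard error of the difference in means (unequal variances allowed).
    se = math.sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    z = diff / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_value < level, diff


# Made-up feedback values for one subpopulation.
treated = [1.2, 1.4, 1.1, 1.5, 1.3, 1.6, 1.2, 1.4]
control = [0.9, 1.0, 0.8, 1.1, 0.7, 1.0, 0.9, 0.8]
significant, uplift = differs_significantly(treated, control, level=0.05)
```

Under these assumptions, `uplift` is the estimated value of intervening for instances that fall into this subpopulation, and it would only be trusted when `significant` is true.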

DOPING: Generative Data Augmentation for Unsupervised Anomaly Detection with GAN

Recently, the introduction of the generative adversarial network (GAN) and its variants has enabled the generation of realistic synthetic samples, which have been used for enlarging training sets. Previous work primarily focused on data augmentation for semi-supervised and supervised tasks. In this paper, we instead focus on unsupervised anomaly detection and propose a novel generative data augmentation framework optimized for this task. In particular, we propose to oversample infrequent normal samples – normal samples that occur with small probability, e.g., rare normal events. We show that these samples are responsible for false positives in anomaly detection. However, oversampling of infrequent normal samples is challenging for real-world high-dimensional data with multimodal distributions. To address this challenge, we propose to use a GAN variant known as the adversarial autoencoder (AAE) to transform the high-dimensional multimodal data distributions into low-dimensional unimodal latent distributions with well-defined tail probability. Then, we systematically oversample at the ‘edge’ of the latent distributions to increase the density of infrequent normal samples. We show that our oversampling pipeline is a unified one: it is generally applicable to datasets with different complex data distributions.

Estimation in the Cox Survival Regression Model with Covariate Measurement Error and a Changepoint

The Cox regression model is a popular model for analyzing the relationship between a covariate and a survival endpoint. The standard Cox model assumes a constant covariate effect across the entire covariate domain. However, in many epidemiological and other applications, the covariate of main interest is subject to a threshold effect: a change in the slope at a certain point within the covariate domain. Often, the covariate of interest is subject to some degree of measurement error. In this paper, we study measurement error correction in the case where the threshold is known. Several bias correction methods are examined: two versions of regression calibration (RC1 and RC2, the latter of which is new), two methods based on the induced relative risk under a rare event assumption (RR1 and RR2, the latter of which is new), a maximum pseudo-partial likelihood estimator (MPPLE), and simulation-extrapolation (SIMEX). We develop the theory, present simulations comparing the methods, and illustrate their use on data concerning the relationship between chronic air pollution exposure to particulate matter PM10 and fatal myocardial infarction (Nurses Health Study (NHS)), and on data concerning the effect of a subject’s long-term underlying systolic blood pressure level on the risk of cardiovascular disease death (Framingham Heart Study (FHS)). The simulations indicate that the best methods are RR2 and MPPLE.
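For orientation, one common way to write a Cox model with a change in slope at a known threshold is shown below. The symbols are assumptions rather than the paper's notation: \lambda_0 is the baseline hazard, X the covariate, \tau the known changepoint, \beta the slope below \tau, and \omega the change in slope above it.

```latex
\lambda(t \mid X) = \lambda_0(t)\,\exp\bigl(\beta X + \omega\,(X - \tau)_+\bigr),
\qquad (u)_+ = \max(u, 0)
```

In this parameterization the log-hazard slope is \beta for X below \tau and \beta + \omega above it, and setting \omega = 0 recovers the standard Cox model with a constant covariate effect.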

Transfer Learning for Estimating Causal Effects using Neural Networks

We develop new algorithms for estimating heterogeneous treatment effects, combining recent developments in transfer learning for neural networks with insights from the causal inference literature. By taking advantage of transfer learning, we are able to efficiently use different data sources that are related to the same underlying causal mechanisms. We compare our algorithms with those in the extant literature using extensive simulation studies based on large-scale voter persuasion experiments and the MNIST database. Our methods can perform an order of magnitude better than existing benchmarks while using a fraction of the data.

Privacy-Preserving Synthetic Datasets Over Weakly Constrained Domains

Techniques to deliver privacy-preserving synthetic datasets take a sensitive dataset as input and produce a similar dataset as output while maintaining differential privacy. These approaches have the potential to improve data sharing and reuse, but they must be accessible to non-experts and tolerant of realistic data. Existing approaches make an implicit assumption that the active domain of the dataset is similar to the global domain, potentially violating differential privacy. In this paper, we present an algorithm for generating differentially private synthetic data over the large, weakly constrained domains we find in realistic open data situations. Our algorithm models the unrepresented domain analytically as a probability distribution to adjust the output and compute noise, avoiding the need to compute the full domain explicitly. We formulate the tradeoff between privacy and utility in terms of a ‘tolerance for randomness’ parameter that does not require users to inspect the data to set. Finally, we show that the algorithm produces sensible results on real datasets.

XPCA: Extending PCA for a Combination of Discrete and Continuous Variables

Principal component analysis (PCA) is arguably the most popular tool in multivariate exploratory data analysis. In this paper, we consider the question of how to handle heterogeneous data that include continuous, binary, and ordinal variables. In the probabilistic interpretation of low-rank PCA, the data has a multivariate normal distribution and, therefore, normal marginal distributions for each column. If some marginals are continuous but not normal, the semiparametric copula-based principal component analysis (COCA) method is an alternative to PCA that combines a Gaussian copula with nonparametric marginals. If some marginals are discrete or semi-continuous, we propose a new extended PCA (XPCA) method that also uses a Gaussian copula and nonparametric marginals and accounts for discrete variables in the likelihood calculation by integrating over appropriate intervals. Like PCA, the factors produced by XPCA can be used to find latent structure in data, build predictive models, and perform dimensionality reduction. We present the new model, its induced likelihood function, and a fitting algorithm which can be applied in the presence of missing data. We demonstrate how to use XPCA to produce an estimated full conditional distribution for each data point, and use this to provide estimates for missing data that are automatically range-respecting. We compare the methods as applied to simulated and real-world data sets that have a mixture of discrete and continuous variables.

SwitchOut: an Efficient Data Augmentation Algorithm for Neural Machine Translation

In this work, we examine methods for data augmentation for text-based tasks such as neural machine translation (NMT). We formulate the design of a data augmentation policy with desirable properties as an optimization problem, and derive a generic analytic solution. This solution not only subsumes some existing augmentation schemes, but also leads to an extremely simple data augmentation strategy for NMT: randomly replacing words in both the source sentence and the target sentence with other random words from their corresponding vocabularies. We name this method SwitchOut. Experiments on three translation datasets of different scales show that SwitchOut yields consistent improvements of about 0.5 BLEU, achieving better or comparable performances to strong alternatives such as word dropout (Sennrich et al., 2016a). Code to implement this method is included in the appendix.
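The word-replacement idea can be sketched in a few lines. The version below is a simplification, not the code from the paper's appendix: it replaces each token independently with a fixed probability, whereas the abstract does not spell out how the paper samples which words to replace. The function name, the `replace_prob` parameter, and the toy vocabularies are hypothetical.

```python
import random


def switchout_simplified(tokens, vocab, replace_prob, rng):
    """Replace each token, independently, with a uniformly drawn vocabulary word.

    Simplification: a fixed per-token probability stands in for the paper's
    sampling scheme, which the abstract does not describe.
    """
    return [rng.choice(vocab) if rng.random() < replace_prob else tok for tok in tokens]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
src_vocab = ["the", "a", "cat", "dog", "sat", "ran"]
tgt_vocab = ["le", "un", "chat", "chien", "assis", "couru"]
src = ["the", "cat", "sat"]
tgt = ["le", "chat", "assis"]

# Source and target are corrupted independently, each from its own vocabulary.
aug_src = switchout_simplified(src, src_vocab, 0.2, rng)
aug_tgt = switchout_simplified(tgt, tgt_vocab, 0.2, rng)
```

Sentence length is preserved, so the augmented pair can be fed to the same training pipeline as the original pair.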

Deep Neural Network Structures Solving Variational Inequalities

We propose a novel theoretical framework to investigate deep neural networks using the formalism of proximal fixed point methods for solving variational inequalities. We first show that almost all activation functions used in neural networks are actually proximity operators. This leads to an algorithmic model alternating firmly nonexpansive and linear operators. We derive new results on averaged operator iterations to establish the convergence of this model, and show that the limit of the resulting algorithm is a solution to a variational inequality.
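To make the claim about activation functions concrete, here is a small numerical check for one case, the ReLU. The proximity operator of the indicator function of [0, ∞) is the projection onto that set, which is max(x, 0). The sketch compares a brute-force grid minimisation of 0.5·(y − x)² over y ≥ 0 against ReLU; the function names, grid step, and test points are illustrative assumptions, and the check says nothing about the other activations the paper covers.

```python
def relu(x):
    return max(x, 0.0)


def prox_nonneg_indicator(x, step=0.001, upper=5.0):
    """Brute-force prox of the indicator of [0, inf) at x.

    Minimises 0.5 * (y - x)**2 over a grid of feasible points y in [0, upper].
    """
    n = int(round(upper / step))
    candidates = [i * step for i in range(n + 1)]
    return min(candidates, key=lambda y: 0.5 * (y - x) ** 2)


xs = [-2.0, -0.5, 0.0, 0.75, 3.0]
brute = [prox_nonneg_indicator(x) for x in xs]
closed_form = [relu(x) for x in xs]
```

Up to the grid resolution, `brute` and `closed_form` agree at every test point.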

Approximation Trees: Statistical Stability in Model Distillation

This paper examines the stability of learned explanations for black-box predictions via model distillation with decision trees. One approach to intelligibility in machine learning is to use an understandable ‘student’ model to mimic the output of an accurate ‘teacher’. Here, we consider the use of regression trees as a student model, in which nodes of the tree can be used as ‘explanations’ for particular predictions, and the whole structure of the tree can be used as a global representation of the resulting function. However, individual trees are sensitive to the particular data sets used to train them, and an interpretation of a student model may be suspect if small changes in the training data have a large effect on it. In this context, access to outcomes from a teacher helps to stabilize the greedy splitting strategy by generating a much larger corpus of training examples than was originally available. We develop tests to ensure that enough examples are generated at each split so that the same splitting rule would be chosen with high probability were the tree to be retrained. Further, we develop a stopping rule to indicate how deep the tree should be built based on recent results on the variability of Random Forests when these are used as the teacher. We provide concrete examples of these procedures on the CAD-MDD and COMPAS data sets.

Cooperative SGD: A unified Framework for the Design and Analysis of Communication-Efficient SGD Algorithms

State-of-the-art distributed machine learning suffers from significant delays due to frequent communication and synchronization between worker nodes. Emerging communication-efficient SGD algorithms that limit synchronization between locally trained models have been shown to be effective in speeding up distributed SGD. However, a rigorous convergence analysis and comparative study of different communication-reduction strategies remains a largely open problem. This paper presents a new framework called Cooperative SGD that subsumes existing communication-efficient SGD algorithms such as federated-averaging, elastic-averaging and decentralized SGD. By analyzing Cooperative SGD, we provide novel convergence guarantees for existing algorithms. Moreover, this framework enables us to design new communication-efficient SGD algorithms that strike the best balance between reducing communication overhead and achieving fast error convergence.
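The idea of limiting synchronization between locally trained models can be illustrated with periodic model averaging, one of the simplest schemes of this kind. The sketch below is a toy under stated assumptions and not the Cooperative SGD framework itself: each worker holds a scalar model and a one-dimensional quadratic objective, gradients are exact rather than stochastic, and the function name and hyperparameters are hypothetical.

```python
def local_sgd_periodic_averaging(targets, lr=0.1, local_steps=5, rounds=40):
    """Worker i minimises 0.5 * (w - targets[i])**2 with local gradient steps.

    After every `local_steps` local updates, all workers replace their model
    with the average model (one communication round).
    """
    models = [0.0 for _ in targets]
    for _ in range(rounds):
        for _ in range(local_steps):
            # Local update with no communication: the gradient is (w - c).
            models = [w - lr * (w - c) for w, c in zip(models, targets)]
        # Synchronisation: average the locally trained models.
        avg = sum(models) / len(models)
        models = [avg for _ in models]
    return models[0]


targets = [1.0, 2.0, 6.0]
w_final = local_sgd_periodic_averaging(targets)
```

In this toy setting the averaged model approaches the minimiser of the mean objective (the mean of `targets`) while communicating only once every `local_steps` updates. Raising `local_steps` lowers communication further, and the trade-off against convergence speed is what the paper's analysis addresses.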

TreeGAN: Syntax-Aware Sequence Generation with Generative Adversarial Networks

Generative Adversarial Networks (GANs) have shown great capacity for image generation, in which a discriminative model guides the training of a generative model to construct images that resemble real images. Recently, GANs have been extended from generating images to generating sequences (e.g., poems, music and code). Existing GANs for sequence generation mainly focus on general sequences, which are grammar-free. In many real-world applications, however, we need to generate sequences in a formal language with the constraint of its corresponding grammar. For example, to test the performance of a database, one may want to generate a collection of SQL queries, which are not only similar to the queries of real users, but also follow the SQL syntax of the target database. Generating such sequences is highly challenging because both the generator and discriminator of GANs need to consider the structure of the sequences and the given grammar in the formal language. To address these issues, we study the problem of syntax-aware sequence generation with GANs, in which a collection of real sequences and a set of pre-defined grammatical rules are given to both discriminator and generator. We propose a novel GAN framework, namely TreeGAN, to incorporate a given Context-Free Grammar (CFG) into the sequence generation process. In TreeGAN, the generator employs a recurrent neural network (RNN) to construct a parse tree. Each generated parse tree can then be translated to a valid sequence of the given grammar. The discriminator uses a tree-structured RNN to distinguish the generated trees from real trees. We show that TreeGAN can generate sequences for any CFG and its generation fully conforms with the given syntax. Experiments on synthetic and real data sets demonstrate that TreeGAN significantly improves the quality of the sequence generation in context-free languages.
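The step from parse tree to valid sequence can be shown on a toy grammar. In the sketch below a random choice of production stands in for the RNN generator, and there is no discriminator, so this only illustrates why a sequence read off a grammar-derived tree conforms to the grammar. The grammar, function names, and seed are hypothetical and unrelated to the paper's experiments.

```python
import random

# Toy, non-recursive grammar (hypothetical; not the SQL grammar from the paper).
# Symbols that do not appear as keys are terminals.
GRAMMAR = {
    "QUERY": [["SELECT", "COLS", "FROM", "TABLE"]],
    "COLS": [["*"], ["COL"], ["COL", ",", "COL"]],
    "COL": [["id"], ["name"]],
    "TABLE": [["users"], ["orders"]],
}


def generate_tree(symbol, rng):
    """Expand `symbol` top-down; a random choice stands in for the RNN generator."""
    if symbol not in GRAMMAR:  # terminal symbol: a leaf of the parse tree
        return symbol
    production = rng.choice(GRAMMAR[symbol])
    return (symbol, [generate_tree(child, rng) for child in production])


def tree_to_sequence(tree):
    """Read the leaves left to right to obtain the token sequence."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    return [tok for child in children for tok in tree_to_sequence(child)]


tree = generate_tree("QUERY", random.Random(0))
tokens = tree_to_sequence(tree)
```

Because every internal node is expanded with a production of the grammar, any sequence obtained this way is derivable from `QUERY` by construction.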

Mapping Text to Knowledge Graph Entities using Multi-Sense LSTMs

This paper addresses the problem of mapping natural language text to knowledge base entities. The mapping process is approached as a composition of a phrase or a sentence into a point in a multi-dimensional entity space obtained from a knowledge graph. The compositional model is an LSTM equipped with a dynamic disambiguation mechanism on the input word embeddings (a Multi-Sense LSTM), addressing polysemy issues. Further, the knowledge base space is prepared by collecting random walks from a graph enhanced with textual features, which act as a set of semantic bridges between text and knowledge base entities. The ideas of this work are demonstrated on large-scale text-to-entity mapping and entity classification tasks, with state-of-the-art results.

Probabilistic Multilayer Networks

Here we introduce probabilistic weighted and unweighted multilayer networks as derived from information theoretical correlation measures on large multidimensional datasets. We present the fundamentals of the formal application of probabilistic inference on problems embedded in multilayered environments, providing examples taken from the analysis of biological and social systems: cancer genomics and drug-related violence.

• Optimal Input Placement in Lattice Graphs
• Prediction of Reynolds Stresses in High-Mach-Number Turbulent Boundary Layers using Physics-Informed Machine Learning
• Predictive Image Regression for Longitudinal Studies with Missing Data
• Tunable Eight-Element MIMO Antenna Based on the Antenna Cluster Concept
• The Planar Modular Partition Monoid
• Progressive Deep Neural Networks Acceleration via Soft Filter Pruning
• Bivariate Discrete Inverse Weibull Distribution
• Disjunctive domination in trees
• Capsule Networks for Protein Structure Classification and Prediction
• Everybody Dance Now
• The First Order Truth behind Undecidability of Regular Path Queries Determinacy
• In the Dance Studio: An Art and Engineering Exploration of Human Flocking
• Calibration Scoring Rules for Practical Prediction Training
• Second-order Democratic Aggregation
• Coprime Sensing via Chinese Remaindering over Quadratic Fields, Part I: Array Designs
• Video Jigsaw: Unsupervised Learning of Spatiotemporal Context for Video Action Recognition
• Single-particle localization in dynamical potentials
• Coprime Sensing via Chinese Remaindering over Quadratic Fields, Part II: Generalizations and Applications
• Vehicles Lane-changing Behavior Detection
• Controllability of a system of degenerate parabolic equations with non-diagonalizable diffusion matrix
• Rethinking Monocular Depth Estimation with Adversarial Training
• An Overview of Datatype Quantization Techniques for Convolutional Neural Networks
• Sarcasm Analysis using Conversation Context
• Learning Hierarchical Semantic Image Manipulation through Structured Representations
• Capacity-Achieving Private Information Retrieval Codes with Optimal Message Size and Upload Cost
• Cookie Clicker
• The random heat equation in dimensions three and higher: the homogenization viewpoint
• Crossing Numbers and Stress of Random Graphs
• Training Deeper Neural Machine Translation Models with Transparent Attention
• Optimizing the tie-breaker regression discontinuity design
• Wong-Zakai approximation and support theorem for semilinear SPDEs with finite dimensional noise in the whole space
• Exploring Author Gender in Book Rating and Recommendation
• Stakes are higher, risk is lower: Citation distributions are more equal in high quality journals
• Generating Magnetic Resonance Spectroscopy Imaging Data of Brain Tumours from Linear, Non-Linear and Deep Learning Models
• Pathologies in information bottleneck for deterministic supervised learning
• Propagation Measurement System and Approach at 140 GHz-Moving to 6G and Above 100 GHz
• Structured Interpretation of Temporal Relations
• Discrete Decreasing Minimization, Part I, Base-polyhedra with Applications in Network Optimization
• Review-Driven Multi-Label Music Style Classification by Exploiting Style Correlations
• Robust Directional Modulation Design for Secrecy Rate Maximization in Multi-User Networks
• Sparse General Wigner-type Matrices: Local Law and Eigenvector Delocalization
• Tomlinson-Harashima Precoding-Aided Multi-Antenna Non-Orthogonal Multiple Access
• Discussion of Parameters Setting for A Distributed Probabilistic Modeling Algorithm
• Latent Dirichlet Allocation for Internet Price War
• Exploiting Rich Syntactic Information for Semantic Parsing with Graph-to-Sequence Model
• Weakly-supervised Neural Semantic Parsing with a Generative Ranker
• Global product structure for a space of special matrices
• A Probabilistic Approach to Extended Finite State Mean Field Games
• Attention-Guided Answer Distillation for Machine Reading Comprehension
• Playing 20 Question Game with Policy-Based Reinforcement Learning
• Reflected maxmin copulas and modelling quadrant subindependence
• Machine Learning at the Edge: A Data-Driven Architecture with Applications to 5G Cellular Networks
• Exploring Shared Structures and Hierarchies for Multiple NLP Tasks
• PVNet: A Joint Convolutional Network of Point Cloud and Multi-View for 3D Shape Recognition
• Spatial verification of high-resolution ensemble precipitation forecasts using local wavelet spectra
• Humans make best use of social heuristics when confronting hard problems in large groups
• Arap-Tweet: A Large Multi-Dialect Twitter Corpus for Gender, Age and Language Variety Identification
• Deep multi-task learning for a geographically-regularized semantic segmentation of aerial images
• An iterative generalized Golub-Kahan algorithm for problems in structural mechanics
• Guidelines and Annotation Framework for Arabic Author Profiling
• Avoiding long Berge cycles, the missing cases $k=r+1$ and $k = r+2$
• Role of Intonation in Scoring Spoken English
• Optimal Precoder Designs for Sum-utility Maximization in SWIPT-enabled Multi-user MIMO Cognitive Radio Networks
• Counting the number of metastable states in the modularity landscape: Algorithmic detectability limit of greedy algorithms in community detection
• Leakage Rate Analysis for Artificial Noise Assisted Massive MIMO with Non-coherent Passive Eavesdropper in Block-fading
• A Directionally Selective Neural Network with Separated ON and OFF Pathways for Translational Motion Perception in a Visually Cluttered Environment
• End-to-End Neural Entity Linking
• Discriminative out-of-distribution detection for semantic segmentation
• Data-adaptive trimming of the Hill estimator and detection of outliers in the extremes of heavy-tailed data
• On the strong convergence of the discrete and the continuous projected gradient method
• Systematic time expansion for the Kardar-Parisi-Zhang equation, linear statistics of the GUE at the edge and trapped fermions
• Predicting Action Tubes
• Adversarial Attacks on Deep-Learning Based Radio Signal Classification
• Joint Models with Multiple Longitudinal Outcomes and a Time-to-Event Outcome
• Euler tours in hypergraphs
• Spike and slab empirical Bayes sparse credible sets
• Adaptive Tuning Of Hamiltonian Monte Carlo Within Sequential Monte Carlo
• Measuring network resilience through connection patterns
• An Exact Upper Bound on the $L^p$ Lebesgue Constant and The $\infty$-Rényi Entropy Power Inequality for Integer Valued Random Variables
• Revisiting the Importance of Encoding Logic Rules in Sentiment Classification
• The ballistic annihilation threshold is positive
• Asymmetric linkages: maxmin vs. reflected maxmin copulas
• Diversity-Driven Selection of Exploration Strategies in Multi-Armed Bandits
• On the Diversity of OTFS Modulation in Doubly-Dispersive Channels
• Segmentation of Bleeding Regions in Wireless Capsule Endoscopy for Detection of Informative Frames
• A Smooth Inexact Penalty Reformulation of Convex Problems with Linear Constraints
• Topology and Prediction Focused Research on Graph Convolutional Neural Networks
• EmotiW 2018: Audio-Video, Student Engagement and Group-Level Affect Prediction
• Time-Agnostic Prediction: Predicting Predictable Video Frames
• Exploring Parallel Execution Strategies for Constraint Handling Rules – Work-in-Progress Report
• Webly Supervised Joint Embedding for Cross-Modal Image-Text Retrieval
• Regression-with-residuals Estimation of Marginal Effects: A Method of Adjusting for Treatment-induced Confounders that may also be Moderators
• The LU-decomposition of Lehmer’s tridiagonal matrix
• On a ‘Two Truths’ Phenomenon in Spectral Graph Clustering
• Substitutive structure of Jeandel-Rao aperiodic tilings
• High quality ultrasonic multi-line transmission through deep learning
• High frame-rate cardiac ultrasound imaging with deep learning
• Collective mode reductions for populations of coupled noisy oscillators
• Learning to Importance Sample in Primary Sample Space
• Comparing seven variants of the Ensemble Kalman Filter: How many synthetic experiments are needed?
• Enhancing Cellular Performance through Device-to-Device Distributed MIMO
• Sentiment Index of the Russian Speaking Facebook
• Equivariant Kazhdan-Lusztig polynomials of $q$-niform matroids
• Entanglement Availability Differentiation Service for the Quantum Internet
• Multilayer Optimization for the Quantum Internet
• On model selection criteria for climate change impact studies
• Secure Relaying in Non-Orthogonal Multiple Access: Trusted and Untrusted Scenarios
