What's new on arXiv

NNCubes: Learned Structures for Visual Data Exploration

Visual exploration of large multidimensional datasets has seen tremendous progress in recent years, allowing users to express rich data queries that produce informative visual summaries, all in real time. However, a limitation with current techniques is their lack of guidance. Exploration in existing methods is typically driven by data aggregation queries, but these are unable to suggest interesting aggregations and are limited in helping the user understand the types of queries that lead to certain aggregations. To tackle this problem, it is necessary to understand how the space of queries relates to their aggregation results. We present NNCubes: neural networks that are surrogate models for data cube techniques. NNCubes learns a function that takes as input a given query, for instance a geographic region and temporal interval, and outputs an aggregation of the query. The learned function serves as a real-time, low-memory approximator for aggregation queries. Moreover, using neural networks as querying engines opens up new ways to guide user interactions that would be challenging to do with existing techniques. First, we show how to use the network for discovering queries that lead to user-specified aggregation results, thus providing a form of direct manipulation. Second, our networks are designed in such a way that we learn meaningful 2D projections of the individual inputs, namely that they are predictive of the aggregation operation. We use these learned projections to allow the user to explore the space of aggregation queries, to help discover trends and patterns in the data. We demonstrate both of these forms of guidance using NNCubes on a variety of datasets.

Interpretation of Natural Language Rules in Conversational Machine Reading

Most work in machine reading focuses on question answering problems where the answer is directly expressed in the text to read. However, many real-world question answering problems require the reading of text not because it contains the literal answer, but because it contains a recipe to derive an answer together with the reader’s background knowledge. One example is the task of interpreting regulations to answer ‘Can I…?’ or ‘Do I have to…?’ questions such as ‘I am working in Canada. Do I have to carry on paying UK National Insurance?’ after reading a UK government website about this topic. This task requires both the interpretation of rules and the application of background knowledge. It is further complicated due to the fact that, in practice, most questions are underspecified, and a human assistant will regularly have to ask clarification questions such as ‘How long have you been working abroad?’ when the answer cannot be directly derived from the question and text. In this paper, we formalise this task and develop a crowd-sourcing strategy to collect 32k task instances based on real-world rules and crowd-generated questions and scenarios. We analyse the challenges of this task and assess its difficulty by evaluating the performance of rule-based and machine-learning baselines. We observe promising results when no background knowledge is necessary, and substantial room for improvement whenever background knowledge is needed.

DADA: Deep Adversarial Data Augmentation for Extremely Low Data Regime Classification

A Reinforcement Learning-driven Translation Model for Search-Oriented Conversational Systems

Search-oriented conversational systems rely on information needs expressed in natural language (NL). We focus here on the understanding of NL expressions for building keyword-based queries. We propose a reinforcement-learning-driven translation model framework able to 1) learn the translation from NL expressions to queries in a supervised way, and 2) overcome the lack of large-scale datasets by framing the translation model as a word selection approach and injecting relevance feedback into the learning process. Experiments are carried out on two TREC datasets and outline the effectiveness of our approach.

Towards Large Scale Training Of Autoencoders For Collaborative Filtering

In this paper, we apply a mini-batch based negative sampling method to efficiently train a latent factor autoencoder model on large scale and sparse data for implicit feedback collaborative filtering. We compare our work against a state-of-the-art baseline model on different experimental datasets and show that this method can lead to a good and fast approximation of the baseline model performance.

A Deep Neural Network Sentence Level Classification Method with Context Information

In the sentence classification task, context formed from sentences adjacent to the sentence being classified can provide important information for classification. This context is, however, often ignored. Where methods do make use of context, only small amounts are considered, making it difficult to scale. We present a new method for sentence classification, Context-LSTM-CNN, that makes use of potentially large contexts. The method also utilizes long-range dependencies within the sentence being classified, using an LSTM, and short-span features, using a stacked CNN. Our experiments demonstrate that this approach consistently improves over previous methods on two different datasets.

MULDEF: Multi-model-based Defense Against Adversarial Examples for Neural Networks

Despite being popularly used in many application domains such as image recognition and classification, neural network models have been found to be vulnerable to adversarial examples: given a model and an example correctly classified by the model, an adversarial example is a new example formed by applying a small perturbation (imperceptible to humans) to the given example so that the model misclassifies the new example. Adversarial examples can pose risks to safety or security in real-world applications. In recent years, given a vulnerable model, defense approaches, such as adversarial training and defensive distillation, improve the model to make it more robust against adversarial examples. However, based on the improved model, attackers can still generate adversarial examples to successfully attack the model. To address this limitation, we propose a new defense approach, named MULDEF, based on the design principle of diversity. Given a target model (as a seed model) and an attack approach to be defended against, MULDEF constructs additional models (from the seed model) together with the seed model to form a family of models, such that the models are complementary to each other to accomplish robustness diversity (i.e., one model’s adversarial examples typically do not become other models’ adversarial examples), while maintaining about the same accuracy for normal examples. At runtime, given an input example, MULDEF randomly selects a model from the family to be applied on the given example. The robustness diversity of the model family and the random selection of a model from the family together lower the success rate of attacks. Our evaluation results show that MULDEF substantially improves the target model’s accuracy on adversarial examples by 35-50% and 2-10% in the white-box and black-box attack scenarios, respectively.
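
The runtime step described above (pick one model from the family at random for each input) can be sketched as follows. This is only an illustration, not the authors' implementation: the two threshold "models", the function name `make_random_selector`, and the fixed random seed are all hypothetical stand-ins for a trained model family.

```python
import random
from typing import Callable, List, Sequence

# A "model" here is any callable mapping a feature vector to a class label.
Model = Callable[[Sequence[float]], int]


def make_random_selector(family: List[Model], rng: random.Random) -> Model:
    """Return a classifier that picks one family member at random per input."""
    if not family:
        raise ValueError("model family must not be empty")

    def predict(example: Sequence[float]) -> int:
        model = rng.choice(family)  # random selection at prediction time
        return model(example)

    return predict


# Hypothetical toy models: threshold classifiers on the first feature.
def seed_model(x: Sequence[float]) -> int:
    return int(x[0] > 0.5)


def extra_model(x: Sequence[float]) -> int:
    return int(x[0] > 0.4)


predict = make_random_selector([seed_model, extra_model], random.Random(0))
label = predict([0.9])  # both toy models agree on this input
```

An attacker who crafts an adversarial example against one member cannot know which member will answer a given query, which is the intuition behind combining diversity with random selection.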

Denoising Neural Machine Translation Training with Trusted Data and Online Data Selection

Measuring domain relevance of data and identifying or selecting well-fit domain data for machine translation (MT) is a well-studied topic, but denoising is not. Denoising is concerned with a different type of data quality and tries to reduce the negative impact of data noise on MT training, in particular, neural MT (NMT) training. This paper generalizes methods for measuring and selecting data for domain MT and applies them to denoising NMT training. The proposed approach uses trusted data and a denoising curriculum realized by online data selection. Intrinsic and extrinsic evaluations show that the approach is significantly effective when NMT is trained on data with severe noise.

Entropy and Graph Energy of Complex Networks

The concept of graph energies has remained mostly a theoretical idea with no practical applications in the world of complex and social networks. Graph energy is the energy of the matrix representation of the graph, where matrix energy is the sum of absolute values of the eigenvalues of the matrix. Although theoretical properties of various graph energies have been investigated in the past, there has never been a serious attempt to utilize them in practice. In this work we investigate the usefulness and usability of graph energies and their entropies in describing a wide spectrum of networks. We show that when graph energies are applied to local egocentric networks within larger topological structures, the values of these energies correlate strongly with several centrality indexes. In particular, for some network models, graph energies tend to correlate very strongly with the betweenness and the eigencentrality of nodes. As the computation of these centrality measures is expensive and requires global processing of a network, our research opens the possibility of devising efficient algorithms for the estimation of these centrality metrics based only on local information.
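
To make the definition concrete, here is a small worked example. It assumes the matrix representation is the adjacency matrix $A$ (the classical choice; the abstract also covers other matrices), and the triangle graph $K_3$ is chosen purely for illustration.

```latex
% Graph energy: sum of absolute eigenvalues of the chosen matrix M of G
E(G) = \sum_{i=1}^{n} \lvert \lambda_i \rvert ,
\qquad \lambda_1, \dots, \lambda_n \ \text{eigenvalues of } M .

% Worked example: the triangle K_3 with adjacency matrix
A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix},
\qquad \lambda_1 = 2, \quad \lambda_2 = \lambda_3 = -1 ,

% so that
E(K_3) = \lvert 2 \rvert + \lvert -1 \rvert + \lvert -1 \rvert = 4 .
```

More generally, the complete graph $K_n$ has adjacency eigenvalues $n-1$ (once) and $-1$ (with multiplicity $n-1$), giving $E(K_n) = 2(n-1)$.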

Learning Low Precision Deep Neural Networks through Regularization

A simulation-based approach to estimate joint model of longitudinal and event-time data with many missing longitudinal observations

Joint models of longitudinal and event-time data have been extensively studied and applied in many different fields. Estimation of joint models is challenging: most existing procedures are computationally expensive and place strict requirements on data quality. In this study, a novel simulation-based procedure is proposed to estimate a general family of joint models, which includes many widely applied joint models as special cases. Our procedure can easily handle low-quality data where longitudinal observations are systematically missing for some of the covariate dimensions. In addition, our estimation procedure is compatible with parallel computing frameworks when combined with a stochastic descent algorithm, so it is applicable to massive data and therefore suitable for many financial applications. Consistency and asymptotic normality of our estimator are proved, and a simulation study is conducted to illustrate its effectiveness. Finally, as an application, the procedure is applied to estimate the pre-payment probability on a massive consumer-loan dataset drawn from one of the biggest P2P loan platforms in China.

Dependency-based Hybrid Trees for Semantic Parsing

We propose a novel dependency-based hybrid tree model for semantic parsing, which converts natural language utterances into machine interpretable meaning representations. Unlike previous state-of-the-art models, the semantic information is interpreted as the latent dependency between the natural language words in our joint representation. Such dependency information can capture the interactions between the semantics and natural language words. We integrate a neural component into our model and propose an efficient dynamic-programming algorithm to perform tractable inference. Through extensive experiments on the standard multilingual GeoQuery dataset with eight languages, we demonstrate that our proposed approach is able to achieve state-of-the-art performance across several languages. Analysis also justifies the effectiveness of using our new dependency-based representation.

Semi-supervised Learning on Graphs with Generative Adversarial Nets

We investigate how generative adversarial nets (GANs) can help semi-supervised learning on graphs. We first provide insights on working principles of adversarial learning over graphs and then present GraphSGAN, a novel approach to semi-supervised learning on graphs. In GraphSGAN, generator and classifier networks play a novel competitive game. At equilibrium, the generator generates fake samples in low-density areas between subgraphs. In order to discriminate fake samples from the real ones, the classifier implicitly takes the density property of subgraphs into consideration. An efficient adversarial learning algorithm has been developed to improve traditional normalized graph Laplacian regularization with a theoretical guarantee. Experimental results on several different genres of datasets show that the proposed GraphSGAN significantly outperforms several state-of-the-art methods. GraphSGAN can also be trained using mini-batches and thus enjoys a scalability advantage.

Attack Tolerance of Link Prediction Algorithms: How to Hide Your Relations in a Social Network

Link prediction is one of the fundamental research problems in network analysis. Intuitively, it involves identifying the edges that are most likely to be added to a given network, or the edges that appear to be missing from the network when in fact they are present. Various algorithms have been proposed to solve this problem over the past decades. For all their benefits, such algorithms raise serious privacy concerns, as they could be used to expose a connection between two individuals who wish to keep their relationship private. With this in mind, we investigate the ability of such individuals to evade link prediction algorithms. More precisely, we study their ability to strategically alter their connections so as to increase the probability that some of their connections remain unidentified by link prediction algorithms. We formalize this question as an optimization problem, and prove that finding an optimal solution is NP-complete. Despite this hardness, we show that the situation is not bleak in practice. In particular, we propose two heuristics that can easily be applied by members of the general public on existing social media. We demonstrate the effectiveness of those heuristics on a wide variety of networks and against a plethora of link prediction algorithms.
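
For readers unfamiliar with link prediction, the sketch below shows one classic scoring rule, common neighbours, and how rewiring a single tie changes the score of a pair. It is an illustration only: the abstract does not name the algorithms or the two heuristics studied, and the toy graph and the function name `common_neighbor_scores` are hypothetical.

```python
from itertools import combinations


def common_neighbor_scores(adj):
    """Score every non-adjacent pair by its number of shared neighbours."""
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:
            scores[(u, v)] = len(adj[u] & adj[v])
    return scores


# Toy undirected graph as an adjacency dict; a and b are not linked
# in the observed network but share two neighbours.
graph = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b"},
    "d": {"a", "b"},
}
scores = common_neighbor_scores(graph)

# Dropping the a-d tie removes one shared neighbour, so a predictor based
# on common neighbours has less evidence for an a-b relationship.
graph["a"].discard("d")
graph["d"].discard("a")
after = common_neighbor_scores(graph)
```

Here the a-b score falls from 2 to 1 after the a-d tie is removed, which is the kind of strategic alteration the paper formalizes as an optimization problem.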

Data Dropout: Optimizing Training Data for Convolutional Neural Networks

Deep learning models learn to fit training data while they are highly expected to generalize well to testing data. Most works aim at finding such models by creatively designing architectures and fine-tuning parameters. To adapt to particular tasks, hand-crafted information such as image prior has also been incorporated into end-to-end learning. However, very little progress has been made on investigating how an individual training sample will influence the generalization ability of a model. In other words, to achieve high generalization accuracy, do we really need all the samples in a training dataset? In this paper, we demonstrate that deep learning models such as convolutional neural networks may not favor all training samples, and generalization accuracy can be further improved by dropping those unfavorable samples. Specifically, the influence of removing a training sample is quantifiable, and we propose a Two-Round Training approach, aiming to achieve higher generalization accuracy. We locate unfavorable samples after the first round of training, and then retrain the model from scratch with the reduced training dataset in the second round. Since our approach is essentially different from fine-tuning or further training, the computational cost should not be a concern. Our extensive experimental results indicate that, with identical settings, the proposed approach can boost performance of the well-known networks on both high-level computer vision problems such as image classification, and low-level vision problems such as image denoising.
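
The two-round idea can be sketched on a toy problem. Everything below is an assumption made for illustration: a 1-D nearest-centroid classifier stands in for a CNN, and a brute-force leave-one-out check on a held-out set stands in for whatever influence measure the paper actually uses. The names `train`, `error_rate`, and `two_round_training` are hypothetical.

```python
from statistics import mean


def train(samples):
    """Fit a nearest-centroid classifier: one mean per class label."""
    labels = {y for _, y in samples}
    return {lab: mean(x for x, y in samples if y == lab) for lab in labels}


def error_rate(model, data):
    """Fraction of points whose nearest centroid carries the wrong label."""
    wrong = sum(
        min(model, key=lambda lab: abs(x - model[lab])) != y for x, y in data
    )
    return wrong / len(data)


def two_round_training(train_set, val_set):
    """Round 1: train and score samples. Round 2: retrain without bad ones."""
    base = error_rate(train(train_set), val_set)
    kept = []
    for i, sample in enumerate(train_set):
        without = train_set[:i] + train_set[i + 1:]
        # Negative influence: held-out error drops when this sample is removed.
        influence = error_rate(train(without), val_set) - base
        if influence >= 0:
            kept.append(sample)
    return train(kept), kept  # retrain from scratch on the reduced set


# Toy data: (feature, label); the last training point is mislabeled.
train_set = [(0.0, 0), (0.2, 0), (0.1, 0), (1.0, 1), (0.9, 1), (0.95, 0)]
val_set = [(0.05, 0), (0.15, 0), (0.7, 1), (0.6, 1)]
model, kept = two_round_training(train_set, val_set)
```

On this toy data only the mislabeled point is dropped, and the held-out error falls from 0.25 to 0.0 after the second round. Leave-one-out retraining is far too expensive for real networks, which is why a cheaper influence estimate is needed in practice.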

ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks

Matching Estimators for Causal Effects of Multiple Treatments

Matching estimators for average treatment effects are widely used in the binary treatment setting, in which missing potential outcomes are imputed as the average of observed outcomes of all matches for each unit. With more than two treatment groups, however, estimation using matching requires additional techniques. In this paper, we propose a nearest-neighbors matching estimator for use with multiple, nominal treatments, and use simulations to show that this method is precise and has coverage levels that are close to nominal.
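
The imputation step described above extends naturally to several treatments: each unit keeps its observed outcome and borrows the outcomes of its nearest neighbours in every other treatment group. The sketch below is a generic illustration under simplifying assumptions (a single scalar covariate, absolute distance, at least one unit per treatment), not the estimator proposed in the paper; the function names and data are hypothetical.

```python
def impute_potential_outcomes(units, treatments, n_matches=1):
    """units: list of (covariate, treatment, outcome) tuples.

    Returns one dict per unit mapping each treatment to an observed or
    imputed outcome. Assumes every treatment has at least one unit.
    """
    imputed = []
    for x, t, y in units:
        row = {t: y}  # observed outcome under the treatment received
        for other in treatments:
            if other == t:
                continue
            # Nearest neighbours on the covariate within the other group.
            pool = sorted(
                (u for u in units if u[1] == other),
                key=lambda u: abs(u[0] - x),
            )
            matches = pool[:n_matches]
            row[other] = sum(m[2] for m in matches) / len(matches)
        imputed.append(row)
    return imputed


def average_effect(imputed, t, s):
    """Average difference between outcomes under treatments t and s."""
    return sum(row[t] - row[s] for row in imputed) / len(imputed)


# Toy data with three nominal treatments.
units = [
    (1.0, "A", 10.0), (2.0, "A", 12.0),
    (1.1, "B", 13.0), (2.1, "B", 15.0),
    (1.2, "C", 11.0), (1.9, "C", 13.0),
]
imputed = impute_potential_outcomes(units, ["A", "B", "C"])
effect_b_vs_a = average_effect(imputed, "B", "A")
effect_c_vs_a = average_effect(imputed, "C", "A")
```

With this data, the estimated effect of B relative to A is 3.0 and of C relative to A is 1.0.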

Matching Algorithms for Causal Inference with Multiple Treatments

Randomized clinical trials (RCTs) are ideal for estimating causal effects, because the distributions of background covariates are similar in expectation across treatment groups. When estimating causal effects using observational data, matching is a commonly used method to replicate the covariate balance achieved in an RCT. Matching algorithms have a rich history dating back to the mid-1900s, but have been used mostly to estimate causal effects between two treatment groups. When there are more than two treatments, estimating causal effects requires additional assumptions and techniques. We propose matching algorithms that address the drawbacks of the current methods, and we use simulations to compare current and new methods. All of the methods display improved covariate balance in the matched sets relative to the pre-matched cohorts. In addition, we provide advice to investigators on which matching algorithms are preferred for different covariate distributions.

Weakly-Supervised Neural Text Classification

Deep neural networks are gaining increasing popularity for the classic text classification task, due to their strong expressive power and reduced need for feature engineering. Despite such attractiveness, neural text classification models suffer from the lack of training data in many real-world applications. Although many semi-supervised and weakly-supervised text classification models exist, they cannot be easily applied to deep neural models and support only limited supervision types. In this paper, we propose a weakly-supervised method that addresses the lack of training data in neural text classification. Our method consists of two modules: (1) a pseudo-document generator that leverages seed information to generate pseudo-labeled documents for model pre-training, and (2) a self-training module that bootstraps on real unlabeled data for model refinement. Our method has the flexibility to handle different types of weak supervision and can be easily integrated into existing deep neural models for text classification. We have performed extensive experiments on three real-world datasets from different domains. The results demonstrate that our proposed method achieves inspiring performance without requiring excessive training data and outperforms baseline methods significantly.

Learning to Navigate for Fine-grained Classification

Fine-grained classification is challenging due to the difficulty of finding discriminative features. Finding those subtle traits that fully characterize the object is not straightforward. To handle this circumstance, we propose a novel self-supervision mechanism to effectively localize informative regions without the need for bounding-box/part annotations. Our model, termed NTS-Net for Navigator-Teacher-Scrutinizer Network, consists of a Navigator agent, a Teacher agent and a Scrutinizer agent. In consideration of the intrinsic consistency between the informativeness of the regions and their probability of being the ground-truth class, we design a novel training paradigm, which enables the Navigator to detect the most informative regions under the guidance of the Teacher. After that, the Scrutinizer scrutinizes the proposed regions from the Navigator and makes predictions. Our model can be viewed as a multi-agent cooperation, wherein agents benefit from each other and make progress together. NTS-Net can be trained end-to-end, while providing accurate fine-grained classification predictions as well as highly informative regions during inference. We achieve state-of-the-art performance on extensive benchmark datasets.

Towards Automated Customer Support

Recent years have seen growing interest in conversational agents, such as chatbots, which are a very good fit for automated customer support because the domain in which they need to operate is narrow. This interest was in part inspired by recent advances in neural machine translation, esp. the rise of sequence-to-sequence (seq2seq) and attention-based models such as the Transformer, which have been applied to various other tasks and have opened new research directions in question answering, chatbots, and conversational systems. Still, in many cases, it might be feasible and even preferable to use simple information retrieval techniques. Thus, here we compare three different models: (i) a retrieval model, (ii) a sequence-to-sequence model with attention, and (iii) the Transformer. Our experiments with the Twitter Customer Support Dataset, which contains over two million posts from customer support services of twenty major brands, show that the seq2seq model outperforms the other two in terms of semantics and word overlap.

• Deep Learning Based Vehicle Make-Model Classification
• A Bayesian GED-Gamma stochastic volatility model for return data: a marginal likelihood approach
• Organ at Risk Segmentation in Head and Neck CT Images by Using a Two-Stage Segmentation Framework Based on 3D U-Net
• MSCE: An edge preserving robust loss function for improving super-resolution algorithms
• Road User Abnormal Trajectory Detection using a Deep Autoencoder
• Ptychographic Ambiguity and Reconstruction
• Twin-GAN -- Unpaired Cross-Domain Image Translation with Weight-Sharing GANs
• A Deeper Insight into the UnDEMoN: Unsupervised Deep Network for Depth and Ego-Motion Estimation
• Task adapted reconstruction for inverse problems
• Iterative multi-path tracking for video and volume segmentation with sparse point supervision
• Targeted Nonlinear Adversarial Perturbations in Images and Videos
• Migrating Knowledge between Physical Scenarios based on Artificial Neural Networks
• Wavelet based edge feature enhancement for convolutional neural networks
• Performing energy modelling exercises in a transparent way - the issue of data quality in power plant databases
• Chest X-ray Inpainting with Deep Generative Models
• Group-Representative Functional Network Estimation from Multi-Subject fMRI Data via MRF-based Image Segmentation
• Dynamic Psychological Game Theory for Secure Internet of Battlefield Things (IoBT) Systems
• Learning Gender-Neutral Word Embeddings
• Chinese Discourse Segmentation Using Bilingual Discourse Commonality
• Finding Dory in the Crowd: Detecting Social Interactions using Multi-Modal Mobile Sensing
• Bayesian Outdoor Defect Detection
• Skip-gram word embeddings in hyperbolic space
• DeepFall -- Non-invasive Fall Detection with Deep Spatio-Temporal Convolutional Autoencoders
• Regularizing Matrix Factorization with User and Item Embeddings for Recommendation
• Bayesian Modeling of Inconsistent Plastic Response due to Material Variability
• Matching problem for primary and secondary signals in dual-phase TPC detectors
• Neural DrugNet
• JuncNet: A Deep Neural Network for Road Junction Disambiguation for Autonomous Vehicles
• A Cayley-type identity for trees
• Non-Gaussian Stochastic Volatility Model with Jumps via Gibbs Sampler
• Minimum Violation Control Synthesis on Cyber-Physical Systems under Attacks
• Automated segmentation on the entire cardiac cycle using a deep learning work-flow
• On $Z_pZ_{p^k}$-additive codes and their duality
• Gromov-Wasserstein Alignment of Word Embedding Spaces
• Performance Analysis of Plug-and-Play ADMM: A Graph Signal Processing Perspective
• Bilinear Recovery using Adaptive Vector-AMP
• Social Network Structure is Predictive of Health and Wellness
• Estimation for Quadrotors
• A Supervised Learning Approach For Heading Detection
• What do RNN Language Models Learn about Filler-Gap Dependencies?
• A Multi-Timescale Data-Driven Approach to Enhance Distribution System Observability
• A Game-Theoretic Data-Driven Approach for Pseudo-Measurement Generation in Distribution System State Estimation
• Your Actions or Your Associates? Predicting Certification and Dropout in MOOCs with Behavioral and Social Features
• Towards Resilient Operation of Multi-Microgrids: An MISOCP-Based Frequency-Constrained Approach
• On The Capacity of Gaussian MIMO Channels Under The Joint Power Constraints
• A Survey on State Estimation Techniques and Challenges in Smart Distribution Systems
• Aesthetic Features for Personalized Photo Recommendation
• Generalizing Procrustes Analysis for Better Bilingual Dictionary Induction
• Indicatements that character language models learn English morpho-syntactic units and regularities
• When to Finish? Optimal Beam Search for Neural Text Generation (modulo beam size)
• Nightmare at test time: How punctuation prevents parsers from generalizing
• Rx-Caffe: Framework for evaluating and training Deep Neural Networks on Resistive Crossbars
• 3D Segmentation with Exponential Logarithmic Loss for Highly Unbalanced Object Sizes
• Collective fast delivery by energy-efficient agents
• Location and Capacity Planning of Facilities with General Service-Time Distributions Using Conic Optimization
• The NEU Meta-Algorithm for Geometric Learning with Applications in Finance
• Predicting protein inter-residue contacts using composite likelihood maximization and deep learning
• Understanding Neural Pathways in Zebrafish through Deep Learning and High Resolution Electron Microscope Data
• A Simplified Approach to Deep Learning for Image Segmentation
• Hierarchical CVAE for Fine-Grained Hate Speech Classification
• Extractive Adversarial Networks: High-Recall Explanations for Identifying Personal Attacks in Social Media Posts
• Eliminating Boundaries in Cloud Storage with Anna
• Attentive Crowd Flow Machines
• Continuous data assimilation with blurred-in-time measurements of the surface quasi-geostrophic equation
• Multi-UAV Continuum Deformation Flight Optimization in Cluttered Urban Environments
• DAC-SDC Low Power Object Detection Challenge for UAV Applications
• Beyond Error Propagation in Neural Machine Translation: Characteristics of Language Also Matter
• Simple Fusion: Return of the Language Model
• Contextual Encoding for Translation Quality Estimation
• Angle-Domain Approach for Parameter Estimation in High-Mobility OFDM with Fully/Partly Calibrated Massive ULA
• Channel PSD Analysis and Optimal Antenna Weighting for High-Mobility Massive MIMO
• The Star Dichromatic Number
• A spectral characterization for concentration of the cover time
• Model-free trading and hedging with continuous price paths
• Why is unsupervised alignment of English embeddings from different algorithms so hard?
• LIUM-CVC Submissions for WMT18 Multimodal Translation Task
• The Optimization Problem of Quantum Discord In the Language of Correlated Observables
• Pay One, Get Hundreds for Free: Reducing Cloud Costs through Shared Query Execution
• On Adjacency and e-Adjacency in General Hypergraphs: Towards a New e-Adjacency Tensor
• Hypergraph Modeling and Visualisation of Complex Co-occurence Networks
• Implications of Ocular Pathologies for Iris Recognition Reliability
• Linear regression analysis of template aging in iris biometrics
• Human Iris Recognition in Post-mortem Subjects: Study and Database
• Hyperparameter Learning for Conditional Mean Embeddings with Rademacher Complexity Bounds
• Iris Recognition Under Biologically Troublesome Conditions – Effects of Aging, Diseases and Post-mortem Changes
• MS-UEdin Submission to the WMT2018 APE Shared Task: Dual-Source Transformer for Automatic Post-Editing
• Exchange-Based Diffusion in Hb-Graphs: Highlighting Complex Relationships
• Microsoft’s Submission to the WMT2018 News Translation Task: How I Learned to Stop Worrying and Love the Data
• Dual Conditional Cross-Entropy Filtering of Noisy Parallel Corpora
• Nucleation instability in super-cooled Cu-Zr-Al glass-forming liquids
• Improving Visual Relationship Detection using Semantic Modeling of Scene Descriptions
• Assessment of iris recognition reliability for eyes affected by ocular pathologies
• Post-mortem Human Iris Recognition
• Data-Driven Chance Constrained Programs over Wasserstein Balls
• Cataract influence on iris recognition performance
• Database of iris images acquired in the presence of ocular pathologies and assessment of iris recognition reliability for disease-affected eyes
• Iris and periocular recognition in arabian race horses using deep convolutional neural networks
• Iris Recognition with a Database of Iris Images Obtained in Visible Light Using Smartphone Camera
• Evaluation of Neural Networks for Image Recognition Applications: Designing a 0-1 MILP Model of a CNN to create adversarials
• A Multilingual Information Extraction Pipeline for Investigative Journalism
• Finding the Answers with Definition Models
• VoxSegNet: Volumetric CNNs for Semantic Part Segmentation of 3D Shapes
• Gallai-Ramsey numbers of cycles
• Sleep Stage Classification: Scalability Evaluations of Distributed Approaches
• Vectorization of Large Amounts of Raster Satellite Images in a Distributed Architecture Using HIPI
• Optimal Bandwidth Choice for Robust Bias Corrected Inference in Regression Discontinuity Designs
• A Machine Learning Driven IoT Solution for Noise Classification in Smart Cities
• Activity Recognition on a Large Scale in Short Videos – Moments in Time Dataset
• Pillar Universities in Russia: The Rise of ‘the Second Wave’
• Car Monitoring System in Apartment Garages by Small Autonomous Car using Deep Learning
• Parameter Sharing Methods for Multilingual Self-Attentional Translation Models
• Parametric Shape Optimization using the Support Function
• A Contextual-bandit-based Approach for Informed Decision-making in Clinical Trials
• Stochastic Linear-Quadratic Optimal Control Problems with Random Coefficients: Closed-Loop Representation of Open-Loop Optimal Controls
• A Decentralized Optimal Control Framework for Connected Automated Vehicles at Urban Intersections with Dynamic Resequencing
• Stochastic Video Long-term Interpolation
• Function-on-Scalar Quantile Regression with Application to Mass Spectrometry Proteomics Data
• The Complexity of Leader Election: A Chasm at Diameter Two
• Factorization of Frieze Patterns
• Finiteness theorems for matroid complexes with prescribed topology
• Opinion Conflicts: An Effective Route to Detect Incivility in Twitter
• Ab initio density-functional studies of 13-atom Cu and Ag clusters
• Stable approximation schemes for optimal filters
