Whats new on arXiv

Overoptimization Failures and Specification Gaming in Multi-agent Systems

Overoptimization failures in machine learning and AI can involve specification gaming, reward hacking, fragility to distributional shifts, and Goodhart’s or Campbell’s law. These failure modes are an important challenge in building safe AI systems, but multi-agent systems have additional related failure modes. These failure modes are more complex, more problematic, and less well understood in the multi-agent setting, at least partially because they are not yet observed in practice. This paper explains why this is the case, then lays out some of the classes of such failure, such as accidental steering, coordination failures, adversarial misalignment, input spoofing or filtering, and goal co-option or direct hacking.

Classifying and Visualizing Emotions with Emotional DAN

Classification of human emotions remains an important and challenging task for many computer vision algorithms, especially in the era of humanoid robots which coexist with humans in their everyday life. Currently proposed methods for emotion recognition solve this task using multi-layered convolutional networks that do not explicitly infer any facial features in the classification phase. In this work, we postulate a fundamentally different approach to solve emotion recognition task that relies on incorporating facial landmarks as a part of the classification loss function. To that end, we extend a recently proposed Deep Alignment Network (DAN) with a term related to facial features. Thanks to this simple modification, our model called EmotionalDAN is able to outperform state-of-the-art emotion classification methods on two challenging benchmark dataset by up to 5%. Furthermore, we visualize image regions analyzed by the network when making a decision and the results indicate that our EmotionalDAN model is able to correctly identify facial landmarks responsible for expressing the emotions.

Convolutional Set Matching for Graph Similarity

We introduce GSimCNN (Graph Similarity Computation via Convolutional Neural Networks) for predicting the similarity score between two graphs. As the core operation of graph similarity search, pairwise graph similarity computation is a challenging problem due to the NP-hard nature of computing many graph distance/similarity metrics. We demonstrate our model using the Graph Edit Distance (GED) as the example metric. Experiments on three real graph datasets demonstrate that our model achieves the state-of-the-art performance on graph similarity search.

A mathematical theory of semantic development in deep neural networks

An extensive body of empirical research has revealed remarkable regularities in the acquisition, organization, deployment, and neural representation of human semantic knowledge, thereby raising a fundamental conceptual question: what are the theoretical principles governing the ability of neural networks to acquire, organize, and deploy abstract knowledge by integrating across many individual experiences? We address this question by mathematically analyzing the nonlinear dynamics of learning in deep linear networks. We find exact solutions to this learning dynamics that yield a conceptual explanation for the prevalence of many disparate phenomena in semantic cognition, including the hierarchical differentiation of concepts through rapid developmental transitions, the ubiquity of semantic illusions between such transitions, the emergence of item typicality and category coherence as factors controlling the speed of semantic processing, changing patterns of inductive projection over development, and the conservation of semantic similarity in neural representations across species. Thus, surprisingly, our simple neural model qualitatively recapitulates many diverse regularities underlying semantic development, while providing analytic insight into how the statistical structure of an environment can interact with nonlinear deep learning dynamics to give rise to these regularities.

Inverse reinforcement learning for video games

Deep reinforcement learning achieves superhuman performance in a range of video game environments, but requires that a designer manually specify a reward function. It is often easier to provide demonstrations of a target behavior than to design a reward function describing that behavior. Inverse reinforcement learning (IRL) algorithms can infer a reward from demonstrations in low-dimensional continuous control environments, but there has been little work on applying IRL to high-dimensional video games. In our CNN-AIRL baseline, we modify the state-of-the-art adversarial IRL (AIRL) algorithm to use CNNs for the generator and discriminator. To stabilize training, we normalize the reward and increase the size of the discriminator training dataset. We additionally learn a low-dimensional state representation using a novel autoencoder architecture tuned for video game environments. This embedding is used as input to the reward network, improving the sample efficiency of expert demonstrations. Our method achieves high-level performance on the simple Catcher video game, substantially outperforming the CNN-AIRL baseline. We also score points on the Enduro Atari racing game, but do not match expert performance, highlighting the need for further work.

Exchangeable, Markov multi-state survival process

Continual Classification Learning Using Generative Models

Toward Robust Neural Networks via Sparsification

Dynamic Graph Neural Networks

Graphs, which describe pairwise relations between objects, are essential representations of many real-world data such as social networks. In recent years, graph neural networks, which extend the neural network models to graph data, have attracted increasing attention. Graph neural networks have been applied to advance many different graph related tasks such as reasoning dynamics of the physical system, graph classification, and node classification. Most of the existing graph neural network models have been designed for static graphs, while many real-world graphs are inherently dynamic. For example, social networks are naturally evolving as new users joining and new relations being created. Current graph neural network models cannot utilize the dynamic information in dynamic graphs. However, the dynamic information has been proven to enhance the performance of many graph analytical tasks such as community detection and link prediction. Hence, it is necessary to design dedicated graph neural networks for dynamic graphs. In this paper, we propose DGNN, a new {\bf D}ynamic {\bf G}raph {\bf N}eural {\bf N}etwork model, which can model the dynamic information as the graph evolving. In particular, the proposed framework can keep updating node information by capturing the sequential information of edges, the time intervals between edges and information propagation coherently. Experimental results on various dynamic graphs demonstrate the effectiveness of the proposed framework.

On the analysis of scheduling algorithms for structured parallel computations

Algorithms for scheduling structured parallel computations have been widely studied in the literature. For some time now, Work Stealing is one of the most popular for scheduling such computations, and its performance has been studied in both theory and practice. Although it delivers provably good performances, the effectiveness of its underlying load balancing strategy is known to be limited for certain classes of computations, particularly the ones exhibiting irregular parallelism (e.g. depth first searches). Many studies have addressed this limitation from a purely load balancing perspective, viewing computations as sets of independent tasks, and then analyzing the expected amount of work attached to each processor as the execution progresses. However, these studies make strong assumptions regarding work generation which, despite being standard from a queuing theory perspective — where work generation can be assumed to follow some random distribution — do not match the reality of structured parallel computations — where the work generation is not random, only depending on the structure of a computation. In this paper, we introduce a formal framework for studying the performance of structured computation schedulers, define a criterion that is appropriate for measuring their performance, and present a methodology for analyzing the performance of randomized schedulers. We demonstrate the convenience of this methodology by using it to prove that the performance of Work Stealing is limited, and to analyze the performance of a Work Stealing and Spreading algorithm, which overcomes Work Stealing’s limitation.

Combinatorial Optimization with Graph Convolutional Networks and Guided Tree Search

We present a learning-based approach to computing solutions for certain NP-hard problems. Our approach combines deep learning techniques with useful algorithmic elements from classic heuristics. The central component is a graph convolutional network that is trained to estimate the likelihood, for each vertex in a graph, of whether this vertex is part of the optimal solution. The network is designed and trained to synthesize a diverse set of solutions, which enables rapid exploration of the solution space via tree search. The presented approach is evaluated on four canonical NP-hard problems and five datasets, which include benchmark satisfiability problems and real social network graphs with up to a hundred thousand nodes. Experimental results demonstrate that the presented approach substantially outperforms recent deep learning work, and performs on par with highly optimized state-of-the-art heuristic solvers for some NP-hard problems. Experiments indicate that our approach generalizes across datasets, and scales to graphs that are orders of magnitude larger than those used during training.

K For The Price Of 1: Parameter Efficient Multi-task And Transfer Learning

We introduce a novel method that enables parameter-efficient transfer and multitask learning. The basic approach is to allow a model patch – a small set of parameters – to specialize to each task, instead of fine-tuning the last layer or the entire network. For instance, we show that learning a set of scales and biases allows a network to learn a completely different embedding that could be used for different tasks (such as converting an SSD detection model into a 1000-class classification model while reusing 98% of parameters of the feature extractor). Similarly, we show that re-learning the existing low-parameter layers (such as depth-wise convolutions) also improves accuracy significantly. Our approach allows both simultaneous (multi-task) learning as well as sequential transfer learning wherein we adapt pretrained networks to solve new problems. For multi-task learning, despite using much fewer parameters than traditional logits-only fine-tuning, we match single-task-based performance.

Learning with Interpretable Structure from RNN

In structure learning, the output is generally a structure that is used as supervision information to achieve good performance. Considering the interpretation of deep learning models has raised extended attention these years, it will be beneficial if we can learn an interpretable structure from deep learning models. In this paper, we focus on Recurrent Neural Networks (RNNs) whose inner mechanism is still not clearly understood. We find that Finite State Automaton (FSA) that processes sequential data has more interpretable inner mechanism and can be learned from RNNs as the interpretable structure. We propose two methods to learn FSA from RNN based on two different clustering methods. We first give the graphical illustration of FSA for human beings to follow, which shows the interpretability. From the FSA’s point of view, we then analyze how the performance of RNNs are affected by the number of gates, as well as the semantic meaning behind the transition of numerical hidden states. Our results suggest that RNNs with simple gated structure such as Minimal Gated Unit (MGU) is more desirable and the transitions in FSA leading to specific classification result are associated with corresponding words which are understandable by human beings.

Attack Graph Convolutional Networks by Adding Fake Nodes

Graph convolutional networks (GCNs) have been widely used for classifying graph nodes in the semi-supervised setting. Previous work have shown that GCNs are vulnerable to the perturbation on adjacency and feature matrices of existing nodes. However, it is unrealistic to change existing nodes in many applications, such as existing users in social networks. In this paper, we design algorithms to attack GCNs by adding fake nodes. A greedy algorithm is proposed to generate adjacency and feature matrices of fake nodes, aiming to minimize the classification accuracy on the existing nodes. In additional, we introduce a discriminator to classify fake nodes from real nodes, and propose a Greedy-GAN attack to simultaneously update the discriminator and the attacker, to make fake nodes indistinguishable to the real ones. Our non-targeted attack decreases the accuracy of GCN down to 0.10, and our targeted attack reaches a success rate of 99% on the whole datasets, and 94% on average for attacking a single target node.

Adversarially Robust Optimization with Gaussian Processes

In this paper, we consider the problem of Gaussian process (GP) optimization with an added robustness requirement: The returned point may be perturbed by an adversary, and we require the function value to remain as high as possible even after this perturbation. This problem is motivated by settings in which the underlying functions during optimization and implementation stages are different, or when one is interested in finding an entire region of good inputs rather than only a single point. We show that standard GP optimization algorithms do not exhibit the desired robustness properties, and provide a novel confidence-bound based algorithm StableOpt for this purpose. We rigorously establish the required number of samples for StableOpt to find a near-optimal point, and we complement this guarantee with an algorithm-independent lower bound. We experimentally demonstrate several potential applications of interest using real-world data sets, and we show that StableOpt consistently succeeds in finding a stable maximizer where several baseline methods fail.

Perceptual Visual Interactive Learning

Supervised learning methods are widely used in machine learning. However, the lack of labels in existing data limits the application of these technologies. Visual interactive learning (VIL) compared with computers can avoid semantic gap, and solve the labeling problem of small label quantity (SLQ) samples in a groundbreaking way. In order to fully understand the importance of VIL to the interaction process, we re-summarize the interactive learning related algorithms (e.g. clustering, classification, retrieval etc.) from the perspective of VIL. Note that, perception and cognition are two main visual processes of VIL. On this basis, we propose a perceptual visual interactive learning (PVIL) framework, which adopts gestalt principle to design interaction strategy and multi-dimensionality reduction (MDR) to optimize the process of visualization. The advantage of PVIL framework is that it combines computer’s sensitivity of detailed features and human’s overall understanding of global tasks. Experimental results validate that the framework is superior to traditional computer labeling methods (such as label propagation) in both accuracy and efficiency, which achieves significant classification results on dense distribution and sparse classes dataset.

GAN Augmentation: Augmenting Training Data using Generative Adversarial Networks

One of the biggest issues facing the use of machine learning in medical imaging is the lack of availability of large, labelled datasets. The annotation of medical images is not only expensive and time consuming but also highly dependent on the availability of expert observers. The limited amount of training data can inhibit the performance of supervised machine learning algorithms which often need very large quantities of data on which to train to avoid overfitting. So far, much effort has been directed at extracting as much information as possible from what data is available. Generative Adversarial Networks (GANs) offer a novel way to unlock additional information from a dataset by generating synthetic samples with the appearance of real images. This paper demonstrates the feasibility of introducing GAN derived synthetic data to the training datasets in two brain segmentation tasks, leading to improvements in Dice Similarity Coefficient (DSC) of between 1 and 5 percentage points under different conditions, with the strongest effects seen fewer than ten training image stacks are available.

Batch Normalization Sampling

Deep Neural Networks (DNNs) thrive in recent years in which Batch Normalization (BN) plays an indispensable role. However, it has been observed that BN is costly due to the reduction operations. In this paper, we propose alleviating this problem through sampling only a small fraction of data for normalization at each iteration. Specifically, we model it as a statistical sampling problem and identify that by sampling less correlated data, we can largely reduce the requirement of the number of data for statistics estimation in BN, which directly simplifies the reduction operations. Based on this conclusion, we propose two sampling strategies, ‘Batch Sampling’ (randomly select several samples from each batch) and ‘Feature Sampling’ (randomly select a small patch from each feature map of all samples), that take both computational efficiency and sample correlation into consideration. Furthermore, we introduce an extremely simple variant of BN, termed as Virtual Dataset Normalization (VDN), that can normalize the activations well with few synthetical random samples. All the proposed methods are evaluated on various datasets and networks, where an overall training speedup by up to 20% on GPU is practically achieved without the support of any specialized libraries, and the loss on accuracy and convergence rate are negligible. Finally, we extend our work to the ‘micro-batch normalization’ problem and yield comparable performance with existing approaches at the case of tiny batch size.

On a generalized form of subjective probability

This paper is motivated by the questions of how to give the concept of probability an adequate real-world meaning, and how to explain a certain type of phenomenon that can be found, for instance, in Ellsberg’s paradox. It attempts to answer these questions by constructing an alternative theory to one that was proposed in earlier papers on the basis of various important criticisms that were raised against this earlier theory. The conceptual principles of the corresponding definition of probability are laid out and explained in detail. In particular, what is required to fully specify a probability distribution under this definition is not just the distribution function of the variable concerned, but also an assessment of the internal and/or the external strength of this function relative to other distribution functions of interest. This way of defining probability is applied to various examples and problems including, perhaps most notably, to a long-running controversy concerning the distinction between Bayesian and fiducial inference. The characteristics of this definition of probability are carefully evaluated in terms of the issues that it sets out to address.

• SilentPhone: Inferring User Unavailability based Opportune Moments to Minimize Call Interruptions• Critical Neuromorphic Computing based on Explosive Synchronization• Finding the best design parameters for optical nanostructures using reinforcement learning• Towards a compact representation of temporal rasters• Helix modelling through the Mardia-Holmes model framework and an extension of the Mardia-Holmes model• Active Decoupling of Transmit and Receive Coils for Full-Duplex MRI• Planification en temps réel avec agenda de buts et sauts• MGP: Un algorithme de planification temps réel prenant en compte l’évolution dynamique du but• Une architecture cognitive et affective orient{é}e interaction• Une approche totalement instanciée pour la planification HTN• Visual Rendering of Shapes on 2D Display Devices Guided by Hand Gestures• Learning from the Syndrome• A Weak Martingale Approach to Linear-Quadratic McKean-Vlasov Stochastic Control Problems• Complementary Lipschitz continuity results for the distribution of intersections or unions of independent random sets in finite spaces• Segmentation Analysis in Human Centric Cyber-Physical Systems using Graphical Lasso• Fast and accurate object detection in high resolution 4K and 8K video using GPUs• Leave-one-out cross-validation for non-factorizable normal models• Friezes satisfying higher SL$_k$-determinants• A Relaxed Optimization Approach for Cardinality-Constrained Portfolio Optimization• Multimodal Polynomial Fusion for Detecting Driver Distraction• Clinical Concept Extraction with Contextual Word Embedding• Patch-based Interferometric Phase Estimation via Mixture of Gaussian Density Modelling & Non-local Averaging in the Complex Domain• Local calibration of verbal autopsy algorithms• Cops and Robbers on Toroidal Chess Graphs• On the Real Stability Radius of Sparse Systems• Statistical mechanics of bipartite $z$-matchings• New insights on concentration inequalities for self-normalized martingales• Reasoning about Norms Revision• The speaker-independent lipreading play-off; a survey of lipreading machines• Loop corrections in spin models through density consistency• Many-body localization in Landau level subbands• Scheduling computations with provably low synchronization overheads• Using Personal Environmental Comfort Systems to Mitigate the Impact of Occupancy Prediction Errors on HVAC Performance• A Reliability Model for Dependent and Distributed MDS Disk Array Units• Minsum $k$-Sink Problem on Path Networks• Strong laws of large numbers for arrays of random variables and stable random fields• Lower Bounds for Oblivious Data Structures• Compositional Set Invariance in Network Systems with Assume-Guarantee Contracts• Learning to Route Efficiently with End-to-End Feedback: The Value of Networked Structure• A Multilingual Study of Compressive Cross-Language Text Summarization• Predicting the Semantic Textual Similarity with Siamese CNN and LSTM• Graph isomorphism and Gaussian boson sampling• Multi-level Memory for Task Oriented Dialogs• Stationary distributions of the multi-type ASEP• Meta-modeling game for deriving theoretical-consistent, micro-structural-based traction-separation laws via deep reinforcement learning• Sample-Efficient Learning of Nonprehensile Manipulation Policies via Physics-Based Informed State Distributions• Understand, Compose and Respond – Answering Visual Questions by a Composition of Abstract Procedures• Sports Camera Calibration via Synthetic Data• Near-Optimal Online Knapsack Strategy for Real-Time Bidding in Internet Advertising• Automated Process Incorporating Machine Learning Segmentation and Correlation of Oral Diseases with Systemic Health• Engaging Image Captioning Via Personality• Coding for Computing Arbitrary Functions Unknown to the Encoder• Truncated Back-propagation for Bilevel Optimization• Structure Learning of Deep Networks via DNA Computing Algorithm• SpiderBoost: A Class of Faster Variance-reduced Algorithms for Nonconvex Optimization• A New Class of Symmetric Distributions Including the Elliptically Symmetric Logistic• Spectral Embedding Norm: Looking Deep into the Spectrum of the Graph Laplacian• Alzheimer’s Disease Prediction Using Longitudinal and Heterogeneous Magnetic Resonance Imaging• Beamforming Optimization for Intelligent Reflecting Surface with Discrete Phase Shifts• Generalized Beamspace Modulation Using Multiplexing: A Breakthrough in mmWave MIMO• Convolutional Deblurring for Natural Imaging• Speaker Selective Beamformer with Keyword Mask Estimation• Wearable Affective Robot• Gaussian Message Passing for Overloaded Massive MIMO-NOMA• Word Embedding based Edit Distance• A novel bidding method for combined heat and power units in district heating systems• Low-Power Wide-Area Networks for Sustainable IoT• A two-phase stochastic programming approach to biomass supply planning for combined heat and power plants• Operational planning and bidding for district heating systems with uncertain renewable energy production• Efficient Learning of Restricted Boltzmann Machines Using Covariance estimates• Supervised Classification Methods for Flash X-ray single particle diffraction Imaging• Identifying multi-scale communities in networks by asymptotic surprise• Non-Coherent Sensor Fusion via Entropy Regularized Optimal Mass Transport• The Logoscope: a Semi-Automatic Tool for Detecting and Documenting French New Words• Spanning Tests for Markowitz Stochastic Dominance• Adaptive motor control and learning in a spiking neural network realised on a mixed-signal neuromorphic processor• Tackling Sequence to Sequence Mapping Problems with Neural Networks• Analyzing Visual Mappings of Traditional and Alternative Music Notation• Adaptive Online Learning in Dynamic Environments• HANDS18: Methods, Techniques and Applications for Hand Observation• Compressed Sensing Plus Motion (CS+M): A New Perspective for Improving Undersampled MR Image Reconstruction• Exactly Solving the Maximum Weight Independent Set Problem on Large Real-World Graphs• Use of Magnetoresistive Random-Access Memory as Approximate Memory for Training Neural Networks• Exploiting NOMA for Multi-Beam UAV Communication in Cellular Uplink• An Adversarial Learning Approach to Medical Image Synthesis for Lesion Removal• Training of a Skull-Stripping Neural Network with efficient data augmentation• Structure learning of undirected graphical models for count data• Approximation of trees by self-nested trees• Wireless Map-Reduce Distributed Computing with Full-Duplex Radios and Imperfect CSI• Dynamic Oracles for Top-Down and In-Order Shift-Reduce Constituent Parsing• Fast Exact Bayesian Inference for Sparse Signals in the Normal Sequence Model• Short utterance compensation in speaker verification via cosine-based teacher-student learning of speaker embeddings• Investigating the Automatic Classification of Algae Using Fusion of Spectral and Morphological Characteristics of Algae via Deep Residual Learning• Linearly Convergent Variable Sample-Size Schemes for Stochastic Nash Games: Best-Response Schemes and Distributed Gradient-Response Schemes• Almost Optimal Algorithms for Linear Stochastic Bandits with Heavy-Tailed Payoffs• Adversarial Semantic Scene Completion from a Single Depth Image• A Markov model for inferring flows in directed contact networks• Bayesian Compression for Natural Language Processing• HAR-Net:Fusing Deep Representation and Hand-crafted Features for Human Activity Recognition• A Deamortization Approach for Dynamic Spanner and Dynamic Maximal Matching• Practical Shape Analysis and Segmentation Methods for Point Cloud Models• Evading classifiers in discrete domains with provable optimality guarantees• Maximum Independent Sets in Subcubic Graphs: New Results• Alzheimer’s Disease Diagnosis Based on Cognitive Methods in Virtual Environments and Emotions Analysis• Understanding the Role of Two-Sided Argumentation in Online Consumer Reviews: A Language-Based Perspective• Training Generative Adversarial Networks Via Turing Test• Differential Variable Speed Limits Control for Freeway Recurrent Bottlenecks via Deep Reinforcement learning• A Preliminary Study on Hyperparameter Configuration for Human Activity Recognition• Art Gallery Problem with Rook Vision• Decoding Brain Representations by Multimodal Learning of Neural Activity and Visual Features• Efficient Load Sampling for Worst-Case Structural Analysis Under Force Location Uncertainty• Improving the condition number of estimated covariance matrices• Nuclear Norm Regularized Estimation of Panel Regression Models• Heterogeneous Treatment Effect Estimation through Deep Learning

Like this:

Like Loading…

Related