What's new on arXiv

Neural Mesh: Introducing a Notion of Space and Conservation of Energy to Neural Networks

Neural networks are based on a simplified model of the brain. In this project, we wanted to relax the simplifying assumptions of a traditional neural network by making a model that more closely emulates the low-level interactions of neurons. As in an RNN, our model has a state that persists between time steps, so that the energies of neurons persist. However, unlike an RNN, our state is a two-dimensional matrix rather than a one-dimensional vector, which introduces a notion of distance between neurons within the state. In our model, neurons can only fire to adjacent neurons, as in the brain. Also as in the brain, we only allow a neuron to fire in a time step if it contains enough energy, or excitement. We also enforce conservation of energy, so that a neuron cannot excite its neighbors by more than the excitement it contains at that time step. Taken together, these two features allow signals in the form of activations to flow around the network over time, so that our neural mesh more closely models signals traveling through the brain. Although our main goal is to design an architecture that more closely emulates the brain, in the hope that it will have a correct internal representation of information by the time we know how to properly train a general intelligence, we did benchmark our neural mesh on a specific task. We found that by increasing the runtime of the mesh, we were able to increase its accuracy without increasing the number of parameters.
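The thresholded, energy-conserving firing rule described above can be illustrated with a toy grid. This is a sketch under our own assumptions, not the authors' model. The threshold value, the 4-cell neighborhood, the equal split of energy and the names `mesh_step` and `total_energy` are all hypothetical. A cell at or above the threshold passes exactly the energy it holds to its adjacent cells. Every other cell keeps its energy, so the total is unchanged from one step to the next.

```python
from typing import List

Grid = List[List[float]]

# Hypothetical 4-cell neighborhood: up, down, left, right.
OFFSETS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def mesh_step(grid: Grid, threshold: float = 1.0) -> Grid:
    """Advance a toy energy-conserving 2D mesh by one time step (illustrative rule)."""
    rows, cols = len(grid), len(grid[0])
    new = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            energy = grid[r][c]
            neighbors = [
                (r + dr, c + dc)
                for dr, dc in OFFSETS
                if 0 <= r + dr < rows and 0 <= c + dc < cols
            ]
            if energy >= threshold and neighbors:
                # Fire: split exactly the energy this cell holds among its adjacent
                # cells, so it never excites neighbors by more than it contains.
                share = energy / len(neighbors)
                for nr, nc in neighbors:
                    new[nr][nc] += share
            else:
                # Not excited enough (or no neighbors): the energy stays put.
                new[r][c] += energy
    return new


def total_energy(grid: Grid) -> float:
    """Sum of all cell energies; conserved by mesh_step."""
    return sum(sum(row) for row in grid)


grid = [[0.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 0.0]]
step1 = mesh_step(grid)
print(step1)                # [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
print(total_energy(step1))  # 4.0
```

Running the mesh for more steps lets the activation keep spreading across the grid while the total stays fixed. This loosely mirrors the "longer runtime, same parameters" idea in the abstract.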

Microsoft Dialogue Challenge: Building End-to-End Task-Completion Dialogue Systems

This proposal introduces a Dialogue Challenge for building end-to-end task-completion dialogue systems, with the goal of encouraging the dialogue research community to collaborate and benchmark on standard datasets and a unified experimental environment. In this special session, we will release human-annotated conversational data in three domains (movie-ticket booking, restaurant reservation, and taxi booking), as well as an experiment platform with built-in simulators in each domain, for training and evaluation purposes. The final submitted systems will be evaluated both in a simulated setting and by human judges.

Causal Modeling with Probabilistic Simulation Models

Recent authors have proposed analyzing conditional reasoning through a notion of intervention on a simulation program, and have found a sound and complete axiomatization of the logic of conditionals in this setting. Here we extend this setting to the case of probabilistic simulation models. We give a natural definition of probability on formulas of the conditional language, allowing for the expression of counterfactuals, and prove foundational results about this definition. We also find an axiomatization for reasoning about linear inequalities involving probabilities in this setting. We prove soundness, completeness, and NP-completeness of the satisfiability problem for this logic.
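To make "intervention on a simulation program" concrete, here is a small sketch under our own assumptions. It uses a two-line program whose only randomness is two independent fair coins. A hypothetical `do` argument overrides a variable's assignment. The actual run and the intervened run are evaluated on the same random draw, and this is what lets a single probability cover counterfactual formulas. The paper's formal language and axioms are not reproduced here.

```python
from fractions import Fraction
from itertools import product
from typing import Callable, Dict, Optional, Tuple

Noise = Tuple[bool, bool]


def simulate(u: Noise, do: Optional[Dict[str, bool]] = None) -> Dict[str, bool]:
    """Toy simulation program: X := U1, then Y := X or U2, unless intervened on."""
    do = do or {}
    u1, u2 = u
    x = do.get("X", u1)
    y = do.get("Y", x or u2)
    return {"X": x, "Y": y}


def prob(formula: Callable[[Noise], bool]) -> Fraction:
    """Exact probability that `formula` holds when U1, U2 are independent fair coins."""
    outcomes = list(product([False, True], repeat=2))
    hits = sum(1 for u in outcomes if formula(u))
    return Fraction(hits, len(outcomes))


# Interventional query: P(Y after setting X := False) = P(U2) = 1/2.
print(prob(lambda u: simulate(u, {"X": False})["Y"]))  # 1/2
# Counterfactual: Y actually holds, yet would fail had X been set to False.
print(prob(lambda u: simulate(u)["Y"] and not simulate(u, {"X": False})["Y"]))  # 1/4
```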

KB4Rec: A Dataset for Linking Knowledge Bases with Recommender Systems

To develop a knowledge-aware recommender system, a key data problem is how to obtain rich and structured knowledge information for recommender system (RS) items. Existing datasets or methods either use side information from the original recommender systems (containing very few kinds of useful information) or utilize a private knowledge base (KB). In this paper, we present the first public linked KB dataset for recommender systems, named KB4Rec v1.0, which links three widely used RS datasets with the popular KB Freebase. Based on our linked dataset, we first perform some interesting qualitative analysis experiments, in which we discuss the effect of two important factors (i.e., popularity and recency) on whether an RS item can be linked to a KB entity. Finally, we present a comparison of several knowledge-aware recommendation algorithms on our linked dataset.

Robust Student Network Learning

Deep neural networks achieve impressive accuracy in various applications, but their success often relies on heavy network architectures. Taking well-trained heavy networks as teachers, the classical teacher-student learning paradigm aims to learn a student network that is lightweight yet accurate. In this way, a portable student network with significantly fewer parameters can achieve accuracy comparable to that of the teacher network. However, beyond accuracy, the robustness of the learned student network against perturbation is also essential for practical use. Existing teacher-student learning frameworks mainly focus on accuracy and compression ratios but ignore robustness. In this paper, we make the student network produce more confident predictions with the help of the teacher network, and analyze the lower bound of the perturbation that will destroy the confidence of the student network. Two important objectives regarding the prediction scores and gradients of examples are developed to maximize this lower bound, so as to enhance the robustness of the student network without sacrificing performance. Experiments on benchmark datasets demonstrate the efficiency of the proposed approach in learning robust student networks with satisfying accuracy and compact sizes.
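For readers new to the teacher-student setup, the sketch below shows one common form of the classical paradigm: soft-target distillation. The loss is a KL divergence between temperature-softened teacher and student outputs, plus ordinary cross-entropy on the hard label. This is the baseline idea only. It is not the paper's robustness objectives on prediction scores and gradients, whose exact form the abstract does not give. The temperature, the mixing weight `alpha` and the function names are illustrative choices.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with a temperature."""
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
                      label: int, temperature: float = 4.0, alpha: float = 0.5) -> float:
    """Soft-target distillation: KL(teacher || student) at temperature T plus hard-label cross-entropy."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    hard_ce = -math.log(softmax(student_logits)[label])
    # Scaling by T^2 keeps the soft-target term's gradient magnitude comparable across temperatures.
    return alpha * temperature ** 2 * kl + (1 - alpha) * hard_ce
```

When the student's logits match the teacher's, the KL term vanishes and only the weighted hard-label loss remains.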

ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design

Currently, neural network architecture design is mostly guided by the indirect metric of computation complexity, i.e., FLOPs. However, the direct metric, e.g., speed, also depends on other factors such as memory access cost and platform characteristics. Thus, this work proposes to evaluate the direct metric on the target platform, beyond only considering FLOPs. Based on a series of controlled experiments, this work derives several practical guidelines for efficient network design. Accordingly, a new architecture is presented, called ShuffleNet V2. Comprehensive ablation experiments verify that our model is state-of-the-art in terms of the speed and accuracy tradeoff.

Active Learning for Interactive Neural Machine Translation of Data Streams

We study the application of active learning techniques to the translation of unbounded data streams via interactive neural machine translation. The main idea is to select, from an unbounded stream of source sentences, those worth supervising by a human agent. The user then interactively translates those samples. Once validated, these data are used to adapt the neural machine translation model. We propose two novel methods for selecting the samples to be validated. We exploit the information from the attention mechanism of a neural machine translation system. Our experiments show that including active learning techniques in this pipeline reduces the effort required during the process while increasing the quality of the translation system. It also makes it possible to balance the human effort required to achieve a given translation quality. Finally, our neural system outperforms classical approaches by a large margin.
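The stream-based selection loop can be sketched as follows. The scoring rule is a hypothetical stand-in chosen for illustration: the mean entropy of each target token's attention distribution, compared against a fixed threshold. The paper's two selection methods, which draw on attention information, are not reproduced here. Because the selector is a generator, it never needs to hold the unbounded stream in memory.

```python
import math
from typing import Iterable, Iterator, List, Tuple

# One row per target token; each row is an attention distribution over source tokens.
Attention = List[List[float]]


def attention_entropy(att: Attention) -> float:
    """Mean entropy (in nats) of the attention rows; flat attention gives a high value."""
    total = 0.0
    for row in att:
        total -= sum(p * math.log(p) for p in row if p > 0)
    return total / len(att)


def select_for_supervision(stream: Iterable[Tuple[str, Attention]],
                           threshold: float) -> Iterator[str]:
    """Lazily yield source sentences whose attention looks uncertain enough to ask a human."""
    for sentence, att in stream:
        if attention_entropy(att) >= threshold:
            yield sentence
```

Raising `threshold` sends fewer sentences to the human translator. This is one simple knob for trading human effort against adaptation data.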

Preference-based Online Learning with Dueling Bandits: A Survey

In machine learning, the notion of multi-armed bandits refers to a class of online learning problems, in which an agent is supposed to simultaneously explore and exploit a given set of choice alternatives in the course of a sequential decision process. In the standard setting, the agent learns from stochastic feedback in the form of real-valued rewards. In many applications, however, numerical reward signals are not readily available — instead, only weaker information is provided, in particular relative preferences in the form of qualitative comparisons between pairs of alternatives. This observation has motivated the study of variants of the multi-armed bandit problem, in which more general representations are used both for the type of feedback to learn from and the target of prediction. The aim of this paper is to provide a survey of the state of the art in this field, referred to as preference-based multi-armed bandits or dueling bandits. To this end, we provide an overview of problems that have been considered in the literature as well as methods for tackling them. Our taxonomy is mainly based on the assumptions made by these methods about the data-generating process and, related to this, the properties of the preference-based feedback.
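The preference-based feedback model can be illustrated with a deliberately naive learner. It duels every pair of arms a fixed number of times and observes only who won each duel. It then returns the arm with the highest Copeland score, meaning the number of opponents it beats more than half the time. The hidden preference matrix `pref`, the duel budget and the function names are assumptions for this sketch. The algorithms covered by the survey interleave exploration and exploitation rather than exploring uniformly like this.

```python
import random
from itertools import combinations
from typing import List

Matrix = List[List[float]]


def duel(i: int, j: int, pref: Matrix, rng: random.Random) -> bool:
    """Simulate one comparison; True means arm i won. pref[i][j] = P(i beats j), hidden from the learner."""
    return rng.random() < pref[i][j]


def copeland_winner(pref: Matrix, duels_per_pair: int = 200, seed: int = 0) -> int:
    rng = random.Random(seed)
    k = len(pref)
    wins = [[0] * k for _ in range(k)]  # wins[i][j]: observed wins of arm i over arm j
    for i, j in combinations(range(k), 2):
        for _ in range(duels_per_pair):
            if duel(i, j, pref, rng):
                wins[i][j] += 1
            else:
                wins[j][i] += 1
    # Only relative outcomes are used; no numeric reward is ever observed.
    scores = [sum(1 for j in range(k) if j != i and wins[i][j] > wins[j][i]) for i in range(k)]
    return max(range(k), key=scores.__getitem__)
```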

HybridNet: Classification and Reconstruction Cooperation for Semi-Supervised Learning

In this paper, we introduce a new model for leveraging unlabeled data to improve the generalization performance of image classifiers: a two-branch encoder-decoder architecture called HybridNet. The first branch receives a supervision signal and is dedicated to extracting invariant class-related representations. The second branch is fully unsupervised and dedicated to modeling the information discarded by the first branch in order to reconstruct the input data. To further support the expected behavior of our model, we propose an original training objective. It favors stability in the discriminative branch and complementarity between the representations learned in the two branches. HybridNet outperforms state-of-the-art results on CIFAR-10, SVHN and STL-10 in various semi-supervised settings. In addition, visualizations and ablation studies validate our contributions and the behavior of the model on both CIFAR-10 and STL-10.

Local Linear Forests

Random forests are a powerful method for non-parametric regression, but are limited in their ability to fit smooth signals, and can show poor predictive performance in the presence of strong, smooth effects. Taking the perspective of random forests as an adaptive kernel method, we pair the forest kernel with a local linear regression adjustment to better capture smoothness. The resulting procedure, local linear forests, enables us to improve on asymptotic rates of convergence for random forests with smooth signals, and provides substantial gains in accuracy on both real and simulated data.
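The local linear adjustment can be sketched in one dimension. Treat the forest as supplying a weight for each training point; this is the forest kernel in the abstract's framing. The sketch takes those weights as given rather than growing trees. The 1-D restriction, the optional ridge penalty on the slope and the function name are simplifications chosen for illustration. The sketch solves a weighted least-squares fit of an intercept and a slope centered at `x0` and returns the intercept as the prediction.

```python
from typing import Sequence


def local_linear_predict(x: Sequence[float], y: Sequence[float], weights: Sequence[float],
                         x0: float, ridge: float = 0.0) -> float:
    """Weighted local linear fit around x0 (1-D); returns the fitted value at x0.

    `weights` stand in for forest kernel weights alpha_i(x0), supplied by the caller here.
    """
    d = [xi - x0 for xi in x]  # center inputs at the query point
    sw = sum(weights)
    sd = sum(w * di for w, di in zip(weights, d))
    sdd = sum(w * di * di for w, di in zip(weights, d)) + ridge  # ridge penalizes the slope only
    sy = sum(w * yi for w, yi in zip(weights, y))
    sdy = sum(w * di * yi for w, di, yi in zip(weights, d, y))
    # Solve the 2x2 normal equations for the intercept via Cramer's rule.
    det = sw * sdd - sd * sd
    return (sy * sdd - sd * sdy) / det
```

Take x = [0, 1, 2, 3], y = 2x + 1, weights [0.1, 0.2, 0.3, 0.4] and x0 = 1.5. The weighted average that a plain forest would return is 5.0. The local linear fit recovers 4.0, the true value of the line at x0.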

Comparator Networks

The objective of this work is set-based verification, e.g. to decide if two sets of images of a face are of the same person or not. The traditional approach to this problem is to learn to generate a feature vector per image, aggregate them into one vector to represent the set, and then compute the cosine similarity between sets. Instead, we design a neural network architecture that can directly learn set-wise verification. Our contributions are: (i) We propose a Deep Comparator Network (DCN) that can ingest a pair of sets (each may contain a variable number of images) as inputs, and compute a similarity between the pair; this involves attending to multiple discriminative local regions (landmarks), and comparing local descriptors between pairs of faces; (ii) To encourage high-quality representations for each set, internal competition is introduced for recalibration based on the landmark score; (iii) Inspired by image retrieval, a novel hard sample mining regime is proposed to control the sampling process, such that the DCN is complementary to the standard image classification models. Evaluations on the IARPA Janus face recognition benchmarks show that the comparator networks outperform the previous state-of-the-art results by a large margin.
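For contrast with the DCN, the traditional pipeline described above fits in a few lines. That pipeline computes one feature vector per image, aggregates the vectors into a set vector, and compares sets by cosine similarity. The per-image feature vectors are assumed to come from some pretrained face network. Averaging L2-normalized features is one common aggregation choice. It is not necessarily the one used by the baselines in the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def set_descriptor(features: Sequence[Vector]) -> List[float]:
    """Aggregate per-image features into one set vector by averaging normalized features."""
    normed = [l2_normalize(f) for f in features]
    dim = len(normed[0])
    return [sum(f[k] for f in normed) / len(normed) for k in range(dim)]


def set_similarity(set_a: Sequence[Vector], set_b: Sequence[Vector]) -> float:
    """Cosine similarity between the two aggregated set descriptors."""
    a, b = set_descriptor(set_a), set_descriptor(set_b)
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))
```

The set size can vary freely here, but all per-image detail is collapsed before comparison. The DCN's landmark-level comparison is designed to avoid exactly that loss.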

Improving Transferability of Deep Neural Networks

Learning from small amounts of labeled data is a challenge in the area of deep learning. This is currently addressed by transfer learning, where one learns the small dataset as a transfer task from a larger source dataset. Transfer learning can deliver higher accuracy if the hyperparameters and source dataset are chosen well. One of the important parameters is the learning rate for the layers of the neural network. We show through experiments on the ImageNet22k and Oxford Flowers datasets that improvements in accuracy in the range of 127% can be obtained by proper choice of learning rates. We also show that the images/label parameter for a dataset can potentially be used to determine optimal learning rates for the layers to get the best overall accuracy. We additionally validate this method on a sample of real-world image classification tasks from a public visual recognition API.

• Reality-aware Sybil-Resilient Voting
• Excess Versions of the Minkowski and Hölder Inequalities
• Waiter-Client Maximum Degree Game
• While Tuning is Good, No Tuner is Best
• Reinforced Auto-Zoom Net: Towards Accurate and Fast Breast Cancer Segmentation in Whole-slide Images
• Percolation for level-sets of Gaussian free fields on metric graphs
• Story Understanding in Video Advertisements
• Geo-Supervised Visual Depth Prediction
• A Hybrid Quantum-Classical Paradigm to Mitigate Embedding Costs in Quantum Annealing—Abridged Version
• An Atemporal Model of Physical Complexity
• ARM: Augment-REINFORCE-Merge Gradient for Discrete Latent Variable Models
• Occluded Joints Recovery in 3D Human Pose Estimation based on Distance Matrix
• To Ship or Not to (Function) Ship (Extended version)
• Learning to Interrupt: A Hierarchical Deep Reinforcement Learning Framework for Efficient Exploration
• Pose Guided Human Video Generation
• Transformationally Identical and Invariant Convolutional Neural Networks by Combining Symmetric Operations or Input Vectors
• Binding Number, Toughness and General Matching Extendability in Graphs
• Lead Sheet Generation and Arrangement by Conditional Generative Adversarial Network
• A Group-Theoretic Approach to Abstraction: Hierarchical, Interpretable, and Task-Free Clustering
• Online Learning with an Almost Perfect Expert
• Dividend and Capital Injection Optimization with Transaction Cost for Spectrally Negative Lévy Risk Processes
• Leveraging Medical Sentiment to Understand Patients Health on Social Media
• Active Object Perceiver: Recognition-guided Policy Learning for Object Searching on Mobile Robots
• Human Motion Analysis with Deep Metric Learning
• A reconstruction of Florida Traffic Flow During Hurricane Irma (2017)
• Deep Group-shuffling Random Walk for Person Re-identification
• Graphs of Vectorial Plateaued Functions as Difference Sets
• End-to-End Deep Kronecker-Product Matching for Person Re-identification
• Characterisation and classification of signatures of spanning trees of the $n$-cube
• Andrews-Gordon Type Series for Capparelli’s and Göllnitz-Gordon Identities
• Distributed Stochastic Optimization in Networks with Low Informational Exchange
• Multi-Fiber Networks for Video Recognition
• Structured two-point stepsize gradient methods for nonlinear least squares
• Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes
• Hard-Aware Point-to-Set Deep Metric for Person Re-identification
• Eulerian summation operators and a remarkable family of polynomials
• The Turán number of Berge-K_4 in triple systems
• Persistence Atlas for Critical Point Variability in Ensembles
• CAKE: Compact and Accurate K-dimensional representation of Emotion
• Reward Sharing Schemes for Stake Pools
• Training Neural Machine Translation using Word Embedding-based Loss
• Joint Estimation of Model and Observation Error Covariance Matrices in Data Assimilation: a Review
• Delay controls chimera relay synchronization in multiplex networks
• Sharp upper and lower bounds for the spectral radius of a nonnegative weakly irreducible tensor and its applications
• Deep Hybrid Real and Synthetic Training for Intrinsic Decomposition
• YouTube AV 50K: an Annotated Corpus for Comments in Autonomous Vehicles
• Predicting Conversion of Mild Cognitive Impairments to Alzheimer’s Disease and Exploring Impact of Neuroimaging
• An Approximate Newton Smoothing Method for Shape Optimization
• On soft capacities, quasi-stationary distributions and the pathwise approach to metastability
• Improving Electron Micrograph Signal-to-Noise with an Atrous Convolutional Encoder-Decoder
• Semantic Labeling in Very High Resolution Images via a Self-Cascaded Convolutional Neural Network
• Mechanomyography based closed-loop Functional Electrical Stimulation cycling system
• Recurrently Exploring Class-wise Attention in A Hybrid Convolutional and Bidirectional LSTM Network for Multi-label Aerial Image Classification
• Comparison of Production Serverless Function Orchestration Systems
• Modular Sensor Fusion for Semantic Segmentation
• Fast Analog Transmission for High-Mobility Wireless Data Acquisition in Edge Learning
• Extreme Network Compression via Filter Group Approximation
• Cumulative distribution functions for the five simplest natural exponential families
• Uncertainty Quantification in CNN-Based Surface Prediction Using Shape Priors
• Stochastic Policy Gradient Ascent in Reproducing Kernel Hilbert Spaces
• Graphene: Semantically-Linked Propositions in Open Information Extraction
• Self-Calibration of Cameras with Euclidean Image Plane in Case of Two Views and Known Relative Rotation Angle
• Unsupervised Domain Adaptation by Adversarial Learning for Robust Speech Recognition
• Sparse Bayesian Imaging of Solar Flares
• Model predictive control of linear systems with preview information: feasibility, stability and inherent robustness
• On the Most Informative Boolean Functions of the Very Noisy Channel
• Variational Inequalities Governed By Merely Continuous and Strongly Pseudomonotone Operators
• Improving Spatiotemporal Self-Supervision by Deep Reinforcement Learning
• Relative kinematics of an anchorless network
• Progress on misère dead ends: game comparison, canonical form, and conjugate inverses
• Self-dual, self-Petrie-dual and Möbius regular maps on linear fractional groups
• Robust Calibration of Radio Interferometers in Multi-Frequency Scenario
• Regularization of inverse problems via box constrained minimization
• Restricted Local Differential Privacy for Distribution Estimation with High Data Utility
• Kernel Density Estimation-Based Markov Models with Hidden State
• Guidesort: Simpler Optimal Deterministic Sorting for the Parallel Disk Model
• Concentration of scalar ergodic diffusions and some statistical implications
• Action Detection from a Robot-Car Perspective
• Unsupervised Domain Adaptive Re-Identification: Theory and Practice
• Vertex Covers Revisited: Indirect Certificates and FPT Algorithms
• Dropout-GAN: Learning from a Dynamic Ensemble of Discriminators
• Connectivity of some Algebraically Defined Digraphs
• Learning Adaptive Discriminative Correlation Filters via Temporal Consistency Preserving Spatial Feature Selection for Robust Visual Tracking
• Slow manifold analysis of accelerated gradient methods
• The saturation number of carbon nanocones and nanotubes
• High-dimensional scaling limits of piecewise deterministic sampling algorithms
• Baseline wander removal methods for ECG signals: A comparative study
• Diameter of Some Monomial Digraphs
• A Note on the Isomorphism Problem for Monomial Digraphs
• Fairly Allocating Many Goods with Few Queries
• Small Organ Segmentation in Whole-body MRI using a Two-stage FCN and Weighting Schemes
• Pluripotential Theory and Convex Bodies: Large Deviation Principle
• Bayesian Calibration using Different Prior Distributions: an Iterative Maximum A Posteriori Approach for Radio Interferometers
• On the number of biased graphs
• Security Solutions for Local Wireless Networks in Control Applications based on Physical Layer Security
• The sine process under the influence of a varying potential
• Multi-bin Trainable Linear Unit for Fast Image Restoration Networks
• Making Classifier Chains Resilient to Class Imbalance
• Disorder and denaturation transition in the generalized Poland-Scheraga model
• The Kuramoto model on directed and signed graphs
• A Non-structural Representation Scheme for Articulated Shapes
• Almost p-ary Sequences
• Faster Convergence & Generalization in DNNs
• Variational solutions of stochastic partial differential equations with cylindrical Lévy noise
• High-dimensional estimation via sum-of-squares proofs
• Kalman Filter-based Heuristic Ensemble: A New Perspective on Ensemble Classification Using Kalman Filters
• REFUGE CHALLENGE 2018-Task 2:Deep Optic Disc and Cup Segmentation in Fundus Images Using U-Net and Multi-scale Feature Matching Networks
• Leveraging Motion Priors in Videos for Improving Human Segmentation
• Gaussian density estimates for solutions of fully coupled forward-backward SDEs
• Factor analysis of dynamic PET images: beyond Gaussian noise
• Norms, Institutions, and Robots
• To learn image super-resolution, use a GAN to learn how to do image degradation first
• Non-monotone Submodular Maximization in Exponentially Fewer Iterations
• Edge Coloring Signed Graphs
• Deep Encoder-Decoder Models for Unsupervised Learning of Controllable Speech Synthesis
