What's new on arXiv

Making Classification Competitive for Deep Metric Learning

Deep metric learning aims to learn a function mapping image pixels to embedding feature vectors that model the similarity between images. The majority of current approaches are non-parametric, learning the metric space directly through the supervision of similar (pairs) or relatively similar (triplets) sets of images. A difficult challenge for training these approaches is mining informative samples of images, as the metric space is learned with only the local context present within a single mini-batch. Alternative approaches use parametric metric learning to eliminate the need for sampling through supervision of images to proxies. Although this simplifies optimization, such proxy-based approaches have lagged behind in performance. In this work, we demonstrate that a standard classification network can be transformed into a variant of proxy-based metric learning that is competitive against non-parametric approaches across a wide variety of image retrieval tasks. We address key challenges in proxy-based metric learning such as performance under extreme classification and describe techniques to stabilize and learn higher dimensional embeddings. We evaluate our approach on the CAR-196, CUB-200-2011, Stanford Online Products, and In-Shop datasets for image retrieval and clustering. Finally, we show that our softmax classification approach can learn high-dimensional binary embeddings that achieve new state-of-the-art performance on all datasets evaluated, with a memory footprint that is the same or smaller than competing approaches.
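
To make the idea concrete, here is a minimal PyTorch sketch of the general proxy-softmax recipe the abstract describes: the weight rows of a cosine-softmax classifier act as class proxies, and the L2-normalized penultimate features serve as the retrieval embedding. The class name, temperature value, and binarization step are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProxySoftmax(nn.Module):
    """Softmax classification viewed as proxy-based metric learning:
    each row of `proxies` plays the role of a learned class proxy."""

    def __init__(self, embed_dim, num_classes, temperature=0.05):
        super().__init__()
        self.proxies = nn.Parameter(torch.randn(num_classes, embed_dim))
        self.temperature = temperature  # illustrative value

    def forward(self, embeddings, labels):
        # Normalizing both sides makes the logits cosine similarities,
        # so training pulls each embedding toward its class proxy.
        z = F.normalize(embeddings, dim=1)
        p = F.normalize(self.proxies, dim=1)
        logits = z @ p.t() / self.temperature
        return F.cross_entropy(logits, labels)

# At retrieval time the proxies are discarded; images are compared by
# their embeddings, optionally binarized (e.g. z.sign()) for compact
# binary codes.
```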

Large Datasets, Bias and Model Oriented Optimal Design of Experiments

We review recent literature that proposes to adapt ideas from classical model-based optimal design of experiments to problems of data selection from large datasets. Special attention is given to bias reduction and to protection against confounders. Some new results are presented. Theoretical and computational comparisons are made.

Evaluating Bayesian Deep Learning Methods for Semantic Segmentation

Deep learning has been revolutionary for computer vision and semantic segmentation in particular, with Bayesian Deep Learning (BDL) used to obtain uncertainty maps from deep models when predicting semantic classes. This information is critical when semantic segmentation is used for autonomous driving, for example. Standard semantic segmentation systems have well-established evaluation metrics. However, with BDL's rising popularity in computer vision, we require new metrics to evaluate whether one BDL method produces better uncertainty estimates than another. In this work we propose three such metrics to evaluate BDL models, designed specifically for the task of semantic segmentation. We modify DeepLab-v3+, one of the state-of-the-art deep neural networks, and create its Bayesian counterpart using MC dropout and Concrete dropout as inference techniques. We then compare and test these two inference techniques on the well-known Cityscapes dataset using our suggested metrics. Our results provide new benchmarks for researchers to compare and evaluate their improved uncertainty quantification in pursuit of safer semantic segmentation.
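
As a point of reference, the MC dropout inference the authors build on can be sketched in a few lines of PyTorch: keep dropout stochastic at test time, average several forward passes, and read off a per-pixel uncertainty map. The use of predictive entropy as the uncertainty measure and the sample count are illustrative choices, not the paper's evaluated metrics.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def enable_dropout(model):
    # Re-enable only dropout layers at test time; batch norm stays frozen.
    for m in model.modules():
        if isinstance(m, (nn.Dropout, nn.Dropout2d)):
            m.train()

@torch.no_grad()
def mc_dropout_segment(model, image, num_samples=20):
    """Run several stochastic forward passes and return the mean
    prediction plus a per-pixel uncertainty map (predictive entropy)."""
    model.eval()
    enable_dropout(model)
    probs = torch.stack(
        [F.softmax(model(image), dim=1) for _ in range(num_samples)]
    ).mean(dim=0)                                    # (B, C, H, W)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
    return probs.argmax(dim=1), entropy              # labels, uncertainty
```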

Markov chain Monte Carlo Methods For Lattice Gaussian Sampling: Convergence Analysis and Enhancement

Sampling from the lattice Gaussian distribution has emerged as an important problem in coding, decoding, and cryptography. In this paper, the classic Gibbs algorithm from Markov chain Monte Carlo (MCMC) methods is demonstrated to be geometrically ergodic for lattice Gaussian sampling, which means the Markov chain arising from it converges exponentially fast to the stationary distribution. The exponential convergence rate of the Markov chain is also derived through the spectral radius of the forward operator. A comprehensive analysis of the convergence rate is then carried out, and two sampling schemes are proposed to further enhance convergence. The first, referred to as the Metropolis-within-Gibbs (MWG) algorithm, improves convergence by refining the state space of the univariate sampling. The second, the blocked Gibbs strategy, which samples multiple variables at each Markov move, is also shown to yield a better convergence rate than traditional univariate sampling. In order to perform blocked sampling efficiently, the Gibbs-Klein (GK) algorithm is proposed, which samples block by block using Klein's algorithm. The validity of the GK algorithm is demonstrated by showing its ergodicity. Simulation results based on MIMO detection confirm the convergence gain brought by the proposed Gibbs sampling schemes.
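
For readers unfamiliar with the setup, the numpy sketch below shows the plain univariate Gibbs step for lattice Gaussian sampling that the paper analyzes (not its MWG or GK enhancements). The truncated-window sampler for the one-dimensional discrete Gaussian conditional is an illustrative approximation.

```python
import numpy as np

def gibbs_lattice_gaussian(B, c, sigma, sweeps=1000, window=8, rng=None):
    """Systematic-scan Gibbs sampler targeting the lattice Gaussian
    D(z) ~ exp(-||B z - c||^2 / (2 sigma^2)) over integer vectors z.
    Each univariate conditional is a 1-D discrete Gaussian, sampled here
    by enumerating a truncated window of integers around its mean."""
    rng = rng or np.random.default_rng()
    n = B.shape[1]
    z = np.zeros(n, dtype=int)
    col_norm2 = (B ** 2).sum(axis=0)
    for _ in range(sweeps):
        for i in range(n):
            r = c - B @ z + B[:, i] * z[i]        # residual excluding z_i
            mu = (B[:, i] @ r) / col_norm2[i]     # conditional mean
            s = sigma / np.sqrt(col_norm2[i])     # conditional std
            supp = np.arange(round(mu) - window, round(mu) + window + 1)
            w = np.exp(-(supp - mu) ** 2 / (2 * s ** 2))
            z[i] = rng.choice(supp, p=w / w.sum())
    return z
```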

Measure, Manifold, Learning, and Optimization: A Theory Of Neural Networks

We present a formal measure-theoretic theory of neural networks (NNs) built on probability coupling theory. Our main contributions are summarized as follows.
* Building on the formalism of probability coupling theory, we derive an algorithmic framework, named Hierarchical Measure Group and Approximate System (HMGAS) and nicknamed S-System, that is designed to learn the complex hierarchical statistical dependencies in the physical world.
* We show that NNs are special cases of S-System when the probability kernels assume certain exponential family distributions, and derive activation functions formally. We further endow NNs with geometry through information geometry, show that intermediate feature spaces of NNs are stochastic manifolds, and prove that 'distance' between samples contracts as layers stack up.
* S-System shows that NNs are inherently stochastic. Under a set of realistic boundedness and diversity conditions, it enables us to prove that for large nonlinear deep NNs with a class of losses, including the hinge loss, all local minima are global minima with zero loss, and the regions around the minima are flat basins where all eigenvalues of the Hessian concentrate around zero; the proofs use tools and ideas from mean field theory, random matrix theory, and nonlinear operator equations.
* The S-System formalism, the information-geometric structure, and the optimization behavior together complete the analogy between the Renormalization Group (RG) and NNs: a NN is a complex adaptive system that estimates the statistical dependencies of microscopic objects, e.g., pixels, at multiple scales. Unlike the clear-cut physical quantities produced by RG in physics, e.g., temperature, NNs renormalize/recompose manifolds that emerge through learning/optimization and divide the sample space into highly semantically meaningful groups dictated by supervised labels (in supervised NNs).

Graph-Based Global Reasoning Networks

Globally modeling and reasoning over relations between regions can be beneficial for many computer vision tasks on both images and videos. Convolutional Neural Networks (CNNs) excel at modeling local relations through convolution operations, but they are typically inefficient at capturing global relations between distant regions and require stacking multiple convolution layers. In this work, we propose a new approach for reasoning globally in which a set of features is globally aggregated over the coordinate space and then projected to an interaction space where relational reasoning can be computed efficiently. After reasoning, relation-aware features are distributed back to the original coordinate space for downstream tasks. We further present a highly efficient instantiation of the proposed approach, the Global Reasoning unit (GloRe unit), which implements the coordinate-interaction space mapping by weighted global pooling and weighted broadcasting, and the relational reasoning via graph convolution on a small graph in interaction space. The proposed GloRe unit is lightweight, end-to-end trainable and can be easily plugged into existing CNNs for a wide range of tasks. Extensive experiments show our GloRe unit can consistently boost the performance of state-of-the-art backbone architectures, including ResNet, ResNeXt, SE-Net and DPN, for both 2D and 3D CNNs, on image classification, semantic segmentation and video action recognition tasks.
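
A rough PyTorch sketch of such a unit is below, following the description in the abstract: 1x1 convolutions produce the projection weights and reduced node features, weighted pooling maps to a small fully connected graph, two 1-D convolutions perform the graph convolution, and weighted broadcasting maps back to coordinate space. Layer sizes and the exact graph-convolution form are assumptions, not the paper's reference implementation.

```python
import torch
import torch.nn as nn

class GloReUnit(nn.Module):
    """Project coordinate-space features to a small interaction-space
    graph, reason with a graph convolution, and project back residually."""

    def __init__(self, channels, num_nodes=64, node_dim=None):
        super().__init__()
        node_dim = node_dim or channels // 2
        self.reduce = nn.Conv2d(channels, node_dim, 1)    # node features
        self.project = nn.Conv2d(channels, num_nodes, 1)  # pooling weights
        # Graph convolution on a dense graph as two 1-D convolutions:
        # one mixes the node axis (adjacency), one updates node states.
        self.gcn_adj = nn.Conv1d(num_nodes, num_nodes, 1)
        self.gcn_state = nn.Conv1d(node_dim, node_dim, 1)
        self.expand = nn.Conv2d(node_dim, channels, 1)

    def forward(self, x):
        b, _, h, w = x.shape
        feats = self.reduce(x).flatten(2)        # (b, node_dim, H*W)
        proj = self.project(x).flatten(2)        # (b, num_nodes, H*W)
        nodes = proj @ feats.transpose(1, 2)     # weighted global pooling
        nodes = nodes - self.gcn_adj(nodes)      # (I - A) V: node mixing
        nodes = torch.relu(self.gcn_state(nodes.transpose(1, 2)))
        out = (nodes @ proj).view(b, -1, h, w)   # weighted broadcasting
        return x + self.expand(out)              # residual connection
```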

Systematic Generalization: What Is Required and Can It Be Learned?

Numerous models for grounded language understanding have been recently proposed, including (i) generic models that can be applied to any given task with little adaptation and (ii) intuitively appealing modular models that require background knowledge to be instantiated. We compare both types of models in how much they lend themselves to a particular form of systematic generalization. Using a synthetic VQA test, we evaluate which models are capable of reasoning about all possible object pairs after training on only a small subset of them. Our findings show that the generalization of modular models is much more systematic and that it is highly sensitive to the module layout, i.e. to how exactly the modules are connected. We furthermore investigate whether modular models that generalize well could be made more end-to-end by learning their layout and parametrization. We find that end-to-end methods from prior work often learn a wrong layout and a spurious parametrization that do not facilitate systematic generalization. Our results suggest that, in addition to modularity, systematic generalization in language understanding may require explicit regularizers or priors.

Flexible and Scalable State Tracking Framework for Goal-Oriented Dialogue Systems

Goal-oriented dialogue systems typically rely on components specifically developed for a single task or domain. This limits such systems in two ways: if the task domain is updated, the dialogue system usually needs to be updated or completely re-trained, and it is harder to extend such systems to new or multiple domains. The dialogue state tracker in conventional dialogue systems is one such component: it is usually designed to fit a well-defined application domain. For example, it is common for a state variable to be a categorical distribution over a manually-predefined set of entities (Henderson et al., 2013), resulting in an inflexible and hard-to-extend dialogue system. In this paper, we propose a new approach for dialogue state tracking that can generalize well over multiple domains without incorporating any domain-specific knowledge. Under this framework, discrete dialogue state variables are learned independently and a predefined set of possible values for dialogue state variables is not required. Furthermore, it enables adding arbitrary dialogue context as features and allows for multiple values to be associated with a single state variable. These characteristics make it much easier to expand the dialogue state space. We evaluate our framework on the widely used dialogue state tracking challenge dataset (DSTC2) and show that it yields results competitive with other state-of-the-art systems despite incorporating little domain knowledge. We also show that this framework can benefit from widely available external resources such as pre-trained word embeddings.
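
One way to picture a value-set-free tracker of this kind is the hypothetical PyTorch sketch below: each candidate value string is encoded with shared pre-trained word embeddings and scored against the dialogue context independently, so no predefined value inventory is needed and several values can be active for one state variable. The architecture, names, and scoring function are all illustrative; the paper's actual model may differ substantially.

```python
import torch
import torch.nn as nn

class ValueAgnosticTracker(nn.Module):
    """Score arbitrary candidate values against the dialogue context
    instead of classifying over a fixed, predefined value set."""

    def __init__(self, embed, hidden=128):
        super().__init__()
        self.embed = embed                  # shared pre-trained nn.Embedding
        d = embed.embedding_dim
        self.context_enc = nn.GRU(d, hidden, batch_first=True)
        self.score = nn.Bilinear(hidden, d, 1)

    def forward(self, context_ids, candidate_ids):
        # context_ids: (1, T) dialogue history; candidate_ids: (K, L).
        _, h = self.context_enc(self.embed(context_ids))   # (1, 1, hidden)
        cand = self.embed(candidate_ids).mean(dim=1)       # (K, d) bag of words
        ctx = h.squeeze(0).expand(cand.shape[0], -1)       # (K, hidden)
        # Independent sigmoid per candidate: multiple values may be "on".
        return torch.sigmoid(self.score(ctx, cand)).squeeze(-1)

# Usage sketch: tracker = ValueAgnosticTracker(nn.Embedding(vocab_size, 100))
```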

Recurrent machines for likelihood-free inference

Likelihood-free inference is concerned with the estimation of the parameters of a non-differentiable stochastic simulator that best reproduce real observations. In the absence of a likelihood function, most of the existing inference methods optimize the simulator parameters through a handcrafted iterative procedure that tries to make the simulated data more similar to the observations. In this work, we explore whether meta-learning can be used in the likelihood-free context, for learning automatically from data an iterative optimization procedure that would solve likelihood-free inference problems. We design a recurrent inference machine that learns a sequence of parameter updates leading to good parameter estimates, without ever specifying some explicit notion of divergence between the simulated data and the real data distributions. We demonstrate our approach on toy simulators, showing promising results both in terms of performance and robustness.
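
The core loop of such a recurrent inference machine might look like the following hedged PyTorch sketch: a GRU consumes summary statistics of simulated and observed data and emits additive parameter updates. The summarize interface, network sizes, and update rule are assumptions for illustration; training such a machine (the simulator is non-differentiable) is a separate matter the sketch does not address.

```python
import torch
import torch.nn as nn

class RecurrentInferenceMachine(nn.Module):
    """A GRU that learns a sequence of parameter updates for a stochastic
    simulator, driven by summary statistics of simulated vs. observed data."""

    def __init__(self, param_dim, stat_dim, hidden=64):
        super().__init__()
        self.cell = nn.GRUCell(param_dim + 2 * stat_dim, hidden)
        self.to_update = nn.Linear(hidden, param_dim)

    def forward(self, theta0, simulator, summarize, x_obs, steps=10):
        # theta0: (B, param_dim); summarize(x_obs) assumed (1, stat_dim).
        theta = theta0
        h = torch.zeros(theta0.shape[0], self.cell.hidden_size)
        s_obs = summarize(x_obs)
        for _ in range(steps):
            s_sim = summarize(simulator(theta))      # stochastic simulation
            inp = torch.cat([theta, s_sim, s_obs.expand_as(s_sim)], dim=1)
            h = self.cell(inp, h)
            theta = theta + self.to_update(h)        # learned update rule
        return theta
```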

• Transferable Adversarial Attacks for Image and Video Object Detection
• AI Neurotechnology for Aging Societies — Task-load and Dementia EEG Digital Biomarker Development Using Information Geometry Machine Learning Methods
• Robust Learning-Based ML Detection for Massive MIMO Systems with One-Bit Quantized Signals
• Frozen $(\Delta+1)$-colourings of bounded degree graphs
• Prior-free Data Acquisition for Accurate Statistical Estimation
• Optimal Algorithms for Scheduling under Time-of-Use Tariffs
• Alternative Characterizations of Fitch's Xenology Relation
• Stabilization of a linearized Cahn-Hilliard system for phase separation by proportional boundary feedbacks
• FSNet: An Identity-Aware Generative Model for Image-based Face Swapping
• Improved Crowding Distance for NSGA-II
• Instance-level Facial Attributes Transfer with Geometry-Aware Flow
• The Hall–Paige conjecture, and synchronization for affine and diagonal groups
• ComDefend: An Efficient Image Compression Model to Defend Adversarial Examples
• Void Filling of Digital Elevation Models with Deep Generative Models
• An Efficient Image Retrieval Based on Fusion of Low-Level Visual Features
• Fixed points of metrically nonspreading mappings in Hadamard spaces
• Stabilizability in optimal control
• Style Decomposition for Improved Neural Style Transfer
• The Approach to Managing Provenance Metadata and Data Access Rights in Distributed Storage Using the Hyperledger Blockchain Platform
• LEARN Codes: Inventing Low-latency Codes via Recurrent Neural Networks
• Non-coercive first order Mean Field Games
• Security Code Smells in Android ICC
• Measuring precise radial velocities and cross-correlation function line-profile variations using a Skew Normal density
• Wavelet variance scale-dependence as a dynamics discriminating tool in high-frequency urban wind speed time series
• Document Structure Measure for Hypernym discovery
• Neural separation of observed and unobserved distributions
• Towards Secure and Efficient Payment Channels
• Dynamic Load Balancing Techniques for Particulate Flow Simulations
• Improving Landmark Recognition using Saliency detection and Feature classification
• Domain-Invariant Adversarial Learning for Unsupervised Domain Adaption
• Practical methods for graph two-sample testing
• Projection Convolutional Neural Networks for 1-bit CNNs via Discrete Back Propagation
• Optimal lower bounds on hitting probabilities for stochastic heat equations in spatial dimension $k \geq 1$
• Non-Local Video Denoising by CNN
• A Decentralized Event-Based Approach for Robust Model Predictive Control
• Arbitrary many walkers meet infinitely often in a subballistic random environment
• Model-blind Video Denoising Via Frame-to-frame Training
• From Known to the Unknown: Transferring Knowledge to Answer Questions about Novel Visual and Semantic Concepts
• Cross-database non-frontal facial expression recognition based on transductive deep transfer learning
• Cost-sensitive Learning of Deep Semantic Models for Sponsored Ad Retrieval
• Faster Attractor-Based Indexes
• A Framework for Fast and Efficient Neural Network Compression
• The GAN that Warped: Semantic Attribute Editing with Unpaired Data
• TextMountain: Accurate Scene Text Detection via Instance Segmentation
• A Tutorial for Weighted Bipolar Argumentation with Continuous Dynamical Systems and the Java Library Attractor
• Optimal Uncertainty Quantification on moment class using canonical moments
• iW-Net: an automatic and minimalistic interactive lung nodule segmentation deep network
• TIFTI: A Framework for Extracting Drug Intervals from Longitudinal Clinic Notes
• Structure and Motion from Multiframes
• On the maximal number of real embeddings of minimally rigid graphs in $\mathbb{R}^2$, $\mathbb{R}^3$ and $S^2$
• Generative Models for Simulating Mobility Trajectories
• Localization from Incomplete Euclidean Distance Matrix: Performance Analysis for the SVD-MDS Approach
• Asymmetry Helps: Eigenvalue and Eigenvector Analyses of Asymmetrically Perturbed Low-Rank Matrices
• Millimeter Wave Receiver Comparison Under Energy vs Spectral Efficiency Trade-off
• Dislocation lines in three-dimensional solids at low temperature
• Real Time Bangladeshi Sign Language Detection using Faster R-CNN
• Practical Full Resolution Learned Lossless Image Compression
• Runtime Analysis for Self-adaptive Mutation Rates
• Beltrami-Net: Domain Independent Deep D-bar Learning for Absolute Imaging with Electrical Impedance Tomography (a-EIT)
• ADVENT: Adversarial Entropy Minimization for Domain Adaptation in Semantic Segmentation
• Quantum spins and random loops on the complete graph
• Lipizzaner: A System That Scales Robust Generative Adversarial Network Training
• Optimal designs for $K$-factor two-level models with first-order interactions on a symmetrically restricted design region
• Enumerating coloured partitions in 2 and 3 dimensions
• Super-Resolution based on Image-Adapted CNN Denoisers: Incorporating Generalization of Training Data and Internal Learning in Test Time
• Efficient allocation of law enforcement resources using predictive police patrolling
• Spatial analysis between particulate matter and emergency room visits for conjunctivitis and keratitis
• On splitting and splittable families
• Restricted $r$-Stirling Numbers and their Combinatorial Applications
• Detecting Offensive Content in Open-domain Conversations using Two Stage Semi-supervision
• Optimized Portfolio Contracts for Bidding the Cloud
• Asymptotically Optimal Multi-Armed Bandit Activation Policies under Side Constraints
• Automated Tactical Decision Planning Model with Strategic Values Guidance for Local Action-Value-Ambiguity
• Joint Information Freshness and Completion Time Optimization for Vehicular Networks
• Online abstraction with MDP homomorphisms for Deep Learning
• Computational Bounds For Photonic Inverse Design
• Advance Prediction of Ventricular Tachyarrhythmias using Patient Metadata and Multi-Task Networks
• On the Computational Inefficiency of Large Batch Sizes for Stochastic Gradient Descent
