GaterNet: Dynamic Filter Selection in Convolutional Neural Network via a Dedicated Global Gating Network
The concept of conditional computation for deep nets has been proposed previously to improve model performance by selectively using only parts of the model conditioned on the sample being processed. In this paper, we investigate input-dependent dynamic filter selection in deep convolutional neural networks (CNNs). The problem is interesting because forcing different parts of the model to learn from different types of samples may help us acquire better filters in CNNs, improve the model's generalization performance, and potentially increase the interpretability of model behavior. We propose a novel yet simple framework called GaterNet, which involves a backbone and a gater network. The backbone network is a regular CNN that performs the major computation needed for making a prediction, while a global gater network is introduced to generate binary gates that selectively activate filters in the backbone network based on each input. Extensive experiments on the CIFAR and ImageNet datasets show that our models consistently outperform the original models by a large margin. On CIFAR-10, our model also improves upon state-of-the-art results.
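To make the two-network setup concrete, here is a minimal PyTorch sketch of input-dependent filter gating. The GatedConvNet and BinaryGate names, the layer sizes, and the straight-through estimator used to push gradients through the binary gates are all illustrative choices, not the paper's exact architecture or discretization scheme:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinaryGate(torch.autograd.Function):
    """Hard 0/1 gates in the forward pass; straight-through (identity)
    gradients in the backward pass so the gater stays trainable."""
    @staticmethod
    def forward(ctx, logits):
        return (logits > 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output

class GatedConvNet(nn.Module):
    def __init__(self, num_classes=10, channels=(32, 64)):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv2d(3, channels[0], 3, padding=1),
            nn.Conv2d(channels[0], channels[1], 3, padding=1),
        ])
        # Gater: a small global network mapping the raw input to one
        # logit per backbone filter.
        self.gater = nn.Sequential(
            nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(8, sum(channels)),
        )
        self.head = nn.Linear(channels[1], num_classes)
        self.splits = list(channels)

    def forward(self, x):
        gates = BinaryGate.apply(self.gater(x))        # (B, total_filters) in {0, 1}
        per_layer = torch.split(gates, self.splits, dim=1)
        h = x
        for conv, g in zip(self.convs, per_layer):
            h = F.relu(conv(h)) * g[:, :, None, None]  # zero out de-selected filters per sample
            h = F.max_pool2d(h, 2)
        return self.head(h.mean(dim=(2, 3)))

model = GatedConvNet()
print(model(torch.randn(4, 3, 32, 32)).shape)          # torch.Size([4, 10])
```

The key design point mirrors the abstract: the gater sees the whole input and emits one bit per backbone filter, so filter selection is global and sample-dependent rather than layer-local.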
Partitioned Variational Inference: A unified framework encompassing federated and continual learning
Variational inference (VI) has become the method of choice for fitting many modern probabilistic models. However, practitioners are faced with a fragmented literature that offers a bewildering array of algorithmic options. First, the choice of variational family. Second, the granularity of the updates, e.g. whether they are local to each data point and employ message passing, or global. Third, the method of optimization (bespoke or black-box, closed-form or stochastic updates, etc.). This paper presents a new framework, termed Partitioned Variational Inference (PVI), that explicitly acknowledges these algorithmic dimensions of VI, unifies disparate literature, and provides guidance on usage. Crucially, the proposed PVI framework allows us to identify new ways of performing VI that are ideally suited to challenging learning scenarios, including federated learning (where distributed computing is leveraged to process non-centralized data) and continual learning (where new data and tasks arrive over time and must be accommodated quickly). We showcase these new capabilities by developing communication-efficient federated training of Bayesian neural networks and continual learning for Gaussian process models with private pseudo-points. The new methods significantly outperform the state-of-the-art, whilst being almost as straightforward to implement as standard VI.
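For intuition about how partitioned updates compose into a global posterior, below is a minimal sketch of PVI-style updates for a toy conjugate model (Gaussian likelihood with known variance, Gaussian prior on the mean), where each data shard maintains its own approximate likelihood factor in natural-parameter form. With conjugacy the local "projection" is exact and trivial; in general it would be a local VI optimization. All names and constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true, noise_var = 2.0, 1.0
shards = [rng.normal(theta_true, np.sqrt(noise_var), size=50) for _ in range(4)]

# Natural parameters (precision, precision * mean) of the N(0, 10) prior and of
# one approximate likelihood factor t_i per shard, initialised to be uniform.
prior = np.array([1 / 10.0, 0.0])
factors = [np.zeros(2) for _ in shards]

for _ in range(3):                            # a few PVI passes over the shards
    for i, data in enumerate(shards):
        q = prior + sum(factors)              # global posterior: prior times all factors
        cavity = q - factors[i]               # delete shard i's current contribution
        # Tilted posterior = cavity * exact likelihood of shard i. Conjugacy makes
        # the projection exact here; in general this is a local VI optimisation.
        lik = np.array([len(data) / noise_var, data.sum() / noise_var])
        tilted = cavity + lik
        factors[i] = tilted - cavity          # refreshed local factor t_i

q = prior + sum(factors)
precision, mean = q[0], q[1] / q[0]
print(f"posterior mean {mean:.3f}, posterior std {precision ** -0.5:.3f}")
```

In the federated reading of this sketch, each shard lives on a different machine and only the natural parameters of its factor travel over the network; in the continual reading, shards arrive one at a time.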
DLHub: Model and Data Serving for Science
While the Machine Learning (ML) landscape is evolving rapidly, there has been a relative lag in the development of the ‘learning systems’ needed to enable broad adoption. Furthermore, few such systems are designed to support the specialized requirements of scientific ML. Here we present the Data and Learning Hub for science (DLHub), a multi-tenant system that provides both model repository and serving capabilities with a focus on science applications. DLHub addresses two significant shortcomings in current systems. First, its self-service model repository allows users to share, publish, verify, reproduce, and reuse models, and addresses concerns related to model reproducibility by packaging and distributing models and all constituent components. Second, it implements scalable and low-latency serving capabilities that can leverage parallel and distributed computing resources to democratize access to published models through a simple web interface. Unlike other model serving frameworks, DLHub can store and serve any Python 3-compatible model or processing function, plus multiple-function pipelines. We show that relative to other model serving systems, including TensorFlow Serving, SageMaker, and Clipper, DLHub provides greater capabilities, comparable performance without memoization and batching, and significantly better performance when the latter two techniques can be employed. We also describe early uses of DLHub for scientific applications.
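The "serve any Python 3-compatible function" idea can be illustrated with a generic, standard-library-only sketch. Note this is not DLHub's actual SDK or API, just a toy registry that exposes ordinary Python callables over HTTP the way a servable repository might:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Toy registry in the spirit of 'any Python 3-compatible callable is servable':
# each entry pairs a servable name with an ordinary Python function.
REGISTRY = {
    "double": lambda xs: [2 * x for x in xs],
    "mean": lambda xs: sum(xs) / len(xs),
}

class ServeHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        name = self.path.strip("/")                   # e.g. POST /double
        if name not in REGISTRY:
            self.send_error(404, f"unknown servable {name!r}")
            return
        body = self.rfile.read(int(self.headers["Content-Length"]))
        result = REGISTRY[name](json.loads(body))     # run the published function
        payload = json.dumps({"result": result}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), ServeHandler).serve_forever()
```

A request such as `POST /double` with body `[1, 2, 3]` returns `{"result": [2, 4, 6]}`; DLHub layers packaging, verification, and distributed execution on top of this basic function-as-servable pattern.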
ShelfNet for Real-time Semantic Segmentation
Synthesizing Tabular Data using Generative Adversarial Networks
Generative adversarial networks (GANs) implicitly learn the probability distribution of a dataset and can draw samples from the distribution. This paper presents Tabular GAN (TGAN), a generative adversarial network that can generate tabular data such as medical or educational records. Using the power of deep neural networks, TGAN generates high-quality and fully synthetic tables while simultaneously generating discrete and continuous variables. When we evaluate our model on three datasets, we find that TGAN outperforms conventional statistical generative models both in capturing the correlation between columns and in scaling up for large datasets.
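The central mechanical point, generating discrete and continuous columns simultaneously, can be sketched in PyTorch as follows. TGAN itself uses an LSTM-based generator and mode-specific normalization; this simplified two-head generator with a Gumbel-softmax over categories is only meant to show how one network can emit both variable types differentiably:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TabularGenerator(nn.Module):
    """Maps noise to one continuous column and one categorical column.
    Continuous values use tanh; the categorical column uses a Gumbel-softmax
    so discrete sampling stays differentiable during GAN training."""
    def __init__(self, noise_dim=16, n_categories=4):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(noise_dim, 64), nn.ReLU())
        self.cont_head = nn.Linear(64, 1)
        self.cat_head = nn.Linear(64, n_categories)

    def forward(self, z, tau=0.2):
        h = self.body(z)
        cont = torch.tanh(self.cont_head(h))                          # continuous column
        cat = F.gumbel_softmax(self.cat_head(h), tau=tau, hard=True)  # one-hot column
        return torch.cat([cont, cat], dim=1)

gen = TabularGenerator()
rows = gen(torch.randn(8, 16))
print(rows.shape)  # torch.Size([8, 5]): 1 continuous value + 4-way one-hot
```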
Generalizing semi-supervised generative adversarial networks to regression
In this work, we generalize semi-supervised generative adversarial networks (GANs) from classification problems to regression problems. In the last few years, the importance of improving the training of neural networks using semi-supervised training has been demonstrated for classification problems. With probabilistic classification being a subset of regression problems, this generalization opens up many new possibilities for the use of semi-supervised GANs, as well as presenting an avenue for a deeper understanding of how they function. We first demonstrate the capabilities of semi-supervised regression GANs on a toy dataset that allows for a detailed understanding of how they operate in various circumstances. This toy dataset is used to provide a theoretical basis for the semi-supervised regression GAN. We then apply semi-supervised regression GANs to the real-world application of age estimation from single images, performing extensive tests of the accuracies that can be achieved with significantly reduced annotated data. Through the combination of the theoretical example and the real-world scenario, we demonstrate how semi-supervised GANs can be generalized to regression problems.
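One way to see what changes when moving from semi-supervised classification GANs to regression is the discriminator's output: instead of K+1 class logits, it needs a continuous prediction alongside a real/fake signal. The two-head discriminator below is a hedged simplification (the paper relies on feature-matching-style losses rather than a plain real/fake head); names and losses are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegressionDiscriminator(nn.Module):
    """Shared trunk with two heads: a regression output supervised on the few
    labeled samples, and a real/fake output trained on unlabeled plus generated
    samples, so that unlabeled data shapes the shared features."""
    def __init__(self, in_dim=1):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                   nn.Linear(64, 64), nn.ReLU())
        self.regress = nn.Linear(64, 1)
        self.real_fake = nn.Linear(64, 1)

    def forward(self, x):
        h = self.trunk(x)
        return self.regress(h), self.real_fake(h)

D = RegressionDiscriminator()
x_lab, y_lab = torch.randn(8, 1), torch.randn(8, 1)      # scarce labeled data
x_unl, x_gen = torch.randn(32, 1), torch.randn(32, 1)    # unlabeled and generated

y_hat, _ = D(x_lab)
_, rf_unl = D(x_unl)
_, rf_gen = D(x_gen)
loss = (F.mse_loss(y_hat, y_lab)                                      # supervised head
        + F.binary_cross_entropy_with_logits(rf_unl, torch.ones_like(rf_unl))
        + F.binary_cross_entropy_with_logits(rf_gen, torch.zeros_like(rf_gen)))
loss.backward()
```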
Lagged correlation-based deep learning for directional trend change prediction in financial time series
Trend change prediction in complex systems with a large number of noisy time series is a problem with many applications for real-world phenomena, with stock markets as a notoriously difficult-to-predict example of such systems. We approach the prediction of directional trend changes via complex lagged correlations between series, excluding any information about the target series from its inputs so that predictions rest purely on such correlations with other series. We propose the use of deep neural networks that employ step-wise linear regressions with exponential smoothing in the preparatory feature engineering for this task, with regression slopes as trend-strength indicators for a given time interval. We apply this method to historical stock market data from 2011 to 2016 as a use-case example of lagged correlations between large numbers of time series that are heavily influenced by externally arising new information as a random factor. The results demonstrate the viability of the proposed approach, with state-of-the-art accuracies, statistical significance tests for additional validation, and important implications for modern financial economics.
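The described feature engineering, exponential smoothing followed by per-interval linear regressions whose slopes act as trend-strength indicators, might look like the following NumPy sketch; the smoothing constant, window length, and non-overlapping windows are assumptions for illustration rather than the paper's exact settings:

```python
import numpy as np

def trend_slopes(series, alpha=0.3, window=20):
    """Exponentially smooth a series, then fit one linear regression per
    non-overlapping window; each slope is a trend-strength feature."""
    smoothed = np.empty(len(series))
    smoothed[0] = series[0]
    for t in range(1, len(series)):
        smoothed[t] = alpha * series[t] + (1 - alpha) * smoothed[t - 1]
    x = np.arange(window)
    slopes = [np.polyfit(x, smoothed[s:s + window], deg=1)[0]
              for s in range(0, len(smoothed) - window + 1, window)]
    return np.array(slopes)

rng = np.random.default_rng(1)
prices = 100 + np.cumsum(rng.normal(0, 1, 400))   # stand-in for one stock series
features = trend_slopes(prices)
# Directional labels for a target series come from sign changes between
# consecutive slopes; model inputs use only slopes of *other*, lagged series.
print(features[:5])
```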
Questioning the assumptions behind fairness solutions
In addition to their benefits, optimization systems can have negative economic, moral, social, and political effects on populations as well as their environments. Frameworks like fairness have been proposed to aid service providers in addressing subsequent bias and discrimination during data collection and algorithm design. However, recent reports of neglect, unresponsiveness, and malevolence cast doubt on whether service providers can effectively implement fairness solutions. These reports invite us to revisit the assumptions made about service providers in fairness solutions, namely that service providers have (i) the incentives or (ii) the means to mitigate optimization externalities. Moreover, the environmental impact of these systems suggests that we need (iii) novel frameworks that consider systems other than algorithmic decision-making and recommender systems, and (iv) solutions that go beyond removing related algorithmic biases. Going forward, we propose Protective Optimization Technologies that enable optimization subjects to defend against the negative consequences of optimization systems.
Effective Ways to Build and Evaluate Individual Survival Distributions
An accurate model of a patient’s individual survival distribution can help determine the appropriate treatment for terminal patients. Unfortunately, risk scores (e.g., from Cox Proportional Hazard models) do not provide survival probabilities; single-time probability models (e.g., the Gail model, predicting 5-year probability) provide a probability for only one time point; and standard Kaplan-Meier survival curves provide only population averages for a large class of patients, meaning they are not specific to individual patients. This motivates an alternative class of tools that can learn a model providing an individual survival distribution, which gives survival probabilities across all times – such as extensions to the Cox model, Accelerated Failure Time, an extension to Random Survival Forests, and Multi-Task Logistic Regression. This paper first motivates such ‘individual survival distribution’ (ISD) models and explains how they differ from standard models. It then discusses ways to evaluate such models – namely Concordance, 1-Calibration, Brier score, and various versions of L1-loss – and then motivates and defines a novel approach, ‘D-Calibration’, which determines whether a model’s probability estimates are meaningful. We also discuss how these measures differ, and use them to evaluate several ISD prediction tools over a range of survival datasets.
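To make D-Calibration concrete: if a model's ISDs are calibrated, then evaluating each patient's predicted survival curve at that patient's own event time should yield values uniform on [0, 1]. The sketch below checks this with a histogram and chi-square statistic for uncensored patients only; handling censoring, as the paper does, needs a further adjustment, and the function names and bin counts here are illustrative:

```python
import numpy as np

def d_calibration(surv_prob_at_event, n_bins=10):
    """D-calibration check for uncensored patients: if the model is calibrated,
    the predicted survival probabilities S_i(t_i), each evaluated at that
    patient's own event time t_i, should be uniform on [0, 1]. Bin them and
    report a chi-square statistic against the uniform expectation."""
    counts, _ = np.histogram(surv_prob_at_event, bins=n_bins, range=(0.0, 1.0))
    expected = len(surv_prob_at_event) / n_bins
    chi2 = ((counts - expected) ** 2 / expected).sum()
    return counts, chi2

rng = np.random.default_rng(0)
calibrated = rng.uniform(0, 1, 1000)     # what a well-calibrated model produces
clumped = rng.beta(5, 2, 1000)           # a miscalibrated model's S_i(t_i) values
print(d_calibration(calibrated)[1])      # small chi-square statistic
print(d_calibration(clumped)[1])         # large chi-square statistic
```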
Metropolis-Hastings Generative Adversarial Networks
We introduce the Metropolis-Hastings generative adversarial network (MH-GAN), which combines aspects of Markov chain Monte Carlo and GANs. The MH-GAN draws samples from the distribution implicitly defined by a GAN’s discriminator-generator pair, as opposed to standard GAN sampling, which draws from the distribution defined by the generator alone. It uses the discriminator from GAN training to build a wrapper around the generator for improved sampling. With a perfect discriminator, this wrapped generator samples exactly from the true data distribution even when the generator is imperfect. We demonstrate the benefits of the improved generator on multiple benchmark datasets, including CIFAR-10 and CelebA, using DCGAN and WGAN.
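The wrapper is an independence Metropolis-Hastings chain: proposals come from the generator, and the acceptance ratio uses the fact that a calibrated discriminator encodes the density ratio D(x)/(1-D(x)) ≈ p_data(x)/p_gen(x). A one-dimensional toy with an idealized (exact) discriminator shows the mechanism; all distributions and constants here are illustrative:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def generator():
    return rng.normal(0.0, 2.0)                  # proposal: p_gen = N(0, 4)

def discriminator(x):
    # Idealised discriminator for data ~ N(1, 1): D = p_data / (p_data + p_gen).
    p_data, p_gen = norm.pdf(x, 1.0, 1.0), norm.pdf(x, 0.0, 2.0)
    return p_data / (p_data + p_gen)

def mh_gan_sample(n_steps=200):
    """One MH-GAN draw: an independence Metropolis-Hastings chain whose
    proposals come from the generator and whose acceptance ratio uses the
    discriminator's density-ratio estimate D/(1-D)."""
    x = generator()
    d = discriminator(x)
    for _ in range(n_steps):
        x_new = generator()
        d_new = discriminator(x_new)
        alpha = (d_new / (1 - d_new)) / (d / (1 - d))
        if rng.uniform() <= min(1.0, alpha):
            x, d = x_new, d_new
    return x

samples = np.array([mh_gan_sample() for _ in range(2000)])
print(samples.mean(), samples.std())             # ~1.0 and ~1.0: the data law
```

Even though the proposal (generator) is centered at 0, the accepted samples concentrate on the N(1, 1) data distribution, which is the abstract's perfect-discriminator claim in miniature.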
FADL: Federated-Autonomous Deep Learning for Distributed Electronic Health Record
Electronic health record (EHR) data is collected by individual institutions and often stored across locations in silos. Getting access to these data is difficult and slow due to security, privacy, regulatory, and operational issues. We show, using ICU data from 58 different hospitals, that machine learning models to predict patient mortality can be trained efficiently without moving health data out of their silos, using a distributed machine learning strategy. We propose a new method, called Federated-Autonomous Deep Learning (FADL), that trains part of the model using all data sources in a distributed manner and other parts using data from specific data sources. We observe that FADL outperforms the traditional federated learning strategy and conclude that the balance between global and local training is an important factor to consider when designing distributed machine learning methods, especially in healthcare.
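The global/local split can be sketched with a toy federated logistic model: a shared weight vector is trained with federated averaging across silos, while a per-silo output bias is updated only locally and never leaves its silo. This mirrors FADL's part-global, part-local training at a much smaller scale; the data, model, and learning rate are all stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
n_silos, dim = 5, 8

# Each silo keeps its (features, mortality label) data locally; only the
# *shared* parameters ever leave a silo.
silos = [(rng.normal(size=(100, dim)), rng.integers(0, 2, 100))
         for _ in range(n_silos)]
shared = np.zeros(dim)                          # federated part, averaged globally
local = [np.zeros(1) for _ in range(n_silos)]   # per-silo part, never averaged

def local_step(w_shared, w_local, X, y, lr=0.1):
    p = 1.0 / (1.0 + np.exp(-(X @ w_shared + w_local)))   # logistic model
    g_shared = X.T @ (p - y) / len(y)
    g_local = np.mean(p - y, keepdims=True)
    return w_shared - lr * g_shared, w_local - lr * g_local

for _ in range(50):                             # communication rounds
    updates = []
    for i, (X, y) in enumerate(silos):
        w_s, local[i] = local_step(shared, local[i], X, y)
        updates.append(w_s)
    shared = np.mean(updates, axis=0)           # FedAvg on the shared part only
```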
Deep Collective Matrix Factorization for Augmented Multi-View Learning
Learning by integrating multiple heterogeneous data sources is a common requirement in many tasks. Collective Matrix Factorization (CMF) is a technique to learn shared latent representations from arbitrary collections of matrices. It can be used to simultaneously complete one or more matrices by predicting the unknown entries. Classical CMF methods assume linearity in the interaction of latent factors, which can be restrictive, and fail to capture complex non-linear interactions. In this paper, we develop the first deep-learning-based method, called dCMF, for unsupervised learning of multiple shared representations that can model such non-linear interactions from an arbitrary collection of matrices. We address optimization challenges that arise due to dependencies between shared representations through Multi-Task Bayesian Optimization and design an acquisition function adapted for collective learning of hyperparameters. Our experiments show that dCMF significantly outperforms previous CMF algorithms in integrating heterogeneous data for predictive modeling. Further, on two tasks – recommendation and prediction of gene-disease association – dCMF outperforms state-of-the-art matrix completion algorithms that can utilize auxiliary sources of information.
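A stripped-down PyTorch sketch of the core idea, nonlinear encoders producing shared entity representations from which every matrix in the collection is reconstructed, appears below. dCMF proper uses autoencoders and Multi-Task Bayesian Optimization for hyperparameters; this encoder-only version with fixed hyperparameters is just meant to show how one entity set's representation couples two matrices:

```python
import torch
import torch.nn as nn

# Two coupled matrices sharing entity set 2: X12 (e1 x e2) and X23 (e2 x e3).
# Each entity set gets a nonlinear encoder; both matrices are reconstructed
# from products of the resulting latent factors.
e1, e2, e3, k = 40, 30, 20, 8
X12, X23 = torch.randn(e1, e2), torch.randn(e2, e3)

enc1 = nn.Sequential(nn.Linear(e2, k), nn.Tanh())       # e1 entities, from X12 rows
enc2 = nn.Sequential(nn.Linear(e1 + e3, k), nn.Tanh())  # e2 entities see both matrices
enc3 = nn.Sequential(nn.Linear(e2, k), nn.Tanh())       # e3 entities, from X23 columns

params = [p for m in (enc1, enc2, enc3) for p in m.parameters()]
opt = torch.optim.Adam(params, lr=1e-2)

for step in range(500):
    U1 = enc1(X12)                                # (e1, k)
    U2 = enc2(torch.cat([X12.t(), X23], dim=1))   # (e2, k): the shared representation
    U3 = enc3(X23.t())                            # (e3, k)
    loss = (((U1 @ U2.t() - X12) ** 2).mean()     # reconstruct X12
            + ((U2 @ U3.t() - X23) ** 2).mean())  # reconstruct X23
    opt.zero_grad()
    loss.backward()
    opt.step()

print(loss.item())
```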
• $HS^2$: Active Learning over Hypergraphs
• Interlacing Personal and Reference Genomes for Machine Learning Disease-Variant Detection
• A Residual Bootstrap for Conditional Expected Shortfall
• Distributed Impedance Control of Latency-Prone Robotic Systems with Series Elastic Actuation
• Internal observability of the wave equation in tiled domains
• Document classification using a Bi-LSTM to unclog Brazil’s supreme court
• Continuous Trade-off Optimization between Fast and Accurate Deep Face Detectors
• Scan2CAD: Learning CAD Model Alignment in RGB-D Scans
• Semantically-aware population health risk analyses
• Node Diversification in Complex Networks by Decentralized Coloring
• Iterative Transformer Network for 3D Point Cloud
• Calibrating Uncertainties in Object Localization Task
• Self-Supervised Generative Adversarial Networks
• Understanding the impact of entropy in policy learning
• Undermining User Privacy on Mobile Devices Using AI
• A combinatorial proof of a partition identity of Andrews and Stanley
• Grammars and reinforcement learning for molecule optimization
• Partial Difference Sets in $C_{2^n} \times C_{2^n}$
• CT organ segmentation using GPU data augmentation, unsupervised labels and IOU loss
• Distributed traffic light control at uncoupled intersections with real-world topology by deep reinforcement learning
• Further results on the inducibility of $d$-ary trees
• A Note on Random Sampling for Matrix Multiplication
• A Compositional Textual Model for Recognition of Imperfect Word Images
• Wrangling Messy CSV Files by Detecting Row and Type Patterns
• eXclusive Autoencoder (XAE) for Nucleus Detection and Classification on Hematoxylin and Eosin (H&E) Stained Histopathological Images
• Generic constructions of 5-valued spectra Boolean functions
• Distributed Variable Sample-Size Gradient-response and Best-response Schemes for Stochastic Nash Games over Graphs
• Multiview Supervision By Registration
• Probabilistic properties of detrended fluctuation analysis for Gaussian processes
• Large deviations of time-averaged statistics for Gaussian processes
• Fitness Estimation in Models of Genetic Evolution of Asexual Populations
• Scaling Configuration of Energy Harvesting Sensors with Reinforcement Learning
• Clustering Player Strategies from Variable-Length Game Logs in Dominion
• Is it Safe to Drive? An Overview of Factors, Challenges, and Datasets for Driveability Assessment in Autonomous Driving
• Improved upper bound on root number of linearized polynomials and its application to nonlinearity estimation of Boolean functions
• A Compact Embedding for Facial Expression Similarity
• On identities of the Rogers–Ramanujan type
• Patch-based Progressive 3D Point Set Upsampling
• Intra-class Variation Isolation in Conditional GANs
• Prioritizing Starting States for Reinforcement Learning
• Calculating CVaR and bPOE for Common Probability Distributions With Application to Portfolio Optimization and Density Estimation
• A QR Decomposition Approach to Factor Modelling: A Thesis Report
• Capacity Upper Bounds for the Relay Channel via Reverse Hypercontractivity
• Universal Adversarial Training
• Improved Speech Enhancement with the Wave-U-Net
• Using Attribution to Decode Dataset Bias in Neural Network Models for Chemistry
• Target Driven Visual Navigation with Hybrid Asynchronous Universal Successor Representations
• Skin lesion segmentation using U-Net and good training strategies
• Deep Regionlets: Blended Representation and Deep Learning for Generic Object Detection
• Non-Hermitian Many-Body Localization
• Higher-Order Clustering in Heterogeneous Information Networks
• Incorporating Equity into the School Bus Scheduling Problem
• Image Labeling with Markov Random Fields and Conditional Random Fields
• CyLKs: Unsupervised Cycle Lucas-Kanade Network for Landmark Tracking
• Deep Reinforcement Learning for Autonomous Driving
• Asynchronous Local Construction of Bounded-Degree Network Topologies Using Only Neighborhood Information
• Characterizing Shortest Paths in Road Systems Modeled as Manhattan Poisson Line Processes
• Statistical Robust Chinese Remainder Theorem for Multiple Numbers: Wrapped Gaussian Mixture Model
• Finding a Nonnegative Solution to an M-Tensor Equation
• Constructions of involutions over finite fields
• Two-Dimensional (2D) Particle Swarms for Structure Selection of Nonlinear Systems
• A modified Riccati approach to analytic interpolation with applications to system identification and robust control
• Multi-label classification search space in the MEKA software
• Robust neural circuit reconstruction from serial electron microscopy with convolutional recurrent networks
• Future Segmentation Using 3D Structure
• Unsupervised Control Through Non-Parametric Discriminative Rewards
• Stochastic orders for convolution of heterogeneous gamma and negative binomial random variables
• Unsupervised Multi-modal Neural Machine Translation
• First-order Newton-type Estimator for Distributed Estimation and Inference
• Data Detection in Single User Massive MIMO Using Re-Transmissions
• Formal Verification of CNN-based Perception Systems
• A Deep Cascade Model for Multi-Document Reading Comprehension
• Instance-level Sketch-based Retrieval by Deep Triplet Classification Siamese Network
• A Doubly Accelerated Inexact Proximal Point Method for Nonconvex Composite Optimization Problems
• Option Pricing in a Regime Switching Jump Diffusion Model
• Linear convergence of a dual optimization formulation for distributed optimization on directed graphs with unreliable communications
• Self-supervised Spatiotemporal Feature Learning by Video Geometric Transformations
• Image Generation from Layout
• Composable Probabilistic Inference Networks Using MRAM-based Stochastic Neurons
• Spin-glass model for the C-dismantling problem
• DeepMapping: Unsupervised Map Estimation From Multiple Point Clouds
• Adversarial Machine Learning And Speech Emotion Recognition: Utilizing Generative Adversarial Networks For Robustness
• Spectral Feature Transformation for Person Re-identification
• Extremizing the number of connected subgraphs of graphs
• Alea Iacta Est: Auctions, Persuasion, Interim Rules, and Dice
• Mixture Martingales Revisited with Applications to Sequential Tests and Confidence Intervals
• On circumcenter mappings induced by nonexpansive operators
• Exploiting ‘Quantum-like Interference’ in Decision Fusion for Ranking Multimodal Documents
• MeshNet: Mesh Neural Network for 3D Shape Representation
• Semi-supervised learning with Bidirectional GANs
• An Opportunistic Thresholding Detector for IoT Random Access in Massive MIMO
• Context-Aware Dialog Re-Ranking for Task-Oriented Dialog Systems