Interpretable Graph Convolutional Neural Networks for Inference on Noisy Knowledge Graphs
In this work, we provide a new formulation for Graph Convolutional Neural Networks (GCNNs) for link prediction on graph data that addresses common challenges for biomedical knowledge graphs (KGs). We introduce a regularized attention mechanism to GCNNs that not only improves performance on clean datasets, but also favorably accommodates noise in KGs, a pervasive issue in real-world applications. Further, we explore new visualization methods for interpretable modelling and illustrate how the learned representation can be exploited to automate dataset denoising. The results are demonstrated on a synthetic dataset, the common benchmark dataset FB15k-237, and a large biomedical knowledge graph derived from a combination of noisy and clean data sources. Using these improvements, we visualize a learned model’s representation of the disease cystic fibrosis and demonstrate how to interrogate a neural network to show the potential of PPARG as a candidate therapeutic target for rheumatoid arthritis.
Learning Curriculum Policies for Reinforcement Learning
Curriculum learning in reinforcement learning is a training methodology that seeks to speed up learning of a difficult target task by first training on a series of simpler tasks and transferring the knowledge acquired to the target task. Automatically choosing a sequence of such tasks (i.e., a curriculum) is an open problem that has been the subject of much recent work in this area. In this paper, we build upon a recent method for curriculum design, which formulates the curriculum sequencing problem as a Markov Decision Process. We extend this model to handle multiple transfer learning algorithms, and show for the first time that a curriculum policy over this MDP can be learned from experience. We explore various representations that make this possible, and evaluate our approach by learning curriculum policies for multiple agents in two different domains. The results show that our method produces curricula that can train agents to perform on a target task as fast as or faster than existing methods.
On variation of gradients of deep neural networks
We provide a theoretical explanation of the role of the number of nodes at each layer in deep neural networks. We prove that the largest variation of a deep neural network with ReLU activation function arises when the layer with the fewest nodes changes its activation pattern. An important implication is that a deep neural network is a useful tool for generating functions whose variations are mostly concentrated on a smaller area of the input space, near the boundaries corresponding to the layer with the fewest nodes. In turn, this property makes the function more invariant to input transformation. That is, our theoretical result gives a clue about how to design the architecture of a deep neural network to increase complexity and transformation invariance simultaneously.
Fake News: A Survey of Research, Detection Methods, and Opportunities
The explosive growth in fake news and its erosion of democracy, justice, and public trust have increased the demand for fake news analysis, detection and intervention. This survey comprehensively and systematically reviews fake news research. The survey identifies and specifies fundamental theories across various disciplines, e.g., psychology and social science, to facilitate and enhance the interdisciplinary research of fake news. Current fake news research is reviewed, summarized and evaluated. These studies focus on fake news from four perspectives: (1) the false knowledge it carries, (2) its writing style, (3) its propagation patterns, and (4) the credibility of its creators and spreaders. We characterize each perspective with various analyzable and utilizable information provided by news and its spreaders, various strategies and frameworks that are adaptable, and techniques that are applicable. By reviewing the characteristics of fake news and open issues in fake news studies, we highlight some potential research tasks at the end of this survey.
Model Selection and estimation of Multi Screen Penalty
ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware
Image Score: How to Select Useful Samples
There have long been debates on how we could interpret neural networks and understand the decisions our models make. In particular, it remains unclear why deep neural networks tend to be error-prone when dealing with samples that output low softmax scores. We present an efficient approach to measure the confidence of decision-making steps by statistically investigating each unit’s contribution to that decision. Instead of focusing on how the models react to datasets, we study the datasets themselves given a pre-trained model. Our approach is capable of assigning a score to each sample within a dataset that measures the frequency of occurrence of that sample’s chain of activation. We demonstrate with experiments that our method could select useful samples to improve deep neural networks in a semi-supervised learning setting.
GAN-EM: GAN based EM learning framework
Network Compression via Recursive Bayesian Pruning
Recently, compression and acceleration of deep neural networks have been in critical need. Bayesian generalization of structured pruning represents an important research direction for solving this problem. However, the existing Bayesian methods ignore the dependency among neurons and filters for computational simplicity. In this study, we explore, under a Bayesian framework, a structured pruning method with layer-wise sequential dependency assumed, a more general learning setting. Based on the property of the Dirac distribution, we further derive a new dropout noise, which makes it possible to approximate the posterior of dropout noise knowing that of the previous layer. With the Dirac-like dropout noise, we further propose a recursive strategy, named Recursive Bayesian Pruning (RBP), to train and prune networks in a layer-by-layer fashion. The unimportant neurons and filters are directly targeted and removed, taking into account the influence of the previous layer. Experiments on the typical neural networks LeNet-300-100, LeNet-5 and VGG-16 demonstrate that the proposed method is competitive with or even outperforms state-of-the-art methods in several compression and acceleration metrics.
Feature Selection Based on Unique Relevant Information for Health Data
Feature selection, which searches for the most representative features in observed data, is critical for health data analysis. Unlike feature extraction, such as PCA and autoencoder based methods, feature selection preserves interpretability, meaning that the selected features provide direct information about certain health conditions (i.e., the label). Thus, feature selection allows domain experts, such as clinicians, to understand the predictions made by machine learning based systems, as well as improve their own diagnostic skills. Mutual information is often used as a basis for feature selection since it measures dependencies between features and labels. In this paper, we introduce a novel mutual information based feature selection (MIBFS) method called SURI, which boosts features with high unique relevant information. We compare SURI to existing MIBFS methods using 3 different classifiers on 6 publicly available healthcare data sets. The results indicate that, in addition to preserving interpretability, SURI selects more relevant feature subsets which lead to higher classification performance. More importantly, we explore the dynamics of mutual information on a public low-dimensional health data set via exhaustive search. The results suggest the important role of unique relevant information in feature selection and verify the principles behind SURI.
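To make the mutual information criterion mentioned in this abstract concrete, here is a minimal sketch that estimates I(X;Y) for discrete data and compares two toy features. It illustrates plain mutual information only and is not the SURI method; the function name `mutual_information` and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits for two discrete sequences.

    Hypothetical helper for illustration; it is not taken from the paper.
    """
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        p_joint = c / n
        # p(x, y) / (p(x) * p(y)), written with raw counts
        ratio = c * n / (count_x[x] * count_y[y])
        mi += p_joint * math.log2(ratio)
    return mi


# Toy data (assumed): one feature copies the label, the other is independent of it.
labels = [0, 0, 1, 1]
informative = [0, 0, 1, 1]
uninformative = [0, 1, 0, 1]

print(mutual_information(informative, labels))
print(mutual_information(uninformative, labels))
```

A selection method built on this quantity would rank the first feature above the second. How SURI additionally weighs unique relevant information is described in the paper, not in this sketch.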
Imputation of Clinical Covariates in Time Series
Efficient Lifelong Learning with A-GEM
In lifelong learning, the learner is presented with a sequence of tasks, incrementally building a data-driven prior which may be leveraged to speed up learning of a new task. In this work, we investigate the efficiency of current lifelong approaches in terms of sample complexity and computational and memory cost. Towards this end, we first introduce a new and more realistic evaluation protocol, whereby learners observe each example only once and hyper-parameter selection is done on a small and disjoint set of tasks, which is not used for the actual learning experience and evaluation. Second, we introduce a new metric measuring how quickly a learner acquires a new skill. Third, we propose an improved version of GEM (Lopez-Paz & Ranzato, 2017), dubbed Averaged GEM (A-GEM), which performs as well as or even better than GEM, while being almost as computationally and memory efficient as EWC (Kirkpatrick et al., 2016) and other regularization-based methods. Finally, we show that all algorithms including A-GEM can learn even more quickly if they are provided with task descriptors specifying the classification tasks under consideration. Our experiments on several standard lifelong learning benchmarks demonstrate that A-GEM has the best trade-off between accuracy and efficiency.
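The abstract does not spell out the update rule, so the following is only a sketch of the gradient projection commonly associated with A-GEM: when the current gradient conflicts with a reference gradient averaged over an episodic memory, the conflicting component is removed. The function name `project_gradient` and the plain-list representation are assumptions made for illustration.

```python
def project_gradient(g, g_ref):
    """Project g so that it no longer conflicts with the reference gradient g_ref.

    Hypothetical helper: g is the gradient on the current task and g_ref is the
    average gradient on a batch drawn from the episodic memory of earlier tasks.
    """
    dot = sum(a * b for a, b in zip(g, g_ref))
    if dot >= 0:
        # No conflict with the reference gradient: keep g unchanged.
        return list(g)
    ref_sq = sum(b * b for b in g_ref)
    # Remove the component of g that points against g_ref.
    return [a - (dot / ref_sq) * b for a, b in zip(g, g_ref)]


# Toy example (assumed values): the second coordinate conflicts with the memory gradient.
print(project_gradient([1.0, -1.0], [0.0, 1.0]))
```

Because only a single averaged reference gradient is used, the projection is one dot product and one vector update per step, which is consistent with the efficiency claim in the abstract.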
Towards a More Practice-Aware Runtime Analysis of Evolutionary Algorithms
Theory of evolutionary computation (EC) aims at providing mathematically founded statements about the performance of evolutionary algorithms (EAs). The predominant topic in this research domain is runtime analysis, which studies the time it takes a given EA to solve a given optimization problem. Runtime analysis has witnessed significant advances in the last couple of years, allowing us to compute precise runtime estimates for several EAs and several problems. Runtime analysis is, however (and unfortunately!), often judged by practitioners to be of little relevance for real applications of EAs. Several reasons for this claim exist. We address two of them in the present work: (1) EA implementations often differ from their vanilla pseudocode description, which, in turn, typically forms the basis for runtime analysis. To close the resulting gap between empirically observed and theoretically derived performance estimates, we therefore suggest taking this discrepancy into account in the mathematical analysis and adjusting, for example, the cost assigned to the evaluation of search points that equal one of their direct parents (provided that this is easy to verify, as is the case in almost all standard EAs). (2) Most runtime analysis results make statements only about the expected time to reach an optimal solution (and possibly the distribution of this optimization time), thus explicitly or implicitly neglecting the importance of understanding how the function values evolve over time. We suggest extending runtime statements to runtime profiles, covering the expected time needed to reach points of intermediate fitness values. As a direct consequence, we obtain a result showing that the greedy (2+1) GA of Sudholt [GECCO 2012] outperforms any unary unbiased black-box algorithm on OneMax.
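Both suggestions in this abstract can be illustrated with a small sketch, assuming a standard (1+1) EA on OneMax rather than the greedy (2+1) GA analysed in the paper. Offspring identical to the parent are not charged an evaluation, and the first evaluation count at which each fitness value is reached is recorded as a runtime profile. The function name and parameters are hypothetical.

```python
import random


def one_plus_one_ea(n, rng, skip_duplicates=True):
    """Run a (1+1) EA on OneMax and return (evaluations, profile).

    profile maps each fitness value that was reached to the number of
    evaluations spent when it was first reached.
    """
    parent = [rng.randint(0, 1) for _ in range(n)]
    fitness = sum(parent)  # OneMax counts the one-bits
    evaluations = 1
    profile = {fitness: evaluations}
    while fitness < n:
        # Standard bit mutation: flip each bit independently with probability 1/n.
        child = [1 - b if rng.random() < 1.0 / n else b for b in parent]
        if skip_duplicates and child == parent:
            # An offspring equal to its parent needs no new evaluation.
            continue
        evaluations += 1
        child_fitness = sum(child)
        if child_fitness >= fitness:
            if child_fitness > fitness:
                profile[child_fitness] = evaluations
            parent, fitness = child, child_fitness
    return evaluations, profile


evaluations, profile = one_plus_one_ea(20, random.Random(0))
for level in sorted(profile):
    print(level, profile[level])
```

Running it with `skip_duplicates=False` charges every offspring, which is the kind of discrepancy between pseudocode and implementation that the abstract proposes to account for.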
Generalization in anti-causal learning
The ability to learn and act in novel situations is still a prerogative of animate intelligence, as current machine learning methods mostly fail when moving beyond the standard i.i.d. setting. What is the reason for this discrepancy? Most machine learning tasks are anti-causal, i.e., we infer causes (labels) from effects (observations). Typically, in supervised learning we build systems that try to directly invert causal mechanisms. Instead, in this paper we argue that strong generalization capabilities crucially hinge on searching and validating meaningful hypotheses, requiring access to a causal model. In such a framework, we want to find a cause that leads to the observed effect. Anti-causal models are used to drive this search, but a causal model is required for validation. We investigate the fundamental differences between causal and anti-causal tasks, discuss implications for topics ranging from adversarial attacks to disentangling factors of variation, and provide extensive evidence from the literature to substantiate our view. We advocate for incorporating causal models in supervised learning to shift the paradigm from inference only, to search and validation.
Modeling Irregularly Sampled Clinical Time Series
While the volume of electronic health records (EHR) data continues to grow, it remains rare for hospital systems to capture dense physiological data streams, even in the data-rich intensive care unit setting. Instead, typical EHR records consist of sparse and irregularly observed multivariate time series, which are well understood to present particularly challenging problems for machine learning methods. In this paper, we present a new deep learning architecture for addressing this problem based on the use of a semi-parametric interpolation network followed by the application of a prediction network. The interpolation network allows for information to be shared across multiple dimensions during the interpolation stage, while any standard deep learning model can be used for the prediction network. We investigate the performance of this architecture on the problems of mortality and length of stay prediction.
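As a rough illustration of what interpolating a sparse, irregularly observed series onto a regular grid can look like, here is a minimal kernel-smoothing sketch. It is not the semi-parametric interpolation network from the paper: the squared-exponential weighting, the `bandwidth` parameter, and the function name are all assumptions made for illustration.

```python
import math


def kernel_interpolate(times, values, ref_times, bandwidth=1.0):
    """Kernel-weighted average of irregular observations at each reference time.

    Hypothetical helper: each output is a weighted mean of the observed values,
    with weights that decay with the squared distance in time.
    """
    out = []
    for r in ref_times:
        weights = [math.exp(-((r - t) / bandwidth) ** 2) for t in times]
        total = sum(weights)
        if total == 0.0:
            # All observations are too far away to contribute numerically.
            out.append(float("nan"))
        else:
            out.append(sum(w * v for w, v in zip(weights, values)) / total)
    return out


# Toy data (assumed): heart-rate readings at irregular times, in hours.
obs_times = [0.0, 0.4, 2.5, 3.1]
obs_values = [80.0, 82.0, 90.0, 88.0]
grid = [0.0, 1.0, 2.0, 3.0]
print(kernel_interpolate(obs_times, obs_values, grid))
```

The regular-grid output could then be fed to any standard prediction model, which mirrors the two-stage structure the abstract describes.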
• HUMBI 1.0: HUman Multiview Behavioral Imaging Dataset
• Vertex Priority Based Butterfly Counting for Large-scale Bipartite Networks
• $K$-weight bounds for $γ$-hyperelliptic semigroups
• Explaining the Ambiguity of Object Detection and 6D Pose from Visual Data
• On Bi-Objective convex-quadratic problems
• A Behavioral Compact Model of 3D NAND Flash Memory
• Classifying a specific image region using convolutional nets with an ROI mask as input
• In-silico Risk Analysis of Personalized Artificial Pancreas Controllers via Rare-event Simulation
• Containers Orchestration with Cost-Efficient Autoscaling in Cloud Computing Environments
• Plan-Recognition-Driven Attention Modeling for Visual Recognition
• An API for Development of User Defined Scheduling Algorithms in Aneka PaaS Cloud Software
• Multi-modal Capsule Routing for Actor and Action Video Segmentation Conditioned on Natural Language Queries
• Unilateral Left-Tail Anderson Darling Test Based Spectrum Sensing with Laplacian Noise
• The directed landscape
• Basic properties of the Airy line ensemble
• ECO: Egocentric Cognitive Mapping
• Efficiency and robustness in Monte Carlo sampling of 3-D geophysical inversions with Obsidian v0.1.2: Setting up for success
• Derivatives of Schubert polynomials and proof of a determinant conjecture of Stanley
• CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark
• End-to-end Learning of Convolutional Neural Net and Dynamic Programming for Left Ventricle Segmentation
• Iterative Reorganization with Weak Spatial Constraints: Solving Arbitrary Jigsaw Puzzles for Unsupervised Representation Learning
• PVRNet: Point-View Relation Neural Network for 3D Shape Recognition
• Macro action selection with deep reinforcement learning in StarCraft
• Regularized Wasserstein Means Based on Variational Transportation
• Machine-Learning-based High-resolution DOA Measurement and Robust DM for Hybrid Analog-Digital Massive MIMO Transceiver
• Many Server Queueing Models with Heterogeneous Servers and Parameter Uncertainty with Customer Contact Centre Applications
• Analysis on Gradient Propagation in Batch Normalized Residual Networks
• How to Make a BLT Sandwich? Learning to Reason towards Understanding Web Instructional Videos
• A Study on Dialogue Reward Prediction for Open-Ended Conversational Agents
• Observability on lattice points for heat equations and applications
• Locally Consistent Parsing for Text Indexing in Small Space
• Quick Best Action Identification in Linear Bandit Problems
• A Tverberg type theorem for collectively unavoidable complexes
• LCD codes from weighing matrices
• Link Delay Estimation Using Sparse Recovery for Dynamic Network Tomography
• Variations of dynamic random networks: localization approach
• Predicting Inpatient Discharge Prioritization With Electronic Health Records
• Asymptotics of a locally dependent statistic on finite reflection groups
• Ensemble-based implicit sampling for Bayesian inverse problems with non-Gaussian priors
• The smallest parts function associated with $ω(q)$
• Mapping the Underground: Towards Automatic Discovery of Cybercrime Supply Chains
• Improved and Robust Controversy Detection in General Web Pages Using Semantic Approaches under Large Scale Conditions
• Kiki Kills: Identifying Dangerous Challenge Videos from Social Media
• An equivalence between stationary points for rank constraints versus low-rank factorizations
• CASIA-SURF: A Dataset and Benchmark for Large-scale Multi-modal Face Anti-Spoofing
• Ann: A domain-specific language for the effective design and validation of Java annotations
• A Psychovisual Analysis on Deep CNN Features for Perceptual Metrics and A Novel Psychovisual Loss
• Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale
• On the cardinality spectrum and the number of latin bitrades of order 3
• Sequence Searching Allowing for Non-Overlapping Adjacent Unbalanced Translocations
• A multi-task deep learning model for the classification of Age-related Macular Degeneration
• Connecting empirical phenomena and theoretical models of biological coordination across scales
• GPSfM: Global Projective SFM Using Algebraic Constraints on Multi-View Fundamental Matrices
• Report on the 3rd Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL 2018)
• Precoder Design For Multi-group Multicasting with a Common Message
• Binomial Eulerian polynomials for colored permutations
• Learning Representations of Social Media Users
• Computing Spatial Image Convolutions for Event Cameras
• Pedestrian Detection with Autoregressive Network Phases
• Dual Objective Approach Using A Convolutional Neural Network for Magnetic Resonance Elastography
• Deep Cosine Metric Learning for Person Re-Identification
• Distributed averaging integral Nash equilibrium seeking on networks
• Integrating omics and MRI data with kernel-based tests and CNNs to identify rare genetic markers for Alzheimer’s disease
• Design and Implementation of a Neural Network Aided Self-Interference Cancellation Scheme for Full-Duplex Radios
• The Inherent Instability of Disordered Systems
• Disentangling Propagation and Generation for Video Prediction
• Revisiting the Softmax Bellman Operator: Theoretical Properties and Practical Benefits
• Metric mean dimension and analog compression
• Personalizing Intervention Probabilities By Pooling
• ‘Double-DIP’: Unsupervised Image Decomposition via Coupled Deep-Image-Priors
• A Harary-Sachs Theorem for Hypergraphs
• Anchor Box Optimization for Object Detection
• Nearly-Regular Hypergraphs and Saturation of Berge Stars
• Nonlinear Stochastic Position and Attitude Filter on the Special Euclidean Group 3
• Why the World Reads Wikipedia: Beyond English Speakers
• Multiple Instance Learning for ECG Risk Stratification
• Distributed Cluster Formation and Power-Bandwidth Allocation for Imperfect NOMA in DL-HetNets
• Fighting Fire with Fire: Using Antidote Data to Improve Polarization and Fairness of Recommender Systems
• Ego-Downward and Ambient Video based Person Location Association
• Unsupervised Domain Adaptation using Generative Models and Self-ensembling
• Neural Rejuvenation: Improving Deep Network Training by Enhancing Computational Resource Utilization
• Double and Triple Erasure-Correcting-Codes over Graphs
• DeepLiDAR: Deep Surface Normal Guided Depth Prediction for Outdoor Scene from Sparse LiDAR Data and Single Color Image
• Improving Clinical Predictions through Unsupervised Time Series Representation Learning
• VADRA: Visual Adversarial Domain Randomization and Augmentation
• Estimation in linear errors-in-variables models with unknown error distribution
• Permutations Unlabeled beyond Sampling Unknown
• Multi-task Learning of Hierarchical Vision-Language Representation
• Modelling and Simulation of Fog and Edge Computing Environments using iFogSim Toolkit
• Optimal Resource Allocation over Networks via Lottery-Based Mechanisms
• Prediction of New Onset Diabetes after Liver Transplant
• Explore and Learn: Optimized Two-Stage Search for Millimeter-Wave Beam Alignment
• Knowledge-driven generative subspaces for modeling multi-view dependencies in medical data
• JSR-Net: A Deep Network for Joint Spatial-Radon Domain CT Reconstruction from incomplete data
• Elastic Boundary Projection for 3D Medical Imaging Segmentation
• A dual spectral projected gradient method for log-determinant semidefinite problems
• Automated Segmentation of Cervical Nuclei in Pap Smear Images using Deformable Multi-path Ensemble Model
• Modeling disease progression in longitudinal EHR data using continuous-time hidden Markov models
• Large Spectral Density Matrix Estimation by Thresholding
• A Graph Theory of Rook Placements
• Bollobás-type inequalities on set $k$-tuples
• Fast Covariance Estimation for Multivariate Sparse Functional Data
• Interpretable Clustering via Optimal Trees
• Exploiting Wireless Channel State Information Structures Beyond Linear Correlations: A Deep Learning Approach
• Towards Theoretical Understanding of Large Batch Training in Stochastic Gradient Descent
• Few-Shot Self Reminder to Overcome Catastrophic Forgetting
• Nonlinear Cuff-less Blood Pressure Estimation of Healthy Subjects Using Pulse Transit Time and Arrival Time
• Semi-supervised Rare Disease Detection Using Generative Adversarial Network
• XNet: A convolutional neural network (CNN) implementation for medical X-Ray image segmentation suitable for small datasets
• Adding a Helper Can Totally Remove the Secrecy Constraints in Interference Channel
• Universal Perturbation Attack Against Image Retrieval
• A Hidden Markov Model Based Unsupervised Algorithm for Sleep/Wake Identification Using Actigraphy
• Modeling Treatment Delays for Patients using Feature Label Pairs in a Time Series
• SUSAN: Segment Unannotated image Structure using Adversarial Network
• Signal Reconstruction from Modulo Observations
• Metric Subregularity of Subdifferential and KL Property of Exponent 1/2
• Twists and Turns in the US-North Korea Dialogue: Key Figure Dynamic Network Analysis using News Articles
• Resource Management and Scheduling for Big Data Applications in Cloud Computing Environments
• Split learning for health: Distributed deep learning without sharing raw patient data
• Sequentially congruent partitions and related bijections
• Visual Foresight: Model-Based Deep Reinforcement Learning for Vision-Based Robotic Control
• Practical Window Setting Optimization for Medical Image Deep Learning
• Towards Visual Feature Translation
• Recommending Paths: Follow or Not Follow?
• Facets of the Cone of Totally Balanced Games
• Joint Optimization of a UAV’s Trajectory and Transmit Power for Covert Communications
• Rademacher Complexity and Generalization Performance of Multi-category Margin Classifiers
• Critical base for the unique codings of fat Sierpinski gasket
• Internet of Things (IoT) and New Computing Paradigms
• Management and Orchestration of Network Slices in 5G, Fog, Edge and Clouds
• Deep Learning Approach for Predicting 30 Day Readmissions after Coronary Artery Bypass Graft Surgery
• Resource Constrained Deep Reinforcement Learning
• Examining Deep Learning Architectures for Crime Classification and Prediction
• Quantum plateaus in dynamical Hall conductivity
• Generalized Differential Calculus in Infinite-Dimensional Convex Analysis via Quasi-Relative Interiors
• Macdonald trees and determinants of representations for finite Coxeter groups
• Bus bunching as a synchronisation phenomenon