A note on rank constrained solutions to linear matrix equations
This preliminary note presents a heuristic for determining rank constrained solutions to linear matrix equations (LME). The method proposed here is based on minimizing a non-convex quadratic functional, which will hence-forth be termed as the \textit{Low-Rank-Functional} (LRF). Although this method lacks a formal proof/comprehensive analysis, for example in terms of a probabilistic guarantee for converging to a solution, the proposed idea is intuitive and has been seen to perform well in simulations. To that end, many numerical examples are provided to corroborate the idea.
Deep neural network marketplace recommenders in online experiments
Recommendations are broadly used in marketplaces to match users with items relevant to their interests and needs. To understand user intent and tailor recommendations to their needs, we use deep learning to explore various heterogeneous data available in marketplaces. This paper focuses on the challenge of measuring recommender performance and summarizes the online experiment results with several promising types of deep neural network recommenders – hybrid item representation models combining features from user engagement and content, sequence-based models, and multi-armed bandit models that optimize user engagement by re-ranking proposals from multiple submodels. The recommenders are currently running in production at the leading Norwegian marketplace FINN.no and serves over one million visitors everyday.
Five lessons from building a deep neural network recommender
Recommendation algorithms are widely adopted in marketplaces to help users find the items they are looking for. The sparsity of the items by user matrix and the cold-start issue in marketplaces pose challenges for the off-the-shelf matrix factorization based recommender systems. To understand user intent and tailor recommendations to their needs, we use deep learning to explore various heterogeneous data available in marketplaces. This paper summarizes five lessons we learned from experimenting with state-of-the-art deep learning recommenders at the leading Norwegian marketplace \textit{FINN.no}. We design a hybrid recommender system that takes the user-generated contents of a marketplace (including text, images and meta attributes) and combines them with user behavior data such as page views and messages to provide recommendations for marketplace items. Among various tactics we experimented with, the following five show the best impact: staged training instead of end-to-end training, leveraging rich user behaviors beyond page views, using user behaviors as noisy labels to train embeddings, using transfer learning to solve the unbalanced data problem, and using attention mechanisms in the hybrid model. This system is currently running with around 20\% click-through-rate in production at \textit{FINN.no} and serves over one million visitors everyday.
Variational Bayesian Inference for Robust Streaming Tensor Factorization and Completion
Streaming tensor factorization is a powerful tool for processing high-volume and multi-way temporal data in Internet networks, recommender systems and image/video data analysis. Existing streaming tensor factorization algorithms rely on least-squares data fitting and they do not possess a mechanism for tensor rank determination. This leaves them susceptible to outliers and vulnerable to over-fitting. This paper presents a Bayesian robust streaming tensor factorization model to identify sparse outliers, automatically determine the underlying tensor rank and accurately fit low-rank structure. We implement our model in Matlab and compare it with existing algorithms on tensor datasets generated from dynamic MRI and Internet traffic.
Large Scale Learning with Krein Kernels
We extend the Nystr\’om method for low-rank approximation of positive definite Mercer kernels to approximation of indefinite kernel matrices. Our result is the first derivation of the approach that does not require the positive definiteness of the kernel function. Building on this result, we then devise highly scalable methods for learning in reproducing kernel Kre\u{\i}n spaces. The main motivation for our work comes from problems with structured representations (e.g., graphs, strings, time-series), where it is relatively easy to devise a pairwise (dis)similarity function based on intuition/knowledge of a domain expert. Such pairwise functions are typically not positive definite and it is often well beyond the expertise of practitioners to verify this condition. The proposed large scale approaches for learning in reproducing kernel Kre\u{\i}n spaces provide principled and theoretically well-founded means to tackle this class of problems. The effectiveness of the approaches is evaluated empirically using kernels defined on structured and vectorial data representations.
Differentially Private Bayesian Inference for Exponential Families
The study of private inference has been sparked by growing concern regarding the analysis of data when it stems from sensitive sources. We present the first method for private Bayesian inference in exponential families that properly accounts for noise introduced by the privacy mechanism. It is efficient because it works only with sufficient statistics and not individual data. Unlike other methods, it gives properly calibrated posterior beliefs in the non-asymptotic data regime.
Challenges of Context and Time in Reinforcement Learning: Introducing Space Fortress as a Benchmark
Research in deep reinforcement learning (RL) has coalesced around improving performance on benchmarks like the Arcade Learning Environment. However, these benchmarks conspicuously miss important characteristics like abrupt context-dependent shifts in strategy and temporal sensitivity that are often present in real-world domains. As a result, RL research has not focused on these challenges, resulting in algorithms which do not understand critical changes in context, and have little notion of real world time. To tackle this issue, this paper introduces the game of Space Fortress as a RL benchmark which incorporates these characteristics. We show that existing state-of-the-art RL algorithms are unable to learn to play the Space Fortress game. We then confirm that this poor performance is due to the RL algorithms’ context insensitivity and reward sparsity. We also identify independent axes along which to vary context and temporal sensitivity, allowing Space Fortress to be used as a testbed for understanding both characteristics in combination and also in isolation. We release Space Fortress as an open-source Gym environment.
ProdSumNet: reducing model parameters in deep neural networks via product-of-sums matrix decompositions
We consider a general framework for reducing the number of trainable model parameters in deep learning networks by decomposing linear operators as a product of sums of simpler linear operators. Recently proposed deep learning architectures such as CNN, KFC, Dilated CNN, etc. are all subsumed in this framework and we illustrate other types of neural network architectures within this framework. We show that good accuracy on MNIST and Fashion MNIST can be obtained using a relatively small number of trainable parameters. In addition, since implementation of the convolutional layer is resource-heavy, we consider an approach in the transform domain that obviates the need for convolutional layers. One of the advantages of this general framework over prior approaches is that the number of trainable parameters is not fixed and can be varied arbitrarily. In particular, we illustrate the tradeoff of varying the number of trainable variables and the corresponding error rate. As an example, by using this decomposition on a reference CNN architecture for MNIST with over 3×10^6 trainable parameters, we are able to obtain an accuracy of 98.44% using only 3554 trainable parameters.
Dynamic Hierarchical Empirical Bayes: A Predictive Model Applied to Online Advertising
Predicting keywords performance, such as number of impressions, click-through rate (CTR), conversion rate (CVR), revenue per click (RPC), and cost per click (CPC), is critical for sponsored search in the online advertising industry. An interesting phenomenon is that, despite the size of the overall data, the data are very sparse at the individual unit level. To overcome the sparsity and leverage hierarchical information across the data structure, we propose a Dynamic Hierarchical Empirical Bayesian (DHEB) model that dynamically determines the hierarchy through a data-driven process and provides shrinkage-based estimations. Our method is also equipped with an efficient empirical approach to derive inferences through the hierarchy. We evaluate the proposed method in both simulated and real-world datasets and compare to several competitive models. The results favor the proposed method among all comparisons in terms of both accuracy and efficiency. In the end, we design a two-phase system to serve prediction in real time.
2PFPCE: Two-Phase Filter Pruning Based on Conditional Entropy
Deep Convolutional Neural Networks~(CNNs) offer remarkable performance of classifications and regressions in many high-dimensional problems and have been widely utilized in real-word cognitive applications. However, high computational cost of CNNs greatly hinder their deployment in resource-constrained applications, real-time systems and edge computing platforms. To overcome this challenge, we propose a novel filter-pruning framework, two-phase filter pruning based on conditional entropy, namely \textit{2PFPCE}, to compress the CNN models and reduce the inference time with marginal performance degradation. In our proposed method, we formulate filter pruning process as an optimization problem and propose a novel filter selection criteria measured by conditional entropy. Based on the assumption that the representation of neurons shall be evenly distributed, we also develop a maximum-entropy filter freeze technique that can reduce over fitting. Two filter pruning strategies — global and layer-wise strategies, are compared. Our experiment result shows that combining these two strategies can achieve a higher neural network compression ratio than applying only one of them under the same accuracy drop threshold. Two-phase pruning, that is, combining both global and layer-wise strategies, achieves 10 X FLOPs reduction and 46% inference time reduction on VGG-16, with 2% accuracy drop.
Character-Aware Decoder for Neural Machine Translation
Standard neural machine translation (NMT) systems operate primarily on words, ignoring lower-level patterns of morphology. We present a character-aware decoder for NMT that can simultaneously work with both word-level and subword-level sequences which is designed to capture such patterns. We achieve character-awareness by augmenting both the softmax and embedding layers of an attention-based encoder-decoder network with convolutional neural networks that operate on spelling of a word (or subword). While character-aware embeddings have been successfully used in the source-side, we find that mixing character-aware embeddings with standard embeddings is crucial in the target-side. Furthermore, we show that a simple approximate softmax layer can be used for large target-side vocabularies which would otherwise require prohibitively large memory. We experiment on the TED multi-target dataset, translating English into 14 typologically diverse languages. We find that in this low-resource setting, the character-aware decoder provides consistent improvements over word-level and subword-level counterparts with BLEU score gains of up to +3.37.
Sequential Model Selection Method for Nonparametric Autoregression
In this paper for the first time the nonparametric autoregression estimation problem for the quadratic risks is considered. To this end we develop a new adaptive sequential model selection method based on the efficient sequential kernel estimators proposed by Arkoun and Pergamenshchikov (2016). Moreover, we develop a new analytical tool for general regression models to obtain the non asymptotic sharp oracle inequalities for both usual quadratic and robust quadratic risks. Then, we show that the constructed sequential model selection procedure is optimal in the sense of oracle inequalities.
Learning Optimal Fair Policies
We consider the problem of learning optimal policies from observational data in a way that satisfies certain fairness criteria. The issue of fairness arises where some covariates used in decision making are sensitive features, or are correlated with sensitive features. (Nabi and Shpitser 2018) formalized fairness in the context of regression problems as constraining the causal effects of sensitive features along certain disallowed causal pathways. The existence of these causal effects may be called retrospective unfairness in the sense of already being present in the data before analysis begins, and may be due to discriminatory practices or the biased way in which variables are defined or recorded. In the context of learning policies, what we call prospective bias, i.e., the inappropriate dependence of learned policies on sensitive features, is also possible. In this paper, we use methods from causal and semiparametric inference to learn optimal policies in a way that addresses both retrospective bias in the data, and prospective bias due to the policy. In addition, our methods appropriately address statistical bias due to model misspecification and confounding bias, which are important in the estimation of path-specific causal effects from observational data. We apply our methods to both synthetic data and real criminal justice data.
Multi-Source Domain Adaptation with Mixture of Experts
We propose a mixture-of-experts approach for unsupervised domain adaptation from multiple sources. The key idea is to explicitly capture the relationship between a target example and different source domains. This relationship, expressed by a point-to-set metric, determines how to combine predictors trained on various domains. The metric is learned in an unsupervised fashion using meta-training. Experimental results on sentiment analysis and part-of-speech tagging demonstrate that our approach consistently outperforms multiple baselines and can robustly handle negative transfer.
BubGAN: Bubble Generative Adversarial Networks for Synthesizing Realistic Bubbly Flow Images
Bubble segmentation and size detection algorithms have been developed in recent years for their high efficiency and accuracy in measuring bubbly two-phase flows. In this work, we proposed an architecture called bubble generative adversarial networks (BubGAN) for the generation of realistic synthetic images which could be further used as training or benchmarking data for the development of advanced image processing algorithms. The BubGAN is trained initially on a labeled bubble dataset consisting of ten thousand images. By learning the distribution of these bubbles, the BubGAN can generate more realistic bubbles compared to the conventional models used in the literature. The trained BubGAN is conditioned on bubble feature parameters and has full control of bubble properties in terms of aspect ratio, rotation angle, circularity and edge ratio. A million bubble dataset is pre-generated using the trained BubGAN. One can then assemble realistic bubbly flow images using this dataset and associated image processing tool. These images contain detailed bubble information, therefore do not require additional manual labeling. This is more useful compared with the conventional GAN which generates images without labeling information. The tool could be used to provide benchmarking and training data for existing image processing algorithms and to guide the future development of bubble detecting algorithms.
Approximation algorithms for stochastic clustering
We consider stochastic settings for clustering, and develop provably-good (approximation) algorithms for a number of these notions. These algorithms allow one to obtain better approximation ratios compared to the usual deterministic clustering setting. Additionally, they offer a number of advantages including providing fairer clustering and clustering which has better long-term behavior for each user. In particular, they ensure that every user is guaranteed to get good service (on average). We also complement some of these with impossibility results.
Change-Point Testing and Estimation for Risk Measures in Time Series
We investigate methods of change-point testing and confidence interval construction for nonparametric estimators of expected shortfall and related risk measures in weakly dependent time series. A key aspect of our work is the ability to detect general multiple structural changes in the tails of time series marginal distributions. Unlike extant approaches for detecting tail structural changes using quantities such as tail index, our approach does not require parametric modeling of the tail and detects more general changes in the tail. Additionally, our methods are based on the recently introduced self-normalization technique for time series, allowing for statistical analysis without the issues of consistent standard error estimation. The theoretical foundation for our methods are functional central limit theorems, which we develop under weak assumptions. An empirical study of S 500 returns and US 30-Year Treasury bonds illustrates the practical use of our methods in detecting and quantifying market instability via the tails of financial time series during times of financial crisis.
Information-Theoretic Active Learning for Content-Based Image Retrieval
We propose Information-Theoretic Active Learning (ITAL), a novel batch-mode active learning method for binary classification, and apply it for acquiring meaningful user feedback in the context of content-based image retrieval. Instead of combining different heuristics such as uncertainty, diversity, or density, our method is based on maximizing the mutual information between the predicted relevance of the images and the expected user feedback regarding the selected batch. We propose suitable approximations to this computationally demanding problem and also integrate an explicit model of user behavior that accounts for possible incorrect labels and unnameable instances. Furthermore, our approach does not only take the structure of the data but also the expected model output change caused by the user feedback into account. In contrast to other methods, ITAL turns out to be highly flexible and provides state-of-the-art performance across various datasets, such as MIRFLICKR and ImageNet.
Hierarchical Characteristic Set Merging for Optimizing SPARQL Queries in Heterogeneous RDF
Characteristic sets (CS) organize RDF triples based on the set of properties characterizing their subject nodes. This concept is recently used in indexing techniques, as it can capture the implicit schema of RDF data. While most CS-based approaches yield significant improvements in space and query performance, they fail to perform well in the presence of schema heterogeneity, i.e., when the number of CSs becomes very large, resulting in a highly partitioned data organization. In this paper, we address this problem by introducing a novel technique, for merging CSs based on their hierarchical structure. Our technique employs a lattice to capture the hierarchical relationships between CSs, identifies dense CSs and merges dense CSs with their ancestors, thus reducing the size of the CSs as well as the links between them. We implemented our algorithm on top of a relational backbone, where each merged CS is stored in a relational table, and we performed an extensive experimental study to evaluate the performance and impact of merging to the storage and querying of RDF datasets, indicating significant improvements.
FRESH: Fréchet Similarity with Hashing
Multi-Target Prediction: A Unifying View on Problems and Methods
Multi-target prediction (MTP) is concerned with the simultaneous prediction of multiple target variables of diverse type. Due to its enormous application potential, it has developed into an active and rapidly expanding research field that combines several subfields of machine learning, including multivariate regression, multi-label classification, multi-task learning, dyadic prediction, zero-shot learning, network inference, and matrix completion. In this paper, we present a unifying view on MTP problems and methods. First, we formally discuss commonalities and differences between existing MTP problems. To this end, we introduce a general framework that covers the above subfields as special cases. As a second contribution, we provide a structured overview of MTP methods. This is accomplished by identifying a number of key properties, which distinguish such methods and determine their suitability for different types of problems. Finally, we also discuss a few challenges for future research.
A proof that artificial neural networks overcome the curse of dimensionality in the numerical approximation of Black-Scholes partial differential equations
On2Vec: Embedding-based Relation Prediction for Ontology Population
Populating ontology graphs represents a long-standing problem for the Semantic Web community. Recent advances in translation-based graph embedding methods for populating instance-level knowledge graphs lead to promising new approaching for the ontology population problem. However, unlike instance-level graphs, the majority of relation facts in ontology graphs come with comprehensive semantic relations, which often include the properties of transitivity and symmetry, as well as hierarchical relations. These comprehensive relations are often too complex for existing graph embedding methods, and direct application of such methods is not feasible. Hence, we propose On2Vec, a novel translation-based graph embedding method for ontology population. On2Vec integrates two model components that effectively characterize comprehensive relation facts in ontology graphs. The first is the Component-specific Model that encodes concepts and relations into low-dimensional embedding spaces without a loss of relational properties; the second is the Hierarchy Model that performs focused learning of hierarchical relation facts. Experiments on several well-known ontology graphs demonstrate the promising capabilities of On2Vec in predicting and verifying new relation facts. These promising results also make possible significant improvements in related methods.
Deep Feature Learning of Multi-Network Topology for Node Classification
Networks are ubiquitous structure that describes complex relationships between different entities in the real world. As a critical component of prediction task over nodes in networks, learning the feature representation of nodes has become one of the most active areas recently. Network Embedding, aiming to learn non-linear and low-dimensional feature representation based on network topology, has been proved to be helpful on tasks of network analysis, especially node classification. For many real-world systems, multiple types of relations are naturally represented by multiple networks. However, existing network embedding methods mainly focus on single network embedding and neglect the information shared among different networks. In this paper, we propose a novel multiple network embedding method based on semisupervised autoencoder, named DeepMNE, which captures complex topological structures of multi-networks and takes the correlation among multi-networks into account. We evaluate DeepMNE on the task of node classification with two real-world datasets. The experimental results demonstrate the superior performance of our method over four state-of-the-art algorithms.
Deep Recurrent Survival Analysis
Survival analysis is a hotspot in statistical research for modeling time-to-event information with data censorship handling, which has been widely used in many applications such as clinical research, information system and other fields with survivorship bias. Many works have been proposed for survival analysis ranging from traditional statistic methods to machine learning models. However, the existing methodologies either utilize counting-based statistics on the segmented data, or have a pre-assumption on the event probability distribution w.r.t. time. Moreover, few works consider sequential patterns within the feature space. In this paper, we propose a Deep Recurrent Survival Analysis model which combines deep learning for conditional probability prediction at fine-grained level of the data, and survival analysis for tackling the censorship. By capturing the time dependency through modeling the conditional probability of the event for each sample, our method predicts the likelihood of the true event occurrence and estimates the survival rate over time, i.e., the probability of the non-occurrence of the event, for the censored data. Meanwhile, without assuming any specific form of the event probability distribution, our model shows great advantages over the previous works on fitting various sophisticated data distributions. In the experiments on the three real-world tasks from different fields, our model significantly outperforms the state-of-the-art solutions under various metrics.
A Primer on Causality in Data Science
Many questions in Data Science are fundamentally causal in that our objective is to learn the effect of some exposure (randomized or not) on an outcome interest. Even studies that are seemingly non-causal (e.g. prediction or prevalence estimation) have causal elements, such as differential censoring or measurement. As a result, we, as Data Scientists, need to consider the underlying causal mechanisms that gave rise to the data, rather than simply the pattern or association observed in the data. In this work, we review the ‘Causal Roadmap’, a formal framework to augment our traditional statistical analyses in an effort to answer the causal questions driving our research. Specific steps of the Roadmap include clearly stating the scientific question, defining of the causal model, translating the scientific question into a causal parameter, assessing the assumptions needed to translate the causal parameter into a statistical estimand, implementation of statistical estimators including parametric and semi-parametric methods, and interpretation of our findings. Throughout we focus on the effect of an exposure occurring at a single time point and provide extensions to more advanced settings.
Term-Mouse-Fixations as an Additional Indicator for Topical User Interests in Domain-Specific Search
Models in Interactive Information Retrieval (IIR) are grounded very much on the user’s task in order to give system support based on different task types and topics. However, the automatic recognition of user interests from log data in search systems is not trivial. Search queries entered by users a surely one such source. However, queries may be short, or users are only browsing. In this paper, we propose a method of term-mouse-fixations which takes the fixations on terms users are hovering over with the mouse into consideration to estimate topical user interests. We analyzed 22,259 search sessions of a domain-specific digital library over a period of about four months. We compared these mouse fixations to user-entered search terms and to titles and keywords from documents the user showed an interest in. These terms were found in 87.12% of all analyzed sessions; in this subset of sessions, per session on average 11.46 term-mouse-fixations from queries and viewed documents were found. These terms were fixated significantly longer with about 7 seconds than other terms with about 4.4 seconds. This means, term-mouse-fixations provide indicators for topical user interests and it is possible to extract them based on fixation time.
Multitask and Multilingual Modelling for Lexical Analysis
In Natural Language Processing (NLP), one traditionally considers a single task (e.g. part-of-speech tagging) for a single language (e.g. English) at a time. However, recent work has shown that it can be beneficial to take advantage of relatedness between tasks, as well as between languages. In this work I examine the concept of relatedness and explore how it can be utilised to build NLP models that require less manually annotated data. A large selection of NLP tasks is investigated for a substantial language sample comprising 60 languages. The results show potential for joint multitask and multilingual modelling, and hints at linguistic insights which can be gained from such models.
Convolutional Neural Network: Text Classification Model for Open Domain Question Answering System
Recently machine learning is being applied to almost every data domain one of which is Question Answering Systems (QAS). A typical Question Answering System is fairly an information retrieval system, which matches documents or text and retrieve the most accurate one. The idea of open domain question answering system put forth, involves convolutional neural network text classifiers. The Classification model presented in this paper is multi-class text classifier. The neural network classifier can be trained on large dataset. We report series of experiments conducted on Convolution Neural Network (CNN) by training it on two different datasets. Neural network model is trained on top of word embedding. Softmax layer is applied to calculate loss and mapping of semantically related words. Gathered results can help justify the fact that proposed hypothetical QAS is feasible. We further propose a method to integrate Convolutional Neural Network Classifier to an open domain question answering system. The idea of Open domain will be further explained, but the generality of it indicates to the system of domain specific trainable models, thus making it an open domain.
BiasedWalk: Biased Sampling for Representation Learning on Graphs
Network embedding algorithms are able to learn latent feature representations of nodes, transforming networks into lower dimensional vector representations. Typical key applications, which have effectively been addressed using network embeddings, include link prediction, multilabel classification and community detection. In this paper, we propose BiasedWalk, a scalable, unsupervised feature learning algorithm that is based on biased random walks to sample context information about each node in the network. Our random-walk based sampling can behave as Breath-First-Search (BFS) and Depth-First-Search (DFS) samplings with the goal to capture homophily and role equivalence between the nodes in the network. We have performed a detailed experimental evaluation comparing the performance of the proposed algorithm against various baseline methods, on several datasets and learning tasks. The experiment results show that the proposed method outperforms the baseline ones in most of the tasks and datasets.
Sparse Kernel PCA for Outlier Detection
In this paper, we propose a new method to perform Sparse Kernel Principal Component Analysis (SKPCA) and also mathematically analyze the validity of SKPCA. We formulate SKPCA as a constrained optimization problem with elastic net regularization (Hastie et al.) in kernel feature space and solve it. We consider outlier detection (where KPCA is employed) as an application for SKPCA, using the RBF kernel. We test it on 5 real-world datasets and show that by using just 4% (or even less) of the principal components (PCs), where each PC has on average less than 12% non-zero elements in the worst case among all 5 datasets, we are able to nearly match and in 3 datasets even outperform KPCA. We also compare the performance of our method with a recently proposed method for SKPCA by Wang et al. and show that our method performs better in terms of both accuracy and sparsity. We also provide a novel probabilistic proof to justify the existence of sparse solutions for KPCA using the RBF kernel. To the best of our knowledge, this is the first attempt at theoretically analyzing the validity of SKPCA.
Fairness Through Causal Awareness: Learning Latent-Variable Models for Biased Data
How do we learn from biased data? Historical datasets often reflect historical prejudices; sensitive or protected attributes may affect the observed treatments and outcomes. Classification algorithms tasked with predicting outcomes accurately from these datasets tend to replicate these biases. We advocate a causal modeling approach to learning from biased data and reframe fair classification as an intervention problem. We propose a causal model in which the sensitive attribute confounds both the treatment and the outcome. Building on prior work in deep learning and generative modeling, we describe how to learn the parameters of this causal model from observational data alone, even in the presence of unobserved confounders. We show experimentally that fairness-aware causal modeling provides better estimates of the causal effects between the sensitive attribute, the treatment, and the outcome. We further present evidence that estimating these causal effects can help us to learn policies which are both more accurate and fair, when presented with a historically biased dataset.
Differentially Private Continual Release of Graph Statistics
HyperGCN: Hypergraph Convolutional Networks for Semi-Supervised Classification
Graph-based semi-supervised learning (SSL) is an important learning problem where the goal is to assign labels to initially unlabeled nodes in a graph. Graph Convolutional Networks (GCNs) have recently been shown to be effective for graph-based SSL problems. GCNs inherently assume existence of pairwise relationships in the graph-structured data. However, in many real-world problems, relationships go beyond pairwise connections and hence are more complex. Hypergraphs provide a natural modeling tool to capture such complex relationships. In this work, we explore the use of GCNs for hypergraph-based SSL. In particular, we propose HyperGCN, an SSL method which uses a layer-wise propagation rule for convolutional neural networks operating directly on hypergraphs. To the best of our knowledge, this is the first principled adaptation of GCNs to hypergraphs. HyperGCN is able to encode both the hypergraph structure and hypernode features in an effective manner. Through detailed experimentation, we demonstrate HyperGCN’s effectiveness at hypergraph-based SSL.
Accelerating Deep Neural Networks with Spatial Bottleneck Modules
This paper presents an efficient module named spatial bottleneck for accelerating the convolutional layers in deep neural networks. The core idea is to decompose convolution into two stages, which first reduce the spatial resolution of the feature map, and then restore it to the desired size. This operation decreases the sampling density in the spatial domain, which is independent yet complementary to network acceleration approaches in the channel domain. Using different sampling rates, we can tradeoff between recognition accuracy and model complexity. As a basic building block, spatial bottleneck can be used to replace any single convolutional layer, or the combination of two convolutional layers. We empirically verify the effectiveness of spatial bottleneck by applying it to the deep residual networks. Spatial bottleneck achieves 2x and 1.4x speedup on the regular and channel-bottlenecked residual blocks, respectively, with the accuracies retained in recognizing low-resolution images, and even improved in recognizing high-resolution images.
• A Practical Design Optimization Method for Multicopter Propulsion Systems• Multi-Adversarial Domain Adaptation• Hierarchical Selective Recruitment in Linear-Threshold Brain Networks – Part II: Inter-Layer Dynamics and Top-Down Recruitment• Langevin equations in the small-mass limit: Higher order approximations• A note on concentration inequality for vector-valued martingales with weak exponential-type tails• On the Importance of Visual Context for Data Augmentation in Scene Understanding• CoverBLIP: scalable iterative matched filtering for MR Fingerprint recovery• Efficient Loop Detection in Forwarding Networks and Representing Atoms in a Field of Sets• Stochastically Controlled Stochastic Gradient for the Convex and Non-convex Composition problem• Reflected backward stochastic differentialequation with jumps and viscosity solution of second order integro-differential equation without monotonicity condition: case with the measure of Levy infinite• Balanced multi-shot EPI for accelerated Cartesian MRF: An alternative to spiral MRF• Suboptimal Control of Dividends under Exponential Utility• A self-consistent Hartree-Fock approach to Many-Body Localization• GANs beyond divergence minimization• Upcycle Your OCR: Reusing OCRs for Post-OCR Text Correction in Romanised Sanskrit• From FATS to feets: Further improvements to an astronomical feature extraction tool based on machine learning• Object Hallucination in Image Captioning• Gaussian statistics as an emergent symmetry far from equilibrium• Escaping Saddle Points in Constrained Optimization• Deep Learning for Generic Object Detection: A Survey• Turning a Blind Eye: Explicit Removal of Biases and Variation from Deep Neural Network Embeddings• Logical Rule Induction and Theory Learning Using Neural Theorem Proving• Bayesian Nonparametric Spectral Estimation• A Queueing Model for Sleep as a Vacation• Two short pieces around the Wigner problem• Assessing Gender Bias in Machine Translation — A Case Study with Google Translate• True Gaussian shaping for high count rate measurements of pulse amplitudes• An improved sum-product bound for quaternions• Well-posedness of distribution dependent SDEs with singular drifts• Time-dependent balls and bins model with positive feedback• Content-based Propagation of User Markings for Interactive Segmentation of Patterned Images• Adaptive Strategic Cyber Defense for Advanced Persistent Threats in Critical Infrastructure Networks• Obstacle Detection Quality as a Problem-Oriented Approach to Stereo Vision Algorithms Estimation in Road Situation Analysis• Deep Neural Net with Attention for Multi-channel Multi-touch Attribution• Distributed and Optimal Resilient Planning of Large-Scale Interdependent Critical Infrastructures• Automated Game Design via Conceptual Expansion• Applying Deep Learning to Derivatives Valuation• A Bandit Approach to Multiple Testing with False Discovery Control• 82 Treebanks, 34 Models: Universal Dependency Parsing with Multi-Treebank Models• Hypergames and Cyber-Physical Security for Control Systems• Fractional Calculus of Variations: a novel way to look at it• Adversarial Feature-Mapping for Speech Enhancement• Cycle-Consistent Speech Enhancement• Quantum algorithms and approximating polynomials for composed functions with shared inputs• Adversarial Domain Adaptation for Duplicate Question Detection• Representing Images in 200 Bytes: Compression via Triangulation• The Force of Proof by Which Any Argument Prevails• Logistic Regression Augmented Community Detection for Network Data with Application in Identifying Autism-Related Gene Pathways• Expanding tidy data principles to facilitate missing data exploration, visualization and assessment of imputations• Cloud-based Quadratic Optimization with Partially Homomorphic Encryption• Computation of Total Kidney Volume from CT images in Autosomal Dominant Polycystic Kidney Disease using Multi-Task 3D Convolutional Neural Networks• edge2vec: Learning Node Representation Using Edge Semantics• Learning Embeddings of Directed Networks with Text-Associated Nodes—with Applications in Software Package Dependency Networks• Optimal Relaying Beamforming in Multiple Access Broadcast Channel (MABC) Bidirectional Cognitive Radio Networks in Presence of Interferers• Local Music Event Recommendation with Long Tail Artists• Cell-aware Stacked LSTMs for Modeling Sentences• Nash Equilibrium in Smoothed Polynomial Time for Network Coordination Games• Evolutionary Centrality and Maximal Cliques in Mobile Social Networks• Dynamic Compositionality in Recursive Neural Networks with Structure-aware Tag Representations• Tensor Ring Decomposition with Rank Minimization on Latent Space: An Efficient Approach for Tensor Completion• The entropy function of an invariant measure• A Block Coordinate Ascent Algorithm for Mean-Variance Optimization• Lyapunov exponent and variance in the CLT for products of random matrices related to random Fibonacci sequences• Neurons Merging Layer: Towards Progressive Redundancy Reduction for Deep Supervised Hashing• Data Augmentation for Spoken Language Understanding via Joint Variational Generation• Unsupervised Cross-lingual Word Embedding by Multilingual Neural Language Models• Splittings and symbolic powers of square-free monomial Ideals• A weakly convergent fully inexact Douglas-Rachford method with relative error tolerance• Fast greedy algorithms for dictionary selection with generalized sparsity constraints• QoS aware Automatic Web Service Composition with Multiple objectives• Scaling Video Analytics Systems to Large Camera Deployments• Collision Free Navigation of a Multi-Robot Team for Intruder Interception• ADM for grid CRF loss in CNN segmentation• Non-parametric uncertainties in the dark matter velocity distribution• Glassy properties of Anderson localization: pinning, avalanches and chaos• Dynamical coupling between Ising and FK percolation• Predicting Lung Nodule Malignancies by Combining Deep Convolutional Neural Network and Handcrafted Features• An Anderson-Chebyshev Mixing Method for Nonlinear Optimization• Performance Degradation Assessment for Electrical Machines Based on SOM and Hybrid DHMM• Exploiting local and global performance of candidate systems for aggregation of summarization techniques• Degradation Detection Method for Railway Point Machines• New methods for calculating the degree distance and the Gutman index• Asymptotic efficiency for covariance estimation under noise and asynchronicity• Self-optimal Piecewise Linearization Based Network Power Flow Constraints in Electrical Distribution System Optimization• Infinite Curriculum Learning for Efficiently Detecting Gastric Ulcers in WCE Images• Metric dimension reduction: A snapshot of the Ribe program• Monte Carlo Tree Search with Scalable Simulation Periods for Continuously Running Tasks• A simple probabilistic deep generative model for learning generalizable disentangled representations from grouped data• Mixtures of Skewed Matrix Variate Bilinear Factor Analyzers• Improving On-policy Learning with Statistical Reward Accumulation• Relaxation schemes for mathematical programs with switching constraints• Hook, line and sinker: a bijective proof of the skew shifted hook-length formula• Latin Cubes with Forbidden Entries• Improving Neural Question Generation using Answer Separation• Detecting Potential Local Adversarial Examples for Human-Interpretable Defense• The joint spectrum• Performance Analysis of MRC under Spatially Correlated Interference using Mixture-based Method• Where Do All These Search Terms Come From? – Two Experiments in Domain-Specific Search• Data Requirements for Evaluation of Personalization of Information Retrieval – A Position Paper• Challenges for Measuring Usefulness of Interactive IR Systems with Log-based Approaches• Proceedings Ninth International Symposium on Games, Automata, Logics, and Formal Verification• Transient Chaos Generates Small Chimeras• Design Automation for Adiabatic Circuits• Studying the inertias of LCM matrices and revisiting the Bourque-Ligh conjecture• Heat kernel estimates of fractional Schrödinger operators with negative hardy potential• Arithmetic Progressions with Restricted Digits• Joint species distribution modeling with additive multivariate Gaussian process priors and heteregenous data• Dealing with the Dimensionality Curse in Dynamic Pricing Competition: Using Frequent Repricing to Compensate Imperfect Market Anticipations• Top-k Overlapping Densest Subgraphs: Approximation and Complexity• On Underlay-Aware Self-Stabilizing Overlay Networks• A Largest Empty Hypersphere Metaheuristic for Robust Optimisation with Implementation Uncertainty• Online optimization in dynamic environments: a regret analysis for sparse problems• Optimizing deep video representation to match brain activity• HC-Net: Memory-based Incremental Dual-Network System for Continual learning• Alternating chimeras in networks of ephaptically coupled bursting neurons• Posterior consistency for $n$ in the binomial $(n,p)$ problem with both parameters unknown – with applications to quantitative nanoscopy• Metamorphic Relation Based Adversarial Attacks on Differentiable Neural Computer• Posterior Consistency in the Binomial $(n,p)$ Model with Unknown $n$ and $p$: A Numerical Study• Dirichlet process mixtures under affine transformations of the data• Euclidean matchings and minimality of hyperplane arrangements• Almost sure convergence on chaoses• Meteorologists and Students: A resource for language grounding of geographical descriptors• MixUp as Locally Linear Out-Of-Manifold Regularization• Multi-level hypothesis testing for populations of heterogeneous networks• Local Coloring and its Complexity• Random intersection graphs with communities• Streaming dictionary matching with mismatches• Scalable Monte Carlo inference for state-space models• Using Sparse Semantic Embeddings Learned from Multimodal Text and Image Data to Model Human Conceptual Knowledge• Growth on Two Limiting Essential Resources in a Self-Cycling Fermentor• A Deeper Look at 3D Shape Classifiers• Generalized weak rigidity: Theory, and local and global convergence of formations• Skin lesion classification with ensemble of squeeze-and-excitation networks and semi-supervised learning• The largest cognitive systems will be optoelectronic• Tackling the Qubit Mapping Problem for NISQ-Era Quantum Devices• The edge-statistics conjecture for $\ell\ll k^{6/5}$• Pebbling on Directed Graphs with Fixed Diameter• Self-Supervised Generation of Spatial Audio for 360 Video• Learning Invariances for Policy Generalization• Logographic Subword Model for Neural Machine Translation• VOS: a Method for Variational Oversampling of Imbalanced Data• Fast, High-Fidelity, Quantum Non-demolition Readout of a Superconducting Qubit Using a Transverse Coupling• Mobility-Aware Resource Allocation in VLC Networks Using T-Step Look-Ahead Policy• Simple coarse graining and sampling strategies for image recognition• Many body localization mediated by the presence of a central qudit• Liouville metric of star-scale invariant fields: tails and Weyl scaling
Like this:
Like Loading…
Related