DAPPER: Scaling Dynamic Author Persona Topic Model to Billion Word Corpora
Extracting common narratives from multi-author dynamic text corpora requires complex models, such as the Dynamic Author Persona (DAP) topic model. However, such models can struggle to scale to large corpora, often because of challenging non-conjugate terms. To overcome such challenges, in this paper we adapt new ideas in approximate inference to the DAP model, resulting in the Dynamic Author Persona Performed Exceedingly Rapidly (DAPPER) topic model. Specifically, we develop Conjugate-Computation Variational Inference (CVI) based variational Expectation-Maximization (EM) for learning the model, yielding fast, closed-form updates for each document and replacing the iterative optimization of earlier work. Our results show significant improvements in model fit and training time without compromising the model’s temporal structure or the application of Regularized Variational Inference (RVI). We demonstrate the scalability and effectiveness of the DAPPER model by extracting health journeys from the CaringBridge corpus — a collection of 9 million journals written by 200,000 authors during health crises.
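For reference, the generic conjugate-computation update this style of inference builds on (following Khan and Lin's CVI scheme; shown in its general form, not the DAPPER-specific derivation) is a natural-gradient step for an exponential-family approximation q_t with natural parameter λ_t and mean parameter μ:

    λ_{t+1} = (1 − β_t) λ_t + β_t ∇_μ E_{q_t}[ log p(y, z) ]

where β_t is a step size. Conjugate terms of the model enter this update exactly, which is what keeps each per-document step in closed form.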
Multi-Level Sensor Fusion with Deep Learning
In the context of deep learning, this article presents an original deep network, namely CentralNet, for the fusion of information coming from different sensors. This approach is designed to efficiently and automatically balance the trade-off between early and late fusion (i.e. between the fusion of low-level vs high-level information). More specifically, at each level of abstraction (the different layers of the deep networks), unimodal representations of the data are fed to a central neural network which combines them into a common embedding. In addition, a multi-objective regularization is introduced, helping to optimize both the central network and the unimodal networks. Experiments on four multimodal datasets not only show state-of-the-art performance, but also demonstrate that CentralNet can actually choose the best possible fusion strategy for a given problem.
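As a rough PyTorch sketch of the fusion scheme described above (our illustrative reading of the abstract, not the authors' code), a central stream can combine the unimodal hidden states at each level through learnable scalar weights:

import torch
import torch.nn as nn

class CentralFusionBlock(nn.Module):
    # One fusion level: weighted sum of two unimodal states and the
    # previous central state, followed by a learned projection.
    def __init__(self, dim):
        super().__init__()
        self.alpha_a = nn.Parameter(torch.ones(1))  # weight for modality A
        self.alpha_b = nn.Parameter(torch.ones(1))  # weight for modality B
        self.alpha_c = nn.Parameter(torch.ones(1))  # weight for the central stream
        self.proj = nn.Linear(dim, dim)

    def forward(self, h_a, h_b, h_central):
        fused = self.alpha_a * h_a + self.alpha_b * h_b + self.alpha_c * h_central
        return torch.relu(self.proj(fused))

Learning the weights jointly with the unimodal networks is what lets such a model interpolate between early and late fusion.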
Strong convex relaxations and mixed-integer programming formulations for trained neural networks
We present strong convex relaxations for high-dimensional piecewise linear functions that correspond to trained neural networks. These convex relaxations can be used for a number of important tasks, such as verifying that an image classification network is robust to adversarial inputs, or providing optimality guarantees for decision problems with machine learning models embedded inside (i.e. the ‘predict, then optimize’ paradigm). Our convex relaxations arise from mixed-integer programming (MIP) formulations, and so they can be paired with existing MIP technology to produce provably optimal primal solutions, or to further strengthen the relaxations via cutting planes. We provide convex relaxations for networks with many of the most popular nonlinear operations (e.g. ReLU and max pooling) that are strictly stronger than other approaches from the literature. We corroborate this computationally on image classification verification tasks on the MNIST digit data set, where we show that our relaxations are able to match the bound improvement provided by state-of-the-art MIP solvers, in orders of magnitude less time.
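For context, the textbook big-M mixed-integer encoding of a single ReLU unit y = max(0, x), given known bounds L ≤ x ≤ U with L < 0 < U, reads

    y ≥ x,   y ≥ 0,   y ≤ x − L(1 − z),   y ≤ U z,   z ∈ {0, 1}.

Fixing z = 1 forces y = x (active unit) and fixing z = 0 forces y = 0; the relaxations developed in the paper are strictly stronger than the LP relaxations of basic formulations of this kind.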
Sufficient Dimension Reduction for Feasible and Robust Estimation of Average Causal Effect
When estimating the treatment effect in an observational study, we use a semiparametric locally efficient dimension reduction approach to assess both the treatment assignment mechanism and the average responses in both treated and nontreated groups. We then integrate all results through imputation, inverse probability weighting and doubly robust augmentation estimators. Doubly robust estimators are locally efficient while imputation estimators are super-efficient when the response models are correct. To take advantage of both procedures, we introduce a shrinkage estimator to automatically combine the two, which retains the double robustness property while improving on the variance when the response model is correct. We demonstrate the performance of these estimators through simulated experiments and a real dataset concerning the effect of maternal smoking on baby birth weight. Key words and phrases: Average Treatment Effect, Doubly Robust Estimator, Efficiency, Inverse Probability Weighting, Shrinkage Estimator.
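A minimal numpy sketch of the three estimators being combined, assuming previously fitted propensity scores e and outcome regressions m1, m0 evaluated on the sample (the variable names are ours, not the paper's notation):

import numpy as np

def imputation(m1, m0):
    # Plug-in estimate from the outcome regressions alone.
    return np.mean(m1 - m0)

def ipw(y, t, e):
    # Inverse probability weighting from the propensity model alone.
    return np.mean(t * y / e - (1 - t) * y / (1 - e))

def doubly_robust(y, t, e, m1, m0):
    # Augmented IPW: consistent if either the propensity model
    # or the outcome regressions are correctly specified.
    return np.mean(m1 - m0 + t * (y - m1) / e - (1 - t) * (y - m0) / (1 - e))

The shrinkage estimator the abstract describes would adaptively weight the doubly robust and imputation estimates; its exact weighting is not reproduced here.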
Model Extraction and Active Learning
Machine learning is being increasingly used by individuals, research institutions, and corporations. This has resulted in a surge of Machine Learning-as-a-Service (MLaaS) – cloud services that provide (a) tools and resources to learn the model, and (b) a user-friendly query interface to access the model. However, such MLaaS systems raise privacy concerns, one being model extraction: adversaries maliciously exploit the query interface to steal the model. More precisely, in a model extraction attack, a good approximation of a sensitive or proprietary model held by the server is extracted (i.e. learned) by a dishonest user who only sees the answers to select queries sent through the query interface. This attack was recently introduced by Tramer et al. at the 2016 USENIX Security Symposium, where practical attacks for different models were shown. We believe that a better understanding of the efficacy of model extraction attacks is paramount to designing better privacy-preserving MLaaS systems. To that end, we take the first step by (a) formalizing model extraction and proposing the first definition of an extraction defense, and (b) drawing parallels between model extraction and the better investigated active learning framework. In particular, we show that recent advancements in the active learning domain can be used to implement both model extraction and defenses against such attacks.
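To make the parallel concrete, here is a hypothetical sketch of extraction driven by uncertainty sampling, the classic active learning heuristic (query_oracle and pool are stand-ins for the MLaaS interface and a set of candidate inputs; the random seed queries are assumed to hit both classes):

import numpy as np
from sklearn.linear_model import LogisticRegression

def extract(query_oracle, pool, n_queries=100):
    # Seed the substitute model with a few random queries.
    idx = np.random.choice(len(pool), 10, replace=False)
    X, y = pool[idx], np.array([query_oracle(x) for x in pool[idx]])
    substitute = LogisticRegression().fit(X, y)
    for _ in range(n_queries - 10):
        # Uncertainty sampling: query the point the substitute is least sure about.
        probs = substitute.predict_proba(pool)[:, 1]
        i = int(np.argmin(np.abs(probs - 0.5)))
        X = np.vstack([X, pool[i]])
        y = np.append(y, query_oracle(pool[i]))
        substitute.fit(X, y)
    return substitute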
A Recurrent Graph Neural Network for Multi-Relational Data
The era of data deluge has sparked interest in graph-based learning methods in a number of disciplines, such as sociology, biology, neuroscience, and engineering. In this paper, we introduce a graph recurrent neural network (GRNN) for scalable semi-supervised learning from multi-relational data. Key aspects of the novel GRNN architecture are the use of multi-relational graphs, the dynamic adaptation to the different relations via learnable weights, and the consideration of graph-based regularizers to promote smoothness and alleviate over-parametrization. Our ultimate goal is to design a powerful learning architecture able to: discover complex and highly non-linear data associations, combine (and select) multiple types of relations, and scale gracefully with respect to the size of the graph. Numerical tests with real data sets corroborate the design goals and illustrate the performance gains relative to competing alternatives.
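A minimal sketch of the relation-mixing step as we read it (illustrative, not the authors' architecture): per-relation graph propagation combined through learnable relation weights.

import torch
import torch.nn as nn

class MultiRelationalLayer(nn.Module):
    def __init__(self, n_relations, dim):
        super().__init__()
        # One learnable weight per relation type, initialized uniformly.
        self.rel_weights = nn.Parameter(torch.ones(n_relations) / n_relations)
        self.lin = nn.Linear(dim, dim)

    def forward(self, adjs, h):
        # adjs: list of [n, n] adjacency matrices, one per relation type;
        # h: [n, dim] node features.
        mixed = sum(w * (a @ h) for w, a in zip(self.rel_weights, adjs))
        return torch.relu(self.lin(mixed))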
Generalization Bounds for Neural Networks: Kernels, Symmetry, and Sample Compression
Though Deep Neural Networks (DNNs) are widely celebrated for their practical performance, they demonstrate many intriguing phenomena related to depth that are difficult to explain both theoretically and intuitively. Understanding how weights in deep networks coordinate together across layers to form useful learners has proven somewhat intractable, in part because of the repeated composition of nonlinearities induced by depth. We present a reparameterization of DNNs as a linear function of a particular feature map that is locally independent of the weights. This feature map transforms depth-dependencies into simple tensor products and maps each input to a discrete subset of the feature space. Then, in analogy with logistic regression, we propose a max-margin assumption that enables us to present a so-called sample compression representation of the neural network in terms of the discrete activation state of neurons induced by s ‘support vectors’. We show how the number of support vectors relates to learning guarantees for neural networks through sample compression bounds, yielding a sample complexity of O(ns/ε) for networks with n neurons. Additionally, this number of support vectors has a monotonic dependence on width, depth, and label noise for simple networks trained on the MNIST dataset.
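For orientation, one standard form of a sample compression bound (in the Littlestone-Warmuth style; the paper's precise statement may differ): if a hypothesis h can be reconstructed from s of its m training examples and is consistent with the remainder, then with probability at least 1 − δ,

    err(h) ≤ (1 / (m − s)) ( s log m + log(1/δ) ).

A compression size that scales with the number of neurons n and support vectors s is what yields sample complexities of the O(ns/ε) form quoted above.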
QUOTA: The Quantile Option Architecture for Reinforcement Learning
In this paper, we propose the Quantile Option Architecture (QUOTA) for exploration, based on recent advances in distributional reinforcement learning (RL). In QUOTA, decision making is based on quantiles of a value distribution, not only the mean. QUOTA provides a new dimension for exploration by making use of both the optimism and the pessimism of a value distribution. We demonstrate the performance advantage of QUOTA in both challenging video games and physical robot simulators.
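A toy illustration of the idea as we read it: a high-level option picks which quantile of the learned return distribution the agent acts greedily on, so both optimistic (high) and pessimistic (low) quantiles can drive exploration.

import numpy as np

def quota_action(quantile_values, option):
    # quantile_values: [n_actions, n_quantiles] from a distributional critic;
    # option: index of the quantile chosen by the high-level option policy.
    return int(np.argmax(quantile_values[:, option]))

q = np.array([[0.1, 0.5, 0.9],
              [0.3, 0.4, 0.5]])
print(quota_action(q, option=2))  # greedy on the optimistic quantile: action 0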
Mesh-TensorFlow: Deep Learning for Supercomputers
Simple, Distributed, and Accelerated Probabilistic Programming
We describe a simple, low-level approach for embedding probabilistic programming in a deep learning ecosystem. In particular, we distill probabilistic programming down to a single abstraction—the random variable. Our lightweight implementation in TensorFlow enables numerous applications: a model-parallel variational auto-encoder (VAE) with 2nd-generation tensor processing units (TPUv2s); a data-parallel autoregressive model (Image Transformer) with TPUv2s; and multi-GPU No-U-Turn Sampler (NUTS). For both a state-of-the-art VAE on 64×64 ImageNet and Image Transformer on 256×256 CelebA-HQ, our approach achieves an optimal linear speedup from 1 to 256 TPUv2 chips. With NUTS, we see a 100x speedup on GPUs over Stan and 37x over PyMC3.
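To make the 'single abstraction' concrete, here is a toy, library-free illustration: a random variable samples on construction and composes into models. This is only a schematic of the idea; the paper's implementation lives in TensorFlow.

import numpy as np

class RandomVariable:
    # Holds a name and a sampled value; models are plain functions
    # that construct and compose these objects.
    def __init__(self, sampler, name):
        self.name = name
        self.value = sampler()

def model():
    z = RandomVariable(lambda: np.random.randn(), "z")
    x = RandomVariable(lambda: np.random.randn() + z.value, "x")
    return z, x

z, x = model()
print(z.name, z.value, x.name, x.value)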
Student’s t-Generative Adversarial Networks
Generative Adversarial Networks (GANs) have shown great performance in image generation, but they need a large amount of data to train the entire framework and often yield nonsensical results. We propose a new method based on conditional GANs, which equips the latent noise with a mixture of Student’s t-distributions and an attention mechanism, in addition to class information. The Student’s t-distribution has long tails that can provide more diversity to the latent noise. Meanwhile, the discriminator in our model performs two tasks simultaneously: judging whether the images come from the true data distribution, and identifying the class of each generated image. The parameters of the mixture model can be learned along with those of the GAN. Moreover, we mathematically prove that any multivariate Student’s t-distribution can be obtained by a linear transformation of a normal multivariate Student’s t-distribution. Experiments comparing the proposed method with the typical GAN, DeliGAN and DCGAN indicate that our method performs well at generating diverse and legible objects with limited data.
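A small sketch of sampling latent noise from a mixture of multivariate Student's t-distributions, using the standard construction of a t draw as a normal scaled by an inverse chi-square factor (the parameters here are illustrative):

import numpy as np

def sample_t_mixture(n, dim, dfs=(3.0, 5.0), weights=(0.5, 0.5)):
    comp = np.random.choice(len(dfs), size=n, p=weights)   # mixture component
    z = np.random.randn(n, dim)                            # standard normal draws
    # T = Z / sqrt(V / df) with V ~ chi-square(df) gives a t draw.
    g = np.array([np.random.chisquare(dfs[c]) / dfs[c] for c in comp])
    return z / np.sqrt(g)[:, None]

The heavy tails (small degrees of freedom) are what inject extra diversity into the latent noise.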
Extended Isolation Forest
We present an extension to the model-free anomaly detection algorithm Isolation Forest. This extension, named Extended Isolation Forest (EIF), improves the consistency and reliability of the anomaly score produced for a given data point. Using score maps, we show that the standard Isolation Forest produces inconsistent scores; the maps suffer from an artifact that results from how the branching criterion of the binary trees is selected. We propose two different approaches for improving the situation. First, we propose transforming the data randomly before the creation of each tree, which averages out the bias introduced in the algorithm. The second, preferred approach is to allow the data slicing to use hyperplanes with random slopes, which results in improved score maps. We show that the consistency and reliability of the algorithm are much improved by this method, by looking at the variance of scores of data points distributed along constant-score lines. We find no appreciable difference in the rate of convergence or in the computation time between the standard Isolation Forest and EIF, which highlights EIF's potential as an anomaly detection algorithm.
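The key branching change can be sketched in a few lines: instead of an axis-parallel threshold on one randomly chosen feature, EIF slices with a hyperplane whose slope is random (simplified from the paper's description):

import numpy as np

def random_hyperplane_split(X):
    # Random direction (slope) and a random intercept within the data's range.
    normal = np.random.randn(X.shape[1])
    proj = X @ normal
    p = np.random.uniform(proj.min(), proj.max())
    left = proj < p
    return X[left], X[~left]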
Adaptive Stress Testing: Finding Failure Events with Reinforcement Learning
Finding the most likely path to a set of failure states is important to the analysis of safety-critical dynamic systems. While efficient solutions exist for certain classes of systems, a scalable general solution for stochastic, partially-observable, and continuous-valued systems remains challenging. Existing approaches in formal and simulation-based methods either cannot scale to large systems or are computationally inefficient. This paper presents adaptive stress testing (AST), a framework for searching a simulator for the most likely path to a failure event. We formulate the problem as a Markov decision process and use reinforcement learning to optimize it. The approach is simulation-based and does not require internal knowledge of the system, which makes it well suited to black-box testing of large systems. We present formulations both for systems where the state is fully observable and for systems where it is partially observable. In the latter case, we present a modified Monte Carlo tree search algorithm that only requires access to the pseudorandom number generator of the simulator to overcome partial observability. We also present an extension of the framework, called differential adaptive stress testing (DAST), that can be used to find failures that occur in one system but not in another. This type of differential analysis is useful in applications such as regression testing, where one is concerned with finding areas of relative weakness compared to a baseline. We demonstrate the effectiveness of the approach on an aircraft collision avoidance application, where we stress test a prototype aircraft collision avoidance system to find high-probability scenarios of near mid-air collisions.
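Schematically, one AST rollout can be read as follows (the simulator interface and the reward shaping here are illustrative, not the paper's exact formulation):

def rollout(simulator, policy, horizon=100):
    # The RL policy chooses disturbance actions; reward favors likely failures.
    s = simulator.reset()
    total_log_prob = 0.0
    for _ in range(horizon):
        a = policy(s)
        s, log_prob, is_failure = simulator.step(a)  # log-likelihood of the disturbance
        total_log_prob += log_prob
        if is_failure:
            return total_log_prob          # reward: likelihood of this failure path
    return total_log_prob - 1e4            # heavy penalty when no failure is reached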
Credit Card Fraud Detection in e-Commerce: An Outlier Detection Approach
Often the challenge associated with tasks like fraud and spam detection is the lack of all likely patterns needed to train suitable supervised learning models. This problem is accentuated when the fraudulent patterns are not only scarce but also change over time, because fraudsters continue to innovate novel ways to circumvent the measures put in place to prevent fraud. Limited data and continuously changing patterns make learning significantly difficult. We hypothesize that good behavior does not change with time and that data points representing good behavior have a consistent spatial signature under different groupings. Based on this hypothesis, we propose an approach that detects outliers in large data sets by assigning a consistency score to each data point using an ensemble of clustering methods. Our main contribution is a novel method that can detect outliers in large datasets and is robust to changing patterns. We also argue that the area under the ROC curve, although a commonly used metric to evaluate outlier detection methods, is not the right metric here. Since outlier detection problems have a skewed distribution of classes, precision-recall curves are better suited: precision compares false positives to true positives (outliers) rather than true negatives (inliers), and is therefore not affected by the class imbalance. We show empirically that the area under the precision-recall curve is a better evaluation metric than the area under the ROC curve. The proposed approach is tested on the modified version of the Landsat satellite dataset, the modified version of the ann-thyroid dataset, and a large real-world credit card fraud detection dataset available through Kaggle, where we show significant improvement over the baseline methods.
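A minimal sketch of a consistency score built from an ensemble of clusterings, under our reading that inliers should repeatedly land in large, stable clusters across random groupings:

import numpy as np
from sklearn.cluster import KMeans

def consistency_scores(X, n_runs=10):
    scores = np.zeros(len(X))
    for seed in range(n_runs):
        k = np.random.RandomState(seed).randint(2, 10)   # random cluster count
        labels = KMeans(n_clusters=k, n_init=5, random_state=seed).fit_predict(X)
        sizes = np.bincount(labels)
        # Points falling in small clusters look less consistent (more outlier-like).
        scores += sizes[labels] / len(X)
    return scores / n_runs   # low average score = likely outlier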
Collaborative Filtering with Stability
Collaborative filtering (CF) is a popular technique in today’s recommender systems, and matrix approximation-based CF methods have achieved great success in both rating prediction and top-N recommendation tasks. However, real-world user-item rating matrices are typically sparse, incomplete and noisy, which introduce challenges to the algorithm stability of matrix approximation, i.e., small changes in the training data may significantly change the models. As a result, existing matrix approximation solutions yield low generalization performance, exhibiting high error variance on the training data, and minimizing the training error may not guarantee error reduction on the test data. This paper investigates the algorithm stability problem of matrix approximation methods and how to achieve stable collaborative filtering via stable matrix approximation. We present a new algorithm design framework, which (1) introduces new optimization objectives to guide stable matrix approximation algorithm design, and (2) solves the optimization problem to obtain stable approximation solutions with good generalization performance. Experimental results on real-world datasets demonstrate that the proposed method can achieve better accuracy compared with state-of-the-art matrix approximation methods and ensemble methods in both rating prediction and top-N recommendation tasks.
DSNet: Deep and Shallow Feature Learning for Efficient Visual Tracking
In recent years, Discriminative Correlation Filter (DCF) based tracking methods have achieved great success in visual tracking. However, multi-resolution convolutional feature maps trained on other tasks, such as image classification, cannot be naturally used in the conventional DCF formulation. Furthermore, these high-dimensional feature maps significantly increase the tracking complexity and thus limit the tracking speed. In this paper, we present a deep and shallow feature learning network, namely DSNet, to learn the multi-level same-resolution compressed (MSC) features for efficient online tracking, in an end-to-end offline manner. Specifically, the proposed DSNet compresses multi-level convolutional features to uniform spatial resolution features. The learned MSC features effectively encode both appearance and semantic information of objects in the same-resolution feature maps, thus enabling an elegant combination of the MSC features with any DCF-based methods. Additionally, a channel reliability measurement (CRM) method is presented to further refine the learned MSC features. We demonstrate the effectiveness of the MSC features learned from the proposed DSNet on two DCF tracking frameworks: the basic DCF framework and the continuous convolution operator framework. Extensive experiments show that the learned MSC features have the appealing advantage of allowing the equipped DCF-based tracking methods to perform favorably against the state-of-the-art methods while running at high frame rates.
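A sketch of producing same-resolution compressed features from several depths (our illustrative reading of the MSC idea, with made-up channel counts): reduce each level's channels and resample to one spatial size so a DCF can consume them jointly.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MSCHead(nn.Module):
    def __init__(self, in_channels=(64, 256, 512), out_c=32, size=56):
        super().__init__()
        # 1x1 convolutions compress each level to the same channel count.
        self.reduce = nn.ModuleList(nn.Conv2d(c, out_c, 1) for c in in_channels)
        self.size = (size, size)

    def forward(self, feats):
        # feats: list of feature maps from different network depths.
        outs = [F.interpolate(r(f), size=self.size, mode='bilinear',
                              align_corners=False)
                for r, f in zip(self.reduce, feats)]
        return torch.cat(outs, dim=1)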
Hybrid Approach to Automation, RPA and Machine Learning: a Method for the Human-centered Design of Software Robots
One of the more prominent trends within Industry 4.0 is the drive to employ Robotic Process Automation (RPA), especially as one of the elements of the Lean approach. The full implementation of RPA is riddled with challenges relating both to the reality of everyday business operations, from SMEs to SSCs and beyond, and to the social effects of the changing job market. To successfully address these points there is a need for a solution that adjusts to existing business operations while lowering the negative social impact of the automation process. To achieve these goals we propose a hybrid, human-centered approach to the development of software robots. This design and implementation method combines the Living Lab approach with empowerment through participatory design to kick-start the co-development and co-maintenance of hybrid software robots. Supported by a variety of AI methods and tools, including interactive and collaborative ML in the cloud, these robots transform menial job posts into higher-skilled positions, allowing former employees to stay on as robot co-designers and maintainers, i.e. as co-programmers who supervise the machine learning processes using tailored high-level RPA Domain Specific Languages (DSLs) to adjust the functioning of the robots and maintain operational flexibility.
Day-ahead time series forecasting: application to capacity planning
In the context of capacity planning, forecasting the evolution of IT server usage enables companies to better manage their computational resources. We address this problem by collecting key indicator time series and propose to forecast their evolution a day ahead. Our method assumes that the data is structured by a daily seasonality, but also that indicators follow typical patterns of evolution within a day. It then uses the combination of a clustering algorithm and Markov models to produce day-ahead forecasts. Our experiments on real datasets show that the data satisfies our assumption and that, in the case study, our method outperforms classical approaches (AR, Holt-Winters).
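The scheme can be sketched directly (a minimal version under our reading: cluster daily profiles, then fit a first-order Markov chain over the cluster labels):

import numpy as np
from sklearn.cluster import KMeans

def fit_day_ahead(daily_profiles, k=4):
    # daily_profiles: [n_days, n_samples_per_day] array of indicator values.
    km = KMeans(n_clusters=k, n_init=10).fit(daily_profiles)
    trans = np.ones((k, k))                      # Laplace-smoothed transitions
    for a, b in zip(km.labels_[:-1], km.labels_[1:]):
        trans[a, b] += 1
    return km, trans / trans.sum(axis=1, keepdims=True)

def forecast_tomorrow(km, trans, today_profile):
    c = km.predict(today_profile[None, :])[0]    # today's regime
    nxt = int(np.argmax(trans[c]))               # most likely next regime
    return km.cluster_centers_[nxt]              # its typical daily profile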
Comparison of Discrete Choice Models and Artificial Neural Networks in Presence of Missing Variables
Classification, the process of assigning a label (or class) to an observation given its features, is a common task in many applications. Nonetheless, in most real-life applications, the labels cannot be fully explained by the observed features. Indeed, there can be many factors hidden from the modellers. The unexplained variation is then treated as some random noise, which is handled differently depending on the method retained by the practitioner. This work focuses on two simple and widely used supervised classification algorithms, discrete choice models and artificial neural networks, in the context of binary classification. Through various numerical experiments involving continuous or discrete explanatory features, we present a comparison of the retained methods’ performance in the presence of missing variables. The impact of the distribution of the two classes in the training data is also investigated. The outcomes of those experiments highlight the fact that artificial neural networks outperform the discrete choice models, except when the distribution of the classes in the training data is highly unbalanced. Finally, this work provides some guidelines for choosing the right classifier with respect to the training data.
A Description Logic Framework for Commonsense Conceptual Combination Integrating Typicality, Probabilities and Cognitive Heuristics
We propose a nonmonotonic Description Logic of typicality able to account for the phenomenon of concept combination of prototypical concepts. The proposed logic relies on the logic of typicality ALC TR, whose semantics is based on the notion of rational closure, as well as on the distributed semantics of probabilistic Description Logics, and is equipped with a cognitive heuristic used by humans for concept composition. We first extend the logic of typicality ALC TR with typicality inclusions whose intuitive meaning is that there is probability p that typical Cs are Ds. As in the distributed semantics, we define different scenarios containing only some typicality inclusions, each one having a suitable probability. We then focus on those scenarios whose probabilities belong to a given and fixed range, and we exploit such scenarios in order to ascribe typical properties to a concept C obtained as the combination of two prototypical concepts. We also show that reasoning in the proposed Description Logic is EXPTIME-complete, as it is for the underlying logic ALC.
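Illustratively (our notation, not necessarily the paper's), a probabilistic typicality inclusion can be written as

    p :: T(C) ⊑ D,   e.g.   0.8 :: T(Bird) ⊑ Fly,

read as "there is probability 0.8 that typical birds fly"; a scenario then keeps or drops each such inclusion independently according to its probability.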
Synaptic Strength For Convolutional Neural Network
Convolutional Neural Networks (CNNs) are both computation and memory intensive, which hinders their deployment on mobile devices. Inspired by the relevant concept in the neuroscience literature, we propose Synaptic Pruning: a data-driven method to prune connections between input and output feature maps with a newly proposed class of parameters called Synaptic Strength. Synaptic Strength is designed to capture the importance of a connection based on the amount of information it transports. Experimental results show the effectiveness of our approach. On CIFAR-10, we prune up to 96% of the connections for various CNN models, which results in significant size reduction and computation saving. Further evaluation on ImageNet demonstrates that synaptic pruning is able to discover efficient models which are competitive with state-of-the-art compact CNNs such as MobileNet-V2 and NasNet-Mobile. Our contribution is summarized as follows: (1) We introduce Synaptic Strength, a new class of parameters for CNNs, to indicate the importance of each connection. (2) Our approach can prune various CNNs with high compression without compromising accuracy. (3) Further investigation shows that the proposed Synaptic Strength is a better indicator for kernel pruning than the previous approach, in both empirical results and theoretical analysis.
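A sketch of the mechanism as we read it, with Synaptic Strength modeled as one learnable scale per input-output feature-map connection (illustrative, not the authors' implementation):

import torch
import torch.nn as nn
import torch.nn.functional as F

class SynapticConv(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, 3, padding=1)
        self.strength = nn.Parameter(torch.ones(c_out, c_in))  # one per connection

    def forward(self, x):
        # Scale each connection's kernel by its strength before convolving.
        w = self.conv.weight * self.strength[:, :, None, None]
        return F.conv2d(x, w, self.conv.bias, padding=1)

    def prune(self, keep_ratio=0.04):
        # Zero out the connections with the smallest |strength|.
        thresh = torch.quantile(self.strength.abs(), 1 - keep_ratio)
        with torch.no_grad():
            self.strength[self.strength.abs() < thresh] = 0.0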
A Novel Variational Family for Hidden Nonlinear Markov Models
Latent variable models have been widely applied for the analysis and visualization of large datasets. In the case of sequential data, closed-form inference is possible when the transition and observation functions are linear. However, approximate inference techniques are usually necessary when dealing with nonlinear dynamics and observation functions. Here, we propose a novel variational inference framework for the explicit modeling of time series, Variational Inference for Nonlinear Dynamics (VIND), that is able to uncover nonlinear observation and transition functions from sequential data. The framework includes a structured approximate posterior, and an algorithm that relies on the fixed-point iteration method to find the best estimate for latent trajectories. We apply the method to several datasets and show that it is able to accurately infer the underlying dynamics of these systems, in some cases substantially outperforming state-of-the-art methods.
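The numerical core mentioned above is a plain fixed-point iteration; in the abstract's setting, the update would refine the latent trajectory estimate (the update function below is a generic stand-in):

import numpy as np

def fixed_point(update, z0, tol=1e-8, max_iter=500):
    z = z0
    for _ in range(max_iter):
        z_new = update(z)
        if np.linalg.norm(z_new - z) < tol:
            break
        z = z_new
    return z

print(fixed_point(lambda z: np.cos(z), np.array([1.0])))  # converges near 0.739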
Concept Learning with Energy-Based Models
Many hallmarks of human intelligence, such as generalizing from limited experience, abstract reasoning and planning, analogical reasoning, creative problem solving, and capacity for language, require the ability to consolidate experience into concepts, which act as basic building blocks of understanding and reasoning. We present a framework that defines a concept by an energy function over events in the environment, as well as an attention mask over entities participating in the event. Given a few demonstration events, our method uses an inference-time optimization procedure to generate events involving similar concepts or to identify the entities involved in a concept. We evaluate our framework on learning visual, quantitative, relational, and temporal concepts from demonstration events in an unsupervised manner. Our approach is able to successfully generate and identify concepts in a few-shot setting, and the resulting learned concepts can be reused across environments. Example videos of our results are available at sites.google.com/site/energyconceptmodels
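The inference-time optimization can be illustrated in a few lines: to generate an event satisfying a concept, run gradient descent on the concept's energy (the energy function below is a hand-written toy, not a learned one):

import torch

def generate(energy, x0, steps=100, lr=0.1):
    x = x0.clone().requires_grad_(True)
    opt = torch.optim.SGD([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        energy(x).backward()   # descend the concept's energy surface
        opt.step()
    return x.detach()

# Toy concept: "two entities are close together".
energy = lambda x: ((x[0] - x[1]) ** 2).sum()
print(generate(energy, torch.randn(2, 2)))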
Computing Entity Semantic Similarity by Features Ranking
This article presents a novel approach to estimate semantic entity similarity using entity features available as Linked Data. The key idea is to exploit ranked lists of features, extracted from Linked Data sources, as a representation of the entities to be compared. The similarity between two entities is then estimated by comparing their ranked lists of features. The article describes experiments with museum data from DBpedia, with datasets from a LOD catalog, and with computer science conferences from the DBLP repository. The experiments demonstrate that entity similarity, computed using ranked lists of features, achieves better accuracy than state-of-the-art measures.
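A small sketch of comparing two entities through ranked feature lists, using a rank-discounted overlap (an assumption for illustration; the paper may use a different ranked-list measure):

def ranked_similarity(feats_a, feats_b):
    rank_a = {f: i for i, f in enumerate(feats_a)}
    rank_b = {f: i for i, f in enumerate(feats_b)}
    shared = set(rank_a) & set(rank_b)
    # Features ranked highly in both lists contribute more.
    score = sum(1.0 / (2 + rank_a[f] + rank_b[f]) for f in shared)
    norm = sum(1.0 / (2 + 2 * i) for i in range(min(len(feats_a), len(feats_b))))
    return score / norm if norm else 0.0

print(ranked_similarity(["museum", "paris", "art"], ["art", "museum", "rome"]))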
Double Adaptive Stochastic Gradient Optimization
Adaptive moment methods have been remarkably successful in deep learning optimization, particularly in the presence of noisy and/or sparse gradients. We further the advantages of adaptive moment techniques by proposing a family of double adaptive stochastic gradient methods, DASGrad. They leverage the complementary ideas of the adaptive moment algorithms widely used by the deep learning community and recent advances in adaptive probabilistic algorithms. We analyze the theoretical convergence improvements of our approach in a stochastic convex optimization setting, and provide empirical validation of our findings with convex and non-convex objectives. We observe that the benefits of DASGrad increase with the model complexity and the variability of the gradients, and we explore the resulting utility in extensions of distribution-matching multitask learning.
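A toy reading of the "double adaptivity": Adam-style moments combined with importance sampling whose probabilities adapt to recent gradient magnitudes (entirely illustrative; not the paper's algorithm or its guarantees):

import numpy as np

def dasgrad_like(grad_fn, w, n, steps=1000, lr=0.01, beta1=0.9, beta2=0.999):
    m, v = np.zeros_like(w), np.zeros_like(w)
    p = np.full(n, 1.0 / n)                 # sampling probabilities over examples
    for t in range(1, steps + 1):
        i = np.random.choice(n, p=p)
        g = grad_fn(w, i) / (n * p[i])      # unbiased importance-weighted gradient
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        w = w - lr * (m / (1 - beta1 ** t)) / (np.sqrt(v / (1 - beta2 ** t)) + 1e-8)
        # Shift sampling mass toward examples with large recent gradients.
        p[i] = 0.9 * p[i] + 0.1 * (np.linalg.norm(g) + 1e-3)
        p = p / p.sum()
    return w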
A Model for General Intelligence
The overarching problem in artificial intelligence (AI) is that we do not understand the intelligence process well enough to enable the development of adequate computational models. Much work has been done in AI over the years at lower levels, but a big part of what has been missing involves the high level, abstract, general nature of intelligence. We address this gap by developing a model for general intelligence. To accomplish this, we focus on three basic aspects of intelligence. First, we must realize the general order and nature of intelligence at a high level. Second, we must come to know what these realizations mean with respect to the overall intelligence process. Third, we must describe these realizations as clearly as possible. We propose a hierarchical model to help capture and exploit the order within intelligence. The underlying order involves patterns of signals that become organized, stored and activated in space and time. These patterns can be described using a simple, general hierarchy, with physical signals at the lowest level, information in the middle, and abstract signal representations at the top. This high level perspective provides a big picture that literally helps us see the intelligence process, thereby enabling fundamental realizations, a better understanding and clear descriptions of the intelligence process. The resulting model can be used to support all kinds of information processing across multiple levels of abstraction. As computer technology improves, and as cooperation increases between humans and computers, people will become more efficient and more productive in performing their information processing tasks.
Language GANs Falling Short
Generating high-quality text with sufficient diversity is essential for a wide range of Natural Language Generation (NLG) tasks. Maximum-Likelihood (MLE) models trained with teacher forcing have consistently been reported as weak baselines, where poor performance is attributed to exposure bias: at inference time, the model is fed its own prediction instead of a ground-truth token, which can lead to accumulating errors and poor samples. This line of reasoning has led to an outbreak of adversarial approaches to NLG, on the grounds that GANs do not suffer from exposure bias. In this work, we make several surprising observations which contradict common beliefs. We first revisit the canonical evaluation framework for NLG and point out fundamental flaws with quality-only evaluation: we show that one can outperform such metrics using a simple, well-known temperature parameter to artificially reduce the entropy of the model’s conditional distributions. Second, we leverage the control over the quality/diversity trade-off given by this parameter to evaluate models over the whole quality-diversity spectrum, and find that MLE models consistently outperform the proposed GAN variants over the whole quality-diversity space. Our results have several implications: 1) the impact of exposure bias on sample quality is less severe than previously thought, and 2) temperature tuning provides a better quality/diversity trade-off than adversarial training, while being easier to train, easier to cross-validate, and less computationally expensive.
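The temperature knob at the center of the argument fits in a few lines: dividing the logits by t < 1 sharpens the conditional distribution (higher quality, lower diversity), while t > 1 flattens it.

import numpy as np

def sample_with_temperature(logits, t):
    z = logits / t
    p = np.exp(z - z.max())                # numerically stable softmax
    p = p / p.sum()
    return np.random.choice(len(p), p=p)

logits = np.array([2.0, 1.0, 0.5])
print([sample_with_temperature(logits, 0.5) for _ in range(5)])  # low entropy
print([sample_with_temperature(logits, 2.0) for _ in range(5)])  # high entropy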
• Convolutional LSTMs for Cloud-Robust Segmentation of Remote Sensing Imagery
• Recent advances in methodology for clinical trials in small populations: the InSPiRe project
• The Hausdorff dimension function of the family of conformal iterated function systems of generalized complex continued fractions
• Design of non-uniformly spaced phase-stepped algorithms using their frequency transfer function
• Optical Wireless Cochlear Implants
• List Coloring a Cartesian Product with a Complete Bipartite Factor
• Low-Rank Tensor Modeling for Hyperspectral Unmixing Accounting for Spectral Variability
• Mobile Edge Cloud: Opportunities and Challenges
• Variational Bayes Inference in Digital Receivers
• Quantifying Uncertainty in High Dimensional Inverse Problems by Convex Optimisation
• A Unified Adaptive Tensor Approximation Scheme to Accelerate Composite Convex Optimization
• Superregular grammars do not provide additional explanatory power but allow for a compact analysis of animal song
• A personal model of trumpery: Deception detection in a real-world high-stakes setting
• Dynamic Programming Deconstructed
• Trainable Adaptive Window Switching for Speech Enhancement
• ADMM for ND Line Spectral Estimation using Grid-Free Compressive Sensing from Multiple Measurements with Applications to DOA Estimation
• A practical method for the consistent identification of a module in a dynamical network
• Blameworthiness in Games with Imperfect Information
• Chaotic Quantum Double Delta Swarm Algorithm using Chebyshev Maps: Theoretical Foundations, Performance Analyses and Convergence Issues
• A Differential Volumetric Approach to Multi-View Photometric Stereo
• Compact Personalized Models for Neural Machine Translation
• On the neighborliness of dual flow polytopes of quivers
• The Sparsest Additive Spanner via Multiple Weighted BFS Trees
• Continuously Differentiable Analytical Models for Implicit Control within Power Flow
• Hardware Distortion Correlation Has Negligible Impact on UL Massive MIMO Spectral Efficiency
• Towards a Unified Theory of Sparsification for Matching Problems
• A Unified Perspective of Evolutionary Game Dynamics Using Generalized Growth Transforms
• SkyLogic – A proposal for a skyrmion logic device
• A General Theory of Equivariant CNNs on Homogeneous Spaces
• Expected Chromatic Number of Random Subgraphs
• Using GitHub Classroom To Teach Statistics
• Limits of Ordered Graphs and Images
• The free-fermion eight-vertex model: couplings, bipartite dimers and Z-invariance
• Physics-Informed Generative Adversarial Networks for Stochastic Differential Equations
• Random walks generated by Ewens distribution on the symmetric group
• An improved exact algorithm and an NP-completeness proof for sparse matrix bipartitioning
• Non-Local Compressive Sensing Based SAR Tomography
• Leveraging Weakly Supervised Data to Improve End-to-End Speech-to-Text Translation
• Managing engineering systems with large state and action spaces through deep reinforcement learning
• Throughput-based Design for Polar Coded-Modulation
• The Marchex 2018 English Conversational Telephone Speech Recognition System
• STAR: Scaling Transactions through Asymmetrical Replication
• End-to-End Monaural Multi-speaker ASR System without Pretraining
• When CTC Training Meets Acoustic Landmarks
• How to Improve Your Speaker Embeddings Extractor in Generic Toolkits
• False Analog Data Injection Attack Towards Topology Errors: Formulation and Feasibility Analysis
• On the asymptotics of Maronna’s robust PCA
• Blind Two-Dimensional Super-Resolution and Its Performance Guarantee
• Scale-free Networks Well Done
• Leveraging Virtual and Real Person for Unsupervised Person Re-identification
• Improving Span-based Question Answering Systems with Coarsely Labeled Data
• Optimal Succinct Rank Data Structure via Approximate Nonnegative Tensor Decomposition
• Motif and Hypergraph Correlation Clustering
• Classification of 12-Lead ECG Signals with Bi-directional LSTM Network
• Kernel Machines Beat Deep Neural Networks on Mask-based Single-channel Speech Enhancement
• Scale calibration for high-dimensional robust regression
• Software Defined Radio Implementation of Carrier and Timing Synchronization for Distributed Arrays
• Image-Based Reconstruction for a 3D-PFHS Heat Transfer Problem by ReConNN
• Randomization Tests for Equality in Dependence Structure
• The impact of air transport availability on research collaboration: A case study of four universities
• On the role of neurogenesis in overcoming catastrophic forgetting
• DeepConv-DTI: Prediction of drug-target interactions via deep learning with convolution on protein sequences
• Modeling and Predicting Popularity Dynamics via Deep Learning Attention Mechanism
• Robust and fine-grained prosody control of end-to-end speech synthesis
• Bootstrapping single-channel source separation via unsupervised spatial clustering on stereo mixtures
• Transfer learning of language-independent end-to-end ASR with language model fusion
• Distributed UAV Placement Optimization for Cooperative Line-of-Sight MIMO Communications
• Properties of Norms on Sets
• Erasure coding for distributed matrix multiplication for matrices with bounded entries
• TrafficPredict: Trajectory Prediction for Heterogeneous Traffic-Agents
• A New Analysis for Support Recovery with Block Orthogonal Matching Pursuit
• Solution Refinement at Regular Points of Conic Problems
• How Many Pairwise Preferences Do We Need to Rank A Graph Consistently?
• DIAG-NRE: A Deep Pattern Diagnosis Framework for Distant Supervision Neural Relation Extraction
• Knuth’s Moves on Timed Words
• Comments Regarding ‘On the Identifiability of the Influence Model for Stochastic Spatiotemporal Spread Processes’
• Neural Phrase-to-Phrase Machine Translation
• The entropy of lies: playing twenty questions with a liar
• Fast OBDD Reordering using Neural Message Passing on Hypergraph
• Unpaired Speech Enhancement by Acoustic and Adversarial Supervision for Speech Recognition
• A Dynamic Regret Analysis and Adaptive Regularization Algorithm for On-Policy Robot Imitation Learning
• Neural Network-Hardware Co-design for Scalable RRAM-based BNN Accelerators
• BLP – Boundary Likelihood Pinpointing Networks for Accurate Temporal Action Localization
• 3DCapsule: Extending the Capsule Architecture to Classify 3D Point Clouds
• In-the-wild Facial Expression Recognition in Extreme Poses
• Optimal singular value shrinkage with noise homogenization
• On-the-fly Large-scale Channel-Gain Estimation for Massive Antenna-Array Base Stations
• On-the-fly Uplink Training and Pilot Code Sequence Design for Cellular Networks
• Cuffless Blood Pressure Estimation from Electrocardiogram and Photoplethysmogram Using Waveform Based ANN-LSTM Network
• An Optimal Itinerary Generation in a Configuration Space of Large Intellectual Agent Groups with Linear Logic
• Real-Time Prediction for Fine-Grained Air Quality Monitoring System with Asynchronous Sensing
• contextual: Evaluating Contextual Multi-Armed Bandit Problems in R
• Direct families of polytopes with nontrivial Massey products
• A Quasi-Newton algorithm on the orthogonal manifold for NMF with transform learning
• The heavy range of randomly biased walks on trees
• Kernel Exponential Family Estimation via Doubly Dual Embedding
• CIS at TAC Cold Start 2015: Neural Networks and Coreference Resolution for Slot Filling
• Weakly Supervised Scene Parsing with Point-based Distance Metric Learning
• Semantic bottleneck for computer vision tasks
• Limits of Order Types
• SparseFool: a few pixels make a big difference
• On the Resource Consumption of M2M Random Access: Efficiency and Pareto Optimality
• On partition identities of Capparelli and Primc
• Vine copula based post-processing of ensemble forecasts for temperature
• A sharp inequality for Kendall’s τ and Spearman’s ρ of Extreme-Value Copulas
• A Lattice Isomorphism Theorem for Cluster Groups of Mutation-Dynkin Type A_n
• Characterizations and Directed Path-Width of Sequence Digraphs
• A Novel Square Wave Generator Based on the Translinear Circuit Scheme of Second Generation Current Controlled Current Conveyor-CCCII
• Some Remarks on the Dirichlet Problem on Infinite Trees
• A Parallel MOEA with Criterion-based Selection Applied to the Knapsack Problem
• Off-the-Shelf Unsupervised NMT
• Revealing Fine Structures of the Retinal Receptive Field by Deep Learning Networks
• Infrared and visible image fusion using a novel deep decomposition method
• A Backstepping control strategy for constrained tendon driven robotic finger
• Modular Materialisation of Datalog Programs
• Toward Driving Scene Understanding: A Dataset for Learning Driver Behavior and Causal Reasoning
• Fast Adaptive Bilateral Filtering
• An Enhanced Multi-Objective Biogeography-Based Optimization Algorithm for Automatic Detection of Overlapping Communities in a Social Network with Node Attributes
• Kernel Regression for Graph Signal Prediction in Presence of Sparse Noise
• Stacked Penalized Logistic Regression for Selecting Views in Multi-View Learning
• Statistical model of the human RF exposure in Small cells environment
• Recurrent Skipping Networks for Entity Alignment
• Fast Hyperparameter Optimization of Deep Neural Networks via Ensembling Multiple Surrogates
• Hierarchical Neural Network Architecture In Keyword Spotting
• Elastic CoCoA: Scaling In to Improve Convergence
• Super-Identity Convolutional Neural Network for Face Hallucination
• The Eternal Game Chromatic Number of a Graph
• Learning to Embed Sentences Using Attentive Recursive Trees
• Large deviation theory of percolation on multiplex networks
• New degenerated polynomials arising from non-classical Umbral Calculus
• Scanning integer points with lex-cuts: A finite cutting plane algorithm for integer programming with linear objective
• An Incentive Analysis of some Bitcoin Fee Design
• Code-switching Sentence Generation by Generative Adversarial Networks and its Application to Data Augmentation
• Object 3D Reconstruction based on Photometric Stereo and Inverted Rendering
• Reinforcement learning-based waveform optimization for MIMO multi-target detection
• Micro-Attention for Micro-Expression recognition
• Kalman Filter Modifier for Neural Networks in Non-stationary Environments
• Fast High-Dimensional Bilateral and Nonlocal Means Filtering
• Effective Subword Segmentation for Text Comprehension
• Identificação automática de pichação a partir de imagens urbanas (Automatic graffiti identification from urban images)
• Sets of autoencoders with shared latent spaces
• Robust Bhattacharyya bound linear discriminant analysis through adaptive algorithm
• Fine-grained Apparel Classification and Retrieval without rich annotations
• Local-Encoding-Preserving Secure Network Coding—Part I: Fixed Security Level
• Local-Encoding-Preserving Secure Network Coding—Part II: Flexible Rate and Security Level
• DeepChannel: Salience Estimation by Contrastive Learning for Extractive Document Summarization
• A ‘Little Bit’ Too Much? High Speed Imaging from Sparse Photon Counts
• Realisation of Highly Precise and Low Power Tunable Voltage Amplifier Based on the Translinear Circuit Scheme of CCCII+
• Architecture of Distributed Data Storage for Astroparticle Physics
• The CDE property for skew vexillary permutations
• Copula-based robust optimal block designs
• Universality for persistence exponents of local times of self-similar processes with stationary increments
• On the Turnpike Property and the Receding-Horizon Method for Linear-Quadratic Optimal Control Problems
• The Effect of the Terminal Penalty in Receding Horizon Control for a Class of Stabilization Problems
• WordNet-feelings: A linguistic categorisation of human feelings
• Meta Distribution of Downlink Non-Orthogonal Multiple Access (NOMA) in Poisson Networks
• Tensor norms on ordered normed spaces, polarization constants, and exchangeable distributions
• On the Number of Order Types in Integer Grids of Small Size
• Semantic Term ‘Blurring’ and Stochastic ‘Barcoding’ for Improved Unsupervised Text Classification
• Tunneling on Wheeler Graphs
• User equilibrium with a policy-based link transmission model for stochastic time-dependent traffic networks
• Evolvement Constrained Adversarial Learning for Video Style Transfer
• Face Landmark-based Speaker-Independent Audio-Visual Speech Enhancement in Multi-Talker Environments
• Mesh-Based Affine Abstraction of Nonlinear Systems with Tighter Bounds
• Deep Reinforcement Learning for Green Security Games with Real-Time Information
• Radio resource management for high-speed wireless cellular networks
• Unifying Probabilistic Models for Time-Frequency Analysis
• k-Schur expansions of Catalan functions
• Towards continual learning in medical imaging
• Unified Low Complexity Radix-2 Architectures for Time and Frequency-domain GFDM Modem
• Searching for a source of difference in Gaussian graphical models
• UAlacant machine translation quality estimation at WMT 2018: a simple approach using phrase tables and feed-forward neural networks
• From the flat-space S-matrix to the Wavefunction of the Universe
• Achieving Acceleration in Distributed Optimization via Direct Discretization of the Heavy-Ball ODE
• Duality for the robust sum of functions
• Solving SAT and MaxSAT with a Quantum Annealer: Foundations, Encodings, and Preliminary Results
• Interactive coding resilient to an unknown number of erasures
• Discriminative training of RNNLMs with the average word error criterion
• Billiards with Markovian reflection laws
• The Role of Demand-Side Flexibility in Hedging Electricity Price Volatility in Distribution Grids
• Deep feature transfer between localization and segmentation tasks
• Composability of Regret Minimizers
• Hide-and-Seek: A Data Augmentation Technique for Weakly-Supervised Localization and Beyond
• Debiased Inference of Average Partial Effects in Single-Index Models
• Strange Expectations and the Winnie-the-Pooh Problem
• Are Deep Policy Gradient Algorithms Truly Policy Gradient Algorithms?
• Quantizers with Parameterized Distortion Measures