Path Space Cochains and Population Time Series Analysis
One of the core advantages of topological methods for data analysis is that the language of (co)chains can be mapped onto the semantics of the data, offering a natural avenue for human understanding of the results. Here, we describe such a semantic structure on Chen’s classical iterated integral cochain model for paths in Euclidean space. Specifically, in the context of population time series data, we observe that iterated integrals provide a model-free measure of pairwise influence that can be used for causality inference. Along the way, we survey the construction of the iterated integral model, including recent results and applications, and briefly survey the current standard methods for causality inference.
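To make the pairwise-influence idea concrete, here is a minimal Python sketch (ours, not the paper's code) of the discrete level-2 iterated integrals of a multivariate series; the antisymmetric part of the returned matrix is the kind of model-free lead/lag indicator the abstract alludes to.

```python
import numpy as np

def pairwise_iterated_integrals(X):
    """Discrete level-2 iterated integrals of a multivariate path.

    X: array of shape (T, d), one channel (e.g. one population) per column.
    Returns a (d, d) matrix A with A[i, j] = sum over s < t of
    dX[s, i] * dX[t, j], a discretization of the iterated integral of
    x_i against dx_j. The antisymmetric part (A - A.T) / 2 indicates
    which channel tends to lead the other.
    """
    dX = np.diff(X, axis=0)             # path increments, shape (T-1, d)
    prior = np.cumsum(dX, axis=0) - dX  # increments strictly before each step
    return prior.T @ dX
```

For instance, feeding in two sinusoids a quarter period apart yields off-diagonal entries whose signs reflect which series leads.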
Data Science as Political Action: Grounding Data Science in a Politics of Justice
In response to recent controversies, the field of data science has rushed to adopt codes of ethics. Such professional codes, however, are ill-equipped to address broad matters of social justice. Instead of ethics codes, I argue, the field must embrace politics. Data scientists must recognize themselves as political actors engaged in normative constructions of society and, as befits political work, evaluate their work according to its downstream material impacts on people’s lives. I justify this notion in two parts: first, by articulating why data scientists must recognize themselves as political actors, and second, by describing how the field can evolve toward a deliberative and rigorous grounding in a politics of social justice. Part 1 responds to three common arguments that have been invoked by data scientists when they are challenged to take political positions regarding their work. In confronting these arguments, I will demonstrate why attempting to remain apolitical is itself a political stance, a fundamentally conservative one, and why the field’s current attempts to promote ‘social good’ dangerously rely on vague and unarticulated political assumptions. Part 2 proposes a framework for what a politically engaged data science could look like and how to achieve it, recognizing the challenge of reforming the field in this manner. I conceptualize the process of incorporating politics into data science in four stages: becoming interested in directly addressing social issues, recognizing the politics underlying these issues, redirecting existing methods toward new applications, and, finally, developing new practices and methods that orient data science around a mission of social justice. The path ahead does not require data scientists to abandon their technical expertise, but it does entail expanding their notions of what problems to work on and how to engage with society.
Optimized Hidden Markov Model based on Constrained Particle Swarm Optimization
As one of the standard tools of Bayesian analysis, the Hidden Markov Model (HMM) has been used in extensive applications. Most HMMs are trained with the Baum-Welch algorithm (BWHMM) to estimate the model parameters, which has difficulty finding globally optimal solutions. This paper proposes an optimized Hidden Markov Model based on the Particle Swarm Optimization (PSO) algorithm, called PSOHMM. To satisfy the statistical constraints of an HMM, the paper develops re-normalization and re-mapping mechanisms. Experiments show that PSOHMM finds better solutions than BWHMM and converges faster.
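The abstract does not spell out the re-mapping and re-normalization mechanisms, but one plausible reading, keeping each PSO particle a valid row-stochastic matrix, looks like the sketch below (function and argument names are ours).

```python
import numpy as np

def renormalize_rows(particle, eps=1e-12):
    """Re-mapping and re-normalization for a PSO particle encoding HMM rows.

    particle: (n_states, n_states) candidate transition matrix. PSO updates
    can push entries negative and break row-stochasticity, so entries are
    clipped to be non-negative (re-mapping) and each row is rescaled to sum
    to one (re-normalization); an all-zero row falls back to uniform.
    """
    p = np.clip(particle, 0.0, None)
    row_sums = p.sum(axis=1, keepdims=True)
    p = np.where(row_sums < eps, 1.0, p)   # all-zero row -> uniform after rescale
    return p / p.sum(axis=1, keepdims=True)
```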
Explaining Deep Learning Models – A Bayesian Non-parametric Approach
Understanding and interpreting how machine learning (ML) models make decisions has been a major challenge. While recent research has proposed various technical approaches that provide some clues as to how an ML model makes individual predictions, they cannot give users the ability to inspect a model as a complete entity. In this work, we propose a novel technical approach that augments a Bayesian non-parametric regression mixture model with multiple elastic nets. Using the enhanced mixture model, we can extract generalizable insights for a target model through a global approximation. To demonstrate the utility of our approach, we evaluate it on different ML models in the context of image recognition. The empirical results indicate that our proposed approach not only outperforms the state-of-the-art techniques in explaining individual decisions but also provides users with the ability to discover the vulnerabilities of the target ML models.
Blockwise Parallel Decoding for Deep Autoregressive Models
Deep autoregressive sequence-to-sequence models have demonstrated impressive performance across a wide variety of tasks in recent years. While common architecture classes such as recurrent, convolutional, and self-attention networks make different trade-offs between the amount of computation needed per layer and the length of the critical path at training time, generation still remains an inherently sequential process. To overcome this limitation, we propose a novel blockwise parallel decoding scheme in which we make predictions for multiple time steps in parallel, then back off to the longest prefix validated by a scoring model. This allows for substantial theoretical improvements in generation speed when applied to architectures that can process output sequences in parallel. We verify our approach empirically through a series of experiments using state-of-the-art self-attention models for machine translation and image super-resolution, achieving iteration reductions of up to 2x over a baseline greedy decoder with no loss in quality, or up to 7x in exchange for a slight decrease in performance. In terms of wall-clock time, our fastest models exhibit real-time speedups of up to 4x over standard greedy decoding.
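A minimal sketch of the accept/back-off step under our reading of the scheme (names are ours; the parallel proposal and the scoring model's greedy verifications are assumed given):

```python
def accept_longest_prefix(proposed, verified):
    """Back-off step of blockwise parallel decoding (a sketch).

    proposed: k tokens guessed in parallel for positions t..t+k-1.
    verified: the token the scoring model would emit greedily at each of
              those positions given the accepted context.
    The longest agreeing prefix is kept; on an immediate mismatch the
    scorer's own first token is taken, so at least one step is always made.
    """
    accepted = []
    for guess, check in zip(proposed, verified):
        if guess != check:
            break
        accepted.append(guess)
    return accepted or [verified[0]]
```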
Time Series Classification to Improve Poultry Welfare
Poultry farms are an important contributor to the human food chain. Worldwide, humankind keeps an enormous number of domesticated birds (e.g. chickens) for their eggs and their meat, providing rich sources of low-fat protein. However, around the world, there have been growing concerns about the quality of life for the livestock in poultry farms; and increasingly vocal demands for improved standards of animal welfare. Recent advances in sensing technologies and machine learning allow the possibility of automatically assessing the health of some individual birds, and employing the lessons learned to improve the welfare for all birds. This task superficially appears to be easy, given the dramatic progress in recent years in classifying human behaviors, and given that human behaviors are presumably more complex. However, as we shall demonstrate, classifying chicken behaviors poses several unique challenges, chief among which is creating a generalizable dictionary of behaviors from sparse and noisy data. In this work we introduce a novel time series dictionary learning algorithm that can robustly learn from weakly labeled data sources.
Contrastive Explanation: A Structural-Model Approach
The topic of causal explanation in artificial intelligence has gathered interest in recent years as researchers and practitioners aim to increase trust and understanding of intelligent decision-making and action. While different sub-fields have looked into this problem with a sub-field-specific view, there are few models that aim to capture explanation in AI more generally. One general model is based on structural causal models. It defines an explanation as a fact that, if found to be true, would constitute an actual cause of a specific event. However, research in philosophy and social sciences shows that explanations are contrastive: that is, when people ask for an explanation of an event — the fact — they (sometimes implicitly) are asking for an explanation relative to some contrast case; that is, ‘Why P rather than Q?’. In this paper, we extend the structural causal model approach to define two complementary notions of contrastive explanation, and demonstrate them on two classical AI problems: classification and planning. We believe that this model can be used to define contrastive explanation of other subfield-specific AI models.
Attention Fusion Networks: Combining Behavior and E-mail Content to Improve Customer Support
Customer support is a central objective at Square as it helps us build and maintain great relationships with our sellers. In order to provide the best experience, we strive to deliver the most accurate and quasi-instantaneous responses to questions regarding our products. In this work, we introduce the Attention Fusion Network model, which combines signals extracted from seller interactions on the Square product ecosystem with submitted email questions to predict the most relevant solution to a seller’s inquiry. We show that this combination of two very different data sources, which are rarely used together, in a state-of-the-art deep learning system outperforms candidate models trained on only a single source.
Kinetic Euclidean Distance Matrices
Euclidean distance matrices (EDMs) are a major tool for localization from distances, with applications ranging from protein structure determination to global positioning and manifold learning. They are, however, static objects which serve to localize points from a snapshot of distances. If the objects move, one expects to do better by modeling the motion. In this paper, we introduce Kinetic Euclidean Distance Matrices (KEDMs)—a new kind of time-dependent distance matrices that incorporate motion. The entries of a KEDM are functions of time: the squared time-varying distances. We study two smooth trajectory models—polynomial and bandlimited trajectories—and show that these trajectories can be reconstructed from incomplete, noisy distance observations, scattered over multiple time instants. Our main contribution is a semidefinite relaxation (SDR), inspired by SDRs for static EDMs. Similarly to the static case, the SDR is followed by a spectral factorization step; however, because spectral factorization of polynomial matrices is more challenging than for constant matrices, we propose a new factorization method that uses anchor measurements. Extensive numerical experiments show that KEDMs and the new semidefinite relaxation accurately reconstruct trajectories from noisy, incomplete distance data and that, in fact, motion improves rather than degrades localization if properly modeled. This makes KEDMs a promising tool for problems involving the geometry of dynamic point sets.
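For the polynomial trajectory model, a KEDM entry is itself a polynomial in time; a small sketch under that assumption (ours, using numpy's polynomial helpers):

```python
import numpy as np
from numpy.polynomial import polynomial as P

def kedm_entry(coeffs_i, coeffs_j):
    """One KEDM entry for two polynomial trajectories (a sketch).

    coeffs_*: arrays of shape (p + 1, dim); row k holds the coefficients of
    t**k for each spatial coordinate. The squared time-varying distance
    ||x_i(t) - x_j(t)||^2 is itself a polynomial of degree 2p in t; its
    coefficients are returned and can be evaluated with P.polyval.
    """
    diff = coeffs_i - coeffs_j
    out = np.zeros(2 * diff.shape[0] - 1)
    for d in range(diff.shape[1]):        # sum of squares over coordinates
        out += P.polymul(diff[:, d], diff[:, d])
    return out
```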
Private Continual Release of Real-Valued Data Streams
We present a differentially private mechanism to display statistics (e.g., the moving average) of a stream of real-valued observations where the bound on each observation is either too conservative or unknown in advance. This is particularly relevant to scenarios of real-time data monitoring and reporting, e.g., energy data through smart meters. Our focus is on real-world data streams whose distribution is light-tailed, meaning that the tail approaches zero at least as fast as the exponential distribution. For such data streams, individual observations are expected to be concentrated below an unknown threshold. Estimating this threshold from the data can potentially violate privacy as it would reveal particular events tied to individuals [1]. On the other hand, an overly conservative threshold may impact accuracy by adding more noise than necessary. We construct a utility-optimizing differentially private mechanism to release this threshold based on the input stream. Our main advantage over the state-of-the-art algorithms is that the resulting noise added to each observation of the stream is scaled to the threshold instead of to a possibly much larger bound, resulting in a considerable gain in utility when the difference is significant. Using two real-world datasets, we demonstrate that our mechanism, on average, improves the utility by a factor of 3.5 on the first dataset and 9 on the other. While our main focus is on continual release of statistics, our mechanism for releasing the threshold can be used in various other applications where a (privacy-preserving) measure of the scale of the input distribution is required.
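A hedged sketch of the per-observation release step once a threshold is available; privately selecting the threshold itself, the paper's main contribution, is assumed given and not reproduced here.

```python
import numpy as np

def private_release(stream, threshold, epsilon, rng=None):
    """Noise scaled to a threshold rather than a conservative bound (sketch).

    Each observation is clipped to [0, threshold], bounding its sensitivity,
    and perturbed with Laplace noise of scale threshold / epsilon. When the
    threshold is much smaller than a worst-case global bound, the added
    noise is correspondingly smaller.
    """
    rng = rng or np.random.default_rng()
    x = np.clip(np.asarray(stream, dtype=float), 0.0, threshold)
    return x + rng.laplace(scale=threshold / epsilon, size=x.shape)
```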
Confusion2Vec: Towards Enriching Vector Space Word Representations with Representational Ambiguities
Word vector representations are a crucial part of Natural Language Processing (NLP) and Human Computer Interaction. In this paper, we propose a novel word vector representation, Confusion2Vec, motivated by human speech production and perception, that encodes representational ambiguity. Humans employ both acoustic similarity cues and contextual cues to decode information, and we focus on a model that incorporates both sources of information. The representational ambiguity of acoustics, which manifests itself in word confusions, is often resolved by both humans and machines through contextual cues. A range of representational ambiguities can emerge in various domains beyond acoustic perception, such as morphological transformations and paraphrasing in NLP tasks like machine translation. In this work, we present a case study applied to Automatic Speech Recognition (ASR), where the word confusions are related to acoustic similarity. We present several techniques for training a representation that captures acoustic perceptual similarity and its ambiguity; we term it Confusion2Vec and learn it on unsupervised data generated from ASR confusion networks or lattice-like structures. Appropriate evaluations for Confusion2Vec are formulated for gauging acoustic similarity in addition to semantic-syntactic and word similarity evaluations. Confusion2Vec is able to model word confusions efficiently without compromising the semantic-syntactic word relations, thus effectively enriching the word vector space with extra task-relevant ambiguity information. We provide an intuitive exploration of the 2-dimensional Confusion2Vec space using Principal Component Analysis of the embedding and relate it to semantic, syntactic, and acoustic relationships. The potential of Confusion2Vec for exploiting the uncertainty present in lattices is demonstrated through small examples relating to ASR error correction.
Knowledge Transfer via Distillation of Activation Boundaries Formed by Hidden Neurons
An activation boundary for a neuron refers to the separating hyperplane that determines whether the neuron is activated or deactivated. It has long been held in neural networks that the activations of neurons, rather than their exact output values, play the most important role in forming classification-friendly partitions of the hidden feature space. However, as far as we know, this aspect of neural networks has not been considered in the literature on knowledge transfer. In this paper, we propose a knowledge transfer method via distillation of activation boundaries formed by hidden neurons. For the distillation, we propose an activation transfer loss that attains its minimum value when the boundaries generated by the student coincide with those of the teacher. Since the activation transfer loss is not differentiable, we design a piecewise differentiable loss that approximates it. With the proposed method, the student learns the separating boundary between the activation and deactivation regions formed by each neuron in the teacher. Through experiments on various aspects of knowledge transfer, we verify that the proposed method outperforms the current state of the art.
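The paper's exact loss is not given in the abstract; the following is one plausible piecewise-differentiable surrogate consistent with its description, penalizing only which side of the boundary the student lands on (names and the margin are our assumptions, not the paper's):

```python
import numpy as np

def boundary_transfer_loss(teacher_pre, student_pre, margin=1.0):
    """Hinge-style surrogate in the spirit of activation-boundary transfer.

    teacher_pre, student_pre: pre-activation values of corresponding hidden
    neurons. Where the teacher neuron fires (pre-activation > 0) the student
    is pushed above +margin; elsewhere it is pushed below -margin. Only the
    side of the activation boundary matters, not the raw magnitudes.
    """
    fires = teacher_pre > 0
    hinge_on = np.maximum(0.0, margin - student_pre)   # should be active
    hinge_off = np.maximum(0.0, margin + student_pre)  # should be inactive
    return float(np.mean(np.where(fires, hinge_on, hinge_off) ** 2))
```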
Short Term Load Forecasting Using Deep Neural Networks
Electricity load forecasting plays an important role in energy planning, generation, and distribution. However, nonlinearity and dynamic uncertainties in the smart grid environment are the main obstacles to forecasting accuracy. A Deep Neural Network (DNN) is an intelligent computational model that can capture a complicated nonlinear relationship between inputs and outputs through multiple hidden layers. In this paper, we propose a DNN-based electricity load forecasting system to manage energy consumption in an efficient manner. We investigate the applicability of two deep neural network architectures, the feed-forward deep neural network (Deep-FNN) and the recurrent deep neural network (Deep-RNN), on the New York Independent System Operator (NYISO) electricity load forecasting task. We test our algorithm with various activation functions, namely Sigmoid, Hyperbolic Tangent (tanh), and Rectified Linear Unit (ReLU). The performance of the two network architectures is compared in terms of the Mean Absolute Percentage Error (MAPE) metric.
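For reference, the MAPE metric used for the comparison is straightforward to compute:

```python
import numpy as np

def mape(actual, forecast):
    """Mean Absolute Percentage Error: average relative error, in percent."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))
```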
Towards Compositional Distributional Discourse Analysis
Categorical compositional distributional semantics provide a method to derive the meaning of a sentence from the meaning of its individual words: the grammatical reduction of a sentence automatically induces a linear map for composing the word vectors obtained from distributional semantics. In this paper, we extend this passage from word-to-sentence to sentence-to-discourse composition. To achieve this we introduce a notion of basic anaphoric discourses as a mid-level representation between natural language discourse formalised in terms of basic discourse representation structures (DRS); and knowledge base queries over the Semantic Web as described by basic graph patterns in the Resource Description Framework (RDF). This provides a high-level specification for compositional algorithms for question answering and anaphora resolution, and allows us to give a picture of natural language understanding as a process involving both statistical and logical resources.
Doc2Im: document to image conversion through self-attentive embedding
Text classification is a fundamental task in NLP applications. Recent research in this field has largely split into two major sub-fields: learning representations on the one hand, and learning deeper models, both sequential and convolutional, which in turn connect back to the representation, on the other. We posit that the stronger the representation, the simpler the classifier needed to achieve high performance. In this paper we propose a completely novel direction for text classification research, wherein we convert text into a representation very similar to an image, such that any deep network able to handle images is equally able to handle text. We take a deeper look at the representation of documents as images and subsequently utilize very simple convolution-based models taken as is from the computer vision domain. This image can be cropped, re-scaled, re-sampled, and augmented just like any other image to work with most state-of-the-art large convolution-based models designed to handle large image datasets. We show impressive results on some of the latest benchmarks in the related fields. We perform transfer learning experiments, both from text to text and from image to text. We believe this is a paradigm shift from the way document understanding and text classification have traditionally been done, and it will drive numerous novel research ideas in the community.
Linear Memory Networks
Recurrent neural networks can learn complex transduction problems that require maintaining and actively exploiting a memory of their inputs. Such models traditionally consider memory and input-output functionalities indissolubly entangled. We introduce a novel recurrent architecture based on the conceptual separation between the functional input-output transformation and the memory mechanism, showing how they can be implemented through different neural components. By building on such conceptualization, we introduce the Linear Memory Network, a recurrent model comprising a feedforward neural network, realizing the non-linear functional transformation, and a linear autoencoder for sequences, implementing the memory component. The resulting architecture can be efficiently trained by building on closed-form solutions to linear optimization problems. Further, by exploiting equivalence results between feedforward and recurrent neural networks we devise a pretraining schema for the proposed architecture. Experiments on polyphonic music datasets show competitive results against gated recurrent networks and other state-of-the-art models.
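Our reading of the two-component recurrence, as a sketch (weight names are ours, not the paper's):

```python
import numpy as np

def lmn_step(x_t, m_prev, W_xh, W_mh, W_hm, W_mm):
    """One recurrence step of a Linear Memory Network as we read the abstract.

    The functional component is a nonlinear feedforward map of the current
    input and the previous memory state; the memory component is a purely
    linear (autoencoder-style) update, which is what makes closed-form
    training of that part possible.
    """
    h_t = np.tanh(W_xh @ x_t + W_mh @ m_prev)  # nonlinear functional part
    m_t = W_hm @ h_t + W_mm @ m_prev           # linear memory update
    return h_t, m_t
```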
Meta-Learning for Multi-objective Reinforcement Learning
Multi-objective reinforcement learning (MORL) is the generalization of standard reinforcement learning (RL) to sequential decision-making problems that involve several, possibly conflicting, objectives. Generally, in such formulations, there is no single optimal policy that optimizes all the objectives simultaneously; instead, a number of policies have to be found, each optimizing a particular preference over the objectives. In this paper, we introduce a novel MORL approach by training a meta-policy (a policy simultaneously trained on multiple tasks sampled from a task distribution) on a number of randomly sampled Markov decision processes (MDPs). In other words, MORL is framed as a meta-learning problem, with the task distribution given by a distribution over the preferences. We demonstrate that such a formulation results in a better approximation of the Pareto-optimal solutions, in terms of both optimality and computational efficiency. We evaluate our method by obtaining Pareto-optimal policies on a number of continuous control problems with many degrees of freedom.
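A sketch of how a preference distribution induces a task distribution (the Dirichlet prior is our illustrative choice, not necessarily the paper's):

```python
import numpy as np

def sample_task(reward_dim, rng=None):
    """Sample one MORL 'task' as a preference over objectives (a sketch).

    A preference is a point on the probability simplex; a Dirichlet draw
    gives one. Scalarizing the vector reward with it turns the MDP into a
    standard single-objective RL task, so a distribution over preferences
    induces the task distribution the meta-policy is trained on.
    """
    rng = rng or np.random.default_rng()
    w = rng.dirichlet(np.ones(reward_dim))
    return lambda reward_vec: float(w @ np.asarray(reward_vec, dtype=float))
```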
Activation Functions: Comparison of trends in Practice and Research for Deep Learning
Deep neural networks have been successfully used in diverse emerging domains to solve real-world complex problems, with many more deep learning (DL) architectures being developed to date. To achieve this state-of-the-art performance, DL architectures use activation functions (AFs) to perform diverse computations between the hidden layers and the output layers. This paper presents a survey of the existing AFs used in deep learning applications and highlights the recent trends in their use. The novelty of this paper is that it compiles the majority of the AFs used in DL and outlines the current trends in their application and usage in practical deep learning deployments against state-of-the-art research results. This compilation will aid in making effective decisions in the choice of the most suitable and appropriate activation function for any given application, ready for deployment. This paper is timely because most research papers on AFs highlight similar work and results, while this paper will be the first to compile the trends in AF applications in practice against the research results from the deep learning literature to date.
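For orientation, the textbook forms of three activation families that recur throughout such practice-versus-research comparisons:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # squashes to (0, 1); saturates at both ends

def tanh(x):
    return np.tanh(x)                # squashes to (-1, 1); zero-centered

def relu(x):
    return np.maximum(0.0, x)        # identity for x > 0, zero otherwise
```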
Knowledge Tracing Machines: Factorization Machines for Knowledge Tracing
Knowledge tracing is a sequence prediction problem where the goal is to predict the outcomes of students on questions as they interact with a learning platform. By tracking the evolution of a student's knowledge, one can optimize instruction. Existing methods are based either on temporal latent variable models or on factor analysis with temporal features. We show that factorization machines (FMs), a model for regression or classification, encompass several existing models in the educational literature as special cases, notably the additive factor model, the performance factor model, and multidimensional item response theory. We show, using several real datasets of tens of thousands of users and items, that FMs can estimate student knowledge accurately and quickly even when student data is sparsely observed, and can handle side information such as multiple knowledge components and the number of attempts at the item or skill level. Our approach allows us to fit student models of higher dimension than existing models, and provides a testbed for trying new combinations of features in order to improve existing models.
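For concreteness, the standard second-order FM score on a sparse encoding of student/item/skill features (our sketch, using the usual O(nk) rewriting of the pairwise term):

```python
import numpy as np

def fm_score(x, w0, w, V):
    """Second-order factorization machine score for one encoded interaction.

    x: multi-hot encoding of (student, item, knowledge components, attempt
    counts, ...); w0, w: global bias and linear weights; V: (n_features, k)
    matrix of latent factors. The identity below computes the sum over all
    feature pairs of <v_i, v_j> x_i x_j without the explicit double loop.
    """
    interactions = 0.5 * np.sum((V.T @ x) ** 2 - (V.T ** 2) @ (x ** 2))
    return w0 + w @ x + interactions
```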
Transformative Machine Learning
The key to success in machine learning (ML) is the use of effective data representations. Traditionally, data representations were hand-crafted. Recently, it has been demonstrated that, given sufficient data, deep neural networks can learn effective implicit representations from simple input representations. However, for most scientific problems, the use of deep learning is not appropriate as the amount of available data is limited, and/or the output models must be explainable. Nevertheless, many scientific problems do have significant amounts of data available on related tasks, which makes them amenable to multi-task learning, i.e. learning many related problems simultaneously. Here we propose a novel and general representation learning approach for multi-task learning that works successfully with small amounts of data. The fundamental new idea is to transform an input intrinsic data representation (i.e., handcrafted features) to an extrinsic representation based on what a pre-trained set of models predicts about the examples. This transformation has the dual advantages of producing significantly more accurate predictions, and providing explainable models. To demonstrate the utility of this transformative learning approach, we have applied it to three real-world scientific problems: drug design (quantitative structure-activity relationship learning), predicting human gene expression (across different tissue types and drug treatments), and meta-learning for machine learning (predicting which machine learning methods work best for a given problem). In all three problems, transformative machine learning significantly outperforms the best intrinsic representation.
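The core transformation reduces to a few lines; a sketch assuming a given bank of scikit-learn-style pre-trained models (names are ours):

```python
import numpy as np

def extrinsic_representation(X, model_bank):
    """Intrinsic-to-extrinsic transformation as we read the abstract.

    Each example is re-described by what a bank of models pre-trained on
    related tasks predicts about it: one feature column per model. Any
    objects exposing a scikit-learn-style .predict would do; the bank
    itself is assumed to be given.
    """
    return np.column_stack([m.predict(X) for m in model_bank])
```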
Untangling the GDPR Using ConRelMiner
The General Data Protection Regulation (GDPR) poses enormous challenges for companies and organizations with respect to understanding, implementing, and maintaining the constraints it contains. We report on how the ConRelMiner method can be used to untangle the GDPR. For this, the GDPR is filtered and grouped along the roles it mentions, and the resulting reduction in the number of sentences analysts must read is shown. Moreover, the output of ConRelMiner, a cluster graph with relations between the sentences, is displayed and interpreted. Overall, the goal is to illustrate how the effort of implementing the GDPR can be reduced and how a structured and meaningful representation of the relevant GDPR sentences can be found.
A Survey on Data Collection for Machine Learning: a Big Data – AI Integration Perspective
Data collection is a major bottleneck in machine learning and an active research topic in multiple communities. There are largely two reasons data collection has recently become a critical issue. First, as machine learning is becoming more widely used, we are seeing new applications that do not necessarily have enough labeled data. Second, unlike traditional machine learning where feature engineering is the bottleneck, deep learning techniques automatically generate features, but instead require large amounts of labeled data. Interestingly, recent research in data collection comes not only from the machine learning, natural language, and computer vision communities, but also from the data management community due to the importance of handling large amounts of data. In this survey, we perform a comprehensive study of data collection from a data management point of view. Data collection largely consists of data acquisition, data labeling, and improvement of existing data or models. We provide a research landscape of these operations, provide guidelines on which technique to use when, and identify interesting research challenges. The integration of machine learning and data management for data collection is part of a larger trend of Big data and Artificial Intelligence (AI) integration and opens many opportunities for new research.
ExGate: Externally Controlled Gating for Feature-based Attention in Artificial Neural Networks
The perceptual capabilities of artificial systems have come a long way since the advent of deep learning. These methods have proven effective; however, they are not as efficient as their biological counterparts. Visual attention is a set of mechanisms employed in biological visual systems to ease computational load by processing only the pertinent parts of the stimuli. This paper addresses the implementation of top-down, feature-based attention in an artificial neural network through externally controlled neuron gating. Our results show a 5% increase in classification accuracy on the CIFAR-10 dataset versus a non-gated version, while adding very few parameters. Our gated model also produces more reasonable errors, drastically reducing predictions of classes that belong to a different category than the true class.
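A minimal sketch of the gating operation as we read it (shapes and names are our assumptions):

```python
import numpy as np

def gate_channels(feature_maps, gates):
    """Externally controlled, feature-based gating (a minimal reading).

    feature_maps: (channels, H, W) activations at some layer; gates: a
    (channels,) vector supplied from outside the network, e.g. near zero
    for channels judged irrelevant to the current task. Broadcasting
    scales each channel before it is passed to the next layer.
    """
    return feature_maps * np.asarray(gates, dtype=float)[:, None, None]
```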
Alpha-Pooling for Convolutional Neural Networks
Disentangling Latent Factors with Whitening
After the success of deep generative models in image generation tasks, learning disentangled latent variables of data has become a major focus of deep learning research. Many models have been proposed to learn an interpretable and factorized representation of the latent variables by modifying their objective function or model architecture. While disentangling the latent variables, some models show lower quality in the reconstructed images, and others increase model complexity, making them hard to train. In this paper, we propose a simple disentangling method based on traditional principal component analysis (PCA) applied to the latent variables of a variational auto-encoder (VAE). Our method can be applied to any generative model. In experiments, we apply our proposed method to simple VAE models, and the results confirm that our method finds more interpretable factors in the latent space while keeping the reconstruction error unchanged.
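The proposed recipe is post-hoc and simple enough to sketch directly (our code, using scikit-learn's PCA):

```python
import numpy as np
from sklearn.decomposition import PCA

def disentangle_latents(z, n_components=None):
    """Post-hoc PCA on VAE latent codes, per the abstract's recipe.

    z: (n_samples, latent_dim) matrix of encoder outputs (e.g. posterior
    means) for a dataset. Whitened PCA rotates the latent space onto
    decorrelated axes of decreasing variance, the proposed route to more
    interpretable factors; the VAE itself is trained unchanged beforehand.
    """
    pca = PCA(n_components=n_components, whiten=True)
    return pca.fit_transform(z), pca
```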
Iterative Classroom Teaching
We consider the machine teaching problem in a classroom-like setting wherein the teacher has to deliver the same examples to a diverse group of students. Their diversity stems from differences in their initial internal states as well as their learning rates. We prove that a teacher with full knowledge about the learning dynamics of the students can teach a target concept to the entire classroom using O(min{d,N} log(1/eps)) examples, where d is the ambient dimension of the problem, N is the number of learners, and eps is the accuracy parameter. We show the robustness of our teaching strategy when the teacher has limited knowledge of the learners’ internal dynamics as provided by a noisy oracle. Further, we study the trade-off between the learners’ workload and the teacher’s cost in teaching the target concept. Our experiments validate our theoretical results and suggest that appropriately partitioning the classroom into homogeneous groups provides a balance between these two objectives.
Measuring the Effects of Data Parallelism on Neural Network Training
Recent hardware developments have made unprecedented amounts of data parallelism available for accelerating neural network training. Among the simplest ways to harness next-generation accelerators is to increase the batch size in standard mini-batch neural network training algorithms. In this work, we aim to experimentally characterize the effects of increasing the batch size on training time, as measured in the number of steps necessary to reach a goal out-of-sample error. Eventually, increasing the batch size will no longer reduce the number of training steps required, but the exact relationship between the batch size and how many training steps are necessary is of critical importance to practitioners, researchers, and hardware designers alike. We study how this relationship varies with the training algorithm, model, and dataset and find extremely large variation between workloads. Along the way, we reconcile disagreements in the literature on whether batch size affects model quality. Finally, we discuss the implications of our results for efforts to train neural networks much faster in the future.
Pipe-SGD: A Decentralized Pipelined SGD Framework for Distributed Deep Net Training
Distributed training of deep nets is an important technique to address some of the present-day computing challenges, such as memory consumption and computational demands. Classical distributed approaches, synchronous or asynchronous, are based on the parameter server architecture, i.e., worker nodes compute gradients which are communicated to the parameter server while updated parameters are returned. Recently, distributed training with AllReduce operations has gained popularity as well. While many of these approaches seem appealing, little is reported about wall-clock training time improvements. In this paper, we carefully analyze the AllReduce-based setup, propose timing models which include network latency, bandwidth, cluster size, and compute time, and demonstrate that a pipelined training with a width of two combines the best of both synchronous and asynchronous training. Specifically, for a setup consisting of a four-node GPU cluster we show wall-clock time training improvements of up to 5.4x compared to conventional approaches.
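A back-of-envelope version of such a timing model, under standard ring-AllReduce cost assumptions (ours, not the paper's exact model):

```python
def iteration_times(compute, latency, grad_bytes, bandwidth, workers):
    """Rough per-iteration time bounds for AllReduce training (a sketch).

    Ring AllReduce on n workers moves about 2 * (n - 1) / n of the gradient
    bytes per worker across 2 * (n - 1) latency-bound steps. A pipeline of
    width two overlaps communication with the next iteration's compute, so
    the pipelined bound is the max of the two terms instead of their sum.
    """
    comm = (2 * (workers - 1) / workers) * grad_bytes / bandwidth \
        + 2 * (workers - 1) * latency
    return compute + comm, max(compute, comm)  # (serial, pipelined) bounds
```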
• Decrypting Distributed Ledger Design – Taxonomy, Classification and Blockchain Community Evaluation
• dAIrector: Automatic Story Beat Generation through Knowledge Synthesis
• Integrating Project Spatial Coordinates into Pavement Management Prioritization
• Deep BV: A Fully Automated System for Brain Ventricle Localization and Segmentation in 3D Ultrasound Images of Embryonic Mice
• Phenotyping Endometriosis through Mixed Membership Models of Self-Tracking Data
• Multi-channel discourse as an indicator for Bitcoin price and volume movements
• An Answer to a Question of Zeilberger and Zeilberger about Fractional Counting of Partitions
• Integrative Biological Simulation, Neuropsychology, and AI Safety
• On Convex Envelopes and Regularization of Non-Convex Functionals without moving Global Minima
• Optimal control of a large dam with compound Poisson input and costs depending on water levels
• CAAD 2018: Iterative Ensemble Adversarial Attack
• Analysis of Multilingual Sequence-to-Sequence speech recognition systems
• Reframing the S&P500 Network of Stocks along the 21st Century
• A Generalized Multifractal Formalism for the Estimation of Nonconcave Multifractal Spectra
• Hardware-Efficient Structure of the Accelerating Module for Implementation of Convolutional Neural Network Basic Operation
• Degree-d Chow Parameters Robustly Determine Degree-d PTFs (and Algorithmic Applications)
• An Optimal Approximation for Submodular Maximization under a Matroid Constraint in the Adaptive Complexity Model
• Kosterlitz-Thouless scaling at many-body localization phase transitions
• On the Complexity of Reconnaissance Blind Chess
• ColorUNet: A convolutional classification approach to colorization
• Best approximations of non-linear mappings: Method of optimal injections
• Approximability of the Eight-vertex Model
• Global Optimality in Distributed Low-rank Matrix Factorization
• Who Let The Trolls Out? Towards Understanding State-Sponsored Trolls
• Strategic Availability and Cost Effective UAV-based Flying Access Networks: S-Modular Game Analysis
• Asymptotic conditional inference via a Steining of selection probabilities
• DragonPaint: Rule based bootstrapping for small data with an application to cartoon coloring
• Poisson Multi-Bernoulli Mapping Using Gibbs Sampling
• Incidence dimension and 2-packing number in graphs
• Forensic Discrimination between Traditional and Compressive Imaging Systems
• Election with Bribed Voter Uncertainty: Hardness and Approximation Algorithm
• Role of self-loop in cell-cycle network of budding yeast
• SRP: Efficient class-aware embedding learning for large-scale data via supervised random projections
• The mode-coupling crossover of glasses is a localization transition
• Automatic Thresholding of SIFT Descriptors
• How Many Subpopulations is Too Many? Exponential Lower Bounds for Inferring Population Histories
• On How Well Generative Adversarial Networks Learn Densities: Nonparametric and Parametric Results
• Solving Jigsaw Puzzles By The Graph Connection Laplacian
• Towards Fluent Translations from Disfluent Speech
• Accounting for Skill in Nonlinear Trend, Variability, and Autocorrelation Facilitates Better Multi-Model Projections
• Ad-versarial: Defeating Perceptual Ad-Blocking
• Performance of Johnson-Lindenstrauss Transform for k-Means and k-Medians Clustering
• Correlation Filter Selection for Visual Tracking Using Reinforcement Learning
• An Efficient Algorithm for High-Dimensional Log-Concave Maximum Likelihood
• Robustness of Conditional GANs to Noisy Labels
• Reconstruction-Cognizant Graph Sampling using Gershgorin Disc Alignment
• Deep Semantic Instance Segmentation of Tree-like Structures Using Synthetic Data
• Facial Landmark Detection for Manga Images
• Differential Games Based on Invariant Generation
• RGB-D SLAM in Dynamic Environments Using Points Correlations
• Advanced machine learning informatics modeling using clinical and radiological imaging metrics for characterizing breast tumor characteristics with the OncotypeDX gene array
• Secrecy Outage Analysis for Cooperative NOMA Systems with Relay Selection Scheme
• Stochastic Matching with Few Queries: New Algorithms and Tools
• High Speed Tracking With A Fourier Domain Kernelized Correlation Filter
• Evaluating the Complementarity of Taxonomic Relation Extraction Methods Across Different Languages
• Efficient Identification of Approximate Best Configuration of Training in Large Datasets
• Model Selection for Generalized Zero-shot Learning
• (Near) Optimal Parallelism Bound for Fully Asynchronous Coordinate Descent with Linear Speedup
• Phonetic-attention scoring for deep speaker features in speaker verification
• Gaussian-Constrained training for speaker verification
• Bias and Generalization in Deep Generative Models: An Empirical Study
• Using Passivity Theory to Interpret the Dissipating Energy Flow Method
• Calibration Wizard: A Guidance System for Camera Calibration
• Ordinal Regression using Noisy Pairwise Comparisons for Body Mass Index Range Estimation
• Hardware-Constrained Millimeter Wave Systems for 5G: Challenges, Opportunities, and Solutions
• An Optimal Transport View on Generalization
• A two-stage stochastic approach for the asset protection problem during escaped wildfires with uncertain timing of a wind change
• Information Flow in Pregroup Models of Natural Language
• Applying Distributional Compositional Categorical Models of Meaning to Language Translation
• Quantum Semantic Correlations in Hate and Non-Hate Speeches
• Classical Copying versus Quantum Entanglement in Natural Language: The Case of VP-ellipsis
• Approximate Neighbor Counting in Radio Networks
• A Retinex-based Image Enhancement Scheme with Noise Aware Shadow-up Function
• A New Count Regression Model including Gauss Hypergeometric Function with an application to model demand of health services
• Interference Exploitation Precoding for Multi-Level Modulations: Closed-Form Solutions
• Microwave absorption in a 2D topological insulators with a developed network of edge states
• Dynamic Security Analysis of Power Systems by a Sampling-Based Algorithm
• BAR: Bayesian Activity Recognition using variational inference
• Memory-based Deep Reinforcement Learning for Obstacle Avoidance in UAV with Limited Environment Knowledge
• Speaker-adaptive neural vocoders for statistical parametric speech synthesis systems
• Structured Turbo Compressed Sensing for Downlink Massive MIMO-OFDM Channel Estimation
• Using Known Information to Accelerate HyperParameters Optimization Based on SMBO
• Fundamental Asymptotic Behavior of (Two-User) Distributed Massive MIMO
• Discovering Power Laws in Entity Length
• Nonparametric maximum likelihood methods for binary response models with random coefficients
• Improving Multi-Person Pose Estimation using Label Correction
• A global-local approach for detecting hotspots in multiple-response regression
• A spatially resolved network spike in model neuronal cultures reveals nucleation centers, circular traveling waves and drifting spiral waves
• Distributed Exact Weighted All-Pairs Shortest Paths in Near-Linear Time
• Modelling Opinion Dynamics in the Age of Algorithmic Personalisation
• Repetitive Motion Estimation Network: Recover cardiac and respiratory signal from thoracic imaging
• Complex dynamics in a vehicle platoon with nonlinear drag and ACC controllers
• Digital Radio-over-Multicore-Fiber System with Self-Homodyne Coherent Detection and Entropy Coding for Mobile Fronthaul
• On the Graded Acceptability of Arguments in Abstract and Instantiated Argumentation
• Enumeration of lattice polytopes by their volume
• On inverse product cannibalisation: a new Lotka-Volterra model for asymmetric competition in the ICTs
• Codes correcting restricted errors
• Spectral Simplicial Theory for Feature Selection and Applications to Genomics
• Active Learning using Deep Bayesian Networks for Surgical Workflow Analysis
• Prediction of laparoscopic procedure duration using unlabeled, multimodal sensor data
• Bayesian Deep Learning for Exoplanet Atmospheric Retrieval
• The biclique covering number of grids
• A Factor Graph Approach to Automated Design of Bayesian Signal Processing Algorithms
• Quantifying Link Stability in Ad Hoc Wireless Networks Subject to Ornstein-Uhlenbeck Mobility
• Numerical study of barriers and valleys in the free-energy landscape of spin glasses
• Dual Circumference and Collinear Sets
• Every Collinear Set in a Planar Graph Is Free
• Explainable cardiac pathology classification on cine MRI with motion characterization by semi-supervised learning of apparent flow
• Distributed Filtering for Nonlinear Multi-Agent Systems with Biased Observations
• Microscopic Nuclei Classification, Segmentation and Detection with improved Deep Convolutional Neural Network (DCNN) Approaches
• Optimal Designs for Minimax-Criteria in Random Coefficient Regression Models
• Multi-view Laplacian Eigenmaps Based on Bag-of-Neighbors For RGBD Human Emotion Recognition
• Discrete splicing theorem for noise sensitivity of invasion percolation
• A Noether theorem for random locations
• Triple consistency loss for pairing distributions in GAN-based face synthesis
• Rotational Diversity in Multi-Cycle Assignment Problems
• On Hamilton cycles in Erdős-Rényi subgraphs of large graphs
• A simple yet effective baseline for non-attribute graph classification
• Effective Subtree Encoding for Easy-First Dependency Parsing
• Deep Neural Networks for Query Expansion using Word Embeddings
• Learning from Demonstration in the Wild
• Few-shot learning with attention-based sequence-to-sequence models
• Cutoff for the mean-field zero-range process with bounded monotone rates
• Orthogonal Trace-Sum Maximization: Applications, Local Algorithms, and Global Optimality
• A Local Limit Theorem for Cliques in G(n,p)
• Memorable Maps: A Framework for Re-defining Places in Visual Place Recognition
• A Geometric Perspective on the Transferability of Adversarial Directions
• Scalable Robust Kidney Exchange
• Learning Dense Stereo Matching for Digital Surface Models from Satellite Imagery
• Unveiling Swarm Intelligence with Network Science – the Metaphor Explained
• Adaptive Semantic Segmentation with a Strategic Curriculum of Proxy Labels
• On the Erdős Covering Problem: the density of the uncovered set
• An End-to-end Approach to Semantic Segmentation with 3D CNN and Posterior-CRF in Medical Images
• Implicit Argument Prediction as Reading Comprehension
• Modular Architecture for StarCraft II with Deep Reinforcement Learning
• Real-time Traffic Data Prediction with Basic Safety Messages using Kalman-Filter based Noise Reduction Model and Long Short-term Memory Neural Network
• Biologically-plausible learning algorithms can scale to large datasets
• A Geometric Approach of Gradient Descent Algorithms in Neural Networks
• An Axiomatic Study of Query Terms Order in Ad-hoc Retrieval
• Dynamics and stationary configurations of heterogeneous foams
• Intrinsic Geometric Vulnerability of High-Dimensional Artificial Intelligence
• Scale-variant topological information for characterizing complex networks
• Large-Scale Visual Active Learning with Deep Probabilistic Ensembles
• Labeling Bias in Galaxy Morphologies
• Essential Collaboration Skills: The ASCCR Frame for Collaboration
• Social media cluster dynamics create resilient global hate highways
• Nonlinear Dimension Reduction via Outer Bi-Lipschitz Extensions
• An O^*(2.619^k) algorithm for 4-path vertex cover
• Federated Learning for Mobile Keyboard Prediction
• Communication and Information Theory of Single Action Potential Signals in Plants
• GradiVeQ: Vector Quantization for Bandwidth-Efficient Gradient Aggregation in Distributed CNN Training
• Demonstrating Advantages of Neuromorphic Computation: A Pilot Study