What's new on arXiv

Inference on Functionals of Set-Identified Parameters Defined by Convex Moments

Many inference procedures in the literature on partial identification are designed for when the inferential object of interest is the entire (partially identified) vector of parameters. However, when the researcher’s inferential object of interest is a subvector or functional of the parameter vector, these inference procedures can be highly conservative, especially when the dimension of the parameter vector is large. This paper considers uniformly valid inference for continuous functionals of partially identified parameters in cases where the identified set is defined by convex (in the parameter) moment inequalities. We use a functional delta method and propose a method for constructing uniformly valid confidence sets for a (possibly stochastic) convex functional of a partially identified parameter. The proposed method amounts to bootstrapping the Lagrangian of a convex optimization problem, and subsumes subvector inference as a special case. Unlike other proposed subvector inference procedures, our procedure does not require the researcher to repeatedly invert a hypothesis test. Finally, we discuss sufficient conditions on the moment functions to ensure uniform validity.

Recycled ADMM: Improve Privacy and Accuracy with Less Computation in Distributed Algorithms

The alternating direction method of multipliers (ADMM) is a powerful method for solving decentralized convex optimization problems. In distributed settings, each node performs computation with its local data, and the local results are exchanged among neighboring nodes in an iterative fashion. During this iterative process, leakage of data privacy arises and can accumulate significantly over many iterations, making it difficult to balance the privacy-utility tradeoff. In this study we propose Recycled ADMM (R-ADMM), where a linear approximation is applied to every even iteration, with its solution calculated directly using only results from the previous, odd iteration. It turns out that under such a scheme, half of the updates incur no privacy loss and require much less computation compared to conventional ADMM. We obtain a sufficient condition for the convergence of R-ADMM and provide a privacy analysis based on objective perturbation.

Reinforcement Evolutionary Learning Method for self-learning

In statistical modelling, the biggest threat is concept drift, which causes a model's performance to deteriorate gradually over time. There are state-of-the-art methodologies for detecting the impact of concept drift; however, the general strategy for overcoming the performance issue is to rebuild or re-calibrate the model periodically, as the variable patterns for the model change significantly due to market changes, consumer behavior changes, etc. Quantitative research is the most widespread application of data science in the marketing and financial domains, where the applicability of state-of-the-art reinforcement learning for auto-learning is a less explored paradigm. Reinforcement learning is heavily dependent on having a simulated environment, which is mostly available for gaming or online systems, to learn from live feedback. However, some research has been done in areas such as online advertisement and pricing, where the nature of the online learning environment has allowed the scope of reinforcement learning to be explored. Our proposed solution is a reinforcement learning based, true self-learning algorithm which can adapt to data change or concept drift, and can auto-learn and self-calibrate for new patterns in the data, solving the problem of concept drift. Keywords – Reinforcement learning, Genetic Algorithm, Q-learning, Classification modelling, CMA-ES, NES, Multi objective optimization, Concept drift, Population stability index, Incremental learning, F1-measure, Predictive Modelling, Self-learning, MCTS, AlphaGo, AlphaZero

Pre-Synaptic Pool Modification (PSPM): A Supervised Learning Procedure for Spiking Neural Networks

A central question in neuroscience is how to develop realistic models that predict output firing behavior based on a provided external stimulus. Given a set of external inputs and a set of output spike trains, the objective is to discover a network structure which can accomplish the transformation as accurately as possible. Due to the difficulty of this problem in its most general form, approximations have been made in previous work. Past approximations have sacrificed network size, recurrence, or allowed spike count, or have imposed a layered network structure. Here we present a learning rule without these sacrifices, which produces a weight matrix of a leaky integrate-and-fire (LIF) network to match the output activity of both deterministic LIF networks and probabilistic integrate-and-fire (PIF) networks. Inspired by synaptic scaling, our pre-synaptic pool modification (PSPM) algorithm outputs deterministic, fully recurrent spiking neural networks that can provide a novel generative model for given spike trains. Similarity in output spike trains is evaluated with a variety of metrics, including a van Rossum-like measure and a numerical comparison of inter-spike interval distributions. Application of our algorithm to randomly generated networks improves similarity to the reference spike trains on both of these stated measures. In addition, we generated LIF networks that operate near criticality when trained on critical PIF outputs. Our results establish that learning rules based on synaptic homeostasis can be used to represent input-output relationships in fully recurrent spiking neural networks.

Spectral Subspace Sparsification

We introduce a new approach to spectral sparsification that approximates the quadratic form of the pseudoinverse of a graph Laplacian restricted to a subspace. We show that sparsifiers with a near-linear number of edges in the dimension of the subspace exist. Our setting generalizes that of Schur complement sparsifiers. Our approach produces sparsifiers by sampling a uniformly random spanning tree of the input graph and using that tree to guide an edge elimination procedure that contracts, deletes, and reweights edges. In the context of Schur complement sparsifiers, our approach has two benefits over prior work. First, it produces a sparsifier in almost-linear time with no runtime dependence on the desired error. We directly exploit this to compute approximate effective resistances for a small set of vertex pairs in faster time than prior work (Durfee-Kyng-Peebles-Rao-Sachdeva ’17). Secondly, it yields sparsifiers that are reweighted minors of the input graph. As a result, we give a near-optimal answer to a variant of the Steiner point removal problem. A key ingredient of our algorithm is a subroutine of independent interest: a near-linear time algorithm that, given a chosen set of vertices, builds a data structure from which we can query a multiplicative approximation to the decrease in the effective resistance between two vertices after identifying all vertices in the chosen set to a single vertex with inverse polynomial additional additive error in near-constant time.

Entity-Relationship Search over the Web

Entity-Relationship (E-R) Search is a complex case of Entity Search where the goal is to search for multiple unknown entities and the relationships connecting them. We assume that an E-R query can be decomposed as a sequence of sub-queries, each containing keywords related to a specific entity or relationship. We adopt a probabilistic formulation of the E-R search problem. When creating specific representations for entities (e.g. context terms) and for pairs of entities (i.e. relationships), it is possible to create a graph of probabilistic dependencies between sub-queries and entity plus relationship representations. To the best of our knowledge, this represents the first probabilistic model of E-R search. We propose and develop a novel supervised Early Fusion-based model for E-R search, the Entity-Relationship Dependence Model (ERDM). It uses a Markov Random Field to model term dependencies of E-R sub-queries and entity/relationship documents. We performed experiments with more than 800M entity and relationship extractions from ClueWeb-09-B with FACC1 entity linking. We obtained promising results using 3 different query collections comprising 469 E-R queries, with results showing that it is possible to perform E-R search without using fixed and pre-defined entity and relationship types, enabling a wide range of queries to be addressed.

Task-Embedded Control Networks for Few-Shot Imitation Learning

Much like humans, robots should have the ability to leverage knowledge from previously learned tasks in order to learn new tasks quickly in new and unfamiliar environments. Despite this, most robot learning approaches have focused on learning a single task, from scratch, with a limited notion of generalisation, and no way of leveraging the knowledge to learn other tasks more efficiently. One possible solution is meta-learning, but many of the related approaches are limited in their ability to scale to a large number of tasks and to learn further tasks without forgetting previously learned ones. With this in mind, we introduce Task-Embedded Control Networks, which employ ideas from metric learning in order to create a task embedding that can be used by a robot to learn new tasks from one or more demonstrations. In the area of visually-guided manipulation, we present simulation results in which we surpass the performance of a state-of-the-art method when using only visual information from each demonstration. Additionally, we demonstrate that our approach can also be used in conjunction with domain randomisation to train our few-shot learning ability in simulation and then deploy in the real world without any additional training. Once deployed, the robot can learn new tasks from a single real-world demonstration.

Information Theoretic Analysis of the Fundamental Limits of Content Identification

We investigate the content identification problem from an information theoretic perspective and derive its fundamental limits. Here, a rights-holder company desires to keep track of illegal uses of its commercial content, by utilizing resources of a security company, while securing the privacy of its content. Due to privacy issues, the rights-holder company only reveals certain hash values of the original content to the security company. We view the commercial content of the rights-holder company as the codebook of an encoder and the hash values of the content (made available to the security company) as the codebook of a decoder, i.e., the corresponding codebooks of the encoder and the decoder are not the same. Hence, the content identification is modelled as a communication problem using asymmetric codebooks by an encoder and a decoder. We further address ‘the privacy issue’ in the content identification by adding ‘security’ constraints to the communication setup to prevent estimation of the encoder codewords given the decoder codewords. By this modeling, the proposed problem of reliable communication with asymmetric codebooks with security constraints provides the fundamental limits of the content identification problem. To this end, we introduce an information capacity and prove that this capacity is equal to the operation capacity of the system under i.i.d. encoder codewords providing the fundamental limits for content identification. As a well known and widely studied framework, we evaluate the capacity for a binary symmetric channel and provide closed form expressions.

Causal isotonic regression

In observational studies, potential confounders may distort the causal relationship between an exposure and an outcome. However, under some conditions, a causal dose-response curve can be recovered using the G-computation formula. Most classical methods for estimating such curves when the exposure is continuous rely on restrictive parametric assumptions, which carry significant risk of model misspecification. Nonparametric estimation in this context is challenging because in a nonparametric model these curves cannot be estimated at regular rates. Many available nonparametric estimators are sensitive to the selection of certain tuning parameters, and performing valid inference with such estimators can be difficult. In this work, we propose a nonparametric estimator of a causal dose-response curve known to be monotone. We show that our proposed estimation procedure generalizes the classical least-squares isotonic regression estimator of a monotone regression function. Specifically, it does not involve tuning parameters, and is invariant to strictly monotone transformations of the exposure variable. We describe theoretical properties of our proposed estimator, including its irregular limit distribution and the potential for doubly-robust inference. Furthermore, we illustrate its performance via numerical studies, and use it to assess the relationship between BMI and immune response in HIV vaccine trials.
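For readers unfamiliar with the classical least-squares isotonic regression estimator that this abstract says is being generalized, here is a minimal sketch of the textbook pool-adjacent-violators algorithm. It is not the paper's causal estimator; the function name `isotonic_regression` and its arguments are illustrative assumptions, and the responses are assumed to be already sorted by exposure.

```python
def isotonic_regression(y, w=None):
    """Least-squares non-decreasing fit to y (assumed sorted by exposure).

    Classical pool-adjacent-violators sketch; names are illustrative only.
    """
    if w is None:
        w = [1.0] * len(y)
    # Each block stores [weighted mean, total weight, number of points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Pool the last two blocks while they violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    # Expand the pooled blocks back to one fitted value per observation.
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


print(isotonic_regression([1, 3, 2, 4]))
```

The fit depends on the exposure values only through their ordering and needs no tuning parameter, which mirrors the two properties the abstract highlights: invariance to strictly monotone transformations of the exposure and the absence of tuning parameters.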

Event History Analysis of Dynamic Communication Networks

Statistical analysis on networks has received growing attention due to demand from various emerging applications. In dynamic networks, one of the key interests is to model the event history of time-stamped interactions amongst nodes. We propose to model dynamic directed communication networks via multivariate counting processes. A pseudo partial likelihood approach is exploited to capture the network dependence structure. Asymptotic results for the resulting estimator are established. Numerical studies are performed to demonstrate the effectiveness of our proposal.

Local Explanation Methods for Deep Neural Networks Lack Sensitivity to Parameter Values

Explaining the output of a complicated machine learning model like a deep neural network (DNN) is a central challenge in machine learning. Several proposed local explanation methods address this issue by identifying which dimensions of a single input are most responsible for a DNN’s output. The goal of this work is to assess the sensitivity of local explanations to DNN parameter values. Somewhat surprisingly, we find that DNNs with randomly-initialized weights produce explanations that are both visually and quantitatively similar to those produced by DNNs with learned weights. Our conjecture is that this phenomenon occurs because these explanations are dominated by the lower level features of a DNN, and that a DNN’s architecture provides a strong prior which significantly affects the representations learned at these lower layers. NOTE: This work is now subsumed by our recent manuscript, Sanity Checks for Saliency Maps (to appear NIPS 2018), where we expand on findings and address concerns raised in Sundararajan et al. (2018).

Detecting Memorization in ReLU Networks

We propose a new notion of ‘non-linearity’ of a network layer with respect to an input batch that is based on its proximity to a linear system, which is reflected in the non-negative rank of the activation matrix. We measure this non-linearity by applying non-negative factorization to the activation matrix. Considering batches of similar samples, we find that high non-linearity in deep layers is indicative of memorization. Furthermore, by applying our approach layer-by-layer, we find that the mechanism for memorization consists of distinct phases. We perform experiments on fully-connected and convolutional neural networks trained on several image and audio datasets. Our results demonstrate that as an indicator for memorization, our technique can be used to perform early stopping.

Probabilistic Linear Solvers: A Unifying View

Several recent works have developed a new, probabilistic interpretation for numerical algorithms solving linear systems in which the solution is inferred in a Bayesian framework, either directly or by inferring the unknown action of the matrix inverse. These approaches have typically focused on replicating the behavior of the conjugate gradient method as a prototypical iterative method. In this work surprisingly general conditions for equivalence of these disparate methods are presented. We also describe connections between probabilistic linear solvers and projection methods for linear systems, providing a probabilistic interpretation of a far more general class of iterative methods. In particular, this provides such an interpretation of the generalised minimum residual method. A probabilistic view of preconditioning is also introduced. These developments unify the literature on probabilistic linear solvers, and provide foundational connections to the literature on iterative solvers for linear systems.
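As background for the conjugate gradient method that this abstract names as the prototypical iterative method, the following is a minimal sketch of the standard (non-probabilistic) algorithm for a symmetric positive definite system. It does not implement any of the probabilistic solvers discussed; the function name, the dense list-of-lists matrix format, and the default tolerance are assumptions made for illustration.

```python
def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive definite A (list of rows).

    Textbook conjugate gradient sketch; not a probabilistic solver.
    """
    n = len(b)
    if max_iter is None:
        max_iter = 10 * n

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    def matvec(M, v):
        return [dot(row, v) for row in M]

    x = [0.0] * n
    r = [float(bi) for bi in b]  # residual b - A x for the start x = 0
    p = list(r)                  # first search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)  # exact line search along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        # New direction is A-conjugate to the previous ones.
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


solution = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
print(solution)
```

In exact arithmetic the iteration terminates in at most n steps for an n-by-n system, which is why the example above needs only two iterations.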

Deep LDA Hashing

Conventional supervised hashing methods based on classification do not entirely meet the requirements of the hashing technique, but Linear Discriminant Analysis (LDA) does. In this paper, we propose to optimize a revised LDA objective over deep networks to learn efficient hashing codes in a truly end-to-end fashion. However, simply optimizing the deep network w.r.t. the LDA objective requires a complicated eigenvalue decomposition within each mini-batch in every epoch. In this work, the revised LDA objective is transformed into a simple least-squares problem, which naturally overcomes this intractability and can be easily solved by an off-the-shelf optimizer. Such a deep extension can also overcome the weaknesses of LDA Hashing, namely its limited linear projection and feature learning. Extensive experiments are conducted on three benchmark datasets. The proposed Deep LDA Hashing shows an improvement of nearly 70 points over the conventional one on the CIFAR-10 dataset. It also beats several state-of-the-art methods on various metrics.

Dense Multimodal Fusion for Hierarchically Joint Representation

Multiple modalities can provide more valuable information than a single one by describing the same contents in various ways. Hence, it is highly desirable to learn an effective joint representation by fusing the features of different modalities. However, previous methods mainly focus on fusing the shallow features or high-level representations generated by unimodal deep networks, which only capture part of the hierarchical correlations across modalities. In this paper, we propose to densely integrate the representations by greedily stacking multiple shared layers between different modality-specific networks, an approach we call Dense Multimodal Fusion (DMF). The joint representations in different shared layers can capture the correlations at different levels, and the connection between shared layers also provides an efficient way to learn the dependence among hierarchical correlations. These two properties jointly contribute to the multiple learning paths in DMF, which results in faster convergence, lower training loss, and better performance. We evaluate our model on three typical multimodal learning tasks, including audiovisual speech recognition, cross-modal retrieval, and multimodal classification. The noticeable performance in the experiments demonstrates that our model can learn a more effective joint representation.

POLO: a POLicy-based Optimization library

Probabilistic Argumentation and Information Algebras of Probability Potentials on Families of Compatible Frames

Probabilistic argumentation is an alternative to causal modeling with Bayesian networks. Probabilistic argumentation structures (PAS) are defined on families of compatible frames (f.c.f). This is a generalization of the usual multivariate models based on families of variables. The crucial relation of conditional independence between frames of an f.c.f is introduced and shown to form a quasi-separoid, a weakening of the well-known structure of a separoid. It is shown that PAS generate probability potentials on the frames of the f.c.f. The operations of aggregating different PAS and of transporting a PAS from one frame to another induce an algebraic structure on the family of potentials on the f.c.f, an algebraic structure which is similar to the valuation algebras related to Bayesian networks, but more general. As a consequence, the well-known local computation architectures of Bayesian networks for inference also apply to the potentials on f.c.f. Conditioning and conditionals can be defined for potentials, and it is shown that these concepts satisfy properties similar to those of conditional probability distributions. Finally, a max/prod algebra between potentials is defined and applied to find the most probable configurations for a factorization of potentials.

NSGA-NET: A Multi-Objective Genetic Algorithm for Neural Architecture Search

CHOPT : Automated Hyperparameter Optimization Framework for Cloud-Based Machine Learning Platforms

Many hyperparameter optimization (HyperOpt) methods assume restricted computing resources and mainly focus on enhancing performance. Here we propose a novel cloud-based HyperOpt (CHOPT) framework which can efficiently utilize shared computing resources while supporting various HyperOpt algorithms. We incorporate convenient web-based user interfaces, visualization, and analysis tools, enabling users to easily control optimization procedures and build up valuable insights with an iterative analysis procedure. Furthermore, our framework can be incorporated with any cloud platform, thus complementarily increasing the efficiency of conventional deep learning frameworks. We demonstrate applications of CHOPT with tasks such as image recognition and question-answering, showing that our framework can find hyperparameter configurations competitive with previous work. We also show that CHOPT is capable of providing interesting observations through its analysis tools.

Stein Neural Sampler

We propose two novel samplers to produce high-quality samples from a given (un-normalized) probability density. The sampling is achieved by transforming a reference distribution to the target distribution with neural networks, which are trained separately by minimizing two kinds of Stein discrepancies; hence our method is named the Stein neural sampler. Theoretical and empirical results suggest that, compared with traditional sampling schemes, our samplers share the following three advantages: 1. being asymptotically correct; 2. experiencing fewer convergence issues in practice; 3. generating samples instantaneously.

Meta-Learning: A Survey

Meta-learning, or learning to learn, is the science of systematically observing how different machine learning approaches perform on a wide range of learning tasks, and then learning from this experience, or meta-data, to learn new tasks much faster than otherwise possible. Not only does this dramatically speed up and improve the design of machine learning pipelines or neural architectures, it also allows us to replace hand-engineered algorithms with novel approaches learned in a data-driven way. In this chapter, we provide an overview of the state of the art in this fascinating and continuously evolving field.

Towards Robot-Centric Conceptual Knowledge Acquisition

Robots require knowledge about objects in order to efficiently perform various household tasks involving objects. The existing knowledge bases for robots acquire symbolic knowledge about objects from manually-coded external common sense knowledge bases such as ConceptNet, WordNet, etc. The problem with such approaches is the discrepancy between human-centric symbolic knowledge and robot-centric object perception, which arises from the robot’s limited perception capabilities. Ultimately, a significant portion of the knowledge in the knowledge base remains ungrounded in the robot’s perception. To overcome this discrepancy, we propose an approach to enable robots to generate robot-centric symbolic knowledge about objects from their own sensory data, thus allowing them to assemble their own conceptual understanding of objects. With this goal in mind, the presented paper elaborates on the work-in-progress of the proposed approach, followed by the preliminary results.

Parallelisation of a Common Changepoint Detection Method

In recent years, various means of efficiently detecting changepoints in the univariate setting have been proposed, with one popular approach involving minimising a penalised cost function using dynamic programming. In some situations, these algorithms can have an expected computational cost that is linear in the number of data points; however, the worst case cost remains quadratic. We introduce two means of improving the computational performance of these methods, both based on parallelising the dynamic programming approach. We establish that parallelisation can give substantial computational improvements: in some situations the computational cost decreases roughly quadratically in the number of cores used. These parallel implementations are no longer guaranteed to find the true minimum of the penalised cost; however, we show that they retain the same asymptotic guarantees in terms of their accuracy in estimating the number and location of the changes.
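To make the serial baseline concrete, here is a minimal sketch of minimising a penalised cost by dynamic programming (often called optimal partitioning), using a squared-error segment cost for a change in mean. It is the unpruned, single-core version and therefore always quadratic in the number of data points; it is not one of the parallel schemes the abstract introduces. The function name, the choice of cost, and the penalty value are assumptions for illustration.

```python
def optimal_partitioning(data, penalty):
    """Return changepoint positions minimising segment costs plus a penalty.

    Serial O(n^2) dynamic programme with a squared-error cost (a sketch).
    A returned index t means a new segment starts at data[t].
    """
    n = len(data)
    # Prefix sums of x and x^2 give each segment cost in constant time.
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(data):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def cost(a, b):
        # Residual sum of squares of data[a:b] around its own mean.
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    best = [0.0] * (n + 1)   # best[t]: minimal penalised cost of data[:t]
    best[0] = -penalty       # so the first segment carries no penalty
    last = [0] * (n + 1)     # last[t]: start of the final segment of data[:t]
    for t in range(1, n + 1):
        best[t], last[t] = min(
            (best[s] + cost(s, t) + penalty, s) for s in range(t)
        )
    # Backtrack through the stored segment starts.
    changes = []
    t = n
    while t > 0:
        t = last[t]
        if t > 0:
            changes.append(t)
    return sorted(changes)


print(optimal_partitioning([0.0] * 5 + [10.0] * 5, penalty=1.0))
```

The inner minimisation over all earlier positions `s` is the step that makes the worst-case cost quadratic, and it is the natural place where pruning or splitting the work across cores would be applied.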

• Loop conditions with strongly connected graphs
• Phonology-Augmented Statistical Framework for Machine Transliteration using Limited Linguistic Resources
• Network Response Regression for Modeling Population of Networks with Covariates
• Optimal Policies for Status Update Generation in a Wireless System with Heterogeneous Traffic
• Image Completion on CIFAR-10
• Replica symmetry breaking in multi-species Sherrington–Kirkpatrick model
• Clustering indices and decay of correlations in non-Markovian models
• Principled Deep Neural Network Training through Linear Programming
• Recovering Quantized Data with Missing Information Using Bilinear Factorization and Augmented Lagrangian Method
• Rethinking Recurrent Latent Variable Model for Music Composition
• Analytical Convergence Regions of Accelerated First-Order Methods in Nonconvex Optimization under Regularity Condition
• Towards Gradient Free and Projection Free Stochastic Optimization
• A look at the topology of convolutional neural networks
• Diagnosing Convolutional Neural Networks using their Spectral Response
• High-quality Ellipse Detection Based on Arc-support Line Segments
• Honeycomb tessellations and canonical bases for permutohedral blades
• Triple Attention Mixed Link Network for Single Image Super Resolution
• Deep Diffeomorphic Normalizing Flows
• Visually Communicating and Teaching Intuition for Influence Functions
• Modelling brain-wide neuronal morphology via rooted Cayley trees
• Toward Understanding the Impact of Staleness in Distributed Machine Learning
• Liouville type results for system of equations involving fractional Laplacian in the exterior domain
• Point processes, hole events, and large deviations: random complex zeros and Coulomb gases
• Patient-Specific 3D Volumetric Reconstruction of Bioresorbable Stents: A Method to Generate 3D Geometries for Computational Analysis of Coronaries Treated with Bioresorbable Stents
• Light-Weight RefineNet for Real-Time Semantic Segmentation
• Query Tracking for E-commerce Conversational Search: A Machine Comprehension Perspective
• TV-regularized CT Reconstruction and Metal Artifact Reduction Using Inequality Constraints with Preconditioning
• Optimizing Waiting Thresholds Within A State Machine
• Modeling Brain Connectivity with Graphical Models on Frequency Domain
• Noise-synchronizability of opinion dynamics
• Guiding Intelligent Surveillance System by learning-by-synthesis gaze estimation
• Sanity Checks for Saliency Maps
• Multi-Stream Opportunistic Network Decoupling: Relay Selection and Interference Management
• Spanning trees in random graphs
• Complete minors and stability numbers
• Remote State Estimation with Stochastic Event-triggered Sensor Schedule in the Presence of Packet Drops
• Distributed Consensus over Markovian Packet Loss Channels
• Tilting maximum Lq-Likelihood estimation for extreme values drawing on block maxima
• Resistance distance and Kirchhoff index in generalized R-vertex and R-edge corona for graphs
• Practical Implementation of Memristor-Based Threshold Logic Gates
• Monitoring of Low Voltage Distribution Grid Considering the Neutral Conductor
• Support Localization and the Fisher Metric for off-the-grid Sparse Regularization
• On the semiclassical spectrum of the Dirichlet-Pauli operator
• Bounded Collision Force by the Sobolev Norm: Compliance and Control for Interactive Robots
• Strong Sard Conjecture and regularity of singular minimizing geodesics for analytic sub-Riemannian structures in dimension 3
• Evaluating regulatory reform of network industries: a survey of empirical models based on categorical proxies
• Multi-Task Learning for Domain-General Spoken Disfluency Detection in Dialogue Systems
• Multiply robust two-sample instrumental variable estimation
• Survey of Consensus Protocols
• Algorithms for local minimization of 3D molecules OPLS force field
• A Survey on Periocular Biometrics Research
• Two-Stage Consensus-Based Distributed MPC for Interconnected Microgrids
• Split-Correctness in Information Extraction
• Empirical Bounds on Linear Regions of Deep Rectifier Networks
• Decentralized Multi-Antenna Coded Caching with Cyclic Exchanges
• On the discrepancy of random low degree set systems
• Training Passive Photonic Reservoirs with Integrated Optical Readout
• Deep learning cardiac motion analysis for human survival prediction
• Sharp threshold phenomena in statistical physics
• Consistent Query Answering for Primary Keys in Logspace
• On Breiman’s Dilemma in Neural Networks: Phase Transitions of Margin Dynamics
• Constant Time Quantum search Algorithm Over A Datasets: An Experimental Study Using IBM Q Experience
• Hierarchical clustering that takes advantage of both density-peak and density-connectivity
• 1-Safe Petri nets and special cube complexes: equivalence and applications
• BSDEs with monotone generator and two irregular reflecting barriers
• Deep calibration of rough stochastic volatility models
• PCI-MDR: Missing Data Recovery in Wireless Sensor Networks using Partial Canonical Identity Matrix
• Reflected BSDEs with monotone generator
• Singular Graphs on which the Dihedral Group Acts Vertex Transitively
• On the Domination Number of Permutation Graphs and an Application to Strong Fixed Points
• Robust 6D Object Pose Estimation in Cluttered Scenes using Semantic Segmentation and Pose Regression Networks
• Non-equilibrium fluctuations for a reaction-diffusion model via relative entropy
• Unique Metric for Health Analysis with Optimization of Clustering Activity and Cross Comparison of Results from Different Approach
• Maximum reciprocal degree resistance distance index of unicyclic graphs
• MRI Super-Resolution using Multi-Channel Total Variation
• Distributed Hypothesis Testing with Collaborative Detection
• The age-dependent random connection model
• Cross Script Hindi English NER Corpus from Wikipedia
• Characterization of minimizable Lagrangian action functionals and a dual Mather theorem
• State of the Art Optical Character Recognition of 19th Century Fraktur Scripts using Open Source Engines
• Probabilistic Solutions To Ordinary Differential Equations As Non-Linear Bayesian Filtering: A New Perspective
• Plethysms of symmetric functions and highest weight representations
• Compatible Matrices of Spearman’s Rank Correlation
• Towards the Latent Transcriptome
• Smallest cyclically covering subspaces of $\mathbb{F}_q^n$
• Two-impurity-entanglement generation by electron scattering in zigzag phosphorene nanoribbons
• Security Analysis of Deep Neural Networks Operating in the Presence of Cache Side-Channel Attacks
• A Droplet Approach Based on Raptor Codes for Distributed Computing With Straggling Servers
• An Improved Algorithm for Incremental Cycle Detection and Topological Ordering in Sparse Graphs
• The Influence of Canyon Shadowing on Device-to-Device Connectivity in Urban Scenario
• A scalable parallel finite element framework for growing geometries. Application to metal additive manufacturing
• Fundamental Limits of Covert Bit Insertion in Packets
• Time-Message Trade-Offs in Distributed Algorithms
• Koszulness and supersolvability for Dirichlet arrangements
• A Vertical PRF Architecture for Microblog Search
• Trace Quotient with Sparsity Priors for Learning Low Dimensional Image Representations
• Effective Parallelisation for Machine Learning
• Formal inverses of the generalized Thue-Morse sequences and variations of the Rudin-Shapiro sequence
• Estimation of the weighted integrated square error of the Grenander estimator by the Kolmogorov-Smirnov statistic
• Combinatorial Attacks on Binarized Neural Networks
• The projected mass distribution and the transition to homogeneity
• An AMR Aligner Tuned by Transition-based Parser
• Approximate Online Pattern Matching in Sub-linear Time
• Zero-Resource Multilingual Model Transfer: Learning What to Share
• ISS Property with Respect to Boundary Disturbances for a Class of Riesz-Spectral Boundary Control Systems
• The equivalence between two classic algorithms for the assignment problem
• The convergence rate of a golden ratio algorithm for equilibrium problems
• Bi-pruned Hurwitz numbers
• Bootstrapped CNNs for Building Segmentation on RGB-D Aerial Imagery
• An Upper Bound for Palindromic and Factor Complexity of Rich Words
• Fine-scale spatiotemporal air pollution analysis using mobile monitors on Google Street View vehicles
• Long ties accelerate noisy threshold-based contagions
• Busemann functions and Gibbs measures in directed polymer models on $\mathbb{Z}^2$
• Improving the Transformer Translation Model with Document-Level Context
• Simultaneous Small Noise Limit for Singularly Perturbed Slow-Fast Coupled Diffusions
• Hierarchical segmentation using equivalence test (HiSET): Application to DCE image sequences
• Algorithmic Aspects of Inverse Problems Using Generative Models
• ReLU Regression: Complexity, Exact and Approximation Algorithms
• Proximal Online Gradient is Optimum for Dynamic Regret
• End-to-End Text Classification via Image-based Embedding using Character-level Networks
• SFV: Reinforcement Learning of Physical Skills from Videos
