What's new on arXiv

An Introductory Survey on Attention Mechanisms in NLP Problems

First derived from human intuition and later adapted to machine translation for automatic token alignment, the attention mechanism, a simple method for encoding sequence data based on the importance score assigned to each element, has been widely applied to and has attained significant improvements in various natural language processing tasks, including sentiment classification, text summarization, question answering, and dependency parsing. In this paper, we survey recent works and give an introductory summary of the attention mechanism in different NLP problems, aiming to provide our readers with basic knowledge of this widely used method, discuss its different variants for different tasks, explore its association with other techniques in machine learning, and examine methods for evaluating its performance.
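As a rough illustration of the idea in this abstract (not code from the survey), the sketch below encodes a sequence as a weighted average of its elements, with weights obtained from importance scores. The dot-product scoring function, the softmax normalization, and the names `attend`, `keys`, and `values` are assumptions chosen for the example.

```python
import math

def softmax(scores):
    """Turn raw importance scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Encode a sequence as a weighted average of its value vectors."""
    # Dot-product score between the query and each key (assumed scoring function).
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context

# Toy sequence of three elements with 2-dimensional keys and values.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
weights, context = attend([1.0, 0.0], keys, values)
```

Elements whose keys align with the query receive larger weights, so they contribute more to the encoded `context` vector.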

Interactive dimensionality reduction using similarity projections

Recent advances in machine learning allow us to analyze and describe the content of high-dimensional data like text, audio, images or other signals. To visualize such data in 2D or 3D, Dimensionality Reduction (DR) techniques are usually employed. Most of these techniques, e.g., PCA or t-SNE, produce static projections without taking into account corrections from humans or other data exploration scenarios. In this work, we propose the interactive Similarity Projection (iSP), a novel interactive DR framework based on similarity embeddings, where we form a differentiable objective based on the user interactions and perform learning using gradient descent, with an end-to-end trainable architecture. Two interaction scenarios are evaluated. First, a common methodology in multidimensional projection is to project a subset of the data, arrange the points in classes or clusters, and project the remaining unseen data based on that manipulation, in a kind of semi-supervised interpolation. We report results that outperform competitive baselines on a wide range of metrics and datasets. Second, we explore the scenario of manipulating some classes while enriching the optimization with high-dimensional neighbor information. Apart from improving classification precision and clustering on images and text documents, the newly emerging structure of the projection unveils semantic manifolds. For example, on the Head Pose dataset, by just dragging the faces looking far left to the left and those looking far right to the right, all faces are re-arranged on a continuum, even on the vertical axis (face up and down). This end-to-end framework can be used for fast, visual semi-supervised learning, manifold exploration, interactive domain adaptation of neural embeddings and transfer learning.

Cross-lingual Short-text Matching with Deep Learning

The problem of short text matching is formulated as follows: given a pair of sentences or questions, a matching model determines whether the two inputs have the same meaning. Models that can automatically identify questions with the same meaning have a wide range of applications in question answering sites and modern chatbots. In this article, we describe the approach by team hahu to solve this problem in the context of the ‘CIKM AnalytiCup 2018 – Cross-lingual Short-text Matching of Question Pairs’, which is sponsored by Alibaba. Our solution is an end-to-end system based on current advances in deep learning that avoids heavy feature engineering and achieves improved performance over traditional machine-learning approaches. The log-loss scores for the first and second rounds of the contest are 0.35 and 0.39 respectively. The team ranked 7th out of 1027 teams in the organizers' overall ranking scheme, which consisted of the two contest scores as well as innovation and system integrity, understanding of the data, and practicality of the solution for business.
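For readers unfamiliar with the metric quoted in this abstract, here is a minimal sketch of binary log-loss, assuming label 1 means "same meaning" and the model outputs a probability for that label. The function name, the clipping constant `eps`, and the toy data are illustrative assumptions, not the contest's evaluation code.

```python
import math

def log_loss(labels, probs, eps=1e-15):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)

# Hypothetical predictions for four question pairs.
labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.6, 0.4]
score = log_loss(labels, probs)
```

Lower values are better; a model that always predicts 0.5 scores ln 2, roughly 0.693.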

Aequitas: A Bias and Fairness Audit Toolkit

Recent work has raised concerns about the risk of unintended bias in the algorithmic decision making systems in use today, which can affect individuals unfairly based on race, gender or religion, among other possible characteristics. While many bias metrics and fairness definitions have been proposed in recent years, there is no consensus on which metric or definition should be used, and there are very few available resources to operationalize them. Therefore, despite recent awareness, auditing for bias and fairness when developing and deploying algorithmic decision making systems is not yet standard practice. We present Aequitas, an open source bias and fairness audit toolkit that is an intuitive and easy-to-use addition to the machine learning workflow, enabling users to seamlessly test models for several bias and fairness metrics in relation to multiple population sub-groups. We believe Aequitas will facilitate informed and equitable decisions around developing and deploying algorithmic decision making systems for data scientists, machine learning researchers and policymakers alike.
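To make the notion of a per-group bias metric concrete, the sketch below computes the false positive rate for each population sub-group and its ratio to a reference group. It deliberately does not use the Aequitas API; the function names, the choice of metric, and the toy records are assumptions for illustration only.

```python
from collections import defaultdict

def false_positive_rate_by_group(records):
    """records: iterable of (group, true_label, predicted_label) with 0/1 labels."""
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for group, y_true, y_pred in records:
        if y_true == 0:
            negatives[group] += 1
            if y_pred == 1:
                false_positives[group] += 1
    # Groups without any true negatives are skipped (rate undefined).
    return {g: false_positives[g] / n for g, n in negatives.items() if n > 0}

def disparity(rates, reference_group):
    """Ratio of each group's rate to the reference group's rate."""
    ref = rates[reference_group]
    if ref == 0:
        raise ValueError("reference group rate is zero; ratio is undefined")
    return {g: r / ref for g, r in rates.items()}

# Hypothetical audit data: (group, true label, predicted label).
records = [
    ("a", 0, 0), ("a", 0, 0), ("a", 0, 1), ("a", 0, 0), ("a", 1, 1),
    ("b", 0, 1), ("b", 0, 1), ("b", 0, 0), ("b", 0, 0), ("b", 1, 0),
]
rates = false_positive_rate_by_group(records)
ratios = disparity(rates, reference_group="a")
```

A ratio far from 1 for some group suggests the model's errors fall unevenly across groups under this particular metric; other metrics can disagree, which is the lack of consensus the abstract mentions.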

Emergence of Addictive Behaviors in Reinforcement Learning Agents

This paper presents a novel approach to the technical analysis of wireheading in intelligent agents. Inspired by the natural analogues of wireheading and their prevalent manifestations, we propose modeling this phenomenon in Reinforcement Learning (RL) agents as psychological disorders. In a preliminary step towards evaluating this proposal, we study the feasibility and dynamics of emergent addictive policies in Q-learning agents in the tractable environment of the game of Snake. We consider a slightly modified setting for this game, in which the environment provides a ‘drug’ seed alongside the original ‘healthy’ seed for the consumption of the snake. We adopt and extend an RL-based model of natural addiction to Q-learning agents in this setting, and derive sufficient parametric conditions for the emergence of addictive behaviors in such agents. Furthermore, we evaluate our theoretical analysis with three sets of simulation-based experiments. The results demonstrate the feasibility of addictive wireheading in RL agents, and provide promising avenues of further research on the psychopathological modeling of complex AI safety problems.
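For context, the standard tabular Q-learning update that such agents build on can be sketched as follows. This is the textbook rule only, not the paper's addiction model; the state names, action names, reward value, and hyperparameters are hypothetical.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one Q-learning step in place and return the temporal-difference error."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return td_error

# Hypothetical two-action setting loosely mirroring the two seed types.
q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
actions = ["eat_healthy_seed", "eat_drug_seed"]
td = q_update(q, "s0", "eat_healthy_seed", reward=1.0, next_state="s1", actions=actions)
```

Each update moves the estimate for the chosen action toward the observed reward plus the discounted value of the best next action.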

Controllability, Multiplexing, and Transfer Learning in Networks using Evolutionary Learning

Networks are fundamental building blocks for representing data and computations. Remarkable progress in learning in structurally defined (shallow or deep) networks has recently been achieved. Here we introduce an evolutionary exploratory search and learning method for topologically flexible networks under the constraint of producing elementary computational steady-state input-output operations. Our results include the following. (1) We identify networks, over four orders of magnitude, that implement the computation of steady-state input-output functions, such as a band-pass filter, a threshold function, and an inverse band-pass function. (2) The learned networks are technically controllable, as only a small number of driver nodes are required to move the system to a new state. Furthermore, we find that the fraction of required driver nodes is constant during evolutionary learning, suggesting a stable system design. (3) Our framework allows multiplexing of different computations using the same network. For example, using a binary representation of the inputs, the network can readily compute three different input-output functions. (4) The proposed evolutionary learning demonstrates transfer learning: if the system learns one function A, then learning B requires on average fewer steps than learning B from tabula rasa. We conclude that constrained evolutionary learning produces large, robust, controllable circuits capable of multiplexing and transfer learning. Our study suggests that network-based computations of steady-state functions, representing either cellular modules of cell-to-cell communication networks or internal molecular circuits communicating within a cell, could be a powerful model for biologically inspired computing. This complements conceptualizations such as attractor-based models or reservoir computing.

An Introduction to Fuzzy & Annotated Semantic Web Languages

We present the state of the art in representing and reasoning with fuzzy knowledge in Semantic Web Languages such as the triple languages RDF/RDFS, the conceptual languages of the OWL 2 family, and rule languages. We further show how one may generalise them to so-called annotation domains, which also cover, e.g., temporal and provenance extensions.

Deep Bayesian Inversion

Characterizing statistical properties of solutions of inverse problems is essential for decision making. Bayesian inversion offers a tractable framework for this purpose, but current approaches are computationally infeasible for most realistic imaging applications in the clinic. We introduce two novel deep-learning-based methods for solving large-scale inverse problems using Bayesian inversion: a sampling-based method using a WGAN with a novel mini-discriminator, and a direct approach that trains a neural network using a novel loss function. The performance of both methods is demonstrated on image reconstruction in ultra-low-dose 3D helical CT. We compute the posterior mean and standard deviation of the 3D images, followed by a hypothesis test to assess whether a ‘dark spot’ in the liver of a cancer-stricken patient is present. Both methods are computationally efficient, and our evaluation shows very promising performance that clearly supports the claim that Bayesian inversion is usable for 3D imaging in time-critical applications.

Multiscale change point detection for dependent data

In this paper we study the theoretical properties of the simultaneous multiscale change point estimator (SMUCE) proposed by Frick et al. (2014) in regression models with dependent error processes. Empirical studies show that in this case the change point estimate is inconsistent, but it is not known if alternatives suggested in the literature for correlated data are consistent. We propose a modification of SMUCE scaling the basic statistic by the long run variance of the error process, which is estimated by a difference-type variance estimator calculated from local means from different blocks. For this modification we prove model consistency for physical dependent error processes and illustrate the finite sample performance by means of a simulation study.

Age of Information Scaling in Large Networks

Composing Modeling and Inference Operations with Probabilistic Program Combinators

Probabilistic programs with dynamic computation graphs can define measures over sample spaces with unbounded dimensionality, and thereby constitute programmatic analogues to Bayesian nonparametrics. Owing to the generality of this model class, inference relies on ‘black-box’ Monte Carlo methods that are generally not able to take advantage of conditional independence and exchangeability, which have historically been the cornerstones of efficient inference. We here seek to develop a ‘middle ground’ between probabilistic models with fully dynamic and fully static computation graphs. To this end, we introduce a combinator library for the Probabilistic Torch framework. Combinators are functions that accept models and return transformed models. We assume that models are dynamic, but that model composition is static, in the sense that combinator application takes place prior to evaluating the model on data. Combinators provide primitives for both model and inference composition. Model combinators take the form of classic functional programming constructs such as map and reduce. These constructs define a computation graph at a coarsened level of representation, in which nodes correspond to models, rather than individual variables. Inference combinators – such as enumeration, importance resampling, and Markov Chain Monte Carlo operators – assume a sampling semantics for model evaluation, in which application of combinators preserves proper weighting. Owing to this property, models defined using combinators can be trained using stochastic methods that optimize either variational or wake-sleep style objectives. As a validation of this principle, we use combinators to implement black box inference for hidden Markov models.
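The combinator idea in this abstract can be sketched in plain Python, without the Probabilistic Torch library. In this sketch a "model" is assumed to be a function returning a sampled value and a log importance weight; `gaussian_obs`, `map_model`, and `importance_resample` are hypothetical names, and the resampling combinator returns the log of the average weight so that the result stays properly weighted.

```python
import math
import random

def gaussian_obs(mean, obs, rng):
    """Toy model: sample z ~ N(mean, 1) and weight it by the density of obs under N(z, 1)."""
    z = rng.gauss(mean, 1.0)
    log_w = -0.5 * (obs - z) ** 2 - 0.5 * math.log(2 * math.pi)
    return z, log_w

def map_model(model):
    """Model combinator: lift a per-item model to a model over a list of items."""
    def mapped(mean, observations, rng):
        results = [model(mean, obs, rng) for obs in observations]
        values = [v for v, _ in results]
        log_w = sum(lw for _, lw in results)  # independent items: log weights add
        return values, log_w
    return mapped

def importance_resample(model, num_particles):
    """Inference combinator: run the model several times, resample one result by weight."""
    def resampled(mean, data, rng):
        particles = [model(mean, data, rng) for _ in range(num_particles)]
        log_ws = [lw for _, lw in particles]
        m = max(log_ws)
        ws = [math.exp(lw - m) for lw in log_ws]  # shifted for numerical stability
        value = rng.choices([v for v, _ in particles], weights=ws, k=1)[0]
        log_mean_w = m + math.log(sum(ws) / num_particles)
        return value, log_mean_w
    return resampled

# Composition is static: combinators are applied before the model sees any data.
model = importance_resample(map_model(gaussian_obs), num_particles=50)
rng = random.Random(0)
values, log_w = model(0.0, [0.5, -0.2, 1.1], rng)
```

Because each combinator takes a model and returns a model with the same calling convention, they can be nested in any order.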

• Char2char Generation with Reranking for the E2E NLG Challenge
• Parser Extraction of Triples in Unstructured Text
• Construction of an algebra corresponding to a statistical model of the square ladder (square lattice with two lines)
• Internal Wiring of Cartesian Verbs and Prepositions
• Native Language Identification using i-vector
• Non-intrusive model reduction of static parametric non-linear systems and application to global optimization and uncertainty quantification
• Conformal Bootstrap Analysis for Localization: Symplectic Case
• Towards Neural Machine Translation for African Languages
• Jointly identifying opinion mining elements and fuzzy measurement of opinion intensity to analyze product features
• Learning to Compensate Photovoltaic Power Fluctuations from Images of the Sky by Imitating an Optimal Policy
• Two-stream convolutional networks for end-to-end learning of self-driving cars
• Few-shot Learning for Named Entity Recognition in Medical Text
• Optimal Scalar Linear Index Codes for Symmetric and Neighboring Side-information Problems
• ML-Net: multi-label classification of biomedical texts with deep neural networks
• On the number of sets with a given doubling constant
• ROMAN: Reduced-Order Modeling with Artificial Neurons
• Sampling from manifold-restricted distributions using tangent bundle projections
• Phase transition for the frog model on biregular trees
• Novel Inter-file Coded Placement and D2D Delivery for a Cache-aided Fog-RAN Architecture
• A combinatorial $\mathfrak{sl}_2$-action and the Sperner property for the weak order
• Random periodic solutions and ergodicity for stochastic differential equations
• Many cusped hyperbolic 3-manifolds do not bound geometrically
• A geometric study of Strassen’s asymptotic rank conjecture and its variants
• Evaluating GANs via Duality
• A survey on graphs with convex quadratic stability number
• Wavelet Based Dictionaries for Dimensionality Reduction of ECG Signals
• Deep Q learning for fooling neural networks
• Semi-dual Regularized Optimal Transport
• Heuristic Voting as Ordinal Dominance Strategies
• Towards Characterising Bayesian Network Models under Selection
• Robust Dynamic CPU Resource Provisioning in Virtualized Servers
• Staging Human-computer Dialogs: An Application of the Futamura Projections
• Data Driven Governing Equations Approximation Using Deep Neural Networks
• Discourse in Multimedia: A Case Study in Information Extraction
• An Analysis of the Semantic Annotation Task on the Linked Data Cloud
• Neural Wavetable: a playable wavetable synthesizer using neural networks
• Corpus Phonetics Tutorial
• Identification of semiparametric discrete outcome models with bounded covariates
• What is really needed to justify ignoring the response mechanism for modelling purposes?
• A New SVDD-Based Multivariate Non-parametric Process Capability Index
• Text Assisted Insight Ranking Using Context-Aware Memory Network
• Region-Referenced Spectral Power Dynamics of EEG Signals: A Hierarchical Modeling Approach
• Estimation of High-Dimensional Seemingly Unrelated Regression Models
• Consensus and Sectioning-based ADMM with Norm-1 Regularization for Imaging with a Compressive Reflector Antenna
• An Overview of Semiparametric Extensions of Finite Mixture Models
• Improving constant in end-point Poincaré inequality on Hamming cube
• YOLO-LITE: A Real-Time Object Detection Algorithm Optimized for Non-GPU Computers
• TrolleyMod v1.0: An Open-Source Simulation and Data-Collection Platform for Ethical Decision Making in Autonomous Vehicles
• Transform Methods for Heavy-Traffic Analysis
• Extractive Summary as Discrete Latent Variables
• Boundary Braids
• Fokker-Planck equations for nonlinear dynamical systems driven by multiplicative $α$-stable Lévy motions
• Central limit theorem and moderate deviations for a class of semilinear SPDES
• Bayesian Reinforcement Learning in Factored POMDPs
• SepNE: Bringing Separability to Network Embedding
• Improving Distantly Supervised Relation Extraction with Neural Noise Converter and Conditional Optimal Selector
• Style and Content Disentanglement in Generative Adversarial Networks
• A Game Theoretic Approach for Dynamic Information Flow Tracking to Detect Multi-Stage Advanced Persistent Threats
• Multi-Winner Contests for Strategic Diffusion in Social Networks
• How Drones Look: Crowdsourced Knowledge Transfer for Aerial Video Saliency Prediction
• A framework for covert and secret key expansion over quantum channels
• Memory-Efficient Quantum Circuit Simulation by Using Lossy Data Compression
• Translating a Math Word Problem to an Expression Tree
• On the Capacity of MISO Channels with One-Bit ADCs and DACs
• Gaussian Reciprocal Sequences from the Viewpoint of Conditionally Markov Sequences
• Dropping Symmetry for Fast Symmetric Nonnegative Matrix Factorization
• Fast Distribution Grid Line Outage Identification with $μ$PMU
• Analysis of Gaussian Spatial Models with Covariate Measurement Error
• Submodular Optimization Over Streams with Inhomogeneous Decays
• Sample complexity of partition identification using multi-armed bandits
• MT-CGCNN: Integrating Crystal Graph Convolutional Neural Network with Multitask Learning for Material Property Prediction
• Cutting resilient networks — complete binary trees
• Saddlepoint adjusted inversion of characteristic functions
• Off-grid Variational Bayesian Inference of Line Spectral Estimation from One-bit Samples
• Modeling Coherence for Discourse Neural Machine Translation
• Layout Design for Intelligent Warehouse by Evolution with Fitness Approximation
• Melodic Phrase Segmentation By Deep Neural Networks
• Leveraging Aspect Phrase Embeddings for Cross-Domain Review Rating Prediction
• Efficient and Scalable Multi-task Regression on Massive Number of Tasks
• Generating Multiple Diverse Responses for Short-Text Conversation
• Preventive Equipment Repair Planning Model
• A Radiomics Approach to Traumatic Brain Injury Prediction in CT Scans
• Plan-And-Write: Towards Better Automatic Storytelling
• AMGCL: an Efficient, Flexible, and Extensible Algebraic Multigrid Implementation
• An analysis of a fair division protocol for drawing legislative districts
• Plateau Polycubes and Lateral Area
• From Free Text to Clusters of Content in Health Records: An Unsupervised Graph Partitioning Approach
• A Deterministic Algorithm for Bridging Anaphora Resolution
• Universal Polarization for Processes with Memory
• Acyclic subgraphs with high chromatic number
• Lattice paths and submonoids of $\mathbb Z^2$
• Optimal stopping of Brownian motion with broken drift
• Neural Based Statement Classification for Biased Language
• Stochastic Algorithmic Differentiation of (Expectations of) Discontinuous Functions (Indicator Functions)
• Measuring Road Network Topology Vulnerability by Ricci Curvature
• Rice-Marlin Codes: Tiny and Efficient Variable-to-Fixed Codes
• Space-time localisation for the dynamic $Φ^4_3$ model
• SLIM: Simultaneous Logic-in-Memory Computing Exploiting Bilayer Analog OxRAM Devices
• LoANs: Weakly Supervised Object Detection with Localizer Assessor Networks
• Creatures great and SMAL: Recovering the shape and motion of animals from video
• Robustness of spectral methods for community detection
• Groups with few maximal sum-free sets
• ProstateGAN: Mitigating Data Bias via Prostate Diffusion Imaging Synthesis with Generative Adversarial Networks
• Distortion Robust Image Classification with Deep Convolutional Neural Network based on Discrete Cosine Transform
• Statistical post-processing of dual-resolution ensemble forecasts
• Revisiting Projection-Free Optimization for Strongly Convex Constraint Sets
• Time-Varying Isotropic Vector Random Fields on Compact Two-Point Homogeneous Spaces
• A Learning-Based Framework for Line-Spectra Super-resolution
• A combinatorial classification of 2-regular simple modules for Nakayama algebras
• A structural characterization of tree-based phylogenetic networks
• Drop-Activation: Implicit Parameter Reduction and Harmonic Regularization
• Predicting the time-evolution of multi-physics systems with sequence-to-sequence models
• Robust low-rank multilinear tensor approximation for a joint estimation of the multilinear rank and the loading matrices
• Pitfalls of Graph Neural Network Evaluation
• Large-scale Interactive Recommendation with Tree-structured Policy Gradient
• Design of Spectrally Shaped Binary Sequences via Randomized Convex Relaxation
• Experimental 3D Coherent Diffractive Imaging from photon-sparse random projections
• Dependency Grammar Induction with a Neural Variational Transition-based Parser
• Data-Enabled Predictive Control: In the Shallows of the DeePC
• Reduced Order Controller Design for Robust Output Regulation of Parabolic Systems
• Development of Real-time ADAS Object Detector for Deployment on CPU
• QUENN: QUantization Engine for low-power Neural Networks
• A Simulated Cyberattack on Twitter: Assessing Partisan Vulnerability to Spear Phishing and Disinformation ahead of the 2018 U.S. Midterm Elections
• The ADAPT System Description for the IWSLT 2018 Basque to English Translation Task
• The Greedy Dirichlet Process Filter – An Online Clustering Multi-Target Tracker
• Structural and temporal heterogeneities on networks
• Applications of mesoscopic CLTs in random matrix theory
• A Duality Based 2-Approximation Algorithm for Maximum Agreement Forest
• Matrix rigidity and the ill-posedness of Robust PCA and matrix completion
• Bandana: Using Non-volatile Memory for Storing Deep Learning Models
• SCORE+ for Network Community Detection
• Evolving intrinsic motivations for altruistic behavior
• Streaming Network Embedding through Local Actions
• Deep Nonlinear Non-Gaussian Filtering for Dynamical Systems
• Opinion dynamics with Lotka-Volterra type interactions
• Domain Randomization for Scene-Specific Car Detection and Pose Estimation
• Virtual Net: a Decentralized Architecture for Interaction in Mobile Virtual Worlds
• Mayall: A Framework for Desktop JavaScript Auditing and Post-Exploitation Analysis
• EdgeBench: Benchmarking Edge Computing Platforms
• Jointly Learning to Label Sentences and Tokens
• The exchange-driven growth model: basic properties and longtime behavior
• Pulse radar with FPGA range compression for real time displacement and vibration monitoring
• Strong Feller property for SDEs driven by multiplicative cylindrical stable noise
• Tower Cranes and Supply Points Locating Problem Using CBO, ECBO, and VPS
• No-Frills Human-Object Interaction Detection: Factorization, Appearance and Layout Encodings, and Training Techniques
• Geometry of Gaussian free field sign clusters and random interlacements
