What's new on arXiv

Correction of AI systems by linear discriminants: Probabilistic foundations

Artificial Intelligence (AI) systems sometimes make errors and will continue to make errors from time to time. These errors are usually unexpected and can lead to dramatic consequences. The intensive development of AI and its practical applications makes the problem of errors more important. Total re-engineering of the systems can create new errors and is not always possible because of the resources involved. The important challenge is to develop fast methods to correct errors without damaging existing skills. We formulated the technical requirements for the ‘ideal’ correctors. Such correctors include binary classifiers, which separate the situations with a high risk of errors from the situations where the AI systems work properly. Surprisingly, for essentially high-dimensional data such methods are possible: a simple linear Fisher discriminant can separate the situations with errors from correctly solved tasks even for exponentially large samples. The paper presents the probabilistic basis for fast non-destructive correction of AI systems. A series of new stochastic separation theorems is proven. These theorems provide new instruments for fast non-iterative correction of errors of legacy AI systems. The new approaches become efficient in high dimensions, for correction of high-dimensional systems in a high-dimensional world (i.e. for processing of essentially high-dimensional data by large systems).
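The separation effect described in this abstract can be illustrated with a small experiment. The sketch below is not the paper's method. It assumes one common formalization of Fisher separability for centered, isotropic data: a point x is separated from a point y when (x, y) < alpha * (x, x). The threshold `alpha`, the uniform-cube sampling, and all function names are illustrative assumptions.

```python
import random


def dot(a, b):
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fisher_separable_fraction(points, alpha=0.8):
    """Fraction of points x with dot(x, y) < alpha * dot(x, x) for every other y.

    Assumed criterion (not taken from the abstract): for centered, isotropic
    data, the linear functional y -> dot(x, y) separates x from the rest.
    """
    separable = 0
    for i, x in enumerate(points):
        threshold = alpha * dot(x, x)
        if all(dot(x, y) < threshold for j, y in enumerate(points) if j != i):
            separable += 1
    return separable / len(points)


def sample_cube(n_points, dim, rng):
    """Draw points uniformly from the cube [-1, 1]^dim (centered, isotropic)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_points)]


rng = random.Random(0)
low = fisher_separable_fraction(sample_cube(200, 2, rng))
high = fisher_separable_fraction(sample_cube(200, 100, rng))
print(f"separable fraction, dim=2: {low:.2f}; dim=100: {high:.2f}")
```

In two dimensions most points fail the criterion, while in 100 dimensions essentially every point passes it with a single inner product and no iterative training. That is the kind of behaviour the stochastic separation theorems are said to quantify.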

Learning From Positive and Unlabeled Data: A Survey

Learning from positive and unlabeled data or PU learning is the setting where a learner only has access to positive examples and unlabeled data. The assumption is that the unlabeled data can contain both positive and negative examples. This setting has attracted increasing interest within the machine learning literature as this type of data naturally arises in applications such as medical diagnosis and knowledge base completion. This article provides a survey of the current state of the art in PU learning. It proposes seven key research questions that commonly arise in this field and provides a broad overview of how the field has tried to address them.

Assessing biological models using topological data analysis

We use topological data analysis as a tool to analyze the fit of mathematical models to experimental data. This study is built on data obtained from motion tracking groups of aphids in [Nilsen et al., PLOS One, 2013] and two random walk models that were proposed to describe the data. One model incorporates social interactions between the insects, and the second model is a control model that excludes these interactions. We compare data from each model to data from experiment by performing statistical tests based on three different sets of measures. First, we use time series of order parameters commonly used in collective motion studies. These order parameters measure the overall polarization and angular momentum of the group, and do not rely on a priori knowledge of the models that produced the data. Second, we use order parameter time series that do rely on a priori knowledge, namely average distance to nearest neighbor and percentage of aphids moving. Third, we use computational persistent homology to calculate topological signatures of the data. Analysis of the a priori order parameters indicates that the interactive model better describes the experimental data than the control model does. The topological approach performs as well as these a priori order parameters and better than the other order parameters, suggesting the utility of the topological approach in the absence of specific knowledge of mechanisms underlying the data.
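As a rough illustration of the first kind of measure, the sketch below computes a polarization order parameter for one time frame. It assumes the definition usual in collective motion studies, the norm of the average unit heading vector. The paper's exact normalization may differ, and the function name and 2D velocity format are hypothetical.

```python
import math


def polarization(velocities):
    """Norm of the mean unit heading: 1.0 if all movers are aligned, near 0.0 if disordered.

    `velocities` is a list of (vx, vy) pairs for one time frame. Stationary
    individuals (zero speed) have no heading and are skipped.
    """
    headings = []
    for vx, vy in velocities:
        speed = math.hypot(vx, vy)
        if speed > 0:
            headings.append((vx / speed, vy / speed))
    if not headings:
        return 0.0
    mean_x = sum(h[0] for h in headings) / len(headings)
    mean_y = sum(h[1] for h in headings) / len(headings)
    return math.hypot(mean_x, mean_y)


aligned = polarization([(1.0, 0.0), (2.0, 0.0), (0.5, 0.0)])
opposed = polarization([(1.0, 0.0), (-1.0, 0.0)])
print(aligned, opposed)
```

Evaluating this function frame by frame yields the kind of order-parameter time series that the statistical tests compare between model and experiment.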

A Framework of Transfer Learning in Object Detection for Embedded Systems

A Perceptual Prediction Framework for Self Supervised Event Segmentation

Temporal segmentation of long videos is an important problem that has largely been tackled through supervised learning, often requiring large amounts of annotated training data. In this paper, we tackle the problem of self-supervised temporal segmentation of long videos, which alleviates the need for any supervision. We introduce a self-supervised, predictive learning framework that draws inspiration from cognitive psychology to segment long, visually complex videos into individual, stable segments that share the same semantics. We also introduce a new adaptive learning paradigm that helps reduce the effect of catastrophic forgetting in recurrent neural networks. Extensive experiments on three publicly available datasets (Breakfast Actions, 50 Salads, and INRIA Instructional Videos) show the efficacy of the proposed approach. We show that the proposed approach is able to outperform weakly-supervised and other unsupervised learning approaches by up to 24% and achieves competitive performance compared to fully supervised approaches. We also show that the proposed approach is able to learn highly discriminative features that help improve action recognition when used in a representation learning paradigm.

Characterizing machine learning process: A maturity framework

Academic literature on machine learning modeling fails to address how to make machine learning models work for enterprises. For example, existing machine learning processes cannot address how to define business use cases for an AI application, how to convert business requirements from offering managers into data requirements for data scientists, how to continuously improve AI applications in terms of accuracy and fairness, or how to customize general-purpose machine learning models with industry-, domain-, and use-case-specific data to make them more accurate for specific situations. Making AI work for enterprises requires special considerations, tools, methods and processes. In this paper we present a maturity framework for machine learning model lifecycle management for enterprises. Our framework is a re-interpretation of the software Capability Maturity Model (CMM) for the machine learning model development process. We present a set of best practices from our personal experience of building large-scale real-world machine learning models to help organizations achieve higher levels of maturity independent of their starting point.

The doctrinal paradox: ROC analysis in a probabilistic framework

The doctrinal paradox is analysed from a probabilistic point of view assuming a simple parametric model for the committee’s behaviour. The well-known issue-by-issue and case-by-case majority rules are compared in this model, by means of the concepts of false positive rate (FPR), false negative rate (FNR) and Receiver Operating Characteristics (ROC) space. We also introduce a new rule that we call path-by-path, which is in some sense halfway between the other two. Under our model assumptions, the issue-by-issue rule is shown to be the best of the three according to an optimality criterion based on ROC maps, for all values of the model parameters (committee size and competence of its members), when equal weight is given to FPR and FNR. For unequal weights, the relative goodness of the rules depends on the values of the competence and the weights, in a way which is precisely described. The results are illustrated with some numerical examples.
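For readers unfamiliar with the paradox, the following sketch contrasts the two classical rules on the textbook three-member committee. It assumes two premises and a conjunctive doctrine, so the conclusion holds only if both premises hold. The abstract does not spell out the model, and the function names are illustrative. The path-by-path rule is not shown because the abstract does not define it.

```python
def majority(votes):
    """True when strictly more than half of the boolean votes are True."""
    return 2 * sum(votes) > len(votes)


def issue_by_issue(profile):
    """Take a majority vote on each premise, then apply the doctrine (P and Q)."""
    p_accepted = majority([p for p, _ in profile])
    q_accepted = majority([q for _, q in profile])
    return p_accepted and q_accepted


def case_by_case(profile):
    """Each member applies the doctrine individually; then vote on the conclusions."""
    return majority([p and q for p, q in profile])


# Each tuple is one member's opinion on premises (P, Q).
committee = [(True, True), (True, False), (False, True)]
print(issue_by_issue(committee), case_by_case(committee))
```

Both premises win a 2-to-1 majority, so the issue-by-issue rule accepts the conclusion, yet only one member individually accepts it, so the case-by-case rule rejects it. This disagreement is the paradox that the ROC analysis evaluates.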

TED: Teaching AI to Explain its Decisions

Artificial intelligence systems are being increasingly deployed due to their potential to increase the efficiency, scale, consistency, fairness, and accuracy of decisions. However, as many of these systems are opaque in their operation, there is a growing demand for such systems to provide explanations for their decisions. Conventional approaches to this problem attempt to expose or discover the inner workings of a machine learning model with the hope that the resulting explanations will be meaningful to the consumer. In contrast, this paper suggests a new approach to this problem. It introduces a simple, practical framework, called Teaching Explanations for Decisions (TED), that provides meaningful explanations that match the mental model of the consumer. We illustrate the generality and effectiveness of this approach with two different examples, resulting in highly accurate explanations with no loss of prediction accuracy for these two examples.

Learning Temporal Point Processes via Reinforcement Learning

Social goods, such as healthcare, smart city, and information networks, often produce ordered event data in continuous time. The generative processes of these event data can be very complex, requiring flexible models to capture their dynamics. Temporal point processes offer an elegant framework for modeling event data without discretizing the time. However, the existing maximum-likelihood-estimation (MLE) learning paradigm requires hand-crafting the intensity function beforehand and cannot directly monitor the goodness-of-fit of the estimated model in the process of training. To alleviate the risk of model-misspecification in MLE, we propose to generate samples from the generative model and monitor the quality of the samples in the process of training until the samples and the real data are indistinguishable. We take inspiration from reinforcement learning (RL) and treat the generation of each event as the action taken by a stochastic policy. We parameterize the policy as a flexible recurrent neural network and gradually improve the policy to mimic the observed event distribution. Since the reward function is unknown in this setting, we uncover an analytic and nonparametric form of the reward function using an inverse reinforcement learning formulation. This new RL framework allows us to derive an efficient policy gradient algorithm for learning flexible point process models, and we show that it performs well in both synthetic and real data.

Dynamic Feature Scaling for K-Nearest Neighbor Algorithm

The Nearest Neighbors algorithm is a lazy learning algorithm that approximates predictions with the help of similar existing vectors in the training dataset. The predictions made by the K-Nearest Neighbors algorithm are based on averaging the target values of the spatial neighbors. The selection process for neighbors in the Hermitian space is done with the help of distance metrics such as Euclidean distance, Minkowski distance, and Mahalanobis distance. A majority of the metrics, such as Euclidean distance, are scale variant, meaning that the results can vary with the range of values used for the features. Standard techniques for normalizing the scaling factors are feature scaling methods such as Z-score normalization and Min-Max scaling. Scaling methods uniformly assign equal weights to all the features, which might result in a non-ideal situation. This paper proposes a novel method to assign weights to individual features with the help of out-of-bag errors obtained from constructing multiple decision tree models.
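To make the role of per-feature weights concrete, here is a minimal sketch of K-Nearest Neighbors regression with a weighted Euclidean distance. The weights are supplied by hand and are purely hypothetical. The paper derives them from out-of-bag errors of decision trees, which this sketch does not attempt to reproduce.

```python
import math


def weighted_distance(a, b, weights):
    """Euclidean distance in which each squared feature difference is scaled by a weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def knn_predict(train_x, train_y, query, weights, k=3):
    """Average the targets of the k training points closest to `query`."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: weighted_distance(train_x[i], query, weights),
    )
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


# Toy data: the first feature drives the target, the second is large-scale noise.
train_x = [(0.1, 900.0), (0.2, 100.0), (0.9, 120.0), (0.8, 880.0)]
train_y = [1.0, 1.0, 5.0, 5.0]
query = (0.15, 110.0)

equal = knn_predict(train_x, train_y, query, weights=(1.0, 1.0), k=2)
tuned = knn_predict(train_x, train_y, query, weights=(1.0, 0.0), k=2)
print(equal, tuned)
```

With equal weights the noisy second feature dominates the distance and a neighbor from the wrong cluster is averaged in. Setting its weight to zero restores the neighbors that share the informative feature. A learned weighting would typically fall between these two extremes.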

PanJoin: A Partition-based Adaptive Stream Join

In stream processing, stream join is one of the critical sources of performance bottlenecks. The sliding-window-based stream join provides a precise result but consumes considerable computational resources. The current solutions lack support for the join predicates on large windows. These algorithms and their hardware accelerators are either limited to equi-join or use a nested loop join to process all the requests. In this paper, we present a new algorithm called PanJoin which has high throughput on large windows and supports both equi-join and non-equi-join. PanJoin implements three new data structures to reduce computations during the probing phase of stream join. We also implement the most hardware-friendly data structure, called BI-Sort, on FPGA. Our evaluation shows that PanJoin outperforms several recently proposed stream join methods by more than 1000x, and it also adapts well to highly skewed data.

Theoretical Analysis of Adversarial Learning: A Minimax Approach

We propose a general theoretical method for analyzing the risk bound in the presence of adversaries. In particular, we try to fit the adversarial learning problem into the minimax framework. We first show that the original adversarial learning problem could be reduced to a minimax statistical learning problem by introducing a transport map between distributions. Then we prove a risk bound for this minimax problem in terms of covering numbers. In contrast to previous minimax bounds in \cite{lee,far}, our bound is informative when the radius of the ambiguity set is small. Our method could be applied to multi-class classification problems and commonly-used loss functions such as hinge loss and ramp loss. As two illustrative examples, we derive the adversarial risk bounds for kernel-SVM and deep neural networks. Our results indicate that a stronger adversary might have a negative impact on the complexity of the hypothesis class and the existence of margin could serve as a defense mechanism to counter adversarial attacks.

A Multi-layer LSTM-based Approach for Robot Command Interaction Modeling

As the first robotic platforms slowly approach our everyday life, we can imagine a near future where service robots will be easily accessible by non-expert users through vocal interfaces. The capability of managing natural language would indeed speed up the process of integrating such platforms into ordinary life. Semantic parsing is a fundamental task of the Natural Language Understanding process, as it allows extracting the meaning of a user utterance to be used by a machine. In this paper, we present a preliminary study to semantically parse user vocal commands for a House Service robot, using a multi-layer Long Short-Term Memory neural network with an attention mechanism. The system is trained on the Human Robot Interaction Corpus, and it is preliminarily compared with previous approaches.

Anomaly Detection using Autoencoders in High Performance Computing Systems

Anomaly detection in supercomputers is a very difficult problem due to the large scale of the systems and the high number of components. The current state of the art for automated anomaly detection employs Machine Learning methods or statistical regression models in a supervised fashion, meaning that the detection tool is trained to distinguish among a fixed set of behaviour classes (healthy and unhealthy states). We propose a novel approach for anomaly detection in High Performance Computing systems based on a Machine (Deep) Learning technique, namely a type of neural network called an autoencoder. The key idea is to train a set of autoencoders to learn the normal (healthy) behaviour of the supercomputer nodes and, after training, use them to identify abnormal conditions. This is different from previous approaches, which were based on learning the abnormal condition, for which there are much smaller datasets (since it is very hard to identify them to begin with). We test our approach on a real supercomputer equipped with a fine-grained, scalable monitoring infrastructure that can provide large amounts of data to characterize the system behaviour. The results are extremely promising: after the training phase to learn the normal system behaviour, our method is capable of detecting anomalies that have never been seen before with very good accuracy (values ranging between 88% and 96%).

• Incentivising Participation in Liquid Democracy with Breadth First Delegation
• Robustness of the Closest Unstable Equilibrium Point Along a P-V Curve
• ADNet: A Deep Network for Detecting Adverts
• The largest graphs with given order and diameter: A simple proof
• On Stability Condition of Wireless Networked Control Systems under Joint Design of Control Policy and Network Scheduling Policy
• Weak convergence of particle swarm optimization
• Adaptive model selection method for a conditionally Gaussian semimartingale regression in continuous time
• Temporal Graph Convolutional Network for Urban Traffic Flow Prediction Method
• On Asymptotic Covariances of A Few Unrotated Factor Solutions
• MMALFM: Explainable Recommendation by Leveraging Reviews and Images
• Edge directionality properties in complex spherical networks
• Distributionally Robust Semi-Supervised Learning for People-Centric Sensing
• Clifford-like parallelisms
• Learning data augmentation policies using augmented random search
• Adaptive Target Recognition: A Case Study Involving Airport Baggage Screening
• Syntax Helps ELMo Understand Semantics: Is Syntax Still Relevant in a Deep Neural Architecture for SRL?
• Improving Generalization for Abstract Reasoning Tasks Using Disentangled Feature Representations
• Localisation, chiral symmetry and confinement in QCD and related theories
• Stationary Harmonic Measure as the Scaling Limit of Truncated Harmonic Measure
• On the absolute continuity of random nodal volumes
• Strong Equivalence for Epistemic Logic Programs Made Easy (Extended Version)
• 3s-Unification for Vehicular Headway Modeling
• Subsequent Boundary Distance Regression and Pixelwise Classification Networks for Automatic Kidney Segmentation in Ultrasound Images
• Large-deviation properties of the largest biconnected component for random graphs
• A test case for application of convolutional neural networks to spatio-temporal climate data: Re-identifying clustered weather patterns
• Understanding the boosted decision tree methods with the weak-learner approximation
• Compliance in Real Time Multiset Rewriting Models
• Circuit Depth Reductions
• Scattering-free pulse propagation through invisible non-Hermitian disorder
• Regularity results of the speed of biased random walks on Galton-Watson trees
• Potential Game-Based Non-Myopic Sensor Network Planning for Multi-Target Tracking
• Quantum-inspired sublinear classical algorithms for solving low-rank linear systems
• On the practice of classification learning for clinical diagnosis and therapy advice in oncology
• Generative Dual Adversarial Network for Generalized Zero-shot Learning
• Bio-YODIE: A Named Entity Linking System for Biomedical Text
• Measures of goodness of fit obtained by canonical transformations on Riemannian manifolds
• Pareto-Optimal Allocation of Indivisible Goods with Connectivity Constraints
• Comparing Spark vs MPI/OpenMP On Word Count MapReduce
• CQASUMM: Building References for Community Question Answering Summarization Corpora
• Triangular Ladders $P_{d,2}$ are $e$-positive
• Focusing on the Big Picture: Insights into a Systems Approach to Deep Learning for Satellite Imagery
• Segue: Overviewing Evolution Patterns of Egocentric Networks by Interactive Construction of Spatial Layouts
• Multi-encoder multi-resolution framework for end-to-end speech recognition
• Stream attention-based multi-array end-to-end speech recognition
• Algorithmic models of human behavior and stochastic optimization
• Deep Learning versus Classical Regression for Brain Tumor Patient Survival Prediction
• Nonexistence of Bigeodesics in Integrable Models of Last Passage Percolation
• Quantum-inspired low-rank stochastic regression with logarithmic dependence on the dimension
• Boosting Model Performance through Differentially Private Model Aggregation
• Online Timely Status Updates with Erasures for Energy Harvesting Sensors
• Analytical Formulation of the Block-Constrained Configuration Model
• Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers
• Algebraic Many-Body Localization and its implications on information propagation
• Modeling and Performance of Uplink Cache-Enabled Massive MIMO Heterogeneous Networks
• Molecular computers
• A simplifed static frequency converter model for electromechanical transient stability studies of 16$\frac{2}{3}$ Hz railways
• The Impact of Timestamp Granularity in Optimistic Concurrency Control
• PennyLane: Automatic differentiation of hybrid quantum-classical computations
• Eliminating Latent Discrimination: Train Then Mask
• p-regularity theory. Applications and developments
• Unseen Word Representation by Aligning Heterogeneous Lexical Semantic Spaces
• Generalized Ternary Connect: End-to-End Learning and Compression of Multiplication-Free Deep Neural Networks
• Prediction of Alzheimer’s disease-associated genes by integration of GWAS summary data and expression data
• A Generalized Framework for Approximate Control Variates
• OriNet: A Fully Convolutional Network for 3D Human Pose Estimation
• A new approach for pedestrian density estimation using moving sensors and computer vision
• You Only Live Multiple Times: A Blackbox Solution for Reusing Crash-Stop Algorithms In Realistic Crash-Recovery Settings
• Choosing to grow a graph: Modeling network formation as discrete choice
• Coordinating Disaster Emergency Response with Heuristic Reinforcement Learning
• Blindfold Baselines for Embodied QA
• NeXtVLAD: An Efficient Neural Network to Aggregate Frame-level Features for Large-scale Video Classification
• A Team-Formation Algorithm for Faultline Minimization
• Improved Dynamic Memory Network for Dialogue Act Classification with Adversarial Training
• Approximation Algorithms for Minimum Norm and Ordered Optimization Problems
• Generating faces for affect analysis
• LookinGood: Enhancing Performance Capture with Real-time Neural Re-Rendering
• A Review of automatic differentiation and its efficient implementation
• Shortcut Graphs and Groups
• Finding All Bayesian Network Structures within a Factor of Optimal
• Exploiting Local Feature Patterns for Unsupervised Domain Adaptation
• Distributed Cooperative Spectrum Sharing in UAV Networks Using Multi-Agent Reinforcement Learning
• Electrophysiological indicators of gesture perception
• A unified algorithm for the non-convex penalized estimation: The ncpen package
• SMERC: Social media event response clustering using textual and temporal information
• Multiple-paths $SLE_κ$ in multiply connected domains
• Shall I Compare Thee to a Machine-Written Sonnet? An Approach to Algorithmic Sonnet Generation
• The first passage time density of Brownian motion and the heat equation with Dirichlet boundary condition in time dependent domains
• Private Model Compression via Knowledge Distillation
• Regularised Zero-Variance Control Variates
• Learning from Binary Multiway Data: Probabilistic Tensor Decomposition and its Statistical Optimality
• Task Graph Transformations for Latency Tolerance
• A Unified Model for Opinion Target Extraction and Target Sentiment Prediction
• Domain Agnostic Real-Valued Specificity Prediction
• Nonsingular Gaussian Conditionally Markov Sequences
• Parallel Stochastic Asynchronous Coordinate Descent: Tight Bounds on the Possible Parallelism
• A General Method for Amortizing Variational Filtering
• A SAT+CAS Approach to Finding Good Matrices: New Examples and Counterexamples
• A Local Regret in Nonconvex Online Learning
• Exploring RNN-Transducer for Chinese Speech Recognition
• Balancing Relevance and Diversity in Online Bipartite Matching via Submodularity
• Neuroimaging Modality Fusion in Alzheimer’s Classification Using Convolutional Neural Networks
• Interpreting Models by Allowing to Ask
• Towards the topological recursion for double Hurwitz numbers
• A Variational Inference based Detection Method for Repetition Coded Generalized Spatial Modulation
• Parametric Shortest Paths in Planar Graphs
• Exploiting temporal and depth information for multi-frame face anti-spoofing
• Modeling Local Dependence in Natural Language with Multi-channel Recurrent Neural Networks
• Fundamental Limits of Exact Support Recovery in High Dimensions
• Multi-unit Bilateral Trade
• Sensitivity Analysis of a Stationary Point Set Map under Total Perturbations. Part 2: Robinson Stability
• Community Exploration: From Offline Optimization to Online Learning
• Multiscale Information Storage of Linear Long-Range Correlated Stochastic Processes
• M Equilibrium: A dual theory of beliefs and choices in games
• Amplitude-Aware Lossy Compression for Quantum Circuit Simulation
• Co-Representation Learning For Classification and Novel Class Detection via Deep Networks
• Spectral Efficiency Analysis in Presence of Correlated Gamma-Lognormal Desired and Interfering Signals
• Sensitivity Analysis of a Stationary Point Set Map under Total Perturbations. Part 1: Lipschitzian Stability
• Hate Speech Detection from Code-mixed Hindi-English Tweets Using Deep Learning Models
• Application of Faster R-CNN model on Human Running Pattern Recognition
• Fast HARQ over Finite Blocklength Codes: A Technique for Low-Latency Reliable Communication
• User Demand Based Precoding for G.fast DSL Systems
• Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits
• On the Throughput of Large-but-Finite MIMO Networks using Schedulers
• Protection Placement for State Estimation Measurement Data Integrity
• Recurrent Multi-Graph Neural Networks for Travel Cost Prediction
• Approximating minimum representations of key Horn functions
• Vehicle Re-identification Using Quadruple Directional Deep Learning Features
• On Lipschitz-like property for polyhedral moving sets
• Nonparametric geometric outlier detection
• Optimal extension to Sobolev rough paths
• Child Gender Determination with Convolutional Neural Networks on Hand Radio-Graphs
• Gradient Harmonized Single-stage Detector
• Equilibrium measures on trees
• Polynomial Schur’s theorem
• On the Polarization Levels of Automorphic-Symmetric Channels
• Relating local structures, energies, and occurrence probabilities in a two-dimensional silica network
• FusionStitching: Deep Fusion and Code Generation for Tensorflow Computations on GPUs
• Classical Access Structures of Ramp Secret Sharing Based on Quantum Stabilizer Codes
• ImageNet/ResNet-50 Training in 224 Seconds
• Applications of Littlewood-Richardson tableaux to computing generic extension of semisimple invariant subspaces of nilpotent linear operators
• BAN: Focusing on Boundary Context for Object Detection
• Interpretable Credit Application Predictions With Counterfactual Explanations
• An Online Attention-based Model for Speech Recognition
• Probing interacting two-level systems with rare-earth ions
• Modular Networks: Learning to Decompose Neural Computation
• Modality Attention for End-to-End Audio-visual Speech Recognition
• SVM-Based Sea-Surface Small Target Detection: A False-Alarm-Rate-Controllable Approach
• Image Captioning Based on a Hierarchical Attention Mechanism and Policy Gradient Optimization
• Deep Neural Network Concepts for Background Subtraction: A Systematic Review and Comparative Evaluation
• How Secure are Deep Learning Algorithms from Side-Channel based Reverse Engineering?
• A conjugate prior for the Dirichlet distribution
• Predicting Distresses using Deep Learning of Text Segments in Annual Reports
• Towards the Design of Aerostat Wind Turbine Arrays through AI
• Intelligent Drone Swarm for Search and Rescue Operations at Sea
• Pose Invariant 3D Face Reconstruction
• SAFE: Self-Attentive Function Embeddings for Binary Similarity
• Genetic algorithm for optimal distribution in cities
• Translating Natural Language to SQL using Pointer-Generator Networks and How Decoding Order Matters
• Self-Supervised Learning of Depth and Camera Motion from 360° Videos
• Improved Fourier Mellin Invariant for Robust Rotation Estimation with Omni-cameras
• Detect or Track: Towards Cost-Effective Video Object Detection/Tracking
• Spectral Deconfounding and Perturbed Sparse Linear Models
• Iteratively Training Look-Up Tables for Network Quantization
• Highly Efficient Stepped Wedge Designs for Clusters of Unequal Size
• Personal Names Popularity Estimation and its Application to Record Linkage
• Unsupervised Transfer Learning for Spoken Language Understanding in Intelligent Agents
• Benchmarking datasets for Anomaly-based Network Intrusion Detection: KDD CUP 99 alternatives
• Operator-Valued Matrices with Free or Exchangeable Entries
• Comparison of Feature Extraction Methods and Predictors for Income Inference
• Quantile regression approach to conditional mode estimation
• Sorting out Lipschitz function approximation
• On Finding Quantum Multi-collisions
• Hallucinating Point Cloud into 3D Sculptural Object
• Remarks on a fractional-time stochastic equation
• Strong Approximation of Monotone Stochastic Partial Different Equations Driven by Multiplicative Noise
• Estimation of urban traffic state with probe vehicles
• Advances in sequential measurement and control of open quantum systems
• Algorithms for Optimal AC Power Flow in the Presence of Renewable Sources
• Embedding Electronic Health Records for Clinical Information Retrieval
• Autonomic Intrusion Response in Distributed Computing using Big Data
• Multi-task learning for Joint Language Understanding and Dialogue State Tracking
• Estimating the Impact of Cyber-Attack Strategies for Stochastic Control Systems
• Home Activity Monitoring using Low Resolution Infrared Sensor
• Fast Human Pose Estimation
• ABox Abduction via Forgetting in ALC (Long Version)
• Quickest Detection of Time-Varying False Data Injection Attacks in Dynamic Linear Regression Models
• On the Mean Order of Connected Induced Subgraphs of Block Graphs
• Deep Object Centric Policies for Autonomous Driving
• Robust H-infinity kinematic control of manipulator robots using dual quaternion algebra
• Argumentation for Explainable Scheduling (Full Paper with Proofs)
• Very Hard Electoral Control Problems
• A survey of semidefinite programming approaches to the generalized problem of moments and their error analysis
• Cyclic quasi-symmetric functions
• Co-regularized Alignment for Unsupervised Domain Adaptation
• Higher-Order Cone Programming
• New fat-tail normality test based on conditional second moments with applications to finance
