What's new on arXiv

Learning a Policy for Opportunistic Active Learning

Active learning identifies data points to label that are expected to be the most useful in improving a supervised model. Opportunistic active learning incorporates active learning into interactive tasks that constrain possible queries during interactions. Prior work has shown that opportunistic active learning can be used to improve grounding of natural language descriptions in an interactive object retrieval task. In this work, we use reinforcement learning for such an object retrieval task, to learn a policy that effectively trades off task completion with model improvement that would benefit future tasks.

Differentially Private Change-Point Detection

The change-point detection problem seeks to identify distributional changes at an unknown change-point k* in a stream of data. This problem appears in many important practical settings involving personal data, including biosurveillance, fault detection, finance, signal detection, and security systems. The field of differential privacy offers data analysis tools that provide powerful worst-case privacy guarantees. We study the statistical problem of change-point detection through the lens of differential privacy. We give private algorithms for both online and offline change-point detection, analyze these algorithms theoretically, and provide empirical validation of our results.
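
To give a flavor of what a private change-point detector can look like (this is an illustrative sketch under our own assumptions, not the paper's exact construction), the snippet below runs report-noisy-max over partial log-likelihood-ratio sums, assuming the pre- and post-change densities are known and each observation shifts any partial sum by at most a bounded sensitivity.

```python
import numpy as np

# Hedged sketch of offline private change-point detection: report-noisy-max
# over partial log-likelihood-ratio sums. Assumes known pre/post-change
# log-densities and that each observation changes any partial sum by at most
# `sensitivity` (e.g., via clipping). Not the paper's exact algorithm.
def private_change_point(x, logp0, logp1, epsilon, sensitivity, rng=None):
    rng = rng or np.random.default_rng()
    llr = logp1(x) - logp0(x)                 # per-point log-likelihood ratios
    scores = np.cumsum(llr[::-1])[::-1]       # V(k) = sum_{i >= k} llr(x_i)
    noise = rng.laplace(scale=2 * sensitivity / epsilon, size=scores.shape)
    return int(np.argmax(scores + noise))     # report-noisy-max is epsilon-DP

# Toy example: N(0,1) -> N(1,1) with the true change at k* = 60.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 60), rng.normal(1, 1, 40)])
k_hat = private_change_point(x,
                             logp0=lambda v: -0.5 * v ** 2,
                             logp1=lambda v: -0.5 * (v - 1) ** 2,
                             epsilon=1.0, sensitivity=1.0)
```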

Generalize Symbolic Knowledge With Neural Rule Engine

Neural-symbolic learning aims to combine the advantages of neural networks and symbolic knowledge to build better intelligent systems. As neural networks have come to dominate state-of-the-art results across a wide range of NLP tasks, improving neural models by integrating symbolic knowledge has attracted considerable attention. Different from existing work, this paper investigates the combination of these two powerful paradigms from the knowledge-driven side. We propose the Neural Rule Engine (NRE), which learns knowledge explicitly from logic rules and then generalizes it implicitly with neural networks. NRE is implemented with neural module networks in which each module represents an action of a logic rule. The experiments show that NRE greatly improves the generalization ability of logic rules, with a significant increase in recall while precision remains high.

IEA: Inner Ensemble Average within a convolutional neural network

Ensemble learning is a method of combining multiple trained models to improve accuracy. We introduce the use of such methods, specifically ensemble averaging, inside Convolutional Neural Network (CNN) architectures. Replacing single convolutional neural layers (CNLs) inside the CNN architecture with an Inner Ensemble Average (IEA) of multiple CNLs increases the accuracy of the CNN. A visual and a similarity-score analysis of the features generated by IEA explains why it boosts model performance. Empirical results using different benchmark datasets and well-known deep model architectures show that IEA outperforms the ordinary CNL used in CNNs.
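
A minimal PyTorch sketch of the idea as described, replacing a single convolutional layer with the average of m independently parameterized ones; the module name, the value of m, and the layer placement are our illustrative choices, not the paper's settings.

```python
import torch
import torch.nn as nn

# Minimal sketch of an Inner Ensemble Average layer: m parallel convolutional
# layers whose outputs are averaged, used as a drop-in replacement for a single
# convolution.
class IEAConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size, m=3, **kwargs):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, kernel_size, **kwargs) for _ in range(m)
        )

    def forward(self, x):
        # Average the feature maps produced by the m branches.
        return torch.stack([conv(x) for conv in self.branches]).mean(dim=0)

layer = IEAConv2d(3, 32, kernel_size=3, padding=1, m=3)
out = layer(torch.randn(1, 3, 28, 28))   # -> shape [1, 32, 28, 28]
```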

Nested multi-instance classification

There are classification tasks that take groups of images as input rather than single images. To address such situations, we introduce a nested multi-instance deep network. The approach is generic in that it is applicable to general data instances, not just images. The network has several convolutional neural networks grouped together at different stages. It primarily differs from previous work in that we organize instances into relevant groups that are treated differently. We also introduce a method to replace missing instances, which successfully creates neutral input instances and consistently outperforms standard fill-in methods in real-world use cases. In addition, we propose a manual dropout method for when a whole group of instances is missing, which allows us to use richer training data and obtain higher accuracy at the end of training. With specific pretraining, we find that the model works to great effect on our real-world and public datasets in comparison to baseline methods, justifying the different treatment of the groups of instances.
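
The sketch below illustrates the overall shape of such a network under our own simplifying assumptions (linear encoders instead of CNNs, mean pooling within groups, a learned neutral embedding for missing instances); it is not the paper's architecture.

```python
import torch
import torch.nn as nn

# Illustrative nested multi-instance classifier: instances are organized into
# groups, each group has its own encoder, instance embeddings are pooled within
# the group, and the group embeddings are concatenated for the final prediction.
# A learned per-group "neutral" embedding stands in when a whole group is missing.
class NestedMIClassifier(nn.Module):
    def __init__(self, n_groups, in_dim, emb_dim, n_classes):
        super().__init__()
        self.encoders = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, emb_dim), nn.ReLU())
            for _ in range(n_groups)
        )
        self.neutral = nn.Parameter(torch.zeros(n_groups, emb_dim))
        self.head = nn.Linear(n_groups * emb_dim, n_classes)

    def forward(self, groups):
        # groups[g] is a [n_instances_g, in_dim] tensor, or None if the group is missing.
        pooled = []
        for g, inst in enumerate(groups):
            if inst is None or inst.shape[0] == 0:
                pooled.append(self.neutral[g])
            else:
                pooled.append(self.encoders[g](inst).mean(dim=0))
        return self.head(torch.cat(pooled, dim=-1))
```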

Backdoor Embedding in Convolutional Neural Network Models via Invisible Perturbation

Understanding Latent Factors Using a GWAP

Recommender systems relying on latent factor models often appear as black boxes to their users. Semantic descriptions for the factors might help to mitigate this problem. Achieving this automatically is, however, a non-straightforward task due to the models’ statistical nature. We present an output-agreement game that represents factors by means of sample items and motivates players to create such descriptions. A user study shows that the collected output actually reflects real-world characteristics of the factors.

Analyze Unstructured Data Patterns for Conceptual Representation

Online news media provides aggregated news and stories from different sources all over the world and up-to-date news coverage. The main goal of this study is to have a solution that considered as a homogeneous source for the news and to represent the news in a new conceptual framework. Furthermore, the user can easily find different updated news in a fast way through the designed interface. The Mobile App implementation is based on modeling the multi-level conceptual analysis discipline. Discovering main concepts of any domain is captured from the hidden unstructured data that are analyzed by the proposed solution. Concepts are discovered through analyzing data patterns to be structured into a tree-based interface for easy navigation for the end user, through the discovered news concepts. Our final experiment results showing that analyzing the news before displaying to the end-user and restructuring the final output in a conceptual multilevel structure, that producing new display frame for the end user to find the related information to his interest.

Reasoning about Actions and State Changes by Injecting Commonsense Knowledge

Comprehending procedural text, e.g., a paragraph describing photosynthesis, requires modeling actions and the state changes they produce, so that questions about entities at different timepoints can be answered. Although several recent systems have shown impressive progress in this task, their predictions can be globally inconsistent or highly improbable. In this paper, we show how the predicted effects of actions in the context of a paragraph can be improved in two ways: (1) by incorporating global, commonsense constraints (e.g., a non-existent entity cannot be destroyed), and (2) by biasing reading with preferences from large-scale corpora (e.g., trees rarely move). Unlike earlier methods, we treat the problem as a neural structured prediction task, allowing hard and soft constraints to steer the model away from unlikely predictions. We show that the new model significantly outperforms earlier systems on a benchmark dataset for procedural text comprehension (+8% relative gain), and that it also avoids some of the nonsensical predictions that earlier systems make.
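
As a toy illustration of a hard constraint of this kind (the action labels and probability format are our own assumptions; the paper applies such constraints inside a neural structured prediction model):

```python
# Toy illustration of a hard commonsense constraint: an entity the model
# believes does not currently exist cannot be destroyed or moved at this step.
def apply_hard_constraint(action_probs, entity_exists):
    """action_probs: dict mapping action name -> probability for one entity."""
    if not entity_exists:
        for forbidden in ("DESTROY", "MOVE"):
            action_probs[forbidden] = 0.0
    total = sum(action_probs.values()) or 1.0
    return {a: p / total for a, p in action_probs.items()}   # renormalize
```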

Retrieval-Based Neural Code Generation

In models that generate program source code from natural language, representing the code as a tree structure has been a common approach. However, existing methods often fail to generate complex code correctly due to a lack of ability to memorize large and complex structures. We introduce ReCode, a method based on subtree retrieval that makes it possible to explicitly reference existing code examples within a neural code generation model. First, we retrieve sentences that are similar to the input sentence using a dynamic-programming-based sentence similarity scoring method. Next, we extract n-grams of action sequences that build the associated abstract syntax trees. Finally, we increase the probability of actions that cause the retrieved n-gram action subtrees to appear in the predicted code. We show that our approach improves performance on two code generation tasks by up to +2.6 BLEU.
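
A rough sketch of the last step (boosting actions that complete retrieved n-grams) under our own simplified interface; the bonus scheme and names are assumptions for illustration, not the paper's exact formulation.

```python
import torch

# Sketch: at each decoding step, add a bonus to the logit of any action that
# would complete an n-gram extracted from the retrieved examples' action sequences.
def boost_retrieved_actions(logits, history, retrieved_ngrams, bonus=1.0):
    """logits: [vocab] next-action scores; history: list of emitted action ids;
    retrieved_ngrams: iterable of tuples of action ids."""
    boosted = logits.clone()
    for ngram in retrieved_ngrams:
        prefix, nxt = ngram[:-1], ngram[-1]
        if len(prefix) == 0 or tuple(history[-len(prefix):]) == prefix:
            boosted[nxt] += bonus            # favor completing a retrieved n-gram
    return torch.log_softmax(boosted, dim=-1)
```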

Modeling OWL with Rules: The ROWL Protege Plugin

In our experience, some ontology users find it much easier to convey logical statements using rules rather than OWL (or description logic) axioms. Based on recent theoretical developments on transformations between rules and description logics, we develop ROWL, a Protege plugin that allows users to enter OWL axioms by way of rules; the plugin then automatically converts these rules into OWL DL axioms if possible, and prompts the user in case such a conversion is not possible without weakening the semantics of the rule.
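
As a toy illustration of the kind of conversion involved (the vocabulary is our own example, not taken from the paper), a rule whose body can be rolled up into a class expression becomes a description logic subclass axiom:

```latex
\mathrm{Student}(x) \wedge \mathrm{attends}(x, y) \rightarrow \mathrm{Enrolled}(x)
\quad\rightsquigarrow\quad
\mathrm{Student} \sqcap \exists\, \mathrm{attends}.\top \;\sqsubseteq\; \mathrm{Enrolled}
```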

Rule-based OWL Modeling with ROWLTab Protege Plugin

It has been argued that it is much easier to convey logical statements using rules rather than OWL (or description logic (DL)) axioms. Based on recent theoretical developments on transformations between rules and DLs, we have developed ROWLTab, a Protege plugin that allows users to enter OWL axioms by way of rules; the plugin then automatically converts these rules into OWL 2 DL axioms if possible, and prompts the user in case such a conversion is not possible without weakening the semantics of the rule. In this paper, we present ROWLTab, together with a user evaluation of its effectiveness compared to entering axioms using the standard Protege interface. Our evaluation shows that modeling with ROWLTab is much quicker than the standard interface, while at the same time, also less prone to errors for hard modeling tasks.

An Introduction to Inductive Statistical Inference — from Parameter Estimation to Decision-Making

These lecture notes aim at a post-Bachelor audience with a background at an introductory level in Applied Mathematics and Applied Statistics. They discuss the logic and methodology of the Bayes-Laplace approach to inductive statistical inference, which places common sense and the guiding principles of the scientific method at the heart of systematic analyses of quantitative-empirical data. Following an exposition of exactly solvable cases of single- and two-parameter estimation, the main focus is on Markov Chain Monte Carlo (MCMC) simulations, on the basis of Gibbs sampling and Hamiltonian Monte Carlo sampling, of posterior joint probability distributions for regression parameters occurring in generalised linear models. The modelling of fixed as well as of varying effects (varying intercepts) is considered, and the simulation of posterior predictive distributions is outlined. The issues of model comparison with Bayes factors and the assessment of models’ relative posterior predictive accuracy with the information entropy-based criteria DIC and WAIC are addressed. Concluding, a conceptual link to the behavioural subjective expected utility representation of a single decision-maker’s choice behaviour in static one-shot decision problems is established. Codes for MCMC simulations of multi-dimensional posterior joint probability distributions with the JAGS and Stan packages, implemented in the statistical software R, are provided. The lecture notes are fully hyperlinked; they direct the reader to original scientific research papers and to pertinent biographical information.
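
For orientation, the two relations at the core of this approach, written here in their standard forms (not quoted from the notes), are Bayes' theorem for a parameter vector θ under model M and the posterior predictive distribution:

```latex
P(\theta \mid D, M) = \frac{P(D \mid \theta, M)\, P(\theta \mid M)}{P(D \mid M)},
\qquad
P(\tilde{y} \mid D, M) = \int P(\tilde{y} \mid \theta, M)\, P(\theta \mid D, M)\, \mathrm{d}\theta .
```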

Gaussian Mixture Generative Adversarial Networks for Diverse Datasets, and the Unsupervised Clustering of Images

Generative Adversarial Networks (GANs) have been shown to produce realistic-looking synthetic images with remarkable success, yet their performance seems less impressive when the training set is highly diverse. In order to provide a better fit to the target data distribution when the dataset includes many different classes, we propose a variant of the basic GAN model, called Gaussian Mixture GAN (GM-GAN), where the probability distribution over the latent space is a mixture of Gaussians. We also propose a supervised variant which is capable of conditional sample synthesis. In order to evaluate the model’s performance, we propose a new scoring method which separately takes into account two (typically conflicting) measures: diversity vs. quality of the generated data. Through a series of empirical experiments, using both synthetic and real-world datasets, we quantitatively show that GM-GANs outperform baselines, both when evaluated using the commonly used Inception Score, and when evaluated using our own alternative scoring method. In addition, we qualitatively demonstrate how the unsupervised variant of GM-GAN tends to map latent vectors sampled from different Gaussians in the latent space to samples of different classes in the data space. We show how this phenomenon can be exploited for the task of unsupervised clustering, and provide a quantitative evaluation showing the superiority of our method for the unsupervised clustering of image datasets. Finally, we demonstrate a feature which further sets our model apart from other GAN models: the option to control the quality-diversity trade-off by altering, post-training, the probability distribution of the latent space. This allows one to sample higher-quality, lower-diversity samples, or vice versa, according to one’s needs.
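
A minimal sketch of the latent prior being described, assuming K components with fixed centers and a shared scale (all choices are ours, not the paper's settings):

```python
import numpy as np

# Sketch of a Gaussian-mixture latent prior for a GAN: each latent vector is
# drawn from one of K Gaussian components instead of a single N(0, I).
rng = np.random.default_rng(0)
K, latent_dim, scale = 10, 100, 0.15
means = rng.normal(0.0, 1.0, size=(K, latent_dim))      # fixed component centers

def sample_gm_latent(batch_size):
    comps = rng.integers(0, K, size=batch_size)           # component index per sample
    z = means[comps] + scale * rng.normal(size=(batch_size, latent_dim))
    return z, comps   # `comps` doubles as a cluster label in the unsupervised variant

z, labels = sample_gm_latent(64)    # feed `z` to the generator as usual
```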

A Unified Analysis of Stochastic Momentum Methods for Deep Learning

Stochastic momentum methods have been widely adopted in training deep neural networks. However, the theoretical analysis of their convergence on the training objective and of their generalization error for prediction is still under-explored. This paper aims to bridge the gap between practice and theory by analyzing the stochastic gradient (SG) method and the stochastic momentum methods, including two well-known variants: the stochastic heavy-ball (SHB) method and the stochastic variant of Nesterov’s accelerated gradient (SNAG) method. We propose a framework that unifies the three variants. We then derive the convergence rates of the norm of the gradient for the non-convex optimization problem, and analyze the generalization performance through the uniform stability approach. In particular, the convergence analysis of the training objective shows that SHB and SNAG have no advantage over SG, whereas the stability analysis shows that the momentum term can improve the stability of the learned model and hence improve the generalization performance. These theoretical insights verify the common wisdom and are also corroborated by our empirical analysis on deep learning.
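
For reference, the three updates being unified can be written in their standard stochastic forms, with step size α, momentum parameter β, and stochastic gradients ∇f(·; ξ_t); setting β = 0 recovers SG from both momentum variants (the paper's unified parameterization is not reproduced here):

```latex
\begin{aligned}
\text{SG:}\quad  & x_{t+1} = x_t - \alpha\, \nabla f(x_t;\xi_t),\\
\text{SHB:}\quad & x_{t+1} = x_t - \alpha\, \nabla f(x_t;\xi_t) + \beta\,(x_t - x_{t-1}),\\
\text{SNAG:}\quad& y_t = x_t + \beta\,(x_t - x_{t-1}), \qquad x_{t+1} = y_t - \alpha\, \nabla f(y_t;\xi_t).
\end{aligned}
```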

Towards Reproducible Empirical Research in Meta-Learning

Meta-learning is increasingly used to support the recommendation of machine learning algorithms and their configurations. Such recommendations are made based on meta-data, consisting of performance evaluations of algorithms on prior datasets, as well as characterizations of these datasets. These characterizations, also called meta-features, describe properties of the data which are predictive of the performance of machine learning algorithms trained on them. Unfortunately, despite being used in a large number of studies, meta-features are not uniformly described and computed, making many empirical studies irreproducible and hard to compare. This paper aims to remedy this by systematizing and standardizing the data characterization measures used in meta-learning and performing an in-depth analysis of their utility. Moreover, it presents MFE, a new tool for extracting meta-features from datasets, identifies more subtle reproducibility issues in the literature, and proposes guidelines for data characterization that strengthen reproducible empirical research in meta-learning.
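
The snippet below shows, in toy form, the kind of simple and statistical data characterization measures involved; it is an illustrative sketch, not the MFE tool's API.

```python
import numpy as np
from scipy.stats import entropy, skew

# Toy examples of simple/statistical meta-features computed from a tabular
# classification dataset (X: [n_instances, n_features], y: class labels).
def simple_meta_features(X, y):
    _, counts = np.unique(y, return_counts=True)
    corr = np.corrcoef(X, rowvar=False)
    upper = corr[np.triu_indices(X.shape[1], k=1)]
    return {
        "n_instances": X.shape[0],
        "n_features": X.shape[1],
        "n_classes": counts.size,
        "class_entropy": float(entropy(counts / counts.sum(), base=2)),
        "mean_feature_skewness": float(np.mean(skew(X, axis=0))),
        "mean_abs_feature_correlation": float(np.mean(np.abs(upper))),
    }
```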

• Reinforcement Learning Testbed for Power-Consumption Optimization
• Learning End-to-End Goal-Oriented Dialog with Multiple Answers
• On the Performance of a Relay-Assisted Multi-Hop Asymmetric FSO/RF Communication System over Negative Exponential atmospheric turbulence with the effect of pointing error
• Gallai-Ramsey numbers of $C_{10}$ and $C_{12}$
• QuasarNET: Human-level spectral classification and redshifting with Deep Neural Networks
• On the Wiener Index of Uniform Unicyclic Hypergraphs
• A study of integer sorting on multicores
• Symbolic regression based genetic approximations of the Colebrook equation for flow friction
• Semi-Metrification of the Dynamic Time Warping Distance
• Centroid estimation based on symmetric KL divergence for Multinomial text classification problem
• ABHY Associahedra and Newton polytopes of $F$-polynomials for finite type cluster algebras
• Submodular Maximization with Packing Constraints in Parallel
• MemComputing Integer Linear Programming
• Grammar Induction with Neural Language Models: An Unusual Replication
• Interpretable Intuitive Physics Model
• Correcting Length Bias in Neural Machine Translation
• Fast and accessible first-principles calculations of vibrational properties of materials
• Group calibration is a byproduct of unconstrained learning
• Consistent Sampling with Replacement
• Note on the group edge irregularity strength of graphs
• Adaptative significance levels in normal mean hypothesis testing
• Model Predictive Control for Regular Linear Systems
• Hard Non-Monotonic Attention for Character-Level Transduction
• Physically-inspired Gaussian processes for transcriptional regulation in Drosophila melanogaster
• Recommendation Through Mixtures of Heterogeneous Item Relationships
• The Impact of Preprocessing on Deep Representations for Iris Recognition on Unconstrained Environments
• The Fundamental Morphism Theorem in the Categories of Graphs & Graph Reconstruction
• Theoretical Linear Convergence of Unfolded ISTA and its Practical Weights and Thresholds
• AAD: Adaptive Anomaly Detection through traffic surveillance videos
• Improved Upper Bounds for Gallai-Ramsey Numbers of Odd Cycles
• Zero-Shot Adaptive Transfer for Conversational Language Understanding
• Quadratic Discriminant Analysis under Moderate Dimension
• A polynomial-time algorithm for median-closed semilinear constraints
• Super-Resolution for Hyperspectral and Multispectral Image Fusion Accounting for Seasonal Spectral Variability
• Rational Neural Networks for Approximating Jump Discontinuities of Graph Convolution Operator
• The generalized connectivity of some regular graphs
• Towards Effective Deep Embedding for Zero-Shot Learning
• Discriminative Learning of Similarity and Group Equivariant Representations
• DCSM Protocol for Content Transfer in Deep Space Network
• Decentralized Detection with Robust Information Privacy Protection
• Differential and integral invariants under Mobius transformation
• Artifacts Detection and Error Block Analysis from Broadcasted Videos
• Maximum likelihood estimator and its consistency for an $(L,1)$ random walk in a parametric random environment
• CNN-PS: CNN-based Photometric Stereo for General Non-Convex Surfaces
• Robust Wireless Body Area Networks Coexistence: A Game Theoretic Approach to Time-Division MAC
• Profiling and Improving the Duty-Cycling Performance of Linux-based IoT Devices
• Optimality conditions for approximate Pareto solutions of a nonsmooth vector optimization problem with an infinite number of constraints
• DP-ADMM: ADMM-based Distributed Learning with Differential Privacy
• OWLAx: A Protege Plugin to Support Ontology Axiomatization through Diagramming
• Geometric Kinematic Control of a Spherical Rolling Robot
• Story Ending Generation with Incremental Encoding and Commonsense Knowledge
• Space-Time Block Coding Based Beamforming for Beam Squint Compensation
• A combinatorial property of flows on a cycle
• ExpIt-OOS: Towards Learning from Planning in Imperfect Information Games
• A Divergence Proof for Latuszynski’s Counter-Example Approaching Infinity with Probability ‘Near’ One
• Learning Neural Templates for Text Generation
• Bipartite Ramsey numbers of large cycles
• Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis
• Reducing post-surgery recovery bed occupancy through an analytical prediction model
• The real-time reactive surgical case sequencing problem
• Baidu Apollo Auto-Calibration System – An Industry-Level Data-Driven and Learning based Vehicle Longitude Dynamic Calibrating Algorithm
• Recognizing Generating Subgraphs in Graphs without Cycles of Lengths 6 and 7
• The reactive multiple operating room surgical case sequencing problem
• Direct Output Connection for a High-Rank Language Model
• Dense Scene Flow from Stereo Disparity and Optical Flow
• VirtualIdentity: Privacy-Preserving User Profiling
• Optimal Control of the Linear Wave Equation by Time-Depending BV-Controls: A Semi-Smooth Newton Approach
• Time-Reversal of Coalescing Diffusive Flows and Weak Convergence of Localized Disturbance Flows
• Uncovering intimate and casual relationships from mobile phone communication
• A Variational Feature Encoding Method of 3D Object for Probabilistic Semantic SLAM
• Minimal inference from incomplete 2×2-tables
• Optimal shrinkage covariance matrix estimation under random sampling from elliptical distributions
• Sensitivity, Affine Transforms and Quantum Communication Complexity
• Towards a Better Metric for Evaluating Question Generation Systems
• Pronoun Translation in English-French Machine Translation: An Analysis of Error Types
• Leadership in Singleton Congestion Games: What is Hard and What is Easy
• Outage Probability of Millimeter Wave Cellular Uplink with Truncated Power Control
• Minimal forward random point attractors need not exist
• A List of Problems on the Reverse Mathematics of Ramsey Theory on the Rado Graph and on Infinite, Finitely Branching Trees
• Automated Scene Flow Data Generation for Training and Verification
• Learning to adapt: a meta-learning approach for speaker adaptation
• Deciding Robust Feasibility and Infeasibility Using a Set Containment Approach: An Application to Stationary Passive Gas Network Operations
• Comparative Studies of Detecting Abusive Language on Twitter
• Capacity of Locally Recoverable Codes
• Multi-Source Syntactic Neural Machine Translation
• Acquiring Annotated Data with Cross-lingual Explicitation for Implicit Discourse Relation Classification
• Hybrid Joint Diagonalization Algorithms
• Self-stabilizing Overlays for high-dimensional Monotonic Searchability
• Diagrammatic proof of the large $N$ melonic dominance in the SYK model
• Fully Dynamic MIS in Uniformly Sparse Graphs
• PPF-FoldNet: Unsupervised Learning of Rotation Invariant 3D Local Descriptors
• Asymptotically Optimal Codes Correcting Fixed-Length Duplication Errors in DNA Storage Systems
• A Coordinate-Free Construction of Scalable Natural Gradient
• A categorification of biclosed sets of strings
• Large-Scale Cover Song Detection in Digital Music Libraries Using Metadata, Lyrics and Audio Features
• A structure theorem for stochastic processes indexed by the discrete hypercube
• An Exponential Cox-Ingersoll-Ross Process as Discounting Factor
• Asymptotic opitmality of degree-greedy discovering of independent sets in Configuration Model graphs
• Pathwise Uniqueness for SDEs with Singular Drift and Nonconstant Diffusion: A simple proof
• Algorithms and Bounds for Drawing Directed Graphs
• Parametric Topology Optimization with Multi-Resolution Finite Element Models
• Robot_gym: accelerated robot training through simulation in the cloud with ROS and Gazebo
• Improved approximation algorithms for hitting 3-vertex paths
• Metallic glasses for spintronics: anomalous temperature dependence and giant enhancement of inverse spin Hall effect
• High-Performance Multi-Mode Ptychography Reconstruction on Distributed GPUs
• Asymptotic analysis of the Friedkin-Johnsen model when the matrix of the susceptibility weights approaches the identity matrix
• $K_4$-subdivisions have the edge-Erdös-Pósa property
• Deep Chronnectome Learning via Full Bidirectional Long Short-Term Memory Networks for MCI Diagnosis
• Learning End-to-end Autonomous Driving using Guided Auxiliary Supervision
• Lyashko-Looijenga morphisms and primitive factorizations of the Coxeter element
• Modeling Empathy and Distress in Reaction to News Stories
• A Radix-M Construction for Complementary Sets
• Local bounds for stochastic reaction diffusion equations
• Accelerating Parallel Tempering: Quantile Tempering Algorithm (QuanTA)
• On Subadditive Duality for Conic Mixed-Integer Programs
• Attaining the Unattainable? Reassessing Claims of Human Parity in Neural Machine Translation
• Ramsey problems for Berge hypergraphs
• Geometry of $\ell_p^n$-balls: Classical results and recent developments
• Bifurcations in the time-delayed Kuramoto model of coupled oscillators: Exact results
• iCAN: Instance-Centric Attention Network for Human-Object Interaction Detection
