On Meta-Learning for Dynamic Ensemble Selection
In this paper, we propose a novel dynamic ensemble selection framework using meta-learning. The framework is divided into three steps. In the first step, the pool of classifiers is generated from the training data. In the second step, the meta-features are extracted and the meta-classifier is trained. Five distinct sets of meta-features are proposed, each corresponding to a different criterion for measuring the level of competence of a classifier for the classification of a given query sample. The meta-features are computed using the training data and used to train a meta-classifier that can predict whether or not a base classifier from the pool is competent enough to classify an input instance. Three training scenarios for the meta-classifier are considered: problem-dependent, problem-independent and hybrid. Experimental results show that the problem-dependent scenario provides the best result. In addition, the performance of the problem-dependent scenario is strongly correlated with the recognition rate of the system. A comparison with state-of-the-art techniques shows that the proposed problem-dependent approach outperforms current dynamic ensemble selection techniques.
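As an illustration of the meta-training step, the sketch below extracts two simple competence meta-features (local accuracy in a neighborhood and global accuracy) for every pair of query sample and base classifier, then trains a meta-classifier to predict competence. This is a minimal sketch assuming a scikit-learn bagging pool and a separate dynamic-selection (DSEL) set held as NumPy arrays; the two feature choices and all names are illustrative, not the paper's exact five meta-feature sets.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import Perceptron
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import NearestNeighbors

def train_meta_classifier(X_train, y_train, X_dsel, y_dsel, k=7):
    # Step 1: generate the pool of base classifiers from the training data.
    pool = BaggingClassifier(Perceptron(max_iter=100), n_estimators=10)
    pool.fit(X_train, y_train)

    # Step 2: for each (query sample, base classifier) pair, extract
    # meta-features plus a binary label: was the classifier correct here?
    nn = NearestNeighbors(n_neighbors=k).fit(X_dsel)
    metas, labels = [], []
    for i, x in enumerate(X_dsel):
        _, idx = nn.kneighbors([x])
        region = idx[0]  # indices of the k nearest DSEL neighbors
        for clf in pool.estimators_:
            local_acc = np.mean(clf.predict(X_dsel[region]) == y_dsel[region])  # competence criterion 1
            global_acc = np.mean(clf.predict(X_dsel) == y_dsel)                 # competence criterion 2
            metas.append([local_acc, global_acc])
            labels.append(int(clf.predict([x])[0] == y_dsel[i]))

    # The meta-classifier predicts whether a base classifier is competent
    # enough to classify a given input instance.
    meta_clf = GaussianNB().fit(np.array(metas), np.array(labels))
    return pool, meta_clf
```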
Augmenting Compositional Models for Knowledge Base Completion Using Gradient Representations
Neural models of Knowledge Base data have typically employed compositional representations of graph objects: entity and relation embeddings are systematically combined to evaluate the truth of a candidate Knowledge Base entry. Using a model inspired by Harmonic Grammar, we propose to tokenize triplet embeddings by subjecting them to a process of optimization with respect to learned well-formedness conditions on Knowledge Base triplets. The resulting model, known as Gradient Graphs, leads to sizable improvements when implemented as a companion to compositional models. We also show that the ‘supracompositional’ triplet token embeddings it produces have interpretable properties that prove helpful in performing inference on the resulting triplet representations.
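The optimization process described above can be sketched as gradient ascent on a learned harmony (well-formedness) score, starting from the compositional triplet embedding. A minimal sketch follows; the bilinear form of the harmony function, the step size, and the iteration count are assumptions made for illustration:

```python
import torch

def tokenize_triplet(h, r, t, W, steps=10, lr=0.1):
    """h, r, t: 1-D entity/relation type embeddings; W: learned weight
    matrix defining the harmony (well-formedness) score H(v) = v^T W v."""
    v = torch.cat([h, r, t]).detach().clone().requires_grad_(True)
    for _ in range(steps):
        harmony = v @ W @ v                  # well-formedness of current token
        grad, = torch.autograd.grad(harmony, v)
        v = (v + lr * grad).detach().requires_grad_(True)  # ascend harmony
    return v.detach()  # the 'supracompositional' triplet token embedding
```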
The estimation of bias and variance in clustering coefficient streaming algorithms
The clustering coefficient is one of the most important metrics for understanding the complex structure of networks. This paper addresses the estimation of the clustering coefficient in network streams. There has been substantial work in this area, most of it conducting empirical comparisons of various algorithms; the variance and bias of the estimators have not been quantified. Starting with a simple yet powerful streaming algorithm, we derive the variance and bias of the estimator, as well as estimators for that variance and bias. More importantly, we simplify these estimators so that they can be used in practice. The variance and bias estimators are verified extensively on 49 real networks.
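For concreteness, the sketch below shows the bias/variance bookkeeping for a simple wedge-sampling estimator of the global clustering coefficient: the closed-wedge fraction is estimated from a uniform sample, and the standard binomial plug-in formula estimates its variance. This illustrates the kind of estimator analyzed here, not the paper's specific streaming algorithm:

```python
import random

def estimate_clustering(wedges, is_closed, k=1000):
    """wedges: list of (u, v, w) two-paths u-v-w sampled uniformly;
    is_closed(u, w): True iff the edge (u, w) exists, closing the wedge."""
    sample = random.sample(wedges, min(k, len(wedges)))
    closed = sum(is_closed(u, w) for (u, v, w) in sample)
    p_hat = closed / len(sample)                  # estimate of the closed-wedge fraction
    var_hat = p_hat * (1 - p_hat) / len(sample)   # binomial plug-in variance estimator
    return p_hat, var_hat
```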
Topological Approaches to Deep Learning
We perform topological data analysis on the internal states of convolutional deep neural networks to develop an understanding of the computations that they perform. We apply this understanding to modify the computations so as to (a) speed them up and (b) improve generalization from one data set of digits to another. One byproduct of the analysis is a geometry on new sets of features for image data sets, and we use this observation to develop a methodology for constructing analogues of CNNs for many other geometries, including the graph structures constructed by topological data analysis.
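One concrete way to carry out this kind of analysis is to compute persistent homology on the learned first-layer filters of a trained CNN, as sketched below; the use of the ripser package and the mean-centering/normalization steps are assumptions:

```python
import numpy as np
from ripser import ripser  # pip install ripser

def filter_topology(conv_weights):
    """conv_weights: array of shape (n_filters, 3, 3) from a trained CNN."""
    X = conv_weights.reshape(len(conv_weights), -1)
    X = X - X.mean(axis=1, keepdims=True)             # remove mean intensity
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # project to unit sphere
    dgms = ripser(X, maxdim=1)['dgms']                # H0 and H1 persistence diagrams
    return dgms  # a prominent H1 class would indicate a circular feature geometry
```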
Adversarial Gain
Adversarial examples can be defined as inputs to a model which induce a mistake, i.e., where the model output differs from that of an oracle, perhaps in surprising or malicious ways. Adversarial attacks have primarily been studied in the context of classification and computer vision tasks. While several attacks have been proposed in natural language processing (NLP) settings, they often differ in how they define the parameters of an attack and what a successful attack would look like. The goal of this work is to propose a unifying model of adversarial examples suitable for NLP tasks in both generative and classification settings. We define the notion of adversarial gain: rooted in control theory, it is a measure of the change in the output of a system relative to the perturbation of the input (caused by the so-called adversary) presented to the learner. This definition, as we show, can be used under different feature spaces and distance conditions to determine attack or defense effectiveness across different intuitive manifolds. This notion of adversarial gain not only provides a useful way to evaluate adversaries and defenses, but can also act as a building block for future work on robustness under adversaries, owing to its roots in stability and manifold theory.
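Under this definition, a minimal computation of adversarial gain is the ratio of an output-space distance to an input-space distance, as sketched below; the particular distance functions and the small epsilon for numerical stability are assumptions:

```python
import numpy as np

def adversarial_gain(model, x, x_adv, d_in, d_out, eps=1e-12):
    """Ratio of output change to input perturbation, analogous to a system gain."""
    input_dist = d_in(x, x_adv)                  # size of the adversary's perturbation
    output_dist = d_out(model(x), model(x_adv))  # induced change in model output
    return output_dist / (input_dist + eps)      # large gain => effective attack

# Example with Euclidean distances in both input and output feature spaces:
# gain = adversarial_gain(net, x, x_adv,
#                         d_in=lambda a, b: np.linalg.norm(a - b),
#                         d_out=lambda a, b: np.linalg.norm(a - b))
```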
Towards Unsupervised Speech-to-Text Translation
We present a framework for building speech-to-text translation (ST) systems using only monolingual speech and text corpora, in other words, speech utterances from a source language and independent text from a target language. As opposed to traditional cascaded systems and end-to-end architectures, our system does not require any labeled data (i.e., transcribed source audio or parallel source and target text corpora) during training, making it especially applicable to language pairs with very few or even zero bilingual resources. The framework initializes the ST system with a cross-modal bilingual dictionary inferred from the monolingual corpora, which maps every source speech segment corresponding to a spoken word to its target text translation. For unseen source speech utterances, the system first performs word-by-word translation on each speech segment in the utterance. The translation is then improved by leveraging a language model and a sequence denoising autoencoder to provide prior knowledge about the target language. Experimental results show that our unsupervised system achieves comparable BLEU scores to supervised end-to-end models despite the lack of supervision. We also provide an ablation analysis to examine the utility of each component in our system.
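A minimal sketch of the word-by-word translation step is shown below: each speech segment embedding, already mapped into the target word-embedding space by the cross-modal dictionary, is matched to its nearest target words, and a target language model picks among the candidates. All function and variable names here are illustrative assumptions:

```python
import numpy as np

def translate_utterance(segment_embs, tgt_words, tgt_embs, lm_score, n_best=5):
    """segment_embs: (n_segments, d) speech segment embeddings, aligned to
    the target word embedding space via the cross-modal dictionary."""
    hypothesis = []
    for e in segment_embs:
        # cosine similarity between the segment and every target word
        sims = tgt_embs @ e / (np.linalg.norm(tgt_embs, axis=1) * np.linalg.norm(e))
        cands = [tgt_words[i] for i in np.argsort(-sims)[:n_best]]
        # let the target language model choose among the n-best candidates
        hypothesis.append(max(cands, key=lambda w: lm_score(hypothesis + [w])))
    return hypothesis
```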
Nonlinear Collaborative Scheme for Deep Neural Networks
Conventional research attributes the improvements in the generalization ability of deep neural networks either to powerful optimizers or to new network designs. In contrast, in this paper we aim to link the generalization ability of a deep network to the optimization of a new objective function. To this end, we propose a \textit{nonlinear collaborative scheme} for deep network training, whose key technique is combining different loss functions in a nonlinear manner. We find that after adaptively tuning the weights of the different loss functions, the proposed objective function can efficiently guide the optimization process. Moreover, we demonstrate that, from a mathematical perspective, the nonlinear collaborative scheme can lead to (i) smaller KL divergence with respect to optimal solutions; (ii) data-driven stochastic gradient descent; and (iii) a tighter PAC-Bayes bound. We also prove that its advantage is strengthened by increasing the nonlinearity. To some extent, we bridge the gap between learning (i.e., minimizing the new objective function) and generalization (i.e., minimizing a PAC-Bayes bound) in the new scheme. We also interpret our findings through experiments on Residual Networks and DenseNet, showing that our new scheme outperforms single-loss and multi-loss schemes both with and without randomization.
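To make the scheme concrete, the sketch below combines two loss functions through a log-sum-exp nonlinearity with adaptively tuned softmax weights; this particular choice of nonlinearity is an illustrative assumption, not necessarily the one analyzed in the paper:

```python
import torch

def collaborative_loss(loss1, loss2, log_w):
    """log_w: learnable tensor of shape (2,) holding log-weights that are
    tuned adaptively, jointly with the network parameters."""
    w = torch.softmax(log_w, dim=0)  # adaptive, normalized loss weights
    # nonlinear (log-sum-exp) combination of the weighted losses
    return torch.logsumexp(torch.stack([w[0] * loss1, w[1] * loss2]), dim=0)

# Usage inside a training step:
# log_w = torch.zeros(2, requires_grad=True)   # optimized alongside the net
# loss = collaborative_loss(ce_loss, mse_loss, log_w)
# loss.backward()
```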
An Analysis of Centrality Measures for Complex and Social Networks
Measures of complex network analysis, such as vertex centrality, have the potential to unveil existing network patterns and behaviors. They contribute to the understanding of networks and their components by analyzing their structural properties, which makes them useful in several computer science domains and applications. Unfortunately, there is a large number of distinct centrality measures, and little is known about their common characteristics in practice. By means of an empirical analysis, we aim at a clear understanding of the main centrality measures available, unveiling their similarities and differences across a large number of distinct social networks. Our experiments show that the vertex centrality measures known as information, eigenvector, subgraph, walk betweenness and betweenness can distinguish vertices in all kinds of networks with a granularity performance of 95%, while other metrics achieve considerably lower results. In addition, we demonstrate that several pairs of metrics evaluate the vertices in a very similar way, i.e., their correlation coefficient values are above 0.7. This was unexpected, considering that each metric rests on a quite distinct theoretical and algorithmic foundation. Our work thus contributes towards the development of a methodology for principled network analysis and evaluation.
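The comparison methodology can be reproduced in outline with networkx: compute several of the measures named above and their pairwise rank correlations, flagging pairs above 0.7 as evaluating vertices very similarly. The exact measure subset and the use of Spearman's rho below are assumptions:

```python
import networkx as nx
from scipy.stats import spearmanr

def centrality_correlations(G):
    """G: an undirected, connected networkx graph."""
    measures = {
        'eigenvector': nx.eigenvector_centrality(G, max_iter=1000),
        'subgraph': nx.subgraph_centrality(G),
        'betweenness': nx.betweenness_centrality(G),
        'information': nx.information_centrality(G),  # requires connected G
    }
    nodes = list(G.nodes())
    for a in measures:
        for b in measures:
            if a < b:  # visit each unordered pair once
                rho, _ = spearmanr([measures[a][n] for n in nodes],
                                   [measures[b][n] for n in nodes])
                print(f'{a} vs {b}: rho = {rho:.2f}')  # > 0.7 => very similar
```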
IteRank: An iterative network-oriented approach to neighbor-based collaborative ranking
Neighbor-based collaborative ranking (NCR) techniques follow three consecutive steps to recommend items to each target user: first, they calculate the similarities among users; then, they estimate the concordance of pairwise preferences for the target user based on the calculated similarities; finally, they use the estimated pairwise preferences to infer a total ranking of items for the target user. This general approach faces problems because rank data is usually sparse: users have typically compared only a few pairs of items, so the similarities among users are calculated from limited information and are not accurate enough to infer true values of preference concordance, which can lead to an invalid ranking of items. This article presents a novel framework, called IteRank, that models the data as a bipartite network containing users and pairwise preferences. It then simultaneously refines users’ similarities and preference concordances using a random-walk method on this graph structure. The information from this first step is then used in a second network structure to simultaneously adjust the preference concordances and item rankings. Using this approach, IteRank can overcome some existing problems caused by the sparsity of the data. Experimental results show that IteRank improves recommendation performance compared to state-of-the-art NCR techniques that use the traditional NCR framework.
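A minimal sketch of the first refinement step follows: a random walk with restart over the user/preference bipartite graph propagates similarity from the target user to pairwise preferences and back, refining both simultaneously. The matrix encoding, damping factor, and iteration count are assumptions:

```python
import numpy as np

def refine_by_random_walk(A, target_user, alpha=0.85, iters=20):
    """A: (n_users, n_prefs) 0/1 matrix with A[u, p] = 1 if user u agrees
    with pairwise preference p. Returns refined concordance scores."""
    # row/column-stochastic transition matrices over the bipartite graph
    P_up = A / np.maximum(A.sum(axis=1, keepdims=True), 1)        # user -> pref
    P_pu = (A / np.maximum(A.sum(axis=0, keepdims=True), 1)).T    # pref -> user
    u = np.zeros(A.shape[0]); u[target_user] = 1.0                # restart vector
    sim = u.copy()
    for _ in range(iters):
        conc = sim @ P_up                               # propagate similarity to preferences
        sim = alpha * (conc @ P_pu) + (1 - alpha) * u   # and back to users, with restart
    return conc  # refined concordance of each pairwise preference
```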
• A Data-Driven Approach for Estimating Customer Contribution to System Peak Demand
• Word Mover’s Embedding: From Word2Vec to Document Embedding
• Weakly Supervised Grammatical Error Correction using Iterative Decoding
• Fashionable Modelling with Flux
• Issues in the software implementation of stochastic numerical Runge-Kutta
• Memory footprint reduction for the FFT-based volume integral equation method via tensor decompositions
• AttentionXML: Extreme Multi-Label Text Classification with Multi-Label Attention Based Recurrent Neural Networks
• An Adaptive Pruning Algorithm for Spoofing Localisation Based on Tropical Geometry
• Lower and upper bounds for strong approximation errors for numerical approximations of stochastic heat equations
• Unique Information and Secret Key Agreement
• Network Slicing with Mobile Edge Computing for Micro-Operator Networks in Beyond 5G
• META-DES.H: a dynamic ensemble selection technique using meta-learning and a dynamic weighting approach
• Capture and Recovery of Connected Vehicle Data: A Compressive Sensing Approach
• A homogeneous polynomial associated with general hypergraphs and its applications
• Cycle-consistency training for end-to-end speech recognition
• Transductive Learning with String Kernels for Cross-Domain Text Classification
• The Hard-CoRe Coreference Corpus: Removing Gender and Number Cues for Difficult Pronominal Anaphora Resolution
• From multiline queues to Macdonald polynomials via the exclusion process
• AiDroid: When Heterogeneous Information Network Marries Deep Neural Network for Real-time Android Malware Detection
• ISA4ML: Training Data-Unaware Imperceptible Security Attacks on Machine Learning Modules of Autonomous Vehicles
• Limit theorems for the tagged particle in exclusion processes on regular trees
• Scalable Deep $k$-Subspace Clustering
• Cylindric rhombic tableaux and the two-species ASEP on a ring
• What evidence does deep learning model use to classify Skin Lesions?
• Minimax Estimation of Neural Net Distance
• On the sets of $n$ points forming $n+1$ directions
• SPECTRE: Seedless Network Alignment via Spectral Centralities
• Semidefinite relaxations for certifying robustness to adversarial examples
• Wonderful models for generalized Dowling arrangements
• The Goldenshluger-Lepski Method for Constrained Least-Squares Estimators over RKHSs
• Augmenting Neural Response Generation with Context-Aware Topical Attention
• Neural Machine Translation into Language Varieties
• Real-time Magnetometer Disturbance Estimation via Online Nonlinear Programming
• 3D Pick & Mix: Object Part Blending in Joint Shape and Image Manifolds
• RSVP-graphs: Fast High-dimensional Covariance Matrix Estimation under Latent Confounding
• Dynamic Pricing under a Static Calendar
• Location Estimation and Detection in Wireless Sensor Networks in the Presence of Fading
• Ischemic Stroke Lesion Segmentation in CT Perfusion Scans using Pyramid Pooling and Focal Loss
• Inner-Approximating Reachable Sets for Polynomial Systems with Time-Varying Uncertainties
• Sentence Encoders on STILTs: Supplementary Training on Intermediate Labeled-data Tasks
• Value-based Search in Execution Space for Mapping Instructions to Programs
• Efficient Marginalization-based MCMC Methods for Hierarchical Bayesian Inverse Problems
• Unifying Isolated and Overlapping Audio Event Detection with Multi-Label Multi-Task Convolutional Recurrent Neural Networks
• Beyond Equal-Length Snippets: How Long is Sufficient to Recognize an Audio Scene?
• Prior Knowledge Integration for Neural Machine Translation using Posterior Regularization
• Some Random Paths with Angle Constraints
• A simplified disproof of Beck’s three permutations conjecture and an application to root-mean-squared discrepancy
• Machine learning architectures to predict motion sickness using a Virtual Reality rollercoaster simulation tool
• Diversity
• Neural Task Representations as Weak Supervision for Model Agnostic Cross-Lingual Transfer
• Bi-Directional Differentiable Input Reconstruction for Low-Resource Neural Machine Translation
• Semiparametric Mixture Regression with Unspecified Error Distributions
• Learning to Rank Query Graphs for Complex Question Answering over Knowledge Graphs
• Optimal Sequence Length Requirements for Phylogenetic Tree Reconstruction with Indels
• Unsupervised Hyperalignment for Multilingual Word Embeddings
• Exploiting Explicit Paths for Multi-hop Reading Comprehension
• Efficient Projection onto the Perfect Phylogeny Model
• The Burst Failure Influence on the $H_\infty$ Norm
• VIREL: A Variational Inference Framework for Reinforcement Learning
• Content preserving text generation with attribute controls
• Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings
• Transfer Learning in Multilingual Neural Machine Translation with Dynamic Vocabulary
• Closed-Loop GAN for continual Learning
• SafeRoute: Learning to Navigate Streets Safely in an Urban Environment
• Predictive Deployment of UAV Base Stations in Wireless Networks: Machine Learning Meets Contract Theory
• Optimal multiplexing of sparse controllers for linear systems
• Accurate, Energy-Efficient, Decentralized, Single-Hop, Asynchronous Time Synchronization Protocols for Wireless Sensor Networks
• Identifying and Controlling Important Neurons in Neural Machine Translation
• Boosted Sparse and Low-Rank Tensor Regression
• Understanding and Comparing Scalable Gaussian Process Regression for Big Data
• Fast Integrity Verification for High-Speed File Transfers
• Improved approximation algorithms for path vertex covers in regular graphs
• Optimal Power Flow: An Introduction to Predictive, Distributed and Stochastic Control Challenges
• Convergence of the Deep BSDE Method for Coupled FBSDEs
• Learning Representations from Product Titles for Modeling Large-scale Transaction Logs
• Radius-margin bounds for deep neural networks
• Nonparallel Emotional Speech Conversion
• An Efficient Hybrid Beamforming Design for Massive MIMO Receive Systems via SINR Maximization Based on an Improved Bat Algorithm
• Smoothed Analysis of the Art Gallery Problem
• Large-scale Heteroscedastic Regression via Gaussian Process
• Stochastic Primal-Dual Method for Empirical Risk Minimization with $\mathcal{O}(1)$ Per-Iteration Complexity
• Unsupervised Identification of Study Descriptors in Toxicology Research: An Experimental Study
• Stability Analysis for Switched Systems with Sequence-based Average Dwell Time
• Pushing the boundaries of audiovisual word recognition using Residual Networks and LSTMs
• Biconvex Landscape In SDP-Related Learning
• Blocked All-Pairs Shortest Paths Algorithm on Intel Xeon Phi KNL Processor: A Case Study
• DUNet: A deformable network for retinal vessel segmentation
• Hermite-Gaussian model for quantum states
• Optimal Rank and Select Queries on Dictionary-Compressed Text
• Reliable graph-based collaborative ranking
• The distribution of the Lasso: Uniform control over sparse balls and adaptive parameter tuning
• Learning to Defense by Learning to Attack
• Learning sparse mixtures of rankings from noisy information
• Sharp worst-case evaluation complexity bounds for arbitrary-order nonconvex optimization with inexpensive constraints
• CAAD 2018: Powerful None-Access Black-Box Attack Based on Adversarial Transformation Network
• Multidimensional segment trees can do range queries and updates in logarithmic time
• Farey boat I. Continued fractions and triangulations, modular group and polygon dissections
• Hardness of computing and approximating predicates and functions with leaderless population protocols
• Relation Mention Extraction from Noisy Data with Hierarchical Reinforcement Learning
• Deep Learning based Computer-Aided Diagnosis Systems for Diabetic Retinopathy: A Survey
• Wizard of Wikipedia: Knowledge-Powered Conversational agents
• Unfolding with Gaussian Processes
• Stochastic Neighbor Embedding under f-divergences
• Compressed Multiple Pattern Matching
• Dynamic Feature Acquisition Using Denoising Autoencoders
• Recovery of compressively sensed ultrasound images with structured Sparse Bayesian Learning
• QuickXsort – A Fast Sorting Scheme in Theory and Practice
• Canonical Least Favorable Submodels: A New TMLE Procedure for Multidimensional Parameters
• Legible Normativity for AI Alignment: The Value of Silly Rules
• ReXCam: Resource-Efficient, Cross-Camera Video Analytics at Enterprise Scale
• Cooperative Search Games: Symmetric Equilibria, Robustness, and Price of Anarchy
• Challenges in detecting evolutionary forces in language change using diachronic corpora
• Nonparametric Spectral Methods for Multivariate Spatial and Spatial-Temporal Data
• Partitions of Matrix Spaces With an Application to $q$-Rook Polynomials
• Towards Sparse Hierarchical Graph Classifiers
• Auto-ML Deep Learning for Rashi Scripts OCR
• Geometry-Aware Recurrent Neural Networks for Active Visual Recognition
• Learning Contextual Hierarchical Structure of Medical Concepts with Poincairé Embeddings to Clarify Phenotypes
• Tight complexity lower bounds for integer linear programming with few constraints
• Beyond single-threshold searches: the Event Stacking Test
• Inexact alternating projections on nonconvex sets
• SimplerVoice: A Key Message & Visual Description Generator System for Illiteracy
• Instrumental Variable Methods using Dynamic Interventions
• Space-Time Sampling for Network Observability
• Block-wise Partitioning for Extreme Multi-label Classification
• A dataset for benchmarking vision-based localization at intersections
• Singular Optimal Controls of Stochastic Recursive Systems and Hamilton-Jacobi-Bellman Inequality
• Adversarial Black-Box Attacks for Automatic Speech Recognition Systems Using Multi-Objective Genetic Optimization
• Lower Bounds for External Memory Integer Sorting via Network Coding
• Modeling Traffic Networks Using Integrated Route and Link Data
• Modeling Stated Preference for Mobility-on-Demand Transit: A Comparison of Machine Learning and Logit Models
• A Batched Scalable Multi-Objective Bayesian Optimization Algorithm
• RA-UNet: A hybrid deep attention-aware network to extract liver and tumor in CT scans
• Elastic CRFs for Open-ontology Slot Filling
• Validated Asynchronous Byzantine Agreement with Optimal Resilience and Asymptotically Optimal Time and Word Communication
• Improving GAN with neighbors embedding and gradient matching
• Bi-Real Net: Binarizing Deep Network Towards Real-Network Performance
• A Function Fitting Method
• Deep Robust Framework for Protein Function Prediction using Variable-Length Protein Sequences
• Learning to Embed Probabilistic Structures Between Deterministic Chaos and Random Process in a Variational Bayes Predictive-Coding RNN
• Subcarrier Multiplexing for Parallel Data Transmission in Indoor Visible Light Communication Systems
• WDM for Multi-user Indoor VLC Systems with SCM
• Underwater Single Image Color Restoration Using Haze-Lines and a New Quantitative Dataset
• Exploring the Relation Between Two Levels of Scheduling Using a Novel Simulation Approach
• Some Results on the Power of Nondeterministic Computation
• Structure and Content of the Visible Darknet
• Size-Degree Trade-Offs for Sums-of-Squares and Positivstellensatz Proofs
• Channel input adaptation via natural type selection
• Semi-Supervised Confidence Network aided Gated Attention based Recurrent Neural Network for Clickbait Detection
• Multiuser Wirelessly Powered Backscatter Communications: Nonlinearity, Waveform Design and SINR-Energy Tradeoff
• Bounds on Capacity Region of Optical Intensity Multiple Access Channel
• Symmetric simple exclusion process in dynamic environment: hydrodynamics
• Weak universality of the dynamical $Φ_3^4$ model on the whole space
• Transient Stability Analysis of Power Systems via Occupation Measures
• Supervised learning of an opto-magnetic neural network with ultrashort laser pulses
• Investigating context features hidden in End-to-End TTS
• A Hybrid Approach to Joint Estimation of Channel and Antenna impedance