t-Exponential Memory Networks for Question-Answering Machines
Recent advances in deep learning have brought to the fore models that can make multiple computational steps in the service of completing a task; these are capable of describing long-term dependencies in sequential data. Novel recurrent attention models over possibly large external memory modules constitute the core mechanisms that enable these capabilities. Our work addresses learning subtler and more complex underlying temporal dynamics in language modeling tasks that deal with sparse sequential data. To this end, we improve upon these recent advances by adopting concepts from the field of Bayesian statistics, namely variational inference. Our proposed approach consists of treating the network parameters as latent variables with a prior distribution imposed over them. Our statistical assumptions go beyond the standard practice of postulating Gaussian priors: to allow for handling outliers, which are prevalent in long observed sequences of multivariate data, we impose multivariate t-exponential distributions. On this basis, we proceed to infer the corresponding posteriors, which can be used for inference and prediction at test time in a way that accounts for the uncertainty in the available sparse training data. Specifically, to best exploit the merits of the t-exponential family, our method considers a new t-divergence measure, which generalizes the concept of the Kullback-Leibler divergence. We perform an extensive experimental evaluation of our approach on challenging language modeling benchmarks and illustrate its superiority over existing state-of-the-art techniques.
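For concreteness, here is a hedged sketch of the t-logarithm and the t-divergence it induces, following the standard escort-distribution construction used in this literature; the paper's exact normalization may differ:

```latex
% t-logarithm (t \neq 1); the ordinary logarithm is recovered as t -> 1
\log_t(u) = \frac{u^{1-t} - 1}{1 - t}
% t-divergence built from the escort distribution \tilde{p}
D_t(p \,\|\, q) = \int \tilde{p}(x)\,\big(\log_t p(x) - \log_t q(x)\big)\, dx,
\qquad \tilde{p}(x) = \frac{p(x)^t}{\int p(x')^t\, dx'}
```

As $t \to 1$, the escort distribution $\tilde{p}$ collapses to $p$ and $D_t$ reduces to the Kullback-Leibler divergence, which is the sense in which the t-divergence generalizes it.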
Deep Priority Hashing
Deep hashing enables image retrieval by end-to-end learning of deep representations and hash codes from training data with pairwise similarity information. Owing to the distribution skewness underlying the similarity information, most existing deep hashing methods may underperform on imbalanced data due to misspecified loss functions. This paper presents Deep Priority Hashing (DPH), an end-to-end architecture that generates compact and balanced hash codes in a Bayesian learning framework. The main idea is to reshape the standard cross-entropy loss for similarity-preserving learning so that it down-weights the loss associated with highly confident pairs. This leads to a novel priority cross-entropy loss, which prioritizes training on uncertain pairs over confident pairs. We also propose a priority quantization loss, which prioritizes hard-to-quantize examples to generate nearly lossless hash codes. Extensive experiments demonstrate that DPH can generate high-quality hash codes and yields state-of-the-art image retrieval results on three datasets: ImageNet, NUS-WIDE, and MS-COCO.
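To illustrate the down-weighting idea, here is a minimal, hypothetical sketch of a priority-style cross-entropy for pairwise similarity labels; the modulation factor and its exponent `gamma` are assumptions in the spirit of focal-loss reweighting, not DPH's exact loss:

```python
import torch
import torch.nn.functional as F

def priority_cross_entropy(sim_logits, labels, gamma=2.0):
    """Pairwise cross-entropy that down-weights confident pairs.

    sim_logits: predicted similarity logits, one per pair
    labels:     1.0 for similar pairs, 0.0 for dissimilar pairs
    gamma:      how aggressively confident pairs are suppressed
    """
    ce = F.binary_cross_entropy_with_logits(sim_logits, labels, reduction="none")
    p = torch.sigmoid(sim_logits)
    # Confidence of the current prediction: p for positive pairs,
    # (1 - p) for negative pairs.
    confidence = labels * p + (1.0 - labels) * (1.0 - p)
    # Uncertain pairs keep weight near 1; highly confident pairs are
    # down-weighted, so training prioritizes the uncertain ones.
    weight = (1.0 - confidence) ** gamma
    return (weight * ce).mean()
```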
Fast and Accurate Graph Stream Summarization
A graph stream is a continuous sequence of data items, in which each item indicates an edge, including its two endpoints and edge weight. It forms a dynamic graph that changes with every item in the stream. Graph streams play important roles in cyber security, social networks, cloud troubleshooting systems and other fields. Due to the vast volume and high update speed of graph streams, traditional data structures for graph storage, such as the adjacency matrix and the adjacency list, are no longer sufficient. However, prior art on graph stream summarization, such as CM sketches, gSketches, TCM and gMatrix, either supports limited kinds of queries or suffers from poor accuracy of query results. In this paper, we propose a novel Graph Stream Sketch (GSS for short) to summarize graph streams, which has linear space cost (O(|E|), where E is the edge set of the graph), constant update time complexity (O(1)), and supports all kinds of queries over graph streams with controllable errors. Both theoretical analysis and experimental results confirm the superiority of our solution over the state-of-the-art with regard to time/space complexity and the precision of query results.
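A minimal sketch of the idea behind such a summarization structure, assuming each vertex hashes to a bucket plus a short fingerprint used to disambiguate collisions; the real GSS uses a fixed-size matrix with additional machinery, so treat this as illustrative only:

```python
import hashlib

class TinyGraphSketch:
    """Illustrative graph-stream summary: each vertex maps to a
    (bucket, fingerprint) pair; edge weights accumulate in cells
    addressed by bucket pairs and keyed by fingerprint pairs."""

    def __init__(self, width=1 << 12):
        self.width = width
        self.cells = {}  # (bucket_u, bucket_v) -> {(fp_u, fp_v): weight}

    def _map(self, vertex):
        h = int(hashlib.blake2b(vertex.encode(), digest_size=8).hexdigest(), 16)
        return h % self.width, h // self.width  # (bucket, fingerprint)

    def update(self, u, v, w):
        # Expected O(1) work per stream item.
        bu, fu = self._map(u)
        bv, fv = self._map(v)
        cell = self.cells.setdefault((bu, bv), {})
        cell[(fu, fv)] = cell.get((fu, fv), 0) + w

    def edge_weight(self, u, v):
        bu, fu = self._map(u)
        bv, fv = self._map(v)
        return self.cells.get((bu, bv), {}).get((fu, fv), 0)
```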
Robust Factorization and Completion of Streaming Tensor Data via Variational Bayesian Inference
Streaming tensor factorization is a powerful tool for processing high-volume and multi-way temporal data in Internet networks, recommender systems and image/video data analysis. In many applications the full tensor is not known in advance but instead arrives slice by slice over time. Streaming factorizations aim to take advantage of inherent temporal relationships in data analytics. Existing streaming tensor factorization algorithms rely on least-squares data fitting and do not possess a mechanism for tensor rank determination, which leaves them susceptible to outliers and vulnerable to over-fitting. This paper presents the first Bayesian robust streaming tensor factorization model. Our model successfully identifies sparse outliers, automatically determines the underlying tensor rank, and accurately fits the low-rank structure. We implement our model in MATLAB and compare it to existing algorithms, applying it to factorize and complete various streaming tensors including synthetic data, dynamic MRI, video sequences, and Internet traffic data.
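As a hedged sketch of what such a model typically looks like (notation assumed here, not taken from the paper), each incoming slice can be decomposed into shared low-rank CP structure, sparse outliers, and dense noise:

```latex
% Slice Y_t arriving at time t: CP factors a_r, b_r are shared across
% time, c_{t,r} are the new temporal weights, S_t collects sparse
% outliers (under a sparsity-promoting prior), E_t is dense Gaussian noise.
Y_t \approx \sum_{r=1}^{R} a_r \circ b_r \, c_{t,r} + S_t + E_t
```

Placing shrinkage priors on the factor columns lets the inferred rank $R$ collapse automatically during inference, which is precisely the rank-determination mechanism that least-squares streaming methods lack.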
Coverage-Guided Fuzzing for Deep Neural Networks
Alongside the data explosion of the past decade, deep neural network (DNN) based software has experienced an unprecedented leap and is becoming the key driving force of many novel industrial applications, including safety-critical scenarios such as autonomous driving. Despite great success in various human intelligence tasks, DNNs, like traditional software, can exhibit incorrect behaviors caused by hidden defects, leading to severe accidents and losses. In this paper, we propose an automated fuzz testing framework for hunting potential defects in general-purpose DNNs. It performs metamorphic mutation to generate new semantics-preserving tests, and leverages multiple pluggable coverage criteria as feedback to guide test generation from different perspectives. To be scalable to practical-sized DNNs, our framework maintains tests in batches and prioritizes test selection based on active feedback. The effectiveness of our framework is extensively investigated on 3 popular datasets (MNIST, CIFAR-10, ImageNet) and 7 DNNs of diverse complexity, under a large set of 6 coverage criteria as feedback. The large-scale experiments demonstrate that our fuzzing framework can (1) significantly boost coverage with guidance; (2) generate useful tests to detect erroneous behaviors and facilitate DNN model quality evaluation; and (3) accurately capture potential defects during DNN quantization for platform migration.
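A compact, hypothetical sketch of such a coverage-guided loop; the component names (`mutate`, `coverage`, `is_failure`) are placeholders for the framework's parts, not its actual API:

```python
from dataclasses import dataclass

@dataclass
class Batch:
    tests: list
    priority: float = 0.0

def fuzz(seeds, mutate, coverage, is_failure, budget=1000):
    """Coverage-guided fuzzing over batches of DNN inputs."""
    queue = [Batch(list(seeds))]
    covered, failures = set(), []
    for _ in range(budget):
        batch = max(queue, key=lambda b: b.priority)   # active feedback
        mutants = [mutate(t) for t in batch.tests]     # semantics-preserving
        new_cov = coverage(mutants) - covered          # pluggable criterion
        failures += [t for t in mutants if is_failure(t)]
        if new_cov:                                    # keep useful batches
            covered |= new_cov
            queue.append(Batch(mutants, priority=len(new_cov)))
        else:
            batch.priority *= 0.9                      # decay stale batches
    return failures, covered
```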
ChannelNets: Compact and Efficient Convolutional Neural Networks via Channel-Wise Convolutions
Convolutional neural networks (CNNs) have shown great capability in solving various artificial intelligence tasks. However, their increasing model size has raised challenges for employing them in resource-limited applications. In this work, we propose to compress deep models by using channel-wise convolutions, which replace dense connections among feature maps with sparse ones in CNNs. Based on this novel operation, we build light-weight CNNs known as ChannelNets. ChannelNets use three instances of channel-wise convolutions, namely group channel-wise convolutions, depth-wise separable channel-wise convolutions, and the convolutional classification layer. Compared to prior CNNs designed for mobile devices, ChannelNets achieve a significant reduction in the number of parameters and computational cost without loss in accuracy. Notably, our work represents the first attempt to compress the fully-connected classification layer, which usually accounts for about 25% of the total parameters in compact CNNs. Experimental results on the ImageNet dataset demonstrate that ChannelNets achieve consistently better performance compared to prior methods.
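To make the core operation concrete, here is a minimal PyTorch sketch of the simplest variant of a channel-wise convolution: the channel axis is treated as a 1-D spatial axis, so a small shared kernel replaces a dense 1x1 channel-mixing layer. This is an illustrative reading, not ChannelNets' exact implementation:

```python
import torch
import torch.nn as nn

class ChannelWiseConv(nn.Module):
    """Slide a 1-D kernel along the channel dimension, shared across
    all spatial positions; connections between feature maps become
    sparse instead of dense."""

    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                       # x: (N, C, H, W)
        n, c, h, w = x.shape
        y = x.permute(0, 2, 3, 1).reshape(n * h * w, 1, c)
        y = self.conv(y)                        # convolve over channels
        return y.reshape(n, h, w, c).permute(0, 3, 1, 2)

x = torch.randn(2, 32, 8, 8)
print(ChannelWiseConv()(x).shape)               # torch.Size([2, 32, 8, 8])
```

A dense 1x1 layer mixing C channels costs on the order of C^2 weights; the sketch above costs only `kernel_size` weights, which is the source of the compression.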
Embedding Multimodal Relational Data for Knowledge Base Completion
Toward Validation of Textual Information Retrieval Techniques for Software Weaknesses
This paper presents a preliminary validation of common textual information retrieval techniques for mapping unstructured software vulnerability information to distinct software weaknesses. The validation is carried out with a dataset compiled from four software repositories tracked in the Snyk vulnerability database. According to the results, the information retrieval techniques used perform unsatisfactorily compared to regular expression searches. Although the results vary from one repository to another, the preliminary validation indicates that explicit referencing of vulnerability and weakness identifiers is preferable for concrete vulnerability tracking. Such referencing allows the use of keyword-based searches, which currently seem to yield more consistent results than information retrieval techniques. However, further validation work is required to improve the precision of the techniques.
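As an illustration of the comparison being made, here is a minimal sketch contrasting a TF-IDF retrieval mapping with an explicit identifier search; the CWE regex and the function names are assumptions for illustration, not the paper's code:

```python
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_weaknesses_tfidf(report, weakness_texts):
    """Rank candidate weakness descriptions for one vulnerability
    report by TF-IDF cosine similarity (the IR-style approach)."""
    X = TfidfVectorizer(stop_words="english").fit_transform(
        [report] + weakness_texts)
    sims = cosine_similarity(X[0], X[1:]).ravel()
    return sims.argsort()[::-1]                 # best match first

def find_weakness_ids(report):
    """The regex baseline: explicit weakness identifiers, which the
    paper found to yield more consistent results."""
    return re.findall(r"CWE-\d+", report)
```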
Firearms and Tigers are Dangerous, Kitchen Knives and Zebras are Not: Testing whether Word Embeddings Can Tell
This paper presents an approach for investigating the nature of semantic information captured by word embeddings. We propose a method that extends an existing human-elicited semantic property dataset with gold negative examples using crowd judgments. Our experimental approach tests the ability of supervised classifiers to identify semantic features in word embedding vectors and compares this to a feature-identification method based on full-vector cosine similarity. The idea behind this method is that properties identified by classifiers, but not through full-vector comparison, are captured by the embeddings, while properties that cannot be identified by either method are not. Our results provide an initial indication that semantic properties relevant to the way entities interact (e.g., dangerous) are captured, while perceptual information (e.g., colors) is not represented. We conclude that, though preliminary, these results show that our method is suitable for identifying which properties are captured by embeddings.
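A minimal sketch of the two diagnostics, assuming a binary property dataset of embedding vectors with gold positive and negative labels; the logistic-regression classifier and the nearest-centroid cosine baseline are illustrative stand-ins for the paper's exact choices:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def property_detectable(vectors, labels):
    """Cross-validated accuracy of a supervised classifier for one
    semantic property (e.g. 'dangerous'); high accuracy suggests the
    property is identifiable in the embedding space."""
    labels = np.asarray(labels)
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, vectors, labels, cv=5).mean()

def property_by_similarity(vectors, labels):
    """Baseline: nearest-centroid decision by full-vector cosine."""
    labels = np.asarray(labels)
    pos = vectors[labels == 1].mean(axis=0)
    neg = vectors[labels == 0].mean(axis=0)
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    preds = np.array([cos(v, pos) > cos(v, neg) for v in vectors])
    return (preds == labels).mean()
```

Properties where the classifier succeeds but the cosine baseline fails are, on this method's logic, encoded in the embeddings without dominating the full vector.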
Appendix – Recommended Statistical Significance Tests for NLP Tasks
Statistical significance testing plays an important role when drawing conclusions from experimental results in NLP papers. In particular, it is a valuable tool when one would like to establish the superiority of one algorithm over another. This appendix complements the guide for testing statistical significance in NLP presented in Dror et al. (2018) by proposing valid statistical tests for the common tasks and evaluation measures in the field.
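For example, one test commonly recommended for comparing two systems on the same test set is the paired bootstrap; the following is a hedged sketch of that test in general, not necessarily the appendix's exact recommendation for any given measure:

```python
import numpy as np

def paired_bootstrap_pvalue(scores_a, scores_b, n_boot=10_000, seed=0):
    """Approximate p-value for 'system A beats system B', given
    per-example scores of both systems on the same test set."""
    rng = np.random.default_rng(seed)
    diffs = np.asarray(scores_a) - np.asarray(scores_b)
    flips = 0
    for _ in range(n_boot):
        resample = rng.choice(diffs, size=diffs.size, replace=True)
        if resample.mean() <= 0:        # A's advantage disappears
            flips += 1
    return flips / n_boot
```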
Deep Bilevel Learning
We present a novel regularization approach for training neural networks that achieves better generalization and test error than standard stochastic gradient descent. Our approach is based on the principles of cross-validation, where a validation set is used to limit model overfitting. We formulate these principles as a bilevel optimization problem, which allows us to define the optimization of a cost on the validation set subject to another optimization on the training set. Overfitting is controlled by introducing weights on each mini-batch in the training set and choosing their values so that they minimize the error on the validation set. In practice, these weights define mini-batch learning rates in a gradient descent update equation that favor gradients with better generalization capabilities. Because of its simplicity, this approach can be integrated with other regularization methods and training schemes. We evaluate our algorithm extensively on several neural network architectures and datasets, and find that it consistently improves the generalization of the model, especially when labels are noisy.
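In equations, the formulation can be sketched as follows (notation assumed): the mini-batch weights $w$ are the outer variables and the network parameters $\theta$ the inner ones:

```latex
\min_{w}\; L_{\mathrm{val}}\big(\theta^{*}(w)\big)
\quad \text{s.t.} \quad
\theta^{*}(w) \in \arg\min_{\theta}\; \sum_{b} w_b\, L_b^{\mathrm{train}}(\theta)
```

In the resulting SGD step $\theta \leftarrow \theta - \eta\, w_b \nabla L_b^{\mathrm{train}}(\theta)$, each $w_b$ acts exactly as a per-mini-batch learning rate, which is the practical reading given in the abstract.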
Recommender Systems with Characterized Social Regularization
Social recommendation, which utilizes social relations to enhance recommender systems, has been gaining increasing attention with the rapid development of online social networks. Existing social recommendation methods build on the observation that users' preferences and decisions are influenced by their social friends' behaviors. However, they assume that the influence of a social relation is always the same, which contradicts the fact that users are likely to share preferences for different kinds of products with different friends. In this paper, we present a novel CSR (short for Characterized Social Regularization) model that introduces a universal regularization term for modeling variable social influence. Our proposed model can be applied to both explicit and implicit interaction data. Extensive experiments on a real-world dataset demonstrate that CSR significantly outperforms state-of-the-art social recommendation methods.
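An illustrative form of such an objective (the exact CSR term may differ): a matrix factorization loss plus a social regularizer whose strength is characterized per item rather than fixed per friendship:

```latex
\min_{U,V}\; \sum_{(u,i)\in\Omega}
\Big[\big(r_{ui} - U_u^{\top} V_i\big)^2
+ \beta \sum_{f \in \mathcal{F}(u)} s_{uf,i}\,\|U_u - U_f\|^2\Big]
+ \lambda\big(\|U\|_F^2 + \|V\|_F^2\big)
```

Here $s_{uf,i}$ lets friend $f$'s influence on user $u$ vary with the item $i$ under consideration, instead of the constant per-pair weight used by standard social regularization.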
Merging datasets through deep learning
Merging datasets is a key operation for data analytics. A frequent requirement for merging is joining across columns that have different surface forms for the same entity (e.g., the name of a person might be represented as 'Douglas Adams' or 'Adams, Douglas'). Similarly, ontology alignment can require recognizing distinct surface forms of the same entity, especially when ontologies are independently developed. However, data management systems are currently limited to performing merges based on string equality, or at best string similarity. We propose an approach to performing merges based on deep learning models. Our approach depends on (a) creating a deep learning model that maps surface forms of an entity into a set of vectors such that alternate forms for the same entity are closest in vector space, and (b) indexing these vectors using a nearest-neighbors algorithm to find the forms that can potentially be joined together. To build these models, we had to adapt metric learning techniques to the characteristics of the data; specifically, we describe novel sample selection techniques and loss functions that work for this problem. To evaluate our approach, we used Wikidata as ground truth and built models from datasets with approximately 1.1M people's names (200K identities) and 130K company names (70K identities). Our models allow for joins with precision@1 of 0.75-0.81 and recall of 0.74-0.81. We make the models available for aligning people or companies across multiple datasets.
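A minimal sketch of step (b), assuming a trained `embed` function from step (a); the distance threshold and the scikit-learn index are illustrative choices, not the paper's implementation:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def build_index(forms, embed):
    """Index embedded surface forms for approximate joins."""
    vecs = np.stack([embed(f) for f in forms])
    return NearestNeighbors(n_neighbors=1, metric="cosine").fit(vecs)

def join_candidate(query, embed, index, forms, max_dist=0.2):
    """Return the closest known form, or None if nothing is close
    enough to plausibly be an alternate form of the same entity."""
    dist, idx = index.kneighbors(embed(query).reshape(1, -1))
    return forms[idx[0, 0]] if dist[0, 0] <= max_dist else None
```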
Anomaly Detection in the Presence of Missing Values
Standard methods for anomaly detection assume that all features are observed at both learning time and prediction time. Such methods cannot process data containing missing values. This paper studies five strategies for handling missing values in test queries: (a) mean imputation, (b) MAP imputation, (c) reduction (reduced-dimension anomaly detectors via feature bagging), (d) marginalization (for density estimators only), and (e) proportional distribution (for tree-based methods only). Our analysis suggests that MAP imputation and proportional distribution should give better results than mean imputation, reduction, and marginalization. These hypotheses are largely confirmed by experimental studies on synthetic data and on anomaly detection benchmark data sets using the Isolation Forest (IF), LODA, and EGMM anomaly detection algorithms. However, marginalization worked surprisingly well for EGMM, and there are exceptions where reduction works well on some benchmark problems. We recommend proportional distribution for IF, MAP imputation for LODA, and marginalization for EGMM.
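As an illustration of proportional distribution for tree-based detectors, here is a hedged sketch with a hypothetical tree-node structure (`feature`, `threshold`, `left_fraction` are assumed fields, not taken from any specific library):

```python
def score_missing(node, x):
    """Score a query with missing values in a tree-based detector by
    distributing it over both children in proportion to how much
    training data followed each branch."""
    if node.is_leaf:
        return node.score
    value = x.get(node.feature)            # x is a dict; None = missing
    if value is None:
        p = node.left_fraction             # training fraction sent left
        return (p * score_missing(node.left, x)
                + (1.0 - p) * score_missing(node.right, x))
    child = node.left if value < node.threshold else node.right
    return score_missing(child, x)
```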
Determining the Dependence Structure of Multivariate Extremes
In multivariate extreme value analysis, the nature of the extremal dependence between variables should be considered when selecting appropriate statistical models. Interest often lies in determining which subsets of variables can take their largest values simultaneously while the others are of smaller order. Our approach is based on exploiting hidden regular variation properties on a collection of non-standard cones. This provides a new set of indices that reveal aspects of the extremal dependence structure not available through any existing measures of dependence. We derive theoretical properties of these indices, demonstrate their value through a series of examples, and develop methods of inference that also estimate the proportion of extremal mass associated with each cone. We consider two inferential approaches: the first approximates the cones via a truncation of the variables; the second partitions the simplex associated with their radial-angular components. We apply the methods to UK river flows, estimating the probabilities of different subsets of sites being simultaneously large.
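Hedged notation for the cones in question, following a standard construction in this line of work: with margins standardized, the cone on which only the variables indexed by $C \subseteq \{1,\dots,d\}$ are large is

```latex
\mathbb{E}_C = \big\{ x \in [0,\infty)^d \setminus \{0\} :
x_i > 0 \text{ for } i \in C,\; x_j = 0 \text{ for } j \notin C \big\}
```

and the proportion of extremal mass placed on each $\mathbb{E}_C$ indicates which subsets of variables can be simultaneously extreme.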
A Scalable Strategy for the Identification of Latent-variable Graphical Models
In this paper we propose an identification method for latent-variable graphical models associated with autoregressive (AR) Gaussian stationary processes. The identification procedure exploits the approximation of AR processes by stationary reciprocal processes, thus benefiting from the numerical advantages of dealing with block-circulant matrices. These advantages become more and more significant as the order of the process grows. We show how the identification can be cast as a regularized convex program, and we present numerical examples that compare the performance of the proposed method with existing ones.
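For orientation, here is a hedged sketch of the generic sparse-plus-low-rank program underlying this line of work; the paper's AR version replaces the static concentration matrix with spectral-density quantities, so this is background rather than the proposed method:

```latex
\min_{S,\,L}\; -\log\det(S - L) + \operatorname{tr}\big((S - L)\,\hat{\Sigma}\big)
+ \lambda\big(\gamma\,\|S\|_1 + \operatorname{tr}(L)\big)
\quad \text{s.t.}\quad S - L \succ 0,\; L \succeq 0
```

Here $\hat{\Sigma}$ is the sample covariance, the sparse part $S$ captures conditional dependencies among observed variables, and the low-rank part $L$ captures the effect of a few latent variables.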
• A note on eigenvalues and Hamiltonian properties of $k$-connected graphs
• Small-signal Stability Analysis and Performance Evaluation of Microgrids under Distributed Control
• Real-time network traffic signal control for emission reduction based on nonlinear decision rule
• Maximizing net income of the auction waterfall with an abort decision tree
• Collaborative Artificial Intelligence (AI) for User-Cell association in Ultra-Dense Cellular Systems
• Sidorenko’s conjecture for blow-ups
• Atomic decomposition of characters and crystals
• An Efficient Approach for Polyps Detection in Endoscopic Videos Based on Faster R-CNN
• Developing a Purely Visual Based Obstacle Detection using Inverse Perspective Mapping
• A Framework for Robust Assimilation of Potentially Malign Third-Party Data, and its Statistical Meaning
• Unsupervised Statistical Machine Translation
• Solving Non-smooth Constrained Programs with Lower Complexity than $\mathcal{O}(1/\varepsilon)$: A Primal-Dual Homotopy Smoothing Approach
• Two-sample aggregate data meta-analysis of medians
• Digital Urban Sensing: A Multi-layered Approach
• BOLD5000: A public fMRI dataset of 5000 images
• Routing for Traffic Networks with Mixed Autonomy
• Heavy Bernoulli-percolation clusters are indistinguishable
• Learning Concept Abstractness Using Weak Supervision
• FakeNewsNet: A Data Repository with News Content, Social Context and Dynamic Information for Studying Fake News on Social Media
• Automatic differentiation for error analysis of Monte Carlo data
• An Online Updating Approach for Testing the Proportional Hazards Assumption with Streams of Big Survival Data
• Stochastic Particle-Optimization Sampling and the Non-Asymptotic Convergence Theory
• Policy Shaping and Generalized Update Equations for Semantic Parsing from Denotations
• BPE and CharCNNs for Translation of Morphology: A Cross-Lingual Comparison and Analysis
• Magic-State Functional Units: Mapping and Scheduling Multi-Level Distillation Circuits for Fault-Tolerant Quantum Architectures
• Secure Transmit Antenna Selection Protocol for MIMO NOMA Networks over Nakagami-m Channels
• Learning User Preferences and Understanding Calendar Contexts for Event Scheduling
• Robust estimations for the tail index of Weibull-type distribution
• Reconstruction and Registration of Large-Scale Medical Scene Using Point Clouds Data from Different Modalities
• Cross validation residuals for generalised least squares and other correlated data models
• Preferential sampling for presence/absence data and for fusion of presence/absence data with presence-only data
• Joint Trajectory and Resource Allocation Design for UAV Communication Systems
• An Efficient Framework for Concurrent Execution of Smart Contracts
• RNNs as psycholinguistic subjects: Syntactic state and grammatical dependency
• Neural MultiVoice Models for Expressing Novel Personalities in Dialog
• Multi-agent Economics and the Emergence of Critical Markets
• Distributed-Memory Forest-of-Octrees Raycasting
• Multilinear processes in Banach space
• Localizing Moments in Video with Temporal Language
• Stack-Sorting, Set Partitions, and Lassalle’s Sequence
• Proximal-Free ADMM for Decentralized Composite Optimization via Graph Simplification
• Retinal Vessel Segmentation under Extreme Low Annotation: A Generative Adversarial Network Approach
• A completion of the proof of the Edge-statistics Conjecture
• IKA: Independent Kernel Approximator
• Semantic Human Matting
• Zero Shot Learning for Code Education: Rubric Sampling with Deep Learning Inference
• A Unified Feature Disentangler for Multi-Domain Image Translation and Manipulation
• FlipTracker: Understanding Natural Error Resilience in HPC Applications
• Semiparametric model averaging for high dimensional conditional quantile prediction
• Towards a Better Match in Siamese Network Based Visual Object Tracker
• Towards quantitative methods to assess network generative models
• Notes on the Ogawa integrability and a condition for convergence in the multidimensional case
• Temporally Coherent Video Harmonization Using Adversarial Networks
• The Lecture Hall Cone as a toric deformation
• Anytime Hedge achieves optimal regret in the stochastic regime
• Counting Consecutive Pattern Matches in $\mathcal{S}_n(132)$ and $\mathcal{S}_n(123)$
• Image Manipulation with Perceptual Discriminators
• Power Flow Analysis Using Graph based Combination of Iterative Methods and Vertex Contraction Approach
• Heterogeneous Non-Orthogonal Multiple Access for Ultra-Reliable and Broadband Services in Multi-Cell Fog-RAN
• Noncrossing Partitions, Tamari Lattices, and Parabolic Quotients of the Symmetric Group
• Consensus-Driven Propagation in Massive Unlabeled Data for Face Recognition
• Almost arithmetic progressions in the primes and other large sets
• Generating Highly Realistic Images of Skin Lesions with GANs
• Conditional predictive inference for high-dimensional stable algorithms
• Exploration of Bi-Level PageRank Algorithm for Power Flow Analysis Using Graph Database
• An Elementary Proof of a Classical Information-Theoretic Formula
• Intelligent Reflecting Surface Enhanced Wireless Network: Joint Active and Passive Beamforming Design
• Averaging principle for a class of stochastic differential equations
• Repetition avoidance in products of factors
• Wireless Powered User Cooperative Computation in Mobile Edge Computing Systems
• Pre-training on high-resource speech recognition improves low-resource speech-to-text translation
• Modeling the EM Field Distribution within a Computer Chip Package
• Stellar Cluster Detection using GMM with Deep Variational Autoencoder
• Modified Diversity of Class Probability Estimation Co-training for Hyperspectral Image Classification
• How is Contrast Encoded in Deep Neural Networks?
• Data Augmentation for Skin Lesion Analysis
• On Clique Coverings of Complete Multipartite Graphs
• Conditional Transfer with Dense Residual Attention: Synthesizing traffic signs from street-view imagery
• Resonant synchronization and information retrieve from memorized Kuramoto network
• Free as in Free Word Order: An Energy Based Model for Word Segmentation and Morphological Tagging in Sanskrit
• Sentylic at IEST 2018: Gated Recurrent Neural Network and Capsule Network Based Approach for Implicit Emotion Detection
• Bregman divergences based on optimal design criteria and simplicial measures of dispersion
• Blur-Countering Keypoint Detection via Eigenvalue Asymmetry
• L1-regularization for multi-period portfolio selection
• Stochastic approximation on non-compact measure spaces and application to measure-valued Pólya processes
• Theoretical analysis and propositions for ‘ontology citation’
• On uniqueness in Steiner problem
• Portfolio diversification and model uncertainty: a robust dynamic mean-variance approach
• Nearly-linear monotone paths in edge-ordered graphs
• Blind Community Detection from Low-rank Excitations of a Graph Filter
• On Secure Mixed RF-FSO Systems With TAS and Imperfect CSI
• Deep Reinforcement Learning in High Frequency Trading
• Bounds on the Error Probability of Raptor Codes under Maximum Likelihood Decoding
• Parallel numerical method for nonlocal-in-time Schrödinger equation
• Modeling human intuitions about liquid flow with particle-based simulation
• Computing the Difficulty of Critical Bootstrap Percolation Models is NP-hard
• Copenhagen at CoNLL–SIGMORPHON 2018: Multilingual Inflection in Context with Explicit Morphosyntactic Decoding
• CNNs-based Acoustic Scene Classification using Multi-Spectrogram Fusion and Label Expansions
• Self-Organised Criticality and Emergent Hyperbolic Networks — Blueprint for Complexity in Social Dynamics
• Learning Context-Sensitive Time-Decay Attention for Role-Based Dialogue Modeling
• Reinforcement Learning under Threats
• Traffic Density Estimation using a Convolutional Neural Network
• Deep Depth from Defocus: how can defocus blur improve 3D estimation using dense neural networks?
• Adjoint Power Flow Analysis for Evaluating Feasibility
• Knowledge Integrated Classifier Design Based on Utility Optimization
• Chvátal’s Conjecture Holds for Ground Sets of Seven Elements
• Stance Prediction for Russian: Data and Analysis
• Document-Level Neural Machine Translation with Hierarchical Attention Networks
• Modelling Point Spread Function in Fluorescence Microscopy with a Sparse Combination of Gaussian Mixture: Trade-off between Accuracy and Efficiency
• Classification Algorithms for Semi-Blind Uplink/Downlink Decoupling in sub-6 GHz/mmWave 5G Networks
• GAN Lab: Understanding Complex Deep Generative Models using Interactive Visual Experimentation
• Learning Paths from Signature Tensors
• A Bayesian framework for the analog reconstruction of kymographs from fluorescence microscopy data
• Anomalous diffusion in one and two dimensional combs
• Massive MIMO Channel Estimation for Millimeter Wave Systems via Matrix Completion
• Bimodal network architectures for automatic generation of image annotation from text
• Barycenters of points in polytope skeleta
• Utilizing Character and Word Embeddings for Text Normalization with Sequence-to-Sequence Models
• Ranking RDF Instances in Degree-decoupled RDF Graphs
• Gene Shaving using influence function of a kernel method
• Reciprocity-based cooperative phalanx maintained by overconfident players
• Online local pool generation for dynamic classifier selection: an extended version
• Efficient Egocentric Visual Perception Combining Eye-tracking, a Software Retina and Deep Learning
• Some results relating Kolmogorov complexity and entropy of amenable group actions
• Efficient Difference-in-Differences Estimation with High-Dimensional Common Trend Confounding
• A Quantitative Approach to Understanding Online Antisemitism
• The effects of inhibitory neuron fraction on the dynamics of an avalanching neural network
• DF-Net: Unsupervised Joint Learning of Depth and Flow using Cross-Task Consistency