What’s new on arXiv

Future-Prediction-Based Model for Neural Machine Translation

We propose a novel model for Neural Machine Translation (NMT). Unlike conventional methods, our model predicts the length and words of the future text at each decoding time step, so that generation can draw on this information. With such information, the model does not stop generating before it has translated enough content. Experimental results demonstrate that our model significantly outperforms the baseline models. In addition, our analysis shows that our model is effective at predicting the length and words of the untranslated content.

Cold-start recommendations in Collective Matrix Factorization

This work explores the ability of collective matrix factorization models in recommender systems to make predictions about users and items for which there is side information available but no feedback or interactions data, and proposes a new formulation with a faster cold-start prediction formula that can be used in real-time systems. While these cold-start recommendations are not as good as warm-start ones, they were found to be of better quality than non-personalized recommendations, and predictions about new users were found to be more reliable than those about new items. The formulation proposed here resulted in improved cold-start recommendations in many scenarios, at the expense of worse warm-start ones.

On overcoming the Curse of Dimensionality in Neural Networks

On Some Integral Means

Harmonic, Geometric, Arithmetic, Heronian and Contraharmonic means have been studied by many mathematicians. In 2003, H. Evens studied these means from a geometrical point of view and established some of the inequalities between them using a circle and its radius. In 1961, E. Beckenbach and R. Bellman introduced several inequalities corresponding to means. In this paper, we introduce the concept of mean functions and integral means and give bounds on some of these mean functions and integral means.
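
For orientation, the five classical means named in this abstract have the following standard two-variable definitions for positive reals $a$ and $b$. These are textbook definitions and the well-known ordering between them, not formulas taken from the paper; the paper's own "mean functions" and "integral means" are not reproduced here.

```latex
\begin{align*}
H(a,b) &= \frac{2ab}{a+b}, &
G(a,b) &= \sqrt{ab}, &
He(a,b) &= \frac{a+\sqrt{ab}+b}{3}, \\
A(a,b) &= \frac{a+b}{2}, &
C(a,b) &= \frac{a^{2}+b^{2}}{a+b}, \\[4pt]
H(a,b) &\le G(a,b) \le He(a,b) \le A(a,b) \le C(a,b),
  && \text{with equality iff } a = b.
\end{align*}
```

As a quick check with $a = 1$ and $b = 4$, the five values are $1.6$, $2$, $7/3$, $2.5$ and $3.4$, in increasing order.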

Zero-shot User Intent Detection via Capsule Neural Networks

User intent detection plays a critical role in question-answering and dialog systems. Most previous works treat intent detection as a classification problem where utterances are labeled with predefined intents. However, labeling users’ utterances is labor-intensive and time-consuming, as intents are diversely expressed and novel intents continually emerge. Instead, we study the zero-shot intent detection problem, which aims to detect emerging user intents where no labeled utterances are currently available. We propose two capsule-based architectures: INTENTCAPSNET, which extracts semantic features from utterances and aggregates them to discriminate existing intents, and INTENTCAPSNET-ZSL, which gives INTENTCAPSNET the zero-shot learning ability to discriminate emerging intents via knowledge transfer from existing intents. Experiments on two real-world datasets show that our model not only can better discriminate diversely expressed existing intents, but is also able to discriminate emerging intents when no labeled utterances are available.

Query Log Compression for Workload Analytics

Analyzing database access logs is a key part of performance tuning, intrusion detection, benchmark development, and many other database administration tasks. Unfortunately, it is common for production databases to deal with millions of queries or more each day, so these logs must be summarized before they can be used. Designing an appropriate summary encoding requires trading off between conciseness and information content. For example, simple workload sampling may miss rare but high-impact queries. In this paper, we present LogR, a lossy log compression scheme suitable for use by many automated log analytics tools, as well as for human inspection. We formalize and analyze the space/fidelity trade-off in the context of a broader family of ‘pattern’ and ‘pattern mixture’ log encodings to which LogR belongs. We show through a series of experiments that LogR compressed encodings can be created efficiently, come with provable information-theoretic bounds on their accuracy, and outperform state-of-the-art log summarization strategies.

DeFactoNLP: Fact Verification using Entity Recognition, TFIDF Vector Comparison and Decomposable Attention

In this paper, we describe DeFactoNLP, the system we designed for the FEVER 2018 Shared Task. The aim of this task was to conceive a system that can not only automatically assess the veracity of a claim but also retrieve evidence supporting this assessment from Wikipedia. In our approach, the Wikipedia documents whose Term Frequency-Inverse Document Frequency (TFIDF) vectors are most similar to the vector of the claim and those documents whose names are similar to those of the named entities (NEs) mentioned in the claim are identified as the documents which might contain evidence. The sentences in these documents are then supplied to a textual entailment recognition module. This module calculates the probability of each sentence supporting the claim, contradicting the claim or not providing any relevant information to assess the veracity of the claim. Various features computed using these probabilities are finally used by a Random Forest classifier to determine the overall truthfulness of the claim. The sentences which support this classification are returned as evidence. Our approach achieved a 0.4277 evidence F1-score, a 0.5136 label accuracy and a 0.3833 FEVER score.
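
The document-retrieval step described in this abstract (ranking documents by how similar their TFIDF vectors are to the claim's vector) can be illustrated with a minimal sketch. This is not the DeFactoNLP implementation: the whitespace tokenizer, the smoothed IDF formula, the use of cosine similarity, and the names `tfidf_vectors`, `cosine` and `rank_documents` are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(texts):
    """Return one sparse TF-IDF vector (a dict) per text.

    Assumptions: lowercase whitespace tokenization and a smoothed IDF,
    log((1 + N) / (1 + df)) + 1. The paper may use different choices.
    """
    tokenized = [text.lower().split() for text in texts]
    n_texts = len(tokenized)
    # Document frequency: in how many texts does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log((1 + n_texts) / (1 + count)) + 1.0
           for term, count in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({term: (count / len(tokens)) * idf[term]
                        for term, count in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_documents(claim, documents):
    """Return document indices ordered from most to least similar to the claim."""
    vectors = tfidf_vectors(documents + [claim])
    claim_vec = vectors[-1]
    scores = [cosine(claim_vec, doc_vec) for doc_vec in vectors[:-1]]
    return sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    docs = [
        "paris is the capital of france",
        "the moon orbits the earth",
        "bananas are yellow fruit",
    ]
    print(rank_documents("paris is the capital city of france", docs))
```

In the system described above, the top-ranked documents would then be combined with documents matched by named-entity names before their sentences are passed to the textual entailment module; that later stage is not sketched here.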

Multilingual Clustering of Streaming News

Clustering news across languages enables efficient media monitoring by aggregating articles from multilingual sources into coherent stories. Doing so in an online setting allows scalable processing of massive news streams. To this end, we describe a novel method for clustering an incoming stream of multilingual documents into monolingual and crosslingual story clusters. Unlike typical clustering approaches that consider a small and known number of labels, we tackle the problem of discovering an ever growing number of cluster labels in an online fashion, using real news datasets in multiple languages. Our method is simple to implement, computationally efficient and produces state-of-the-art results on datasets in German, English and Spanish.
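
The general idea of online clustering with an open-ended number of clusters can be sketched as follows. This is a generic, monolingual illustration rather than the authors' method: the bag-of-words representation, the cosine similarity measure, the fixed `threshold` and the class name `OnlineClusterer` are all assumptions, and crosslingual linking (which would need a shared representation across languages) is not attempted.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(weight * v.get(term, 0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


class OnlineClusterer:
    """Assign each incoming document to the nearest cluster or open a new one."""

    def __init__(self, threshold=0.3):
        # Hypothetical similarity cut-off; a real system would tune or learn it.
        self.threshold = threshold
        self.centroids = []  # one Counter of summed term counts per cluster

    def add(self, text):
        """Process one document and return the index of its cluster."""
        vec = Counter(text.lower().split())
        best, best_sim = None, 0.0
        for idx, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = idx, sim
        if best is not None and best_sim >= self.threshold:
            # Close enough to an existing story: fold the document into it.
            self.centroids[best].update(vec)
            return best
        # Otherwise the document starts a new story cluster.
        self.centroids.append(vec)
        return len(self.centroids) - 1


if __name__ == "__main__":
    clusterer = OnlineClusterer(threshold=0.3)
    for headline in [
        "earthquake hits city centre",
        "strong earthquake hits city",
        "football team wins cup final",
    ]:
        print(clusterer.add(headline), headline)
```

Because each document is compared only against the current cluster summaries and then discarded, the number of clusters can keep growing as the stream is processed, which is the property the abstract emphasizes.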

From Bayesian Inference to Logical Bayesian Inference: A New Mathematical Frame for Semantic Communication and Machine Learning

Bayesian Inference (BI) uses the Bayes’ posterior whereas Logical Bayesian Inference (LBI) uses the truth function or membership function as the inference tool. LBI was proposed because BI was not compatible with the classical Bayes’ prediction and didn’t use logical probability and hence couldn’t express semantic meaning. In LBI, statistical probability and logical probability are strictly distinguished, used at the same time, and linked by the third kind of Bayes’ Theorem. The Shannon channel consists of a set of transition probability functions whereas the semantic channel consists of a set of truth functions. When a sample is large enough, we can directly derive the semantic channel from Shannon’s channel. Otherwise, we can use parameters to construct truth functions and use the Maximum Semantic Information (MSI) criterion to optimize the truth functions. The MSI criterion is equivalent to the Maximum Likelihood (ML) criterion, and compatible with the Regularized Least Square (RLS) criterion. By matching the two channels one with another, we can obtain the Channels’ Matching (CM) algorithm. This algorithm can improve multi-label classifications, maximum likelihood estimations (including unseen instance classifications), and mixture models. In comparison with BI, LBI 1) uses the prior P(X) of X instead of that of Y or θ and fits cases where the source P(X) changes, 2) can be used to solve the denotations of labels, and 3) is more compatible with the classical Bayes’ prediction and likelihood method. LBI also provides a confirmation measure between -1 and 1 for induction.

ViewpointS: towards a Collective Brain

Tracing knowledge acquisition and linking learning events to interaction between peers is a major challenge of our times. We have conceived, designed and evaluated a new paradigm for constructing and using collective knowledge by Web interactions that we called ViewpointS. By exploiting the similarity with Edelman’s Theory of Neuronal Group Selection (TNGS), we conjecture that it may be metaphorically considered a Collective Brain, especially effective in the case of trans-disciplinary representations. Far from being without doubts, in the paper we present the reasons (and the limits) of our proposal that aims to become a useful integrating tool for future quantitative explorations of individual as well as collective learning at different degrees of granularity. We are therefore challenging each of the current approaches: the logical one in the semantic Web, the statistical one in mining and deep learning, the social one in recommender systems based on authority and trust; not in each of their own preferred field of operation, rather in their integration weaknesses far from the holistic and dynamic behavior of the human brain.

Have You Stolen My Model? Evasion Attacks Against Deep Neural Network Watermarking Techniques

Deep neural networks have had enormous impact on various domains of computer science, considerably outperforming previous state-of-the-art machine learning techniques. To achieve this performance, neural networks need large quantities of data and huge computational resources, which heavily increases their construction costs. The increased cost of building a good deep neural network model gives rise to a need for protecting this investment from potential copyright infringements. Legitimate owners of a machine learning model want to be able to reliably track and detect a malicious adversary that tries to steal the intellectual property related to the model. Recently, this problem was tackled by introducing in deep neural networks the concept of watermarking, which allows a legitimate owner to embed some secret information (watermark) in a given model. The watermark allows the legitimate owner to detect copyright infringements of his model. This paper focuses on verifying the robustness and reliability of state-of-the-art deep neural network watermarking schemes. We show that a malicious adversary, even in scenarios where the watermark is difficult to remove, can still evade the verification by the legitimate owners, thus avoiding the detection of model theft.

Sensitivity Analysis with Manifolds

The curse of dimensionality is a common problem in statistics and data analysis. Variable sensitivity analysis methods are a well-studied and established set of tools designed to overcome these sorts of problems. However, as this work shows, these methods fail to capture relevant features and patterns hidden within the geometry of the enveloping manifold projected into a variable. We propose a sensitivity index that captures and reflects the relevance of distinct variables within a model by focusing on the geometry of their projections.

• Enhancing Stock Market Prediction with Extended Coupled Hidden Markov Model over Multi-Sourced Data
• A Datamining Approach for Emotions Extraction and Discovering Cricketers performance from Stadium to Sensex
• Secure transmission with covert requirement in untrusted relaying networks
• Exploring Gap Filling as a Cheaper Alternative to Reading Comprehension Questionnaires when Evaluating Machine Translation for Gisting
• On a generalization of the Pentagonal Number Theorem
• Network Alignment by Discrete Ollivier-Ricci Flow
• Chinese Pinyin Aided IME, Input What You Have Not Keystroked Yet
• World influence and interactions of universities from Wikipedia networks
• Look Across Elapse: Disentangled Representation Learning and Photorealistic Cross-Age Face Synthesis for Age-Invariant Face Recognition
• Chittron: An Automatic Bangla Image Captioning System
• Identifying Land Patterns from Satellite Imagery in Amazon Rainforest using Deep Learning
• Towards an Intelligent Edge: Wireless Communication Meets Machine Learning
• Contextual Neural Model for Translating Bilingual Multi-Speaker Conversations
• IntentsKB: A Knowledge Base of Entity-Oriented Search Intents
• A fast Metropolis-Hastings method for generating random correlation matrices
• Trivial Transfer Learning for Low-Resource Neural Machine Translation
• Sequential Detection of Regime Changes in Neural Data
• Natural Language Person Search Using Deep Reinforcement Learning
• Momentum Model-based Minimal Parameter Identification of a Space Robot
• RASSA: Resistive Accelerator for Approximate Long Read DNA Mapping
• Neural Ranking Models for Temporal Dependency Structure Parsing
• Exact results for the extreme Thouless effect in a model of network dynamics
• Neural Character-based Composition Models for Abuse Detection
• Multitask Learning for Fundamental Frequency Estimation in Music
• The global rate of convergence for optimal tensor methods in smooth convex optimization
• A Study of Dynamic Multipath Clusters at 60 GHz in a Large Indoor Environment
• MTNT: A Testbed for Machine Translation of Noisy Text
• Direct coupling coherent quantum observers with discounted mean square performance criteria and penalized back-action
• Mining Frequent Patterns in Evolving Graphs
• Visual Transfer between Atari Games using Competitive Reinforcement Learning
• Flexible sensitivity analysis for observational studies without observable implications
• Resource Allocation for a Wireless Powered Integrated Radar and Communication System
• On the Role of Event Boundaries in Egocentric Activity Recognition from Photostreams
• Asymptotically Pseudo-Free Matrices
• Global Network Prediction from Local Node Dynamics
• Modeling Topical Coherence in Discourse without Supervision
• Asymptotically Independent U-Statistics in High-Dimensional Testing
• The asymptotic normality of $(s,s+1)$-cores with distinct parts
• Hypernyms Through Intra-Article Organization in Wikipedia
• Parametric Furstenberg Theorem on Random Products of $SL(2, \mathbb{R})$ matrices
• Network-Decomposed Hierarchical Cooperation in Ad Hoc Networks With Social Relationships
• Programmable Memristive Threshold Logic Gate Array
• Network estimation via graphon with node features
• Hierarchically Learned View-Invariant Representations for Cross-View Action Recognition
• Two-level Transmission Scheme for Cache-enabled Fog Radio Access Networks
• Data Augmentation for Neural Online Chat Response Selection
• On the regularity of the stochastic heat equation on polygonal domains in $R^2$
• Unsupervised Image Super-Resolution using Cycle-in-Cycle Generative Adversarial Networks
• Impact of Secondary User Interference on Primary Network in Cognitive Radio Systems
• Structured Quasi-Newton Methods for Optimization with Orthogonality Constraints
• GB-KMV: An Augmented KMV Sketch for Approximate Containment Similarity Search
• Delocalisation of one-dimensional marginals of product measures and the capacity of LTI discrete channels
• YouTube-VOS: Sequence-to-Sequence Video Object Segmentation
• Shrinkage for Covariance Estimation: Asymptotics, Confidence Intervals, Bounds and Applications in Sensor Monitoring and Finance
• Improved bounds for the extremal number of subdivisions
• Minimal Soft Lattice Theta Functions
• Sea Clutter Distribution Modeling: A Kernel Density Estimation Approach
• Image Segmentation with Pseudo-marginal MCMC Sampling and Nonparametric Shape Priors
• Prediction of Electric Multiple Unit Fleet Size Based on Convolutional Neural Network
• Belittling the Source: Trustworthiness Indicators to Obfuscate Fake News on the Web
• LRS3-TED: a large-scale dataset for visual speech recognition
• Performance evaluation through DEA benchmarking adjusted to goals
• Transition from asynchronous to oscillatory dynamics in balanced spiking networks with instantaneous synapses
• Flatland: a Lightweight First-Person 2-D Environment for Reinforcement Learning
• Community detection analysis in wind speed-monitoring systems using mutual information-based complex network
• A Bilevel Framework for Optimal Price-Setting of Time-and-Level-of-Use Tariffs
• Sojourn time dimensions of fractional Brownian motion
• Clique-partitioned graphs
• Adaptive Semi-supervised Learning for Cross-domain Sentiment Classification
• Controlled Loewner-Kufarev Equation Embedded into the Universal Grassmannian
• Tensor Networks for Latent Variable Analysis: Higher Order Canonical Polyadic Decomposition
• Modulus of Continuity of Controlled Loewner-Kufarev Equations
• Crowdsourcing Semantic Label Propagation in Relation Classification
• Machine learning for predicting thermal power consumption of the Mars Express Spacecraft
• Learning Vision-based Cohesive Flight in Drone Swarms
• A Hierarchical Framework for Correcting Under-Reporting in Count Data
• Resource constrained shortest path algorithm for EDF short-term thermal production planning problem
• Emergence of Communication in an Interactive World with Consistent Speakers
• Object Pose Estimation from Monocular Image using Multi-View Keypoint Correspondence
• YAC: BFT Consensus Algorithm for Blockchain
• Excursions of a spectrally negative Lévy process from a two-point set
• End-to-End Argument Mining for Discussion Threads Based on Parallel Constrained Pointer Architecture
• PathGAN: Visual Scanpath Prediction with Generative Adversarial Networks
• Entanglement-assisted quantum codes from Galois LCD codes
• Multi-clusters in adaptive networks
• A note on gamma triangles and local gamma vectors
• Application of DenseNet in Camera Model Identification and Post-processing Detection
• Numerical experiments with multistep model-predictive control approaches and sensitivity updates for the tracking control of cars
• Data-to-Text Generation with Content Selection and Planning
• The multidimensional truncated Moment Problem: The Moment Cone
• The Complexity Landscape of Decompositional Parameters for ILP
• Optical Flow Super-Resolution Based on Image Guidence Using Convolutional Neural Network
• Affordance Extraction and Inference based on Semantic Role Labeling
• IoU is not submodular
• Adversarial Attack Type I: Generating False Positives
• Optimization Design of Decentralized Control for Complex Decentralized Systems
• Ubiquity in graphs II: Ubiquity of graphs with non-linear end structure
• Image computing for fibre-bundle endomicroscopy: A review
• Incremental approaches to updating attribute reducts when refining and coarsening coverings
• Dynamics of a large system of spiking neurons with synaptic delay
• UKP-Athene: Multi-Sentence Textual Entailment for Claim Verification
• Endorsements on Social Media: An Empirical Study of Affiliate Marketing Disclosures on YouTube and Pinterest
• Separations of sets
• Channel Characterization for Chip-scale Wireless Communications within Computing Packages
• Deep learning for language understanding of mental health concepts derived from Cognitive Behavioural Therapy
• Typed Linear Algebra for Efficient Analytical Querying
• Convex optimization using quantum oracles
• Learning Saliency Prediction From Sparse Fixation Pixel Map
• Detail Preserving Depth Estimation from a Single Image Using Attention Guided Networks
• Automatic Event Salience Identification
• Minimum Description Length codes are critical
• Towards Dynamic Computation Graphs via Sparse Latent Structure
• Exploring the Landscape of Relational Syllogistic Logics
• Stability of partial Fourier matrices with clustered nodes
• A minimal counterexample to a strengthening of Perles’ conjecture
• Context-Patch Face Hallucination Based on Thresholding Locality-constrained Representation and Reproducing Learning
• A3Net: Adversarial-and-Attention Network for Machine Reading Comprehension
• Learned Cardinalities: Estimating Correlated Joins with Deep Learning
• Cobham’s Theorem and Automaticity
• Diverse and Coherent Paragraph Generation from Images
• Proper likelihood ratio based ROC curves for general binary classification problems
• Convolutional Neural Network for Trajectory Prediction
• Multi-Level Structured Self-Attentions for Distantly Supervised Relation Extraction
• An algorithm for approximating subactions
• A Minimum Discounted Reward Hamilton-Jacobi Formulation for Computing Reachable Sets
• A Dual Approach for Optimal Algorithms in Distributed Optimization over Networks
• A Method for Distributed Transactive Control in Power Systems based on the Projected Consensus Algorithm
• InteriorNet: Mega-scale Multi-sensor Photo-realistic Indoor Scenes Dataset
• NTUA-SLP at IEST 2018: Ensemble of Neural Transfer Methods for Implicit Emotion Classification
• Cambrian triangulations and their tropical realizations
• A CNN Accelerator on FPGA Using Depthwise Separable Convolution
• Estimating Small Differences in Car-Pose from Orbits
