Neural Task Planning with And-Or Graph Representations
This paper focuses on semantic task planning, i.e., predicting a sequence of actions toward accomplishing a specific task under a certain scene, which is a new problem in computer vision research. The primary challenges are how to model task-specific knowledge and how to integrate this knowledge into the learning procedure. In this work, we propose training a recurrent long short-term memory (LSTM) network to address this problem, i.e., taking a scene image (including pre-located objects) and the specified task as input and recurrently predicting action sequences. However, training such a network generally requires large numbers of annotated samples to cover the semantic space (e.g., diverse action decomposition and ordering). To overcome this issue, we introduce a knowledge and-or graph (AOG) for task description, which hierarchically represents a task as atomic actions. With this AOG representation, we can produce many valid samples (i.e., action sequences according to common sense) by training another auxiliary LSTM network with a small set of annotated samples. Furthermore, these generated samples (i.e., task-oriented action sequences) effectively facilitate training of the model for semantic task planning. In our experiments, we create a new dataset that contains diverse daily tasks and extensively evaluate the effectiveness of our approach.
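To make the recurrent planner concrete, here is a minimal sketch of the kind of LSTM action-sequence predictor the abstract describes, assuming a pre-extracted scene feature vector and hypothetical dimensions and vocabulary sizes; it is not the authors' implementation.

```python
import torch
import torch.nn as nn

class ActionSequencePlanner(nn.Module):
    """Minimal sketch: fuse a scene feature with a task embedding and
    decode an action sequence with an LSTM (all dimensions are assumptions)."""

    def __init__(self, scene_dim=2048, num_tasks=20, num_actions=50,
                 embed_dim=128, hidden_dim=256):
        super().__init__()
        self.task_embed = nn.Embedding(num_tasks, embed_dim)
        self.action_embed = nn.Embedding(num_actions, embed_dim)
        self.fuse = nn.Linear(scene_dim + embed_dim, hidden_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_actions)

    def forward(self, scene_feat, task_id, prev_actions):
        # Initial hidden state encodes the scene and the specified task.
        ctx = torch.tanh(self.fuse(torch.cat([scene_feat, self.task_embed(task_id)], dim=-1)))
        h0 = ctx.unsqueeze(0)                # (1, batch, hidden_dim)
        c0 = torch.zeros_like(h0)
        # Teacher forcing: condition each step on the previous ground-truth action.
        x = self.action_embed(prev_actions)  # (batch, T, embed_dim)
        out, _ = self.lstm(x, (h0, c0))
        return self.out(out)                 # (batch, T, num_actions) logits

planner = ActionSequencePlanner()
logits = planner(torch.randn(4, 2048), torch.tensor([3, 1, 0, 7]),
                 torch.randint(0, 50, (4, 6)))
print(logits.shape)  # torch.Size([4, 6, 50])
```

The auxiliary AOG-driven LSTM mentioned in the abstract would generate additional training sequences for exactly this kind of decoder.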
Adversarial Feature Learning of Online Monitoring Data for Operation Reliability Assessment in Distribution Network
Choosing How to Choose Papers
Data Poisoning Attacks against Online Learning
We consider data poisoning attacks, a class of adversarial attacks on machine learning where an adversary has the power to alter a small fraction of the training data in order to make the trained classifier satisfy certain objectives. While there has been much prior work on data poisoning, most of it is in the offline setting, and attacks for online learning, where training data arrives in a streaming manner, are not well understood. In this work, we initiate a systematic investigation of data poisoning attacks for online learning. We formalize the problem in two settings, and we propose a general attack strategy, formulated as an optimization problem, that applies to both settings with some modifications. We propose three solution strategies, and perform extensive experimental evaluation. Finally, we discuss the implications of our findings for building successful defenses.
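As a rough illustration of the threat model (an attacker altering a small fraction of a data stream to steer an online learner), here is a toy greedy attack against online SGD for logistic regression. This is my own simplification for intuition, not the optimization-based attack from the paper; the budget `eps`, the fraction `poison_frac`, and the target `w_target` are all assumed names.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def poisoned_online_sgd(stream, w_target, eps=0.5, poison_frac=0.1, lr=0.1, seed=0):
    """Illustrative sketch only (not the paper's attack): the victim runs
    online SGD for logistic regression; the attacker perturbs a small
    fraction of incoming feature vectors within an L-inf budget `eps` so
    that the victim's own updates drift toward an attacker-chosen w_target."""
    rng = np.random.default_rng(seed)
    w = np.zeros_like(w_target)
    for x, y in stream:
        r = sigmoid(w @ x) - y                    # victim's residual on this point
        if rng.random() < poison_frac and abs(r) > 1e-6:
            d = np.sign(w_target - w)             # desired per-coordinate direction
            # The victim's update is w -= lr * r * x, so shifting x by
            # -sign(r) * d makes that update move w toward w_target.
            x = x + eps * d * (-np.sign(r))
        w -= lr * (sigmoid(w @ x) - y) * x        # ordinary online update
    return w

# Toy stream: 2-D Gaussian features with a linear ground-truth labeller.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 2))
y = (X @ np.array([1.0, -1.0]) > 0).astype(float)
w = poisoned_online_sgd(zip(X, y), w_target=np.array([0.0, 5.0]))
print(w)   # weights pulled away from [1, -1] toward the attacker's target
```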
Term Set Expansion based NLP Architect by Intel AI Lab
Pyramidal Recurrent Unit for Language Modeling
Distributionally Robust Distribution Network Configuration Under Random Contingency
Topology design is a critical task for the reliability, economic operation, and resilience of distribution systems. This paper proposes a distributionally robust optimization (DRO) model for designing the topology of a new distribution system facing random contingencies (e.g., imposed by natural disasters). The proposed DRO model optimally configures the network topology and integrates distributed generation to effectively meet the loads. Moreover, we take into account the uncertainty of contingencies. Using the moment information of distribution line failures, we construct an ambiguity set of the contingency probability distribution, and minimize the expected amount of load shedding with respect to the worst-case distribution within the ambiguity set. As compared with a classical robust optimization model, the DRO model explicitly considers the contingency uncertainty and thus provides a less conservative configuration, yielding better out-of-sample performance. We recast the proposed model into a form amenable to the column-and-constraint generation algorithm. We demonstrate the out-of-sample performance of the proposed approach in numerical case studies.
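To fix ideas, a schematic moment-based DRO formulation of this kind might look as follows. The notation here is chosen for illustration and is not taken from the paper: $x$ collects the first-stage topology and DG decisions, $\xi$ is the random line-failure indicator vector, $Q(x,\xi)$ is the second-stage minimum load shedding, and $p_\ell$ is an estimated marginal failure probability for line $\ell$.

```latex
% Schematic two-stage moment-based DRO (notation assumed, not the paper's):
\begin{align*}
\min_{x \in \mathcal{X}} \; \sup_{\mathbb{P} \in \mathcal{D}} \;& \mathbb{E}_{\mathbb{P}}\big[ Q(x,\xi) \big], \\
\mathcal{D} = \Big\{ \mathbb{P} :\;& \mathbb{P}\big(\xi \in \{0,1\}^{|\mathcal{L}|}\big) = 1,\;\;
\mathbb{E}_{\mathbb{P}}[\xi_\ell] \le p_\ell \;\; \forall \ell \in \mathcal{L} \Big\}.
\end{align*}
```

The resulting min-sup-min structure is what a column-and-constraint generation scheme decomposes: a master problem over $x$ and worst-case scenarios, and subproblems that identify new violating contingencies.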
One-Shot Relational Learning for Knowledge Graphs
Knowledge graphs (KGs) are key components of various natural language processing applications. To further expand KGs’ coverage, previous studies on knowledge graph completion usually require a large number of training instances for each relation. However, we observe that long-tail relations are actually more common in KGs and that newly added relations often do not have many known triples for training. In this work, we aim to predict new facts in a challenging setting where only one training instance is available. We propose a one-shot relational learning framework, which utilizes the knowledge extracted by embedding models and learns a matching metric by considering both the learned embeddings and one-hop graph structures. Empirically, our model yields considerable performance improvements over existing embedding models, and also eliminates the need to re-train the embedding models when dealing with newly added relations.
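The core idea is to score a candidate entity pair against the single known (support) pair of a new relation, using both entity embeddings and one-hop neighbourhoods. A toy sketch of that matching step is below; the pair encoding and the plain cosine score are assumptions of mine, stand-ins for the learned matching metric in the paper.

```python
import numpy as np

def pair_representation(h, t, entity_emb, neighbors):
    """Sketch (assumed design): represent an (h, t) entity pair by concatenating
    each entity's embedding with the mean embedding of its one-hop neighbors."""
    def enc(e):
        nbrs = neighbors.get(e, [e])
        return np.concatenate([entity_emb[e], entity_emb[nbrs].mean(axis=0)])
    return np.concatenate([enc(h), enc(t)])

def one_shot_score(query_pair, support_pair, entity_emb, neighbors):
    """Similarity between a candidate pair and the single support pair of the
    new relation; the paper learns this metric, here it is plain cosine."""
    q = pair_representation(*query_pair, entity_emb, neighbors)
    s = pair_representation(*support_pair, entity_emb, neighbors)
    return float(q @ s / (np.linalg.norm(q) * np.linalg.norm(s) + 1e-9))

# Toy usage with random embeddings for 10 entities of dimension 16.
emb = np.random.default_rng(0).normal(size=(10, 16))
nbr = {0: [1, 2], 3: [4], 5: [0, 3], 7: [8]}
print(one_shot_score((5, 7), (0, 3), emb, nbr))
```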
N-ary Relation Extraction using Graph State LSTM
MIaS: Math-Aware Retrieval in Digital Mathematical Libraries
Digital mathematical libraries (DMLs) such as arXiv, Numdam, and EuDML contain mainly documents from STEM fields, where mathematical formulae are often more important than text for understanding. Conventional information retrieval (IR) systems are unable to represent formulae and are therefore ill-suited for math information retrieval (MIR). To fill the gap, we have developed and open-sourced the MIaS MIR system. MIaS is based on the full-text search engine Apache Lucene. On top of text retrieval, MIaS also incorporates a set of tools for preprocessing mathematical formulae. We describe the design of the system and present speed and quality evaluation results. We show that MIaS is both efficient and effective, as evidenced by our victory in the NTCIR-11 Math-2 task.
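Two of the preprocessing ideas behind this kind of formula indexing are subformula extraction and variable unification, so that, e.g., x^2 + y and a^2 + b map to the same index tokens. The snippet below illustrates those two ideas on a simplified tuple-based expression tree; the real MIaS pipeline operates on MathML inside Apache Lucene, so this is an illustration of the concept, not its API.

```python
# Illustration of subformula extraction and variable unification on a
# simplified tuple-based expression tree (not MIaS's actual representation).

def subformulae(node):
    """Yield every subexpression of a formula given as nested tuples,
    e.g. ('+', ('^', 'x', '2'), 'y')."""
    yield node
    if isinstance(node, tuple):
        for child in node[1:]:
            yield from subformulae(child)

def unify_variables(node, mapping=None):
    """Replace concrete variable names by canonical placeholders so that
    x^2 + y and a^2 + b index to the same unified token."""
    if mapping is None:
        mapping = {}
    if isinstance(node, tuple):
        return (node[0],) + tuple(unify_variables(c, mapping) for c in node[1:])
    if node.isalpha():                      # crude test for "is a variable"
        mapping.setdefault(node, f"id{len(mapping) + 1}")
        return mapping[node]
    return node

formula = ('+', ('^', 'x', '2'), 'y')
tokens = [unify_variables(s) for s in subformulae(formula)]
print(tokens)
# [('+', ('^', 'id1', '2'), 'id2'), ('^', 'id1', '2'), 'id1', '2', 'id1']
```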
Convolutional Neural Networks with Recurrent Neural Filters
We introduce a class of convolutional neural networks (CNNs) that utilize recurrent neural networks (RNNs) as convolution filters. A convolution filter is typically implemented as an affine transformation followed by a non-linear function, which fails to account for language compositionality. As a result, it limits the use of high-order filters that are often warranted for natural language processing tasks. In this work, we model convolution filters with RNNs that naturally capture compositionality and long-term dependencies in language. We show that simple CNN architectures equipped with recurrent neural filters (RNFs) achieve results that are on par with the best published ones on the Stanford Sentiment Treebank and two answer sentence selection datasets.
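The mechanism is easy to picture: instead of a dot product, each n-gram window is fed through a small RNN and its final hidden state becomes the filter response at that position. Here is a minimal sketch under assumed dimensions (300-d embeddings, 100 filters, window 5); it mirrors the idea rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class RecurrentNeuralFilter(nn.Module):
    """Sketch of a recurrent neural filter (RNF): an LSTM is run over every
    n-gram window and its final hidden state is the filter response for that
    position. Dimensions are illustrative."""

    def __init__(self, embed_dim=300, num_filters=100, window=5):
        super().__init__()
        self.window = window
        self.rnn = nn.LSTM(embed_dim, num_filters, batch_first=True)

    def forward(self, x):                    # x: (batch, seq_len, embed_dim)
        b, t, d = x.shape
        # Extract all n-gram windows: (batch, positions, window, embed_dim).
        windows = x.unfold(1, self.window, 1).permute(0, 1, 3, 2)
        w = windows.reshape(-1, self.window, d)
        _, (h, _) = self.rnn(w)              # h: (1, batch * positions, num_filters)
        out = h.squeeze(0).reshape(b, -1, h.size(-1))
        return out.max(dim=1).values         # max-over-time pooling, as in text CNNs

feat = RecurrentNeuralFilter()(torch.randn(8, 40, 300))
print(feat.shape)                            # torch.Size([8, 100])
```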
Automated Query Expansion using High Dimensional Clustering
The exponential growth of information on the Internet has created a major challenge for retrieval systems in terms of yielding relevant results. This challenge requires automatic approaches for reformulating or expanding users’ queries to increase recall. Query expansion (QE), a technique for broadening users’ queries by appending additional tokens or phrases based on semantic similarity metrics, plays a crucial role in overcoming this challenge. However, such a procedure increases computational complexity and may introduce unwanted noise into information retrieval. This paper attempts to push the state of the art of QE by developing an automated technique that uses high-dimensional clustering of word vectors to create effective expansions with reduced noise. We implemented a command line tool, named Xu, and evaluated its performance on a dataset of news articles, concluding that, on average, expansions generated using this technique outperform both those generated by previous approaches and the original user query.
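The gist of clustering-based expansion can be shown in a few lines: cluster the word vectors once, then only expand a query term with nearest neighbours that share its cluster, which filters out loosely related candidates. The parameters and scoring below are my assumptions, not Xu's actual implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_expansions(query_terms, vocab, vectors, n_clusters=100, top_k=3, seed=0):
    """Sketch of clustering-based query expansion: cluster all word vectors,
    then expand each query term with its nearest neighbours from the same
    cluster (parameters and scoring are assumptions)."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(vectors)
    index = {w: i for i, w in enumerate(vocab)}
    expansions = []
    for term in query_terms:
        if term not in index:
            continue
        i = index[term]
        same_cluster = np.where(km.labels_ == km.labels_[i])[0]
        sims = vectors[same_cluster] @ vectors[i]       # cosine if vectors are unit norm
        ranked = same_cluster[np.argsort(-sims)]
        expansions += [vocab[j] for j in ranked if vocab[j] != term][:top_k]
    return list(dict.fromkeys(list(query_terms) + expansions))  # dedupe, keep order
```

With pre-trained GloVe or word2vec vectors and a reasonably large vocabulary, the cluster constraint is what keeps the expansion terms on-topic while still broadening recall.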
Matrix Factorization Equals Efficient Co-occurrence Representation
Matrix factorization is a simple and effective solution to the recommendation problem. It has been extensively employed in industry and has attracted much attention from academia. However, it is unclear what the low-dimensional matrices represent. We show that matrix factorization can actually be seen as simultaneously calculating the eigenvectors of the user-user and item-item sample co-occurrence matrices. We then use insights from random matrix theory (RMT) to show that picking the top eigenvectors corresponds to removing sampling noise from the user/item co-occurrence matrices. Therefore, the low-dimensional matrices represent a reduced-noise user and item co-occurrence space. We also analyze the structure of the top eigenvector and show that it corresponds to global effects, and that removing it results in less popular items being recommended. This increases the diversity of the items recommended without affecting the accuracy.
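The eigenvector correspondence is easy to check numerically in the simplest setting, where matrix factorization is taken to be a truncated SVD of the user-item matrix R: the item factors are then exactly eigenvectors of the item-item co-occurrence matrix R^T R (and the user factors of R R^T). The snippet below is just that sanity check, not the paper's RMT analysis.

```python
import numpy as np

# Sanity check of the co-occurrence eigenvector view in the truncated-SVD case.
rng = np.random.default_rng(0)
R = (rng.random((200, 50)) < 0.1).astype(float)     # toy implicit-feedback matrix

# "MF" via truncated SVD: keep the top-k right singular vectors as item factors.
k = 5
_, _, Vt = np.linalg.svd(R, full_matrices=False)
item_factors = Vt[:k]                               # (k, n_items)

# Eigenvectors of the item-item co-occurrence matrix, sorted by eigenvalue.
C = R.T @ R
eigvals, eigvecs = np.linalg.eigh(C)
top_eigvecs = eigvecs[:, np.argsort(eigvals)[::-1][:k]].T

# Each item factor matches a co-occurrence eigenvector up to sign.
for f, v in zip(item_factors, top_eigvecs):
    print(round(abs(f @ v), 6))                     # ~1.0 for every component
```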
Temporal Information Extraction by Predicting Relative Time-lines
The current leading paradigm for temporal information extraction from text consists of three phases: (1) recognition of events and temporal expressions, (2) recognition of temporal relations among them, and (3) time-line construction from the temporal relations. In contrast to the first two phases, the last phase, time-line construction, has received little attention and is the focus of this work. In this paper, we propose a new method to construct a linear time-line from a set of (extracted) temporal relations. More importantly, we propose a novel paradigm in which we directly predict start and end points for events from the text, constructing a time-line without the intermediate step of predicting temporal relations as in earlier work. Within this paradigm, we propose two models that predict in linear complexity, and a new training loss using TimeML-style annotations, yielding promising results.
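One way to picture training such a model: each event gets a predicted start and a non-negative duration, and TimeML-style relation annotations are converted into hinge penalties on the implied interval end-points. The loss below is my own simplification for illustration, not the paper's exact formulation.

```python
import numpy as np

def relation_hinge_loss(starts, durations, relations, margin=0.1):
    """Sketch of a time-line training loss (a simplification, not the paper's
    loss): each event e has a predicted start s_e and non-negative duration
    d_e, and TimeML-style relations become hinge penalties on end-points."""
    ends = starts + np.abs(durations)                 # keep durations non-negative
    loss = 0.0
    for rel, a, b in relations:
        if rel == "BEFORE":                           # a must end before b starts
            loss += max(0.0, ends[a] - starts[b] + margin)
        elif rel == "INCLUDES":                       # b lies inside a
            loss += max(0.0, starts[a] - starts[b] + margin)
            loss += max(0.0, ends[b] - ends[a] + margin)
        elif rel == "SIMULTANEOUS":                   # same interval
            loss += abs(starts[a] - starts[b]) + abs(ends[a] - ends[b])
    return loss

starts = np.array([0.0, 0.4, 0.2])
durs = np.array([0.3, 0.2, 0.6])
rels = [("BEFORE", 0, 1), ("INCLUDES", 2, 1)]
print(relation_hinge_loss(starts, durs, rels))        # 0.0: predictions satisfy both relations
```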
• Scheduling a Rescue
• An Issue in the Martingale Analysis of the Influence Maximization Algorithm IMM
• Guiding Deep Learning System Testing using Surprise Adequacy
• Techniques for Cooperative Cognitive Radio Networks
• On the isometric path partition problem
• A Summary Description of the A2RD Project
• Water Disaggregation via Shape Features based Bayesian Discriminative Sparse Coding
• A regularity criterion for a 3D chemo-repulsion system and its application to a bilinear optimal control problem
• Deep Learning of Vortex Induced Vibrations
• Creating a surrogate commuter network from Australian Bureau of Statistics census data
• Stereo 3D Object Trajectory Reconstruction
• Behavior Trees as a Representation for Medical Procedures
• Models for Predicting Community-Specific Interest in News Articles
• Analysis and optimal individual pitch control decoupling by inclusion of an azimuth offset in the multi-blade coordinate transformation
• Combining Predictions of Auto Insurance Claims
• The Behrens-Fisher Problem with Covariates and Baseline Adjustments
• Large Margin Neural Language Model
• Max-Min and Min-Max universally yield Gumbel
• Open Set Chinese Character Recognition using Multi-typed Attributes
• Tests for price indices in a dynamic item universe
• A Size Condition for Diameter Two Orientable Graphs
• Harnessing Historical Corrections to build Test Collections for Named Entity Disambiguation
• COFGA: Classification Of Fine-Grained Features In Aerial Images
• Downstream Effects of Affirmative Action
• Modeling and Simulation of Spark Streaming
• Back-Translation Sampling by Targeting Difficult Words in Neural Machine Translation
• Cauchy combination test: a powerful test with analytic p-value calculation under arbitrary dependency structures
• Natural Language Generation with Neural Variational Models
• Review Helpfulness Assessment based on Convolutional Neural Network
• On the Fundamental Limit of Private Information Retrieval for Coded Distributed Storage
• Real-time Pedestrian Detection Approach with an Efficient Data Communication Bandwidth Strategy
• Optimal Grid Drawings of Complete Multipartite Graphs and an Integer Variant of the Algebraic Connectivity
• Targeted Syntactic Evaluation of Language Models
• An upper bound on the number of self-avoiding polygons via joining
• Importance Weighting and Variational Inference
• ParsRec: Meta-Learning Recommendations for Bibliographic Reference Parsing
• Volatility in the Issue Attention Economy
• Adversarial Decomposition of Text Representation
• Single Shot Scene Text Retrieval
• A Hybrid Alternative to Gibbs Sampling for Bayesian Latent Variable Models
• Hybrid Processing Design for Multipair Massive MIMO Relaying with Channel Spatial Correlation
• Parameter sharing between dependency parsers for related languages
• Quantum enhanced cross-validation for near-optimal neural networks architecture selection
• Computing Weakly Reversible Deficiency Zero Network Translations Using Elementary Flux Modes
• An Investigation of the Interactions Between Pre-Trained Word Embeddings, Character Models and POS Tags in Dependency Parsing
• Cognitive Consistency Routing Algorithm of Capsule-network
• Joint distribution of Busemann functions for the exactly solvable corner growth model
• Closed-Form Word Error Rate Analysis for Successive Interference Cancellation Decoders
• A note on the local weak limit of a sequence of expander graphs
• Evaluating the Utility of Hand-crafted Features in Sequence Labelling
• A Universal Bijection for Catalan Structures
• A Framework for Complementary Companion Character Behavior in Video Games
• An improved belief propagation algorithm for detecting meso-scale structure in complex networks
• TRINITY: Coordinated Performance, Energy and Temperature Management in 3D Processor-Memory Stacks
• Disfluency Detection using a Noisy Channel Model and a Deep Neural Language Model
• Disfluency Detection using Auto-Correlational Neural Networks
• Localization Guided Learning for Pedestrian Attribute Recognition
• SOLAR: Deep Structured Latent Representations for Model-Based Reinforcement Learning
• Robust Factor Number Specification for Large-dimensional Factor Model
• Asymptotics for scaled Kramers-Smoluchowski equations in several dimensions with general potentials
• Unsupervised Learning of Syntactic Structure with Invertible Neural Projections
• All You Need is ‘Love’: Evading Hate-speech Detection
• On Some Combinatorial Problems in Cographs
• Effect of low-temperature annealing on the void-induced microstructure in amorphous silicon: A computational study
• WiC: 10,000 Example Pairs for Evaluating Context-Sensitive Representations
• Investigating Human + Machine Complementarity for Recidivism Predictions
• Analysis of Frequency Agile Radar via Compressed Sensing
• A Residual Bootstrap for Conditional Value-at-Risk
• National PM2.5 and NO2 Exposure Models for China Based on Land Use Regression, Satellite Measurements, and Universal Kriging
• High-confidence error estimates for learned value functions
• Multiple Lane Detection Algorithm Based on Optimised Dense Disparity Map Estimation
• Random Matrices from Linear Codes and Wigner’s semicircle law
• Mapping Natural Language Commands to Web Elements
• Directional Pareto efficiency: concepts and optimality conditions
• Selection of equilibria in a linear quadratic mean-field game
• Noisy Non-Adaptive Group Testing: A (Near-)Definite Defectives Approach
• Weighted total variation based convex clustering
• Toward Fast and Accurate Neural Discourse Segmentation
• On the Continuous Limit of Weak GARCH
• MIMO-OFDM Scheme design for Medium Voltage Underground Cables based Power Line Communication
• Guided Neural Language Generation for Abstractive Summarization using Abstract Meaning Representation
• Cognitive Action Laws: The Case of Visual Features
• Removing out-of-focus blur from a single image
• Hirzebruch-type inequalities viewed as tools in combinatorics
• Estimating the result of randomized controlled trials without randomization in order to assess the ability of diagnostic tests to predict a treatment outcome
• A Multi-channel DART Algorithm
• Seven proofs of the Pearson Chi-squared independence test and its graphical interpretation
• Humans and Machines can be Jointly Spatially Multiplexed by Massive MIMO
• Analysing the potential of seq-to-seq models for incremental interpretation in task-oriented dialogue
• What do character-level models learn about morphology? The case of dependency parsing
• Semi-implicit Milstein approximation scheme for non-colliding particle systems
• Pitman transforms and Brownian motion in the interval viewed as an affine alcove
• A Unified Multilingual Handwriting Recognition System using multigrams sub-lexical units
• Exponential inequality for chaos based on sampling without replacement
• Pareto optimization of resonances and minimum-time control
• Why Do Neural Response Generation Models Prefer Universal Replies?
• Undecidable word problem in subshift automorphism groups
• Yang-Mills measure on the two-dimensional torus as a random distribution
• Representation Learning for Image-based Music Recommendation
• Bayesian model-based spatiotemporal survey design for log-Gaussian Cox process
• A Practical Grid-Partitioning Method Considering the Dynamic VAR Response of Power Grid under Contingency Set
• DeepHPS: End-to-end Estimation of 3D Hand Pose and Shape by Learning from Synthetic Depth
• Tails in a fixed-point problem for a branching process with state-independent immigration
• DeepGUM: Learning Deep Robust Regression with a Gaussian-Uniform Mixture Model
• On Microtargeting Socially Divisive Ads: A Case Study of Russia-Linked Ad Campaigns on Facebook
• The dispersion time of random walks on finite graphs
• Making ordinary least squares linear classifiers more robust
• A Note on the Complexity of Manipulating Weighted Schulze Voting
• Dirichlet forms and ultrametric Cantor sets associated to higher-rank graphs
• Geometric progressions in syndetic sets
• Joint Aspect and Polarity Classification for Aspect-based Sentiment Analysis with End-to-End Neural Networks
• Gallai-Ramsey numbers of odd cycles
• Ground state energy of noninteracting fermions with a random energy spectrum
• A large-scale regularity theory for the Monge-Ampere equation with rough data and application to the optimal matching problem
• Estimating seal pup production in the Greenland Sea using Bayesian hierarchical modeling
• The Undirected Optical Indices of Trees
• The Sparse Latent Position Model for nonnegative weighted networks
• A Set of Conjectured Identities for Stirling Numbers of the First Kind
• A Quantum Interior Point Method for LPs and SDPs
• Terminal Orientation in OFDM-based LiFi Systems
• Distance Based Source Domain Selection for Sentiment Classification
• Motorcycle Classification in Urban Scenarios using Convolutional Neural Networks for Feature Extraction
• Efficient Cooperative HARQ for Multi-Source Multi-Relay Wireless Networks
• Fully Decentralized Massive MIMO Detection Based on Recursive Methods
• New covering codes of radius $R$, codimension $tR$ and $tR+\frac{R}{2}$, and saturating sets in projective spaces
• Card-660: Cambridge Rare Word Dataset – a Reliable Benchmark for Infrequent Word Representation Models
• Immaculate line bundles on toric varieties
• How Robust is 3D Human Pose Estimation to Occlusion?
• Finding events in temporal networks: Segmentation meets densest-subgraph discovery
• Quantifying spatio-temporal boundary condition uncertainty for the North American deglaciation
• Bridging Knowledge Gaps in Neural Entailment via Symbolic Models
• A Discriminative Latent-Variable Model for Bilingual Lexicon Induction
• Active set algorithms for estimating shape-constrained density ratios
• Trait-dependent branching particle systems with competition and multiple offspring
• Joint Domain Alignment and Discriminative Feature Learning for Unsupervised Deep Domain Adaptation
• 3D-Aware Scene Manipulation via Inverse Graphics
• Evaluating Theory of Mind in Question Answering
• Universal Dependency Parsing with a General Transition-Based DAG Parser
• Rational Recurrences
• Polar Codes with Integrated Probabilistic Shaping for 5G New Radio
• An explicit formula for a weight enumerator of linear-congruence codes
• Deriving Machine Attention from Human Rationales
• Mean Field Analysis of Neural Networks: A Central Limit Theorem
• A Tree-based Decoder for Neural Machine Translation
• Inference based on Kotlarski’s Identity
• Mining (maximal) span-cores from temporal networks
• A transport-based multifidelity preconditioner for Markov chain Monte Carlo
• Understanding Back-Translation at Scale
• Emergence of Turbulent Epochs in Oil Prices
• What Makes Reading Comprehension Questions Easier?
• Framing and Agenda-setting in Russian News: a Computational Analysis of Intricate Political Strategies
• Classification of Reconfiguration Graphs of Shortest Path Graphs With No Induced $4$-cycles
• Learning Discriminative Representation with Signed Laplacian Restricted Boltzmann Machine
• MedSTS: A Resource for Clinical Semantic Textual Similarity
• Application-Network Collaboration Using SDN for Ultra-Low Delay Teleorchestras
• Improving Networked Music Performance Systems Using Application-Network Collaboration
• Almost Envy-Free Allocations with Connected Bundles
• Implementation Notes for the Soft Cosine Measure
• Privacy-preserving Neural Representations of Text
• Semantic Role Labeling for Learner Chinese: the Importance of Syntactic Parsing and L2-L1 Parallel Data
• Shortest Paths with Ordinal Weights
• Quantitative Evaluation of Base and Detail Decomposition Filters Based on their Artifacts
• The Scientific Prize Network Predicts Who Pushes the Boundaries of Science
• Gibbs Phenomenon of Framelet Expansions and Quasi-projection Approximation
• Safe 3-coloring of graphs
• Identifying Well-formed Natural Language Questions
• WikiAtomicEdits: A Multilingual Corpus of Wikipedia Edits for Modeling Language and Discourse
• Martingale-driven approximations of singular stochastic PDEs
• Mesoscopic Eigenvalue Density Correlations of Wigner Matrices
• Local law and complete eigenvector delocalization for supercritical Erdös-Rényi graphs
• Crossing Probabilities of Multiple Ising Interfaces
• Extending weakly polynomial functions from high rank varieties
• Discriminative Deep Dyna-Q: Robust Planning for Dialogue Policy Learning