A Hierarchical Approach to Neural Context-Aware Modeling
We present a new recurrent neural network topology to enhance state-of-the-art machine learning systems by incorporating a broader context. Our approach overcomes recent limitations with extended narratives through a multi-layered computational approach to generate an abstract context representation. Therefore, the developed system captures the narrative on word-level, sentence-level, and context-level. Through the hierarchical set-up, our proposed model summarizes the most salient information on each level and creates an abstract representation of the extended context. We subsequently use this representation to enhance neural language processing systems on the task of semantic error detection. To show the potential of the newly introduced topology, we compare the approach against a context-agnostic set-up including a standard neural language model and a supervised binary classification network. The performance measures on the error detection task show the advantage of the hierarchical context-aware topologies, improving the baseline by 12.75% relative for unsupervised models and 20.37% relative for supervised models.
MaskConnect: Connectivity Learning by Gradient Descent
Although deep networks have recently emerged as the model of choice for many computer vision problems, in order to yield good results they often require time-consuming architecture search. To combat the complexity of design choices, prior work has adopted the principle of modularized design which consists in defining the network in terms of a composition of topologically identical or similar building blocks (a.k.a. modules). This reduces architecture search to the problem of determining the number of modules to compose and how to connect such modules. Again, for reasons of design complexity and training cost, previous approaches have relied on simple rules of connectivity, e.g., connecting each module to only the immediately preceding module or perhaps to all of the previous ones. Such simple connectivity rules are unlikely to yield the optimal architecture for the given problem. In this work we remove these predefined choices and propose an algorithm to learn the connections between modules in the network. Instead of being chosen a priori by the human designer, the connectivity is learned simultaneously with the weights of the network by optimizing the loss function of the end task using a modified version of gradient descent. We demonstrate our connectivity learning method on the problem of multi-class image classification using two popular architectures: ResNet and ResNeXt. Experiments on four different datasets show that connectivity learning using our approach yields consistently higher accuracy compared to relying on traditional predefined rules of connectivity. Furthermore, in certain settings it leads to significant savings in number of parameters.
News Article Teaser Tweets and How to Generate Them
We define the task of teaser generation and provide an evaluation benchmark and baseline systems for it. A teaser is a short reading suggestion for an article that is illustrative and includes curiosity-arousing elements to entice potential readers to read the news item. Teasers are one of the main vehicles for transmitting news to social media users. We compile a novel dataset of teasers by systematically accumulating tweets and selecting ones that conform to the teaser definition. We compare a number of neural abstractive architectures on the task of teaser generation and the overall best performing system is See et al. (2017)’s seq2seq with pointer network.
Call Detail Records Driven Anomaly Detection and Traffic Prediction in Mobile Cellular Networks
Mobile networks possess information about the users as well as the network. Such information is useful for making the network end-to-end visible and intelligent. Big data analytics can efficiently analyze user and network information and unearth meaningful insights with the help of machine learning tools. Utilizing big data analytics and machine learning, this work contributes in three ways. First, we utilize the call detail records (CDR) data to detect anomalies in the network. For authentication and verification of anomalies, we use k-means clustering, an unsupervised machine learning algorithm. Through effective detection of anomalies, we can proceed to suitable design for resource distribution as well as fault detection and avoidance. Second, we prepare anomaly-free data by removing anomalous activities and train a neural network model. By passing anomalous and anomaly-free data through this model, we observe the effect of anomalous activities on the training of the model and also observe the mean square error of anomalous and anomaly-free data. Lastly, we use an autoregressive integrated moving average (ARIMA) model to predict future traffic for a user. Through simple visualization, we show that anomaly-free data better generalizes the learning models and performs better on the prediction task.
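As a rough illustration of the clustering step described above, the following sketch groups activity readings with k-means and treats the smallest cluster as anomalous. The abstract does not say how clusters are mapped to anomalies, so the "smallest cluster" rule, the function names, and the toy data are all assumptions made for this example, not the authors' pipeline.

```python
import numpy as np


def kmeans(points, k, iters=50):
    """Plain Lloyd's algorithm with a deterministic initialisation."""
    # Assumption: seed centroids by spreading them over the points
    # sorted on the first feature, so the result is reproducible.
    order = np.argsort(points[:, 0])
    seeds = order[np.linspace(0, len(points) - 1, k).astype(int)]
    centroids = points[seeds].astype(float)
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        updated = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    # Final assignment against the converged centroids.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1), centroids


def flag_anomalies(points, k=2):
    """Hypothetical rule: members of the smallest cluster are anomalous."""
    labels, _ = kmeans(points, k)
    smallest = np.bincount(labels, minlength=k).argmin()
    return labels == smallest


# Toy stand-in for per-interval call activity (not real CDR data).
activity = np.array([[10.0], [12.0], [11.0], [9.0], [95.0], [10.0], [13.0]])
mask = flag_anomalies(activity)
anomaly_free = activity[~mask]  # data that would be kept for model training
```

Filtering with `~mask` mirrors the "anomaly-free data" preparation mentioned in the abstract; a real pipeline would need a more careful criterion than cluster size.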
Semantic DMN: Formalizing and Reasoning About Decisions in the Presence of Background Knowledge
The Decision Model and Notation (DMN) is a recent OMG standard for the elicitation and representation of decision models, and for managing their interconnection with business processes. DMN builds on the notion of decision tables, and their combination into more complex decision requirements graphs (DRGs), which bridge between business process models and decision logic models. DRGs may rely on additional, external business knowledge models, whose functioning is not part of the standard. In this work, we consider one of the most important types of business knowledge, namely background knowledge that conceptually accounts for the structural aspects of the domain of interest, and propose decision requirement knowledge bases (DKBs), where DRGs are modeled in DMN, and domain knowledge is captured by means of first-order logic with datatypes. We provide a logic-based semantics for such an integration, and formalize different DMN reasoning tasks for DKBs. We then consider background knowledge formulated as a description logic ontology with datatypes, and show how the main verification tasks for DMN in this enriched setting can be formalized as standard DL reasoning services, and actually carried out in ExpTime. We discuss the effectiveness of our framework on a case study in maritime security. This work is under consideration in Theory and Practice of Logic Programming (TPLP).
Security and Privacy Issues in Deep Learning
With the development of machine learning, expectations for artificial intelligence (AI) technology are increasing day by day. In particular, deep learning has shown enriched performance results in a variety of fields. There are many applications that are closely related to our daily life, such as making significant decisions in an application area based on predictions or classifications, in which a deep learning (DL) model could be relevant. Hence, if a DL model causes mispredictions or misclassifications due to malicious external influences, it can cause very large difficulties in real life. Moreover, training deep learning models involves relying on an enormous amount of data and the training data often includes sensitive information. Therefore, deep learning models should not expose the privacy of such data. In this paper, we reviewed the threats and developed defense methods on the security of the models and the data privacy under the notion of SPAI: Secure and Private AI. We also discuss current challenges and open issues.
Rank and Rate: Multi-task Learning for Recommender Systems
The two main tasks in the Recommender Systems domain are the ranking and rating prediction tasks. The rating prediction task aims at predicting to what extent a user would like any given item, which makes it possible to recommend the items with the highest predicted scores. The ranking task, on the other hand, directly aims at recommending the most valuable items for the user. Several previous approaches proposed learning user and item representations to optimize both tasks simultaneously in a multi-task framework. In this work we propose a novel multi-task framework that exploits the fact that a user follows a two-phase decision process: first deciding to interact with an item (ranking task) and only afterward rating it (rating prediction task). We evaluated our framework on two benchmark datasets, in two different configurations, and showed its superiority over state-of-the-art methods.
Gender Bias in Neural Natural Language Processing
We examine whether neural natural language processing (NLP) systems reflect historical biases in training data. We define a general benchmark to quantify gender bias in a variety of neural NLP tasks. Our empirical evaluation with state-of-the-art neural coreference resolution and textbook RNN-based language models trained on benchmark datasets finds significant gender bias in how models view occupations. We then mitigate bias with CDA: a generic methodology for corpus augmentation via causal interventions that breaks associations between gendered and gender-neutral words. We empirically show that CDA effectively decreases gender bias while preserving accuracy. We also explore the space of mitigation strategies with CDA, a prior approach to word embedding debiasing (WED), and their compositions. We show that CDA outperforms WED, drastically so when word embeddings are trained. For pre-trained embeddings, the two methods can be effectively composed. We also find that as training proceeds on the original data set with gradient descent the gender bias grows as the loss reduces, indicating that the optimization encourages bias; CDA mitigates this behavior.
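To make the idea of corpus augmentation via interventions more concrete, here is a minimal sketch that adds a gender-swapped copy of each sentence containing a gendered word. The word-pair dictionary, the function names, and the whole-word swapping rule are assumptions chosen for illustration; the abstract does not specify the actual intervention list or how ambiguous words are handled.

```python
import re

# Hypothetical, deliberately tiny intervention dictionary.
# Ambiguous pairs (e.g. "her" -> "his" or "him") are left out on purpose.
GENDER_PAIRS = {
    "he": "she", "she": "he",
    "man": "woman", "woman": "man",
    "king": "queen", "queen": "king",
}


def swap_gender(sentence):
    """Replace each gendered word with its counterpart, keeping capitalisation."""
    def replace(match):
        word = match.group(0)
        swapped = GENDER_PAIRS.get(word.lower())
        if swapped is None:
            return word
        return swapped.capitalize() if word[0].isupper() else swapped

    # Match whole alphabetic tokens so "woman" is not treated as "wo" + "man".
    return re.sub(r"[A-Za-z]+", replace, sentence)


def augment(corpus):
    """Return the original sentences plus a swapped copy of each changed one."""
    augmented = list(corpus)
    for sentence in corpus:
        counterfactual = swap_gender(sentence)
        if counterfactual != sentence:
            augmented.append(counterfactual)
    return augmented


corpus = ["He is a doctor and she is a nurse.", "It rains."]
result = augment(corpus)
```

In the augmented corpus, occupation words such as "doctor" and "nurse" co-occur with both gendered forms, which is the association-breaking effect the abstract describes.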
Practical Constrained Optimization of Auction Mechanisms in E-Commerce Sponsored Search Advertising
Sponsored search in E-commerce platforms such as Amazon, Taobao and Tmall provides sellers an effective way to reach potential buyers with most relevant purpose. In this paper, we study the auction mechanism optimization problem in sponsored search on Alibaba’s mobile E-commerce platform. Besides generating revenue, we are supposed to maintain an efficient marketplace with plenty of quality users, guarantee a reasonable return on investment (ROI) for advertisers, and meanwhile, facilitate a pleasant shopping experience for the users. These requirements essentially pose a constrained optimization problem. Directly optimizing over auction parameters yields a discontinuous, non-convex problem that denies effective solutions. One of our major contributions is a practical convex optimization formulation of the original problem. We devise a novel re-parametrization of the auction mechanism with discrete sets of representative instances. To construct the optimization problem, we build an auction simulation system which estimates the resulting business indicators of the selected parameters by replaying the auctions recorded from real online requests. We summarize the experiments on real search traffic to analyze the effects of fidelity of auction simulation, the efficacy under various constraint targets and the influence of regularization. The experiment results show that with proper entropy regularization, we are able to maximize revenue while constraining other business indicators within given ranges.
t-SNE-CUDA: GPU-Accelerated t-SNE and its Applications to Modern Data
Predicting Solution Summaries to Integer Linear Programs under Imperfect Information with Machine Learning
The paper provides a methodological contribution at the intersection of machine learning and operations research. Namely, we propose a methodology to quickly predict solution summaries (i.e., solution descriptions at a given level of detail) to discrete stochastic optimization problems. We approximate the solutions based on supervised learning and the training dataset consists of a large number of deterministic problems that have been solved independently and offline. Uncertainty regarding a missing subset of the inputs is addressed through sampling and aggregation methods. Our motivating application concerns booking decisions of intermodal containers on double-stack trains. Under perfect information, this is the so-called load planning problem and it can be formulated by means of integer linear programming. However, the formulation cannot be used for the application at hand because of the restricted computational budget and unknown container weights. The results show that standard deep learning algorithms allow one to predict descriptions of solutions with high accuracy in very short time (milliseconds or less).
FADE: Fast and Asymptotically efficient Distributed Estimator for dynamic networks
Consider a set of agents that wish to estimate a vector of parameters of their mutual interest. For this estimation goal, agents can sense and communicate. When sensing, an agent measures (in additive Gaussian noise) linear combinations of the unknown vector of parameters. When communicating, an agent can broadcast information to a few other agents, by using the channels that happen to be randomly at its disposal at the time. To coordinate the agents towards their estimation goal, we propose a novel algorithm called FADE (Fast and Asymptotically efficient Distributed Estimator), in which agents collaborate at discrete time-steps; at each time-step, agents sense and communicate just once, while also updating their own estimate of the unknown vector of parameters. FADE enjoys five attractive features: first, it is an intuitive estimator, simple to derive; second, it withstands dynamic networks, that is, networks whose communication channels change randomly over time; third, it is strongly consistent in that, as time-steps play out, each agent’s local estimate converges (almost surely) to the true vector of parameters; fourth, it is both asymptotically unbiased and efficient, which means that, across time, each agent’s estimate becomes unbiased and the mean-square error (MSE) of each agent’s estimate vanishes to zero at the same rate as the MSE of the optimal estimator at an almighty central node; fifth, and most importantly, when compared with a state-of-the-art consensus+innovation (CI) algorithm, it yields estimates with outstandingly lower mean-square errors, for the same number of communications. For example, in a sparsely connected network model with 50 agents, we find through numerical simulations that the reduction can be dramatic, reaching several orders of magnitude.
Integrated Continuous-time Hidden Markov Models
Motivated by applications in movement ecology, in this paper I propose a new class of integrated continuous-time hidden Markov models in which each observation depends on the underlying state of the process over the whole interval since the previous observation, not only on its current state. I show that under appropriate conditioning, such a model can be regarded as a conventional hidden Markov model, enabling efficient evaluation of its likelihood without sampling of its state sequence. This leads to an algorithm for inference which is more efficient, and scales better with the number of data, than existing methods. An application to animal movement data is given.
Egocentric Spatial Memory
Egocentric spatial memory (ESM) defines a memory system with encoding, storing, recognizing and recalling the spatial information about the environment from an egocentric perspective. We introduce an integrated deep neural network architecture for modeling ESM. It learns to estimate the occupancy state of the world and progressively construct top-down 2D global maps from egocentric views in a spatially extended environment. During the exploration, our proposed ESM model updates its belief of the global map based on local observations using a recurrent neural network. It also augments the local mapping with a novel external memory to encode and store latent representations of the visited places over long-term exploration in large environments, which enables agents to perform place recognition and hence loop closure. Our proposed ESM network contributes in the following aspects: (1) without feature engineering, our model predicts free space based on egocentric views efficiently in an end-to-end manner; (2) different from other deep learning-based mapping systems, ESMN deals with continuous actions and states, which is vitally important for robotic control in real applications. In the experiments, we demonstrate its accurate and robust global mapping capacities in 3D virtual mazes and realistic indoor environments by comparing with several competitive baselines.
• SafeDrive: Enhancing Lane Appearance for Autonomous and Assisted Driving Under Limited Visibility
• Trajectory Optimization for Cooperative Dual-band UAV Swarms
• Enumerating Cryptarithms Using Deterministic Finite Automata
• Efficient Gauss-Newton-Krylov momentum conservation constrained PDE-LDDMM using the band-limited vector field parameterization
• Neural Sentence Embedding using Only In-domain Sentences for Out-of-domain Sentence Detection in Dialog Systems
• Kinetic-controlled hydrodynamics for traffic models with driver-assist vehicles
• Efficiency, Sequenceability and Deal-Optimality in Fair Division of Indivisible Goods
• Refining the bijections among ascent sequences, (2+2)-free posets, integer matrices and pattern-avoiding permutations
• Graphs admitting only constant splines
• Parameterized Orientable Deletion
• A Restricted-Domain Dual Formulation for Two-Phase Image Segmentation
• Estimating Failure in Brittle Materials using Graph Theory
• On Approximating (Sparse) Covering Integer Programs
• Markerless Visual Robot Programming by Demonstration
• Sub-Nyquist Radar Systems: Temporal, Spectral and Spatial Compression
• The structure of claw-free binary matroids
• Textual Explanations for Self-Driving Vehicles
• Deep Recurrent Neural Networks for ECG Signal Denoising
• Reach-Avoid Problems via Sum-of-Squares Optimization and Dynamic Programming
• Time-frequency transforms of white noises and Gaussian analytic functions
• Lattice Agreement in Message Passing Systems
• Testing the Efficient Network TRaining (ENTR) Hypothesis: initially reducing training image size makes Convolutional Neural Network training for image recognition tasks more efficient
• UH-PRHLT at SemEval-2016 Task 3: Combining Lexical and Semantic-based Features for Community Question Answering
• Acquisition of Localization Confidence for Accurate Object Detection
• The Efficiency of Geometric Samplers for Exoplanet Transit Timing Variation Models
• On the localization of the roots for Kac polynomials
• A Simple Near-Linear Pseudopolynomial Time Randomized Algorithm for Subset Sum
• Pulse Sequence Resilient Fast Brain Segmentation
• Fast and Robust Symmetric Image Registration Based on Intensity and Spatial Information
• Non-crossing trees, quadrangular dissections, ternary trees, and duality preserving bijections
• Doubly Attentive Transformer Machine Translation
• Pareto-Optimization Framework for Automated Network-on-Chip Design
• A Proof of Entropy Minimization for Outputs in Deletion Channels via Hidden Word Statistics
• Tight Upper Bounds on the Crossing Number in a Minor-Closed Class
• An Enhanced Latent Semantic Analysis Approach for Arabic Document Summarization
• Shared Spectrum for Mobile-Cells Backhaul and Access Link
• K-medoids Clustering of Data Sequences with Composite Distributions
• Count-Based Exploration with the Successor Representation
• Security against false data injection attack in cyber-physical systems
• MnasNet: Platform-Aware Neural Architecture Search for Mobile
• Optimized Transmission for Consensus in Wireless Sensor Networks
• Scaling and bias codes for modeling speaker-adaptive DNN-based speech synthesis systems
• Interactive Summarization and Exploration of Top Aggregate Query Answers
• Deep Graph Laplacian Regularization
• Adaptive Non-Parametric Regression With the $K$-NN Fused Lasso
• Brain MRI Image Super Resolution using Phase Stretch Transform and Transfer Learning
• Composable Core-sets for Determinant Maximization Problems via Spectral Spanners
• The Devil of Face Recognition is in the Noise
• Truthful Peer Grading with Limited Effort from Teaching Staff
• Unmanned Aerial Vehicle Path Planning for Traffic Estimation and Detection of Non-Recurrent Congestion
• A Construction of Bent Functions on a finite group
• Optimization by Pairwise Linkage Detection, Incremental Linkage Set, and Restricted / Back Mixing: DSMGA-II
• Deep Learning-based CSI Feedback Approach for Time-varying Massive MIMO Channels
• Improving the Annotation of DeepFashion Images for Fine-grained Attribute Recognition
• Leveraging Unlabeled Whole-Slide-Images for Mitosis Detection
• Wasserstein GAN and Waveform Loss-based Acoustic Model Training for Multi-speaker Text-to-Speech Synthesis Systems Using a WaveNet Vocoder
• Deep Belief Networks Based Feature Generation and Regression for Predicting Wind Power
• Extremes of Locally-stationary Chi-square processes on discrete grids
• Deep Cross Modal Learning for Caricature Verification and Identification(CaVINet)
• Neural Article Pair Modeling for Wikipedia Sub-article Matching
• Regular self-dual and self-Petrie-dual maps of arbitrary valency
• Spectrum concentration in deep residual learning: a free probability appproach
• Generalization of core percolation on complex networks
• Input-to-State Stability of a Clamped-Free Damped String in the Presence of Distributed and Boundary Disturbances
• Multimodal Deep Domain Adaptation
• SegStereo: Exploiting Semantic Information for Disparity Estimation
• Two curve Chebyshev approximation and its application to signal clustering
• Efficient Computation of Sequence Mappability
• Learning Collaborative Generation Correction Modules for Blind Image Deblurring and Beyond
• Inserting an Edge into a Geometric Embedding
• RiTUAL-UH at TRAC 2018 Shared Task: Aggression Identification
• Deep Learning in Physical Layer Communications
• Using Feature Grouping as a Stochastic Regularizer for High-Dimensional Noisy Data
• A Robust Deep Attention Network to Noisy Labels in Semi-supervised Biomedical Segmentation
• Regional Multi-scale Approach for Visually Pleasing Explanations of Deep Neural Networks
• Multi-Speaker DOA Estimation Using Deep Convolutional Networks Trained with Noise Signals
• A Zero-Shot Framework for Sketch-based Image Retrieval
• Expectation of the Largest Betting Size in Labouchère System
• Neighborhood Complexes of Kneser Graphs, $KG_{3,k}$
• Size reconstructibility of graphs
• Robust distributed calibration of radio interferometers with direction dependent distortions
• Modeling joint probability distribution of yield curve parameters
• Deep Visual Odometry Methods for Mobile Robots
• Combinatorial proofs of some linear algebraic identities
• Co-existence of Trend and Value in Financial Markets: Estimating an Extended Chiarella Model
• Epidemic Spreading and Aging in Temporal Networks with Memory
• A First Experiment on Including Text Literals in KGloVe
• Remote sensing image regression for heterogeneous change detection
• Interior gradient and Hessian estimates for the Dirichlet problem of semi-linear degenerate elliptic systems: a probabilistic approach
• The Becker-Döring process: law of large numbers and non-equilibrium potential
• On subgroups of minimal index
• Maximal displacement and population growth for branching Brownian motions
• Robustness of the pathways structure of fluctuations in stochastic homogenization
• Scale equivariance in CNNs with vector fields
• Nodal Lengths in Shrinking Domains for Random Eigenfunctions on $\mathbb{S}^2$
• Attention is All We Need: Nailing Down Object-centric Attention for Egocentric Activity Recognition
• Bayesian Uncertainty Estimation Under Complex Sampling
• A note on full weight spectrum codes
• Speed-sensorless state feedback control of induction machines with LC filter
• Disaster Monitoring using Unmanned Aerial Vehicles and Deep Learning
• Deep learning in agriculture: A survey
• Single-shot holographic 3D particle-localization under multiple scattering
• OpenCLIPER: an OpenCL-based C++ Framework for Overhead-Reduced Medical Image Processing and Reconstruction on Heterogeneous Devices
• On the spectral gap of some Cayley graphs on the Weyl group $W(B_n)$
• A critique of the econometrics of happiness: Are we underestimating the returns to education and income?
• Inferring the ground truth through crowdsourcing
• Extensible Grounding of Speech for Robot Instruction
• Synchronization patterns in LIF Neural Networks: Merging Nonlocal and Diagonal Connectivity
• Resource Allocation in Full-Duplex Mobile-Edge Computing Systems with NOMA and Energy Harvesting
• Investigating the time dynamics of high frequency wind speed in complex terrains by using the Fisher-Shannon method: application to Switzerland
• Joint Learning of Intrinsic Images and Semantic Segmentation
• Antipodes of monoidal decomposition spaces
• Data Center Interconnects at 400G and Beyond
• On the Unbiased Asymptotic Normality of Quantile Regression with Fixed Effects
• Weak ergodic theorem for Markov chains in the absence of invariant countably additive measures
• On Exploring Temporal Graphs of Small Pathwidth
• Parallel Optimal Control for Cooperative Automation of Large-scale Connected Vehicles via ADMM
• Stochastic Gradient Descent with Biased but Consistent Gradient Estimators
• Deep Dual Pyramid Network for Barcode Segmentation using Barcode-30k Database
• Gaussian Process Landmarking for Three-Dimensional Geometric Morphometrics
• Deep End-to-end Fingerprint Denoising and Inpainting
• Minimal Ramsey graphs for cyclicity
• The Formal Inverse of the Period-Doubling Sequence
• Effective Parallel Corpus Mining using Bilingual Sentence Embeddings
• Computing the Strategy to Commit to in Polymatrix Games (Extended Version)
• End-to-End Physics Event Classification with the CMS Open Data: Applying Image-based Deep Learning on Detector Data to Directly Classify Collision Events at the LHC
• Real-Time Millimeter-Wave MIMO Channel Sounder for Dynamic Directional Measurements
• What am I searching for?
• Gender Privacy: An Ensemble of Semi Adversarial Networks for Confounding Arbitrary Gender Classifiers
• Entanglement cost and quantum channel simulation