What's new on arXiv

FedMark: A Marketplace for Federated Data on the Web

The Web of Data (WoD) has experienced phenomenal growth in the past. This growth is mainly fueled by tireless volunteers, government subsidies, and open data legislation. The majority of commercial data has not yet made the transition to the WoD. The problem is that it is not clear how publishers of commercial data can monetize their data in this new setting. Advertisement, which is one of the main financial engines of the World Wide Web, cannot be applied to the Web of Data, as such unwanted data can easily be filtered out automatically. This raises the question of how the WoD can (i) maintain its growth when subsidies disappear and (ii) give commercial data providers financial incentives to share their wealth of data. In this paper, we propose a marketplace for the WoD as a solution for this data monetization problem. Our approach allows a customer to transparently buy data from a combination of different providers. To that end, we introduce two different approaches for deciding which data elements to buy and compare their performance. We also introduce FedMark, a prototypical implementation of our marketplace that represents a first step towards an economically viable WoD beyond subsidies.

Discovering Context Specific Causal Relationships

With the increasing need for personalised decision making, such as personalised medicine and online recommendations, growing attention has been paid to the discovery of the context and heterogeneity of causal relationships. Most existing methods, however, assume a known cause (e.g. a new drug) and focus on identifying from data the contexts of heterogeneous effects of the cause (e.g. patient groups with different responses to the new drug). No existing approach efficiently detects context specific causal relationships directly from observational data, i.e. discovers the causes and their contexts simultaneously. In this paper, by taking advantage of highly efficient decision tree induction and the well established causal inference framework, we propose the Tree based Context Causal rule discovery (TCC) method for efficient exploration of context specific causal relationships from data. Experiments with both synthetic and real world data sets show that TCC can effectively discover context specific causal rules from the data.

The Mismatch Principle: Statistical Learning Under Large Model Uncertainties

We study the learning capacity of empirical risk minimization with regard to the squared loss and a convex hypothesis class consisting of linear functions. While these types of estimators were originally designed for noisy linear regression problems, it recently turned out that they are in fact capable of handling considerably more complicated situations, involving highly non-linear distortions. This work intends to provide a comprehensive explanation of this somewhat astonishing phenomenon. At the heart of our analysis stands the mismatch principle, which is a simple, yet generic recipe to establish theoretical error bounds for empirical risk minimization. The scope of our results is fairly general, permitting arbitrary sub-Gaussian input-output pairs, possibly with strongly correlated feature variables. Notably, the mismatch principle also generalizes to a certain extent the classical orthogonality principle for ordinary least squares. This adaptation allows us to investigate problem setups of recent interest, most importantly, high-dimensional parameter regimes and non-linear observation processes. In particular, our theoretical framework is applied to various scenarios of practical relevance, such as single-index models, variable selection, and strongly correlated designs. We thereby demonstrate the key purpose of the mismatch principle, that is, learning (semi-)parametric output rules under large model uncertainties and misspecifications.
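
As a rough illustration of the phenomenon the abstract describes, the sketch below fits ordinary least squares to outputs generated by a non-linear single-index model and checks how well the fitted vector aligns with the true index direction. Everything here is an assumption made for illustration and not taken from the paper: the isotropic Gaussian inputs, the `tanh` link, the noise level, and the function name `direction_recovery_cosine`.

```python
import numpy as np


def direction_recovery_cosine(n=5000, d=10, noise=0.1, seed=0):
    """Fit least squares to y = tanh(<x, w>) + noise and return the cosine
    between the fitted vector and the true direction w (illustrative setup)."""
    rng = np.random.default_rng(seed)

    # True index direction, normalised to unit length.
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)

    # Isotropic Gaussian inputs and a non-linear observation process.
    X = rng.standard_normal((n, d))
    y = np.tanh(X @ w) + noise * rng.standard_normal(n)

    # Empirical risk minimisation with squared loss over linear functions.
    w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

    # Alignment between the linear fit and the true direction.
    return float(w_hat @ w / np.linalg.norm(w_hat))


if __name__ == "__main__":
    print(f"cosine(w_hat, w) = {direction_recovery_cosine():.4f}")
```

In this toy setting the linear fit recovers the direction of `w` only up to a scaling factor, which is why the check uses the cosine rather than the distance between the two vectors.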

Deep learning, deep change? Mapping the development of the Artificial Intelligence General Purpose Technology

General Purpose Technologies (GPTs) that can be applied in many industries are an important driver of economic growth and national and regional competitiveness. In spite of this, the geography of their development and diffusion has not received significant attention in the literature. We address this with an analysis of Deep Learning (DL), a core technique in Artificial Intelligence (AI) increasingly being recognized as the latest GPT. We identify DL papers in a novel dataset from arXiv, a popular preprints website, and use CrunchBase, a technology business directory, to measure industrial capabilities related to it. After showing that DL conforms with the definition of a GPT, having experienced rapid growth and diffusion into new fields where it has generated an impact, we describe changes in its geography. Our analysis shows China’s rise in AI rankings and relative decline in several European countries. We also find that initial volatility in the geography of DL has been followed by consolidation, suggesting that the window of opportunity for new entrants might be closing down as new DL research hubs become dominant. Finally, we study the regional drivers of DL clustering. We find that competitive DL clusters tend to be based in regions combining research and industrial activities related to it. This could be because GPT developers and adopters located close to each other can collaborate and share knowledge more easily, thus overcoming coordination failures in GPT deployment. Our analysis also reveals a Chinese comparative advantage in DL after we control for other explanatory factors, perhaps underscoring the importance of access to data and supportive policies for the successful development of this complex, ‘omni-use’ technology.

Causal Discovery by Telling Apart Parents and Children

We consider the problem of inferring the directed, causal graph from observational data, assuming no hidden confounders. We take an information theoretic approach, and make three main contributions. First, we show how through algorithmic information theory we can obtain SCI, a highly robust, effective and computationally efficient test for conditional independence, and show it outperforms the state of the art when applied in constraint-based inference methods such as stable PC. Second, building upon SCI, we show how to tell apart the parents and children of a given node based on the algorithmic Markov condition. We give the Climb algorithm to efficiently discover the directed, causal Markov blanket, and show it is at least as accurate as inferring the global network, while being much more efficient. Last, but not least, we detail how we can use the Climb score to direct those edges that state of the art causal discovery algorithms based on PC or GES leave undirected, and show this improves their precision, recall and F1 scores by up to 20%.

Learning to Learn from Web Data through Deep Semantic Embeddings

In this paper we propose to learn a multimodal image and text embedding from Web and Social Media data, aiming to leverage the semantic knowledge learnt in the text domain and transfer it to a visual model for semantic image retrieval. We demonstrate that the pipeline can learn from images with associated text without supervision and perform a thorough analysis of five different text embeddings in three different benchmarks. We show that the embeddings learnt with Web and Social Media data perform competitively with supervised methods in the text based image retrieval task, and we clearly outperform the state of the art on the MIRFlickr dataset when training on the target data. Further, we demonstrate how semantic multimodal image retrieval can be performed using the learnt embeddings, going beyond classical instance-level retrieval problems. Finally, we present a new dataset, InstaCities1M, composed of Instagram images and their associated texts, that can be used for fair comparison of image-text embeddings.

Faster Support Vector Machines

The time complexity of support vector machines (SVMs) prohibits training on huge data sets with millions of samples. Recently, multilevel approaches to train SVMs have been developed to allow for time efficient training on huge data sets. While regular SVMs perform the entire training in one – time consuming – optimization step, multilevel SVMs first build a hierarchy of problems decreasing in size that resemble the original problem and then train an SVM model for each hierarchy level benefiting from the solved models of previous levels. We present a faster multilevel support vector machine that uses a label propagation algorithm to construct the problem hierarchy. Extensive experiments show that our new algorithm achieves speed-ups up to two orders of magnitude while having similar or better classification quality over state-of-the-art algorithms.

DeeSIL: Deep-Shallow Incremental Learning

Incremental Learning (IL) is an interesting AI problem when the algorithm is assumed to work on a budget. This is especially true when IL is modeled using a deep learning approach, where two complex challenges arise: limited memory, which induces catastrophic forgetting, and delays related to the retraining needed in order to incorporate new classes. Here we introduce DeeSIL, an adaptation of a known transfer learning scheme that combines a fixed deep representation used as feature extractor with independent shallow classifiers learned to increase recognition capacity. This scheme tackles the two aforementioned challenges since it works well with a limited memory budget and each new concept can be added within a minute. Moreover, since no deep retraining is needed when the model is incremented, DeeSIL can integrate larger amounts of initial data that provide more transferable features. Performance is evaluated on ImageNet LSVRC 2012 against three state of the art algorithms. Results show that, at scale, DeeSIL performance is 23 and 33 points higher than the best baseline when using the same and more initial data respectively.
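
The general scheme described above (a frozen feature extractor plus one independently trained shallow classifier per class) can be sketched as follows. This is not the DeeSIL implementation: the class name `ShallowIncrementalClassifier`, the ridge-regression classifier swapped in for whatever shallow learner the paper uses, and the way negatives are supplied are all assumptions chosen to keep the example short.

```python
import numpy as np


class ShallowIncrementalClassifier:
    """One independent linear classifier per class on top of fixed features.

    `extract_features` stands in for a frozen deep network (hypothetical);
    each class gets its own ridge-regression scorer (an assumed choice).
    """

    def __init__(self, extract_features, ridge=1e-2):
        self.extract_features = extract_features
        self.ridge = ridge
        self.weights = {}  # label -> weight vector (last entry is the bias)

    def _features(self, x):
        f = self.extract_features(np.atleast_2d(x))
        # Append a constant column so each classifier learns a bias term.
        return np.hstack([f, np.ones((f.shape[0], 1))])

    def add_class(self, label, positives, negatives):
        """Train only the new class's classifier; existing ones are untouched."""
        F = self._features(np.vstack([positives, negatives]))
        t = np.concatenate([np.ones(len(positives)), -np.ones(len(negatives))])
        A = F.T @ F + self.ridge * np.eye(F.shape[1])
        self.weights[label] = np.linalg.solve(A, F.T @ t)

    def predict(self, x):
        """Return, for each row of x, the label whose classifier scores highest."""
        F = self._features(x)
        labels = list(self.weights)
        scores = np.stack([F @ self.weights[l] for l in labels], axis=1)
        return [labels[i] for i in scores.argmax(axis=1)]
```

Because `add_class` solves a small linear system for the new class alone, adding a concept never requires revisiting the feature extractor or the previously learned classifiers, which is the property the abstract emphasises.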

Towards Fine Grained Network Flow Prediction

One main challenge for the design of networks is that traffic load is not generally known in advance. This makes it hard to adequately devote resources so as to best prevent or mitigate bottlenecks. While several authors have shown how to predict traffic in a coarse grained manner by aggregating flows, fine grained prediction of traffic at the level of individual flows, including bursty traffic, is widely considered to be impossible. This paper shows, to the best of our knowledge, the first approach to fine grained per flow traffic prediction. In short, we introduce the Frequency-based Kernel Kalman Filter (FKKF), which predicts individual flows’ behavior based on measurements. Our FKKF relies on the well known Kalman Filter in combination with a kernel to support the prediction of non linear functions. Furthermore, we change the operating space from time to frequency space. In this space, into which we transform the input data via a Short-Time Fourier Transform (STFT), the peak structures of flows can be predicted after gleaning their key characteristics, with a Principal Component Analysis (PCA), from past and ongoing flows that stem from the same socket-to-socket connection. We demonstrate the effectiveness of our approach on popular benchmark traces from a university data center. Our approach predicts traffic on average across 17 out of 20 groups of flows with an average prediction error of 6.43% around 0.49 (average) seconds in advance, whilst existing coarse grained approaches exhibit prediction errors of 77% at best.

A Structural-Factor Approach to Modeling High-Dimensional Time Series

Adaptive Document Retrieval for Deep Question Answering

State-of-the-art systems in deep question answering proceed as follows: (1) an initial document retrieval selects relevant documents, which (2) are then processed by a neural network in order to extract the final answer. Yet the exact interplay between both components is poorly understood, especially concerning the number of candidate documents that should be retrieved. We show that choosing a static number of documents — as used in prior research — suffers from a noise-information trade-off and yields suboptimal results. As a remedy, we propose an adaptive document retrieval model. This learns the optimal candidate number for document retrieval, conditional on the size of the corpus and the query. We report extensive experimental results showing that our adaptive approach outperforms state-of-the-art methods on multiple benchmark datasets, as well as in the context of corpora with variable sizes.
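
To make the idea of a query-dependent document count concrete, here is a minimal sketch of one possible cutoff rule: keep the top-ranked documents until they account for a chosen share of the total retrieval score. The rule itself, the function name `adaptive_cutoff`, and the `threshold` parameter are assumptions for illustration; the abstract does not specify how the paper's model decides the count.

```python
def adaptive_cutoff(scores, threshold=0.8):
    """Return how many top-ranked documents to pass to the reader.

    Documents are kept in descending score order until their cumulative
    score reaches `threshold` times the total score. Scores are assumed to
    be non-negative retrieval confidences (a hypothetical input format).
    """
    if not scores:
        return 0

    ranked = sorted(scores, reverse=True)
    total = sum(ranked)
    if total <= 0:
        # No usable signal: fall back to keeping every candidate.
        return len(ranked)

    cumulative = 0.0
    for n, score in enumerate(ranked, start=1):
        cumulative += score
        if cumulative >= threshold * total:
            return n
    return len(ranked)
```

With this rule a query whose top hit dominates the score mass gets very few documents, while a query with a flat score profile gets many, which is one way to trade retrieval noise against missing information.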

Triangle Lasso for Simultaneous Clustering and Optimization in Graph Datasets

Recently, network lasso has attracted much attention due to its remarkable performance on simultaneous clustering and optimization. However, it usually suffers from imperfect data (noise, missing values, etc.) and yields sub-optimal solutions. The reason is that it finds similar instances according to their features directly, which are usually affected by the imperfect data, and thus returns sub-optimal results. In this paper, we propose triangle lasso to avoid this disadvantage. Triangle lasso finds similar instances according to their neighbours. If two instances have many common neighbours, they tend to become similar. Although some instances are profiled by imperfect data, it is still able to find their similar counterparts. Furthermore, we develop an efficient algorithm based on the Alternating Direction Method of Multipliers (ADMM) to obtain a moderately accurate solution. In addition, we present a dual method to obtain an accurate solution with low additional time consumption. We demonstrate through extensive numerical experiments that triangle lasso is robust to imperfect data. It usually yields better performance than the state-of-the-art method when performing data analysis tasks in practical scenarios.
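
The neighbour-based notion of similarity mentioned above can be illustrated by counting, for every edge of a graph, how many neighbours its two endpoints share. This sketch covers only that counting step; how triangle lasso turns such counts into a regulariser is not described in the abstract, and the function name `common_neighbour_counts` is hypothetical.

```python
from collections import defaultdict


def common_neighbour_counts(edges):
    """Map each undirected edge (u, v), stored with u <= v, to the number
    of neighbours that u and v have in common."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    counts = {}
    for u, v in edges:
        key = tuple(sorted((u, v)))
        # Each shared neighbour closes a triangle through this edge.
        counts[key] = len(neighbours[u] & neighbours[v])
    return counts


if __name__ == "__main__":
    # A triangle 1-2-3 with a pendant node 4 attached to node 3.
    print(common_neighbour_counts([(1, 2), (2, 3), (1, 3), (3, 4)]))
```

Edges inside the triangle each have one common neighbour, while the pendant edge has none, so a noisy feature vector on one triangle node would still leave it tied to its counterparts through the graph structure.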

The Deconfounded Recommender: A Causal Inference Approach to Recommendation

The goal of a recommender system is to show its users items that they will like. In forming its prediction, the recommender system tries to answer: ‘what would the rating be if we ‘forced’ the user to watch the movie?’ This is a question about an intervention in the world, a causal question, and so traditional recommender systems are doing causal inference from observational data. This paper develops a causal inference approach to recommendation. Traditional recommenders are likely biased by unobserved confounders, variables that affect both the ‘treatment assignments’ (which movies the users watch) and the ‘outcomes’ (how they rate them). We develop the deconfounded recommender, a strategy to leverage classical recommendation models for causal predictions. The deconfounded recommender uses Poisson factorization on which movies users watched to infer latent confounders in the data; it then augments common recommendation models to correct for potential confounding bias. The deconfounded recommender improves recommendation and it enjoys stable performance against interventions on test sets.

• The empirical likelihood prior applied to bias reduction of general estimating equations
• Indoor Coverage Enhancement for mmWave Systems with Passive Reflectors: Measurements and Ray Tracing Simulations
• SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
• Optimal Control for Discrete-time Markov Jump Linear System with Control Input Delay
• Improved Decision Rule Approximations for Multi-Stage Robust Optimization via Copositive Programming
• Lexicosyntactic Inference in Neural Models
• Theoretical study of an adaptive cubic regularization method with dynamic inexact Hessian information
• Spatio-temporal prediction of crimes using network analytic approach
• XL-NBT: A Cross-lingual Neural Belief Tracking Framework
• $Z_2\times Z_2$-cordial cycle-free hypergraphs
• Dynamic Temporal Alignment of Speech to Lips
• An incremental local-first community detection method for dynamic graphs
• Eliminating the Blind Spot: Adapting 3D Object Detection and Monocular Depth Estimation to 360° Panoramic Imagery
• Counting Connected Graphs without Overlapping Cycles
• Pseudorandom Generators for Read-Once Branching Programs, in any Order
• Neural Machine Translation of Text from Non-Native Speakers
• Iteration-Complexity of the Subgradient Method on Riemannian Manifolds with Lower Bounded Curvature
• Applying Machine Learning To Maize Traits Prediction
• Person Re-Identification by Semantic Region Representation and Topology Constraint
• Incremental Learning in Person Re-Identification
• Multimodal speech synthesis architecture for unsupervised speaker adaptation
• Multi-Perspective Context Aggregation for Semi-supervised Cloze-style Reading Comprehension
• Seymour’s Second Neighborhood Conjecture for Subsets
• Universal Stagewise Learning for Non-Convex Problems with Convergence on Averaged Solutions
• Refined Asymptotics in the Online Selection of an Increasing Subsequence
• Question Generation from SQL Queries Improves Neural Semantic Parsing
• Post-Processing of Word Representations via Variance Normalization and Dynamic Embedding
• Reed-Solomon codes over small fields with constrained generator matrices
• Analysis of ‘Learn-As-You-Go’ (LAGO) Studies
• Binomial coefficients and multifactorial numbers through generative grammars
• A General Framework of Multi-Armed Bandit Processes by Switching Restrictions
• Stability condition of a two-dimensional QBD process and its application to estimation of efficiency for two-queue models
• Group-Strategyproof mechanisms for facility location with Euclidean distance
• Universal Image Manipulation Detection using Deep Siamese Convolutional Neural Network
• PAC-learning is Undecidable
• GPU PaaS Computation Model in Aneka Cloud Computing Environment
• Wrangling Rogues: Managing Experimental Post-Moore Architectures
• Optimal asset allocation for a DC plan with partial information under inflation and mortality risks
• On cyclic codes of length $2^e$ over finite fields
• On the error in Laplace approximations of high-dimensional integrals
• A Distribution Similarity Based Regularizer for Learning Bayesian Networks
• Optimal gradient estimates of heat kernels of stable-like operators
• Navigating the Landscape for Real-time Localisation and Mapping for Robotics and Virtual and Augmented Reality
• Signed Graph Convolutional Network
• Towards Anticipation of Architectural Smells using Link Prediction Techniques
• Alzheimer’s Disease Modelling and Staging through Independent Gaussian Process Analysis of Spatio-Temporal Brain Changes
• Learning from #Barcelona Instagram data what Locals and Tourists post about its Neighbourhoods
• Wandering chimeras in adaptive network of pulse-coupled oscillators
• Spectrum of free-form Sudoku graphs
• Progressive Operational Perceptron with Memory
• FAMU: study of the energy dependent transfer rate $\Lambda_{\mu p \rightarrow \mu O}$
• FusionNet and AugmentedFlowNet: Selective Proxy Ground Truth for Training on Unlabeled Images
• Configurable Distributed Physical Downlink Control Channel for 5G New Radio: Resource Bundling and Diversity Trade-off
• Bayesian Regression for a Dirichlet Distributed Response using Stan
• Amplitude Quantization for Type-2 Codebook Based CSI Feedback in New Radio System
• PPP-Completeness with Connections to Cryptography
• Semiparametric estimation of structural failure time model in continuous-time processes
• Evolutionary, Mean-Field and Pressure-Resistance Game Modelling of Networks Security
• Scalable Edge Partitioning
• Dynamic Intention-Aware Recommendation with Self-Attention
• Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up and Enhance Recommendations
• Spillover Effects in Cluster Randomized Trials with Noncompliance
• What Stands-in for a Missing Tool? A Prototypical Grounded Knowledge-based Approach to Tool Substitution
• CapsDeMM: Capsule network for Detection of Munro’s Microabscess in skin biopsy images
• A unified Framework for Robust Modelling of Financial Markets in discrete time
• On the almost decrease of a subexponential density
• An Assessment of Covariates of Nonstationary Storm Surge Statistical Behavior by Bayesian Model Averaging
• Synthetic Patient Generation: A Deep Learning Approach Using Variational Autoencoders
• On the compression of messages in the multi-party setting
• A Class of Non-Parametric Statistical Manifolds modelled on Sobolev Space
• Reproducible evaluation of classification methods in Alzheimer’s disease: framework and application to MRI and PET data
• Translational Motion Compensation for Soft Tissue Velocity Images
• Learning to Dialogue via Complex Hindsight Experience Replay
• Life-Long Disentangled Representation Learning with Cross-Domain Latent Homologies
• Optimized Rate-Adaptive Protograph-Based LDPC Codes for Source Coding with Side Information
• State-of-the-art Chinese Word Segmentation with Bi-LSTMs
• Single-View Place Recognition under Seasonal Changes
• Simultaneous synthesis of FLAIR and segmentation of white matter hypointensities from T1 MRIs
• CU-Net: Coupled U-Nets
• The asymmetric traveling salesman path LP has constant integrality ratio
• Detecting Core-Periphery Structure in Spatial Networks
• On generalized Erdös-Ginzburg-Ziv constants for $\mathbb{Z}_2^d$
• Multi-View Graph Embedding Using Randomized Shortest Paths
• Class-Aware Fully-Convolutional Gaussian and Poisson Denoising
• Dynamic-sensitive cooperation in the presence of multiple strategy updating rules
• Splitter Theorems for Graph Immersions
• Detecting cognitive impairments by agreeing on interpretations of linguistic features
• A Semi-Supervised and Inductive Embedding Model for Churn Prediction of Large-Scale Mobile Games
• Peptide-Spectra Matching from Weak Supervision
• Contract-based Incentive Mechanism for LTE over Unlicensed Channels
• Improved Latency-Communication Trade-Off for Map-Shuffle-Reduce Systems with Stragglers
• Learning Monocular Depth by Distilling Cross-domain Stereo Networks
• Video-to-Video Synthesis
