Robust Estimation of Data-Dependent Causal Effects based on Observing a Single Time-Series
Consider the case that one observes a single time series, where at each time t one observes a data record O(t) involving treatment nodes A(t), possible covariates L(t) and an outcome node Y(t). The data record at time t carries information about a (potentially causal) effect of the treatment A(t) on the outcome Y(t), in the context defined by a fixed-dimensional summary measure Co(t). We are concerned with defining causal effects that can be consistently estimated, with valid inference, for sequentially randomized experiments without further assumptions. More generally, we consider the case when the (possibly causal) effects can be estimated in a double robust manner, analogous to double robust estimation of effects in the i.i.d. causal inference literature. We propose a general class of averages of conditional (context-specific) causal parameters that can be estimated in a double robust manner, thereby fully utilizing the sequential randomization. We propose a targeted maximum likelihood estimator (TMLE) of these causal parameters, and present a general theorem establishing the asymptotic consistency and normality of the TMLE. We extend our general framework to a number of typically studied causal target parameters, including a sequentially adaptive design within a single unit that learns the optimal treatment rule for the unit over time. Our work opens up robust statistical inference for causal questions based on observing a single time series on a particular unit.
Wavelet estimation of the dimensionality of curve time series
Functional data analysis is ubiquitous in most areas of science and engineering. Several paradigms have been proposed to deal with the dimensionality problem inherent to this type of data. Sparseness, penalization, and thresholding, among other principles, have been used to tackle this issue. We discuss here a solution based on a finite-dimensional functional space. We employ a wavelet representation of the functionals to estimate this finite dimension, and successfully model a time series of curves. The proposed method is shown to have desirable asymptotic properties. Moreover, the wavelet representation permits the use of several bootstrap procedures and results in faster computing algorithms. Besides the theoretical and computational properties, simulation studies and an application to real data are provided.
Deep Smoke Segmentation
Inspired by the recent success of fully convolutional networks (FCNs) in semantic segmentation, we propose a deep smoke segmentation network to infer high-quality segmentation masks from blurry smoke images. To overcome large variations in the texture, color and shape of smoke appearance, we divide the proposed network into a coarse path and a fine path. The first path is an encoder-decoder FCN with skip structures, which extracts global context information of smoke and accordingly generates a coarse segmentation mask. To retain fine spatial details of smoke, the second path is also designed as an encoder-decoder FCN with skip structures, but it is shallower than the first. Finally, we propose a very small network containing only addition, convolution and activation layers to fuse the results of the two paths. Thus, we can easily train the proposed network end-to-end for simultaneous optimization of network parameters. To avoid the difficulty of manually labelling fuzzy smoke objects, we propose a method to generate synthetic smoke images. Building on the results of our deep segmentation method, we can easily and accurately perform smoke detection in videos. Experiments on three synthetic smoke datasets and a realistic smoke dataset show that our method achieves much better performance than state-of-the-art segmentation algorithms based on FCNs. Test results of our method on videos are also appealing.
Open Domain Question Answering Using Early Fusion of Knowledge Bases and Text
Texar: A Modularized, Versatile, and Extensible Toolkit for Text Generation
A Recurrent Neural Network for Sentiment Quantification
Quantification is a supervised learning task that consists in predicting, given a set of classes C and a set D of unlabelled items, the prevalence (or relative frequency) p(c | D) of each class c in C. Quantification can in principle be solved by classifying all the unlabelled items and counting how many of them have been attributed to each class. However, this ‘classify and count’ approach has been shown to yield suboptimal quantification accuracy; this has established quantification as a task of its own, and given rise to a number of methods specifically devised for it. We propose a recurrent neural network architecture for quantification (that we call QuaNet) that observes the classification predictions to learn higher-order ‘quantification embeddings’, which are then refined by incorporating quantification predictions of simple classify-and-count-like methods. We test QuaNet on sentiment quantification on text, showing that it substantially outperforms several state-of-the-art baselines.
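The classify-and-count baseline that QuaNet refines is simple to state. Below is a minimal sketch of it and of its classic "adjusted" variant; the function names and the example true/false positive rates are ours, not from the paper:

```python
from collections import Counter

def classify_and_count(predictions):
    # Estimate class prevalences by counting hard classifier decisions.
    counts = Counter(predictions)
    n = len(predictions)
    return {c: counts[c] / n for c in counts}

def adjusted_count(p_cc, tpr, fpr):
    # Classic correction using the classifier's true/false positive
    # rates, with the result clipped to [0, 1].
    return max(0.0, min(1.0, (p_cc - fpr) / (tpr - fpr)))

preds = ["pos", "neg", "pos", "pos", "neg", "neg", "neg", "neg"]
p_cc = classify_and_count(preds)["pos"]        # 3/8 = 0.375
p_acc = adjusted_count(p_cc, tpr=0.9, fpr=0.1)
```

Methods of exactly this shape supply the "quantification predictions of simple classify-and-count-like methods" that QuaNet takes as additional input.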
Understanding Regularization in Batch Normalization
Batch Normalization (BN) makes the output of each hidden neuron have zero mean and unit variance, improving convergence and generalization when training neural networks. This work studies these phenomena theoretically. We analyze BN using a building block of neural networks that consists of a weight layer, a BN layer, and a nonlinear activation function. This simple network helps us understand the characteristics of BN, and the results are generalized to deep models in numerical studies. We explore BN in three aspects. First, by viewing BN as a stochastic process, an analytical form of the regularization inherent in BN is derived. Second, the optimization dynamics with this regularization show that BN enables training to converge with large maximum and effective learning rates. Third, BN’s generalization with regularization is explored using random matrix theory and statistical mechanics. Both simulations and experiments support our analyses.
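The zero-mean/unit-variance transform at the heart of BN takes only a few lines. This is a generic sketch of the standard per-batch computation (gamma and beta are the usual learnable scale and shift), not code from the paper:

```python
import math

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    # Normalize a mini-batch of pre-activations to zero mean and unit
    # variance, then apply the learnable scale (gamma) and shift (beta).
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in x]

out = batch_norm([1.0, 2.0, 3.0, 4.0])
mean_out = sum(out) / len(out)                   # ~0
var_out = sum(v ** 2 for v in out) / len(out)    # ~1 (mean is ~0)
```

The randomness the paper analyzes comes from the fact that `mean` and `var` are computed on a sampled mini-batch rather than the full data distribution.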
A Neural Network Model for Determining the Success or Failure of High-tech Projects Development: A Case of Pharmaceutical industry
Financing high-tech projects always entails a great deal of risk. The lack of a systematic method to pinpoint the risk of such projects has been recognized as one of the most salient barriers to evaluating them. So, in order to develop a mechanism for evaluating high-tech projects, an Artificial Neural Network (ANN) has been developed in this study. The structure of this paper encompasses five parts. The first part introduces the paper. The second part gives a literature review. The collection process of risk-related variables and the process of developing a Risk Assessment Index System (RAIS) through Principal Component Analysis (PCA) are discussed in the third part. The fourth part deals particularly with the pharmaceutical industry. Finally, the fifth part focuses on developing an ANN for pattern recognition of the failure or success of high-tech projects; an analysis of the model’s results and a final conclusion are also presented in this part.
Geometric Operator Convolutional Neural Network
The Convolutional Neural Network (CNN) has been successfully applied in many fields during recent decades; however, it lacks the ability to utilize prior domain knowledge when dealing with many realistic problems. We present a framework called the Geometric Operator Convolutional Neural Network (GO-CNN) that uses domain knowledge, wherein the kernel of the first convolutional layer is replaced with a kernel generated by a geometric operator function. This framework integrates many conventional geometric operators, which allows it to adapt to a diverse range of problems. Under certain conditions, we theoretically analyze the convergence and the bound on the generalization errors of GO-CNNs compared with common CNNs. Although the geometric operator convolution kernels have fewer trainable parameters than common convolution kernels, the experimental results indicate that GO-CNN performs more accurately than common CNNs on CIFAR-10/100. Furthermore, GO-CNN reduces dependence on the amount of training examples and enhances adversarial stability. In the practical task of medically diagnosing bone fractures, GO-CNN obtains a 3% improvement in recall.
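As an illustration of a kernel generated by a parametric geometric-operator function, here is a plain-Python Gabor kernel; the paper's exact operator family may differ, so treat this as an assumed example. The point is that only a handful of parameters (theta, sigma, lam) would be trained, instead of size*size free weights:

```python
import math

def gabor_kernel(size, theta, sigma=2.0, lam=4.0):
    # A size x size Gabor kernel: a Gaussian envelope multiplied by a
    # sinusoid oriented at angle theta.
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            row.append(math.exp(-(xr ** 2 + yr ** 2) / (2 * sigma ** 2))
                       * math.cos(2 * math.pi * xr / lam))
        kernel.append(row)
    return kernel

k = gabor_kernel(5, theta=0.0)   # the kernel peaks at its center
```

In a GO-CNN-style setup, a bank of such kernels at different orientations would replace the freely learned first-layer filters.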
Parameter Transfer Extreme Learning Machine based on Projective Model
In recent years, transfer learning has attracted much attention in the machine learning community. In this paper, we mainly focus on the task of parameter transfer under the framework of the extreme learning machine (ELM). Unlike existing parameter transfer approaches, which incorporate the source model information into the target by regularizing the difference between the source and target domain parameters, an intuitively appealing projective model is proposed to bridge the source and target model parameters. Specifically, we formulate the parameter transfer in ELM networks by means of parameter projection, and train the model by optimizing the projection matrix and classifier parameters jointly. Furthermore, the L2,1-norm structured sparsity penalty is imposed on the source domain parameters, which encourages joint feature selection and parameter transfer. To evaluate the effectiveness of the proposed method, comprehensive experiments on several commonly used domain adaptation datasets are presented. The results show that the proposed method significantly outperforms non-transfer ELM networks and other classical transfer learning methods.
JobComposer: Career Path Optimization via Multicriteria Utility Learning
With online professional network platforms (OPNs, e.g., LinkedIn, Xing, etc.) becoming popular on the web, people are now turning to these platforms to create and share their professional profiles, to connect with others who share similar professional aspirations, and to explore new career opportunities. These platforms, however, do not offer a long-term roadmap to guide career progression and improve workforce employability. The career trajectories of OPN users can serve as a reference, but they are not always optimal. A career plan can also be devised through consultation with career coaches, whose knowledge may however be limited to a few industries. To address the above limitations, we present a novel data-driven approach dubbed JobComposer to automate career path planning and optimization. Its key premise is that the observed career trajectories in OPNs may not necessarily be optimal, and can be improved by learning to maximize the sum of payoffs attainable by following a career path. At its heart, JobComposer features a decomposition-based multicriteria utility learning procedure to achieve the best tradeoff among different payoff criteria in career path planning. Extensive studies using a city-state-based OPN dataset demonstrate that JobComposer returns better career paths than both baseline methods and the actual career paths.
Chi-Square Test Neural Network: A New Binary Classifier based on Backpropagation Neural Network
We introduce the chi-square test neural network: a single-hidden-layer backpropagation neural network that uses the chi-square test to redefine the cost function and the error function. The weights and thresholds are modified using the standard backpropagation algorithm. The proposed approach has the advantage of making the data distribution consistent over the training and testing sets. It can be used for binary classification. The experimental results on real-world data sets indicate that the proposed algorithm can significantly improve classification accuracy compared to related approaches.
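The Pearson chi-square statistic that the cost function builds on is a one-liner; treating predicted class counts as "observed" and target counts as "expected" is our reading of the abstract, sketched here, not the paper's exact formulation:

```python
def chi_square(observed, expected):
    # Pearson chi-square statistic: sum over bins of (O - E)^2 / E.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# e.g. compare predicted class counts against the expected distribution
stat = chi_square([18, 22, 20], [20, 20, 20])   # (4 + 4 + 0) / 20 = 0.4
```

A small statistic indicates the two distributions agree, which is the consistency-across-splits property the abstract emphasizes.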
An outlier-resistant indicator of anomalies among inter-laboratory comparison data with associated uncertainty
A new robust pairwise statistic, the pairwise median scaled difference (MSD), is proposed for the detection of anomalous location/uncertainty pairs in heteroscedastic interlaboratory study data with associated uncertainties. The distribution for the IID case is presented and approximate critical values for routine use are provided. The determination of observation-specific quantiles and p-values for heteroscedastic data, using parametric bootstrapping, is demonstrated by example. It is shown that the statistic has good power for detecting anomalies compared to a previous pairwise statistic, and offers much greater resistance to multiple outlying values.
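One plausible shape for a pairwise median scaled difference score is sketched below; the exact scaling and critical-value machinery in the paper may differ, so this is only an illustration of the general idea:

```python
import statistics

def median_scaled_difference(values, uncertainties):
    # For each result x_i with stated uncertainty u_i, take the median
    # over the other labs j of |x_i - x_j| / sqrt(u_i^2 + u_j^2).
    # A large score flags an anomalous location/uncertainty pair.
    scores = []
    pairs = list(zip(values, uncertainties))
    for i, (xi, ui) in enumerate(pairs):
        diffs = [abs(xi - xj) / (ui ** 2 + uj ** 2) ** 0.5
                 for j, (xj, uj) in enumerate(pairs) if j != i]
        scores.append(statistics.median(diffs))
    return scores

# three consistent labs and one outlying result
scores = median_scaled_difference([10.0, 10.2, 9.9, 14.0],
                                  [0.3, 0.3, 0.3, 0.3])
```

Because each score is a median over the other laboratories, a single discrepant result inflates only its own score, which is the outlier resistance the abstract claims.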
DeepPINK: reproducible feature selection in deep neural networks
Deep learning has become increasingly popular in both supervised and unsupervised machine learning thanks to its outstanding empirical performance. However, because of their intrinsic complexity, most deep learning methods are largely treated as black-box tools with little interpretability. Even though recent attempts have been made to facilitate the interpretability of deep neural networks (DNNs), existing methods are susceptible to noise and lack robustness. Therefore, scientists are justifiably cautious about the reproducibility of the discoveries, which is often related to the interpretability of the underlying statistical models. In this paper, we describe a method to increase the interpretability and reproducibility of DNNs by incorporating the idea of feature selection with a controlled error rate. By designing a new DNN architecture and integrating it with the recently proposed knockoffs framework, we perform feature selection with a controlled error rate while maintaining high power. This new method, DeepPINK (Deep feature selection using Paired-Input Nonlinear Knockoffs), is applied to both simulated and real data sets to demonstrate its empirical utility.
Causal Explanation Analysis on Social Media
Understanding causal explanations – reasons given for happenings in one’s life – has been found to be an important psychological factor linked to physical and mental health. Causal explanations are often studied through manual identification of phrases over limited samples of personal writing. Automatic identification of causal explanations in social media, while challenging because it relies on contextual and sequential cues, offers a larger-scale alternative to expensive manual ratings and opens the door to new applications (e.g. studying prevailing beliefs about causes, such as climate change). Here, we explore automating causal explanation analysis, building on discourse parsing and presenting two novel subtasks: causality detection (determining whether a causal explanation exists at all) and causal explanation identification (identifying the specific phrase that is the explanation). We achieve strong accuracies for both tasks but find that different approaches work best: an SVM for causality prediction (F1 = 0.791) and a hierarchy of Bidirectional LSTMs for causal explanation identification (F1 = 0.853). Finally, we explore applications of our complete pipeline (F1 = 0.868), showing demographic differences in mentions of causal explanations and that the association between a word and sentiment can change when it is used within a causal explanation.
Generating More Interesting Responses in Neural Conversation Models with Distributional Constraints
Graph-based Deep-Tree Recursive Neural Network (DTRNN) for Text Classification
A novel graph-to-tree conversion mechanism called the deep-tree generation (DTG) algorithm is first proposed to predict text data represented by graphs. The DTG method can generate a richer and more accurate representation for nodes (or vertices) in graphs. It adds flexibility in exploring the vertex neighborhood information to better reflect the second-order proximity and homophily equivalence in a graph. Then, a Deep-Tree Recursive Neural Network (DTRNN) method is presented and used to classify vertices that contain text data in graphs. To demonstrate the effectiveness of the DTRNN method, we apply it to three real-world graph datasets and show that it outperforms several state-of-the-art benchmarking methods.
Compositional Stochastic Average Gradient for Machine Learning and Related Applications
Many machine learning, statistical inference, and portfolio optimization problems require minimization of a composition of expected value functions (CEVF). Of particular interest are the finite-sum versions of such compositional optimization problems (FS-CEVF). Compositional stochastic variance reduced gradient (C-SVRG) methods, which combine stochastic compositional gradient descent (SCGD) and stochastic variance reduced gradient (SVRG) methods, are the state-of-the-art methods for FS-CEVF problems. We introduce compositional stochastic average gradient descent (C-SAG), a novel extension of the stochastic average gradient method (SAG) to minimize compositions of finite-sum functions. C-SAG, like SAG, estimates the gradient by incorporating memory of previous gradient information. We present theoretical analyses of C-SAG showing that C-SAG, like SAG and C-SVRG, achieves a linear convergence rate when the objective function is strongly convex; however, C-SAG achieves lower oracle query complexity per iteration than C-SVRG. Finally, we present results of experiments showing that C-SAG converges substantially faster than full gradient (FG), as well as C-SVRG.
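The gradient memory that distinguishes SAG-style methods can be sketched on a scalar finite-sum problem. This is a generic SAG sketch, not C-SAG's compositional version, and the step size and problem are our own illustrative choices:

```python
import random

def sag(grad_i, x0, n, lr=0.1, steps=500, seed=0):
    # Stochastic average gradient: keep the last gradient seen for each
    # component f_i; each step, refresh one entry and move along the
    # average of all stored gradients.
    rng = random.Random(seed)
    x = x0
    memory = [0.0] * n   # last known gradient of each f_i
    total = 0.0          # running sum of the memory
    for _ in range(steps):
        i = rng.randrange(n)
        g = grad_i(i, x)
        total += g - memory[i]
        memory[i] = g
        x -= lr * total / n
    return x

# minimize (1/n) * sum_i (x - a_i)^2, whose minimizer is the mean of a
a = [1.0, 2.0, 3.0, 4.0]
x_star = sag(lambda i, x: 2.0 * (x - a[i]), x0=0.0, n=len(a))
```

Each iteration queries only one component gradient yet steps along an estimate of the full gradient, which is the source of the per-iteration oracle savings the abstract compares against C-SVRG.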
• Compound Poisson approximation for random fields with application to sequence alignment
• A Local Lemma for Focused Stochastic Algorithms
• Edit Errors with Block Transpositions: Deterministic Document Exchange Protocols and Almost Optimal Binary Codes
• emrQA: A Large Corpus for Question Answering on Electronic Medical Records
• An Optimal $χ$-Bound for ($P_6$, diamond)-Free Graphs
• ‘Read My Lips’: Using Automatic Text Analysis to Classify Politicians by Party and Ideology
• Exhaustive generation for permutations avoiding a (colored) regular sets of patterns
• Vandermonde Factorization of Hankel Matrix for Complex Exponential Signal Recovery — Application in Fast NMR Spectroscopy
• Information Signal Design for Incentivizing Team Formation
• Analysis for the Slow Convergence in Arimoto Algorithm
• End-to-end Multimodal Emotion and Gender Recognition with Dynamic Weights of Joint Loss
• Bounding the number of self-avoiding walks: Hammersley-Welsh with polygon insertion
• Adaptive Douglas-Rachford Splitting Algorithm for the Sum of Two Operators
• Spatial-Spectral Fusion by Combining Deep Learning and Variation Model
• A note on heat kernel estimates, resistance bounds and Poincaré inequality
• Robust Iris Segmentation Based on Fully Convolutional Networks and Generative Adversarial Networks
• Transferring Deep Reinforcement Learning with Adversarial Objective and Augmentation
• Sequence-to-Action: End-to-End Semantic Graph Generation for Semantic Parsing
• Bounds on the edge-Wiener index of cacti with $n$ vertices and $t$ cycles
• Stretched exponential decay of correlations in the quasiperiodic continuum percolation model
• PFDet: 2nd Place Solution to Open Images Challenge 2018 Object Detection Track
• Cycle Ramsey numbers for random graphs
• Matrix Infinitely Divisible Series: Tail Inequalities and Applications in Optimization
• Lipschitz Networks and Distributional Robustness
• Complexity Reduction for Systems of Interacting Orientable Agents: Beyond The Kuramoto Model
• Mapping Instructions to Actions in 3D Environments with Visual Goal Prediction
• Adelic Extension Classes, Atiyah Bundles and Non-Commutative Codes
• A comparative study of top-k high utility itemset mining methods
• Sion’s mini-max theorem and Nash equilibrium in a multi-players game with two groups which is zero-sum and symmetric in each group
• Nash equilibrium in asymmetric multi-players zero-sum game with two strategic variables and only one alien
• Pointwise HSIC: A Linear-Time Kernelized Co-occurrence Norm for Sparse Linguistic Expressions
• Constructing a solution of the $(2+1)$-dimensional KPZ equation
• Equalization with Expectation Propagation at Smoothing Level
• A Novel A Priori Simulation Algorithm for Absorbing Receivers in Diffusion-Based Molecular Communication Systems
• A Deep Learning Spatiotemporal Prediction Framework for Mobile Crowdsourced Services
• RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes
• Counterexamples to a conjecture of Las Vergnas
• Multiplicative random cascades with additional stochastic process in financial markets
• High-dimensional varying index coefficient quantile regression model
• Multi-species neutron transport equation
• Improving the Expressiveness of Deep Learning Frameworks with Recursion
• Bounded Rational Decision-Making with Adaptive Neural Network Priors
• Metabolize Neural Network
• Framework for Discrete Rate Transmission in Buffer-Aided Underlay CRN With Direct Path
• Existence, uniqueness and stability of semi-linear rough partial differential equations
• Music Sequence Prediction with Mixture Hidden Markov Models
• Multi-target Unsupervised Domain Adaptation without Exactly Shared Categories
• Soft-PHOC Descriptor for End-to-End Word Spotting in Egocentric Scene Images
• Stabilization of port-Hamiltonian systems with discontinuous energy densities
• Non-monotonic Reasoning in Deductive Argumentation
• Unveiling co-evolutionary patterns in systems of cities: a systematic exploration of the SimpopNet model
• Handwriting styles: benchmarks and evaluation metrics
• Algebraic matroids in action
• Private Information Retrieval From a Cellular Network With Caching at the Edge
• On the predictive power of database classifiers formed by a small network of interacting chemical oscillators
• Automated bird sound recognition in realistic settings
• An elementary proof of de Finetti’s Theorem
• MesoNet: a Compact Facial Video Forgery Detection Network
• A note on the spectra of some subgraphs of the hypercube
• Improving full waveform inversion by wavefield reconstruction with the alternating direction method of multipliers
• A Simple and Practical Concurrent Non-blocking Unbounded Graph with Reachability Queries
• Image Reassembly Combining Deep Learning and Shortest Path Problem
• Parity Crowdsourcing for Cooperative Labeling
• Penalizing Top Performers: Conservative Loss for Semantic Segmentation Adaptation
• Bangla License Plate Recognition Using Convolutional Neural Networks (CNN)
• Treewidth of display graphs: bounds, brambles and applications
• Optimal Distributed and Tangential Boundary Control for the Unsteady Stochastic Stokes Equations
• OCNet: Object Context Network for Scene Parsing
• Segmentation-free compositional $n$-gram embedding
• Trees and linear anticomplete pairs
• Proof of a Conjecture of Galvin
• Planar graphs without cycles of lengths 4 and 5 and close triangles are DP-3-colorable
• Lifted Projective Reed-Solomon Codes
• Faster Balanced Clusterings in High Dimension
• Improving generalization of vocal tract feature reconstruction: from augmented acoustic inversion to articulatory feature reconstruction without articulatory data
• From Möbius inversion to renormalisation
• Noisy Voronoi: a Simple Framework for Terminal-Clustering Problems
• Compressive Hyperspectral Imaging: Fourier Transform Interferometry meets Single Pixel Camera
• A class of orders with linear time sorting algorithm
• Cone valuations, Gram’s relation, and flag-angles
• How to model fake news
• Energy-Efficient Mobile-Edge Computation Offloading for Applications with Shared Data
• A simplified proof of weak convergence in Douglas-Rachford method to a solution of the underlying inclusion problem
• Equivalence of approximation by convolutional neural networks and fully-connected networks
• Optimal Reinsurance for Gerber-Shiu Functions in the Cramer-Lundberg Model
• Étude de l’informativité des transcriptions : une approche basée sur le résumé automatique
• Several classes of optimal Ferrers diagram rank-metric codes
• Aesthetic Discrimination of Graph Layouts
• Leveraging Deep Visual Descriptors for Hierarchical Efficient Localization
• Computing optimal discrete readout weights in reservoir computing is NP-hard
• A Neural Network Aided Approach for LDPC Coded DCO-OFDM with Clipping Distortion
• Toric degenerations of Grassmannians from matching fields
• Determining the Number of Communities in Degree-corrected Stochastic Block Models
• Using SIMD and SIMT vectorization to evaluate sparse chemical kinetic Jacobian matrices and thermochemical source terms
• Saving Lives at Sea with UAV-assisted Wireless Networks
• A Roadmap for the Value-Loading Problem
• Shape-Enforcing Operators for Point and Interval Estimators
• Iris recognition in cases of eye pathology
• The Effect of Context on Metaphor Paraphrase Aptness Judgments
• Guaranteed simulation error bounds for linear time invariant systems identified from data
• The Effect of Time Delay on the Average Data Rate and Performance in Networked Control Systems
• A Novel Neural Sequence Model with Multiple Attentions for Word Sense Disambiguation
• Reasoning in Bayesian Opinion Exchange Networks Is PSPACE-Hard
• Energy Efficient Resource Allocation for Mobile-Edge Computation Networks with NOMA
• A Quantum Spatial Graph Convolutional Neural Network using Quantum Passing Information
• Scaling limits of discrete optimal transport
• Adversarial Attacks on Node Embeddings
• Accelerating Beam Sweeping in mmWave Standalone 5G New Radios using Recurrent Neural Networks
• Distributed Nonconvex Constrained Optimization over Time-Varying Digraphs
• Localization of Neumann Eigenfunctions near Irregular Boundaries
• Text2Scene: Generating Abstract Scenes from Textual Descriptions
• A note on the tight example in On the randomised query complexity of composition
• VideoMatch: Matching based Video Object Segmentation
• Straight to the Facts: Learning Knowledge Base Retrieval for Factual Visual Question Answering
• Unsupervised Video Object Segmentation using Motion Saliency-Guided Spatio-Temporal Propagation
• ‘This is why we play’: Characterizing Online Fan Communities of the NBA Teams
• Quantifier-free description of the solutions set of the generalized interval-quantifier system of linear equations
• Challenges of capturing engagement on Facebook for Altmetrics
• On the minimal displacement vector of compositions and convex combinations of nonexpansive mappings
• Random Language Model: a path to principled complexity
• SOS lower bounds with hard constraints: think global, act local
• Hybrid Master Equation for Jump-Diffusion Approximation of Biomolecular Reaction Networks
• A Primal-Dual Quasi-Newton Method for Exact Consensus Optimization
• The Saddle Point Problem of Polynomials
• Vulcan: A Monte Carlo Algorithm for Large Chance Constrained MDPs with Risk Bounding Functions
• Testing for exponentiality for stationary associated random variables