DeepTracker: Visualizing the Training Process of Convolutional Neural Networks
Deep convolutional neural networks (CNNs) have achieved remarkable success in various fields. However, training an excellent CNN is in practice a trial-and-error process that consumes a tremendous amount of time and computing resources. To accelerate training and reduce the number of trials, experts need to understand what happened during training and why the resulting CNN behaves as it does. However, popular training platforms such as TensorFlow provide only limited, general information, such as training/validation errors, which is far from enough for this purpose. To bridge this gap and help domain experts with their training tasks in a practical environment, we propose a visual analytics system, DeepTracker, to facilitate the exploration of the rich dynamics of CNN training processes and to identify the unusual patterns hidden in the huge volume of training logs. Specifically, we combine a hierarchical index mechanism and a set of hierarchical small multiples to help experts explore the entire training log at different levels of detail. We also introduce a novel cube-style visualization to reveal the complex correlations among multiple types of heterogeneous training data, including neuron weights, validation images, and training iterations. Three case studies demonstrate how DeepTracker provides its users with valuable knowledge in an industry-level CNN training process, namely, in our case, training ResNet-50 on the ImageNet dataset. We show that our method can be easily applied to other state-of-the-art ‘very deep’ CNN models.
Spectral-Pruning: Compressing deep neural network via spectral analysis
Deep neural networks keep growing in size to achieve superior performance on complicated tasks, which makes them difficult to deploy on small edge-computing devices. To overcome this problem, model compression methods have attracted much attention. However, there has been little theoretical work explaining which quantity determines how far a model can be compressed. To resolve this issue, we develop a new theoretical framework for model compression and propose a new method based on it, called Spectral-Pruning. Our theoretical analysis builds on the observation that the eigenvalues of the covariance matrix of the outputs of nodes in internal layers often decay rapidly. We define a ‘degree of freedom’ that quantifies the intrinsic dimensionality of the model using the eigenvalue distribution, and show that the compression ability is essentially controlled by this quantity. Along with this, we give a generalization error bound for the compressed model. Unlike existing methods, our method is applicable to a wide range of models, including those with complicated branches such as SegNet and ResNet. Our method makes use of both the ‘input’ and the ‘output’ of each layer and is easy to implement. We apply our method to several datasets to justify our theoretical analyses and show that it achieves state-of-the-art performance.
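The ‘degree of freedom’ idea can be made concrete in a few lines. The sketch below assumes the ridge-style definition N(λ) = Σ_j μ_j / (μ_j + λ), where μ_j are the eigenvalues of a layer's output covariance and λ > 0 is a regularization parameter; the paper's exact definition and choice of λ may differ, and `degree_of_freedom` is a hypothetical helper. It shows why rapid eigenvalue decay gives a small N(λ) for a layer of fixed width.

```python
def degree_of_freedom(eigenvalues, lam):
    """Ridge-style degree of freedom: sum_j mu_j / (mu_j + lam).

    Equals trace(S (S + lam*I)^-1) for a covariance matrix S with the given
    eigenvalues. Assumed definition, for illustration only.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    return sum(mu / (mu + lam) for mu in eigenvalues)


if __name__ == "__main__":
    width = 8
    flat = [1.0] * width                          # no decay: every direction matters
    decaying = [2.0 ** -j for j in range(width)]  # rapid (geometric) decay
    for name, spectrum in (("flat", flat), ("decaying", decaying)):
        print(name, round(degree_of_freedom(spectrum, lam=0.1), 2))
```

With the same width of 8 nodes, the geometrically decaying spectrum yields a value roughly half that of the flat one, in line with the abstract's claim that this quantity, rather than raw width, governs how far a layer can be compressed.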
Semantic-Unit-Based Dilated Convolution for Multi-Label Text Classification
We propose a novel model for multi-label text classification based on sequence-to-sequence learning. The model generates higher-level semantic unit representations with multi-level dilated convolution, together with a corresponding hybrid attention mechanism that extracts information at both the word level and the semantic-unit level. Our dilated convolution effectively reduces dimensionality and supports an exponential expansion of receptive fields without loss of local information, and the attention-over-attention mechanism captures more summary-relevant information from the source context. Our experiments show that the proposed model has significant advantages over the baseline models on the RCV1-V2 and Ren-CECps datasets, and our analysis demonstrates that the model is competitive with deterministic hierarchical models and more robust in classifying low-frequency labels.
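The receptive-field claim is easy to check numerically. The sketch below implements a ‘valid’ 1-D dilated convolution in plain Python and computes the receptive field of stacked stride-1 layers, 1 + Σ (k − 1)·d_i. Doubling the dilation at each layer is an assumed schedule for illustration (the paper's dilation rates may differ); with it, the receptive field grows exponentially with depth, versus linearly for undilated layers.

```python
def dilated_conv1d(x, kernel, dilation):
    """'Valid' 1-D dilated convolution (cross-correlation, as in most DL libraries)."""
    span = (len(kernel) - 1) * dilation
    return [
        sum(w * x[i + j * dilation] for j, w in enumerate(kernel))
        for i in range(len(x) - span)
    ]


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    print(dilated_conv1d(list(range(10)), [1, 1, 1], dilation=2))
    for depth in range(1, 5):
        plain = receptive_field(3, [1] * depth)
        dilated = receptive_field(3, [2 ** k for k in range(depth)])
        print(depth, plain, dilated)
```

With kernel size 3, three plain layers see 7 positions, while dilations 1, 2, 4 see 15, and no input position inside that window is skipped.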
Semi-Autoregressive Neural Machine Translation
Evolutionary dynamics of cryptocurrency transaction networks: An empirical study
Cryptocurrency is a well-developed application of blockchain technology and currently a hot topic throughout the world. The public availability of transaction histories offers an opportunity to analyze and compare different cryptocurrencies. In this paper, we present a dynamic network analysis of three representative blockchain-based cryptocurrencies: Bitcoin, Ethereum, and Namecoin. By analyzing the accumulated network growth, we find that, unlike most other networks, these cryptocurrency networks do not always densify over time, and they change constantly, with relatively low node and edge repetition ratios. We therefore construct separate networks on a monthly basis, trace the changes of typical network characteristics (including degree distribution, degree assortativity, clustering coefficient, and the largest connected component) over time, and compare the three. We find that the degree distributions of these monthly transaction networks cannot be well fitted by the famous power-law distribution. At the same time, the currencies still differ in their network properties: e.g., both the Bitcoin and Ethereum networks are heavy-tailed with disassortative mixing, but only the former can be treated as a small world. These network properties reflect the evolutionary characteristics and competitive power of the three cryptocurrencies and provide a foundation for future research.
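The monthly-network pipeline described above can be sketched in plain Python. This is a minimal illustration under our own simplifying assumptions: transactions are hypothetical `(sender, receiver, "YYYY-MM")` tuples, each month becomes an undirected simple graph (direction, repeated transfers and amounts are dropped), and the helper names are ours, not the paper's. The metrics follow their textbook definitions (Newman's degree assortativity, average local clustering, largest connected component); power-law fitting is left out.

```python
import math
from collections import defaultdict


def monthly_networks(transactions):
    """Group (sender, receiver, month) records into one undirected simple graph per month."""
    graphs = defaultdict(lambda: defaultdict(set))
    for sender, receiver, month in transactions:
        if sender == receiver:
            continue  # drop self-transfers
        graphs[month][sender].add(receiver)
        graphs[month][receiver].add(sender)
    # Plain dicts, so later lookups cannot silently add nodes.
    return {m: {u: set(vs) for u, vs in g.items()} for m, g in graphs.items()}


def largest_cc_size(adj):
    """Number of nodes in the largest connected component (iterative DFS)."""
    seen, best = set(), 0
    for root in adj:
        if root in seen:
            continue
        seen.add(root)
        stack, size = [root], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    if not adj:
        return 0.0
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        nbr_list = list(nbrs)
        links = sum(
            1
            for a in range(k)
            for b in range(a + 1, k)
            if nbr_list[b] in adj[nbr_list[a]]
        )
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def degree_assortativity(adj):
    """Pearson correlation of the degrees at both ends of every edge (both orientations)."""
    xs, ys = [], []
    for u, nbrs in adj.items():
        for v in nbrs:
            xs.append(len(adj[u]))
            ys.append(len(adj[v]))
    n = len(xs)
    if n == 0:
        return float("nan")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")  # e.g. regular graphs: assortativity is undefined
    return cov / math.sqrt(vx * vy)
```

A hub-and-spoke month (one address paying many one-off counterparties) gives strongly negative assortativity, which is the disassortative mixing the study reports for Bitcoin and Ethereum.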
Deep Learning: Computational Aspects
In this article we review computational aspects of Deep Learning (DL). Deep learning uses network architectures consisting of hierarchical layers of latent variables to construct predictors for high-dimensional input-output models. Training a deep learning architecture is computationally intensive, and efficient linear algebra libraries are key to both training and inference. Stochastic gradient descent (SGD) optimization and batch sampling are used to learn from massive data sets.
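Mini-batch SGD with batch sampling can be illustrated on the smallest possible model. The sketch below is a hypothetical example, not taken from the article: it fits y = w·x + b by least squares, reshuffling the data each epoch and taking one gradient step per small batch. `minibatch_sgd` and its defaults are our own choices.

```python
import random


def minibatch_sgd(xs, ys, lr=0.05, batch_size=8, epochs=200, seed=0):
    """Fit y ~ w*x + b by minimizing half the mean squared error with mini-batch SGD."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # batch sampling: a fresh random partition each epoch
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            grad_w = grad_b = 0.0
            for i in batch:
                err = (w * xs[i] + b) - ys[i]
                grad_w += err * xs[i]
                grad_b += err
            # Average the gradient over the batch, then take one step.
            w -= lr * grad_w / len(batch)
            b -= lr * grad_b / len(batch)
    return w, b


if __name__ == "__main__":
    rng = random.Random(1)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(200)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.1) for x in xs]
    print(minibatch_sgd(xs, ys))
```

Each update touches only `batch_size` examples, which is what makes the method practical on massive data sets; in real DL workloads the inner Python loop is replaced by batched calls into an optimized linear algebra library.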
Detecting Outliers in Data with Correlated Measures
Advances in sensor technology have enabled the collection of large-scale datasets. Such datasets can be extremely noisy and often contain a significant number of outliers resulting from sensor malfunctions or human operation faults. To use such data in real-world applications, it is critical to detect outliers so that models built from these datasets are not skewed by them. In this paper, we propose a new outlier detection method that exploits correlations in the data (e.g., taxi trip distance vs. trip time). Unlike existing outlier detection methods, we build a robust regression model that explicitly models the outliers and detects them simultaneously with the model fitting. We validate our approach on real-world datasets against methods specifically designed for each dataset as well as state-of-the-art outlier detectors. Our method achieves better performance, demonstrating its robustness and generality. Finally, we report interesting case studies on some outliers that result from atypical events.
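One standard way to ‘explicitly model the outliers’ while fitting is the mean-shift formulation: give every point its own shift o_i, penalize λ·Σ|o_i| so that most shifts are exactly zero, and flag the points whose shift survives. The sketch below applies this to a single correlated pair (e.g. trip distance vs. trip time) by alternating least squares and soft-thresholding. It is a swapped-in textbook technique for illustration, not necessarily the paper's model; `detect_outliers`, `lam` and the synthetic data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def soft_threshold(r, lam):
    """Proximal operator of lam*|o|: shrink r towards zero by lam."""
    if r > lam:
        return r - lam
    if r < -lam:
        return r + lam
    return 0.0


def detect_outliers(xs, ys, lam, iters=200):
    """Fit y = a + b*x + o + noise with an L1 penalty on the per-point shifts o.

    Alternates between refitting the line on shift-corrected targets and
    re-estimating the shifts; points with a nonzero shift are flagged.
    """
    shifts = [0.0] * len(xs)
    for _ in range(iters):
        a, b = fit_line(xs, [y - o for y, o in zip(ys, shifts)])
        shifts = [soft_threshold(y - (a + b * x), lam) for x, y in zip(xs, ys)]
    flagged = [i for i, o in enumerate(shifts) if o != 0.0]
    return a, b, flagged


if __name__ == "__main__":
    distance = [float(i + 1) for i in range(20)]
    minutes = [3.0 * d + 2.0 + 0.1 * ((i % 3) - 1) for i, d in enumerate(distance)]
    minutes[5] += 40.0   # e.g. a trip stuck in traffic
    minutes[12] -= 35.0  # e.g. a meter error
    print(detect_outliers(distance, minutes, lam=3.0))
```

Because the line and the shifts are estimated jointly, the two corrupted points barely pull the fitted slope, which is the advantage over fitting first and thresholding residuals afterwards.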
Predicting Semantic Relations using Global Graph Properties
Semantic graphs, such as WordNet, are resources which curate natural language on two distinguishable layers. On the local level, individual relations between synsets (semantic building blocks) such as hypernymy and meronymy enhance our understanding of the words used to express their meanings. Globally, analysis of graph-theoretic properties of the entire net sheds light on the structure of human language as a whole. In this paper, we combine global and local properties of semantic graphs through the framework of Max-Margin Markov Graph Models (M3GM), a novel extension of the Exponential Random Graph Model (ERGM) that scales to large multi-relational graphs. We demonstrate how such global modeling improves performance on the local task of predicting semantic relations between synsets, yielding new state-of-the-art results on the WN18RR dataset, a challenging version of WordNet link prediction in which ‘easy’ reciprocal cases are removed. In addition, the M3GM model identifies multi-relational motifs that are characteristic of well-formed lexical semantic ontologies.
Predefined Sparseness in Recurrent Sequence Models
Inducing sparseness while training neural networks has been shown to yield models with a lower memory footprint but similar effectiveness to dense models. However, sparseness is typically induced starting from a dense model, so this advantage does not hold during training. We propose techniques to enforce sparseness upfront in recurrent sequence models for NLP applications, so that training benefits as well. First, in language modeling, we show how to increase hidden state sizes in recurrent layers without increasing the number of parameters, leading to more expressive models. Second, for sequence labeling, we show that word embeddings with predefined sparseness perform similarly to dense embeddings, at a fraction of the number of trainable parameters.
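The parameter-budget argument can be checked with simple arithmetic. The sketch below assumes one particular predefined sparsity pattern: a single LSTM layer whose hidden-to-hidden matrix is block-diagonal, with dense input weights and one bias vector per gate. The paper's exact pattern may differ, and both function names are hypothetical.

```python
def lstm_params(input_size, hidden_size, blocks=1):
    """Trainable parameters of one LSTM layer with a block-diagonal recurrent matrix."""
    if hidden_size % blocks:
        raise ValueError("hidden_size must be divisible by blocks")
    gates = 4  # input, forget, cell and output gates
    input_weights = hidden_size * input_size                  # dense input-to-hidden
    recurrent_weights = hidden_size * hidden_size // blocks   # only the diagonal blocks
    biases = hidden_size
    return gates * (input_weights + recurrent_weights + biases)


def max_hidden_for_budget(input_size, budget, blocks):
    """Largest hidden size (a multiple of `blocks`) whose layer fits the parameter budget."""
    hidden = blocks
    while lstm_params(input_size, hidden + blocks, blocks) <= budget:
        hidden += blocks
    return hidden


if __name__ == "__main__":
    budget = lstm_params(100, 200)  # a dense 200-unit LSTM on 100-d inputs
    for k in (1, 2, 4, 8):
        print(k, max_hidden_for_budget(100, budget, k))
```

With 100-dimensional inputs and the budget of a dense 200-unit LSTM (240,800 parameters), four diagonal blocks allow a 328-unit hidden state, because the recurrent matrix shrinks by a factor of four while the parameter count stays within budget.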
Don’t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization
We introduce extreme summarization, a new single-document summarization task which does not favor extractive strategies and calls for an abstractive modeling approach. The idea is to create a short, one-sentence news summary answering the question ‘What is the article about?’. We collect a real-world, large-scale dataset for this task by harvesting online articles from the British Broadcasting Corporation (BBC). We propose a novel abstractive model which is conditioned on the article’s topics and based entirely on convolutional neural networks. We demonstrate experimentally that this architecture captures long-range dependencies in a document and recognizes pertinent content, outperforming an oracle extractive system and state-of-the-art abstractive approaches when evaluated automatically and by humans.
What Makes Natural Scene Memorable?
Recent studies on image memorability have shed light on the visual features that make generic images, object images or face photographs memorable. However, a clear understanding and reliable estimation of natural scene memorability remain elusive. In this paper, we attempt to answer the question ‘what exactly makes natural scenes memorable?’. Specifically, we first build LNSIM, a large-scale natural scene image memorability database containing 2,632 images with memorability annotations. Then, we mine our database to investigate how low-, middle- and high-level handcrafted features affect the memorability of natural scenes. In particular, we find that the high-level feature of scene category is fairly strongly correlated with natural scene memorability. We therefore propose a deep neural network based natural scene memorability (DeepNSM) predictor, which takes advantage of scene category. Finally, the experimental results validate the effectiveness of DeepNSM.
Adaptive Structural Learning of Deep Belief Network for Medical Examination Data and Its Knowledge Extraction by using C4.5
Deep learning uses a hierarchical network architecture to represent complicated features of input patterns. An adaptive structural learning method for the Deep Belief Network (DBN) has been developed. The method can discover the optimal number of hidden neurons in a Restricted Boltzmann Machine (RBM) for given input data by a neuron generation-annihilation algorithm, and can generate new hidden layers in the DBN by an extension of that algorithm. In this paper, the proposed adaptive structural learning of DBN was applied to comprehensive medical examination data for cancer prediction. The prediction system shows higher classification accuracy (99.8% for training and 95.5% for test) than the traditional DBN. Moreover, explicit knowledge about the relation between input and output patterns was extracted from the trained DBN by C4.5. This paper reports some of the extracted characteristics, in the form of IF-THEN rules, for detecting cancer at an early stage.
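C4.5 chooses each split by the gain ratio: the information gain of an attribute divided by the entropy of the attribute's own value distribution (the split information), which penalizes attributes with many distinct values. A minimal sketch of that criterion follows; the attribute values and the ‘cancer’/‘healthy’ labels are hypothetical, and a full C4.5 (continuous thresholds, pruning, rule post-processing) is well beyond this snippet.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(attribute, labels):
    """C4.5 split criterion: information gain / split information."""
    n = len(labels)
    groups = {}
    for value, label in zip(attribute, labels):
        groups.setdefault(value, []).append(label)
    # Expected label entropy after splitting on the attribute.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(attribute)
    return gain / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    labels = ["cancer", "cancer", "healthy", "healthy"]
    print(gain_ratio(["high", "high", "low", "low"], labels))  # perfectly informative
    print(gain_ratio(["high", "low", "high", "low"], labels))  # uninformative
```

Each root-to-leaf path of a tree grown with this criterion reads directly as an IF-THEN rule, which is why C4.5 suits the kind of rule extraction the paper describes.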
Dynamical systems theory for causal inference with application to synthetic control methods
To estimate treatment effects in panel data, suitable control units need to be selected to generate counterfactual outcomes. To guard against cherry-picking of potential controls, an important concern in practice, we leverage results from dynamical systems theory. Specifically, key results on delay embeddings in dynamical systems (Takens, 1981) show that, under fairly general assumptions, a dynamical system can be reconstructed up to a one-to-one mapping from scalar observations of the system. This suggests a quantitative measure of the strength of the dynamical relationship between any two time series. The key idea in this paper is to use this measure to ensure that selected control units are dynamically related to treated units, and thus guard against cherry-picking of controls. We illustrate our approach on the synthetic control methodology of Abadie (2003), which generates counterfactuals using a model of treated unit outcomes fitted on outcomes from control units. In this setting, we propose to screen out control units that have a weak dynamical relationship to the single treated unit before the model is fit. In simulated studies, we show that the standard synthetic control methodology can be biased in any desired direction by adversarially creating artificial control units, but that the bias is largely mitigated if we apply the aforementioned screening. In real-world applications, the proposed approach contributes to more reliable control selection, and thus more robust estimation of treatment effects.
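To make the delay-embedding idea concrete, the sketch below builds Takens-style delay vectors and scores how well the reconstructed manifold of one series predicts another, in the spirit of convergent cross mapping. The abstract does not name the exact measure, so using cross-map skill as that measure is our assumption; `dim`, `tau`, the nearest-neighbour weighting and the function names are illustrative choices, not the paper's.

```python
import math


def delay_embed(series, dim, tau):
    """Delay vectors (x_t, x_{t-tau}, ..., x_{t-(dim-1)tau}) for every valid t."""
    start = (dim - 1) * tau
    return [tuple(series[t - k * tau] for k in range(dim)) for t in range(start, len(series))]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def cross_map_skill(x, y, dim=2, tau=1):
    """How well the delay embedding of x recovers y (simplex-style neighbour averaging)."""
    manifold = delay_embed(x, dim, tau)
    targets = y[(dim - 1) * tau:]
    preds = []
    for i, point in enumerate(manifold):
        # dim + 1 nearest neighbours on the manifold, excluding the point itself.
        nearest = sorted(
            (math.dist(point, other), j) for j, other in enumerate(manifold) if j != i
        )[: dim + 1]
        d1 = max(nearest[0][0], 1e-12)
        weights = [math.exp(-d / d1) for d, _ in nearest]
        preds.append(sum(w * targets[j] for w, (_, j) in zip(weights, nearest)) / sum(weights))
    return pearson(preds, targets)


if __name__ == "__main__":
    x = [math.sin(0.3 * t) for t in range(200)]
    print(cross_map_skill(x, x))
```

Under this reading, a control unit whose series yields low cross-map skill against the treated unit would be the kind of weakly related control the proposed screening removes.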
A new Taxonomy of Continuous Global Optimization Algorithms
Surrogate-based optimization and nature-inspired metaheuristics have become the state of the art in solving real-world optimization problems. Still, it is difficult for beginners and even experts to get an overview that explains their advantages in comparison to the large number of available methods in continuous optimization. Available taxonomies lack the integration of surrogate-based approaches and thus their embedding in the larger context of this broad field. This article presents a taxonomy of the field that also matches the idea of nature-inspired algorithms, as it is based on human path-finding behavior. Intuitive analogies make it easy to grasp the most basic principles of the search algorithms, even for beginners and non-experts in this area of research. However, the scheme does not oversimplify the high complexity of the different algorithms, as the class identifier only defines a descriptive meta-level of the algorithm search strategies. The taxonomy was established by exploring and matching algorithm schemes, extracting similarities and differences, and creating a set of classification indicators to distinguish between five distinct classes. In practice, this taxonomy allows recommendations on the applicability of the corresponding algorithms and helps developers who are trying to create or improve their own algorithms.
A Study of Reinforcement Learning for Neural Machine Translation
Recent studies have shown that reinforcement learning (RL) is an effective approach for improving the performance of neural machine translation (NMT) systems. However, due to its instability, successful RL training is challenging, especially in real-world systems where deep models and large datasets are used. In this paper, taking several large-scale translation tasks as testbeds, we conduct a systematic study on how to train better NMT models using reinforcement learning. We provide a comprehensive comparison of several important factors (e.g., baseline reward, reward shaping) in RL training. Furthermore, since it remains unclear whether RL is still beneficial when monolingual data is used, we propose a new method that leverages RL to further boost the performance of NMT systems trained with source/target monolingual data. By integrating all our findings, we obtain competitive results on the WMT14 English-German, WMT17 English-Chinese, and WMT17 Chinese-English translation tasks, and set a new state of the art on the WMT17 Chinese-English task.
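The role of a baseline reward can be seen in a toy setting. The sketch below is a hypothetical one-step bandit, not an NMT system: three candidate ‘translations’ with fixed rewards (think of them as sentence-level BLEU), a softmax policy over them, and the REINFORCE update with a moving-average baseline, where the advantage r − b replaces the raw reward. Reward shaping is not modelled, and all names and hyperparameters are our own.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_with_baseline(rewards, steps=3000, lr=0.5, beta=0.9, seed=0):
    """REINFORCE on a one-step bandit with an exponential-moving-average baseline."""
    rng = random.Random(seed)
    logits = [0.0] * len(rewards)
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(rewards)), weights=probs)[0]
        reward = rewards[action]
        advantage = reward - baseline  # centring lowers gradient variance
        for k in range(len(logits)):
            # d log softmax(action) / d logit_k = 1[k == action] - p_k
            grad_log_prob = (1.0 if k == action else 0.0) - probs[k]
            logits[k] += lr * advantage * grad_log_prob
        baseline = beta * baseline + (1 - beta) * reward
    return softmax(logits)


if __name__ == "__main__":
    print(reinforce_with_baseline([0.2, 0.5, 0.9]))
```

Without the baseline every sampled candidate would have its probability pushed up (all rewards are positive); with it, only candidates that beat the running average are reinforced, which is one reason baseline choice matters for RL stability.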
Piecewise Linear Approximation in Data Streaming: Algorithmic Implementations and Experimental Analysis
Piecewise Linear Approximation (PLA) is a well-established tool for reducing the size of time series representations: it approximates a series by a sequence of line segments while keeping the approximation error within a predetermined threshold. With the recent rise of edge computing, PLA algorithms have found a completely new set of applications, with an emphasis on reducing the volume of streamed data. In this study, we identify two scenarios set in a data-stream processing context: data reduction in sensor transmissions and datacenter storage. For these scenarios, we identify several streaming metrics and propose streaming protocols as algorithmic implementations of state-of-the-art PLA techniques. In an experimental evaluation, we measure the quality of the reviewed methods and protocols and evaluate their performance against those streaming statistics. All known methods have deficiencies when handling streaming-like data, e.g., inflation of the input stream, high latency, or poor average error. Our experimental results highlight the challenges raised when transferring these classical methods into the stream processing world and present alternative techniques to overcome them and balance the related trade-offs.
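For reference, a classical PLA of the kind such protocols start from can be sketched in a few lines: a greedy sliding-window approximation that grows each segment until some point deviates from the line joining the segment's endpoints by more than the threshold. This is one textbook variant chosen for illustration, not one of the paper's protocols; the function names are hypothetical.

```python
def _fits(values, i, j, max_error):
    """True if every point strictly between i and j lies within max_error of the chord i-j."""
    slope = (values[j] - values[i]) / (j - i)
    return all(
        abs(values[i] + slope * (t - i) - values[t]) <= max_error for t in range(i + 1, j)
    )


def sliding_window_pla(values, max_error):
    """Greedy sliding-window PLA; returns connected segments as (start, end) index pairs."""
    n = len(values)
    if n < 2:
        return [(0, 0)] if n == 1 else []
    segments, start = [], 0
    while start < n - 1:
        end = start + 1
        while end + 1 < n and _fits(values, start, end + 1, max_error):
            end += 1
        segments.append((start, end))
        start = end  # consecutive segments share an endpoint
    return segments


def reconstruct(values, segments):
    """Rebuild the approximated series by linear interpolation between segment endpoints."""
    approx = [None] * len(values)
    for i, j in segments:
        if i == j:
            approx[i] = values[i]
            continue
        slope = (values[j] - values[i]) / (j - i)
        for t in range(i, j + 1):
            approx[t] = values[i] + slope * (t - i)
    return approx


if __name__ == "__main__":
    series = [0, 1, 2, 3, 2, 1, 0]
    segs = sliding_window_pla(series, max_error=0.1)
    print(segs, reconstruct(series, segs))
```

Because a segment is only emitted once the next point breaks it, a streaming version must buffer the current segment, so output lags input by up to a full segment; this is the kind of latency trade-off the study measures.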
Extracting Sentiment Attitudes From Analytical Texts
In this paper we present the RuSentRel corpus of analytical texts in the sphere of international relations. For each document, we annotated sentiments from the author towards mentioned named entities, as well as sentiments of relations between mentioned entities. In the current experiments, we treated the extraction of sentiment relations between entities for whole documents as a three-class machine learning task. We experimented with conventional machine-learning methods (Naive Bayes, SVM, Random Forest).
• Nowcasting the Stance of Social Media Users in a Sudden Vote: The Case of the Greek Referendum
• Scale Drift Correction of Camera Geo-Localization using Geo-Tagged Images
• Rain Streak Removal for Single Image via Kernel Guided CNN
• Whitney’s Theorem, Triangular Sets and Probabilistic Descent on Manifolds
• Doubly Robust Sure Screening for Elliptical Copula Regression Model
• Hamilton cycles in vertex-transitive graphs of order a product of two primes
• On the joint distribution of the marginals of multipartite random quantum states
• Convolutional Neural Networks for Aerial Vehicle Detection and Recognition
• Analyzing Learned Representations of a Deep ASR Performance Prediction Model
• Malliavin regularity and weak approximation of semilinear SPDE with Lévy noise
• Title-Guided Encoding for Keyphrase Generation
• Automatic 3D bi-ventricular segmentation of cardiac images by a shape-constrained multi-task deep learning approach
• Vector Approximate Message Passing Algorithm for Structured Perturbed Sensing Matrix
• Hypercoercivity of Piecewise Deterministic Markov Process-Monte Carlo
• Asymptotically good edge correspondence colouring
• CGIntrinsics: Better Intrinsic Image Decomposition through Physically-Based Rendering
• Label and Sample: Efficient Training of Vehicle Object Detector from Sparsely Labeled Data
• A Perspective on Unique Information: Directionality, Intuitions, and Secret Key Agreement
• Adversarially Regularising Neural NLI Models to Integrate Logical Background Knowledge
• Single Image Dehazing Based on Generic Regularity
• Ensemble Learning Applied to Classify GPS Trajectories of Birds into Male or Female
• Online Human Activity Recognition using Low-Power Wearable Devices
• Autonomous Driving without a Burden: View from Outside with Elevated LiDAR
• Discriminative but Not Discriminatory: A Comparison of Fairness Definitions under Different Worldviews
• Semi-Supervised Event Extraction with Paraphrase Clusters
• Bayesian inference for a single factor copula stochastic volatility model using Hamiltonian Monte Carlo
• Identifying Domain Adjacent Instances for Semantic Parsers
• Multi-Level Network Embedding with Boosted Low-Rank Matrix Approximation
• Secrecy Performance Analysis of UAV Transmissions Subject to Eavesdropping and Jamming
• Rule Module Inheritance with Modification Restrictions
• Strong and Weak Optimizations in Classical and Quantum Models of Stochastic Processes
• Scientific Relation Extraction with Selectively Incorporated Concept Embeddings
• The Disparate Effects of Strategic Classification
• Localized solar power prediction based on weather data from local history and global forecasts
• Fast Super-resolution 3D SAR Imaging Using an Unfolded Deep Network
• Novel Time Asynchronous NOMA schemes for Downlink Transmissions
• Fast and Accurate Recognition of Chinese Clinical Named Entities with Residual Dilated Convolutions
• Approach for Video Classification with Multi-label on YouTube-8M Dataset
• IIIDYT at IEST 2018: Implicit Emotion Classification With Deep Contextualized Word Representations
• Exploring the Applications of Faster R-CNN and Single-Shot Multi-box Detection in a Smart Nursery Domain
• Tableau Correspondences and Representation Theory
• Regression Adjustments for Estimating the Global Treatment Effect in Experiments with Interference
• HMS-Net: Hierarchical Multi-scale Sparsity-invariant Network for Sparse Depth Completion
• Empirical Analysis of Common Subgraph Isomorphism Approaches to the Lost-in-Space Star Identification Problem
• Deeply Supervised Depth Map Super-Resolution as Novel View Synthesis
• Stereo Computation for a Single Mixture Image
• Explicit 3-colorings for exponential graphs
• Generalized Capsule Networks with Trainable Routing Procedure
• Augmenting Bottleneck Features of Deep Neural Network Employing Motor State for Speech Recognition at Humanoid Robots
• Generating Text through Adversarial Training using Skip-Thought Vectors
• Is the Sibuya distribution a progeny?
• Bisplit graphs satisfy the Chen-Chvátal conjecture
• Harnack Inequality and Applications for SDEs Driven by $G$-Brownian motion
• Wide Activation for Efficient and Accurate Image Super-Resolution
• On determination of Zero-sum $\ell$-generalized Schur Numbers for some linear equations
• simNet: Stepwise Image-Topic Merging Network for Generating Detailed and Comprehensive Image Captions
• Stars of Empty Simplices
• Human migration patterns in large scale spatial with the resume data
• Comparing Attention-based Convolutional and Recurrent Neural Networks: Success and Limitations in Machine Reading Comprehension
• Automorphisms of Kronrod-Reeb graphs of Morse functions on compact surfaces
• Hadamard full propelinear codes with associated group $C_{2t}\times C_2$; rank and kernel
• Generalisation in humans and deep neural networks
• Learning from Positive and Unlabeled Data under the Selected At Random Assumption
• Natural Language Inference with Hierarchical BiLSTM Max Pooling Architecture
• On the convergence of optimistic policy iteration for stochastic shortest path problem
• Intrinsic wavelet regression for surfaces of Hermitian positive definite matrices
• Identifiability of Low-Rank Sparse Component Analysis
• Learning behavioral context recognition with multi-stream temporal convolutional networks
• Solving Partition Problems Almost Always Requires Pushing Many Vertices Around
• Learning Multilingual Word Embeddings in a Latent Metric Space: A Geometric Approach
• Field Formulation of Parzen Data Analysis
• Deep Stochastic Attraction and Repulsion Embedding for Image Based Localization
• Improving Cross-Lingual Word Embeddings by Meeting in the Middle
• Amobee at IEST 2018: Transfer Learning from Language Models
• Sparsity in Deep Neural Networks – An Empirical Investigation with TensorQuant
• Transparent Tx and Rx Waveform Processing for 5G New Radio Mobile Communications
• A Directed Information Learning Framework for Event-Driven M2M Traffic Prediction
• SPULTRA: Low-Dose CT Image Reconstruction with Joint Statistical and Learned Image Models
• Empirical likelihood for linear models with spatial errors
• An Auto-Encoder Matching Model for Learning Utterance-Level Semantic Dependency in Dialogue Generation
• Theoretical Foundations of the A2RD Project: Part I
• The Martin Gardner Polytopes
• Beyond expectation: Deep joint mean and quantile regression for spatio-temporal problems
• Discriminative Representation Combinations for Accurate Face Spoofing Detection
• Attentive Sequence to Sequence Translation for Localizing Clips of Interest by Natural Language Descriptions
• Exponential inequalities for nonstationary Markov Chains
• The Complexity of Student-Project-Resource Matching-Allocation Problems
• A Monotone Preservation Result for Boolean Queries Expressed as a Containment of Conjunctive Queries
• Gradient-based Training of Slow Feature Analysis by Differentiable Approximate Whitening
• Real-Time MDNet
• A strong baseline for question relevancy ranking
• Analysis of temporal properties of wind extremes
• WiSeBE: Window-based Sentence Boundary Evaluation
• Multi-operator spectrum sharing using matching game in small cells network
• Binary additive MRD codes with minimum distance n-1 must contain a semifield spread set
• Central limit theorems for non-symmetric random walks on nilpotent covering graphs: Part II
• Summarizing Opinions: Aspect Extraction Meets Sentiment Prediction and They Are Both Weakly Supervised
• Accelerating Asynchronous Stochastic Gradient Descent for Neural Machine Translation
• Facial Information Recovery from Heavily Damaged Images using Generative Adversarial Network- PART 1
• An exactly solvable record model for rainfall
• BézierGAN: Automatic Generation of Smooth Curves from Interpretable Low-Dimensional Parameters
• A note on palindromic length of Sturmian sequences
• Improved Breast Mass Segmentation in Mammograms with Conditional Residual U-net
• Realizing quantum linear regression with auxiliary qumodes
• Which Emoji Talks Best for My Picture?
• Random generation under the Ewens distribution
• Efficient Data Ingestion and Query Processing for LSM-Based Storage Systems
• Phase transition for the interchange and quantum Heisenberg models on the Hamming graph
• Fair redistricting is hard
• Statistics on Multisets
• Communication-Rounds Tradeoffs for Common Randomness and Secret Key Generation
• Efficient size estimation and impossibility of termination in uniform dense population protocols
• Deep Learning for Stress Field Prediction Using Convolutional Neural Networks
• Turning Cliques into Paths to Achieve Planarity
• Opportunistic Treating Interference as Noise
• Smoothed Dilated Convolutions for Improved Dense Prediction
• Unsupervised Multilingual Word Embeddings
• Locality of the critical probability for transitive graphs of exponential growth
• Improving Information Extraction from Images with Learned Semantic Models
• Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures
• Dissecting Contextual Word Embeddings: Architecture and Representation