What's new on arXiv

Rethinking Numerical Representations for Deep Neural Networks

With ever-increasing computational demand for deep learning, it is critical to investigate the implications of the numeric representation and precision of DNN model weights and activations on computational efficiency. In this work, we explore unconventional narrow-precision floating-point representations as they relate to inference accuracy and efficiency, to steer the improved design of future DNN platforms. We show that inference using these custom numeric representations on production-grade DNNs, including GoogLeNet and VGG, achieves an average speedup of 7.6x with less than 1% degradation in inference accuracy relative to a state-of-the-art baseline platform representing the most sophisticated hardware using single-precision floating point. To facilitate the use of such customized precision, we also present a novel technique that drastically reduces the time required to derive the optimal precision configuration.
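A narrow-precision representation like those the abstract describes can be emulated in software by rounding float32 values to a reduced mantissa width. The sketch below is illustrative only — it keeps float32's sign and exponent range and is not the paper's exact hardware format:

```python
import numpy as np

def quantize_float(x, mantissa_bits):
    """Round values to a reduced-mantissa float, emulating a
    narrow-precision representation (sign/exponent kept as in float32)."""
    x = np.asarray(x, dtype=np.float32)
    ax = np.abs(x)
    # Exponent of each value (zeros handled separately to avoid log2(0)).
    exp = np.floor(np.log2(np.where(ax > 0, ax, 1.0)))
    # Step size so the mantissa fits in `mantissa_bits`; round to that grid.
    scale = 2.0 ** (exp - mantissa_bits)
    return (np.round(x / scale) * scale).astype(np.float32)

w = np.array([0.1234567, -1.9876543, 3.1415927], dtype=np.float32)
print(quantize_float(w, 8))  # weights rounded to ~8 mantissa bits
```

Sweeping `mantissa_bits` over a validation set is one simple way to trade accuracy against the (simulated) precision budget.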

Efficient and Effective $L_0$ Feature Selection

Because of continuous advances in mathematical programming, Mixed Integer Optimization has become competitive vis-à-vis popular regularization methods for selecting features in regression problems. The approach exhibits unquestionable foundational appeal and versatility, but also poses important challenges. We tackle these challenges, reducing computational burden when tuning the sparsity bound (a parameter which is critical for effectiveness) and improving performance in the presence of feature collinearity and of signals that vary in nature and strength. Importantly, we render the approach efficient and effective in applications of realistic size and complexity – without resorting to relaxations or heuristics in the optimization, or abandoning rigorous cross-validation tuning. Computational viability and improved performance in subtler scenarios is achieved with a multi-pronged blueprint, leveraging characteristics of the Mixed Integer Programming framework and by means of whitening, a data pre-processing step.
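Whitening, the pre-processing step the abstract mentions, decorrelates the features so that collinearity no longer destabilizes the selection. A minimal ZCA-style sketch (the paper's exact blueprint may differ):

```python
import numpy as np

def whiten(X, eps=1e-8):
    """ZCA-style whitening: after the transform, the sample
    covariance of the features is (approximately) the identity."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(X) - 1)
    eigval, eigvec = np.linalg.eigh(cov)
    W = eigvec @ np.diag(1.0 / np.sqrt(eigval + eps)) @ eigvec.T
    return Xc @ W

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=200)  # nearly collinear feature
Xw = whiten(X)
print(np.round(np.cov(Xw, rowvar=False), 2))    # close to the identity
```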

Image Anomalies: a Review and Synthesis of Detection Methods

We review the broad variety of methods that have been proposed for anomaly detection in images. Most methods found in the literature have in mind a particular application. Yet we show that the methods can be classified mainly by the structural assumption they make on the ‘normal’ image. Five different structural assumptions emerge. Our analysis leads us to reformulate the best representative algorithms by attaching to them an a contrario detection that controls the number of false positives and thus derive universal detection thresholds. By combining the most general structural assumptions expressing the background’s normality with the best proposed statistical detection tools, we end up proposing generic algorithms that seem to generalize or reconcile most methods. We compare the six best representatives of our proposed classes of algorithms on anomalous images taken from classic papers on the subject, and on a synthetic database. Our conclusion is that it is possible to perform automatic anomaly detection on a single image.

Parallax: Automatic Data-Parallel Training of Deep Neural Networks

The employment of high-performance servers and GPU accelerators for training deep neural network models has greatly accelerated recent advances in machine learning (ML). ML frameworks, such as TensorFlow, MXNet, and Caffe2, have emerged to assist ML researchers in training their models in a distributed fashion. However, correctly and efficiently utilizing multiple machines and GPUs is still not a straightforward task for framework users due to the non-trivial correctness and performance challenges that arise in the distribution process. This paper introduces Parallax, a tool for automatic parallelization of deep learning training in distributed environments. Parallax not only handles the subtle correctness issues, but also leverages various optimizations to minimize the communication overhead caused by scaling out. Experiments show that Parallax built atop TensorFlow achieves scalable training throughput on multiple CNN and RNN models, while requiring little effort from its users.

Debugging Neural Machine Translations

Can Network Analysis Techniques help to Predict Design Dependencies? An Initial Study

The degree of dependencies among the modules of a software system is a key attribute to characterize its design structure and its ability to evolve over time. Several design problems are often correlated with undesired dependencies among modules. Being able to anticipate those problems is important for developers, so they can plan early for maintenance and refactoring efforts. However, existing tools are limited to detecting undesired dependencies once they have appeared in the system. In this work, we investigate whether module dependencies can be predicted (before they actually appear). Since the module structure can be regarded as a network, i.e., a dependency graph, we leverage network features to analyze the dynamics of such a structure. In particular, we apply link prediction techniques for this task. We conducted an evaluation on two Java projects across several versions, using link prediction and machine learning techniques, and assessed their performance for identifying new dependencies from a project version to the next one. The results, although preliminary, show that the link prediction approach is feasible for package dependencies. Also, this work opens opportunities for further development of software-specific strategies for dependency prediction.
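Treating the module structure as a dependency graph, one classic link-prediction score ranks absent edges by neighborhood overlap. A minimal sketch on a hypothetical package graph (the study combines such graph features with machine learning):

```python
from itertools import combinations

def jaccard_scores(adj):
    """Score each absent edge by the Jaccard coefficient of the two
    modules' neighbor sets; higher scores suggest a likely future dependency."""
    scores = {}
    for u, v in combinations(adj, 2):
        if v in adj[u]:
            continue  # dependency already exists
        union = adj[u] | adj[v]
        if union:
            scores[(u, v)] = len(adj[u] & adj[v]) / len(union)
    return scores

# Hypothetical (symmetric) package dependency graph.
deps = {
    "core": {"util", "io"},
    "util": {"core", "io", "gui"},
    "io":   {"core", "util", "net"},
    "net":  {"io"},
    "gui":  {"util"},
}
scores = jaccard_scores(deps)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[0], scores[ranked[0]])  # most plausible new dependency
```

In the paper's setting such scores become features, and the label for each candidate edge is whether the dependency appears in the next project version.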

• A simple analysis of flying capacitor converter
• BayesReef: A Bayesian inference framework for modelling reef growth in response to environmental change and biological dynamics
• Improved survival of cancer patients admitted to the ICU between 2002 and 2011 at a U.S. teaching hospital
• Morphology of renormalization-group flow for the de Almeida-Thouless-Gardner universality class
• A generalized scheme for BSDEs based on derivative approximation and its error estimates
• A Class of Multirate Infinitesimal GARK Methods
• Effective Character-augmented Word Embedding for Machine Reading Comprehension
• eQASM: An Executable Quantum Instruction Set Architecture
• Device-directed Utterance Detection
• Message Passing Graph Kernels
• Bipartite induced density in triangle-free graphs
• Quantum Lyapunov control with machine learning
• Width-Independence Beyond Linear Objectives: Distributed Fair Packing and Covering Algorithms
• Detection and Segmentation of Manufacturing Defects with Convolutional Neural Networks and Transfer Learning
• Spectral Efficiency Analysis of the Decoupled Access for Downlink and Uplink in Two Tier Network
• Student Log-Data from a Randomized Evaluation of Educational Technology: A Causal Case Study
• Circular critical exponents for Thue-Morse factors
• Randomized sketch descent methods for non-separable linearly constrained optimization
• SchiNet: Automatic Estimation of Symptoms of Schizophrenia from Facial Behaviour Analysis
• Dynamic Temporal Pyramid Network: A Closer Look at Multi-Scale Modeling for Activity Detection
• Light-stimulable molecules/nanoparticles networks for switchable logical functions and reservoir computing
• Efficient Multi-Robot Coverage of a Known Environment
• Free energy of bipartite Sherrington-Kirkpatrick model
• A Randomized Block Proximal Variable Sample-size Stochastic Gradient Method for Composite Nonconvex Stochastic Optimization
• Persistent Monitoring of Dynamically Changing Environments Using an Unmanned Vehicle
• Parallel and Streaming Algorithms for K-Core Decomposition
• Collaborative Planning for Mixed-Autonomy Lane Merging
• On the Dimension of Unimodular Discrete Spaces, Part II: Relations with Growth Rate
• Multi-robot Dubins Coverage with Autonomous Surface Vehicles
• Deep context: end-to-end contextual speech recognition
• A Joint Sequence Fusion Model for Video Question Answering and Retrieval
• Belief likelihood function for generalised logistic regression
• Description of closure operators in convex geometries of segments on a line
• Design Challenges in Named Entity Transliteration
• Good $r$-divisions Imply Optimal Amortised Decremental Biconnectivity
• Machine Learning for Dynamic Models of Imperfect Information and Semiparametric Moment Inequalities
• Outage Probability of the EH-based Full-Duplex AF and DF Relaying Systems in α-μ Environment
• Vertex-isoperimetric stability in the hypercube
• The commuting complex of the symmetric group with bounded number of $p$-cycles
• A Centralized Metropolitan-Scale Radio Resource Management Scheme
• Reachability Analysis Using Dissipation Inequalities For Nonlinear Dynamical Systems
• Adversarial Domain Adaptation for Variational Neural Language Generation in Dialogue Systems
• The existence of square non-integer Heffter arrays
• A Tutorial on Network Embeddings
• A practical Single Source Shortest Path algorithm for random directed graphs with arbitrary weight in expecting linear time
• The dynamical sine-Gordon model in the full subcritical regime
• A Semi-Supervised Data Augmentation Approach using 3D Graphical Engines
• PIVETed-Granite: Computational Phenotypes through Constrained Tensor Factorization
• Unsupervised/Semi-supervised Deep Learning for Low-dose CT Enhancement
• End-to-end Speech Recognition with Word-based RNN Language Models
• L-Shapley and C-Shapley: Efficient Model Interpretation for Structured Data
• Power domination in regular claw-free graphs
• Learning to Write Notes in Electronic Health Records
• Randomized box-ball systems, limit shape of rigged configurations and Thermodynamic Bethe ansatz
• Training Compact Neural Networks with Binary Weights and Low Precision Activations
• Question-Guided Hybrid Convolution for Visual Question Answering
• Courteous Autonomous Cars
• Reconciliation of probabilistic forecasts with an application to wind power
• Cognitive system to achieve human-level accuracy in automated assignment of helpdesk email tickets
• Social Community-Aware Content Placement in Wireless Device-to-Device Communication Networks
• Accelerating wave-propagation algorithms with adaptive mesh refinement using the Graphics Processing Unit (GPU)
• A Unified Framework for Testing High Dimensional Parameters: A Data-Adaptive Approach
• Adversarial Geometry and Lighting using a Differentiable Renderer
• Permutation patterns in genome rearrangement problems
• Connected $k$-factors in bipartite graphs
• Analogies of the Qi formula for some Dowling type numbers
• An Occam’s Razor View on Learning Audiovisual Emotion Recognition with Small Training Sets
• Testing heteroscedasticity for regression models based on projections
• Modified box dimension of trees and hierarchical scale-free graphs
• The scaling limit of the $(\nabla+Δ)$-model
• Age of Information Upon Decisions
• On a mixture of Brenier and Strassen theorems
• An Improved Bound for Weak Epsilon-Nets in the Plane
• On the Monitoring of Decentralized Specifications Semantics, Properties, Analysis, and Simulation
• The roll call interpretation of the Shapley value
• New lower bounds on the size of arcs and new optimal projective linear codes
• A Method for Estimating the Probability of Extremely Rare Accidents in Complex Systems
• Memetic Algorithm-Based Path Generation for Multiple Dubins Vehicles Performing Remote Tasks
• Asymptotics of maximum likelihood estimators based on Markov chain Monte Carlo methods
• Learning to Focus when Ranking Answers
• Limiting properties of random graph models with vertex and edge weights
• On the convergence of closed-loop Nash equilibria to the mean field game limit
• Natural Language Generation by Hierarchical Decoding with Linguistic Patterns
• On lexicographic representatives in braid monoids
• Omnidirectional DSO: Direct Sparse Odometry with Fisheye Cameras
• Cache Aided Communications with Multiple Antennas at Finite SNR
• Schools are segregated by educational outcomes in the digital space
• On the Number of Acyclic Orientations of Complete $k$-Partite Graphs
• Multiband SAS Imagery
• Weighted models for level statistics across the many-body localization transition
• Steiner Point Removal with distortion $O(\log k)$, using the Noisy-Voronoi algorithm
• Extremal Norms for Fiber Bunched Cocycles
• Highly Accelerated Multishot EPI through Synergistic Combination of Machine Learning and Joint Reconstruction
• Separators for Planar Graphs that are Almost Trees
• A Kernel Method for Positive 1-in-3-SAT
• Backprop Evolution
• Joint Frequency Reuse and Cache Optimization in Backhaul-Limited Small-Cell Wireless Networks
• FLUX: Progressive State Estimation Based on Zakai-type Distributed Ordinary Differential Equations
• Debunking Fake News One Feature at a Time
• A Novel Disparity Transformation Algorithm for Road Segmentation
• On the Effect of Task-to-Worker Assignment in Distributed Computing Systems with Stragglers
• Pattern Recognition Approach to Violin Shapes of MIMO database
• Relaxing and Restraining Queries for OBDA
• Exotic matrix models: the Albert algebra and the spin factor
• On the Solvability of Viewing Graphs
• Hard to Solve Instances of the Euclidean Traveling Salesman Problem
• Choose Your Neuron: Incorporating Domain Knowledge through Neuron-Importance
• Additional Representations for Improving Synthetic Aperture Sonar Classification Using Convolutional Neural Networks
• Parkinson’s Disease Assessment from a Wrist-Worn Wearable Sensor in Free-Living Conditions: Deep Ensemble Learning and Visualization
• Random directions stochastic approximation with deterministic perturbations
• Visualizing Convolutional Networks for MRI-based Diagnosis of Alzheimer’s Disease
