A snapshot on nonstandard supervised learning problems: taxonomy, relationships and methods
Machine learning is a field which studies how machines can alter and adapt their behavior, improving their actions according to the information they are given. This field is subdivided into multiple areas, among which the best known are supervised learning (e.g. classification and regression) and unsupervised learning (e.g. clustering and association rules). Within supervised learning, most studies and research are focused on well-known standard tasks, such as binary classification, multiclass classification and regression with one dependent variable. However, there are many other less known problems. These are what we generically call nonstandard supervised learning problems. The literature about them is much sparser, and each study addresses a specific task. Therefore, the definitions, relations and applications of these kinds of learners are hard to find. The goal of this paper is to provide the reader with a broad view of the distinct variations of nonstandard supervised problems. A comprehensive taxonomy summarizing their traits is proposed. A review of the common approaches followed to accomplish them and their main applications is provided as well.
Feature selection with optimal coordinate ascent (OCA)
In machine learning, Feature Selection (FS) is a major part of efficient algorithms. It fuels the algorithm and is the starting block for our prediction. In this paper, we present a new method, called Optimal Coordinate Ascent (OCA), that allows us to select features among block and individual features. OCA relies on coordinate ascent to find an optimal solution for the score of gradient boosting methods (the number of correctly classified samples). OCA takes into account dependencies between variables by grouping them into blocks in our optimization. The coordinate ascent optimization addresses the NP-hard original problem, where the number of combinations rapidly explodes and makes a grid search unfeasible. It considerably reduces the number of iterations, turning this NP-hard problem into a polynomial search. OCA brings substantial differences and improvements compared to the previous coordinate ascent feature selection method: we group variables into block and individual variables instead of making a binary selection. Our initial guess is based on the k-best group variables, making our initial point more robust. We also introduce new stopping criteria that make our optimization faster. We compare these two methods on our data set and find that our method outperforms the initial one. We also compare our method to Recursive Feature Elimination (RFE) and find that OCA leads to the minimum feature set with the highest score. This is a nice byproduct of our method, as it empirically provides the most compact data set with optimal performance.
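To make the search strategy concrete, here is a minimal Python sketch of coordinate ascent over feature "units", where each coordinate toggles either a whole block or a single feature. It is not the authors' OCA: the `score` callback (standing in for the gradient boosting score), the `init` set (standing in for the k-best initial guess) and the rule "stop after a full pass without improvement" are all assumptions for illustration.

```python
from typing import Callable, FrozenSet, List, Set, Tuple


def coordinate_ascent_select(
    units: List[FrozenSet[str]],
    score: Callable[[FrozenSet[str]], float],
    init: Set[int],
    max_passes: int = 10,
) -> Tuple[FrozenSet[str], float]:
    """Coordinate ascent over which units (blocks or single features) are kept.

    units: each unit is a frozenset of feature names; a block is toggled as a whole.
    score: evaluates a feature subset, e.g. the accuracy of a gradient boosting
           model (a hypothetical callback supplied by the caller).
    init:  indices of the units selected at the start, e.g. the k best units.
    """

    def features(sel: Set[int]) -> FrozenSet[str]:
        # Union of all features belonging to the selected units.
        return frozenset().union(*(units[i] for i in sel))

    selected = set(init)
    best = score(features(selected))
    for _ in range(max_passes):
        improved = False
        for i in range(len(units)):
            trial = selected ^ {i}  # flip one coordinate: add or drop unit i
            s = score(features(trial))
            if s > best:
                selected, best, improved = trial, s, True
        if not improved:  # stop after a full pass without improvement
            break
    return features(selected), best
```

Each pass costs one score evaluation per unit, which is where the polynomial cost comes from, compared with the 2^n subsets an exhaustive search over n units would visit; the trade-off is that coordinate ascent can stop at a local optimum.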
TEA-DNN: the Quest for Time-Energy-Accuracy Co-optimized Deep Neural Networks
Embedded deep learning platforms have witnessed two simultaneous improvements. First, the accuracy of convolutional neural networks (CNNs) has been significantly improved through the use of automated neural-architecture search (NAS) algorithms to determine CNN structure. Second, there has been increasing interest in developing application-specific platforms for CNNs that provide improved inference performance and energy consumption as compared to GPUs. Embedded deep learning platforms differ in the amount of compute resources and memory-access bandwidth, which would affect performance and energy consumption of CNNs. It is therefore critical to consider the available hardware resources in the network architecture search. To this end, we introduce TEA-DNN, a NAS algorithm targeting multi-objective optimization of execution time, energy consumption, and classification accuracy of CNN workloads on embedded architectures. TEA-DNN leverages energy and execution time measurements on embedded hardware when exploring the Pareto-optimal curves across accuracy, execution time, and energy consumption and does not require additional effort to model the underlying hardware. We apply TEA-DNN for image classification on actual embedded platforms (NVIDIA Jetson TX2 and Intel Movidius Neural Compute Stick). We highlight the Pareto-optimal operating points that emphasize the necessity to explicitly consider hardware characteristics in the search process. To the best of our knowledge, this is the most comprehensive study of Pareto-optimal models across a range of hardware platforms using actual measurements on hardware to obtain objective values.
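TEA-DNN's output is a set of Pareto-optimal models. The filtering step can be sketched in Python as below, assuming each candidate network has already been measured on the target device; the `Candidate` fields, units and numbers are hypothetical, and the architecture search that proposes candidates is not shown.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    error: float      # 1 - accuracy; lower is better
    time_ms: float    # measured inference time; lower is better
    energy_mj: float  # measured energy per inference; lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    pairs = list(zip(a[1:], b[1:]))  # skip the name field
    return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands if o is not c)]
```

With three objectives, a model can stay on the front by being fastest or most energy-efficient even if it is not the most accurate, which is why the front is reported per hardware platform.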
Towards Human-Friendly Referring Expression Generation
This paper addresses the generation of referring expressions that not only refer to objects correctly but also ease human comprehension. As the composition of an image becomes more complicated and a target becomes relatively less salient, identifying referred objects becomes more difficult. However, existing studies regarded all sentences that refer to objects correctly as equally good, ignoring whether they are easily understood by humans. If the target is not salient, humans use relationships with the salient contexts around it to help listeners comprehend it better. To derive this information from human annotations, our model is designed to extract information from both inside and outside the target. Moreover, we regard sentences as easily understood if they are comprehended correctly and quickly by humans. We optimize for this using the time humans require to locate the referred objects and their accuracy. To evaluate our system, we created a new referring expression dataset whose images were acquired from Grand Theft Auto V (GTA V), limiting targets to persons. Our proposed method outperformed previous methods in both machine evaluation and crowd-sourced human evaluation. The source code and dataset will be available soon.
Bootstrapping Deep Neural Networks from Image Processing and Computer Vision Pipelines
Complex image processing and computer vision systems often consist of a ‘pipeline’ of ‘black boxes’ that each solve part of the problem. We intend to replace parts or all of a target pipeline with deep neural networks to achieve benefits such as increased accuracy or reduced computational requirements. To acquire the large amounts of labeled data necessary to train the deep neural network, we propose a workflow that leverages the target pipeline to create a significantly larger labeled training set automatically, without prior domain knowledge of the target pipeline. We show experimentally that despite the noise introduced by automated labeling and only using a very small initially labeled data set, the trained deep neural networks can achieve similar or even better performance than the components they replace, while in some cases also reducing computational requirements.
Non-entailed subsequences as a challenge for natural language inference
Neural network models have shown great success at natural language inference (NLI), the task of determining whether a premise entails a hypothesis. However, recent studies suggest that these models may rely on fallible heuristics rather than deep language understanding. We introduce a challenge set to test whether NLI systems adopt one such heuristic: assuming that a sentence entails all of its subsequences, such as assuming that ‘Alice believes Mary is lying’ entails ‘Alice believes Mary.’ We evaluate several competitive NLI models on this challenge set and find strong evidence that they do rely on the subsequence heuristic.
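The heuristic being probed can be written down in a few lines. This Python sketch assumes "subsequence" means a contiguous run of words and uses a naive regex tokenizer; both are simplifications, not the challenge set's actual construction.

```python
import re
from typing import List


def tokens(s: str) -> List[str]:
    """Lowercased word tokens; punctuation is ignored."""
    return re.findall(r"\w+", s.lower())


def is_contiguous_subsequence(premise: str, hypothesis: str) -> bool:
    """True if the hypothesis words appear as one contiguous run inside the premise."""
    p, h = tokens(premise), tokens(hypothesis)
    n = len(h)
    return n > 0 and any(p[i:i + n] == h for i in range(len(p) - n + 1))


def subsequence_heuristic(premise: str, hypothesis: str) -> str:
    """The fallible rule: predict entailment whenever the hypothesis is a subsequence."""
    return "entailment" if is_contiguous_subsequence(premise, hypothesis) else "non-entailment"
```

On the example from the abstract ("Alice believes Mary is lying." versus "Alice believes Mary.") this rule predicts entailment, which is exactly the mistake the challenge set is built to expose.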
A Machine-Learning Phase Classification Scheme for Anomaly Detection in Signals with Periodic Characteristics
In this paper we propose a novel machine-learning method for anomaly detection. Focusing on data with periodic characteristics where randomly varying period lengths are explicitly allowed, a multi-dimensional time series analysis is conducted by training a data-adapted classifier consisting of deep convolutional neural networks performing phase classification. The entire algorithm, including data pre-processing, period detection, segmentation, and even dynamic adjustment of the neural nets, is implemented for fully automatic execution. The proposed method is evaluated on three example datasets from the areas of cardiology, intrusion detection, and signal processing, and shows reasonable performance.
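The pre-processing stages named above (period detection, segmentation, phase labels) could look roughly like the following Python sketch. It assumes a one-dimensional signal with a fixed period found by autocorrelation, whereas the paper handles multi-dimensional series with randomly varying period lengths; `phase_labels` is a hypothetical way to generate targets for a phase classifier.

```python
import math
from typing import List, Optional, Sequence


def estimate_period(x: Sequence[float], min_lag: int = 2, max_lag: Optional[int] = None) -> int:
    """Return the lag with the highest normalized autocorrelation in [min_lag, max_lag]."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    energy = sum(v * v for v in d)
    if energy == 0:
        raise ValueError("a constant signal has no period")
    max_lag = max_lag if max_lag is not None else n // 2

    def acf(lag: int) -> float:
        return sum(d[i] * d[i + lag] for i in range(n - lag)) / energy

    return max(range(min_lag, max_lag + 1), key=acf)


def segment(x: Sequence[float], period: int) -> List[List[float]]:
    """Cut the signal into consecutive one-period windows, dropping any remainder."""
    return [list(x[i:i + period]) for i in range(0, len(x) - period + 1, period)]


def phase_labels(period: int, n_phases: int) -> List[int]:
    """Label each position within a period with one of n_phases equal-width phase classes."""
    return [j * n_phases // period for j in range(period)]


if __name__ == "__main__":
    signal = [math.sin(2 * math.pi * i / 20) for i in range(200)]
    p = estimate_period(signal)
    print(p, len(segment(signal, p)), phase_labels(p, 4)[:6])  # 20 10 [0, 0, 0, 0, 0, 1]
```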
Scaling up Probabilistic Inference in Linear and Non-Linear Hybrid Domains by Leveraging Knowledge Compilation
Weighted model integration (WMI) extends weighted model counting (WMC) by providing a computational abstraction for probabilistic inference in mixed discrete-continuous domains. WMC has emerged as an assembly language for state-of-the-art reasoning in Bayesian networks, factor graphs, probabilistic programs and probabilistic databases. In this regard, WMI shows immense promise for much wider applicability, especially as many real-world applications involve attribute and feature spaces that are continuous and mixed. Nonetheless, state-of-the-art tools for WMI are limited and less mature than their propositional counterparts. In this work, we propose a new implementation regime that leverages propositional knowledge compilation for scaling up inference. In particular, we use sentential decision diagrams, a tractable representation of Boolean functions, as the underlying model counting and model enumeration scheme. Our regime performs competitively with state-of-the-art WMI systems and is also shown, for the first time, to handle non-linear constraints over non-linear potentials.
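For reference, weighted model counting sums, over every satisfying assignment of a propositional formula, the product of the weights of its literals. The brute-force Python sketch below shows this definition on a CNF written with DIMACS-style integer literals; it enumerates all 2^n assignments, which is exactly the blow-up that compiling the formula into a tractable representation such as a sentential decision diagram is meant to avoid. WMI additionally integrates over continuous variables, which is not shown.

```python
from itertools import product
from typing import Dict, List


def wmc(cnf: List[List[int]], weights: Dict[int, float]) -> float:
    """Weighted model count of a CNF by exhaustive enumeration.

    cnf:     list of clauses; each clause is a list of nonzero ints (v or -v).
    weights: weight of each literal, given for both v and -v.
    """
    variables = sorted({abs(lit) for clause in cnf for lit in clause})
    total = 0.0
    for values in product([False, True], repeat=len(variables)):
        assign = dict(zip(variables, values))
        # A model satisfies every clause: at least one literal per clause is true.
        if all(any(assign[abs(lit)] == (lit > 0) for lit in clause) for clause in cnf):
            w = 1.0
            for v, val in assign.items():
                w *= weights[v if val else -v]
            total += w
    return total
```

For the single clause (x1 ∨ x2) with weights w(x1) = 0.3, w(¬x1) = 0.7, w(x2) = 0.6, w(¬x2) = 0.4, the count is 1 - 0.7 · 0.4 = 0.72, i.e. the probability of the clause when the weights are independent marginals.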
Chiller: Contention-centric Transaction Execution and Data Partitioning for Fast Networks
Distributed transactions on high-overhead TCP/IP-based networks were conventionally considered to be prohibitively expensive and thus were avoided at all costs. To that end, the primary goal of almost any existing partitioning scheme is to minimize the number of cross-partition transactions. However, with the next generation of fast RDMA-enabled networks, this assumption is no longer valid. In fact, recent work has shown that distributed databases can scale even when the majority of transactions are cross-partition. In this paper, we first make the case that the new bottleneck which hinders truly scalable transaction processing in modern RDMA-enabled databases is data contention, and that optimizing for data contention leads to different partitioning layouts than optimizing for the number of distributed transactions. We then present Chiller, a new approach to data partitioning and transaction execution, which minimizes data contention for both local and distributed transactions. Finally, we evaluate Chiller using TPC-C and a real-world workload, and show that our partitioning and execution strategy outperforms traditional partitioning techniques which try to avoid distributed transactions, by up to a factor of 2 under the same conditions.
Counterfactual Learning from Human Proofreading Feedback for Semantic Parsing
In semantic parsing for question-answering, it is often too expensive to collect gold parses or even gold answers as supervision signals. We propose to convert model outputs into a set of human-understandable statements which allow non-expert users to act as proofreaders, providing error markings as learning signals to the parser. Because model outputs were suggested by a historic system, we operate in a counterfactual, or off-policy, learning setup. We introduce new estimators which can effectively leverage the given feedback and which avoid known degeneracies in counterfactual learning, while still being applicable to stochastic gradient optimization for neural semantic parsing. Furthermore, we discuss how our feedback collection method can be seamlessly integrated into deployed virtual personal assistants that embed a semantic parser. Our work is the first to show that semantic parsers can be improved significantly by counterfactual learning from logged human feedback data.
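Counterfactual learning scores a new policy on outputs logged by a historic system. The standard inverse propensity scoring (IPS) estimator and its self-normalized variant are sketched below in Python as background; they are textbook estimators, not the new estimators introduced in the paper, and the argument names are illustrative.

```python
from typing import Sequence


def ips(rewards: Sequence[float], logging_probs: Sequence[float], target_probs: Sequence[float]) -> float:
    """Inverse propensity scoring estimate of the target policy's expected reward."""
    n = len(rewards)
    return sum(r * t / p for r, p, t in zip(rewards, logging_probs, target_probs)) / n


def snips(rewards: Sequence[float], logging_probs: Sequence[float], target_probs: Sequence[float]) -> float:
    """Self-normalized IPS: divide by the sum of importance weights instead of n."""
    w = [t / p for p, t in zip(logging_probs, target_probs)]
    return sum(wi * r for wi, r in zip(w, rewards)) / sum(w)
```

Self-normalization makes the estimate invariant to rescaling all target probabilities, which removes one known degeneracy of plain IPS: with nonnegative rewards, IPS can be increased simply by putting more probability on every logged output, good or bad.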
BCCNet: Bayesian classifier combination neural network
Machine learning research for developing countries can demonstrate clear sustainable impact by delivering actionable and timely information to in-country government organisations (GOs) and NGOs in response to their critical information requirements. We co-create products with UK and in-country commercial, GO and NGO partners to ensure the machine learning algorithms address appropriate user needs whether for tactical decision making or evidence-based policy decisions. In one particular case, we developed and deployed a novel algorithm, BCCNet, to quickly process large quantities of unstructured data to prevent and respond to natural disasters. Crowdsourcing provides an efficient mechanism to generate labels from unstructured data to prime machine learning algorithms for large scale data analysis. However, these labels are often imperfect with qualities varying among different citizen scientists, which prohibits their direct use with many state-of-the-art machine learning techniques. We describe BCCNet, a framework that simultaneously aggregates biased and contradictory labels from the crowd and trains an automatic classifier to process new data. Our case studies, mosquito sound detection for malaria prevention and damage detection for disaster response, show the efficacy of our method in the challenging context of developing world applications.
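The core of Bayesian classifier combination is combining a prior over the true class with each annotator's confusion matrix. The Python sketch below computes that posterior for a single item, assuming the confusion matrices are already known; in BCCNet they are learned jointly with the neural classifier (whose output would play the role of the prior), so this shows only the aggregation step under a simplifying assumption.

```python
from typing import Dict, List


def posterior_true_label(
    prior: List[float],
    confusions: Dict[str, List[List[float]]],
    labels: Dict[str, int],
) -> List[float]:
    """Posterior over the true class of one item given crowd labels.

    prior:      prior[k] = P(true class = k), e.g. a classifier's output.
    confusions: confusions[a][k][j] = P(annotator a says j | true class = k).
    labels:     the label each annotator gave to this item.
    """
    post = list(prior)
    for annotator, j in labels.items():
        # Multiply in the likelihood of this annotator's label under each true class.
        post = [post[k] * confusions[annotator][k][j] for k in range(len(post))]
    z = sum(post)
    return [p / z for p in post]
```

With a reliable annotator (90% correct) voting for class 1 and a noisy one (60% correct) voting for class 0, the posterior under a uniform prior is 6/7 for class 1: contradictory labels are resolved in favour of the annotator whose confusion matrix is more informative.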
Reliability Modeling, Analysis and Prediction of Wireless Mobile Communications
The future fifth-generation (5G) mobile cellular networks, which are currently in the research phase, will enable a broad range of services/applications beyond classical mobile communications. One key enabler for integrating ultra-reliable services into mobile networks is the reliability of the successful transmission of a given data packet. Achieving it is hard mainly owing to the time-dependent effective link qualities of the communicating devices. However, successfully indicating the availability of the instantaneous link quality (e.g., by the device) would allow opportunistic access to ultra-reliable services/applications when the link conditions are good enough. This paper introduces a framework for modeling, predicting and analyzing the theoretical reliability of the wireless link based on factors such as fading, mobility and interference. The analysis and prediction are based on the part stress method (Birolini 2010), treating time-dependent factors as elements/components with their respective Transmission Times To Failure (TTTF). The proposed framework also supports other reliability analysis techniques such as Fault Tree Analysis and Accelerated Testing. of wireless systems and to improve the components
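As a rough illustration of a part-stress style calculation, the Python sketch below scales a base failure rate by multiplicative stress factors and combines components in series. It assumes constant failure rates (exponentially distributed times to failure) for each factor such as fading, mobility or interference; the paper's actual TTTF models and factor values are not given in the abstract, so every number here is hypothetical.

```python
import math
from typing import Sequence


def component_failure_rate(base_rate: float, stress_factors: Sequence[float]) -> float:
    """Part-stress style: a base failure rate scaled by multiplicative stress factors."""
    rate = base_rate
    for f in stress_factors:
        rate *= f
    return rate


def series_reliability(rates: Sequence[float], t: float) -> float:
    """Reliability at time t of a series system with constant component failure rates.

    The link fails as soon as any component fails, so R(t) = exp(-(sum of rates) * t).
    """
    return math.exp(-sum(rates) * t)
```

For example, hypothetical rates of 0.02, 0.005 and 0.005 failures per unit time give a series rate of 0.03 and a reliability of exp(-0.3), about 0.74, over 10 time units.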
On the Transferability of Representations in Neural Networks Between Datasets and Tasks
Deep networks, composed of multiple layers of hierarchical distributed representations, tend to learn low-level features in initial layers and transition to high-level features towards final layers. Paradigms such as transfer learning, multi-task learning, and continual learning leverage this notion of generic hierarchical distributed representations to share knowledge across datasets and tasks. Herein, we study the layer-wise transferability of representations in deep networks across a few datasets and tasks and note some interesting empirical observations.
Bayesian Adversarial Spheres: Bayesian Inference and Adversarial Examples in a Noiseless Setting
Modern deep neural network models suffer from adversarial examples, i.e. confidently misclassified points in the input space. It has been shown that Bayesian neural networks are a promising approach for detecting adversarial points, but careful analysis is problematic due to the complexity of these models. Recently Gilmer et al. (2018) introduced adversarial spheres, a toy set-up that simplifies both practical and theoretical analysis of the problem. In this work, we use the adversarial sphere set-up to understand the properties of approximate Bayesian inference methods for a linear model in a noiseless setting. We compare predictions of Bayesian and non-Bayesian methods, showcasing the advantages of the former, although revealing open challenges for deep learning applications.
Robust Bayesian Cluster Enumeration
Linux-Tomcat Application Performance on Amazon AWS
The need for Linux system administrators to do performance management has returned with a vengeance. Why? The cloud. Resource consumption in the cloud is all about pay-as-you-go. This article shows you how performance models can find the most cost-effective deployment of an application on Amazon’s cloud.
Graph Multiview Canonical Correlation Analysis
Multiview canonical correlation analysis (MCCA) seeks latent low-dimensional representations encountered with multiview data of shared entities (a.k.a. common sources). However, existing MCCA approaches do not exploit the geometry of the common sources, which may be available a priori or can be constructed using certain domain knowledge. This prior information about the common sources can be encoded by a graph and invoked as a regularizer to enrich the maximum-variance MCCA framework. In this context, the present paper’s novel graph-regularized MCCA (GMCCA) approach minimizes the distance between the wanted canonical variables and the common low-dimensional representations, while accounting for graph-induced knowledge of the common sources. Relying on a function capturing the extent to which low-dimensional representations of the multiple views are similar, a generalization bound of GMCCA is established based on Rademacher complexity. Tailored for setups where the number of data pairs is smaller than the data vector dimensions, a graph-regularized dual MCCA approach is also developed. To further deal with nonlinearities present in the data, graph-regularized kernel MCCA variants are put forward too. Interestingly, solutions of the graph-regularized linear, dual, and kernel MCCA are all obtained via generalized eigenvalue decomposition. Several corroborating numerical tests using real datasets are provided to showcase the merits of the graph-regularized MCCA variants relative to several competing alternatives including MCCA, Laplacian-regularized MCCA, and (graph-regularized) PCA.
Learning with Labels of Existing and Nonexisting
We study the classification or detection problems where the label only suggests whether any instance of a class exists or does not exist in a training sample. No further information, e.g., the number of instances of each class, their locations or relative orders in the training data, is exploited. The model can be learned by maximizing the likelihood of the event that in a given training sample, instances of certain classes exist, while no instance of other classes exists. We use image recognition as the example task to develop our method, although it is applicable to data with higher or lower dimensions without much modification. Our method can be used to learn all convolutional neural networks for object detection and localization, e.g., reading street view house numbers in images with varying sizes, without using any further processing.
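One way to write the likelihood described above, assuming the network outputs, for every location i and class k, a probability p_ik that an instance of class k is there, and that locations are independent: P(class k absent) = Π_i (1 - p_ik) and P(class k present) = 1 - Π_i (1 - p_ik). The Python sketch below computes the resulting negative log-likelihood; the independence assumption and the `probs` layout are illustrative choices, not necessarily the paper's exact formulation.

```python
import math
from typing import List, Set


def existence_nll(probs: List[List[float]], present: Set[int]) -> float:
    """Negative log-likelihood of sample-level existence labels.

    probs:   probs[i][k] = probability that location i holds an instance of class k.
    present: indices of classes that exist in the sample; all others are absent.
    Assumes locations are independent (a simplifying assumption of this sketch).
    """
    n_classes = len(probs[0])
    nll = 0.0
    for k in range(n_classes):
        # log P(no instance of class k anywhere) = sum_i log(1 - p_ik)
        log_absent = sum(math.log1p(-row[k]) for row in probs)
        if k in present:
            # log P(at least one instance) = log(1 - exp(log_absent)), computed stably
            nll -= math.log(-math.expm1(log_absent))
        else:
            nll -= log_absent
    return nll
```

Because only existence is scored, the loss never needs instance counts or positions, matching the weak labels described above; `log1p` and `expm1` keep it stable when the per-location probabilities are tiny.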
On the Implicit Assumptions of GANs
Generative adversarial nets (GANs) have generated a lot of excitement. Despite their popularity, they exhibit a number of well-documented issues in practice, which apparently contradict theoretical guarantees. A number of enlightening papers have pointed out that these issues arise from unjustified assumptions that are commonly made, but the message seems to have been lost amid the optimism of recent years. We believe the identified problems deserve more attention, and highlight the implications on both the properties of GANs and the trajectory of research on probabilistic models. We recently proposed an alternative method that sidesteps these problems.
• Target Control of Directed Networks based on Network Flow Problems
• Attacks on State-of-the-Art Face Recognition using Attentional Adversarial Attack Generative Network
• MOBIUS: Model-Oblivious Binarized Neural Networks
• Grid R-CNN
• On the largest two and smallest six distance Pareto eigenvalues of a graph
• Utilizing Complex-valued Network for Learning to Compare Image Patches
• The distributions of sliding block patterns in finite samples and the inclusion-exclusion principles for partially ordered sets
• EV-SegNet: Semantic Segmentation for Event-based Cameras
• The Power of The Hybrid Model for Mean Estimation
• RAM: Residual Attention Module for Single Image Super-Resolution
• Entanglement and Disordered-Enhanced Topology in the Kitaev Chain
• Problems related to conformal slit-mappings
• Progressive Recurrent Learning for Visual Recognition
• Recurrent Deep Divergence-based Clustering for simultaneous feature learning and clustering of variable length time series
• Stability and tail limits of transport-based quantile contours
• Performance Analysis of Deep Learning based on Recurrent Neural Networks for Channel Coding
• Gathering Problems for Autonomous Mobile Robots with Lights
• Multi-Scale Distributed Representation for Deep Learning and its Application to b-Jet Tagging
• The diffusion of opposite opinions in a random-trend environment
• Deep Haar Scattering Networks in Pattern Recognition: A promising approach
• Joint Service Pricing and Cooperative Relay Communication for Federated Learning
• A Polynomial-time Fragment of Epistemic Probabilistic Argumentation (Technical Report)
• Networks for Nonlinear Diffusion Problems in Imaging
• An Exact Cutting Plane Algorithm to Solve the Selective Graph Coloring Problem in Perfect Graphs
• The Cheeger constant of curved tubes
• Estimation of health effects (morbidity and mortality) attributed to PM10 and PM2.5 exposure using an Air Quality model in Bukan city, from 2015-2016 exposure using air quality model
• A note on the exact simulation of spherical Brownian motion
• Multilayer coevolution dynamics of the nonlinear voter model
• An increasing sequence of lower bounds for the Estrada index of graphs and matrices
• Two-level Attention with Two-stage Multi-task Learning for Facial Emotion Recognition
• Global optimization of expensive black-box models based on asynchronous hybrid-criterion with interval reduction
• Learning to Reason with Third-Order Tensor Products
• Game Tree Search in a Robust Multistage Optimization Framework: Exploiting Pruning Mechanisms
• Improving Robustness of Neural Dialog Systems in a Data-Efficient Way with Turn Dropout
• Robust consumption-investment problem Under CRRA and CARA utilities with time-varying confidence sets
• Parameter-Free Spatial Attention Network for Person Re-Identification
• Efficient Coarse-to-Fine Non-Local Module for the Detection of Small Objects
• The Multiple Random Dot Product Graph Model
• Data-parallel distributed training of very large models beyond GPU capacity
• Iterative Residual CNNs for Burst Photography Applications
• A construction which relates c-freeness to infinitesimal freeness
• Two-valenced association schemes and the Desargues theorem
• Detecting edges from non-uniform Fourier data via sparse Bayesian learning
• ApolloCar3D: A Large 3D Car Instance Understanding Benchmark for Autonomous Driving
• Design of a Highly Reliable Wireless Module for Ultra-Low-Latency Short Range Applications
• Extremal particles of two-dimensional Coulomb gases and random polynomials on a positive background
• Early Stratification of Patients at Risk for Postoperative Complications after Elective Colectomy
• Machine Learning Based Obstacle Detection for Automatic Train Pairing
• ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness
• Machine Learning on Electronic Health Records: Models and Features Usages to predict Medication Non-Adherence
• Perceiving Physical Equation by Observing Visual Scenarios
• Incremental Sparse TFIDF & Incremental Similarity with Bipartite Graphs
• Influence of dimension on the convergence of level-sets in total variation regularization
• Rates of contraction of posterior distributions based on $p$-exponential priors
• Discovering Spatio-Temporal Action Tubes
• An Evaluation of Design-based Properties of Different Composite Estimators
• Graph Isomorphism for $(H_1,H_2)$-free Graphs: An Almost Complete Dichotomy
• The Effect of Heterogeneous Data for Alzheimer’s Disease Detection from Speech
• Locally Differentially-Private Randomized Response for Discrete Distribution Learning
• $Ψ$ec: A Local Spectral Exterior Calculus
• Improving Hospital Mortality Prediction with Medical Named Entities and Multimodal Learning
• Flow-Based Local Graph Clustering with Better Seed Set Inclusion
• An Implementation of the Poisson Multi-Bernoulli Mixture Trajectory Filter via Dual Decomposition
• Reactive explorers to unravel network topology
• Tuplemax Loss for Language Identification
• Some Problems in Differentiation
• Regression by clustering using Metropolis-Hastings
• Face Detection in the Operating Room: Comparison of State-of-the-art Methods and a Self-supervised Approach
• Incremental Scene Synthesis
• Reinforced urns and the subdistribution beta-Stacy process prior for competing risks analysis
• The cellular automaton pulsing model, experiments with DDLab
• Binary Sequence Set Design for Interferer Rejection in Multi-Branch Modulation
• A Deep Latent-Variable Model Application to Select Treatment Intensity in Survival Analysis
• Iterative Projection and Matching: Finding Structure-preserving Representatives and Its Application to Computer Vision
• InverseRenderNet: Learning single image inverse rendering
• Quantum error correction for the toric code using deep reinforcement learning
• Limbs and Cospectral Vertices in Trees
• Combating Fake News with Interpretable News Feed Algorithm
• Evaluation of Complex-Valued Neural Networks on Real-Valued Classification Tasks
• Touchdown: Natural Language Navigation and Spatial Reasoning in Visual Street Environments
• Uniqueness for contagious McKean–Vlasov systems in the weak feedback regime
• Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations
• Smoothed Analysis in Unsupervised Learning via Decoupling
• Image Translation to Mixed-Domain using Sym-Parameterized Generative Network
• Generators, projectors, and the Jones-Wenzl algebra
• There is no isolated interface edge in very supercritical percolation
• Small Hazard-free Transducers
• Diverse Image Synthesis from Semantic Layouts via Conditional IMLE
• Network architecture of energy landscapes in mesoscopic quantum systems
• Tree-Structured Recurrent Switching Linear Dynamical Systems for Multi-Scale Modeling
• CNN-Cert: An Efficient Framework for Certifying Robustness of Convolutional Neural Networks