Distilled News

Puppet

In computing, Puppet is an open-source software configuration management tool. It runs on many Unix-like systems as well as on Microsoft Windows, and includes its own declarative language to describe system configuration. Puppet is designed to manage the configuration of Unix-like and Microsoft Windows systems declaratively. The user describes system resources and their state, either using Puppet’s declarative language or a Ruby DSL (domain-specific language). This information is stored in files called ‘Puppet manifests’. Puppet discovers the system information via a utility called Facter, and compiles the Puppet manifests into a system-specific catalog containing resources and resource dependencies, which are applied against the target systems. Any actions taken by Puppet are then reported. Puppet consists of a custom declarative language to describe system configuration, which can be either applied directly on the system or compiled into a catalog and distributed to the target system via a client-server paradigm (using a REST API); the agent then uses system-specific providers to enforce the resources specified in the manifests. The resource abstraction layer enables administrators to describe the configuration in high-level terms, such as users, services and packages, without the need to specify OS-specific commands (such as rpm, yum, apt). Puppet is model-driven, requiring limited programming knowledge to use. Puppet comes in two versions, Puppet Enterprise and Open Source Puppet. In addition to the functionality of Open Source Puppet, Puppet Enterprise also provides a GUI, an API, and command-line tools for node management.

3 Stages of Creating Smart

Okay, that’s a pretty bold statement on my part (especially to challenge the famous Steve Jobs statement about building insanely great products), but then again I’m an analytics dude and think that analytics should be a part of every product and space – smart cities, smart cars, smart vacuums, smart hospitals, smart Chipotle…
Step 1: Identify, Validate, Value and Prioritize the Decisions that Power ‘Smart’
Step 2: Create Product Self-Awareness
Step 3: Map Advanced Analytics to Your Smart-enabling Architecture
Organizations are facing a business model disruption like they have never experienced before, fueled by rapid advances in sensor technologies, an out-of-control avalanche of sensor data, and the rapid democratization of advanced machine learning, deep learning and artificial intelligence capabilities. These are truly ‘weapons of business model mass destruction’. Unfortunately, many organizations’ IoT strategies look like a giant game of ‘Twister’, with random, uncoordinated investments in architecture, technology, data, analytics and governance (see the blog ‘Avoiding the IOT ‘Twister’ Business Strategy’ for more on this giant game of IoT ‘Twister’). To avoid the IoT ‘Twister’ game, and to avoid becoming a victim of these ‘weapons of business model mass destruction’, one needs a coherent, coordinated plan for identifying the sources of customer and market value creation and capturing those sources of value. And more and more, those sources of customer and market value creation will be delivered via smart products and spaces that leverage IoT and advanced analytics capabilities to create products and environments that self-monitor, self-diagnose, self-cure and continuously learn across every step of the process.

Anomaly Detection Using Replicator Neural Networks Trained on Examples of One Class

Anomaly detection aims to find patterns in data that are significantly different from what is defined as normal. One of the challenges of anomaly detection is the lack of labelled examples, especially for the anomalous classes. We describe a neural network-based approach to detect anomalous instances using only examples of the normal class in training. In this work we train the network to build a model of the normal examples, which is then used to predict the class of previously unseen instances based on their reconstruction error. The input to this network is also the desired output. We have tested the method on six benchmark data sets commonly used in the anomaly detection community. The results demonstrate that the proposed method is promising for anomaly detection. We achieve an F-score of more than 90% on three data sets and outperform the original work of Hawkins et al. on the Wisconsin breast cancer data set.
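The general recipe (not the authors’ exact implementation) can be sketched in a few lines of Python: an autoencoder-style replicator network is fit so that its desired output equals its input, and instances whose reconstruction error exceeds a threshold learned from the normal data are flagged as anomalies. The data, network size, and threshold below are illustrative assumptions.

```python
# Minimal sketch of replicator-network anomaly detection (not the paper's code).
# A small autoencoder is trained to reproduce normal examples; instances with a
# high reconstruction error are flagged as anomalies.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X_normal = rng.normal(0.0, 1.0, size=(500, 6))            # hypothetical "normal" training data
X_test = np.vstack([rng.normal(0.0, 1.0, size=(20, 6)),   # unseen normal instances
                    rng.normal(5.0, 1.0, size=(5, 6))])    # injected anomalies

# Replicator network: the desired output is the input itself.
net = MLPRegressor(hidden_layer_sizes=(8, 3, 8), activation="tanh",
                   max_iter=5000, random_state=0)
net.fit(X_normal, X_normal)

def reconstruction_error(model, X):
    return np.mean((model.predict(X) - X) ** 2, axis=1)

# Threshold taken from the training-set error distribution (an assumption, not from the paper).
threshold = np.percentile(reconstruction_error(net, X_normal), 95)
is_anomaly = reconstruction_error(net, X_test) > threshold
print(is_anomaly)
```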

Stan: A Probabilistic Programming Language

Stan is a probabilistic programming language for specifying statistical models. A Stan program imperatively defines a log probability function over parameters conditioned on specified data and constants. As of version 2.14.0, Stan provides full Bayesian inference for continuous-variable models through Markov chain Monte Carlo methods such as the No-U-Turn sampler, an adaptive form of Hamiltonian Monte Carlo sampling. Penalized maximum likelihood estimates are calculated using optimization methods such as the limited memory Broyden-Fletcher-Goldfarb-Shanno algorithm. Stan is also a platform for computing log densities and their gradients and Hessians, which can be used in alternative algorithms such as variational Bayes, expectation propagation, and marginal inference using approximate integration. To this end, Stan is set up so that the densities, gradients, and Hessians, along with intermediate quantities of the algorithm such as acceptance probabilities, are easily accessible. Stan can be called from the command line using the cmdstan package, through R using the rstan package, and through Python using the pystan package. All three interfaces support sampling and optimization-based inference with diagnostics and posterior analysis. rstan and pystan also provide access to log probabilities, gradients, Hessians, parameter transforms, and specialized plotting.
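For readers who have not used the Python interface, a minimal pystan sketch looks roughly like the following (this uses the PyStan 2.x API of that era; newer PyStan 3 releases use a different entry point, and the model and data here are illustrative, not from the paper):

```python
# Sketch of calling Stan from Python via pystan (PyStan 2.x API).
import numpy as np
import pystan

model_code = """
data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  y ~ normal(mu, sigma);   // the log probability function is defined by statements like this
}
"""

y = np.random.normal(1.5, 2.0, size=100)                 # illustrative data
model = pystan.StanModel(model_code=model_code)           # compile the Stan program
fit = model.sampling(data={"N": len(y), "y": y},
                     iter=2000, chains=4)                 # NUTS sampling by default
print(fit)                                                 # posterior summaries for mu and sigma
```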

Causality in Databases

Provenance is often used to validate data, by verifying its origin and explaining its derivation. When searching for ‘causes’ of tuples in the query results or in general observations, the analysis of lineage becomes an essential tool for providing such justifications. However, lineage can quickly grow very large, limiting its immediate use for providing intuitive explanations to the user. The formal notion of causality is a more refined concept that identifies causes for observations based on user-defined criteria, and that assigns to them gradual degrees of responsibility based on their respective contributions. In this paper, we initiate a discussion on causality in databases, give some simple definitions, and motivate this formalism through a number of example applications.

10 Interesting Use Cases for the K-Means Algorithm

Learn what the k-means algorithm is, learn about its origins, and learn about some key use cases for it. The k-means algorithm is one of the oldest and most commonly used clustering algorithms. It is a great starting point for new ML enthusiasts to pick up, given the simplicity of its implementation. As part of this post, we will review the origins of this algorithm and typical usage scenarios.
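As a quick refresher before the use cases, a minimal k-means run in scikit-learn (on made-up two-dimensional data) looks roughly like this:

```python
# Minimal k-means sketch with scikit-learn on synthetic data (illustrative only).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Three hypothetical clusters of 2-D points.
X = np.vstack([rng.normal(loc, 0.5, size=(100, 2)) for loc in ((0, 0), (5, 5), (0, 5))])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print(kmeans.cluster_centers_)   # estimated centroids
print(kmeans.labels_[:10])       # cluster assignments for the first few points
```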

The Ultimate Guide to 12 Dimensionality Reduction Techniques (with Python codes)

Have you ever worked on a dataset with more than a thousand features? How about over 50,000 features? I have, and let me tell you it’s a very challenging task, especially if you don’t know where to start! Having a high number of variables is both a boon and a curse. It’s great that we have loads of data for analysis, but it is challenging due to size. It’s not feasible to analyze each and every variable at a microscopic level. It might take us days or months to perform any meaningful analysis, and we’ll lose a ton of time and money for our business! Not to mention the amount of computational power this will take. We need a better way to deal with high-dimensional data so that we can quickly extract patterns and insights from it. So how do we approach such a dataset? The twelve techniques covered are:
1. Missing Value Ratio
2. Low Variance Filter
3. High Correlation Filter
4. Random Forest
5. Backward Feature Elimination
6. Forward Feature Selection
7. Factor Analysis
8. Principal Component Analysis
9. Independent Component Analysis
10. Methods Based on Projections
11. t-Distributed Stochastic Neighbor Embedding (t-SNE)
12. UMAP
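As a rough illustration of two of the techniques listed above (the Low Variance Filter and PCA), here is a small scikit-learn sketch on synthetic data; the thresholds and shapes are assumptions for the example only:

```python
# Rough sketch of two of the listed techniques (Low Variance Filter and PCA)
# using scikit-learn on a hypothetical feature matrix X.
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))           # 200 samples, 50 illustrative features
X[:, 0] = 0.001 * rng.normal(size=200)   # one near-constant column

# Low Variance Filter: drop features whose variance falls below a chosen threshold.
X_filtered = VarianceThreshold(threshold=0.1).fit_transform(X)

# Principal Component Analysis: keep enough components to explain 90% of the variance.
X_scaled = StandardScaler().fit_transform(X_filtered)
X_reduced = PCA(n_components=0.9).fit_transform(X_scaled)

print(X.shape, X_filtered.shape, X_reduced.shape)
```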

The Artificial Neural Networks handbook: Part 1

I have written several articles on Artificial Neural Networks earlier, but they were just random articles on random concepts. This series of articles will give you a detailed idea about artificial neural networks and the concepts related to them. The resources and references for all the content will be mentioned at the end of the series so you can study all the concepts in depth. So, let’s start with a very basic question: what is AI and what are artificial neural networks? In this very first article of the series I will try to answer these basic questions, and then we will go into more depth in further articles.
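As a small preview of the ‘what are artificial neural networks’ question: the basic building block is a single artificial neuron, a weighted sum of inputs passed through an activation function. A minimal NumPy sketch with made-up weights:

```python
# A single artificial neuron: weighted sum of inputs plus a bias, passed through
# an activation function. Inputs and weights here are made up for illustration.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])   # input features
w = np.array([0.8, 0.1, -0.4])   # weights (arbitrary values for the example)
b = 0.2                          # bias term

output = sigmoid(np.dot(w, x) + b)
print(output)                    # the neuron's activation, a value between 0 and 1
```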

Detect Anomalies with Anomalize in R

Learn how to detect anomalies in large time series data sets and present insights in a much simpler way.

The future of data-driven discovery in the cloud

Ryan Abernathey makes the case for the large-scale migration of scientific data and research to the cloud.

Jupyter notebooks and the intersection of data science and data engineering

David Schaaf explains how data science and data engineering can work together to deliver results to decision makers.

Adjacent-Categories and Continuation-Ratio Logit Models for Ordinal Outcomes

Simulating NXN dimensional Gaussian clusters in R

Datazar R/Python Scripts Now Executable

R and Python scripts you upload or create on Datazar are now executable. R files with an .r or .rscript extension and Python files with a .py extension will be active for running. You can now analyze your datasets, import other people’s public data, load libraries, and share results directly from your browser using your scripts. These scripts are also accessible from your notebooks and consoles, so you can save reusable code in scripts and call them from your notebooks and consoles.

The Chartmaker Directory: Data visualizations in every tool

Working with a new data visualization tool, and wondering how to create a specific type of chart? The Chartmaker Directory (designed by Andy Kirk) indexes more than 35 tools and over 50 charts, and provides links to examples from each combination. Here’s an intentionally small detail from the index, where each hollow dot in a column represents a sample chart created with a tool (the rightmost column is R, for example), and solid dots mark examples that also include code.

Beyond Basic R – Version Control with Git

Depending on how new you are to software development and/or R programming, you may have heard people mention version control, Git, or GitHub. Version control refers to the idea of tracking changes to files through time and across various contributors. Git is an example of a version control tool, and GitHub is a popular web interface for Git. That’s it. Easy!
