Distilled News

29 Statistical Concepts Explained in Simple English – Part 3

This resource is part of a series on specific topics related to data science: regression, clustering, neural networks, deep learning, decision trees, ensembles, correlation, Python, R, TensorFlow, SVM, data reduction, feature selection, experimental design, cross-validation, model fitting, and many more.

Windows Clipboard Access with R

The Windows clipboard is a quick way to get data into and out of R. How can we exploit this feature to accomplish our basic data exploration needs, and when might its use be inappropriate? Read on.

Explaining Black-Box Machine Learning Models – Code Part 2: Text classification with LIME

This is code that will accompany an article that will appear in a special edition of a German IT magazine. The article is about explaining black-box machine learning models.

Building a Repository of Alpine-based Docker Images for R, Part II

In the first article of this series, I built an Alpine-based Docker image with R base packages from Alpine’s native repositories, as well as one image with R compiled from source code. The images are hosted on Docker Hub in the velaco/alpine-r repository. The next step was either to address the fatal errors I found while testing the installation of R or to proceed with building an image with Shiny Server. The logical choice would have been to pass all tests with R’s base packages before proceeding, but I was a bit impatient and wanted to go through the process of building a Shiny Server as soon as possible. After two weeks of trial and error, I finally have a container that can start the server and run Shiny apps.

Easy time-series prediction with R: a tutorial with air traffic data from Lux Airport

In this blog post, I will show you how you can quickly and easily forecast a univariate time series. I am going to use data from the EU Open Data Portal on air passenger transport. You can find the data here. I downloaded the data in the TSV format for Luxembourg Airport, but you could repeat the analysis for any airport.

AI for Good: slides and notebooks from the ODSC workshop

Last week at the ODSC West conference, I was thrilled with the interest in my Using AI for Good workshop: it was wonderful to find a room full of data scientists eager to learn how data science and artificial intelligence can be used to help people and the planet. The workshop was focused on projects from the Microsoft AI for Good program. I’ve included some details about the projects below, and you can also check out the workshop slides and the accompanying Jupyter Notebooks that demonstrate the underlying AI methods used in the projects.

Installing RStudio & Shiny Servers

I did a remote install of Ubuntu Server today. This was somewhat novel because it’s the first time that I have not had physical access to the machine I was installing on. The server install went very smoothly indeed.

Interchanging RMarkdown and ‘spinnable’ R

Behaviour Analysis using Graphext

Why do people act the way they do? Why do they buy products, quit their jobs, or change partners? Many of these motives can be inferred from people’s behaviour, and these behaviours are reflected in data. Companies have lots of data about their clients, employees, suppliers… Let’s put that data to work to do some smart data discovery and see what we can learn.

Job Title Analysis in python and NLTK

A job title indicates a lot about someone’s role and responsibilities. It says whether they manage a team, whether they control a budget, and their level of specialization. Knowing this is useful when automating business development or client outreach. For example, a company that sells voice recognition software may want to send messages to:
• CTOs and technical directors, informing them of the price and benefits of the voice recognition software.
• Potential investors or advisors, inviting them to see the company’s potential market size.
• Founders and engineers, instructing them how to use the software.
Training software to classify job titles is a multi-class text classification problem. For this task, we can use the Python Natural Language Toolkit (NLTK) and Bayesian classification.
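As a rough illustration of the idea (not the author’s code), here is a minimal sketch of Bayesian job-title classification with NLTK; the tiny training set, labels and feature function are invented for the example.

```python
import nltk

# Toy training data (invented for illustration; a real project would use a labelled job-title dataset).
train = [
    ("chief technology officer", "technical_leader"),
    ("cto", "technical_leader"),
    ("vp of engineering", "technical_leader"),
    ("software engineer", "engineer"),
    ("machine learning engineer", "engineer"),
    ("founder and ceo", "founder"),
    ("co-founder", "founder"),
    ("angel investor", "investor"),
    ("venture partner", "investor"),
]

def title_features(title):
    """Bag-of-words features over the lowercased tokens of a job title."""
    return {word: True for word in title.lower().split()}

train_set = [(title_features(title), label) for title, label in train]
classifier = nltk.NaiveBayesClassifier.train(train_set)

print(classifier.classify(title_features("Director of Engineering")))
print(classifier.classify(title_features("Founding CTO")))
```

In practice the classifier would be trained on a much larger set of real titles and evaluated on held-out data before driving any outreach.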

Doing Machine Learning the Uber Way: Five Lessons From the First Three Years of Michelangelo

Uber has been one of the most active contributors to open source machine learning technologies in the last few years. While companies like Google or Facebook have focused their contributions on new deep learning stacks like TensorFlow, Caffe2 or PyTorch, the Uber engineering team has focused on tools and best practices for building machine learning at scale in the real world. Technologies such as Michelangelo, Horovod, PyML and Pyro are some examples of Uber’s contributions to the machine learning ecosystem. With only a small group of companies developing large-scale machine learning solutions, the lessons and guidance from Uber become even more valuable for machine learning practitioners (I certainly learned a lot and have regularly written about Uber’s efforts).

Best Python IDE for Data Science (https://www.kdnuggets.com/2018/11/best-python-ide-data-science.html)

Before you start learning Python, choose the IDE that suits you the best. As Python is one of the leading programming languages, there is a multitude of IDEs available. So the question is, ‘Which is the best Python IDE for Data Science?’

Introduction to Image Recognition: Building a Simple Digit Detector

Digit recognition is nothing difficult or advanced. It is a kind of ‘Hello world!’ program – not that cool, but it is exactly where you start. So I decided to share my work and refresh my knowledge at the same time – it’s been a long time since I played with images.
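The post builds its own detector, but as a minimal sketch of the ‘Hello world!’ flavour of the task, here is digit classification on scikit-learn’s bundled 8x8 digits dataset (an assumed setup, not the author’s pipeline).

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# 8x8 grayscale digit images, already flattened to 64 features each.
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

# A plain logistic regression is already a reasonable baseline for this dataset.
model = LogisticRegression(max_iter=2000)
model.fit(X_train, y_train)

print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```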

The 2×2 Data Science Skills Matrix that Harvard Business Review got completely wrong!

Data Science is the current buzzword in the market. Every company at the moment is looking to hire Data Science professionals to solve some data problem they themselves are not yet aware of. Machine Learning has taken the industry by storm, and we have a bunch of self-taught Data Scientists in the market. Since Data Science is an altogether different universe, it is very difficult to set priorities on what to learn and what not to. So the Harvard Business Review published an article on what you, as a company or an individual, should give importance to. Let’s have a look.

Decision Tree in Machine Learning

A decision tree is a flowchart-like structure in which each internal node represents a test on a feature (e.g. whether a coin flip comes up heads or tails), each leaf node represents a class label (the decision taken after evaluating all features), and each branch represents a conjunction of features that leads to a class label. The paths from root to leaf represent classification rules. The diagram below illustrates the basic flow of a decision tree for decision making, with the labels Rain (Yes) and No Rain (No).
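A minimal sketch of the same idea with scikit-learn, using an invented two-feature weather dataset and the Rain / No Rain labels from the description; the printed tree makes the root-to-leaf classification rules explicit.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy weather data (invented for illustration): [humidity %, cloud cover %]
X = [[85, 90], [80, 75], [60, 40], [30, 10], [90, 95], [40, 20], [70, 80], [20, 5]]
y = ["Rain", "Rain", "No Rain", "No Rain", "Rain", "No Rain", "Rain", "No Rain"]

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# Each root-to-leaf path printed here is one classification rule.
print(export_text(tree, feature_names=["humidity", "cloud_cover"]))

print(tree.predict([[75, 85]]))  # expected: ['Rain']
```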

Using Bash for Data Pipelines

Using bash scripts to create data pipelines is incredibly useful as a data scientist. The possibilities with these scripts are almost endless, but here I will walk through a very basic bash script that downloads data and counts the number of rows and columns in a dataset. Once you get the hang of bash scripts, you have the basics for creating IoT devices and much, much more, as this all works on a Raspberry Pi. One cool project you could use this for is to download all of your Twitter messages using the Twitter API and then predict whether or not a message from a user is spam. It could run on a Raspberry Pi server in your room! That is a little outside the scope of this tutorial, though, so we will begin by looking at a dataset of car speeds in San Francisco!

From Scratch: Bayesian Inference, Markov Chain Monte Carlo and Metropolis Hastings, in python

I’ve been an avid reader of Medium/Towards Data Science for a while now, and I’ve enjoyed the diversity and openness of the subjects tackled by many authors. I wish to contribute to this awesome community by creating my own series of articles, ‘From Scratch’, where I explain and implement/build anything from scratch (not necessarily in data science – you need only propose!). Why do I want to do that? In the current state of things, we are in possession of such powerful libraries and tools that can do a lot of the work for us. Most experienced authors are well aware of the complexities of implementing such tools. As such, they make use of them to provide short, accessible and to-the-point reads to users from diverse backgrounds. Yet in many of the articles that I enjoyed, I failed to understand how this or that algorithm is implemented in practice. What are its limitations? Why was it invented? When should it be used?
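As a taste of what ‘from scratch’ means here (this is not the author’s code), a bare-bones Metropolis-Hastings sampler for the mean of normally distributed data can be written with nothing but NumPy; the data, prior and proposal width are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=1.0, size=50)  # simulated observations

def log_posterior(mu):
    # Normal likelihood with known sigma = 1 and a wide N(0, 10^2) prior on mu.
    log_lik = -0.5 * np.sum((data - mu) ** 2)
    log_prior = -0.5 * (mu / 10.0) ** 2
    return log_lik + log_prior

# Metropolis-Hastings with a symmetric Gaussian proposal.
samples, mu = [], 0.0
for _ in range(10_000):
    proposal = mu + rng.normal(scale=0.5)
    # Accept with probability min(1, posterior(proposal) / posterior(current)).
    if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(mu):
        mu = proposal
    samples.append(mu)

print("posterior mean of mu ~", np.mean(samples[2000:]))  # burn-in discarded
```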

Reverse Engineering Backpropagation

Sometimes starting with examples is a faster way to learn something than going theory-first before getting into detailed examples. That’s what I will attempt to do here, using an example from the official PyTorch tutorial that implements backpropagation, and reverse engineering the math and, subsequently, the concept behind it.
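A minimal sketch of the reverse-engineering exercise (assumed, not the tutorial’s exact code): compute a tiny forward pass with PyTorch autograd, then verify the gradients that backpropagation produces against the chain rule worked by hand.

```python
import torch

# Model: y = w * x + b, loss = (y - target)^2
x = torch.tensor(2.0)
w = torch.tensor(3.0, requires_grad=True)
b = torch.tensor(1.0, requires_grad=True)
target = torch.tensor(10.0)

y = w * x + b              # forward pass: y = 7
loss = (y - target) ** 2   # loss = (7 - 10)^2 = 9
loss.backward()            # backpropagation

# Chain rule by hand: dloss/dy = 2 * (y - target) = -6,
# so dloss/dw = -6 * x = -12 and dloss/db = -6 * 1 = -6.
print(w.grad, b.grad)      # tensor(-12.) tensor(-6.)
```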

ML Intro 5: One hot Encoding, Cyclic Representations, Normalization

This post follows Machine Learning Introduction 4. In the previous post, we described Machine Learning for marketing attribution. In this post, we will illuminate some of the details we ignored in that section. We will inspect a dataset about Marketing Attribution, perform one-hot encoding of our brands, manipulate our one-hot encoding to learn custom business insights, normalize our features, inspect our model inputs once this is all done, and interpret our outputs in detail.
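A minimal sketch of the three transformations on an invented marketing-touch table (the column and brand names are assumptions, not the post’s data): one-hot encode the brand, encode hour-of-day cyclically with sine and cosine, and min-max normalize spend.

```python
import numpy as np
import pandas as pd

# Toy marketing-touch data (invented for illustration).
df = pd.DataFrame({
    "brand": ["acme", "globex", "acme", "initech"],
    "hour_of_day": [23, 0, 12, 18],
    "spend": [120.0, 40.0, 300.0, 75.0],
})

# One-hot encode the brand column.
df = pd.get_dummies(df, columns=["brand"], prefix="brand")

# Cyclic encoding: hour 23 and hour 0 end up close together on the circle.
df["hour_sin"] = np.sin(2 * np.pi * df["hour_of_day"] / 24)
df["hour_cos"] = np.cos(2 * np.pi * df["hour_of_day"] / 24)

# Min-max normalization of spend to the [0, 1] range.
df["spend_norm"] = (df["spend"] - df["spend"].min()) / (df["spend"].max() - df["spend"].min())

print(df.drop(columns=["hour_of_day", "spend"]))
```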

Machine Learning Bit by Bit – Multivariate Gradient Descent

In this post, we’re going to extend our understanding of gradient descent and apply it to a multivariate function.
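A minimal NumPy sketch of multivariate (batch) gradient descent for linear regression on synthetic data, using the familiar update θ := θ − α∇J(θ); the data and learning rate are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 4 + 3*x1 - 2*x2 + noise
X = rng.normal(size=(200, 2))
y = 4 + 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

Xb = np.hstack([np.ones((200, 1)), X])  # prepend a column of 1s for the intercept term
theta = np.zeros(3)
alpha = 0.1

# Batch gradient descent: theta := theta - alpha * (1/m) * X^T (X theta - y)
for _ in range(1000):
    gradient = Xb.T @ (Xb @ theta - y) / len(y)
    theta -= alpha * gradient

print(theta)  # should land close to [4, 3, -2]
```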

Machine Learning Bit by Bit – Univariate Gradient Descent

This series aims to share my own endeavour to understand, explore and experiment with topics in machine learning. Mathematical notation in this post and the next one on multivariate gradient descent will mostly follow the Machine Learning course by Andrew Ng. Understanding and being able to play around with the maths behind machine learning is key: it allows us to choose the most suitable algorithms and tailor them to the problems we want to solve. However, I have encountered many tutorials and lectures where the equations used are simply impenetrable. All the symbols look cryptic, and there seems to be a huge gap between what is being explained and those equations; I just can’t connect the dots. Unfortunately, more often than not, maths hinders understanding when some knowledge is assumed and important steps are skipped. Therefore, wherever possible, I will expand the equations and avoid shortcuts, so that everyone can follow how we get from the left side of an equation to the right.
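In that spirit of expanding every step, here is a fully worked univariate example (invented for illustration, not taken from the post): minimizing f(x) = (x − 3)² with the update x := x − α f′(x), where the derivative f′(x) = 2(x − 3) is written out explicitly.

```python
# Minimize f(x) = (x - 3)^2 with plain gradient descent.
def f_prime(x):
    # Derivative, step by step: d/dx (x - 3)^2 = 2 * (x - 3).
    return 2 * (x - 3)

x, alpha = 0.0, 0.1  # starting point and learning rate
for step in range(50):
    x = x - alpha * f_prime(x)  # x := x - alpha * f'(x)

print(x)  # converges towards the minimum at x = 3
```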
