Distilled News

automl package: part 1/2 why and how

The automl package provides:
• the latest deep learning tricks (those who have taken Andrew Ng's MOOC on Coursera will be in familiar territory)
• hyperparameter autotuning with a metaheuristic (particle swarm optimization, PSO)
• experimental features, with more to come (you're welcome to join as a coauthor!)
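
To give a feel for the PSO part, here is a generic sketch of particle swarm search over hyperparameters; this is not the automl package's own implementation, and the toy loss and bounds are made up for illustration:

    import numpy as np

    def pso(loss, bounds, n_particles=20, n_iters=50, w=0.7, c1=1.5, c2=1.5):
        """Minimize `loss` over a box; each particle is a hyperparameter vector."""
        rng = np.random.default_rng(0)
        lo, hi = np.array(bounds, dtype=float).T       # bounds: [(min, max), ...]
        pos = rng.uniform(lo, hi, (n_particles, len(bounds)))
        vel = np.zeros_like(pos)
        best_pos = pos.copy()                          # per-particle best positions
        best_val = np.array([loss(p) for p in pos])
        g = best_pos[best_val.argmin()]                # swarm-wide best position
        for _ in range(n_iters):
            r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
            # Inertia, plus pulls toward the personal and global bests.
            vel = w * vel + c1 * r1 * (best_pos - pos) + c2 * r2 * (g - pos)
            pos = np.clip(pos + vel, lo, hi)
            val = np.array([loss(p) for p in pos])
            improved = val < best_val
            best_pos[improved], best_val[improved] = pos[improved], val[improved]
            g = best_pos[best_val.argmin()]
        return g, best_val.min()

    # Toy usage: tune a single "learning rate" against a made-up validation loss.
    best, score = pso(lambda p: (p[0] - 0.01) ** 2, bounds=[(1e-4, 1.0)])

Because PSO only ever evaluates the loss, never its gradient, it suits hyperparameter surfaces that are noisy and non-differentiable.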

Faceted Graphs with cdata and ggplot2

In between client work, John and I have been busy working on our book, Practical Data Science with R, 2nd Edition. To demonstrate a toy example for the section I’m working on, I needed scatter plots of the petal and sepal dimensions of the iris data.
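
The book's example uses R's cdata and ggplot2; purely to illustrate the same faceting idea in another stack, here is a rough Python analogue with pandas and seaborn (my own translation, not the book's code) that reshapes the wide iris table into records so each flower part gets its own scatter panel:

    import pandas as pd
    import seaborn as sns
    from sklearn.datasets import load_iris

    # Load iris as a wide data frame: one column per measurement.
    iris = load_iris(as_frame=True)
    df = iris.frame.rename(columns={
        "sepal length (cm)": "sepal_length", "sepal width (cm)": "sepal_width",
        "petal length (cm)": "petal_length", "petal width (cm)": "petal_width"})
    df["species"] = iris.target_names[iris.target]

    # Reshape so each row is one (part, length, width) record, mirroring
    # the wide-to-record transform that cdata performs in R.
    long = pd.concat([
        df[["sepal_length", "sepal_width", "species"]]
            .rename(columns={"sepal_length": "length", "sepal_width": "width"})
            .assign(part="sepal"),
        df[["petal_length", "petal_width", "species"]]
            .rename(columns={"petal_length": "length", "petal_width": "width"})
            .assign(part="petal"),
    ])

    # One scatter panel per flower part.
    sns.relplot(data=long, x="length", y="width", hue="species", col="part")

The reshaping step is the interesting part: faceting libraries want long records while the raw data is wide, and that conversion is exactly what cdata packages up.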

How do the popular XGBoost and LightGBM algorithms enforce monotonic constraints

Both XGBoost and LightGBM have a nice parameter to enforce a monotonic relationship between a feature and the prediction. This comes in handy when our ‘mental model’ tells us to expect such a monotonic relationship. For example, all else being equal, we expect a larger house to be more expensive than a smaller house in the same neighborhood. If you let the algorithm run freely, it may capture an overall positive square-footage-to-price relationship (controlling for other factors) but with some local areas showing a reversed relationship (larger square footage associated with lower price) that is counter-intuitive. This kind of situation often arises where the data are sparse or noisy. In my experience, imposing a monotonic constraint that truly makes sense often results in better model performance on the test data, meaning that the constrained model actually generalizes better. So one can view a monotonic constraint as a type of regularization.
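
A minimal sketch of how the constraint is passed through each library's scikit-learn interface (the synthetic housing data and the feature layout are made up here for illustration):

    import numpy as np
    import xgboost as xgb
    import lightgbm as lgb

    # Synthetic data: price rises with square footage; second feature is noise.
    rng = np.random.default_rng(42)
    sqft = rng.uniform(500, 4000, 2000)
    X = np.column_stack([sqft, rng.normal(size=2000)])
    y = 100 * sqft + rng.normal(0, 30000, 2000)

    # 1 = non-decreasing, -1 = non-increasing, 0 = unconstrained, per feature.
    xgb_model = xgb.XGBRegressor(monotone_constraints="(1,0)").fit(X, y)
    lgb_model = lgb.LGBMRegressor(monotone_constraints=[1, 0]).fit(X, y)

    # With the constraint, predictions never decrease as square footage grows.
    grid = np.column_stack([np.linspace(500, 4000, 50), np.zeros(50)])
    assert np.all(np.diff(xgb_model.predict(grid)) >= 0)

Note the two formats: XGBoost's scikit-learn wrapper takes the constraint vector as a string (or a dict keyed by feature name), while LightGBM accepts a plain list.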

The 5 Basic Statistics Concepts Data Scientists Need to Know

Statistics can be a powerful tool when performing the art of Data Science (DS). From a high-level view, statistics is the use of mathematics to perform technical analysis of data. A basic visualisation such as a bar chart might give you some high-level information, but with statistics we get to operate on the data in a much more information-driven and targeted way. The math involved helps us form concrete conclusions about our data rather than just guesstimating.

Speech Recognition Challenge with Deep Learning Studio

Speech recognition technology has become increasingly popular in recent years. From organizations to individuals, it is widely used for the various advantages it provides. One of the most notable is dictation: with its help, users can control devices and create documents by speaking. Speech recognition allows documents to be created faster because the software generally produces words as quickly as they are spoken, which is usually much faster than a person can type. No longer will spelling or writing hold you back. As well as making tasks faster to complete, voice recognition software is increasingly accurate when it comes to vocabulary and spelling. For most of us, the ultimate luxury would be an assistant who always listens for your call, anticipates your every need, and takes action when necessary. That luxury is now available thanks to artificial intelligence assistants, aka voice assistants. Voice assistants come in fairly small packages and can perform a variety of actions after hearing a wake word or command: they can turn on lights, answer questions, play music, place online orders, and so on. But for independent makers and entrepreneurs, it is hard to build a simple speech detector using free, open data and code. Many voice recognition datasets require preprocessing before a neural network model can be built on them. To help with this, TensorFlow has released the Speech Commands Dataset, which includes 65,000 one-second-long utterances of 30 short words by thousands of different people.
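
As a taste of that preprocessing, here is a minimal sketch using TensorFlow's signal ops to turn one of the one-second WAV clips into a log-mel spectrogram a neural network can consume; the frame sizes and mel settings are my own illustrative choices, not a canonical pipeline for the dataset:

    import tensorflow as tf

    def wav_to_log_mel(path, sample_rate=16000, num_mel_bins=40):
        """Decode a WAV file into a [frames, mel_bins] log-mel spectrogram."""
        audio, _ = tf.audio.decode_wav(tf.io.read_file(path),
                                       desired_channels=1,
                                       desired_samples=sample_rate)  # pad/trim to 1 s
        stft = tf.signal.stft(tf.squeeze(audio, axis=-1),
                              frame_length=400, frame_step=160)      # 25 ms / 10 ms
        spectrogram = tf.abs(stft)
        mel = tf.signal.linear_to_mel_weight_matrix(
            num_mel_bins=num_mel_bins,
            num_spectrogram_bins=spectrogram.shape[-1],
            sample_rate=sample_rate)
        return tf.math.log(tf.matmul(spectrogram, mel) + 1e-6)

Stacking these spectrograms, one per labeled clip, yields the kind of input tensors on which a small convolutional classifier for the 30 command words can be trained.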

Explore the API Economy of Infrastructure Services

‘There is an API for that!’ is a common answer when talking to digital-savvy organizations today. Indeed, according to ProgrammableWeb.com, there are almost 20,000 public APIs available. Digital-savvy organizations rely heavily on private, public, or partner APIs to operate. Companies like Salesforce generate 50% of their revenue through APIs, eBay generates 60%, and Expedia as much as 90%.

Raster Vision

Raster Vision is an open source framework for Python developers building computer vision models on satellite, aerial, and other large imagery sets (including oblique drone imagery). It allows engineers to quickly and repeatably configure experiments that go through the core components of a machine learning workflow: analyzing training data, creating training chips, training models, creating predictions, evaluating models, and bundling the model files and configuration for easy deployment. A Raster Vision workflow begins when you have a set of images and training data, optionally with Areas of Interest (AOIs) that describe where the images are labeled, and it ends with a packaged model and configuration that lets you easily use the models in various deployment situations. In between lies the process of running multiple experiments to find the best model or models to deploy.

10 top Python resources on O’Reilly’s online learning platform

Our most-used Python resources will help you stay on track in your journey to learn and apply Python.
• Introduction to Python
• Python for Data Analysis, 2nd Edition
• Learning Python, 5th Edition
• Deep Learning with Python
• Learn Python 3 the Hard Way: A Very Simple Introduction to the Terrifyingly Beautiful World of Computers and Code, 1st Edition
• Fluent Python
• Automate the Boring Stuff with Python
• Python Crash Course
• Effective Python: 59 Specific Ways to Write Better Python
• Python Cookbook, 3rd Edition

Type-Driven Program Synthesis

A promising approach to improving software quality is to enhance programming languages with declarative constructs, such as logical specifications or examples of desired behavior, and to use program synthesis technology to translate these constructs into efficiently executable code. In order to produce code that provably satisfies a rich specification, the synthesizer needs an advanced program verification mechanism that is sufficiently expressive and fully automatic. In this talk, I will present a program synthesis technique based on refinement type checking: a verification mechanism that supports automatic reasoning about expressive properties through a combination of types and SMT solving. The talk will present two applications of type-driven synthesis. The first one is a tool called Synquid, which creates recursive functional programs from scratch given a refinement type as input. Synquid is the first synthesizer powerful enough to automatically discover programs that manipulate complex data structures, such as balanced trees and propositional formulas. The second application is a language called Lifty, which uses type-driven synthesis to repair information flow leaks. In Lifty, the programmer specifies expressive information flow policies by annotating the sources of sensitive data with refinement types, and the compiler automatically inserts access checks necessary to enforce these policies across the code.
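
For a flavor of what such a specification looks like, a classic small example for Synquid is a replicate function, given (in notation approximating the tool's input language) a refinement type stating that the result list has length exactly n:

    replicate :: n: Nat -> x: a -> {List a | len _v == n}
    replicate = ??

From the signature alone (plus the List datatype and its len measure), the synthesizer fills in the ?? hole with the usual recursive implementation, and the refinement type checker guarantees the result meets the spec.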
