- Programming
Extract data from a PNG/TIFF
Sometimes it’s useful to be able to extract data from a published figure. If the figure isn’t in a vector-based format (in which case the numeric data is probably still embedded in the file), it’s possible to digitize the image in R: click the points and extract the values that way. The digitize package is simple to use for this purpose…
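A minimal sketch of the idea. The interactive call assumes the digitize package's workflow (it requires clicking in a graphics device, so it is commented out here); the linear pixel-to-data calibration below is my own illustrative stand-in for the mapping such a tool performs after you click known axis reference points:

```r
# Interactive use of the digitize package (shown, not run):
# install.packages("digitize")
# library(digitize)
# pts <- digitize("figure.png")  # click axis reference points, then data points

# The core idea is a linear calibration from pixel to data coordinates.
# Given the pixel positions of two reference ticks with known values,
# any clicked pixel can be mapped back to data units:
pixel_to_data <- function(px, px_ref, data_ref) {
  # px_ref: pixel positions of two reference ticks; data_ref: their true values
  data_ref[1] + (px - px_ref[1]) * diff(data_ref) / diff(px_ref)
}

# Example: ticks at pixels 100 and 500 correspond to x = 0 and x = 10
pixel_to_data(300, px_ref = c(100, 500), data_ref = c(0, 10))  # 5
```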
The Quick Python Book
Sponsored Post. The Quick Python Book, Third Edition is a comprehensive guide to the Python language by Naomi Ceder, Founder of the Python Education Summit. With the personal touch of a skilled teacher, she beautifully balances details of the language with the insights and advice you need to handle any task. The extensive, relevant examples and exercises inside further help you master each important concept, whether you’re scraping websites or playing around with nested tuples!
Creating Tables Using R and Pure HTML
A problem with R is that its tables are not good enough to share with non-R users, both in terms of visual attractiveness and ease of reading – particularly when the table is large. Quite a few different packages, tools, and workflows have been developed to address this problem, from formattable through to R Markdown and Displayr, to name a few. Over the past few months I have found myself increasingly using R to write tables in pure HTML. Why? Because pure HTML gives the greatest level of control. In this post I am going to work through a simple but easily generalizable example, which can be used both within R and RStudio and when building interactive dashboards.
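As a minimal base-R sketch of the pure-HTML approach (a hypothetical helper, not the post's exact code): build the table tags with paste0, then render the resulting string as HTML in RStudio or a dashboard.

```r
# Turn a data frame into a pure-HTML table string using only base R.
make_html_table <- function(df) {
  header <- paste0("<tr>", paste0("<th>", names(df), "</th>", collapse = ""), "</tr>")
  rows <- apply(df, 1, function(r) {
    paste0("<tr>", paste0("<td>", r, "</td>", collapse = ""), "</tr>")
  })
  paste0("<table>", header, paste0(rows, collapse = ""), "</table>")
}

html <- make_html_table(head(mtcars[, 1:3], 2))
# In a Shiny or flexdashboard context the string can be rendered with,
# e.g., htmltools::HTML(html); here we just inspect it:
cat(html)
```

Because the output is a plain string of HTML, every attribute, class, and style is under your control, which is the point of skipping higher-level table packages.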
My Self-Driving Presentation for TTS
Here is the presentation I gave at the Mozilla All-Hands in Orlando about https://github.com/mozilla/TTS
Anomaly detection on Amazon DynamoDB Streams using the Amazon SageMaker Random Cut Forest algorithm
Have you considered introducing anomaly detection technology to your business? Anomaly detection is a technique used to identify rare items, events, or observations which raise suspicion by differing significantly from the majority of the data you are analyzing. The applications of anomaly detection are wide-ranging, including the detection of abnormal purchases or cyber intrusions in banking, spotting a malignant tumor in an MRI scan, identifying fraudulent insurance claims, finding unusual machine behavior in manufacturing, and even detecting strange patterns in network traffic that could signal an intrusion.
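The post itself uses Amazon SageMaker's Random Cut Forest algorithm; as a much simpler stand-in that illustrates the underlying idea (flagging observations that differ sharply from the rest), here is a robust z-score detector in R. This is not the RCF algorithm, just a sketch of the concept:

```r
# Flag observations whose robust z-score exceeds a threshold.
detect_anomalies <- function(x, threshold = 3.5) {
  med <- median(x)
  mad_x <- mad(x)  # median absolute deviation, scaled to be sd-consistent
  z <- abs(x - med) / mad_x
  which(z > threshold)
}

set.seed(42)
traffic <- c(rnorm(100, mean = 50, sd = 5), 120)  # one obvious spike at the end
detect_anomalies(traffic)  # the spike at index 101 is flagged
```

Random Cut Forest works very differently (it scores points by how easily random partitions isolate them, so it handles multi-dimensional streams), but the input/output contract is similar: observations in, anomaly scores or flags out.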
Magister Dixit
“Current machine learning systems operate, almost exclusively, in a statistical, or model-free mode, which entails severe theoretical limits on their power and performance. Such systems cannot reason about interventions and retrospection and, therefore, cannot serve as the basis for strong AI. To achieve human level intelligence, learning machines need the guidance of a model of reality, similar to the ones used in causal inference tasks.” Judea Pearl (July 2018)
Kick Start Your Data Career! Tips From the Frontline
By Vaishali Lambe, Data Scientist
Learn to do Data Viz in R
One of the reasons that R is a top language for data science is that it’s great for data visualization. R users can take advantage of the wildly popular ggplot2 package to turn massive data sets into easily readable charts in just a few lines of code. That can be incredibly valuable for presenting your data, but more importantly, when it’s done right, data viz is a tool for helping you understand what the data is telling you.
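A few-lines example of what that looks like with ggplot2, using the built-in mtcars data set:

```r
library(ggplot2)

# Scatter plot of weight vs. fuel efficiency, colored by cylinder count
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point(size = 2) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")
p  # printing the object draws the chart
```

Three short layers – data mapping, geometry, labels – and the grouping by cylinder count is immediately visible, which is the kind of "what is the data telling you" insight the paragraph above is describing.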
Document worth reading: “A Comparison of Techniques for Language Model Integration in Encoder-Decoder Speech Recognition”
Attention-based recurrent neural encoder-decoder models present an elegant solution to the automatic speech recognition problem. This approach folds the acoustic model, pronunciation model, and language model into a single network and requires only a parallel corpus of speech and text for training. However, unlike in conventional approaches that combine separate acoustic and language models, it is not clear how to use additional (unpaired) text. While there has been previous work on methods addressing this problem, a thorough comparison among methods is still lacking. In this paper, we compare a suite of past methods and some of our own proposed methods for using unpaired text data to improve encoder-decoder models. For evaluation, we use the medium-sized Switchboard data set and the large-scale Google voice search and dictation data sets. Our results confirm the benefits of using unpaired text across a range of methods and data sets. Surprisingly, for first-pass decoding, the rather simple approach of shallow fusion performs best across data sets. However, for Google data sets we find that cold fusion has a lower oracle error rate and outperforms other approaches after second-pass rescoring on the Google voice search data set. A Comparison of Techniques for Language Model Integration in Encoder-Decoder Speech Recognition