Consider the following facts:
Trustworthy Data Analysis
Roger Peng ** 2018/06/04
rqdatatable: rquery Powered by data.table
rquery is an R package for specifying data transforms using piped Codd-style operators. It has already shown great performance on PostgreSQL and Apache Spark. rqdatatable is a new package that supplies a screaming fast implementation of the rquery system in-memory using the data.table package.
Data Links
The sentiment has visibly declined: there are far fewer tweets praising deep learning as the ultimate algorithm, and the papers are becoming less “revolutionary” and much more “evolutionary”. DeepMind hasn’t shown anything breathtaking since AlphaGo Zero [and even that wasn’t that exciting, given the obscene amount of compute necessary and applicability to games only - see Moravec’s paradox]. OpenAI has been rather quiet, with their last media outburst being the Dota 2 playing agent [which I suppose was meant to create as much buzz as AlphaGo, but fizzled out rather quickly]. Articles have even begun to appear suggesting that Google does not know what to do with DeepMind, as its results are apparently not as practical as originally expected… As for the prominent researchers, they have generally been touring around, meeting with government officials in Canada or France to secure their future grants; Yann LeCun even stepped down (rather symbolically) from Head of Research to Chief AI Scientist at Facebook. This gradual shift from rich, big corporations to government-sponsored institutes suggests to me that the interest in this kind of research within these corporations (I am thinking of Google and Facebook) is actually slowly winding down. Again, these are all early signs, nothing spoken out loud, just the body language.
Lucy's Secret Number puzzle
Since there are four questions, and each answer can be yes or no, there are sixteen possible combinations of answers (Think of these like binary bits of a four bit number).
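The counting argument can be checked with a short sketch. This is only an illustration of the "four yes/no answers, sixteen combinations" idea, not the puzzle's solution; the function names `answer_combinations` and `as_bits` are made up for this example.

```python
from itertools import product


def answer_combinations(n_questions=4):
    """Return every yes/no answer pattern for n_questions questions."""
    # False stands for "no", True for "yes".
    return list(product((False, True), repeat=n_questions))


def as_bits(answers):
    """Render one answer pattern as a bit string: 1 for yes, 0 for no."""
    return "".join("1" if a else "0" for a in answers)


combos = answer_combinations()
codes = [as_bits(c) for c in combos]

print(len(combos))          # 16
print(codes[0], codes[-1])  # 0000 1111
```

Each answer pattern maps to exactly one four-bit number from 0 to 15, which is why the count is 2^4 = 16.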
About a month ago, on a whim, I posted the #CraftyDataViz contest, hoping for some beautiful and wacky homemade visualizations, and you all sure came through! The entries were gorgeous and the judging was super difficult!
Bulk Loading Shapefiles Into Postgres/Postgis
Recently I’ve been doing a fair bit of work with geospatial data, mostly on the data preparation side. While there are common data formats, I have found that because so much of this data is sourced from government agencies, it is often split across many files that can be concatenated.
Python and Tidyverse
Introduction
Parallel, Disk-Efficient .zip to .gz Conversion
Similar to my last post about needing to merge shapefiles using Postgis, I recently downloaded a bunch of energy data from the federal government. 13,370 files to be exact. While the data size itself isn’t that large (~8GB, compressed), an open-source tool I was looking to evaluate only supports gzip compression instead of the zip compressed files I actually had.
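One way to approach this kind of conversion is sketched below, using only the Python standard library. It is not necessarily the method the post itself uses; the names `zip_to_gz` and `convert_all` and the `workers` parameter are assumptions made for illustration. Each archive member is streamed straight from the zip into a gzip file, so the uncompressed data never has to be written to disk.

```python
import gzip
import shutil
import zipfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def zip_to_gz(zip_path, out_dir):
    """Stream every file inside one .zip archive into its own .gz file."""
    zip_path, out_dir = Path(zip_path), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Assumption: member names are unique across all archives;
            # otherwise later files overwrite earlier ones.
            target = out_dir / (Path(info.filename).name + ".gz")
            with archive.open(info) as src, gzip.open(target, "wb") as dst:
                # Copy in chunks so no uncompressed file is stored on disk.
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written


def convert_all(zip_paths, out_dir, workers=4):
    """Convert several archives concurrently using a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        batches = pool.map(lambda p: zip_to_gz(p, out_dir), zip_paths)
        return [path for batch in batches for path in batch]
```

A call such as `convert_all(sorted(Path("downloads").glob("*.zip")), "gz_out")` would process a hypothetical `downloads` directory. Threads are used here for simplicity; a process pool may suit CPU-bound compression better on some machines.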
A crystal clear book draw
As you might know, every month, a random Locke Data Twitter follower wins an excellent data science book! This month’s gift was “An Introduction to Statistical Learning: with Applications in R”, a classic and useful textbook. In this post I’ll give you some magick-al tips from behind-the-scenes of this month’s winner announcement. It’ll feature learning from my mistakes, and reading from a crystal ball… or more seriously, image manipulation in R!