Mon 22 February 2016
Calling RSiteCatalyst From Python
This will be a very short post, because the only “new” information I’m going to provide is the minimal example that answers the question. Yes, it is in fact possible to call RSiteCatalyst from Python, and it seems to work well. The most important steps are 1) installing rpy2 and 2) loading Pandas (since so much of RSiteCatalyst consists of API calls that return data frames). It doesn’t hurt to already have experience using RSiteCatalyst in R, since all we’re doing here is using Python to pass code to R.
SAGA algorithm in the lightning library
Recently I’ve implemented, together with Arnaud Rachez, the SAGA algorithm in the lightning machine learning library (which, by the way, has recently been moved to the new scikit-learn-contrib project). The lightning library uses the same API as scikit-learn but is particularly adapted to online learning. As for the SAGA algorithm, its performance is similar to that of other variance-reduced stochastic algorithms such as SAG or SVRG, but it has an advantage over SAG: it allows non-smooth penalty terms (such as $\ell_1$ regularization). It is implemented in lightning as SAGAClassifier and SAGARegressor.
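To make the point about non-smooth penalties concrete, here is a minimal NumPy sketch of SAGA applied to $\ell_1$-regularized least squares (the lasso). It is not lightning's SAGAClassifier or SAGARegressor code: the function names `saga_lasso` and `soft_threshold`, the squared loss, and the step size of 1/(3L) (a choice commonly used with SAGA) are assumptions for illustration. The $\ell_1$ term is handled by a proximal (soft-thresholding) step after each variance-reduced gradient step, which is what lets SAGA cope with a penalty that plain SAG cannot.

```python
import numpy as np


def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1 (elementwise soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def saga_lasso(A, b, alpha, n_epochs=50, seed=0):
    """Sketch: minimize (1/2n) * ||A w - b||^2 + alpha * ||w||_1 with SAGA.

    Hypothetical helper for illustration, not the lightning implementation.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w = np.zeros(d)
    # For squared loss, grad f_i(w) = (a_i . w - b_i) * a_i, so storing the
    # scalar residual per sample is enough to rebuild each stored gradient.
    memory = A @ w - b
    # Running average of all stored gradients.
    grad_avg = A.T @ memory / n
    # Lipschitz constant of the worst individual f_i.
    L = np.max(np.sum(A ** 2, axis=1))
    step = 1.0 / (3.0 * L)
    for _ in range(n_epochs * n):
        j = rng.integers(n)
        new = A[j] @ w - b[j]
        # Fresh gradient of f_j minus its stored (old) gradient.
        diff = (new - memory[j]) * A[j]
        # SAGA direction (diff + grad_avg), then prox step for the l1 penalty.
        w = soft_threshold(w - step * (diff + grad_avg), step * alpha)
        # Update the table of stored gradients and its average.
        grad_avg += diff / n
        memory[j] = new
    return w
```

Storing a single residual per sample works because each squared-loss gradient is a multiple of its feature row, which keeps SAGA's gradient table at O(n) memory for linear models.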
Data Science Learning Club Update
For anyone who hasn’t yet joined the Becoming a Data Scientist Podcast Data Science Learning Club, I thought I’d write up a summary of what we’ve been doing!
Learning in Brains and Machines (1): Temporal Differences
Two nugget problem
On a table in front of you are sixteen gold nuggets. You have been told that you can take any two of them as a gift. As is human nature, you wish to take the two heaviest nuggets.
Introducing Apache Arrow: A Fast, Interoperable In-Memory Columnar Data Structure Standard
Engineers from across the Apache Hadoop community are collaborating to establish Arrow as a de-facto standard for columnar in-memory processing and interchange. Here’s how it works.
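As a rough illustration of what "columnar in-memory" means in practice, here is a simplified pure-Python sketch of how a format in the spirit of Arrow lays out a nullable string column: a validity bitmap (one bit per slot, least-significant bit first), an offsets array with one more entry than there are values, and a single contiguous data buffer. This is not the pyarrow API; the function names are hypothetical, and real Arrow uses fixed-width integer offsets (typically 32-bit) and padded, aligned buffers rather than Python lists.

```python
from typing import List, Optional, Tuple


def encode_string_column(values: List[Optional[str]]) -> Tuple[bytes, List[int], bytes]:
    """Hypothetical helper: encode strings as (validity bitmap, offsets, data)."""
    bitmap = bytearray((len(values) + 7) // 8)
    offsets = [0]
    data = bytearray()
    for i, v in enumerate(values):
        if v is not None:
            # Bit i (LSB-first within each byte) marks slot i as valid.
            bitmap[i // 8] |= 1 << (i % 8)
            data += v.encode("utf-8")
        # A null slot gets a zero-length span (offset repeats).
        offsets.append(len(data))
    return bytes(bitmap), offsets, bytes(data)


def get_value(bitmap: bytes, offsets: List[int], data: bytes, i: int) -> Optional[str]:
    """Hypothetical helper: read slot i back out of the three buffers."""
    if not (bitmap[i // 8] >> (i % 8)) & 1:
        return None
    return data[offsets[i]:offsets[i + 1]].decode("utf-8")
```

For `["arrow", None, "hadoop"]` this yields the bitmap `b"\x05"`, offsets `[0, 5, 5, 11]`, and data `b"arrowhadoop"`. Because the strings sit back to back in one buffer, scanning the column touches contiguous memory, and a null costs one bit plus a repeated offset.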
Why Blog?
The first blog that really caught me was Coding Horror by Jeff Atwood, who is also one of the cofounders of Stack Overflow and Discourse. I was super late to the party when I discovered it in 2012. At the time I worked as a programmer at an enterprise software company and was constantly frustrated with the shitty code I was maintaining and writing myself. So Jeff hit a nerve when he wrote in The Great Enterprise Software Swindle:
Making Python on Apache Hadoop Easier with Anaconda and CDH
Enabling Python development on CDH clusters (for PySpark, for example) is now much easier thanks to new integration with Continuum Analytics’ Python platform (Anaconda).
Confluent Platform
The goal of this blog post is to evaluate the Confluent Platform.