SunJackson Blog

Why pandas users should be excited about Apache Arrow

Reposted from: http://wesmckinney.com/blog/pandas-and-apache-arrow/

Wes McKinney

Posted on 2016-02-22

Mon 22 February 2016

Read more »

Calling RSiteCatalyst From Python

Reposted from: http://randyzwitch.com/rsitecatalyst-adobe-analytics-python/

Unknown

Posted on 2016-02-22

This will be a very short post, because the only “new” information I’m going to provide is the minimal example to answer the question. Yes, it is in fact possible to call RSiteCatalyst from Python, and it seems to work well. The most important things are 1) making sure you install rpy2 and 2) loading pandas (since so much of RSiteCatalyst consists of API calls that return data frames). It doesn’t hurt to already have experience using RSiteCatalyst in R, since all we’re doing here is using Python to pass code to R.

Read more »

SAGA algorithm in the lightning library

Reposted from: http://fa.bianp.net/blog/2016/saga-algorithm-in-the-lightning-library/

Fabian Pedregosa

Posted on 2016-02-21

Recently I’ve implemented, together with Arnaud Rachez, the SAGA algorithm in the lightning machine learning library (which, by the way, has recently been moved to the new scikit-learn-contrib project). The lightning library uses the same API as scikit-learn but is particularly adapted to online learning. As for the SAGA algorithm, its performance is similar to that of other variance-reduced stochastic algorithms such as SAG or SVRG, but it has the advantage over SAG that it allows non-smooth penalty terms (such as $\ell_1$ regularization). It is implemented in lightning as SAGAClassifier and SAGARegressor.
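To make the idea of a variance-reduced update with a non-smooth penalty concrete, here is a minimal pure-Python sketch of SAGA on an $\ell_1$-penalized least-squares problem. It is not lightning's implementation: the function names `soft_threshold` and `saga_lasso`, the squared loss, the step size 1/(3L), and the toy data are all assumptions chosen for illustration.

```python
import random


def soft_threshold(v, t):
    """Proximal operator of t * |v|, i.e. the l1 penalty (illustrative helper)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def saga_lasso(X, y, beta=0.1, n_epochs=200, seed=0):
    """Hypothetical SAGA sketch for
    (1/n) * sum_i 0.5 * (x_i . w - y_i)^2 + beta * ||w||_1."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    # Assumed step size 1/(3L), with L the largest squared row norm.
    lipschitz = max(sum(v * v for v in row) for row in X)
    step = 1.0 / (3.0 * lipschitz)
    w = [0.0] * d
    # For a linear model the gradient of sample i is residual_i * x_i,
    # so storing one scalar residual per sample is enough.
    memory = [-yi for yi in y]  # residuals at the starting point w = 0
    grad_avg = [sum(memory[i] * X[i][j] for i in range(n)) / n for j in range(d)]
    for _ in range(n_epochs * n):
        i = rng.randrange(n)
        residual = sum(X[i][j] * w[j] for j in range(d)) - y[i]
        delta = residual - memory[i]
        for j in range(d):
            # Variance-reduced direction: new gradient - stored gradient + average.
            direction = delta * X[i][j] + grad_avg[j]
            # The proximal step handles the non-smooth l1 term.
            w[j] = soft_threshold(w[j] - step * direction, step * beta)
            grad_avg[j] += delta * X[i][j] / n
        memory[i] = residual
    return w


# Toy data: the target depends only on the first feature.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.0, 2.0, -2.0, -2.0]
w = saga_lasso(X, y, beta=0.1)
print(w)
```

For this orthogonal toy design the penalized solution can be worked out by hand as w = (2 - beta, 0), so the printed weights should sit close to that point, with the irrelevant second coefficient shrunk to zero by the proximal step.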

Read more »

Data Science Learning Club Update

Reposted from: https://www.becomingadatascientist.com/2016/02/20/data-science-learning-club-update/

Renee

Posted on 2016-02-21

For anyone who hasn’t yet joined the Becoming a Data Scientist Podcast Data Science Learning Club, I thought I’d write up a summary of what we’ve been doing!

Read more »

Learning in Brains and Machines (1): Temporal Differences

Reposted from: http://blog.shakirm.com/2016/02/learning-in-brains-and-machines-1/

shakirm

Posted on 2016-02-21

Read more »

Two nugget problem

Reposted from: http://datagenetics.com/blog/february42016/index.html

Unknown

Posted on 2016-02-21

On a table in front of you are sixteen gold nuggets. You have been told that you can take any two of them as a gift. As is human nature, you wish to take the two heaviest nuggets.

Read more »

Introducing Apache Arrow: A Fast, Interoperable In-Memory Columnar Data Structure Standard

Reposted from: http://blog.cloudera.com/blog/2016/02/introducing-apache-arrow-a-fast-interoperable-in-memory-columnar-data-structure-standard/

Justin Kestelyn

Posted on 2016-02-18

Engineers from across the Apache Hadoop community are collaborating to establish Arrow as a de facto standard for columnar in-memory processing and interchange. Here’s how it works.

Read more »

Why Blog?

Reposted from: http://swanintelligence.com/why-blog.html

Dan T.

Posted on 2016-02-18

The first blog that really caught me was Coding Horror by Jeff Atwood. He is also one of the cofounders of Stack Overflow and Discourse. I was super late to the party when I discovered it in 2012. At the time I worked as a programmer at an enterprise software company and was constantly frustrated with the shitty code I was maintaining and writing myself. So Jeff hit a nerve when he wrote in “The Great Enterprise Software Swindle”:

Read more »

Making Python on Apache Hadoop Easier with Anaconda and CDH

Reposted from: http://blog.cloudera.com/blog/2016/02/making-python-on-apache-hadoop-easier-with-anaconda-and-cdh/

Justin Kestelyn

Posted on 2016-02-17

Enabling Python development on CDH clusters (for PySpark, for example) is now much easier thanks to new integration with Continuum Analytics’ Python platform (Anaconda).

Read more »

Confluent Platform

Reposted from: http://rnduja.github.io/2016/02/15/confluent_platform_initial_evaluation/

Unknown

Posted on 2016-02-15

The goal of this blog post is to evaluate the Confluent Platform.

Read more »