Roughly speaking, my machine learning journey began on Kaggle. “There’s data, a model (i.e. estimator) and a loss function to optimize,” I learned. “Regression models predict continuous-valued real numbers; classification models predict ‘red,’ ‘green,’ ‘blue.’ Typically, the former employs the mean squared error or mean absolute error; the latter, the cross-entropy loss. Stochastic gradient descent updates the model’s parameters to drive these losses down.” Furthermore, to fit these models, just import sklearn.
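That recipe, one estimator for continuous targets and one for class labels, can be sketched in a few lines of scikit-learn. The data below is made up purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])

y_reg = np.array([0.1, 0.9, 2.1, 2.9])  # continuous targets -> regression (squared error)
y_clf = np.array([0, 0, 1, 1])          # class labels -> classification (cross-entropy)

# "Just import sklearn": both models share the same fit/predict interface.
reg = LinearRegression().fit(X, y_reg)
clf = LogisticRegression().fit(X, y_clf)

print(reg.predict([[4.0]]))  # extrapolated continuous value
print(clf.predict([[3.0]]))  # predicted class label
```

The uniform `fit`/`predict` interface is exactly what makes the “just import sklearn” advice work in practice.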
Parallel computation with two lines of code
This is naive advice aimed at absolute beginners, but I’m sure I’ll copy-paste snippets from here over and over again.
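The post’s own snippets aren’t reproduced here, but the title’s promise is roughly what the standard library’s `concurrent.futures` delivers: a loop parallelized in about two lines (a sketch, not the post’s actual code):

```python
from concurrent.futures import ThreadPoolExecutor

def slow_square(x):
    # stand-in for any expensive, independent computation
    return x * x

# the "two lines": create a pool, then map the work across it
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(slow_square, range(8)))

print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

For CPU-bound Python work, `ProcessPoolExecutor` has the same interface and sidesteps the GIL.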
Topic Modeling Workshop
I recently had the pleasure of leading a workshop on topic modeling as part of the Master’s program in computational methods and content analysis at Université Paris-Est Marne-la-Vallée.
Python Deep Learning tutorial: Elman RNN implementation in Tensorflow
In this Python deep learning tutorial, an implementation and explanation are given for an Elman RNN. The implementation is done in TensorFlow, one of the many Python deep learning libraries.
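The tutorial’s TensorFlow code isn’t reproduced here, but the Elman recurrence itself, h_t = tanh(W_xh·x_t + W_hh·h_{t-1} + b_h), is small enough to sketch in NumPy (the sizes and random weights below are illustrative, not from the tutorial):

```python
import numpy as np

def elman_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One Elman recurrence step: h_t = tanh(W_xh x_t + W_hh h_prev + b_h)."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

rng = np.random.default_rng(0)
n_in, n_hid, T = 3, 5, 4           # illustrative input size, hidden size, sequence length
W_xh = 0.1 * rng.normal(size=(n_hid, n_in))
W_hh = 0.1 * rng.normal(size=(n_hid, n_hid))
b_h = np.zeros(n_hid)

h = np.zeros(n_hid)                # initial hidden state
states = []
for t in range(T):
    x_t = rng.normal(size=n_in)    # one input vector per time step
    h = elman_step(x_t, h, W_xh, W_hh, b_h)
    states.append(h)
```

The hidden state carries information forward across time steps, which is the whole point of the recurrence.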
Create conda recipe to use C extended Python library on PySpark cluster with Cloudera Data Science Workbench
Cloudera Data Science Workbench provides data scientists with secure access to enterprise data with Python, R, and Scala. In the previous article, we introduced how to use your favorite Python libraries on an Apache Spark cluster with PySpark. Data scientists often want to use Python libraries, such as XGBoost, that include C/C++ extensions. This post shows how to solve this problem by creating a conda recipe with a C extension. The sample repository is here.
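A conda recipe centers on a meta.yaml file. A hypothetical skeleton for a package with a C/C++ extension might look like the following; the package name, version, and dependencies are placeholders, not the post’s actual recipe:

```yaml
package:
  name: mypkg          # placeholder: your library's name
  version: "1.0.0"     # placeholder version

source:
  path: ..             # build from the local source tree

requirements:
  build:
    - "{{ compiler('cxx') }}"  # pulls in the C/C++ toolchain for the extension
    - python
    - setuptools
  run:
    - python
    - numpy

test:
  imports:
    - mypkg            # smoke test: the compiled extension imports cleanly
```

Building with `conda build` then yields a package whose compiled extension can be shipped to every node of the PySpark cluster.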
Normal Distributions
I review — and provide derivations for — some basic properties of Normal distributions. Topics currently covered: (i) Their normalization, (ii) Samples from a univariate Normal, (iii) Multivariate Normal distributions, (iv) Central limit theorem.
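The first item, normalization, reduces to the classic Gaussian integral; in LaTeX form:

```latex
% Gaussian integral underlying the normalization:
\int_{-\infty}^{\infty} e^{-x^{2}/2}\, dx = \sqrt{2\pi}
% hence the univariate Normal density integrates to one:
\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^{2}}}
  \exp\!\left( -\frac{(x-\mu)^{2}}{2\sigma^{2}} \right) dx = 1
```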
Voronoi Diagrams
Getting Started with Cloudera Data Science Workbench
Last week, Cloudera announced the General Availability release of Cloudera Data Science Workbench. In this post, I’ll give a brief overview of its capabilities and architecture, along with a quick-start guide to connecting Cloudera Data Science Workbench to your existing CDH cluster in three simple steps.
Transfer Learning for Flight Delay Prediction via Variational Autoencoders
In this work, we explore improving a vanilla regression model with knowledge learned elsewhere. As a motivating example, consider the task of predicting the number of checkins a given user will make at a given location. Our training data consist of checkins from 4 users across 4 locations in the week of May 1st, 2017.
The Benefits of Migrating HPC Workloads To Apache Spark
Recently we worked with a customer that needed to run a very large number of models each day to satisfy internal and government-regulated risk requirements: several thousand model executions per hour, with total execution time a critical concern. In the past the customer used thousands of servers to meet this demand. They need to run many derivations of the model with different economic factors; for example, a financial model may calculate risk to the bank across many runs with varying economic factors. This particular model was planned to consume up to 40K CPU cores once in production. The reason for so many cores is simple: the jobs must complete as quickly as possible so that the business, and sometimes the government, can test the varying economic factors that affect a financial institution. The cycle in which these jobs run is very compressed and allows very little room for error.