Deep Learning Dead-End?
Customizing Docker Images in Cloudera Data Science Workbench
This article shows how to build and publish a customized Docker image for use as an engine in Cloudera Data Science Workbench. Customizing the engine this way lets you work with your favorite toolchain inside the web-based application.
QuantConnect – the only Game in Town
At least in my town. Some time back I decided on Quantopian as my backtesting platform of choice, with QuantConnect a close second. Now, a few months later, Quantopian has decided to end live trading, so there is no choice but to go back to the second option. QuantConnect has several advantages of its own; it was a pretty close decision even when there was a choice. Now there is little choice, which makes things easier.
Joining ASAPP
The open-source “masters” has come to a close. I’m now joining ASAPP, Inc. as a Machine Learning Engineer.
Semantic trees for training word embeddings with hierarchical softmax
Word vector models represent each word in a vocabulary as a vector in a continuous space such that words that share the same context are “close” together. Closeness is measured using a distance metric or similarity measure such as the Euclidean distance or cosine similarity. Once word vectors have been trained on a large corpus, one can form document vectors to compare documents based on their content similarity. A central question is how to obtain “good” word vectors in the first place. For this, various models based on neural networks have been proposed, one of the most popular being word2vec. In the “continuous-bag-of-words” (CBOW) architecture of word2vec, word vectors are trained by predicting the central word of a sliding window given its neighbouring words. This is formulated as a classification problem, where the correct central word has to be selected from the full vocabulary given the context. Usually one would use a softmax classifier as the top layer of such a network. However, for the softmax the training time grows linearly in the number of possible outcomes, making the method unsuitable for large vocabularies.
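As a toy illustration of this scaling argument (the function names and tiny inputs below are invented for the sketch, not taken from word2vec's implementation): a flat softmax has to touch every vocabulary entry to normalise a single prediction, whereas hierarchical softmax only evaluates one sigmoid per internal node on the path from the tree root to the target word — O(log V) instead of O(V).

```python
import math

def flat_softmax_prob(scores, target):
    # Full softmax: the normalisation sum visits every vocabulary
    # entry, so each prediction costs O(V).
    exps = [math.exp(s) for s in scores]
    return exps[target] / sum(exps)

def hierarchical_softmax_prob(path_scores, path_directions):
    # Hierarchical softmax: one sigmoid per internal node on the
    # root-to-leaf path, so each prediction costs O(depth) = O(log V).
    # path_directions[i] is +1 to branch one way (probability
    # sigma(score)) and -1 to branch the other (sigma(-score)).
    prob = 1.0
    for score, d in zip(path_scores, path_directions):
        prob *= 1.0 / (1.0 + math.exp(-d * score))
    return prob
```

With a balanced binary tree over V leaves the path length is about log2(V), so a million-word vocabulary needs roughly 20 sigmoid evaluations per word instead of a million-term normalisation sum; since sigma(x) + sigma(-x) = 1 at every node, the leaf probabilities still sum to 1.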
Making Smart Phones Dumb Again
** Thu 07 September 2017
What Killed the Curse of Dimensionality?
How does Deep Learning overcome this hurdle in machine learning and why?
Deep Learning with Intel’s BigDL and Apache Spark
Cloudera recently published a blog post on how to use Deeplearning4J (DL4J) along with Apache Hadoop and Apache Spark to get state-of-the-art results on an image recognition task. Continuing in a similar vein, in this post we discuss a viable alternative that is specifically designed to be used with Spark, and with data available in Spark and Hadoop clusters, via a Scala or Python API.
Software patents are evil, but BSD+Patents is probably not the solution
** Tue 05 September 2017