Renee Teate interviews Clare Corthell, founding partner of summer.ai (now Luminant Data) and creator of the Open Source Data Science Masters curriculum, about becoming a data scientist. Podcast audio links: Link to podcast Episode 5 audio; Podcast’s RSS feed for podcast subscription apps; Podcast on Stitcher; Podcast on iTunes.
Developing effective data scientists
Last year I had the honor and pleasure of visiting my graduate school alma mater to speak at the kick-off event for the University of Michigan’s new Institute for Data Science. They invited me to share my work and teaching experiences, and offer guidance to students and educators of data science. I asked a bunch of my data scientist friends what they could have done differently to better prepare themselves for their careers, and there were a lot of similarities in the responses. I’ve attempted to distill them into three core competencies: communication, coding, and curiosity… and alliteration, when possible.
d20 stopping puzzle
Another die-rolling probability puzzle this week. This puzzle involves a single d20 (a twenty-sided die), something that anyone who has played Dungeons and Dragons will be intimately familiar with. Typically they are made in the shape of an icosahedron, the most complex of the Platonic solids. The puzzle goes like this: We play a game. You roll the die, and can elect to bank or roll again. If you bank, you walk away with the dollar amount shown on the die, and the game ends. If you elect to re-roll, it costs you $1 for each new roll. You can re-roll as often as you like. (Your first roll is free.) What is your expected return?
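One way to explore the d20 puzzle numerically is sketched below. It assumes the best strategy has the form "bank any roll at or above some threshold, otherwise pay $1 and re-roll", which is natural here because the game looks the same before every roll. The function names and the exact-fraction approach are illustrative choices, not taken from the original puzzle write-up.

```python
from fractions import Fraction


def threshold_value(threshold, sides=20, reroll_cost=1):
    """Expected return when banking any roll >= threshold and re-rolling otherwise."""
    bank_faces = range(threshold, sides + 1)
    p_reroll = Fraction(threshold - 1, sides)
    # V = sum(banked faces) / sides + p_reroll * (V - reroll_cost), solved for V.
    banked_part = Fraction(sum(bank_faces), sides)
    return (banked_part - p_reroll * reroll_cost) / (1 - p_reroll)


def best_threshold(sides=20, reroll_cost=1):
    """Try every possible threshold and keep the one with the highest value."""
    return max(
        range(1, sides + 1),
        key=lambda t: threshold_value(t, sides, reroll_cost),
    )


best = best_threshold()
value = threshold_value(best)
print(best, value, float(value))
```

Under these assumptions the search settles on banking at 15 or higher, for an expected return of 91/6, or roughly $15.17.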
The Best of Unpublished Machine Learning and Statistics Books
Nowadays authors in the fields of statistics and machine learning often choose to write their books openly by publishing early draft versions. For popular books this creates a lot of feedback and in the end clearly improves the final book when it is published.
Implement spelling correction using Language Models
Spelling correction is not a trivial task for a computer, and increasingly capable models are being developed to tackle it. Language models are the kind of model used for this task. They are also used for correcting errors in speech recognition and machine translation, for language and authorship identification, for text compression, and for topic relevance ranking. In this article, language models are used to build a simple spelling correction application.
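To make the idea concrete, here is a minimal sketch of a corrector driven by a unigram language model, in the spirit of Peter Norvig's well-known spelling corrector. It is not the linked article's implementation: the toy corpus, the function names, and the restriction to candidates one edit away are all assumptions made for illustration.

```python
from collections import Counter

# Toy corpus (an assumption for illustration); a real model needs far more text.
CORPUS = "the quick brown fox jumps over the lazy dog the dog barks at the fox"
COUNTS = Counter(CORPUS.split())
TOTAL = sum(COUNTS.values())
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def probability(word):
    """Unigram language model: relative frequency of the word in the corpus."""
    return COUNTS[word] / TOTAL


def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def correct(word):
    """Return the most probable known word within one edit, else the input."""
    if word in COUNTS:
        return word
    candidates = [w for w in edits1(word) if w in COUNTS]
    if not candidates:
        return word
    return max(candidates, key=probability)


print(correct("teh"), correct("dgo"), correct("lazi"))
```

The language model's only job in this sketch is to rank the candidate corrections; a richer model (for example, one that conditions on neighbouring words) could be swapped in for `probability` without changing the candidate generation.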
Paris Meetup slides: Topic Modeling of Twitter Followers
Semantic analysis of document collections
Russian Roulette
Most people are familiar with the game of Russian Roulette. However, just in case, here’s a recap of the vanilla variant: You take a revolver which has six empty chambers, insert a single cartridge, and then give the cylinder a good spin. When the cylinder stops (presumably in a random position), you pull the trigger. CLICK! If the chamber was empty, you live. If the round stopped in the firing position, you die.
Amazon Redshift Performance – Bigger Clusters, or Bigger Nodes?
Last week, I looked into options for increasing the performance of an Amazon Redshift cluster that was currently using 10 large dense compute nodes (dc1.large). While investigating, I noticed that a cluster of 32 dc1.large nodes (the maximum for that node type) had the same number of CPUs, the same amount of storage, and a comparable amount of RAM to a cluster of 2 dc1.8xlarge nodes (the minimum for that node type), while offering significant cost savings for anything less than a 3-year term. This got me wondering… if you want a cluster that’s around that size, which is a better bet for performance?
Class visualization with bilateral filters
A while ago I played with style visualizations and bilateral filters. The latter have the nice property of filtering out noise while preserving edges. Here are some example classes from GoogLeNet (Inception network). Big shout-out to Audun M. Øygard and Kyle McDonald, who were among the first to use filters (e.g. Gaussian blurs) essentially as image regularizers for single-class visualizations. The visualizations here were directly inspired by their ideas.
Six Roll Dice Game
You visit a casino, and are offered the chance to play a dice game. The rules are very simple: A single (fair) die is rolled up to six times. At the end of every roll, you are given the chance to “Bank”. If you bank, you win the dollar amount shown on the die, and the game stops. If you elect not to bank, you are committed to roll the die again (you cannot go back). If you do not bank during the first five rolls, then you will automatically bank after the sixth roll and receive whatever it shows. Question: How much would you pay to play this game? (Specifically, what is the expected outcome of this game? What is the optimal play strategy?)
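The six-roll game can be worked out by backward induction: start from the forced final roll and step back one roll at a time, banking only when the face shown beats the expected value of continuing. The sketch below assumes a standard six-sided die; the function name and the use of exact fractions are illustrative choices rather than part of the original puzzle.

```python
from fractions import Fraction


def six_roll_values(rolls=6, sides=6):
    """Return values[k] = expected outcome with k + 1 rolls remaining."""
    faces = range(1, sides + 1)
    # On the final roll you must bank, so the value is the plain average.
    value = Fraction(sum(faces), sides)
    values = [value]
    for _ in range(rolls - 1):
        # With more rolls in hand, bank a face only if it beats continuing.
        value = sum(max(Fraction(face), value) for face in faces) / sides
        values.append(value)
    return values


values = six_roll_values()
for remaining, value in enumerate(values, start=1):
    print(remaining, value, float(value))
```

With all six rolls available the expected outcome is 1709/324, about $5.27. The continuation values also give the strategy: on the first roll bank only a 6, on rolls two through four bank a 5 or 6, on the fifth roll bank a 4 or better, and on the sixth roll take whatever shows.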