My wife has a superpower! She is totally immune to the constant nagging of her iPhone. She has an amazing ability to resist checking her email every 5 minutes, texting back on the spot, and playing the whack-a-notification game all day long. Maybe it’s in her DNA. After all, some genes do influence predisposition to addiction. I used to be annoyed by her nonchalance. Now I’m just envious.
Superbowl Helmet Puzzle
It’s Superbowl LII day today. Superbowls, clock-faces, and occasionally fancy documents still use Roman Numerals (along with Kings, Queens, Popes and the Olympic Games).
Hiring Data Scientists
Chicago’s a big city that feels small – everyone seems only a degree or two away from one another. This feels especially true within Chicago’s tech and data science communities.
A Practical Guide to the "Open-Source Machine Learning Masters"
As an “in” professional discipline, Machine Learning exhibits a curious behavior: though talent is frustratingly scarce, it’s immensely easy for the individual to obtain. Why? It’s all available online, (largely) free for the taking.
Getting Started With MapD, Part 1: Docker Install and Loading Data
It’s been nearly five years since I wrote about Getting Started with Hadoop for big data. In those years, there have been incremental improvements in columnar file formats and dramatic computation speed improvements with Apache Spark, but I still wouldn’t call the Hadoop ecosystem convenient for actual data analysis. During this same time period, thanks to NVIDIA and their CUDA library for general-purpose calculations on GPUs, graphics cards went from enabling visuals on a computer to enabling massively-parallel calculations as well.
Static Blog: Jekyll, Hyde and GitHub Pages
This post is a short tutorial on setting up a static Jekyll blog using GitHub Pages.
Moravec's Paradox
Counting Efficiently with Bounter pt. 2: CountMinSketch
In my previous post on the new open source Python Bounter library we discussed how we can use its HashTable to quickly count approximate item frequencies in very large item sequences. Now we turn our attention to the second algorithm in Bounter, CountMinSketch (CMS), which is also optimized in C for top performance.
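To make the idea behind a count-min sketch concrete, here is a minimal pure-Python sketch. It is not Bounter's C implementation and does not use Bounter's API: the class name `TinyCountMinSketch`, its `width` and `depth` parameters, and the choice of salted BLAKE2b hashing are all assumptions made for illustration. Each item is hashed into one counter per row, and the estimate is the minimum over those counters, so it can overcount on collisions but never undercounts.

```python
import hashlib


class TinyCountMinSketch:
    """Illustrative count-min sketch (hypothetical, not Bounter's implementation)."""

    def __init__(self, width=1024, depth=4):
        self.width = width    # counters per row
        self.depth = depth    # number of independent hash rows
        self.table = [[0] * width for _ in range(depth)]

    def _buckets(self, item):
        # One bucket per row; the row index is used as the hash salt
        # so that each row behaves like a different hash function.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                item.encode("utf-8"),
                digest_size=8,
                salt=row.to_bytes(8, "little"),
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def increment(self, item, count=1):
        for row, col in self._buckets(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Collisions only ever add to a counter, so the smallest
        # counter is the tightest upper bound on the true count.
        return min(self.table[row][col] for row, col in self._buckets(item))


sketch = TinyCountMinSketch()
for word in "a b a c a b".split():
    sketch.increment(word)
```

Memory use is fixed at `width * depth` counters no matter how many distinct items are seen, which is the trade-off that makes this structure attractive for very large item sequences.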
k-server, part 3: entropy regularization for weighted k-paging
If you have been following the first two posts (post 1, post 2), now is the time to reap the rewards! I will show here how to obtain an $O(\log k)$-competitive algorithm for (weighted) paging, i.e., when the metric space corresponds to the leaves of a weighted star. This was viewed as a breakthrough result 10 years ago (with a JACM publication by Bansal, Buchbinder and Naor in 2012), and for good reason, as this simplest instance of $k$-server was in fact the one studied in the seminal paper by Sleator and Tarjan in 1985, which introduced the competitive analysis of online algorithms (to be precise, Sleator and Tarjan considered the unweighted case, for which an $O(\log k)$-competitive algorithm was known much earlier).
Neural Networks and the generalisation problem
Over the last few weeks, a robust debate has been taking place online about the prospects that Deep Learning neural networks would lead to advances in the quest for Artificial General Intelligence.