Data Trusts
Data is at its most powerful when it is interconnected. A major challenge for modern data is the interconnection of different data types to obtain a fuller picture of the data subject. Questions about an individual’s mental health, for example, might benefit from interlinking social media data with the medical record. Obviously, such data would be extremely sensitive.
A tour of Factor: 2
Parsing words
Finding Similar Sounding Names – Some Basics
Since my wife and I have a baby on the way, we’ve spent a lot of time thinking about names lately. We’ve pored over dozens of lists of thousands of names, we’ve used sites and other tools, we’ve researched histories - everything. And we’ve found that most of the tools weren’t terribly helpful.
Blending independent estimates
Using Xcode with GitHub
You’ve found a nice open-source project you want to play with on GitHub. You’ve cloned it to your own repository and use Xcode 7 as your development environment. How do you make Xcode and GitHub play nicely with each other?
How to make a good data-driven web app
Developing a successful app or project is no easy task; there are always more moving parts than you’d expect. Even beyond the technical pieces, there are the ever-important elements of getting the word out, making your app easy to use, and making sure it’s solving the right problem in the first place.
Maximum Likelihood estimates follow a normal distribution
I was quite surprised when I learnt that a maximum likelihood estimate asymptotically follows a normal distribution, with the mean being the true parameter value (estimated in practice by the MLE itself) and the variance being the inverse of the product of the Fisher information and the number of observations, i.e. 1/(n·I(θ)).
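As a rough illustration of that claim (not taken from the original post), here is a small simulation sketch. It assumes an exponential model with a known true rate, chosen only because its MLE (one over the sample mean) and its Fisher information (1/rate² per observation) are easy to write down. The function names `simulate_mle_rates` and `asymptotic_variance` are made up for this example.

```python
import random
import statistics


def simulate_mle_rates(true_rate, n_obs, n_trials, seed=0):
    """Draw n_trials samples of size n_obs from Exponential(true_rate)
    and return the maximum likelihood estimate of the rate for each."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_trials):
        sample = [rng.expovariate(true_rate) for _ in range(n_obs)]
        # For the exponential model, the MLE of the rate is 1 / sample mean.
        estimates.append(1.0 / statistics.fmean(sample))
    return estimates


def asymptotic_variance(true_rate, n_obs):
    """Variance predicted by the asymptotic result: 1 / (n * I(rate))."""
    # Fisher information per observation for Exponential(rate) is 1 / rate**2.
    fisher_info = 1.0 / true_rate ** 2
    return 1.0 / (n_obs * fisher_info)


if __name__ == "__main__":
    rate, n = 2.0, 200
    estimates = simulate_mle_rates(rate, n, n_trials=2000)
    print("mean of estimates:    ", statistics.fmean(estimates))
    print("empirical variance:   ", statistics.variance(estimates))
    print("1 / (n * Fisher info):", asymptotic_variance(rate, n))
```

With a few hundred observations per sample, the empirical variance of the estimates should land close to the predicted value, and a histogram of `estimates` should look roughly bell-shaped around the true rate. The agreement is only approximate at finite sample sizes, since the result is an asymptotic one.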
Hyperparameter optimization with approximate gradient
TL;DR: I describe a method for hyperparameter optimization by gradient descent.
Adobe Analytics Clickstream Data Feed: Calculations and Outlier Analysis
In a previous post, I outlined how to load daily Adobe Analytics Clickstream data feeds into a PostgreSQL database. While this isn’t a long-term scalable solution for large e-commerce companies doing millions of page views per day, for exploratory analysis a relational database structure can work well until a more robust solution is put into place (such as Hadoop/Spark).