“discover feature relationships” – new EDA tool

I’ve built a new Exploratory Data Analysis tool, I used it in a few presentations last year with the code on github and have now (finally) published it to PyPI.

The goal is to quickly check in a DataFrame using machine learning (sklearn’s Random Forests) if any column predicts any other column. I’m interested in the question “what relationships exist in my data” – particularly if I’m working in an unknown domain and on new data. I’ve used this on client projects during the discovery phase to learn more about the sort of questions I should ask a client.

The GitHub Readme includes a screenshot which will give you an idea using the Titanic classification and Boston regression examples.

This is a very light project at the moment, I think the idea has value, I’m very open to feedback.