How to score 0.8134 in the Titanic Kaggle Challenge

This post is an opportunity to share my solution with you.

To make this tutorial more “academic” so that anyone can benefit from it, I will start with an exploratory data analysis (EDA), then move on to feature engineering, and finally present the predictive model I set up.

Throughout this Jupyter notebook, I will be using Python at each level of the pipeline.

The main libraries involved in this tutorial are:

  • Pandas for data manipulation and ingestion

  • Matplotlib and seaborn for data visualization

  • NumPy for multidimensional array computing

  • scikit-learn (sklearn) for machine learning and predictive modeling

Installation procedure

A very easy way to install these packages is to download and install the Anaconda distribution, which bundles them all. This distribution is available on all major platforms (Windows, Linux and macOS).
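If you want to confirm the setup before going further, a quick check along these lines may help. It is only a minimal sketch and not part of the original notebook: the helper name `check_packages` is made up for illustration, and it assumes the libraries are imported under the names `pandas`, `matplotlib`, `seaborn`, `numpy` and `sklearn`.

```python
import importlib

# Import names of the libraries listed above
# (sklearn is the import name of scikit-learn).
REQUIRED = ["pandas", "matplotlib", "seaborn", "numpy", "sklearn"]


def check_packages(names):
    """Map each package name to its version string, or None if it cannot be imported.

    Hypothetical helper, written for this illustration only.
    """
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # Missing package or broken installation: treat both as unavailable.
            versions[name] = None
        else:
            # Most packages expose __version__; fall back to a placeholder otherwise.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


def report(names=REQUIRED):
    """Print one line per package with its version or a warning."""
    for name, version in check_packages(names).items():
        print(f"{name}: {version if version is not None else 'NOT INSTALLED'}")
```

Calling `report()` in a notebook cell prints one line per library. Any line marked `NOT INSTALLED` points to a package that still needs to be installed before running the rest of the tutorial.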

Nota Bene

This is my first attempt as a blogger and as a machine learning practitioner.

If you have a question about the code or the hypotheses I made, do not hesitate to post a comment in the comment section below.

If you have a suggestion on how this notebook could be improved, please reach out to me as well.

This tutorial is available on my GitHub account.

Hope you’ve got everything set on your computer. Let’s get started.