At the PyCon 2018 conference, I presented a tutorial called “Using pandas for Better (and Worse) Data Science”. Through a series of exercises, I demonstrated pandas best practices to help students become more fluent at answering data science questions with pandas and at avoiding data science errors.
I split the tutorial into 10 videos. The first video introduces the tutorial and the dataset, and the other nine videos contain the exercises we discuss. I recommend that you watch the videos in order:
- Introducing the dataset (19:40)
- Removing columns (6:27)
- Comparing groups (8:42)
- Examining relationships (8:44)
- Handling missing values (5:02)
- Using string methods (5:55)
- Combining dates and times (9:11)
- Plotting a time series (8:48)
- Creating useful plots (8:47)
- Fixing bad data (16:31)
If you want to follow along with the exercises at home, you can download the dataset and code from GitHub. The dataset was collected by the Stanford Open Policing Project and includes a decade of traffic stop data from the state of Rhode Island.
This is an intermediate tutorial, so if you’re brand new to pandas, I recommend that you start with my other video series, Easier data analysis in Python with pandas.
Please enjoy the series, and I hope to hear from you in the comments section!
1. Introducing the dataset (19:40)
This video covers the following topics: reading a CSV file, DataFrame shape, data types, NaN, missing values, booleans.
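To make these topics concrete, here is a minimal sketch that reads a CSV, checks its shape and data types, and counts missing values. The three rows are made up. The column names are assumptions modeled on the traffic stops dataset, not an excerpt from it. Reading from a string with io.StringIO stands in for reading the real file from disk.

```python
import io

import pandas as pd

# Made-up rows; column names are assumptions modeled on the traffic stops data.
csv_text = """stop_date,stop_time,county_name,driver_gender,violation,search_conducted
2005-01-02,01:55,,M,Speeding,False
2005-01-18,08:15,,M,Speeding,False
2005-01-23,23:15,,F,Equipment,True
"""

# io.StringIO lets read_csv treat the string as if it were a file.
ri = pd.read_csv(io.StringIO(csv_text))

print(ri.shape)           # (number of rows, number of columns)
print(ri.dtypes)          # the data type pandas inferred for each column
print(ri.isnull().sum())  # how many NaN (missing) values each column holds
```

The empty county_name column is read as NaN. The search_conducted column contains only True and False, so pandas parses it as booleans.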
2. Removing columns (6:27)
This video covers the following topics: missing values, dropping a column, axis parameter, inplace parameter, dropna method.
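The sketch below drops a column that holds no data and then removes rows that are missing a value. The rows are hypothetical, and the column names are assumptions modeled on the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: county_name is entirely missing, driver_gender is partly missing.
ri = pd.DataFrame({
    "county_name": [np.nan, np.nan, np.nan],
    "driver_gender": ["M", np.nan, "F"],
    "violation": ["Speeding", "Speeding", "Equipment"],
})

# axis="columns" tells drop to look for a column label, not a row label;
# inplace=True modifies ri directly instead of returning a new DataFrame.
ri.drop("county_name", axis="columns", inplace=True)

# Remove only the rows where driver_gender is missing.
ri.dropna(subset=["driver_gender"], inplace=True)

print(ri.shape)  # → (2, 2)
```

You can get the same outcome without inplace by reassigning the result, as in ri = ri.drop("county_name", axis="columns").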
3. Comparing groups (8:42)
This video covers the following topics: filtering a DataFrame, value_counts method, normalization, groupby method.
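Here is a minimal sketch of comparing groups in two ways: filter to one group, or use groupby to cover every group at once. In both cases, value_counts(normalize=True) returns proportions rather than counts. The data and column names are illustrative assumptions.

```python
import pandas as pd

# Made-up stops; the column names are assumptions modeled on the dataset.
ri = pd.DataFrame({
    "driver_gender": ["M", "M", "F", "F", "M"],
    "violation": ["Speeding", "Equipment", "Speeding", "Speeding", "Speeding"],
})

# Filter to one group, then look at proportions instead of raw counts.
female = ri[ri["driver_gender"] == "F"]
print(female["violation"].value_counts(normalize=True))

# groupby makes the same comparison for every group at once.
by_gender = ri.groupby("driver_gender")["violation"].value_counts(normalize=True)
print(by_gender)
```

Normalizing matters here because the groups differ in size (two female stops, three male stops), so raw counts would not be directly comparable.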
4. Examining relationships (8:44)
This video covers the following topics: value_counts method, math with booleans, groupby with multiple columns, correlation versus causation.
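"Math with booleans" refers to the fact that the mean of a True/False column is the fraction of True values. The sketch below combines that idea with groupby over two columns. The rows are hypothetical, and search_conducted is an assumed column name.

```python
import pandas as pd

# Hypothetical rows; search_conducted is a boolean column.
ri = pd.DataFrame({
    "driver_gender": ["M", "M", "F", "F", "M", "F"],
    "violation": ["Speeding", "Speeding", "Speeding", "Equipment", "Equipment", "Equipment"],
    "search_conducted": [True, False, False, False, True, True],
})

# The mean of a boolean Series is the fraction of True values.
print(ri["search_conducted"].mean())  # → 0.5

# Search rate for every combination of violation and gender.
rates = ri.groupby(["violation", "driver_gender"])["search_conducted"].mean()
print(rates)
```

A difference in rates between groups shows a correlation only. The groupby result by itself cannot explain what causes that difference.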
5. Handling missing values (5:02)
This video covers the following topics: math with booleans, value_counts method, filtering a DataFrame, dropna parameter.
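By default, value_counts leaves out missing values, so it can hide NaN. The sketch below shows what dropna=False reveals, and then filters rows with a boolean column. The values are made up, and the column names are assumptions.

```python
import numpy as np
import pandas as pd

# Made-up rows: search_type is NaN whenever no search was conducted.
ri = pd.DataFrame({
    "search_conducted": [False, True, False, True],
    "search_type": [np.nan, "Incident to Arrest", np.nan, "Probable Cause"],
})

# value_counts skips NaN by default; dropna=False counts it as well.
print(ri["search_type"].value_counts(dropna=False))

# After filtering to searched stops, no search_type should be missing.
searched = ri[ri["search_conducted"]]
print(searched["search_type"].isnull().sum())  # → 0
```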
6. Using string methods (5:55)
This video covers the following topics: searching strings, math with booleans, value_counts method, dropna parameter.
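When a column can hold several comma-separated values, searching the strings is more reliable than matching them exactly. This sketch uses str.contains to build a boolean column and then sums and averages it. The values and the new frisk column are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical search_type values; one stop can list several reasons.
ri = pd.DataFrame({
    "search_type": [
        "Incident to Arrest,Protective Frisk",
        "Probable Cause",
        np.nan,
        "Protective Frisk",
    ],
})

# str.contains flags every value that mentions a frisk; na=False treats NaN as "no".
ri["frisk"] = ri["search_type"].str.contains("Protective Frisk", na=False)

print(ri["frisk"].sum())   # → 2 (number of rows mentioning a frisk)
print(ri["frisk"].mean())  # → 0.5 (fraction of rows mentioning a frisk)
```

Without na=False, str.contains returns NaN for missing values. Calling value_counts(dropna=False) on that result would show those NaN values alongside True and False.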
7. Combining dates and times (9:11)
This video covers the following topics: string slicing, string concatenation, converting to datetime format, datetime attributes, value_counts method.
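Dates and times are often stored as separate string columns. The sketch below slices one of them, concatenates the two, and parses the result with pd.to_datetime. The column names and values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical stop_date and stop_time columns stored as strings.
ri = pd.DataFrame({
    "stop_date": ["2005-01-02", "2005-06-18", "2006-01-23"],
    "stop_time": ["01:55", "08:15", "23:15"],
})

# String slicing: the first four characters of each date are the year.
print(ri["stop_date"].str.slice(0, 4).value_counts())

# Concatenate date and time with a space, then parse the result as a datetime.
combined = ri["stop_date"] + " " + ri["stop_time"]
ri["stop_datetime"] = pd.to_datetime(combined)

# Datetime attributes such as year and hour are available through .dt.
print(ri["stop_datetime"].dt.year.value_counts())
```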
8. Plotting a time series (8:48)
This video covers the following topics: math with booleans, groupby method, datetime attributes, line plots.
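To see how a rate changes over the day, you can group by a datetime attribute and take the mean of a boolean column. This is a minimal sketch on made-up rows. The column names stop_datetime and is_arrested are assumptions.

```python
import pandas as pd

# Made-up stops with a parsed datetime and a boolean outcome column.
ri = pd.DataFrame({
    "stop_datetime": pd.to_datetime([
        "2005-01-02 01:55", "2005-01-03 01:10",
        "2005-01-04 08:15", "2005-01-05 08:40",
    ]),
    "is_arrested": [True, False, False, False],
})

# Group by hour of day; the mean of a boolean column is a rate.
hourly_rate = ri.groupby(ri["stop_datetime"].dt.hour)["is_arrested"].mean()
print(hourly_rate)
```

If matplotlib is installed, calling hourly_rate.plot() draws the result as a line plot, because line is the default plot kind for a Series.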
9. Creating useful plots (8:47)
This video covers the following topics: datetime attributes, value_counts method, line plots, sorting, groupby method.
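A line plot of counts over time is only useful if its index is in time order. This sketch shows two ways to get yearly counts in chronological order. The datetimes are hypothetical.

```python
import pandas as pd

# Hypothetical stop datetimes spanning three years, deliberately out of order.
stop_datetime = pd.Series(pd.to_datetime([
    "2006-03-01", "2005-07-04", "2007-01-15", "2006-11-30", "2006-05-05",
]))

# value_counts orders by frequency, so sort_index restores chronological order,
# which is the order a line plot of a time series should follow.
stops_per_year = stop_datetime.dt.year.value_counts().sort_index()
print(stops_per_year)

# groupby sorts its keys by default, so it yields the same counts in year order.
same_counts = stop_datetime.groupby(stop_datetime.dt.year).size()
print(same_counts)
```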
10. Fixing bad data (16:31)
This video covers the following topics: value_counts method, filtering by multiple conditions, missing values, NaN, loc accessor, SettingWithCopyWarning.
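One way to fix bad values is to select them with multiple conditions and overwrite them with NaN in a single .loc assignment. This is a minimal sketch. The stop_duration values are made up, and treating "1" and "2" as the bad entries is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Made-up stop_duration values; "1" and "2" stand in for bad entries.
ri = pd.DataFrame({
    "stop_duration": ["0-15 Min", "16-30 Min", "1", "2", "0-15 Min"],
})
print(ri["stop_duration"].value_counts())

# Combine conditions with | and wrap each condition in parentheses.
bad = (ri["stop_duration"] == "1") | (ri["stop_duration"] == "2")

# A single .loc call selects rows and column together and writes into ri itself.
# Chained indexing such as ri[bad]["stop_duration"] = np.nan writes to a
# temporary copy instead: it can raise SettingWithCopyWarning and leave ri unchanged.
ri.loc[bad, "stop_duration"] = np.nan

print(ri["stop_duration"].value_counts(dropna=False))
```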
P.S. Want to be the first to know when I launch an online course about pandas? Subscribe to the Data School newsletter.