Manipulate dates easily with {lubridate}

There are several helpful functions included in {lubridate} to convert columns to dates. For instanceif the column you want to convert is of the form “2012-11-21”, then you would use the function ymd(),for “year-month-day”. If, however the column is “2012-21-11”, then you would use ydm(). There’sa few of these helper functions, and they can handle a lot of different formats for dates. In our case,having the name of the month instead of the number might seem quite problematic, but it turns outthat this is a case that {lubridate} handles painfully:

1
library(lubridate)
1
2
## 
## Attaching package: 'lubridate'
1
2
3
## The following object is masked from 'package:base':
## 
## date
1
2
independence <- independence %>%
 mutate(independence_date = dmy(independence_date))
1
## Warning: 5 failed to parse.

Some dates failed to parse, for instance for Morocco. This is because these countries have severalindependence dates; this means that the string to convert looks like:

1
2
3
4
"2 March 1956
7 April 1956
10 April 1958
4 January 1969"

which obviously cannot be converted by {lubridate} without further manipulation. I ignore these cases forsimplicity’s sake.

Let’s take a look at the data now:

1
independence
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
## # A tibble: 54 x 6
## country colonial_name colonial_power independence_da… first_head_of_s…
## 
## 1 Liberia Liberia United States 1847-07-26 Joseph Jenkins …
## 2 South … Cape Colony … United Kingdom 1910-05-31 Louis Botha 
## 3 Egypt Sultanate of… United Kingdom 1922-02-28 Fuad I 
## 4 Eritrea Italian Erit… Italy 1947-02-10 Haile Selassie 
## 5 Libya British Mili… United Kingdo… 1951-12-24 Idris 
## 6 Sudan Anglo-Egypti… United Kingdo… 1956-01-01 Ismail al-Azhari
## 7 Tunisia French Prote… France 1956-03-20 Muhammad VIII a…
## 8 Morocco French Prote… France Spain NA Mohammed V 
## 9 Ghana Gold Coast United Kingdom 1957-03-06 Kwame Nkrumah 
## 10 Guinea French West … France 1958-10-02 Ahmed Sékou Tou…
## # ... with 44 more rows, and 1 more variable:
## # independence_won_through 

As you can see, we now have a date column in the right format. We can now answer questions such asWhich countries gained independence before 1960? quite easily, by using the functions year(),month() and day(). Let’s see which countries gained independence before 1960:

1
2
3
independence %>%
 filter(year(independence_date) <= 1960) %>%
 pull(country)
1
2
3
4
5
6
7
8
9
10
11
12
## [1] "Liberia" "South Africa" 
## [3] "Egypt" "Eritrea" 
## [5] "Libya" "Sudan" 
## [7] "Tunisia" "Ghana" 
## [9] "Guinea" "Cameroon" 
## [11] "Togo" "Mali" 
## [13] "Madagascar" "Democratic Republic of the Congo"
## [15] "Benin" "Niger" 
## [17] "Burkina Faso" "Ivory Coast" 
## [19] "Chad" "Central African Republic" 
## [21] "Republic of the Congo" "Gabon" 
## [23] "Mauritania"

You guessed it, year() extracts the year of the date column and converts it as a numeric so that we can workon it. This is the same for month() or day(). Let’s try to see if countries gained their independence onChristmas Eve:

1
2
3
4
independence %>%
 filter(month(independence_date) == 12,
 day(independence_date) == 24) %>%
 pull(country)
1
## [1] "Libya"

Seems like Libya was the only one! You can also operate on dates. For instance, let’s compute the difference betweentwo dates, using the interval() column:

1
2
3
4
independence %>%
 mutate(today = lubridate::today()) %>%
 mutate(independent_since = interval(independence_date, today)) %>%
 select(country, independent_since)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
## # A tibble: 54 x 2
## country independent_since 
## 
## 1 Liberia 1847-07-26 UTC--2018-12-15 UTC
## 2 South Africa 1910-05-31 UTC--2018-12-15 UTC
## 3 Egypt 1922-02-28 UTC--2018-12-15 UTC
## 4 Eritrea 1947-02-10 UTC--2018-12-15 UTC
## 5 Libya 1951-12-24 UTC--2018-12-15 UTC
## 6 Sudan 1956-01-01 UTC--2018-12-15 UTC
## 7 Tunisia 1956-03-20 UTC--2018-12-15 UTC
## 8 Morocco NA--NA 
## 9 Ghana 1957-03-06 UTC--2018-12-15 UTC
## 10 Guinea 1958-10-02 UTC--2018-12-15 UTC
## # ... with 44 more rows

The independent_since column now contains an interval object that we can convert to years:

1
2
3
4
5
independence %>%
 mutate(today = lubridate::today()) %>%
 mutate(independent_since = interval(independence_date, today)) %>%
 select(country, independent_since) %>%
 mutate(years_independent = as.numeric(independent_since, "years"))
1
2
3
4
5
6
7
8
9
10
11
12
13
14
## # A tibble: 54 x 3
## country independent_since years_independent
## 
## 1 Liberia 1847-07-26 UTC--2018-12-15 UTC 171. 
## 2 South Africa 1910-05-31 UTC--2018-12-15 UTC 109. 
## 3 Egypt 1922-02-28 UTC--2018-12-15 UTC 96.8
## 4 Eritrea 1947-02-10 UTC--2018-12-15 UTC 71.8
## 5 Libya 1951-12-24 UTC--2018-12-15 UTC 67.0
## 6 Sudan 1956-01-01 UTC--2018-12-15 UTC 63.0
## 7 Tunisia 1956-03-20 UTC--2018-12-15 UTC 62.7
## 8 Morocco NA--NA NA 
## 9 Ghana 1957-03-06 UTC--2018-12-15 UTC 61.8
## 10 Guinea 1958-10-02 UTC--2018-12-15 UTC 60.2
## # ... with 44 more rows

We can now see for how long the last country to gain independence has been independent.Because the data is not tidy (in some cases, an African country was colonized by two powers,see Libya), I will only focus on 4 European colonial powers: Belgium, France, Portugal and the United Kingdom:

1
2
3
4
5
6
7
independence %>%
 filter(colonial_power %in% c("Belgium", "France", "Portugal", "United Kingdom")) %>%
 mutate(today = lubridate::today()) %>%
 mutate(independent_since = interval(independence_date, today)) %>%
 mutate(years_independent = as.numeric(independent_since, "years")) %>%
 group_by(colonial_power) %>%
 summarise(last_colony_independent_for = min(years_independent, na.rm = TRUE))
1
2
3
4
5
6
7
## # A tibble: 4 x 2
## colonial_power last_colony_independent_for
## 
## 1 Belgium 56.5
## 2 France 41.5
## 3 Portugal 43.1
## 4 United Kingdom 42.5

{lubridate} contains many more functions. If you often work with dates, duration or interval data, {lubridate}is a package that you have to master.

Hope you enjoyed! If you found this blog post useful, you might want to followme on twitter for blog post updates andbuy me an espresso or paypal.me.