Inspired by David Schoch’s blog post,Traveling Beerdrinker Problem.Check out his blog, he has some amazing posts!
Introduction
Luxembourg, as any proper European country, is full of castles. According to Wikipedia,
“By some optimistic estimates, there are as many as 130 castles in Luxembourg but more realisticallythere are probably just over a hundred, although many of these could be considered large residencesor manor houses rather than castles”.
I see the editors are probably German or French, calling our castles manor houses! They only saythat because Luxembourg is small, so our castles must be small too, right?
Banter aside, with that many castles, what is the best way to visit them all? And by best way I meanshortest way. This is a classical Travelling salesman problem. To solve this, I need the following elements:
-
A list of castles to visit, with their coordinates
-
The distances between these castles to each other
-
A program to solve the TSP
Let’s start by loading some packages:
1 |
|
First step; scrape the data.
Scraping the data (that’s the data science part)
Let’s start by having a list of castles. For this, I go to the French Wikipedia page ofLuxembourguish castles.
The Luxembourguish page is more exhaustive,but the names are in Luxembourguish, and I doubt thatOpenStreetMap, which I’ll use to get the coordinates, understands Luxembourguish.
This list has around 50 castles, a reasonable amount of castles. Scraping the table is quite easy:
1 |
|
I also add a query
column which concatenates the name of the castle (“Nom”) to where it is found(“Localité”). The query should be a better choice that simply the castle name to get the coordinates.
Now, I need to add the coordinates to this data frame. For this, I use a function I found onlinethat gets the coordinates from OpenStreetMap:
1 |
|
I can now easily add the coordinates by mapping the nominatim_osm()
function to thequery
column I built before:
1 |
|
Let’s take a look at castles_osm
:
1 |
|
1 |
|
I now clean the data. There were several mistakes or castles that where not found, which I addedmanually. I did not notice these mistakes immediately, but when I computed the distances matrix Inotices several inconsistencies; 0’s in positions other than the diagonal, as well as NAs. So I wentback to the raw data and corrected what was wrong, this time by looking at Google Maps. Thankfullythere were not that many mistakes. Below the whole workflow:
1 |
|
In the end, I have 48 castles, 2 of them were not found neither by OpenStreetMap norGoogle Maps.
Now I can get the distances matrix. For this, I opened an account at Graphhopperand used their Matrix API. When you opena free account, you get a standard account for free for two weeks, which was perfect for this littleexercise.
To use the Matrix API you can make a call with curl from your terminal, like this:
1 |
|
To use this from R, I use the {curl}
package and the curl_download()
function to download andwrite the output to disk.
I built the url like this. First, the “points” part:
1 |
|
Click if you want to see the “points” string
1 |
|
1 |
|
Then, I added my key, and pasted these elements together to form the correct url:
1 |
|
Then, I get the matrix like this:
1 |
|
Let’s take a look at the object:
1 |
|
Click if you want to see the distance object
1 |
|
1 |
|
distances
is a list where the first element is the distances from the first castle to all the others.Let’s make it a matrix:
1 |
|
Click if you want to see the distance matrix
1 |
|
1 |
|
Let’s baptize the rows and columns:
1 |
|
Now that we have the data, we can solve the TSP.
Solving the Travelling salesman problem (that’s the combinatorial optimization part)
Let’s first coerce the distances_matrix
to an ATSP
object, which is needed for the solver.ATSP
stands for asymmetrical TSP. Asymmetrical because the distances_matrix
is not symmetric,meaning that going from Castle A to Castle B is longer than going from Castle B to Castle A (forexample).
1 |
|
I then define a list of all the available methods:
1 |
|
And solve the problem with all the methods:
1 |
|
1 |
|
I do this because the results vary depending on the methods, and I want to be exhaustive (solvingthis problem is quite fast, so there’s no reason not to do it):
1 |
|
solutions_df
is a data frame with the order of the castles to visit in rows and the method usedin columns.
Click if you want to see the solutions
1 |
|
1 |
|
Now, let’s extract the tour lengths, see which one is the minimum, then plot it.
1 |
|
1 |
|
The total length of the tour is 474 kilometers(that’s 295 miles). Before plotting the data, let’sre-order it according to the solution:
1 |
|
Plot the solution
To plot the solution, I first use a data frame I created with the longitude and latitude ofLuxembourguish communes, from the geojson
file available on theOpenData Portal.I converted it to a data frame because it is easier to manipulate this way. The code to do that isin the appendix of this blog post:
1 |
|
1 |
|
Now I can use {ggplot2}
to create the map with the tour. I use geom_polygon()
to build the map,geom_point()
to add the castles, geom_path()
to connect the points according to the solution Ifound and geom_point()
again to highlight the starting castle:
1 |
|
The white point is the starting point of the tour. As a bonus, let’s do the same plot withoutpoints, but castles emojis instead (using the {ggimage}
package):
1 |
|
1 |
|
It’s horrible.
Hope you enjoyed! If you found this blog post useful, you might want to followme on twitter for blog post updates andbuy me an espresso.