Sometimes a package’s functions don’t do things quite the way you want, and you have to either rebuild the function or work with it as-is and add code around it to solve your issue.
I’ve had to do this recently with the googleway package and its google_distance() function, so I wanted to take you through, step by step, how I wrote code to go from a function that handles a single value to one that handles many inputs and returns 4 rows per input. I won’t be dwelling on how to write a function specifically, just showing you the workflow I often go through.
Requirements
Key functionality we’ll need today is:
- googleway for providing the base function
- the tidyverse, namely purrr and dplyr, for lots of the data manipulation
- memoise for caching requests so we spend less cash
Google Distance
To calculate distances we can use the Google Distance Matrix API.
It needs an API key. Note that this service does not have a free tier; however, it is ~$5 per 1,000 requests and a trial of Google Cloud is available.
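One way to supply the key without hard-coding it in a script is to read it from an environment variable. This is only a sketch: the variable name GOOGLE_DISTANCE_KEY is made up for illustration, while set_key() is googleway's helper for registering a key against a particular API.

```r
library(googleway)

# Assumption: the key lives in an environment variable called
# GOOGLE_DISTANCE_KEY (a hypothetical name), e.g. set in ~/.Renviron.
api_key <- Sys.getenv("GOOGLE_DISTANCE_KEY")

if (nzchar(api_key)) {
  # Register the key so google_distance() can pick it up by default
  set_key(key = api_key, api = "distance")
} else {
  message("No key found; set GOOGLE_DISTANCE_KEY before calling the API.")
}
```

Keeping the key out of the script also makes it harder to publish it by accident alongside a blog post or repo.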
Then we need to prep our desired information.
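As a rough sketch of what that prep could look like, the snippet below sets up a couple of origins, one destination, and a 9am arrival time on the next Monday. The addresses and object names are invented for illustration and are not the ones used in the original post.

```r
# Hypothetical inputs: swap in your own addresses
origins <- c("Cardiff Central Station, Cardiff, UK",
             "Newport Station, Newport, UK")
destination <- "Bristol Temple Meads, Bristol, UK"

# Work out the date of the next Monday (never today, even on a Monday).
# format(x, "%u") gives the ISO weekday number: Monday = 1, Sunday = 7.
today <- Sys.Date()
days_ahead <- (8 - as.integer(format(today, "%u"))) %% 7
if (days_ahead == 0) days_ahead <- 7
next_monday <- today + days_ahead

# 09:00 on that Monday, in the session's local time zone
arrive_by <- as.POSIXct(paste(next_monday, "09:00:00"))
```

A POSIXct value like arrive_by is the kind of object google_distance() accepts for its arrival_time argument.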
Handling google_distance()
The API is set up to work with a single address at a time, so we need to do a bit of prep here to make it work with lots of addresses.
For starters, we can use the memoise package to cache results so if we send the same address multiple times it doesn’t need to go back to the API. Phew, since that API costs money to call!
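To see the caching behaviour without spending anything, here is a small sketch that memoises a stand-in function rather than the real API call. fake_distance() and the api_calls counter are invented for the demo; the commented line at the end shows how the same idea would apply to google_distance().

```r
library(memoise)

# Stand-in for a paid API call: counts how often it is really executed
api_calls <- 0
fake_distance <- function(origin, destination) {
  api_calls <<- api_calls + 1
  nchar(origin) + nchar(destination)  # dummy "result"
}

# memoise() returns a version of the function that caches by argument values
cached_distance <- memoise(fake_distance)

first  <- cached_distance("Cardiff", "Bristol")  # runs fake_distance()
second <- cached_distance("Cardiff", "Bristol")  # served from the cache

# With the real package the same idea would be:
# google_distance_cached <- memoise(googleway::google_distance)
```

After both calls api_calls is still 1, because the second request never reached the underlying function. By default the cache lives in memory, so it only lasts for the current R session.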
Giving this a go with a single example lets us see what Google gives us.
The possibly() function means that if a call errors, it doesn’t break everything and we won’t have to start all over again.
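Here is a minimal illustration of possibly() on its own, using a made-up function that fails on empty input in place of a real API call. risky_lookup() and safe_lookup() are hypothetical names.

```r
library(purrr)

# Stand-in for an API call that fails on bad input
risky_lookup <- function(address) {
  if (!nzchar(address)) stop("address is empty")
  nchar(address)
}

# possibly() wraps the function so errors return `otherwise` instead
safe_lookup <- possibly(risky_lookup, otherwise = NA_integer_)

good <- safe_lookup("Cardiff Central Station")  # normal result
bad  <- safe_lookup("")                         # NA instead of an error
```

Choosing an `otherwise` value of the same type as a normal result keeps later steps simple, since failed calls can be filtered out with is.na().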
Then, to make the function work over multiple addresses, we need to change it slightly. The map() function will iterate over all the addresses.
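One way this step might look is sketched below. distance_many() is a hypothetical name, and the `fetch` argument is an addition for this example so that the function can be tried offline with a stub; by default it would call googleway's google_distance().

```r
library(purrr)

# Call `fetch` once per origin; a failed call yields NULL instead of an error
distance_many <- function(origins, destination,
                          fetch = googleway::google_distance, ...) {
  safe_fetch <- possibly(fetch, otherwise = NULL)
  results <- map(origins, function(o) {
    safe_fetch(origins = o, destinations = destination, ...)
  })
  set_names(results, origins)
}

# Offline demo with a stub standing in for the API (no key, no cost)
stub_fetch <- function(origins, destinations, ...) {
  if (identical(origins, "bad address")) stop("lookup failed")
  list(status = "OK")
}
demo <- distance_many(c("good address", "bad address"), "work",
                      fetch = stub_fetch)
```

The result is a list with one element per address, named by address, where the failed lookup is simply NULL.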
So our code is working over multiple cases and handling bad inputs pretty well, but how do we get something meaningful out of it? Looking at the data, what we get back is part of a table that contains the response.
We see that there seems to be no way someone can use public transport between the two locations. Perhaps another way of getting there will return a result?
When a commute is possible, we get a response back that includes the number of seconds it might take someone to travel to work for 9am on a Monday.
First of all, we’ll need to reliably extract this information from a batch of responses. This takes multiple steps due to the way the API gives us info.
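The extraction could be sketched along these lines. The mock responses below are hand-built lists that imitate the nesting a Distance Matrix response has (rows, then elements, then duration, then value); treat that shape as an assumption and check it against a real response. extract_seconds() is a hypothetical helper name.

```r
library(purrr)

# Pull the travel time in seconds out of one response, or NA if absent
extract_seconds <- function(response) {
  value <- pluck(response, "rows", "elements", 1, "duration", "value")
  if (is.null(value)) NA_real_ else as.numeric(value)[1]
}

# Mock responses (assumed shape): one with a route, one without
mock_ok <- list(
  rows = list(elements = list(
    list(duration = list(text = "25 mins", value = 1500), status = "OK")
  )),
  status = "OK"
)
mock_none <- list(
  rows = list(elements = list(list(status = "ZERO_RESULTS"))),
  status = "OK"
)

# A NULL entry mimics a call that failed and was caught by possibly()
responses <- list(mock_ok, mock_none, NULL)
seconds <- map_dbl(responses, extract_seconds)
```

Because pluck() returns NULL when any level is missing, the same helper copes with successful lookups, lookups with no route, and failed calls.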
So now we’re going to need to ask about the different travel modes for each address to find out the range of values, in order to cope with “ZERO_RESULTS” records. Once we have this information, we can then use the google_distance_all function to find out how long it’ll take someone to drive, walk, cycle, or use public transport to travel between two points.
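A rough sketch of the four-rows-per-address idea follows. It is not the google_distance_all function from this post: distance_all_modes() is a hypothetical name, the injectable `fetch` argument exists only so the example runs offline, and the response shape in the stub is assumed. The mode names are the ones google_distance() accepts.

```r
library(purrr)

travel_modes <- c("driving", "walking", "bicycling", "transit")

# One row per origin/mode pair, with travel time in seconds (NA if none)
distance_all_modes <- function(origins, destination,
                               fetch = googleway::google_distance, ...) {
  safe_fetch <- possibly(fetch, otherwise = NULL)
  grid <- expand.grid(origin = origins, mode = travel_modes,
                      stringsAsFactors = FALSE)
  grid$seconds <- map2_dbl(grid$origin, grid$mode, function(o, m) {
    res <- safe_fetch(origins = o, destinations = destination, mode = m, ...)
    value <- pluck(res, "rows", "elements", 1, "duration", "value")
    if (is.null(value)) NA_real_ else as.numeric(value)[1]
  })
  grid[order(grid$origin), ]
}

# Offline demo: the stub pretends public transport is never available
stub_fetch <- function(origins, destinations, mode, ...) {
  if (mode == "transit") {
    return(list(rows = list(elements = list(list(status = "ZERO_RESULTS")))))
  }
  list(rows = list(elements = list(
    list(duration = list(value = 600), status = "OK")
  )))
}
demo_modes <- distance_all_modes(c("A", "B"), "Z", fetch = stub_fetch)
```

With two origins the demo returns eight rows, and the two transit rows carry NA because the stub reports no route for that mode.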
Having this many functions, though, clutters things up and makes it difficult to refactor and improve things. We should fold all the functionality into one big function.
I will undoubtedly want to do some cleaning after this and there’s certainly room for improvement on the function but this is a good starting point for getting some data to work with. The iterative way I build functions means I can try to solve a bit at a time – hopefully this will help you when you’re faced with needing to build your own functions.