Motivation
Recently I spent a considerable amount of time writing the postman_problems python library implementing solvers for the Chinese and Rural Postman Problems (CPP and RPP respectively). I wrote about my initial motivation for the project: finding the optimal route through a trail system in a state park here. Although I’ve still yet to run the 34 mile optimal trail route, I am pleased with the optimization procedure. However, I couldn’t help but feel that all those nights and weekends hobbying on this thing deserved a more satisfying visual than my static SVGs and hacky GIF. So to spice it up, I decided to solve the RPP on a graph derived from geodata and visualize on an interactive Leaflet map.
The Problem
In short, ride all 50 state named avenues in DC end-to-end following the shortest route possible.
There happens to be an annual 50 states ride sponsored by our regional bike association, WABA, that takes riders to each of the 50† state named avenues in DC. Each state’s avenue is touched, but not covered in full. This problem takes it a step further by instituting this requirement. Thus, it boils to the RPP where the required edges are state avenues (end-to-end) and the optional edges are every other road within DC city limits.
For those unfamiliar with DC street naming convention, that can (and should) be remedied with a read through the history behind the street system here. Seriously, it’s an interesting read. Basically there are 50 state named avenues in DC ranging from 0.3 miles (Indiana Avenue) to 10 miles (Massachusetts Avenue) comprising 115 miles in total.
The Solution
The data is grabbed from Open Street Maps (OSM). Most of the post is spent wrangling the OSM geodata into shape for the RPP algorithm using NetworkX graphs. The final route (and intermediate steps) are visualized using Leaflet maps through mplleaflet which enable interactivity using tiles from Mapbox and CartoDB among others.
Note to readers: the rendering of these maps can work the browser pretty hard; allow a couple extra seconds for loading.
The Approach
Most of the heavy lifting leverages functions from the graph.py module in the postman_problems_examples repo. The majority of pre-RPP processing employs heuristics that simplify the computation such that this code can run in a reasonable amount of time. The parameters employed here, which I believe get pretty darn close to the optimal solution, run in about 50 minutes. By tweaking a couple parameters, accuracy can be sacrificed for time to get run time down to ~5 minutes on a standard 4 core laptop.
Verbose technical details about the guts of each step are omitted from this post for readability. However the interested reader can find these in the docstrings in graph.py.
The table of contents below provides the best high-level summary of the approach. All code needed to reproduce this analysis is in the postman_problems_examples repo, including the jupyter notebook used to author this blog post and a conda environment.
† While there are 50 roadways, there are technically only 48 state named avenues: Ohio Drive and California Street are the stubborn exceptions.
Table of Contents
1 |
|
0: Get the data
There are many ways to grab Open Street Map (OSM) data, since it’s, well, open. I grabbed the DC map from GeoFabrik here.
1: Load OSM to NetworkX
While some libraries like OSMnx provide an elegant interface to downloading, transforming and manipulating OSM data in NetworkX, I decided to start with the raw
data itself. I adopted an OSM-to-nx parser from a hodge podge of Gists (here and there) to read_osm
.
read_osm
creates a directed graph. However, for this analysis, we’ll use undirected graphs with the assumption that all roads are bidirectional on a bike one way or another.
1 |
|
<class ‘networkx.classes.digraph.DiGraph’> CPU times: user 46.6 s, sys: 2.1 s, total: 48.7 s Wall time: 50.2 s
1 |
|
2: Make Graph w State Avenues only
Generate state avenue names
1 |
|
Most state avenues are written in the long form (ex. Connecticut Avenue Northwest). However, some, such as Florida Ave NW, are written in the short form. To be safe, we grab any permutation OSM could throw at us.
1 |
|
[‘Alabama Ave Southeast’, ‘Alabama Ave Southwest’, ‘Alabama Ave Northeast’, ‘Alabama Ave Northwest’, ‘Alabama Ave SE’, ‘Alabama Ave SW’, ‘Alabama Ave NE’, ‘Alabama Ave NW’, ‘Alabama Avenue Southeast’, ‘Alabama Avenue Southwest’, ‘Alabama Avenue Northeast’, ‘Alabama Avenue Northwest’, ‘Alabama Avenue SE’, ‘Alabama Avenue SW’, ‘Alabama Avenue NE’, ‘Alabama Avenue NW’, ‘Alaska Ave Southeast’, ‘Alaska Ave Southwest’, ‘Alaska Ave Northeast’, ‘Alaska Ave Northwest’]
1 |
|
g_st = subset_graph_by_edge_name(g, candidate_state_avenue_names)
Add state edge attribute from full streetname (with avenue/drive and quandrant)
for e in g_st.edges(data=True): e[2][‘state’] = e[2][‘name’].rsplit(‘ ‘, 2)[0]
1 |
|
But every state is represented:
1 |
|
Here they are by edge count:
edge_count_by_state
1 |
|
2.1 Viz state avenues
As long as your NetworkX graph has lat
and lon
node attributes, mplleaflet can be used to pretty effortlessly plot your NetworkX graph on an interactive map.
Here’s the map with all the state avenues…
1 |
|
You can even customize with your favorite tiles. For example:
1 |
|
…But there’s a wrinkle. Zoom in on bigger avenues, like New York or Rhode Island, and you’ll notice that there are two parallel edges representing each direction as a separate one-way road. This usually occurs when there are several lanes of traffic in each direction, or physical dividers between directions. Example below:
This is great for OSM and point A to B routing problems, but for the Rural Postman problem it imposes the requirement that each main avenue be cycled twice. We’re not into that.
Example: Rhode Island Ave (parallel edges) vs Florida Ave (single edge)
3. Remove Redundant State Avenues
As it turns out, removing these parallel (redundant) edges is a nontrivial problem to solve. My approach is the following:
-
Build graph with one-way state avenue edges only.
-
For each state avenue, create list of connected components that represent sequences of OSM ways in the same direction (broken up by intersections and turns).
-
Compute distance between each node in a component to every other node in the other candidate components.
-
Identify redundant components as those with the majority of their nodes below some threshold distance away from another component.
-
Build graph without redundant edges.
3.1 Create state avenue graph with one-way edges only
g_st1 = keep_oneway_edges_only(g_st)
1 |
|
fig, ax = plt.subplots(figsize=(1,6))
pos = {k: (g_st1.node[k][‘lon’], g_st1.node[k][‘lat’]) for k in g_st1.nodes()} nx.draw_networkx_edges(g_st1, pos, width=3.0, edge_color=’red’, alpha=0.7)
save viz
mplleaflet.save_html(fig, ‘oneway_state_avenues.html’, tiles=’cartodb_positron’)
1 |
|
There are 163 distinct components in the graph above.
len(comps)
1 |
|
create list of comps (graphs) without kinked nodes
comps_unkinked = create_unkinked_connected_components(comps=comps, bearing_thresh=60)
comps in dict form for easy lookup
comps_dict = {comp.graph[‘id’]:comp for comp in comps_unkinked}
1 |
|
Viz components without kinked nodes
Example: Here’s the Massachusetts Ave example from above after we remove kinked nodes:
Full map: Zoom in on the map below and you’ll see that we split up most of the obvious components that should be. There are a few corner cases that we miss, but I’d estimate we programmatically split about 95% of the components correctly.
1 |
|
3.3 & 3.4 Match connected components
Now that we’ve crafted the right components, we calculate how close (parallel) each component is to one another.
This is a relatively coarse approach, but performs surprisingly well:
1. Find closest nodes from candidate components to each node in each component (pseudo code below):
1 |
|
2. Calculate overlap between components. Using the distances calculated in 1., we say that a node from component C
is matched to a component C_cand
if the distance is less than
thresh_distance
specified in calculate_component_overlap
. 75 meters seemed to work pretty well. Essentially we’re saying these nodes are close enough to be considered interchangeable.
3. Use the node-wise matching calculated in 2. to calculate which components are redundant. If thresh_pct
of nodes in component C
are close enough (within thresh_distance
) to nodes in
component C_cand
, we call C
redundant and discard it.
1 |
|
Viz redundant component solution
The map below visualizes the solution to the redundant parallel edges problem. There are some misses, but overall this simple approach works surprisingly well:
-
red: redundant one-way edges to remove
-
black: one-way edges to keep
-
blue: all state avenues
1 |
|
3.5 Build graph without redundant edges
This is the essentially the graph with just black and blue edges from the map above.
1 |
|
After deduping the redundant edges, our connected component count drops from 246 to 96.
len(list(nx.connected_components(g_st_nd)))
1 |
|
Create graph with contracted edges only
g_st_contracted = create_contracted_edge_graph(graph=g_st_nd, edge_weight=’length’)
1 |
|
print(‘Number of nodes in contracted graph: {}’.format(len(g_st_contracted.nodes()))) print(‘Number of nodes in original graph: {}’.format(len(g_st_nd.nodes())))
1 |
|
4.2 Calculate haversine distance between components
The 345 nodes from the contracted edge graph translate to >100,000 possible node pairings. That means >100,000 distance calculations. While applying a shortest path algorithm over the graph would certainly be more exact, it is painfully slow compared to simple haversine distance. This is mainly due to the high number of nodes and edges in the DC OSM map (over 250k edges).
On my laptop I averaged about 4 shortest path calculations per second. Not too bad for a handful, but 115k would take about 7 hours. Haversine distance, by comparison, churns through 115k in a couple seconds.
1 |
|
4.3 Find minimum distance connectors
This gets a bit tricky. Basically we iterate through the top (closest) candidate pairs of components and connect them iteration-by-iteration with the shortest path edge. We use pre-calculated haversine distance to get in the right ballpark, then refine with true shortest path for the closest 20 candidates. This helps us avoid the scenario where we naively connect two nodes that are geographically close as the crow flies (haversine), but far away via available roads. Two nodes separated by highways or train tracks, for example.
1 |
|
We had 96 components to connect, so it makes sense that we have 95 connectors.
len(connector_edges)
1 |
|
adding connector edges to create one single connected component
for e in connector_edges: g_st_contracted.add_edge(e[0], e[1], distance=e[2][‘distance’], path=e[2][‘path’], required=1, connector=True)
1 |
|
So that leaves us with a single component of 125 miles of required edges to optimize a route through. That means the distance of deduped state avenues alone, without connectors (~112 miles) is just a couple miles away from what Wikipedia reports (115 miles).
print(sum([e[2][‘distance’] for e in g_st_contracted.edges(data=True)])/1609.34)
1 |
|
g1comp = g_st_contracted.copy() for e in g_st_contracted.edges(data=True): if ‘path’ in e[2]: granular_type = ‘connector’ if ‘connector’ in e[2] else ‘state’
# add granular connector edges to graph for pair in list(zip(e[2][‘path’][:-1], e[2][‘path’][1:])): g1comp.add_edge(pair[0], pair[1], granular=’True’, granular_type=granular_type)
# add granular connector nodes to graph for n in e[2][‘path’]: g1comp.add_node(n, lat=g.node[n][‘lat’], lon=g.node[n][‘lon’])
1 |
|
fig, ax = plt.subplots(figsize=(1,6))
g1comp_conn = g1comp.copy() g1comp_st = g1comp.copy()
for e in g1comp.edges(data=True): if (‘granular_type’ not in e[2]) or (e[2][‘granular_type’] != ‘connector’): g1comp_conn.remove_edge(e[0], e[1])
for e in g1comp.edges(data=True): if (‘granular_type’ not in e[2]) or (e[2][‘granular_type’] != ‘state’): g1comp_st.remove_edge(e[0], e[1])
pos = {k: (g1comp_conn.node[k][‘lon’], g1comp_conn.node[k][‘lat’]) for k in g1comp_conn.nodes()} nx.draw_networkx_edges(g1comp_conn, pos, width=5.0, edge_color=’red’)
pos_st = {k: (g1comp_st.node[k][‘lon’], g1comp_st.node[k][‘lat’]) for k in g1comp_st.nodes()} nx.draw_networkx_edges(g1comp_st, pos_st, width=3.0, edge_color=’black’)
save viz
mplleaflet.save_html(fig, ‘single_connected_comp.html’, tiles=’cartodb_positron’)
1 |
|
create list with edge attributes and “from” & “to” nodes
tmp = [] for e in g_st_contracted.edges(data=True): tmpi = e[2].copy() # so we don’t mess w original graph tmpi[‘start_node’] = e[0] tmpi[‘end_node’] = e[1] tmp.append(tmpi)
create dataframe w node1 and node2 in order
eldf = pd.DataFrame(tmp) eldf = eldf[[‘start_node’, ‘end_node’] + list(set(eldf.columns)-{‘start_node’, ‘end_node’})]
1 |
|
start_node | end_node | comp | name | connector | path | distance | required | |
---|---|---|---|---|---|---|---|---|
0 | 3788079770 | 649840206 | 0.0 | Texas Avenue Southeast | NaN | [649840206, 649840209, 649840220, 30100500, 64… | 1244.893849 | 1 |
1 | 3788079770 | 49744479 | NaN | NaN | True | [3788079770, 49751669, 4630443674, 49751671, 4… | 674.458290 | 1 |
2 | 49765126 | 49765287 | 1.0 | Georgia Avenue Northwest | NaN | [49765126, 49765129, 49765130, 49765131, 49765… | 559.251509 | 1 |
5.2 CPP solver
Starting point
I fix the starting node for the solution to OSM node 49765113
which corresponds to (38.917002, -77.0364987): the intersection of New Hampshire Avenue NW, 16th St NW and U St NW… and also
the close to my house:
Solve
1 |
|
5.3: CPP results
The CPP solution covers roughly 390,000 meters, about 242 miles. The optimal CPP route doubles the required distance, doublebacking every edge on average… definitely not ideal.
1 |
|
OrderedDict([(‘distance_walked’, 391523.723281576), (‘distance_doublebacked’, 190249.27638658223), (‘distance_walked_once’, 201274.44689499377), (‘distance_walked_optional’, 0), (‘distance_walked_required’, 391523.723281576), (‘edges_walked’, 688), (‘edges_doublebacked’, 337), (‘edges_walked_once’, 351), (‘edges_walked_optional’, 0), (‘edges_walked_required’, 688)])
1 |
|
%%time dfrpp = create_rpp_edgelist(g_st_contracted=g_st_contracted, graph_full=g_ud, edge_weight=’length’, max_distance=3200)
1 |
|
Check how many optional edges are considered (0=optional, 1=required):
Counter(dfrpp[‘required’])
1 |
|
6.2 RPP solver
Apply the RPP solver to the processed dataset.
1 |
|
CPU times: user 7min 25s, sys: 1.68 s, total: 7min 27s Wall time: 7min 31s
1 |
|
RPP route distance (miles)
print(sum([e[3][‘distance’] for e in circuit_rpp])/1609.34)
1 |
|
OrderedDict([(‘distance_walked’, 260045.958535698), (‘distance_doublebacked’, 58771.511640704295), (‘distance_walked_once’, 201274.4468949937), (‘distance_walked_optional’, 51891.485571184385), (‘distance_walked_required’, 208154.47296451364), (‘edges_walked’, 447), (‘edges_doublebacked’, 96), (‘edges_walked_once’, 351), (‘edges_walked_optional’, 57), (‘edges_walked_required’, 390)])
1 |
|
print(‘Number of edges in RPP circuit (with contracted edges): {}’.format(len(circuit_rpp))) print(‘Number of edges in RPP circuit (with granular edges): {}’.format(rppdf.shape[0]))
1 |
|
6.4 Viz RPP graph
Create RPP granular graph
Add the granular edges (that we contracted for computation) back to the graph.
1 |
|
Visualize RPP solution by edge type
-
black: required state avenue edges
-
red: required non-state avenue edges added to form single component
-
blue: optional non-state avenue roads
1 |
|
Visualize RPP solution by edge walk count
Edge walks per color:
black: 1 magenta: 2 orange: 3
Edges walked more than once are also widened.
This solution feels pretty reasonable with surprisingly little doublebacking. After staring at this for several minutes, I could think of roads I’d prefer not to cycle on, but no obvious shorter paths.
1 |
|
fig, ax = plt.subplots(figsize=(1,12))
pos = {k: (grppviz.node[k][‘lon’], grppviz.node[k][‘lat’]) for k in grppviz.nodes()} e_width = [e[2][‘linewidth’] for e in grppviz.edges(data=True)] e_color = [e[2][‘color_cnt’] for e in grppviz.edges(data=True)] nx.draw_networkx_edges(grppviz, pos, width=e_width, edge_color=e_color, alpha=0.7)
save viz
mplleaflet.save_html(fig, ‘rpp_solution_edge_cnt.html’, tiles=’cartodb_positron’)
1 |
|
fill in RPP solution edgelist with granular nodes
rpplist = [] for ee in circuit_rpp: path = list(zip(ee[3][‘path’][:-1], ee[3][‘path’][1:])) for e in path: rpplist.append({ ‘start_node’: e[0], ‘end_node’: e[1], ‘start_lat’: g_ud.node[e[0]][‘lat’], ‘start_lon’: g_ud.node[e[0]][‘lon’], ‘end_lat’: g_ud.node[e[1]][‘lat’], ‘end_lon’: g_ud.node[e[1]][‘lon’], ‘street_name’: g_ud[e[0]][e[1]].get(‘name’) })
write solution to disk
rppdf = pd.DataFrame(rpplist) rppdf.to_csv(‘rpp_solution.csv’, index=False)
1 |
|
geojson = {‘features’:[], ‘type’: ‘FeatureCollection’} time = 0 path = list(reversed(circuit_rpp[0][3][‘path’]))
for e in circuit_rpp: if e[3][‘path’][0] != path[-1]: path = list(reversed(e[3][‘path’])) else: path = e[3][‘path’]
for n in path: time += 1 doc = {‘type’: ‘Feature’, ‘properties’: { ‘latitude’: g.node[n][‘lat’], ‘longitude’: g.node[n][‘lon’], ‘time’: time, ‘id’: e[3].get(‘id’) }, ‘geometry’:{ ‘type’: ‘Point’, ‘coordinates’: [g.node[n][‘lon’], g.node[n][‘lat’]] } } geojson[‘features’].append(doc)
1 |
|