This post was originally published as a tutorial for DataCamp here on September 12 2017 using NetworkX 1.11. On September 20 2017, NetworkX announced the release of a new
version 2.0, after two years in the making. While 2.0 introduces lots of great features (some have already been used to improve this project in postman_problems), it also introduced
backwards incompatible API changes that broke the original tutorial :(. I’ve commented out lines deprecated by 2.0 and tagged with # deprecated after NX 1.11
, so the changes made here are
explicit. Most of the changes are around the passing and setting of attributes and return values deprecating lists for generators.
So… TL;DR:
-
This is the NetworkX 2.0 compatible version of the Chinese Postman DataCamp tutorial originally posted here.
-
The ideas introduced in this tutorial are packaged into the postman_problems library which is the mature implementation of these concepts.
A note on the making of this post. The original post was created in a Jupyter notebook and converted to HTML with some style tweaks by the DataCamp publishing team. This post was converted from an updated notebook to a Jekyll flavored markdown document for my blog using nb2jekyll with just a few tweaks of my own. This was the first Jupyter notebook I’ve converted to a blog post, but the conversion was smoother than I might have expected. I would recommend nb2jekyll and this post to comrade Jekyll bloggers looking to generate posts directly from Jupyter notebooks.
Intro to Graph Optimization with NetworkX in Python
Solving the Chinese Postman Problem
With this tutorial, you’ll tackle an established problem in graph theory called the Chinese Postman Problem. There are some components of the algorithm that while conceptually simple, turn out to be computationally rigorous. However, for this tutorial, only some prior knowledge of Python is required: no rigorous math, computer science or graph theory background is needed.
This tutorial will first go over the basic building blocks of graphs (nodes, edges, paths, etc) and solve the problem on a real graph (trail network of a state park) using the NetworkX library in Python. You’ll focus on the core concepts and implementation. For the interested reader, further reading on the guts of the optimization are provided.
Motivating Graph Optimization
The Problem
You’ve probably heard of the Travelling Salesman Problem which amounts to finding the shortest route (say, roads) that connects a set of nodes (say, cities). Although lesser known, the Chinese Postman Problem (CPP), also referred to as the Route Inspection or Arc Routing problem, is quite similar. The objective of the CPP is to find the shortest path that covers all the links (roads) on a graph at least once. If this is possible without doubling back on the same road twice, great; That’s the ideal scenario and the problem is quite simple. However, if some roads must be traversed more than once, you need some math to find the shortest route that hits every road at least once with the lowest total mileage.
Personal Motivation
(The following is a personal note: cheesy, cheeky and 100% not necessary for learning graph optimization in Python)
I had a real-life application for solving this problem: attaining the rank of Giantmaster Marathoner.
What is a Giantmaster? A Giantmaster is one (canine or human) who has hiked every trail of Sleeping Giant State Park in Hamden CT (neighbor to my hometown of Wallingford)… in their lifetime. A Giantmaster Marathoner is one who has hiked all these trails in a single day.
Thanks to the fastidious record keeping of the Sleeping Giant Park Association, the full roster of Giantmasters and their level of Giantmastering can be found here. I have to admit this motivated me quite a bit to kick-start this side-project and get out there to run the trails. While I myself achieved Giantmaster status in the winter of 2006 when I was a budding young volunteer of the Sleeping Giant Trail Crew (which I was pleased to see recorded in the SG archive), new challenges have since arisen. While the 12-month and 4-season Giantmaster categories are impressive and enticing, they’d also require more travel from my current home (DC) to my formative home (CT) than I could reasonably manage… and they’re not as interesting for graph optimization, so Giantmaster Marathon it is!
For another reference, the Sleeping Giant trail map is provided below:
Introducing Graphs
The nice thing about graphs is that the concepts and terminology are generally intuitive. Nonetheless, here’s some of the basic lingo:
Graphs are structures that map relations between objects. The objects are referred to as nodes and the connections between them as edges in this tutorial. Note that edges and nodes are commonly referred to by several names that generally mean exactly the same thing:
1 |
|
The starting graph is undirected. That is, your edges have no orientation: they are bi-directional. For example: A<--->B == B<--->A
.
By contrast, the graph you might create to specify the shortest path to hike every trail could be a directed graph, where the order and direction of edges matters. For example: A—>B !=
B—>A.
The graph is also an edge-weighted graph where the distance (in miles) between each pair of adjacent nodes represents the weight of an edge. This is handled as an edge attribute named “distance”.
Degree refers to the number of edges incident to (touching) a node. Nodes are referred to as odd-degree nodes when this number is odd and even-degree when even.
The solution to this CPP problem will be a Eulerian tour: a graph where a cycle that passes through every edge exactly once can be made from a starting node back to itself (without backtracking). An Euler Tour is also known by several names:
1 |
|
A matching is a subset of edges in which no node occurs more than once. A minimum weight matching finds the matching with the lowest possible summed edge weight.
NetworkX: Graph Manipulation and Analysis
NetworkX is the most popular Python package for manipulating and analyzing graphs. Several packages offer the same basic level of graph manipulation, notably igraph which also has bindings for R and C++. However, I found that NetworkX had the strongest graph algorithms that I needed to solve the CPP.
Installing Packages
If you’ve done any sort of data analysis in Python or have the Anaconda distribution, my guess is you probably have pandas
and matplotlib
. However, you might not have networkx
. These should be
the only dependencies outside the Python Standard Library that you’ll need to run through this tutorial. They are easy to install with pip
:
1 |
|
These should be all the packages you’ll need for now. imageio
and numpy
are imported at the very end to create the GIF animation of the CPP solution. The animation is embedded within this post,
so these packages are optional.
1 |
|
Load Data
Edge List
The edge list is a simple data structure that you’ll use to create the graph. Each row represents a single edge of the graph with some edge attributes.
-
node1 & node2: names of the nodes connected.
-
trail: edge attribute indicating the abbreviated name of the trail for each edge. For example: rs = red square
-
distance: edge attribute indicating trail length in miles.
-
color: trail color used for plotting.
estimate: edge attribute indicating whether the edge distance is estimated from eyeballing the trailmap (1=yes, 0=no) as some distances are not provided. This is solely for reference; it is not used for analysis.
1 |
|
Preview edgelist
edgelist.head(10)
1 |
|
Grab node list data hosted on Gist
nodelist = pd.read_csv(‘https://gist.githubusercontent.com/brooksandrew/f989e10af17fb4c85b11409fea47895b/raw/a3a8da0fa5b094f1ca9d82e1642b384889ae16e8/nodelist_sleeping_giant.csv’)
1 |
|
id | X | Y | |
---|---|---|---|
0 | b_bv | 1486 | 732 |
1 | b_bw | 716 | 1357 |
2 | b_end_east | 3164 | 1111 |
3 | b_end_west | 141 | 1938 |
4 | b_g | 1725 | 771 |
Create Graph
Now you use the edge list and the node list to create a graph object in networkx
.
1 |
|
Loop through the rows of the edge list and add each edge and its corresponding attributes to graph g
.
1 |
|
To illustrate what’s happening here, let’s print the values from the last row in the edge list that got added to graph g
:
1 |
|
o_gy2 y_gy2 {‘estimate’: 0, ‘distance’: 0.12, ‘color’: ‘yellowgreen’, ‘trail’: ‘gy2’}
1 |
|
Add node attributes
for i, nlrow in nodelist.iterrows(): # g.node[nlrow[‘id’]] = nlrow[1:].to_dict() # deprecated after NX 1.11 nx.set_node_attributes(g, {nlrow[‘id’]: nlrow[1:].to_dict()})
1 |
|
Node list example
print(nlrow)
1 |
|
Inspect Graph
Edges
Your graph edges are represented by a list of tuples of length 3. The first two elements are the node names linked by the edge. The third is the dictionary of edge attributes.
1 |
|
[(‘v_end_west’, ‘b_v’, {‘color’: ‘violet’, ‘distance’: 0.13, ‘estimate’: 0, ‘trail’: ‘v’}), (‘rt_end_north’, ‘v_rt’, {‘color’: ‘red’, ‘distance’: 0.3, ‘estimate’: 0, ‘trail’: ‘rt’}), (‘b_o’, ‘park_east’, {‘color’: ‘orange’, ‘distance’: 0.11, ‘estimate’: 0, ‘trail’: ‘o’}), (‘b_o’, ‘o_gy2’, {‘color’: ‘orange’, ‘distance’: 0.06, ‘estimate’: 0, ‘trail’: ‘o’}), (‘b_o’, ‘b_y’, {‘color’: ‘blue’, ‘distance’: 0.08, ‘estimate’: 0, ‘trail’: ‘b’})]
1 |
|
Preview first 10 nodes
g.nodes(data=True)[0:10] # deprecated after NX 1.11
list(g.nodes(data=True))[0:10]
1 |
|
Summary Stats
Print out some summary statistics before visualizing the graph.
1 |
|
of edges: 123
of nodes: 77
1 |
|
Define node positions data structure (dict) for plotting
node_positions = {node[0]: (node[1][‘X’], -node[1][‘Y’]) for node in g.nodes(data=True)}
Preview of node_positions with a bit of hack (there is no head/slice method for dictionaries).
dict(list(node_positions.items())[0:5])
1 |
|
Colors: Now you manipulate the edge colors from the graph into a simple list so that you can visualize the trails by their color.
1 |
|
[‘violet’, ‘red’, ‘orange’, ‘orange’, ‘blue’, ‘blue’, ‘red’, ‘red’, ‘black’, ‘black’]
1 |
|
plt.figure(figsize=(8, 6)) nx.draw(g, pos=node_positions, edge_color=edge_colors, node_size=10, node_color=’black’) plt.title(‘Graph Representation of Sleeping Giant Trail Map’, size=15) plt.show()
1 |
|
Calculate list of nodes with odd degree
nodes_odd_degree = [v for v, d in g.degree_iter() if d % 2 == 1] # deprecated after NX 1.11
nodes_odd_degree = [v for v, d in g.degree() if d % 2 == 1]
Preview
nodes_odd_degree[0:5]
1 |
|
Counts
print(‘Number of nodes of odd degree: {}’.format(len(nodes_odd_degree))) print(‘Number of total nodes: {}’.format(len(g.nodes())))
1 |
|
CPP Step 2: Find Min Distance Pairs
This is really the meat of the problem. You’ll break it down into 5 parts:
-
Compute all possible pairs of odd degree nodes.
-
Compute the shortest path between each node pair calculated in 1.
-
Create a complete graph connecting every node pair in 1. with shortest path distance attributes calculated in 2.
-
Compute a minimum weight matching of the graph calculated in 3. (This boils down to determining how to pair the odd nodes such that the sum of the distance between the pairs is as small as possible).
-
Augment the original graph with the shortest paths between the node pairs calculated in 4.
Step 2.1: Compute Node Pairs
You use the itertools combination
function to compute all possible pairs of the odd degree nodes. Your graph is undirected, so we don’t care about order: For example, (a,b) == (b,a)
.
1 |
|
[(‘v_end_west’, ‘rt_end_north’), (‘v_end_west’, ‘rh_end_north’), (‘v_end_west’, ‘rh_end_tt_1’), (‘v_end_west’, ‘o_y_tt_end_west’), (‘v_end_west’, ‘b_v’), (‘v_end_west’, ‘y_gy2’), (‘v_end_west’, ‘nature_end_west’), (‘v_end_west’, ‘y_rh’), (‘v_end_west’, ‘g_gy2’), (‘v_end_west’, ‘b_bv’)]
1 |
|
Let’s confirm that this number of pairs is correct with a the combinatoric below. Luckily, you only have 630 pairs to worry about. Your computation time to solve this CPP example is trivial (a couple seconds).
However, if you had 3,600 odd node pairs instead, you’d have ~6.5 million pairs to optimize. That’s a ~10,000x increase in output given a 100x increase in input size.
Step 2.2: Compute Shortest Paths between Node Pairs
This is the first step that involves some real computation. Luckily networkx
has a convenient implementation of Dijkstra’s algorithm to compute the shortest path between two nodes. You apply
this function to every pair (all 630) calculated above in odd_node_pairs
.
1 |
|
Compute shortest paths. Return a dictionary with node pairs keys and a single value equal to shortest path distance.
odd_node_pairs_shortest_paths = get_shortest_paths_distances(g, odd_node_pairs, ‘distance’)
Preview with a bit of hack (there is no head/slice method for dictionaries).
dict(list(odd_node_pairs_shortest_paths.items())[0:10])
1 |
|
Step 2.3: Create Complete Graph
A complete graph is simply a graph where every node is connected to every other node by a unique edge.
Here’s a basic example from Wikipedia of a 7 node complete graph with 21 (7 choose 2) edges:
The graph you create below has 36 nodes and 630 edges with their corresponding edge weight (distance).
create_complete_graph
is defined to calculate it. The flip_weights
parameter is used to transform the distance
to the weight
attribute where smaller numbers reflect large distances and high
numbers reflect short distances. This sounds a little counter intuitive, but is necessary for Step 2.4 where you calculate the minimum weight matching on the complete graph.
Ideally you’d calculate the minimum weight matching directly, but NetworkX only implements a max_weight_matching
function which maximizes, rather than minimizes edge weight. We hack this a bit by
negating (multiplying by -1) the distance
attribute to get weight
. This ensures that order and scale by distance are preserved, but reversed.
1 |
|
Generate the complete graph
g_odd_complete = create_complete_graph(odd_node_pairs_shortest_paths, flip_weights=True)
Counts
print(‘Number of nodes: {}’.format(len(g_odd_complete.nodes()))) print(‘Number of edges: {}’.format(len(g_odd_complete.edges())))
1 |
|
For a visual prop, the fully connected graph of odd degree node pairs is plotted below. Note that you preserve the X, Y coordinates of each node, but the edges do not necessarily represent actual trails. For example, two nodes could be connected by a single edge in this graph, but the shortest path between them could be 5 hops through even degree nodes (not shown here).
1 |
|
Step 2.4: Compute Minimum Weight Matching
This is the most complex step in the CPP. You need to find the odd degree node pairs whose combined sum (of distance between them) is as small as possible. So for your problem, this boils down to selecting the optimal 18 edges (36 odd degree nodes / 2) from the hairball of a graph generated in 2.3.
Both the implementation and intuition of this optimization are beyond the scope of this tutorial… like 800+ lines of code and a body of academic literature beyond this scope.
However, a quick aside for the interested reader:
A huge thanks to Joris van Rantwijk for writing the orginal implementation on his blog way back in 2008. I stumbled into the problem a similar way with the same intention as Joris. From Joris’s 2008 post:
Since I did not find any Perl implementations of maximum weighted matching, I lightly decided to write some code myself. It turned out that I had underestimated the problem, but by the time I realized my mistake, I was so obsessed with the problem that I refused to give up.
However, I did give up. Luckily Joris did not.
This Maximum Weight Matching has since been folded into and maintained within the NetworkX package. Another big thanks to the 10+ contributors on GitHub who have maintained this hefty codebase.
This is a hard and intensive computation. The first breakthrough in 1965 proved that the Maximum Matching problem could be solved in polynomial time. It was published by Jack Edmonds with perhaps one of the most beautiful academic paper titles ever: “Paths, trees, and flowers” [1]. A body of literature has since built upon this work, improving the optimization procedure. The code implemented in the NetworkX function max_weight_matching is based on Galil, Zvi (1986) [2] which employs an O(n3) time algorithm.
1 |
|
Number of edges in matching: 36
1 |
|
Preview of matching with dupes
odd_matching_dupes
1 |
|
You convert this dictionary to a list of tuples since you have an undirected graph and order does not matter. Removing duplicates yields the unique 18 edge-pairs that cumulatively sum to the least possible distance.
1 |
|
Number of edges in matching (deduped): 18
1 |
|
[(‘o_tt’, ‘rh_end_tt_2’), (‘nature_end_west’, ‘o_y_tt_end_west’), (‘b_tt_3’, ‘rt_end_north’), (‘rs_end_south’, ‘y_gy2’), (‘b_bw’, ‘rh_end_tt_1’), (‘rd_end_south’, ‘v_end_west’), (‘b_bv’, ‘v_bv’), (‘b_end_west’, ‘b_v’), (‘rh_end_south’, ‘y_rh’), (‘g_gy1’, ‘rc_end_north’), (‘rc_end_south’, ‘y_gy1’), (‘rd_end_north’, ‘rh_end_north’), (‘rt_end_south’, ‘y_rt’), (‘rh_end_tt_3’, ‘rh_end_tt_4’), (‘b_end_east’, ‘g_gy2’), (‘rs_end_north’, ‘v_end_east’), (‘g_w’, ‘w_bw’), (‘o_rt’, ‘o_w_1’)]
1 |
|
plt.figure(figsize=(8, 6))
Plot the complete graph of odd-degree nodes
nx.draw(g_odd_complete, pos=node_positions, node_size=20, alpha=0.05)
Create a new graph to overlay on g_odd_complete with just the edges from the min weight matching
g_odd_complete_min_edges = nx.Graph(odd_matching) nx.draw(g_odd_complete_min_edges, pos=node_positions, node_size=20, edge_color=’blue’, node_color=’red’)
plt.title(‘Min Weight Matching on Complete Graph’) plt.show()
1 |
|
plt.figure(figsize=(8, 6))
Plot the original trail map graph
nx.draw(g, pos=node_positions, node_size=20, alpha=0.1, node_color=’black’)
Plot graph to overlay with just the edges from the min weight matching
nx.draw(g_odd_complete_min_edges, pos=node_positions, node_size=20, alpha=1, node_color=’red’, edge_color=’blue’)
plt.title(‘Min Weight Matching on Orginal Graph’) plt.show()
1 |
|
def add_augmenting_path_to_graph(graph, min_weight_pairs): “”” Add the min weight matching edges to the original graph Parameters: graph: NetworkX graph (original graph from trailmap) min_weight_pairs: list[tuples] of node pairs from min weight matching Returns: augmented NetworkX graph “””
# We need to make the augmented graph a MultiGraph so we can add parallel edges graph_aug = nx.MultiGraph(graph.copy()) for pair in min_weight_pairs: graph_aug.add_edge(pair[0], pair[1], **{‘distance’: nx.dijkstra_path_length(graph, pair[0], pair[1]), ‘trail’: ‘augmented’} # attr_dict={‘distance’: nx.dijkstra_path_length(graph, pair[0], pair[1]), # ‘trail’: ‘augmented’} # deprecated after 1.11 ) return graph_aug
1 |
|
Create augmented graph: add the min weight matching edges to g
g_aug = add_augmenting_path_to_graph(g, odd_matching)
Counts
print(‘Number of edges in original graph: {}’.format(len(g.edges()))) print(‘Number of edges in augmented graph: {}’.format(len(g_aug.edges())))
1 |
|
Let’s also confirm that every node now has even degree:
1 |
|
4 54 2 18 6 5 dtype: int64
1 |
|
As expected, the length of the naive Eulerian circuit is equal to the number of the edges in the augmented graph.
print(‘Length of eulerian circuit: {}’.format(len(naive_euler_circuit)))
1 |
|
The output is just a list of tuples which represent node pairs. Note that the first node of each pair is the same as the second node from the preceding pair.
1 |
|
[(‘b_end_east’, ‘b_y’), (‘b_y’, ‘y_gy2’), (‘y_gy2’, ‘rs_end_south’), (‘rs_end_south’, ‘y_rs’), (‘y_rs’, ‘y_gy2’), (‘y_gy2’, ‘o_gy2’), (‘o_gy2’, ‘o_rs’), (‘o_rs’, ‘o_w_2’), (‘o_w_2’, ‘w_rc’), (‘w_rc’, ‘y_rc’)]
1 |
|
def create_eulerian_circuit(graph_augmented, graph_original, starting_node=None): “"”Create the eulerian path using only edges from the original graph.””” euler_circuit = [] naive_circuit = list(nx.eulerian_circuit(graph_augmented, source=starting_node))
for edge in naive_circuit: edge_data = graph_augmented.get_edge_data(edge[0], edge[1])
if edge_data[0][‘trail’] != ‘augmented’:
# If edge
exists in original graph, grab the edge attributes and add to eulerian circuit.
edge_att = graph_original[edge[0]][edge[1]]
euler_circuit.append((edge[0], edge[1], edge_att))
else:
aug_path = nx.shortest_path(graph_original, edge[0], edge[1], weight=’distance’)
aug_path_pairs = list(zip(aug_path[:-1], aug_path[1:]))
print(‘Filling in edges for augmented edge: {}’.format(edge)) print(‘Augmenting path: {}’.format(‘ => ‘.join(aug_path))) print(‘Augmenting path pairs: {}\n’.format(aug_path_pairs))
# If edge
does not exist in original graph, find the shortest path between its nodes and
# add the edge attributes for each link in the shortest path.
for edge_aug in aug_path_pairs:
edge_aug_att = graph_original[edge_aug[0]][edge_aug[1]]
euler_circuit.append((edge_aug[0], edge_aug[1], edge_aug_att))
return euler_circuit
1 |
|
Create the Eulerian circuit
euler_circuit = create_eulerian_circuit(g_aug, g, ‘b_end_east’)
1 |
|
You see that the length of the Eulerian circuit is longer than the naive circuit, which makes sense.
print(‘Length of Eulerian circuit: {}’.format(len(euler_circuit)))
1 |
|
CPP Solution
Text
Here’s a printout of the solution in text:
1 |
|
0 (‘b_end_east’, ‘b_y’, {‘estimate’: 0, ‘distance’: 1.32, ‘color’: ‘blue’, ‘trail’: ‘b’}) 1 (‘b_y’, ‘y_gy2’, {‘estimate’: 0, ‘distance’: 0.28, ‘color’: ‘yellow’, ‘trail’: ‘y’}) 2 (‘y_gy2’, ‘y_rs’, {‘estimate’: 0, ‘distance’: 0.16, ‘color’: ‘yellow’, ‘trail’: ‘y’}) 3 (‘y_rs’, ‘rs_end_south’, {‘estimate’: 0, ‘distance’: 0.39, ‘color’: ‘red’, ‘trail’: ‘rs’}) 4 (‘rs_end_south’, ‘y_rs’, {‘estimate’: 0, ‘distance’: 0.39, ‘color’: ‘red’, ‘trail’: ‘rs’}) 5 (‘y_rs’, ‘y_gy2’, {‘estimate’: 0, ‘distance’: 0.16, ‘color’: ‘yellow’, ‘trail’: ‘y’}) 6 (‘y_gy2’, ‘o_gy2’, {‘estimate’: 0, ‘distance’: 0.12, ‘color’: ‘yellowgreen’, ‘trail’: ‘gy2’}) 7 (‘o_gy2’, ‘o_rs’, {‘estimate’: 0, ‘distance’: 0.33, ‘color’: ‘orange’, ‘trail’: ‘o’}) 8 (‘o_rs’, ‘o_w_2’, {‘estimate’: 0, ‘distance’: 0.15, ‘color’: ‘orange’, ‘trail’: ‘o’}) 9 (‘o_w_2’, ‘w_rc’, {‘estimate’: 0, ‘distance’: 0.23, ‘color’: ‘gray’, ‘trail’: ‘w’}) 10 (‘w_rc’, ‘y_rc’, {‘estimate’: 0, ‘distance’: 0.14, ‘color’: ‘red’, ‘trail’: ‘rc’}) 11 (‘y_rc’, ‘rc_end_south’, {‘estimate’: 0, ‘distance’: 0.36, ‘color’: ‘red’, ‘trail’: ‘rc’}) 12 (‘rc_end_south’, ‘y_rc’, {‘estimate’: 0, ‘distance’: 0.36, ‘color’: ‘red’, ‘trail’: ‘rc’}) 13 (‘y_rc’, ‘y_gy1’, {‘estimate’: 0, ‘distance’: 0.18, ‘color’: ‘yellow’, ‘trail’: ‘y’}) 14 (‘y_gy1’, ‘y_rc’, {‘estimate’: 0, ‘distance’: 0.18, ‘color’: ‘yellow’, ‘trail’: ‘y’}) 15 (‘y_rc’, ‘y_rs’, {‘estimate’: 0, ‘distance’: 0.53, ‘color’: ‘yellow’, ‘trail’: ‘y’}) 16 (‘y_rs’, ‘o_rs’, {‘estimate’: 0, ‘distance’: 0.12, ‘color’: ‘red’, ‘trail’: ‘rs’}) 17 (‘o_rs’, ‘w_rs’, {‘estimate’: 0, ‘distance’: 0.21, ‘color’: ‘red’, ‘trail’: ‘rs’}) 18 (‘w_rs’, ‘b_w’, {‘estimate’: 1, ‘distance’: 0.06, ‘color’: ‘gray’, ‘trail’: ‘w’}) 19 (‘b_w’, ‘b_gy2’, {‘estimate’: 0, ‘distance’: 0.41, ‘color’: ‘blue’, ‘trail’: ‘b’})
1 |
|
Computing some stats
total_mileage_of_circuit = sum([edge[2][‘distance’] for edge in euler_circuit]) total_mileage_on_orig_trail_map = sum(nx.get_edge_attributes(g, ‘distance’).values()) _vcn = pd.value_counts(pd.value_counts([(e[0]) for e in euler_circuit]), sort=False) node_visits = pd.DataFrame({‘n_visits’: _vcn.index, ‘n_nodes’: _vcn.values}) _vce = pd.value_counts(pd.value_counts([sorted(e)[0] + sorted(e)[1] for e in nx.MultiDiGraph(euler_circuit).edges()])) edge_visits = pd.DataFrame({‘n_visits’: _vce.index, ‘n_edges’: _vce.values})
Printing stats
print(‘Mileage of circuit: {0:.2f}’.format(total_mileage_of_circuit)) print(‘Mileage on original trail map: {0:.2f}’.format(total_mileage_on_orig_trail_map)) print(‘Mileage retracing edges: {0:.2f}’.format(total_mileage_of_circuit-total_mileage_on_orig_trail_map)) print(‘Percent of mileage retraced: {0:.2f}%\n’.format((1-total_mileage_of_circuit/total_mileage_on_orig_trail_map)*-100))
print(‘Number of edges in circuit: {}’.format(len(euler_circuit))) print(‘Number of edges in original graph: {}’.format(len(g.edges()))) print(‘Number of nodes in original graph: {}\n’.format(len(g.nodes())))
print(‘Number of edges traversed more than once: {}\n’.format(len(euler_circuit)-len(g.edges())))
print(‘Number of times visiting each node:’) print(node_visits.to_string(index=False))
print(‘\nNumber of times visiting each edge:’) print(edge_visits.to_string(index=False))
1 |
|
Visualize CPP Solution
While NetworkX also provides functionality to visualize graphs, they are notably humble in this department:
NetworkX provides basic functionality for visualizing graphs, but its main goal is to enable graph analysis rather than perform graph visualization. In the future, graph visualization functionality may be removed from NetworkX or only available as an add-on package.
Proper graph visualization is hard, and we highly recommend that people visualize their graphs with tools dedicated to that task. Notable examples of dedicated and fully-featured graph visualization tools are Cytoscape, Gephi, Graphviz and, for LaTeX typesetting, PGF/TikZ.
That said, the built-in NetworkX drawing functionality with matplotlib is powerful enough for eyeballing and visually exploring basic graphs, so you stick with NetworkX draw
for this tutorial.
I used graphviz and the dot graph description language to visualize the solution in my Python package postman_problems. Although it took some legwork to convert the NetworkX graph structure to a dot graph, it does unlock enhanced quality and control over visualizations.
Create CPP Graph
Your first step is to convert the list of edges to walk in the Euler circuit into an edge list with plot-friendly attributes.
create_cpp_edgelist
Creates an edge list with some additional attributes that you’ll use for plotting:
-
sequence: records a sequence of when we walk each edge.
-
visits: is simply the number of times we walk a particular edge.
1 |
|
Let’s create the CPP edge list:
cpp_edgelist = create_cpp_edgelist(euler_circuit)
1 |
|
Number of edges in CPP edge list: 123
1 |
|
Preview CPP plot-friendly edge list
cpp_edgelist[0:3]
1 |
|
Now let’s make the graph:
1 |
|
Visualization 1: Retracing Steps
Here you illustrate which edges are walked once (gray) and more than once (blue). This is the “correct” version of the visualization created in 2.4 which showed the naive (as the crow flies) connections between the odd node pairs (red). That is corrected here by tracing the shortest path through edges that actually exist for each pair of odd degree nodes.
If the optimization is any good, these blue lines should represent the least distance possible. Specifically, the minimum distance needed to generate a matching of the odd degree nodes.
1 |
|
Visualization 2: CPP Solution Sequence
Here you plot the original graph (trail map) annotated with the sequence numbers in which we walk the trails per the CPP solution. Multiple numbers indicate trails we must double back on.
You start on the blue trail in the bottom right (0th and the 157th direction).
1 |
|
Visualization 3: Movie
The movie below that traces the Euler circuit from beginning to end is embedded below. Edges are colored black the first time they are walked and red the second time.
Note that this gif doesn’t do give full visual justice to edges which overlap another or are too small to visualize properly. A more robust visualization library such as graphviz could address this by plotting splines instead of straight lines between nodes.
The code that creates it is presented below as a reference.
First a PNG image is produced for each direction (edge walked) from the CPP solution.
1 |
|
Then the the PNG images are stitched together to make the nice little gif above.
First the PNGs are sorted in the order from 0 to 157. Then they are stitched together using imageio
at 3 frames per second to create the gif.
1 |
|
Next Steps
Congrats, you have finished this tutorial solving the Chinese Postman Problem in Python. You have covered a lot of ground in this tutorial (33.6 miles of trails to be exact). For a deeper dive into network fundamentals, you might be interested in Datacamp’s Network Analysis in Python course which provides a more thorough treatment of the core concepts.
Don’t hesitate to check out the NetworkX documentation for more on how to create, manipulate and traverse these complex networks. The docs are comprehensive with a good number of examples and a series of tutorials.
If you’re interested in solving the CPP on your own graph, I’ve packaged the functionality within this tutorial into the postman_problems Python package on Github. You can also piece together the code blocks from this tutorial with a different edge and node list, but the postman_problems package will probably get you there more quickly and cleanly.
One day I plan to implement the extensions of the CPP (Rural and Windy Postman Problem) here as well. I also have grand ambitions of writing about these extensions and experiences testing the routes out on the trails on my blog here. Another application I plan to explore and write about is incorporating lat/long coordinates to develop (or use) a mechanism to send turn-by-turn directions to my Garmin watch.
And of course one last next step: getting outside and trail running the route!
References
1: Edmonds, Jack (1965). “Paths, trees, and flowers”. Canad. J. Math. 17: 449–467. 2: Galil, Z. (1986). “Efficient algorithms for finding maximum matching in graphs”. ACM Computing Surveys. Vol. 18, No. 1: 23-38.