In my previous post, I showed how to use the cdata package along with ggplot2’s faceting facility to compactly plot two related graphs from the same data. This got me thinking: can I use cdata to produce a ggplot2 version of a scatterplot matrix, or pairs plot?
A pairs plot compactly plots every (numeric) variable in a dataset against every other one. In base graphics, you would use the pairs() function. Here is the base version of the pairs plot of the iris dataset:
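A minimal sketch of such a call, coloring the points by species. The title and the hex colors (the first three of the Brewer Dark2 palette) are illustrative choices, not anything pairs() requires:

```r
# Fill colors, one per species (illustrative: first three Brewer Dark2 colors).
species_cols <- c("#1b9e77", "#d95f02", "#7570b3")

# Base-graphics pairs plot of the four numeric iris columns.
pairs(iris[1:4],
      main = "Anderson's Iris Data -- 3 species",
      pch = 21,                                   # filled circles
      bg = species_cols[unclass(iris$Species)])   # fill by species
```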
There are other ways to do this, too:
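Two such alternatives are ggpairs() from the GGally package and splom() from the lattice package. A rough sketch, assuming both packages are installed; the arguments shown are minimal illustrative choices:

```r
library(GGally)   # provides ggpairs()
library(lattice)  # provides splom()

# ggplot2-based pairs plot of the four measurements, colored by species.
p_gg <- ggpairs(iris, columns = 1:4, mapping = ggplot2::aes(color = Species))
print(p_gg)

# lattice scatterplot matrix, grouped by species.
p_sp <- splom(~iris[1:4], groups = Species, data = iris)
print(p_sp)
```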
But I wanted to see if cdata was up to the task. So here we go.
First, I load the ggplot2 and cdata packages.
To create the pairs plot in ggplot2, I need to reshape the data appropriately. For cdata, I need to specify what shape I want the data to be in, using a control table. See the last post for how the control table works. For this task, creating the control table is slightly more involved.
Here, I specify the variables I want to plot.
The function expand_grid() returns a data frame of all combinations of its arguments; in this case, I want all pairs of variables.
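Putting those two steps together, a sketch of the control table construction might look like this. It assumes base R's expand.grid() for the all-pairs step and a space-separated label for pair_key; the names `meas_vars` and `controlTable` are chosen here for illustration.

```r
# Measurement columns to plot: everything in iris except the Species label.
meas_vars <- setdiff(colnames(iris), "Species")

# All 4 x 4 = 16 ordered pairs of variable names, kept as character columns.
# Wrapping in data.frame() leaves a plain data frame.
controlTable <- data.frame(expand.grid(meas_vars, meas_vars,
                                       stringsAsFactors = FALSE))
colnames(controlTable) <- c("x", "y")

# Key column (assumed format): one label per pair, "xname yname".
controlTable <- cbind(
  data.frame(pair_key = paste(controlTable$x, controlTable$y),
             stringsAsFactors = FALSE),
  controlTable)

print(controlTable)
```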
The control table specifies that for every row of iris, sixteen new rows get produced, one for each possible pair of variables. The column pair_key will be the key column in the new data frame; there’s one key level for every possible pair of variables. The columns x and y will be the value columns in the new data frame — these will be the columns that we plot.
Now I can create the new data frame, using rowrecs_to_blocks(). I’ll also carry along the Species column to color the points in the plot.
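A sketch of that call, with the control table rebuilt first so the snippet runs on its own. The `columnsToCopy` argument carries Species through unchanged; `iris_aug` is an illustrative name for the result.

```r
library(cdata)

# Control table: a key column plus the x and y column names for each pair.
meas_vars <- setdiff(colnames(iris), "Species")
controlTable <- data.frame(expand.grid(x = meas_vars, y = meas_vars,
                                       stringsAsFactors = FALSE))
controlTable <- cbind(
  data.frame(pair_key = paste(controlTable$x, controlTable$y),
             stringsAsFactors = FALSE),
  controlTable)

# Each row of iris becomes one row per control-table row (16 here).
iris_aug <- rowrecs_to_blocks(iris, controlTable,
                              columnsToCopy = "Species")

print(head(iris_aug))
print(dim(iris_aug))
```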
Note that the data is now sixteen times larger, which I admit is perverse.
If I didn’t care about how the individual subplots were arranged, I’d be done: I’d plot y vs x, and facet_wrap on pair_key. But I want the subplots arranged in a grid. To do this I use facet_grid, which will require two key columns:
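One way to get those two key columns is to split pair_key back into its two variable names. A sketch, assuming the keys were built as "xname yname" with a single space between them; `xv` and `yv` are illustrative column names, and the reshaped frame is rebuilt so the snippet runs on its own.

```r
library(cdata)

# Rebuild the reshaped data (control table, then rowrecs_to_blocks()).
meas_vars <- setdiff(colnames(iris), "Species")
controlTable <- data.frame(expand.grid(x = meas_vars, y = meas_vars,
                                       stringsAsFactors = FALSE))
controlTable <- cbind(
  data.frame(pair_key = paste(controlTable$x, controlTable$y),
             stringsAsFactors = FALSE),
  controlTable)
iris_aug <- rowrecs_to_blocks(iris, controlTable, columnsToCopy = "Species")

# Split each "xname yname" key into its two halves.
splt <- strsplit(iris_aug$pair_key, split = " ", fixed = TRUE)
iris_aug$xv <- vapply(splt, function(si) si[[1]], character(1))
iris_aug$yv <- vapply(splt, function(si) si[[2]], character(1))

# Use the original column order as the factor levels, so the facets
# are laid out in the same order as the columns of iris.
iris_aug$xv <- factor(iris_aug$xv, levels = meas_vars)
iris_aug$yv <- factor(iris_aug$yv, levels = meas_vars)
```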
And now I can produce the graph, using facet_grid.
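A sketch of the plotting step, rebuilt end to end so it runs on its own. The facet key names `xv` and `yv`, the title, and the use of label_both are illustrative choices; the Dark2 palette is simply the one I happen to prefer.

```r
library(cdata)
library(ggplot2)

# Rebuild the reshaped data and its two facet keys (compact form).
meas_vars <- setdiff(colnames(iris), "Species")
controlTable <- data.frame(expand.grid(x = meas_vars, y = meas_vars,
                                       stringsAsFactors = FALSE))
controlTable <- cbind(
  data.frame(pair_key = paste(controlTable$x, controlTable$y),
             stringsAsFactors = FALSE),
  controlTable)
iris_aug <- rowrecs_to_blocks(iris, controlTable, columnsToCopy = "Species")
parts <- do.call(rbind, strsplit(iris_aug$pair_key, " ", fixed = TRUE))
iris_aug$xv <- factor(parts[, 1], levels = meas_vars)
iris_aug$yv <- factor(parts[, 2], levels = meas_vars)

# One panel per (yv, xv) pair; free scales because the variables
# have different ranges.
p <- ggplot(iris_aug, aes(x = x, y = y)) +
  geom_point(aes(color = Species, shape = Species)) +
  facet_grid(yv ~ xv, labeller = label_both, scales = "free") +
  scale_color_brewer(palette = "Dark2") +
  ggtitle("Anderson's Iris Data -- 3 species") +
  xlab(NULL) +
  ylab(NULL)
print(p)
```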
This pairs plot has x = y plots on the diagonal instead of the names of the variables, but you can confirm that it is otherwise the same as the pairs plot produced by pairs().
Of course, calling pairs() (or ggpairs(), or splom()) is a lot easier than all this, but now I’ve proven to myself that cdata with ggplot2 can do the job. This version does have a few advantages. It comes with a legend by default, which is nice. And it’s not obvious how to change the color palette in ggpairs() — I prefer the Brewer Dark2 palette, myself.
Luckily, this code is straightforward to wrap as a function, so if you like the cdata version, I’ve now added the PairPlot() function to WVPlots. Now it’s a one-liner, too.
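A sketch of that one-liner, assuming WVPlots is installed; the title string is an illustrative choice.

```r
library(WVPlots)

# Pairs plot of the four iris measurements, colored by Species.
p <- PairPlot(iris,
              colnames(iris)[1:4],
              "Anderson's Iris Data -- 3 species",
              group_var = "Species")
print(p)
```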