In my previous post, I showed how to use the cdata package along with ggplot2’s faceting facility to compactly plot two related graphs from the same data. This got me thinking: can I use cdata to produce a ggplot2 version of a scatterplot matrix, or pairs plot?
A pairs plot compactly plots every (numeric) variable in a dataset against every other one. In base R graphics, you would use the pairs() function. Here is the base version of the pairs plot of the iris dataset:
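One way to write that call, as a minimal sketch: the title, the point shape, and the fill colors (the first three Dark2 hex values) are illustrative assumptions, and `species_colors` and `point_fill` are names made up for this example.

```r
# Base-graphics pairs plot of the four numeric iris columns.
# Title, pch, and colors are illustrative assumptions.
species_colors <- c("#1b9e77", "#d95f02", "#7570b3")

# One fill color per row, looked up by the species' integer code.
point_fill <- species_colors[as.integer(iris$Species)]

pairs(iris[1:4],
      main = "Anderson's Iris Data -- 3 species",
      pch = 21,        # filled circle, so bg sets the fill color
      bg = point_fill)
```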
There are other ways to do this, too:
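For instance, ggpairs() from the GGally package draws a similar matrix. This is a hedged sketch that assumes GGally is installed; the title and the color mapping are my own illustrative choices, and `p_gg` is a hypothetical name.

```r
library(ggplot2)
library(GGally)  # assumed to be installed; provides ggpairs()

# Pairs plot of the first four columns, colored by species.
p_gg <- ggpairs(iris,
                mapping = aes(color = Species),
                columns = 1:4,
                title = "Anderson's Iris Data -- 3 species")
print(p_gg)
```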
But I wanted to see if cdata was up to the task. So here we go.
First, load the packages:
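The two packages named in this post are enough for the rest of the walkthrough; the sketch below assumes both are already installed.

```r
# Both packages are assumed to be installed, e.g. from CRAN.
library(ggplot2)  # plotting and faceting
library(cdata)    # control-table-driven reshaping
```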
To create the pairs plot in ggplot2, I need to reshape the data appropriately. For cdata, I need to specify what shape I want the data to be in, using a control table. See the last post for how the control table works. For this task, creating the control table is slightly more involved.
Here, I specify the variables I want to plot.
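For iris, the natural choice is the four numeric measurement columns. In this sketch, `meas_vars` is a hypothetical name for that vector.

```r
# The first four columns of iris are the numeric measurements;
# the fifth is Species. meas_vars is an illustrative name.
meas_vars <- colnames(iris)[1:4]
print(meas_vars)
# "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
```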
The function expand_grid() returns a data frame of all combinations of its arguments; in this case, I want all pairs of variables.
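A minimal sketch of building such a control table follows. It uses base R's expand.grid() so that no extra package is needed, which is an assumption on my part. The names `meas_vars` and `controlTable`, and the choice to build each key by pasting the two variable names together with a space, are also illustrative.

```r
meas_vars <- colnames(iris)[1:4]

# All 16 (x, y) combinations; keep the names as plain character strings.
controlTable <- expand.grid(x = meas_vars, y = meas_vars,
                            stringsAsFactors = FALSE,
                            KEEP.OUT.ATTRS = FALSE)

# Put the key column first: one distinct key per pair,
# e.g. "Sepal.Width Sepal.Length".
controlTable <- data.frame(
  pair_key = paste(controlTable$x, controlTable$y),
  controlTable,
  stringsAsFactors = FALSE)

print(controlTable)
```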
The control table specifies that for every row of iris, sixteen new rows get produced, one for each possible pair of variables. The column pair_key will be the key column in the new data frame; there’s one key level for every possible pair of variables. The columns x and y will be the value columns in the new data frame; these will be the columns that we plot.
Now I can create the new data frame, using rowrecs_to_blocks(). I’ll also carry along the Species column to color the points in the plot.
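As a hedged sketch of this step: the block below rebuilds a control table (key column first, then x and y) so that it runs on its own, then reshapes iris. The names `controlTable` and `iris_aug` are illustrative. The columnsToCopy argument is how I understand cdata passes extra columns through unchanged.

```r
library(cdata)

meas_vars <- colnames(iris)[1:4]

# Control table: pair_key is the key column, x and y name the source columns.
controlTable <- expand.grid(x = meas_vars, y = meas_vars,
                            stringsAsFactors = FALSE,
                            KEEP.OUT.ATTRS = FALSE)
controlTable <- data.frame(
  pair_key = paste(controlTable$x, controlTable$y),
  controlTable,
  stringsAsFactors = FALSE)

# Each row of iris becomes 16 rows, one per (x, y) pair;
# Species is copied onto every new row.
iris_aug <- rowrecs_to_blocks(iris,
                              controlTable,
                              columnsToCopy = "Species")
head(iris_aug)
```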
Note that the data is now sixteen times larger (2,400 rows instead of the original 150), which I admit is perverse.
If I didn’t care about how the individual subplots were arranged, I’d be done: I’d plot y vs x, and facet_wrap on pair_key. But I want the subplots arranged in a grid. To do this I use facet_grid, which will require two key columns:
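One way to get those two columns is to split the pair key back into its parts. The sketch below assumes each key was built as the x variable name, a single space, and the y variable name. It runs on a tiny stand-in frame rather than the full reshaped data, and `demo`, `xv`, and `yv` are hypothetical names.

```r
# Stand-in for the reshaped frame: only the key column matters here.
demo <- data.frame(
  pair_key = c("Sepal.Length Sepal.Width", "Petal.Width Sepal.Length"),
  stringsAsFactors = FALSE)

# Each key is "<x variable> <y variable>"; split on the single space.
parts <- strsplit(demo$pair_key, split = " ", fixed = TRUE)
demo$xv <- vapply(parts, function(p) p[[1]], character(1))
demo$yv <- vapply(parts, function(p) p[[2]], character(1))

print(demo)
```

This works because none of the iris column names contain a space; for other data, a separator that cannot appear in a column name would be the safer choice.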
And now I can produce the graph, using facet_grid.
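Putting the steps together, here is an end-to-end sketch under the same assumptions as before: keys are "x y" strings, and `meas_vars`, `controlTable`, `iris_aug`, `xv`, and `yv` are illustrative names. The title, point shapes, and free scales are my choices, not requirements.

```r
library(ggplot2)
library(cdata)

meas_vars <- colnames(iris)[1:4]

# Control table: key column first, then the x and y source columns.
controlTable <- expand.grid(x = meas_vars, y = meas_vars,
                            stringsAsFactors = FALSE,
                            KEEP.OUT.ATTRS = FALSE)
controlTable <- data.frame(
  pair_key = paste(controlTable$x, controlTable$y),
  controlTable,
  stringsAsFactors = FALSE)

# Reshape, carrying Species along for coloring.
iris_aug <- rowrecs_to_blocks(iris, controlTable,
                              columnsToCopy = "Species")

# Recover the two facet keys; factor levels fix the panel order.
parts <- strsplit(as.character(iris_aug$pair_key), split = " ", fixed = TRUE)
iris_aug$xv <- factor(vapply(parts, function(p) p[[1]], character(1)),
                      levels = meas_vars)
iris_aug$yv <- factor(vapply(parts, function(p) p[[2]], character(1)),
                      levels = meas_vars)

# One panel per (yv, xv) pair, each with its own axis ranges.
p <- ggplot(iris_aug, aes(x = x, y = y)) +
  geom_point(aes(color = Species, shape = Species)) +
  facet_grid(yv ~ xv, labeller = label_both, scales = "free") +
  scale_color_brewer(palette = "Dark2") +
  ggtitle("Anderson's Iris Data -- 3 species") +
  xlab(NULL) +
  ylab(NULL)
print(p)
```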
This pair plot has x = y plots on the diagonals instead of the names of the variables, but you can confirm that it is otherwise the same as the pair plot produced by pairs().
Of course, calling pairs() (or ggpairs(), or splom()) is a lot easier than all this, but now I’ve proven to myself that cdata with ggplot2 can do the job. This version does have a few advantages. It comes with a legend by default, which is nice. And it’s not obvious how to change the color palette in ggpairs(); I prefer the Brewer Dark2 palette, myself.
Luckily, this code is straightforward to wrap as a function, so if you like the cdata version, I’ve now added the PairPlot() function to WVPlots. Now it’s a one-liner, too.
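A hedged sketch of that call, assuming WVPlots is installed: the argument order (data frame, variables to plot, title) and the group_var argument follow the WVPlots documentation as I understand it, and `p_wv` is a hypothetical name.

```r
library(WVPlots)  # assumed to be installed; provides PairPlot()

# Data frame, columns to plot, title; group_var colors points by species.
p_wv <- PairPlot(iris,
                 colnames(iris)[1:4],
                 "Anderson's Iris Data -- 3 species",
                 group_var = "Species")
print(p_wv)
```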