In between client work, John and I have been busy working on our book, Practical Data Science with R, 2nd Edition. To demonstrate a toy example for the section I’m working on, I needed scatter plots of the petal and sepal dimensions of the iris data, like so:
I wanted a plot for petal dimensions and sepal dimensions, but I also felt that two plots took up too much space. So, I thought, why not make a faceted graph that shows both:
Except — which columns do I plot and what do I facet on?
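A quick look at the data makes the question concrete. This is a small sketch that uses only base R and the built-in iris data frame:

```r
# iris ships with base R (the datasets package), so nothing needs installing.
# The petal and sepal measurements live in four separate columns:
names(iris)
# "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"

# One row per flower, one column per measurement
head(iris, 3)
```

There is no single x column, y column, or faceting column here, so the data has to be reshaped before it can be plotted as one faceted graph.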
Here’s one way to create the plot I want, using the cdata package along with ggplot2.
First, load the packages and data:
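A minimal sketch of this setup step, assuming both packages are already installed from CRAN:

```r
# Assumption: cdata and ggplot2 are installed,
# e.g. via install.packages(c("cdata", "ggplot2"))
library(ggplot2)  # plotting
library(cdata)    # data reshaping driven by control tables

# Work on a plain copy of the built-in iris data frame
iris <- data.frame(iris)
```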
Now define the data-shaping transform, or control table. The control table is basically a picture that sketches out the final data shape that I want. I want to specify the x and y columns of the plot (call these the value columns of the data frame) and the column that I am faceting by (call this the key column of the data frame). And I also need to specify how the key and value columns relate to the existing columns of the original data frame.
Here’s what the control table looks like:
The control table specifies that the new data frame will have the columns flower_part, Length and Width. Every row of iris will produce two rows in the new data frame: one with a flower_part value of Petal, and another with a flower_part value of Sepal. The Petal row will take the Petal.Length and Petal.Width values in the Length and Width columns respectively. Similarly for the Sepal row.
Here I create the control table in R, using the convenience function wrapr::build_frame() to create the controlTable data frame in a legible way.
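The control table described above can be sketched as follows. The column names and cell values come straight from the description; the layout is just one way to write it with wrapr::build_frame(), assuming the wrapr package is installed (it is a dependency of cdata):

```r
# build_frame() lets the data frame be typed the way it will print:
# commas separate columns, and | separates rows.
# The first row gives the column names of the NEW data frame.
# The first column (flower_part) is the key column; the cells under
# Length and Width name the EXISTING iris columns to pull values from.
controlTable <- wrapr::build_frame(
  "flower_part", "Length"      , "Width"       |
  "Petal"      , "Petal.Length", "Petal.Width" |
  "Sepal"      , "Sepal.Length", "Sepal.Width" )

print(controlTable)
```

Writing the table this way keeps the code visually close to the shape of the data it describes, which makes mistakes in the column mapping easier to spot.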
Now I apply the transform to iris using the function rowrecs_to_blocks(). I also want to carry along the Species column so I can color the scatterplot points by species.
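A sketch of this step, written so it runs on its own. The result name iris_aug is an assumption chosen for illustration; columnsToCopy is the rowrecs_to_blocks() argument for columns that should be carried along unchanged:

```r
library(cdata)

# Control table as described earlier, repeated so this block is self-contained
controlTable <- wrapr::build_frame(
  "flower_part", "Length"      , "Width"       |
  "Petal"      , "Petal.Length", "Petal.Width" |
  "Sepal"      , "Sepal.Length", "Sepal.Width" )

# Reshape: each row of iris becomes one Petal row and one Sepal row.
# Species is copied onto both rows so it is available for coloring later.
iris_aug <- rowrecs_to_blocks(
  iris,
  controlTable,
  columnsToCopy = c("Species"))

head(iris_aug)
nrow(iris_aug)  # 300: two rows for each of the 150 rows of iris
```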
And now I can create the plot!
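Here is one way the plot could be drawn from the reshaped data. The reshaping steps are repeated so the block runs on its own. The name iris_aug, the title, the color palette, and the free axis scales are illustrative assumptions, not requirements of the technique:

```r
library(ggplot2)
library(cdata)

# Rebuild the reshaped data (same steps as described above)
controlTable <- wrapr::build_frame(
  "flower_part", "Length"      , "Width"       |
  "Petal"      , "Petal.Length", "Petal.Width" |
  "Sepal"      , "Sepal.Length", "Sepal.Width" )
iris_aug <- rowrecs_to_blocks(iris, controlTable, columnsToCopy = c("Species"))

# Length and Width are the value columns (x and y);
# flower_part is the key column, so it drives the facets.
p <- ggplot(iris_aug, aes(x = Length, y = Width)) +
  geom_point(aes(color = Species, shape = Species)) +
  # scales = "free" gives each panel its own axis ranges, since petal
  # and sepal measurements cover different ranges
  facet_wrap(~flower_part, labeller = label_both, scales = "free") +
  ggtitle("Iris dimensions") +
  scale_color_brewer(palette = "Dark2")

print(p)
```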
In the next post, I will show how to use cdata and ggplot2 to create a scatterplot matrix.