Faceted Graphs with cdata and ggplot2

In between client work, John and I have been busy working on our book, Practical Data Science with R, 2nd Edition. To demonstrate a toy example for the section I’m working on, I needed scatter plots of the petal and sepal dimensions of the iris data, like so:

I wanted a plot for petal dimensions and sepal dimensions, but I also felt that two plots took up too much space. So, I thought, why not make a faceted graph that shows both:

Except — which columns do I plot and what do I facet on?

1
2
3
4
5
6
7
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa

Here’s one way to create the plot I want, using the cdata package along with ggplot2.

First, load the packages and data:

1
2
3
4
library("ggplot2")
library("cdata")

iris <- data.frame(iris)

Now define the data-shaping transform, or control table. The control table is basically a picture that sketches out the final data shape that I want. I want to specify the x and y columns of the plot (call these the value columns of the data frame) and the column that I am faceting by (call this the key column of the data frame). And I also need to specify how the key and value columns relate to the existing columns of the original data frame.

Here’s what the control table looks like:

The control table specifies that the new data frame will have the columns flower_part, Length and Width. Every row of iris will produce two rows in the new data frame: one with a flower_part value of Petal, and another with a flower_part value of Sepal. The Petal row will take the Petal.Length and Petal.Width values in the Length and Width columns respectively. Similarly for the Sepal row.

Here I create the control table in R, using the convenience function wrapr::build_frame() to create the controlTable data frame in a legible way.

1
2
3
4
(controlTable <- wrapr::build_frame(
 "flower_part", "Length" , "Width" |
 "Petal" , "Petal.Length", "Petal.Width" |
 "Sepal" , "Sepal.Length", "Sepal.Width" ))
1
2
3
## flower_part Length Width
## 1 Petal Petal.Length Petal.Width
## 2 Sepal Sepal.Length Sepal.Width

Now I apply the transform to iris using the function rowrecs_to_blocks(). I also want to carry along the Species column so I can color the scatterplot points by species.

1
2
3
4
5
6
iris_aug <- rowrecs_to_blocks(
 iris,
 controlTable,
 columnsToCopy = c("Species"))

head(iris_aug)
1
2
3
4
5
6
7
## Species flower_part Length Width
## 1 setosa Petal 1.4 0.2
## 2 setosa Sepal 5.1 3.5
## 3 setosa Petal 1.4 0.2
## 4 setosa Sepal 4.9 3.0
## 5 setosa Petal 1.3 0.2
## 6 setosa Sepal 4.7 3.2

And now I can create the plot!

1
2
3
4
5
ggplot(iris_aug, aes(x=Length, y=Width)) +
 geom_point(aes(color=Species, shape=Species)) + 
 facet_wrap(~flower_part, labeller = label_both, scale = "free") +
 ggtitle("Iris dimensions") + 
 scale_color_brewer(palette = "Dark2")

In the next post, I will show how to use cdata and ggplot2 to create a scatterplot matrix.

Like this:

Like Loading…

Related