Quoting in R

Many R users appear to be big fans of “code capturing” or “non standard evaluation” (NSE) interfaces. In this note we will discuss quoting and non-quoting interfaces in R.

The above terms are simply talking about interfaces where a name to be used is captured from the source code the user typed, and thus does not need quote marks. For example:

1
2
3
d <- data.frame(x = 1)

d$x
1
1
## [1] 1

Notice both during data.frame creation and column access: the column name is given without quotes and also accessed without quotes.

This differs from using a standard value oriented interface as in the following:

1
1
## [1] 1

A natural reason for R users to look for automatic quoting is: it helps make working with columns in data.frames (R‘s primary data analysis structure) look much like working with variables in the environment. Without the quotes a column name looks very much like a variable name. And thinking of columns as variables is a useful mindset.

Another place implicit quoting shows up is with R‘s “combine” operator where one can write either of the following.

1
2
1
2
1
2
1
2
1
2
## a 
## "b"
1
2
1
2
1
2
1
2
1
2
## a 
## "b"

The wrapr package brings in a new function: qc() or “quoting c()” that gives a very powerful and convenient way to elide quotes.

1
2
3
library(wrapr)

qc(a = b)
1
2
1
2
1
2
1
2
1
2
## a 
## "b"

Notice quotes are not required on either side of the name assignment. Again, eliding quotes is not that big a deal, and not to everyone’s taste. For example I have never seen a Python user feel they are missing anything because they write “{"a" : "b"}” to construct their own named dictionary structure.

That being said, qc() is a very convenient and consistent notation if you do want to work in an NSE style.

For example, if it ever bothered you that dplyr join takes the join column names as a character vector you can use qc() to instead write:

1
2
3
4
5
dplyr::full_join(
 iris, iris, 
 by = qc(Sepal.Length, Sepal.Width, 
 Petal.Length, Petal.Width, 
 Species))

(Actually I very much like that the join takes the columns as a vector, as it is much easier to program over.) I feel the qc() grouping of the columns makes it easier for a reader to see which arguments are the column set than a use of ... would. Please take, as an example, the following dplyr::group_by():

1
2
3
4
5
library(dplyr)

starwars %>%
 group_by(homeworld, species, add = FALSE) %>%
 summarize(mass = mean(mass, na.rm = TRUE))
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
## # A tibble: 58 x 3
## # Groups: homeworld [?]
## homeworld species mass
## 
## 1 Alderaan Human 64 
## 2 Aleen Minor Aleena 15 
## 3 Bespin Human 79 
## 4 Bestine IV Human 110 
## 5 Cato Neimoidia Neimodian 90 
## 6 Cerea Cerean 82 
## 7 Champala Chagrian NaN 
## 8 Chandrila Human NaN 
## 9 Concord Dawn Human 79 
## 10 Corellia Human 78.5
## # ... with 48 more rows

When coming back to such code later, I find the following notation to be easier to read:

1
2
3
4
5
library(seplyr)

starwars %>%
 group_by_se(qc(homeworld, species), add = FALSE) %>%
 summarize(mass = mean(mass, na.rm = TRUE))
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
## # A tibble: 58 x 3
## # Groups: homeworld [?]
## homeworld species mass
## 
## 1 Alderaan Human 64 
## 2 Aleen Minor Aleena 15 
## 3 Bespin Human 79 
## 4 Bestine IV Human 110 
## 5 Cato Neimoidia Neimodian 90 
## 6 Cerea Cerean 82 
## 7 Champala Chagrian NaN 
## 8 Chandrila Human NaN 
## 9 Concord Dawn Human 79 
## 10 Corellia Human 78.5
## # ... with 48 more rows

In the above we can clearly see which arguments to the grouping command are intended to be column names, and which are not.

qc() is a powerful NSE tool that annotates and contains where we are expecting quoting behavior. Some possible applications include examples such as the following.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
# install many packages
install.packages(qc(testthat, knitr, rmarkdown, R.rsp))

# select columns
iris[, qc(Petal.Length, Petal.Width, Species)]

# control a for-loop
for(col in qc(Petal.Length, Petal.Width)) {
 iris[[col]] <- sqrt(iris[[col]])
}

# control a vapply
vapply(qc(Petal.Length, Petal.Width), 
 function(col) {
 sum(is.na(iris[[col]]))
 }, numeric(1))

The idea is: with qc() the user can switch name capturing notation at will, with no prior-arrangement needed in the functions or packages used. Also the parenthesis in qc() make for more legible code: a reader can see which arguments are being quoted and taken as a group.

As of wrapr 1.7.0 qc() incorporates bquote() functionality. bquote() is R‘s built-in quasi-quotation facility. It was added to R in August of 2003 by Thomas Lumley, and doesn’t get as much attention as it deserves.

A quoting tool such as qc() becomes a quasi-quoting tool if we add a notation that signals we do not wish to quote. In R the standard notation for this is “.()” (Lisp uses a back-tick, and the rlang package uses “!!”). The bquote()-enabled version of qc() lets us write code such as the following.

1
2
3
extra_column = "Species"

qc(Petal.Length, Petal.Width, extra_column)
1
## [1] "Petal.Length" "Petal.Width" "extra_column"
1
qc(Petal.Length, Petal.Width, .(extra_column))
1
## [1] "Petal.Length" "Petal.Width" "Species"

Notice it is un-ambiguous what is going on above. The first qc() quotes all of its arguments into strings. The second works much the same, with the exception of names marked with .(). This ability to “break out” or turn off quoting is convenient if we are working with a combination of values we wish to type in directly and others we wish to take from variables.

qc() allows substitution on the left-hand sides of assignments, if we use the alternate := notation for assignment (a convention put forward by data.table, and later adopted by dplyr).

1
2
3
4
5
6
library(wrapr)

left_name = "a"
right_value = "b"

qc(.(left_name) := .(right_value))
1
2
1
2
1
2
1
2
1
2
## a 
## "b"

The wrapr package also exports an implementation for :=. So one could also write:

1
2
3
library(wrapr)

left_name := right_value
1
2
1
2
1
2
1
2
1
2
## a 
## "b"

The hope is that the qc() and := operators are well behaved enough to commute in the sense the following two statements should return the same value.

1
2
3
library(wrapr)

qc(a := b, c := d)
1
2
1
2
## a c 
## "b" "d"
1
2
1
2
## a c 
## "b" "d"

The idea is: when there is a symmetry it is often evidence you are using the right concepts.

In conclusion: the goal of wrapr::qc() is to put a very regular and controllable quoting facility directly into the hands of the R user. This allows the R user to treat just about any R function or package as if the function or package itself implemented argument quoting and quasi-quotation capabilities.

Related