R Tip: be wary of “...”
The following code example contains an easy error in using the R function unique().
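A minimal sketch of the mistake, assuming two small character vectors (the name vec1 and the specific values are illustrative; only vec2 is named in the text):

```r
# vec1 and the values used here are illustrative assumptions.
vec1 <- c("a", "b", "c")
vec2 <- c("c", "d")

# Mistake: calling unique() as if it were union().
res <- unique(vec1, vec2)
print(res)  # "a" "b" "c"  (the "d" from vec2 is missing, and nothing complains)
```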
Notice none of the novel values from vec2 are present in the result. Our mistake was: we (improperly) tried to use unique() with multiple value arguments, as one would use union(). Also notice no error or warning was signaled. We used unique() incorrectly and nothing pointed this out to us. What compounded our error was R's “...” function signature feature.
In this note I will talk a bit about how to defend against this kind of mistake. I am going to apply the principle that a design that makes committing mistakes more difficult (or even impossible) is a good thing, and not a sign of carelessness, laziness, or weakness. I am well aware that every time I admit to making a mistake (I have indeed made the above mistake) those who claim to never make mistakes have a laugh at my expense. Honestly I feel the reason I see more mistakes is I check a lot more.
Data science coding is often done in a rush (deadlines, wanting to see results, and so on). Instead of moving fast, let’s take the time to think a bit about programming theory using a very small concrete issue. This lets us show how one can work a bit safer (saving time in the end), without sacrificing notational power or legibility.
A confounding issue is: unique() failed to alert me of my mistake because unique()'s function signature (like so many R functions) includes a “...” argument. I might have been careless or in a rush, but it seems like unique() was also in a rush and did not care to supply argument inspection.
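One way to see this is to inspect the signature directly. The sketch below uses only base R; the argument name not_a_real_argument is made up for illustration. (As a side note, in a two-argument positional call the second value is matched to incomparables, and anything beyond the declared arguments falls into “...”; in both cases no complaint is raised.)

```r
# Inspect the declared arguments of base R's unique() generic.
sig <- names(formals(unique))
print(sig)  # includes "..."

# An undeclared argument (hypothetical name) is silently accepted via "...".
res <- unique(c(1, 1, 2), not_a_real_argument = "ignored")
print(res)
```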
In R a function that includes a “...” in its signature will accept arbitrary arguments and values in addition to the ones explicitly declared and documented. There are three primary uses of “...” in R: accepting unknown arguments that are to be passed to later functions, building variadic functions, and forcing later arguments to be bound by name (my favorite use). Unfortunately, “...” is also often used to avoid documenting arguments, and it turns off a lot of very useful error checking.
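The third use (forcing later arguments to be bound by name) can be sketched as follows. The function scale_by() and its multiplier argument are hypothetical, written only for this illustration. The sketch also shows the downside: a positional value meant for multiplier is swallowed by “...” without any warning.

```r
# scale_by() is a hypothetical helper, not from any package.
# Arguments declared after "..." can only be set by name.
scale_by <- function(x, ..., multiplier = 2) {
  x * multiplier
}

by_name <- scale_by(1:3, multiplier = 3)  # multiplier is set to 3
by_position <- scale_by(1:3, 3)           # 3 lands in "..." and is ignored

print(by_name)
print(by_position)
```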
An example of the “accepting unknown arguments” use is lapply(). lapply() passes what it finds in “...” to whatever function it is working with. For example:
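A small sketch of this forwarding, assuming paste() as the function being applied and c("a", "b") as the input:

```r
# The extra arguments "!" and sep = "" are not used by lapply() itself;
# they travel through "..." to paste().
res <- lapply(c("a", "b"), paste, "!", sep = "")
print(res)
```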
Notice the arguments “"!", sep = ""” were passed on to paste(). Since lapply() can not know what function the user will use ahead of time, it uses the “...” abstraction to forward arguments. Personally I never use this form and tend to write the somewhat more explicit and verbose style shown below.
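A sketch of the more explicit style, assuming the same inputs as before; the anonymous function and its parameter name vi are illustrative choices:

```r
# The arguments to paste() are written where paste() is actually called.
res <- lapply(c("a", "b"), function(vi) {
  paste(vi, "!", sep = "")
})
print(res)
```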
I feel this form is more readable as the arguments are seen where they are actually used. (Note: this is a notional example; in practice we would use “paste0(c("a", "b"), "!")” to directly produce the result as a vector.)
An example of using “...” to supply a variadic interface is paste() itself.
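As a quick illustration (the specific strings are arbitrary), a variadic call takes any number of value arguments:

```r
# Each value is a separate argument collected by "...".
res <- paste("a", "b", "c")
print(res)  # "a b c"
```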
Other important examples include list() and c(). In fact I like list() and c() as they only take a “...” and no other arguments. Being variadic is so important to list() and c() that it is essentially all they do. One can often separate out the variadic intent with lists or vectors as in:
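A sketch of that separation, assuming paste() as the consuming function: c() gathers the values, and paste() receives them as a single vector argument.

```r
# c() does the variadic collecting; paste() combines one vector via collapse.
res <- paste(c("a", "b", "c"), collapse = " ")
print(res)  # "a b c"
```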
Even I don’t write code such as the above (that is too long even for me), unless the values are coming from somewhere else (such as a variable). However, with wrapr's reduce/expand operator we can completely separate the collection of variadic arguments and their application. The notation looks like the following:
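A hedged sketch of the idea, assuming paste() as the target function and a variable named values (both illustrative). The base R do.call() line shows the same expansion without any package; the %.|% line runs only if wrapr is installed.

```r
values <- c("a", "b", "c")

# Base R analogue: do.call() spreads the elements into the "..." slot.
res_base <- do.call(paste, as.list(values))
print(res_base)  # "a b c"

# wrapr's reduce/expand operator, guarded in case wrapr is not installed.
if (requireNamespace("wrapr", quietly = TRUE)) {
  library(wrapr)
  res_wrapr <- values %.|% paste
  print(res_wrapr)
}
```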
Essentially reduce/expand calls variadic functions with items taken from a vector or list as individual arguments (allowing one to program easily over variadic functions).
%.|% is intended to place values in the “...” slot of a function (the variadic term). It is designed for a more perfect world where, when a function declares “...” in its signature, it is then the only user facing part of the signature. This is hardly ever actually the case in R, as common functions such as paste() and sum() have additional optional named arguments (which we are here leaving at their default values), whereas c() and list() are pure (take only “...”).
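The difference can be checked by listing declared argument names. This sketch uses only base R; args() is used for sum() and list() because they are primitives.

```r
# paste() and sum() declare named arguments in addition to "...".
paste_args <- names(formals(paste))
sum_args <- names(formals(args(sum)))

# list() declares only "...".
list_args <- names(formals(args(list)))

print(paste_args)
print(sum_args)
print(list_args)
```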
With a few non-standard (name capturing) and variadic value constructors one does not in fact need other functions to be name capturing or variadic. With such tools one can have these conveniences everywhere. For example we can convert our incorrect use of unique() into correct code using c().
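A sketch of the corrected call, again assuming illustrative contents for vec1 and vec2:

```r
# vec1 and the values used here are illustrative assumptions.
vec1 <- c("a", "b", "c")
vec2 <- c("c", "d")

# c() collects all the values into one vector, then unique() de-duplicates it.
res <- unique(c(vec1, vec2))
print(res)  # "a" "b" "c" "d"
```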
In the above code roles are kept separate: c() is collecting values and unique() is applying a calculation. We don’t need a powerful “super unique” or “super union” function; unique() is good enough if we remember to use c().
In the spirit of our earlier article on function argument names we have defined a convenience function wrapr::uniques() that enforces the use of value carrying arguments. With wrapr::uniques() if one attempts the mistake I have been discussing one immediately gets a signaling error (instead of propagating incorrect calculations forward).
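The checking idea can be sketched in base R. Here strict_unique() is a hypothetical stand-in written for this illustration; it is not wrapr's implementation, and its error message is invented. The final lines call wrapr::uniques() on the correct form only if wrapr is installed.

```r
vec1 <- c("a", "b", "c")  # illustrative values
vec2 <- c("c", "d")

# strict_unique() is a hypothetical stand-in: it refuses anything in "...".
strict_unique <- function(x, ...) {
  if (length(list(...)) > 0) {
    stop("strict_unique: unexpected arguments")
  }
  unique(x)
}

# Correct use returns the expected answer.
ok <- strict_unique(c(vec1, vec2))
print(ok)

# The mistaken call stops immediately; we capture the message to show it.
err <- tryCatch(
  strict_unique(vec1, vec2),
  error = function(e) conditionMessage(e)
)
print(err)

# The packaged version, guarded in case wrapr is not installed.
if (requireNamespace("wrapr", quietly = TRUE)) {
  print(wrapr::uniques(c(vec1, vec2)))
}
```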
With uniques() we either get the right answer, or we get immediately stopped at the mistaken step. This is a good way to work.