
Tidy eval in R: A simple example

Tidy evaluation in R, or tidy eval for short, is a pretty complex topic. But for some specific uses, it’s not all that complex. One important task that tidy eval helps handle is both useful and easy: incorporating functions from packages such as dplyr and ggplot2 inside your own custom functions.

Let me go through an example. Using the ubiquitous mtcars sample data set, here’s how I might do a scatterplot of miles per gallon by weight, using dplyr to filter for higher-mpg cars only (I’ve also added a few style tweaks):

library(dplyr)
library(ggplot2)
library(ggthemes)  # provides theme_hc()

graph_title <- "MPG by Weight (thousands lbs) for higher MPG cars"
filter(mtcars, mpg >= 20) %>%
  ggplot(aes(x = wt, y = mpg)) +
  geom_point(color = "darkblue") +
  theme_hc() + xlab("") + ylab("") +
  ggtitle(graph_title) +
  theme(plot.title = element_text(size = 14, hjust = 0.5)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, linetype = "dotted", color = "darkgrey")

Next, I’d like to create a function with this code, so I can easily reuse these customizations to plot other data, not just mtcars. That’s where problems arise that tidy eval can solve.

Here’s a first try at creating that function. It simply wraps my graph code in a new function:

fun1 <- function(mydf, mytitle, myxcol, myycol, minyval = 20) {
  # First attempt: column arguments are passed straight through (this does not work)
  dplyr::filter(mydf, myycol >= minyval) %>%
    ggplot(aes(x = myxcol, y = myycol)) +
    geom_point(color = "darkblue") +
    theme_hc() + xlab("") + ylab("") +
    ggtitle(mytitle) +
    theme(plot.title = element_text(size = 14, hjust = 0.5)) +
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE, linetype = "dotted", color = "darkgrey")
}

To use this function, it’s pretty straightforward to add the graph title, the data frame, and a default minimum value as arguments. But you’ll run into problems with the data frame column names.

If I pass wt and mpg unquoted as arguments to my function, the way ggplot does, I get an error. You’ll see this if you run fun1(mydf = mtcars, mytitle = "MPG by Weight (thousands of pounds)", wt, mpg)

That’s because my function is looking for objects called wt and mpg, and no such standalone objects exist. ggplot and dplyr understand that unquoted arguments aren’t standalone objects, but “regular” R doesn’t.
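Here is a minimal sketch of that lookup failure, using only base R. The helper show_col() is hypothetical and exists only for this illustration, and the sketch assumes no object named wt exists in your session:

```r
# show_col() is a hypothetical helper, not part of the article's code
show_col <- function(mydf, mycol) {
  # Standard evaluation: R looks up the caller's expression (wt) as an
  # ordinary object and never looks inside mydf for a column
  mean(mycol)
}

# Capture the error message instead of stopping the script
result <- tryCatch(
  show_col(mtcars, wt),
  error = function(e) conditionMessage(e)
)
print(result)
```

The printed message should complain that wt cannot be found, even though mtcars has a column with that name.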

If I try wt and mpg as quoted arguments, as in

fun1(mydf = mtcars, mytitle = "MPG by Weight (thousands of pounds)", "wt", "mpg")

I get a graph, but not the graph I want.

[Figure: Not the graph I was hoping to see. Credit: Sharon Machlis/IDG]

That’s because my function now thinks these arguments are character strings, not representations of mtcars data columns.
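To see why the quoted version still draws something, here is a small sketch of what the filter step does with a string. It assumes only that dplyr is installed; the variable name kept is made up for this example:

```r
library(dplyr)

# "mpg" is just text here, so R compares the string "mpg" with 20
# (as text) instead of comparing the mpg column with 20
string_test <- "mpg" >= 20
print(string_test)

# A single TRUE applies to every row, so no rows are filtered out
kept <- filter(mtcars, "mpg" >= 20)
print(nrow(kept) == nrow(mtcars))
```

The same thing happens in aes(): "wt" and "mpg" are treated as constant labels, so every row lands on the same spot in the plot.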

If I try using the full columns for mtcars$wt and mtcars$mpg as arguments, such as in the code below, I get another error.

fun1(mydf = mtcars, mytitle = "MPG by Weight (thousands of pounds)", mtcars$wt, mtcars$mpg)
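One plausible reason for this error: inside fun1 the data frame is filtered first, but mtcars$wt and mtcars$mpg are still the full, unfiltered columns, so their lengths no longer match the data being plotted. The sketch below only compares those lengths, under the assumption that dplyr is installed; it does not reproduce ggplot2’s exact error message:

```r
library(dplyr)

# The filter step keeps only the higher-mpg rows
filtered <- filter(mtcars, mtcars$mpg >= 20)

rows_after_filter <- nrow(filtered)      # rows ggplot() receives as data
full_column_length <- length(mtcars$wt)  # values passed in as myxcol

print(rows_after_filter)
print(full_column_length)
```

Because the two numbers differ, ggplot has no way to line up the x and y values with the rows of the filtered data.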

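The problem here is that tidyverse functions use tidy evaluation. They don’t treat an argument such as mpg as a standalone object; they capture the expression and evaluate it later, in the context of the data frame. My regular R function uses standard evaluation: when it needs an argument’s value, it evaluates the argument as an ordinary R object, outside of any data frame.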

I somehow need my function to do the same thing as the tidyverse functions do, and use tidy evaluation.

So how do I do that?

I need what I sometimes think of as a metavariable—a special kind of variable that refers to another variable instead of containing one or more values.

That’s something the rlang package calls a “quosure.” Hadley Wickham much more accurately describes a quosure as capturing both an R expression and that expression’s environment. One thing that’s special about a quosure, as opposed to a regular variable, is that if you use a quosure in your code, R won’t try to get the value of what’s in there until you tell it to.

In conventional R programming, and most programming, I can create a variable x that refers to 3, with simple code such as x <- 3. Any time I want to use the value 3 in my code, I can use an actual 3 or, if I’ve assigned 3 to the variable x, I can use x to represent that value. R will immediately evaluate the value of x. If you type x ^ 2 in the console (after assigning 3 to the variable x), you get the value of 3 squared. Using a variable in general makes code more reusable.

Quosure behavior is different. You can create a quosure by using the rlang package’s enquo() function. To create a quosure named x2 that refers to my “regular” variable x, I can use the code

x2 <- enquo(x)

If you type x2 in the R console, you no longer see the simple value inside, the way you do when you type x. Instead, R shows a more complex structure:

expr: ^3
env:  empty

To access the value of what that quosure contains, you have to tell R when and how to do that. I sometimes think of the quosure having a timing element, so your code doesn’t try to get what’s inside it until you say so. Interactively in the console, you can use quo_get_expr() to see a quosure’s stored expression, such as quo_get_expr(x2).
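The idea of “expression plus environment, evaluated only on request” can be sketched in a few lines. Note one assumption: this sketch uses rlang’s quo() function, which captures an expression you type directly, rather than the enquo() call shown above. The variable names q, stored, where, and value are made up for this example:

```r
library(rlang)

x <- 3
q <- quo(x^2)              # capture the expression x^2 and the current environment

stored <- quo_get_expr(q)  # the unevaluated expression
where <- quo_get_env(q)    # the environment in which x will be looked up
value <- eval_tidy(q)      # evaluate only now, on request

print(stored)
print(value)
```

Nothing is calculated until eval_tidy() runs, which is the timing element described above.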

But inside a function you’re writing, you want the quosure’s value and its environment. You access both with what’s called the “bang bang” operator: 2 exclamation marks. Any time you need the full expression that’s stored inside a quosure, just refer to the quosure object with !! in front.

One last tip: When you’re writing an R function that needs quosures, make sure to create those quosures before you need those variables in your code.
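Before applying this to the full graphing function, here is a minimal sketch of the capture-then-unquote pattern on its own. The helper mean_of() is hypothetical and assumes dplyr and rlang are installed:

```r
library(dplyr)
library(rlang)

# mean_of() is a hypothetical helper used only to illustrate the pattern
mean_of <- function(mydf, mycol) {
  # Step 1: create the quosure before the column is needed
  mycol_quosure <- enquo(mycol)
  # Step 2: unquote it with !! where a dplyr function expects a column
  summarize(mydf, avg = mean(!!mycol_quosure))
}

result <- mean_of(mtcars, mpg)
print(result)
```

The unquoted mpg in the call is never evaluated as a standalone object; it is handed to summarize() and resolved as a column of mtcars.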

Here’s a revised function to generate my graph:

fun2 <- function(mydf, mytitle, myxcol, myycol, minyval = 20) {
  # Capture the column arguments as quosures before they are used
  myxcol_quosure <- rlang::enquo(myxcol)
  myycol_quosure <- rlang::enquo(myycol)
  # Unquote the quosures with !! wherever a column is expected
  dplyr::filter(mydf, !!myycol_quosure >= minyval) %>%
    ggplot(aes(x = !!myxcol_quosure, y = !!myycol_quosure)) +
    geom_point(color = "darkblue") +
    theme_hc() + xlab("") + ylab("") +
    ggtitle(mytitle) +
    theme(plot.title = element_text(size = 14, hjust = 0.5)) +
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE,
                linetype = "dotted", color = "darkgrey")
}

The first two lines of the function create quosures; the next two lines use bang bangs to use the quosures.

This function now works with unquoted variables if you run:

fun2(mydf = mtcars, mytitle ="MPG by Weight (thousands of pounds)", myxcol = wt, myycol = mpg)

You should see a graph like this:

[Figure: Graph of mpg by weight using a function with quosures. Credit: Sharon Machlis/IDG]

This is a very simple case. I hope that Lionel Henry and Hadley Wickham, authors of the rlang package, aren’t cringing in horror at this explanation if they’re reading. But it works for me for this basic use case. I hope it works for you, too.

There’s a lot more to know about tidy eval, and more operators in the rlang package. You can read more from RStudio at tidyeval.tidyverse.org.
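One of those additional operators is {{ }} (often called “curly-curly”), available in rlang 0.4.0 and later, which combines the enquo() and !! steps into one. A minimal sketch, where mean_of2() is a hypothetical helper and dplyr is assumed to be installed:

```r
library(dplyr)

# mean_of2() is a hypothetical helper used only to illustrate {{ }}
mean_of2 <- function(mydf, mycol) {
  # {{ mycol }} captures the argument and unquotes it in a single step
  summarize(mydf, avg = mean({{ mycol }}))
}

result2 <- mean_of2(mtcars, mpg)
print(result2)
```

For simple pass-through cases like this one, the shorthand avoids creating a separate quosure variable.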
