It’s not the large things that send a man to the madhouse. Death he’s
ready for, or murder, incest, robbery, fire, flood… No, it’s the continuing
series of small tragedies that send a man to the madhouse… Not the death
of his love but a shoelace that snaps with no time left…
I absolutely love the R language; I think it’s both amazingly powerful and very simple to work with. Moreover, R has essentially become the lingua franca for statistical computing. The depth and breadth of its statistical packages are unrivaled.
In the last few years we’ve seen the popularity of R increase quite a bit. Of course, there are a multitude of reasons behind the growth, but at least one reason would have to be the monumentally popular packages by Hadley Wickham. His packages—ggplot2, dplyr, tidyr, to name a few—are among the most popular of any R packages, and have truly changed the way that R code is written.
As the quote above suggests, this post is about minor annoyances. I was recently annoyed by a not-so-obvious feature of the tibble data type (Hadley’s answer to the data.frame). In addition to an odd name, tibble objects have a few nice features that traditional R data frames lack (e.g., column type information, sane printing behavior, etc.). But as I recently discovered, they also have a surprising nuance. Consider the following code.
R> library(dplyr)
R> dat <- data.frame("x1" = c(31, 43, 59), "x2" = c("a", "b", "c"), "x3" = c(7, 8, 9))
R> dat_tib <- as_data_frame(dat)
R> dat_tib[1, 1]
What does the code above do, and what makes it annoying? I’m glad you asked. The code above creates a data.frame object with three columns that are cleverly called “x1”, “x2”, and “x3”. Next, we use the as_data_frame() function to create an object of type tibble.
Incidentally, I don’t love the fact that the function as_data_frame() returns an object of type tibble, but let’s save that for another time. Instead, I’m bothered by what dat_tib[1, 1] returns. You might have the exceedingly reasonable expectation that dat_tib[1, 1] would return the first element of the first column in dat_tib, which in this case would be the value 31.
Well, I’m afraid to say that your reasonable expectation is not correct. The call dat_tib[1, 1] actually returns a 1-by-1 tibble with a single element: 31. If you want the actual value 31, you need either the double brackets commonly used with list objects in R (i.e., dat_tib[[1, 1]]) or the dollar-sign notation for column indexing (i.e., dat_tib$x1[1]).
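To make the difference concrete, here is a minimal sketch that rebuilds the example and tries each way of indexing. It assumes the tibble package is installed. It also uses tibble::as_tibble() instead of as_data_frame(), because newer releases of tibble deprecate as_data_frame() in favor of as_tibble().

```r
library(tibble)

# Same toy data as the post; as_tibble() stands in for as_data_frame() (assumption)
dat <- data.frame(x1 = c(31, 43, 59), x2 = c("a", "b", "c"), x3 = c(7, 8, 9))
dat_tib <- as_tibble(dat)

# Single brackets: a 1-by-1 tibble, not a number
single <- dat_tib[1, 1]
print(class(single))      # includes "tbl_df"
print(dim(single))        # [1] 1 1

# Double brackets with row and column: the bare value
print(dat_tib[[1, 1]])    # [1] 31

# Dollar sign: pull the column out as a vector, then index it
print(dat_tib$x1[1])      # [1] 31
```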
This nuance isn’t exactly a secret; it’s actually prominently displayed in the docs. But I wasn’t aware of it until recently, and I found myself baffled by the result of an expression along the lines of dat_tib[1, 1] == 31. Of course, the expression wasn’t behaving as I expected, because the object on the left-hand side was a tibble, not the value 31.
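Because dat_tib[1, 1] is a whole tibble, any check that expects a bare number can misfire. Here is a small sketch using identical(), which compares the entire object, type and attributes included. It assumes the tibble package is installed and uses as_tibble() to build the tibble.

```r
library(tibble)

dat_tib <- as_tibble(data.frame(x1 = c(31, 43, 59), x2 = c("a", "b", "c"), x3 = c(7, 8, 9)))

# The single-bracket result is a data frame (a tibble), so it is not identical to 31
print(is.data.frame(dat_tib[1, 1]))   # [1] TRUE
print(identical(dat_tib[1, 1], 31))   # [1] FALSE

# Extracting the value first gives the scalar comparison you most likely wanted
print(identical(dat_tib[[1, 1]], 31)) # [1] TRUE
print(dat_tib[[1, 1]] == 31)          # [1] TRUE
```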
There is a lot to be said for consistency. From that perspective, the fact that the single-bracket notation always returns a tibble is quite nice, even if it disagrees with data.frame conventions. However, from a practical point of view, I would have thought it was widely accepted that when you’re trying to get a single element from an array-like object, you’re interested in the actual value, not a 1-by-1 slice of the original array.
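For comparison, base R's `[` method for data.frame objects drops to a plain vector by default when a single column is selected. Passing drop = FALSE keeps the data frame. This sketch contrasts that convention with the tibble behavior. It assumes the tibble package is installed and uses as_tibble() to build the tibble.

```r
library(tibble)

dat <- data.frame(x1 = c(31, 43, 59), x2 = c("a", "b", "c"), x3 = c(7, 8, 9))
dat_tib <- as_tibble(dat)

# data.frame: one selected column is dropped to a vector by default
print(dat[1, 1])                         # [1] 31

# ...unless drop = FALSE is requested explicitly
print(class(dat[1, 1, drop = FALSE]))    # [1] "data.frame"

# tibble: single brackets keep the tibble class
print(inherits(dat_tib[1, 1], "tbl_df")) # [1] TRUE
```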
I suppose this is a good reminder to always read the docs. And be careful when you tie your shoes.