Rebuilding Map Example With Apply Functions

Yesterday Hadley’s functional programming package purrr was published to CRAN. It is designed to bring convenient functional programming paradigma and add another data manipulation framework to R.

“Where dplyr focusses on data frames, purrr focusses on vectors” – Hadley Wickham in a blogpost

The core of the package consists of map functions, which operate similar to base apply functions. So I tried to rebuild the first example used to present the map function in the blog and Github repo with apply. The example looks like this:

library(purrr)

mtcars %>%
  split(.$cyl) %>%
  map(~ lm(mpg ~ wt, data = .)) %>%
  map(summary) %>%
  map_dbl("r.squared")
##         4         6         8 
## 0.5086326 0.4645102 0.4229655

Let’s do it with apply now:

mtcars %>% 
  split(.$cyl) %>% 
  lapply(function(x) lm(mpg~wt, data = x)) %>% 
  lapply(summary) %>% 
  sapply(function(x) x$r.squared)
##         4         6         8 
## 0.5086326 0.4645102 0.4229655

As you can see, map works with 3 different inputs - function names (exactly like apply), anonymous functions as formulae or character to select elements. Apply on the other hand only accepts functions, but with a little piping voodoo we can also shortcut the anonymous functions like this:

mtcars %>% 
  split(.$cyl) %>% 
  lapply(. %>% lm(mpg~wt, data = .)) %>% 
  lapply(summary) %>% 
  sapply(. %>% .$r.squared)
##         4         6         8 
## 0.5086326 0.4645102 0.4229655

Which is almost as appealing as the map alternative. Checking the speed of both approaches also reveals no significant time differences:

library(microbenchmark)

map <- function() {mtcars %>% split(.$cyl) %>% map(~ lm(mpg ~ wt, data = .)) %>% map(summary) %>% map_dbl("r.squared")}
apply <- function() {mtcars %>% split(.$cyl) %>% lapply(. %>% lm(mpg~wt, data = .)) %>% lapply(summary) %>% sapply(. %>% .$r.squared)}

microbenchmark(map, apply)
## Unit: nanoseconds
##   expr min lq   mean median uq   max neval
##    map  14 27  31.69     27 27   513   100
##  apply  14 27 281.34     27 27 24556   100

Now don’t get me wrong, I don’t want to say, that purrr is worthless or reduntant. I just picked the most basic function of the package and explained what it does by rewriting with apply functions. The map function and their derivatives are convenient alternatives to apply in my eyes without computational overhead. Furthermore the package offers very interesting functions like zip_n and lift. If you love apply and work with lists, you definitely should check out purrr.

comments powered by Disqus