purrr’s pmap function for mapping functions to data

library(purrr)

This post is part of a series led by the fearless Isabella R. Ghement. Under the #purrrResolution hashtag, statisticians and programmers on Twitter teach themselves and others one new purrr function per week! Come join us!

Great programmers seek leverage. One common path to leverage is by making the language more terse and contextual to the problem at hand. Some call this ‘bending the language to the problem’ or ‘elevating the language to the problem.’ Functional programming tools like dplyr use this concept to provide the analyst tremendous speed and quality improvements. Code and analysis can be done faster with fewer defects.

These sentiments are what motivated my adoption of the purrr package, and in return I’ve received a tremendous productivity boost, similar in magnitude to the boost I received from dplyr.

Some of my current favorite functions include: safely, transpose, map and keep. In this post I’m looking to add a new function to my repertoire: I choose pmap.

for loops are very fragile. They’re susceptible to “off-by-one” errors and complect the order of things with “doing to each.” purrr::map avoids the reliance on order and is more robust than a for loop (although perhaps not as fast).
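Here is a small sketch of that fragility, assuming an empty input vector (the variable names are made up for illustration). When the input is empty, 1:length(words) counts down from 1 to 0, so an index-based loop runs twice over elements that do not exist, while map simply returns an empty list.

```r
library(purrr)

# Hypothetical input: an empty character vector
words <- character(0)

# The index-based loop visits i = 1 and i = 0, because 1:0 is c(1, 0)
visited <- integer(0)
for (i in 1:length(words)) {
  visited <- c(visited, i)
}
visited
## [1] 1 0

# map() iterates over the elements themselves, so there is nothing to visit
result <- map(words, toupper)
length(result)
## [1] 0
```

Using seq_along(words) in the loop avoids this particular trap, but map removes the index bookkeeping entirely.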

I’ve been using map like,

map(seq_len(3), function(x) { x })

## [[1]]
## [1] 1
## 
## [[2]]
## [1] 2
## 
## [[3]]
## [1] 3

In this usage, the function being applied is often called a “lambda function” (or anonymous function). This is in contrast to “named functions” like add1 <- function(x) { x + 1 }. Lambda functions add behavior right where it is used, which keeps related pieces of code close together and in context. Named functions are great for abstraction in cases where the details would distract from the overall point of a piece of code.
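To make the distinction concrete, here is a minimal sketch that applies the same “add one” logic three ways: with the named function add1 from the text, with an inline lambda, and with purrr’s formula shorthand (~ .x + 1), which is another way to write a lambda. The variable names named, lambda and formula are only for illustration.

```r
library(purrr)

# Named function, as in the text
add1 <- function(x) { x + 1 }

named   <- map(seq_len(3), add1)                       # named function
lambda  <- map(seq_len(3), function(x) { x + 1 })      # inline lambda
formula <- map(seq_len(3), ~ .x + 1)                   # purrr formula shorthand

# All three produce the same list
identical(named, lambda) && identical(lambda, formula)
## [1] TRUE
```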

Recently I noticed cases where I’ve wanted to simultaneously map over two inputs like,

map2(seq_len(3),  # first input, can be named with .x
     c("thing", "items", "objects"),  # second input, can be named with .y
     function(x, y) { paste(x, y) }) # function to apply

## [[1]]
## [1] "1 thing"
## 
## [[2]]
## [1] "2 items"
## 
## [[3]]
## [1] "3 objects"

Though this seems like a small improvement on map, I’ve found the function to be really powerful in everyday use.

Recently on Twitter, @cantabile pointed out to me that a more general function exists for mapping over as many inputs as one wants!

The next step is purrr:pmap!
— Charles T. Gray (@cantabile) December 29, 2017

Let’s give it a try!

pmap(list(
  x = seq_len(3),
  y = c("thing", "items", "objects"),
  z = c(".", "?", "!")
),
.f = function(x, y, z) { paste(x, paste0(y, z)) })

## [[1]]
## [1] "1 thing."
## 
## [[2]]
## [1] "2 items?"
## 
## [[3]]
## [1] "3 objects!"

I think the difference in interface syntax is important to call out. map takes its input vector directly as .x, and map2 takes two inputs as .x and .y, while pmap takes a single list of inputs, .l, like list(x = , y = ).
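The sketch below puts the two interfaces side by side on the same data. The names nums, nouns and df are only for illustration. It also leans on the fact that a data frame is a list of equal-length columns, so it can be handed to pmap (here the character-returning variant pmap_chr) to work row by row.

```r
library(purrr)

nums  <- seq_len(3)
nouns <- c("thing", "items", "objects")

# map2: the two inputs are passed directly as .x and .y
via_map2 <- map2(.x = nums, .y = nouns, .f = function(x, y) { paste(x, y) })

# pmap: the inputs are bundled into one list, .l;
# the list names are matched to the function's argument names
via_pmap <- pmap(.l = list(x = nums, y = nouns),
                 .f = function(x, y) { paste(x, y) })

# Compare the two results
identical(via_map2, via_pmap)

# A data frame is a list of columns, so pmap_chr can walk its rows
df <- data.frame(x = nums, y = nouns, stringsAsFactors = FALSE)
by_row <- pmap_chr(df, function(x, y) { paste(x, y) })
by_row
```

Because pmap matches list names to argument names, the names in the list (or the data frame’s column names) need to line up with the arguments of the function being applied.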

Also, every element of the list must be a vector: vectors of length 1 are recycled, and the rest must share the same length. For example, I tried,

pmap(list(
  x = seq_len(3),
  y = c("thing", "items", "objects"),
  # upper-case the first letter of a string
  capitalize = function(x) { paste0(toupper(substr(x, 1, 1)), substr(x, 2, nchar(x))) }
),
.f = function(x, y, capitalize) { paste(x, capitalize(y)) })

## Error: Element 3 is not a vector (closure)

But the result is an error stating that a closure (a function) is not a vector, so it may not be passed in the list. My expectation was that the closure would be valid and recycled to the length of the longest vector argument, but that’s not the case!
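One way around this, sketched below, is to keep the function out of the input list and call it from inside .f, where ordinary R scoping finds it. The capitalize helper is a hypothetical stand-in for what the failed attempt was aiming at. The second call shows the recycling that does work: a length-1 vector (here z = "!") is reused for every element.

```r
library(purrr)

# Hypothetical helper: upper-case the first letter of a string
capitalize <- function(s) {
  paste0(toupper(substr(s, 1, 1)), substr(s, 2, nchar(s)))
}

# Keep the function out of the list and call it from inside .f
scoped <- pmap(
  list(x = seq_len(3), y = c("thing", "items", "objects")),
  function(x, y) { paste(x, capitalize(y)) }
)
unlist(scoped)
## [1] "1 Thing"   "2 Items"   "3 Objects"

# A length-1 vector is recycled across the longer inputs
recycled <- pmap(
  list(x = seq_len(3), y = c("thing", "items", "objects"), z = "!"),
  function(x, y, z) { paste(x, paste0(capitalize(y), z)) }
)
unlist(recycled)
## [1] "1 Thing!"   "2 Items!"   "3 Objects!"
```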

Thanks for joining me in learning pmap! I hope you’ll consider joining our collective 2018 learning group. All are welcome!

Many more purrr functions to go! For more information about the library, please see @jennybc’s set of resources here.

ls("package:purrr")

##   [1] "%@%"                 "%>%"                 "%||%"               
##   [4] "accumulate"          "accumulate_right"    "array_branch"       
##   [7] "array_tree"          "as_function"         "as_mapper"          
##  [10] "as_vector"           "at_depth"            "attr_getter"        
##  [13] "auto_browse"         "compact"             "compose"            
##  [16] "cross"               "cross_d"             "cross_df"           
##  [19] "cross_n"             "cross2"              "cross3"             
##  [22] "detect"              "detect_index"        "discard"            
##  [25] "every"               "flatten"             "flatten_chr"        
##  [28] "flatten_dbl"         "flatten_df"          "flatten_dfc"        
##  [31] "flatten_dfr"         "flatten_int"         "flatten_lgl"        
##  [34] "has_element"         "head_while"          "imap"               
##  [37] "imap_chr"            "imap_dbl"            "imap_dfc"           
##  [40] "imap_dfr"            "imap_int"            "imap_lgl"           
##  [43] "invoke"              "invoke_map"          "invoke_map_chr"     
##  [46] "invoke_map_dbl"      "invoke_map_df"       "invoke_map_dfc"     
##  [49] "invoke_map_dfr"      "invoke_map_int"      "invoke_map_lgl"     
##  [52] "is_atomic"           "is_bare_atomic"      "is_bare_character"  
##  [55] "is_bare_double"      "is_bare_integer"     "is_bare_list"       
##  [58] "is_bare_logical"     "is_bare_numeric"     "is_bare_vector"     
##  [61] "is_character"        "is_double"           "is_empty"           
##  [64] "is_formula"          "is_function"         "is_integer"         
##  [67] "is_list"             "is_logical"          "is_null"            
##  [70] "is_numeric"          "is_scalar_atomic"    "is_scalar_character"
##  [73] "is_scalar_double"    "is_scalar_integer"   "is_scalar_list"     
##  [76] "is_scalar_logical"   "is_scalar_numeric"   "is_scalar_vector"   
##  [79] "is_vector"           "iwalk"               "keep"               
##  [82] "lift"                "lift_dl"             "lift_dv"            
##  [85] "lift_ld"             "lift_lv"             "lift_vd"            
##  [88] "lift_vl"             "list_along"          "list_merge"         
##  [91] "list_modify"         "lmap"                "lmap_at"            
##  [94] "lmap_if"             "map"                 "map_at"             
##  [97] "map_call"            "map_chr"             "map_dbl"            
## [100] "map_df"              "map_dfc"             "map_dfr"            
## [103] "map_if"              "map_int"             "map_lgl"            
## [106] "map2"                "map2_chr"            "map2_dbl"           
## [109] "map2_df"             "map2_dfc"            "map2_dfr"           
## [112] "map2_int"            "map2_lgl"            "modify"             
## [115] "modify_at"           "modify_depth"        "modify_if"          
## [118] "negate"              "partial"             "pluck"              
## [121] "pmap"                "pmap_chr"            "pmap_dbl"           
## [124] "pmap_df"             "pmap_dfc"            "pmap_dfr"           
## [127] "pmap_int"            "pmap_lgl"            "possibly"           
## [130] "prepend"             "pwalk"               "quietly"            
## [133] "rbernoulli"          "rdunif"              "reduce"             
## [136] "reduce_right"        "reduce2"             "reduce2_right"      
## [139] "rep_along"           "rerun"               "safely"             
## [142] "set_names"           "simplify"            "simplify_all"       
## [145] "some"                "splice"              "tail_while"         
## [148] "transpose"           "update_list"         "vec_depth"          
## [151] "walk"                "walk2"               "when"

@statwonk