5 Scoping and flow control
This chapter discusses a few details regarding functions.
5.1 Scope
What happens if a function defines a variable that has the same name as a variable in the global environment?
We start with a variable defined in the global environment:
a <- 5
A function can access global variables:
f <- function() {
  a
}
f()
## [1] 5
On the other hand, a variable that is defined inside a function is contained in that function. It is not known outside of that function. Consequently, it does not overwrite the value of a global variable with the same name.
f <- function() {
  a <- 2
  a
}
f()
## [1] 2
a
## [1] 5
Global variables are a (hidden) part of a function’s interface. Ideally, functions are self-contained and independent of global variables. Notable exceptions are objects that are used across your entire analysis, such as “the dataset”. (Otherwise you would need to pass them across many layers.)
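As a minimal sketch of the point about hidden interfaces, the following compares a function that reads a global variable with one that receives the same value as an argument. The names `threshold`, `keep_above_global()`, and `keep_above()` are made up for this illustration and do not appear elsewhere in the material.

```r
# Hypothetical global setting, used only for this illustration
threshold <- 5

# Depends on the global `threshold`: the dependency is invisible to the caller
keep_above_global <- function(x) {
  x[x > threshold]
}

# Self-contained: everything the function needs is passed as an argument
keep_above <- function(x, threshold) {
  x[x > threshold]
}

keep_above_global(1:10)
keep_above(1:10, threshold = 5)

# Changing the global silently changes the result of the first function only
threshold <- 8
keep_above_global(1:10)
keep_above(1:10, threshold = 5)
```

After the global is reassigned, `keep_above_global()` returns a different result for the same call, whereas `keep_above()` keeps returning the same values for the same arguments.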
5.1.1 Exercises
- Double-check what happens if two functions declare/use a variable of the same name.
  # Variables in different functions
  f1 <- function() {
    a <- 3
    a + f2()
  }
  f2 <- function() {
    a
  }
  f1()
  f2()
  a
5.2 Pure functions and side effects
library(tidyverse)
Functions should do one thing, and do it well.[3]
A pure function is one that is called for its return value and which has no side effects:
pure_function <- function(x) {
  x + 1
}
pure_function(1)
## [1] 2
For functions with side effects, it is good practice to return the input invisibly:
side_effect_function <- function(x) {
  file <- tempfile()
  writeLines(format(x), file)
  print(x)
  message(x, " written to ", file)
  invisible(x)
}
side_effect_function(2)
## [1] 2
## 2 written to /tmp/RtmpCquLue/file2db818c0ce09
Separating pure functions from functions with side effects helps isolate the side effects. If side effect functions return their input, they remain composable with pure functions:
5 %>%
  pure_function() %>%
  side_effect_function() %>%
  pure_function()
## [1] 6
## 6 written to /tmp/RtmpCquLue/file2db81bfd1887
## [1] 7
5.2.1 Exercises
- In the above example, which part of the pipe triggers the display of 6 and 7, respectively?
- How do you create a function that returns more than one value? 
- Implement your own purely functional version of sum() by using reduce(). (Hint: `+` is a function that takes two arguments and returns the sum.)
  reduce(1:5, ___)
  ## [1] 15
- Implement your own purely functional version of cumsum() by using accumulate().
  accumulate(1:5, ___)
  ## [1]  1  3  6 10 15
- Implement your own purely functional version of cumsum() by using reduce() only. (Hint: Use tail(., 1) to access the last element of a vector.)
  reduce(1:5, ~ _____)
  ## [1]  1  3  6 10 15
5.3 Control flow
library(tidyverse)
library(here)
weather_path <- function(filename) {
  # Returned value
  here("data/weather", filename)
}
read_weather_file <- function(filename) {
  readxl::read_excel(weather_path(filename))
}
We start once more with the functions weather_path() from section “Arguments” and read_weather_file() from section “Intermediate variables”.
One way to regulate control flow is to use if ():
read_weather_data <- function(omit_zurich = FALSE, omit_toronto = FALSE) {
  # Create ensemble dataset from files on disk
  weather_data <- bind_rows(
    berlin = read_weather_file("berlin.xlsx"),
    toronto = read_weather_file("toronto.xlsx"),
    tel_aviv = read_weather_file("tel_aviv.xlsx"),
    zurich = read_weather_file("zurich.xlsx"),
    .id = "city_code"
  )
  # Filter, conditionally
  if (omit_zurich) {
    weather_data <-
      weather_data %>%
      filter(city_code != "zurich")
  }
  if (omit_toronto) {
    weather_data <-
      weather_data %>%
      filter(city_code != "toronto")
  }
  # Return result
  weather_data
}
read_weather_data(omit_toronto = TRUE, omit_zurich = TRUE) %>%
  count(city_code)
## # A tibble: 2 x 2
##   city_code     n
##   <chr>     <int>
## 1 berlin       49
## 2 tel_aviv     49
read_weather_data(omit_toronto = TRUE, omit_zurich = FALSE) %>%
  count(city_code)
## # A tibble: 3 x 2
##   city_code     n
##   <chr>     <int>
## 1 berlin       49
## 2 tel_aviv     49
## 3 zurich       49
if () is also useful for returning early from a function:
read_weather_data <- function(omit_zurich = FALSE, omit_toronto = FALSE) {
  # Create ensemble dataset from files on disk
  weather_data <- bind_rows(
    berlin = read_weather_file("berlin.xlsx"),
    toronto = read_weather_file("toronto.xlsx"),
    tel_aviv = read_weather_file("tel_aviv.xlsx"),
    zurich = read_weather_file("zurich.xlsx"),
    .id = "city_code"
  )
  # Can keep original data?
  if (!omit_zurich && !omit_toronto) {
    return(weather_data)
  }
  # Filter, conditionally
  if (omit_zurich) {
    weather_data <-
      weather_data %>%
      filter(city_code != "zurich")
  }
  if (omit_toronto) {
    weather_data <-
      weather_data %>%
      filter(city_code != "toronto")
  }
  # Return result
  weather_data
}
Conditional branching is also possible with if-else logic. (This is just for illustration; you should not implement code like this!)
read_weather_data <- function(omit_zurich = FALSE, omit_toronto = FALSE) {
  # Create ensemble dataset from files on disk
  weather_data <- bind_rows(
    berlin = read_weather_file("berlin.xlsx"),
    toronto = read_weather_file("toronto.xlsx"),
    tel_aviv = read_weather_file("tel_aviv.xlsx"),
    zurich = read_weather_file("zurich.xlsx"),
    .id = "city_code"
  )
  # Filter, conditionally, and return
  if (!omit_zurich && !omit_toronto) {
    weather_data
  } else if (omit_zurich && !omit_toronto) {
    weather_data %>%
      filter(city_code != "zurich")
  } else if (!omit_zurich && omit_toronto) {
    weather_data %>%
      filter(city_code != "toronto")
  } else {
    # Filter both
    weather_data %>%
      filter(city_code != "zurich") %>%
      filter(city_code != "toronto")
  }
}
read_weather_data(omit_toronto = TRUE) %>%
  count(city_code)
## # A tibble: 3 x 2
##   city_code     n
##   <chr>     <int>
## 1 berlin       49
## 2 tel_aviv     49
## 3 zurich       49
read_weather_data(omit_zurich = TRUE) %>%
  count(city_code)
## # A tibble: 3 x 2
##   city_code     n
##   <chr>     <int>
## 1 berlin       49
## 2 tel_aviv     49
## 3 toronto      49
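The if-else chain above needs one branch per combination of flags, so it grows quickly with every additional flag. One possible alternative, sketched here under the assumption that the data has a `city_code` column, collects the cities to drop in a vector and filters once. The helper `omit_cities()` and the toy data frame `toy_weather` are hypothetical; base R subsetting is used instead of filter() so that the sketch runs without the weather files or extra packages.

```r
# Hypothetical toy data standing in for the weather dataset
toy_weather <- data.frame(
  city_code = c("berlin", "toronto", "tel_aviv", "zurich"),
  temperature = c(10, 5, 25, 8),
  stringsAsFactors = FALSE
)

omit_cities <- function(data, omit_zurich = FALSE, omit_toronto = FALSE) {
  # `if` without `else` yields NULL when the condition is FALSE,
  # and c() drops NULL, so only the requested cities end up in the vector
  omitted <- c(
    if (omit_zurich) "zurich",
    if (omit_toronto) "toronto"
  )

  # Keep all rows whose city is not in the vector of omitted cities
  data[!(data$city_code %in% omitted), , drop = FALSE]
}

omit_cities(toy_weather, omit_zurich = TRUE, omit_toronto = TRUE)
omit_cities(toy_weather)
```

With this design, adding another flag requires one more line in `c()` rather than doubling the number of branches.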
5.3.1 Exercises
- Implement a function that branches over an argument and returns the sum or the product of the input, respectively.
  agg <- function(_____) {
    if (fun == "___") {
      sum(x)
    } else if (_____) {
      prod(___)
    } else {
      rlang::abort('`fun` must be "sum" or "prod".')
    }
  }
  agg(1:4, "sum")
  ## [1] 10
  agg(1:4, "prod")
  ## [1] 24
5.4 Closures
library(tidyverse)
library(here)
weather_path <- function(filename) {
  # Returned value
  here("data/weather", filename)
}
read_weather_file <- function(filename) {
  readxl::read_excel(weather_path(filename))
}
get_weather_file_for <- function(city_code) {
  paste0(city_code, ".xlsx")
}
get_weather_data_for <- function(city_code) {
  read_weather_file(get_weather_file_for(city_code))
}
Closures can be used, e.g., when one function defines and returns another function.
We start once more with the functions weather_path() from section “Arguments” and read_weather_file() from section “Intermediate variables”.
Here we create a function that loads a particular dataset:
make_read_weather_file <- function(filename) {
  # Avoid odd effects due to lazy evaluation
  force(filename)
  # This function (closure) accesses the filename from the
  # outer function
  f <- function() {
    read_weather_file(filename)
  }
  f
}
read_berlin <- make_read_weather_file("berlin.xlsx")
read_toronto <- make_read_weather_file("toronto.xlsx")
read_tel_aviv <- make_read_weather_file("tel_aviv.xlsx")
read_zurich <- make_read_weather_file("zurich.xlsx")
read_berlin
## function() {
##     read_weather_file(filename)
##   }
## <environment: 0x7a47180>
read_toronto
## function() {
##     read_weather_file(filename)
##   }
## <environment: 0x78f2b18>
read_berlin()
## # A tibble: 49 x 18
##   time                summary icon  precipIntensity precipProbabili…
##   <dttm>              <chr>   <chr>           <dbl>            <dbl>
## 1 2019-04-28 15:00:00 Mostly… part…               0                0
## 2 2019-04-28 16:00:00 Mostly… part…               0                0
## 3 2019-04-28 17:00:00 Mostly… part…               0                0
## # … with 46 more rows, and 13 more variables: temperature <dbl>,
## #   apparentTemperature <dbl>, dewPoint <dbl>, humidity <dbl>,
## #   pressure <dbl>, windSpeed <dbl>, windGust <dbl>, windBearing <dbl>,
## #   cloudCover <dbl>, uvIndex <dbl>, visibility <dbl>, ozone <dbl>,
## #   precipType <chr>
read_toronto()
## # A tibble: 49 x 18
##   time                summary icon  precipIntensity precipProbabili…
##   <dttm>              <chr>   <chr>           <dbl>            <dbl>
## 1 2019-04-28 15:00:00 Partly… part…               0                0
## 2 2019-04-28 16:00:00 Clear   clea…               0                0
## 3 2019-04-28 17:00:00 Clear   clea…               0                0
## # … with 46 more rows, and 13 more variables: temperature <dbl>,
## #   apparentTemperature <dbl>, dewPoint <dbl>, humidity <dbl>,
## #   pressure <dbl>, windSpeed <dbl>, windGust <dbl>, windBearing <dbl>,
## #   cloudCover <dbl>, uvIndex <dbl>, visibility <dbl>, ozone <dbl>,
## #   precipType <chr>
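The force() call in make_read_weather_file() guards against lazy evaluation: without it, an argument is only evaluated when the closure first uses it, which may be after the value it refers to has changed. The following self-contained sketch illustrates the effect with two made-up function factories, `make_adder_lazy()` and `make_adder_forced()`, which are not part of the course material.

```r
# Hypothetical function factories, for illustration only
make_adder_lazy <- function(n) {
  # `n` is not evaluated here, only when the returned function first uses it
  function(x) x + n
}

make_adder_forced <- function(n) {
  # Evaluate `n` now, while the caller's value is still the intended one
  force(n)
  function(x) x + n
}

lazy <- list()
forced <- list()
for (i in 1:3) {
  lazy[[i]] <- make_adder_lazy(i)
  forced[[i]] <- make_adder_forced(i)
}

# `n` in the lazy version is only evaluated now, when `i` is already 3
lazy[[1]](10)

# `n` in the forced version was fixed when the closure was created
forced[[1]](10)
```

The first call returns 13 rather than the expected 11, because the loop variable has already reached its final value by the time the argument is evaluated. The forced version returns 11.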
Closures can also serve as wrappers for other verbs/functions (such wrappers are also called “adverbs”):
loudly <- function(f) {
  force(f)
  function(...) {
    args <- list(...)
    msg <- paste0(length(args), " argument(s)")
    message(msg)
    f(...)
  }
}
read_loudly <- loudly(read_weather_file)
read_loudly
## function(...) {
##     args <- list(...)
##     msg <- paste0(length(args), " argument(s)")
##     message(msg)
## 
##     f(...)
##   }
## <environment: 0x7b40f88>
read_loudly("berlin.xlsx")
## 1 argument(s)
## # A tibble: 49 x 18
##   time                summary icon  precipIntensity precipProbabili…
##   <dttm>              <chr>   <chr>           <dbl>            <dbl>
## 1 2019-04-28 15:00:00 Mostly… part…               0                0
## 2 2019-04-28 16:00:00 Mostly… part…               0                0
## 3 2019-04-28 17:00:00 Mostly… part…               0                0
## # … with 46 more rows, and 13 more variables: temperature <dbl>,
## #   apparentTemperature <dbl>, dewPoint <dbl>, humidity <dbl>,
## #   pressure <dbl>, windSpeed <dbl>, windGust <dbl>, windBearing <dbl>,
## #   cloudCover <dbl>, uvIndex <dbl>, visibility <dbl>, ozone <dbl>,
## #   precipType <chr>
The safely() function from the purrr package is another example of an adverb:
cities <- list("berlin", "toronto", "milan", "tel_aviv")
try(map(cities, get_weather_data_for))
## Error : `path` does not exist: '/home/travis/build/krlmlr/tidyprog/data/weather/milan.xlsx'
map(cities, safely(get_weather_data_for))
## [[1]]
## [[1]]$result
## # A tibble: 49 x 18
##   time                summary icon  precipIntensity precipProbabili…
##   <dttm>              <chr>   <chr>           <dbl>            <dbl>
## 1 2019-04-28 15:00:00 Mostly… part…               0                0
## 2 2019-04-28 16:00:00 Mostly… part…               0                0
## 3 2019-04-28 17:00:00 Mostly… part…               0                0
## # … with 46 more rows, and 13 more variables: temperature <dbl>,
## #   apparentTemperature <dbl>, dewPoint <dbl>, humidity <dbl>,
## #   pressure <dbl>, windSpeed <dbl>, windGust <dbl>, windBearing <dbl>,
## #   cloudCover <dbl>, uvIndex <dbl>, visibility <dbl>, ozone <dbl>,
## #   precipType <chr>
## 
## [[1]]$error
## NULL
## 
## 
## [[2]]
## [[2]]$result
## # A tibble: 49 x 18
##   time                summary icon  precipIntensity precipProbabili…
##   <dttm>              <chr>   <chr>           <dbl>            <dbl>
## 1 2019-04-28 15:00:00 Partly… part…               0                0
## 2 2019-04-28 16:00:00 Clear   clea…               0                0
## 3 2019-04-28 17:00:00 Clear   clea…               0                0
## # … with 46 more rows, and 13 more variables: temperature <dbl>,
## #   apparentTemperature <dbl>, dewPoint <dbl>, humidity <dbl>,
## #   pressure <dbl>, windSpeed <dbl>, windGust <dbl>, windBearing <dbl>,
## #   cloudCover <dbl>, uvIndex <dbl>, visibility <dbl>, ozone <dbl>,
## #   precipType <chr>
## 
## [[2]]$error
## NULL
## 
## 
## [[3]]
## [[3]]$result
## NULL
## 
## [[3]]$error
## <simpleError: `path` does not exist: '/home/travis/build/krlmlr/tidyprog/data/weather/milan.xlsx'>
## 
## 
## [[4]]
## [[4]]$result
## # A tibble: 49 x 17
##   time                summary icon  precipIntensity precipProbabili…
##   <dttm>              <chr>   <chr>           <dbl>            <dbl>
## 1 2019-04-28 15:00:00 Partly… part…               0                0
## 2 2019-04-28 16:00:00 Clear   clea…               0                0
## 3 2019-04-28 17:00:00 Clear   clea…               0                0
## # … with 46 more rows, and 12 more variables: temperature <dbl>,
## #   apparentTemperature <dbl>, dewPoint <dbl>, humidity <dbl>,
## #   pressure <dbl>, windSpeed <dbl>, windGust <dbl>, windBearing <dbl>,
## #   cloudCover <dbl>, uvIndex <dbl>, visibility <dbl>, ozone <dbl>
## 
## [[4]]$error
## NULL
safely(get_weather_data_for)
## function (...) 
## capture_error(.f(...), otherwise, quiet)
## <bytecode: 0x93fe5d0>
## <environment: 0x9094e30>
map(cities, ~ safely(get_weather_data_for)(.))
## [[1]]
## [[1]]$result
## # A tibble: 49 x 18
##   time                summary icon  precipIntensity precipProbabili…
##   <dttm>              <chr>   <chr>           <dbl>            <dbl>
## 1 2019-04-28 15:00:00 Mostly… part…               0                0
## 2 2019-04-28 16:00:00 Mostly… part…               0                0
## 3 2019-04-28 17:00:00 Mostly… part…               0                0
## # … with 46 more rows, and 13 more variables: temperature <dbl>,
## #   apparentTemperature <dbl>, dewPoint <dbl>, humidity <dbl>,
## #   pressure <dbl>, windSpeed <dbl>, windGust <dbl>, windBearing <dbl>,
## #   cloudCover <dbl>, uvIndex <dbl>, visibility <dbl>, ozone <dbl>,
## #   precipType <chr>
## 
## [[1]]$error
## NULL
## 
## 
## [[2]]
## [[2]]$result
## # A tibble: 49 x 18
##   time                summary icon  precipIntensity precipProbabili…
##   <dttm>              <chr>   <chr>           <dbl>            <dbl>
## 1 2019-04-28 15:00:00 Partly… part…               0                0
## 2 2019-04-28 16:00:00 Clear   clea…               0                0
## 3 2019-04-28 17:00:00 Clear   clea…               0                0
## # … with 46 more rows, and 13 more variables: temperature <dbl>,
## #   apparentTemperature <dbl>, dewPoint <dbl>, humidity <dbl>,
## #   pressure <dbl>, windSpeed <dbl>, windGust <dbl>, windBearing <dbl>,
## #   cloudCover <dbl>, uvIndex <dbl>, visibility <dbl>, ozone <dbl>,
## #   precipType <chr>
## 
## [[2]]$error
## NULL
## 
## 
## [[3]]
## [[3]]$result
## NULL
## 
## [[3]]$error
## <simpleError: `path` does not exist: '/home/travis/build/krlmlr/tidyprog/data/weather/milan.xlsx'>
## 
## 
## [[4]]
## [[4]]$result
## # A tibble: 49 x 17
##   time                summary icon  precipIntensity precipProbabili…
##   <dttm>              <chr>   <chr>           <dbl>            <dbl>
## 1 2019-04-28 15:00:00 Partly… part…               0                0
## 2 2019-04-28 16:00:00 Clear   clea…               0                0
## 3 2019-04-28 17:00:00 Clear   clea…               0                0
## # … with 46 more rows, and 12 more variables: temperature <dbl>,
## #   apparentTemperature <dbl>, dewPoint <dbl>, humidity <dbl>,
## #   pressure <dbl>, windSpeed <dbl>, windGust <dbl>, windBearing <dbl>,
## #   cloudCover <dbl>, uvIndex <dbl>, visibility <dbl>, ozone <dbl>
## 
## [[4]]$error
## NULL
safe_get_weather_data_for <- safely(get_weather_data_for)
map(cities, ~ safe_get_weather_data_for(.))
## [[1]]
## [[1]]$result
## # A tibble: 49 x 18
##   time                summary icon  precipIntensity precipProbabili…
##   <dttm>              <chr>   <chr>           <dbl>            <dbl>
## 1 2019-04-28 15:00:00 Mostly… part…               0                0
## 2 2019-04-28 16:00:00 Mostly… part…               0                0
## 3 2019-04-28 17:00:00 Mostly… part…               0                0
## # … with 46 more rows, and 13 more variables: temperature <dbl>,
## #   apparentTemperature <dbl>, dewPoint <dbl>, humidity <dbl>,
## #   pressure <dbl>, windSpeed <dbl>, windGust <dbl>, windBearing <dbl>,
## #   cloudCover <dbl>, uvIndex <dbl>, visibility <dbl>, ozone <dbl>,
## #   precipType <chr>
## 
## [[1]]$error
## NULL
## 
## 
## [[2]]
## [[2]]$result
## # A tibble: 49 x 18
##   time                summary icon  precipIntensity precipProbabili…
##   <dttm>              <chr>   <chr>           <dbl>            <dbl>
## 1 2019-04-28 15:00:00 Partly… part…               0                0
## 2 2019-04-28 16:00:00 Clear   clea…               0                0
## 3 2019-04-28 17:00:00 Clear   clea…               0                0
## # … with 46 more rows, and 13 more variables: temperature <dbl>,
## #   apparentTemperature <dbl>, dewPoint <dbl>, humidity <dbl>,
## #   pressure <dbl>, windSpeed <dbl>, windGust <dbl>, windBearing <dbl>,
## #   cloudCover <dbl>, uvIndex <dbl>, visibility <dbl>, ozone <dbl>,
## #   precipType <chr>
## 
## [[2]]$error
## NULL
## 
## 
## [[3]]
## [[3]]$result
## NULL
## 
## [[3]]$error
## <simpleError: `path` does not exist: '/home/travis/build/krlmlr/tidyprog/data/weather/milan.xlsx'>
## 
## 
## [[4]]
## [[4]]$result
## # A tibble: 49 x 17
##   time                summary icon  precipIntensity precipProbabili…
##   <dttm>              <chr>   <chr>           <dbl>            <dbl>
## 1 2019-04-28 15:00:00 Partly… part…               0                0
## 2 2019-04-28 16:00:00 Clear   clea…               0                0
## 3 2019-04-28 17:00:00 Clear   clea…               0                0
## # … with 46 more rows, and 12 more variables: temperature <dbl>,
## #   apparentTemperature <dbl>, dewPoint <dbl>, humidity <dbl>,
## #   pressure <dbl>, windSpeed <dbl>, windGust <dbl>, windBearing <dbl>,
## #   cloudCover <dbl>, uvIndex <dbl>, visibility <dbl>, ozone <dbl>
## 
## [[4]]$error
## NULL
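To make the idea behind such an adverb concrete, here is a simplified sketch built on base R's tryCatch(). It is not purrr's actual implementation; the name `my_safely()` is hypothetical, and the sketch only mimics the result/error list structure visible in the output above.

```r
# Simplified, hypothetical re-implementation of the idea behind safely()
my_safely <- function(f, otherwise = NULL) {
  # Avoid odd effects due to lazy evaluation
  force(f)
  force(otherwise)

  function(...) {
    tryCatch(
      # Success: return the value, with no error
      list(result = f(...), error = NULL),
      # Failure: return the fallback value, together with the error object
      error = function(e) list(result = otherwise, error = e)
    )
  }
}

safe_log <- my_safely(log)

# Works: `result` holds the value, `error` is NULL
safe_log(1)

# Fails inside log(), but the error is captured instead of thrown
safe_log("a")
```

Because the wrapper always returns a list with the same two components, a failing element does not abort an entire map() call.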
5.4.1 Exercises
- Review the help and the implementation of safely() and possibly().
  safely
  ## function (.f, otherwise = NULL, quiet = TRUE) 
  ## {
  ##     .f <- as_mapper(.f)
  ##     function(...) capture_error(.f(...), otherwise, quiet)
  ## }
  ## <bytecode: 0x93fead8>
  ## <environment: namespace:purrr>
  possibly
  ## function (.f, otherwise, quiet = TRUE) 
  ## {
  ##     .f <- as_mapper(.f)
  ##     force(otherwise)
  ##     function(...) {
  ##         tryCatch(.f(...), error = function(e) {
  ##             if (!quiet) 
  ##                 message("Error: ", e$message)
  ##             otherwise
  ##         }, interrupt = function(e) {
  ##             stop("Terminated by user", call. = FALSE)
  ##         })
  ##     }
  ## }
  ## <bytecode: 0x7c3b2e8>
  ## <environment: namespace:purrr>
[3] Unix philosophy, originated by Ken Thompson