5 Scoping and flow control

This chapter discusses a few details regarding functions: scoping, pure functions and side effects, control flow, and closures.

5.1 Scope

What happens if a function defines a variable that has the same name as a variable in the global environment?

We start with a variable defined in the global environment:

a <- 5

A function can access global variables:

f <- function() {
  a
}

f()
## [1] 5

On the other hand, a variable that is defined inside a function is local to that function: it is not visible outside of it. In particular, assigning to it does not overwrite a global variable of the same name.

f <- function() {
  a <- 2
  a
}

f()
## [1] 2
a
## [1] 5
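A minimal check of the claim that a local variable is not known outside its function. The function `g()` and the variable `b` are hypothetical, and the check assumes that no variable `b` has been defined in the global environment before:

```r
# Hypothetical function that creates a local variable `b`
g <- function() {
  b <- 10
  b * 2
}

g()
## [1] 20

# `b` existed only in g()'s own environment while g() was running;
# it was never created in the global environment
exists("b", envir = globalenv(), inherits = FALSE)
## [1] FALSE
```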

Global variables are a (hidden) part of a function’s interface. Ideally, functions are self-contained and independent of global variables. Notable exceptions are objects that are used across your entire analysis, such as “the dataset”. (Otherwise, you would need to pass them through many layers of function calls.)
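To see why a global variable is a hidden part of the interface, the sketch below compares a function that reads a global variable with one that takes the same value as an argument. All names (`threshold`, `count_large_global()`, `count_large()`, `values`) are hypothetical:

```r
# Hypothetical global variable
threshold <- 5

# Hidden interface: the result depends on the global `threshold`
count_large_global <- function(x) {
  sum(x > threshold)
}

# Self-contained: every input is passed as an argument
count_large <- function(x, threshold) {
  sum(x > threshold)
}

values <- c(1, 4, 6, 8)

count_large_global(values)
## [1] 2

# Same call, different result, because a global variable changed
threshold <- 7
count_large_global(values)
## [1] 1

# The explicit version depends only on its arguments
count_large(values, threshold = 5)
## [1] 2
```

The call `count_large_global(values)` looks identical both times, yet returns different results; nothing in the call reveals the dependency.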

5.1.1 Exercises

  1. Double-check what happens if two functions declare/use a variable of the same name.

    # Variables in different functions
    f1 <- function() {
      a <- 3
      a + f2()
    }
    
    f2 <- function() {
      a
    }
    
    f1()
    f2()
    a

5.2 Pure functions and side effects


library(tidyverse)

Functions should do one thing, and do it well.[3]

A pure function is one that is called for its return value and which has no side effects:

pure_function <- function(x) {
  x + 1
}

pure_function(1)
## [1] 2
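For contrast, a sketch of a function that is not pure: it changes a variable outside of itself via `<<-`, so calling it twice with the same input gives different results. The counter `calls` and `impure_function()` are hypothetical:

```r
pure_function <- function(x) {
  x + 1
}

# Hypothetical counter that lives outside the function
calls <- 0

# Not pure: modifies `calls` outside the function via `<<-` (a side effect),
# so the same input can produce different outputs
impure_function <- function(x) {
  calls <<- calls + 1
  x + calls
}

pure_function(1)
## [1] 2
pure_function(1)
## [1] 2

impure_function(1)
## [1] 2
impure_function(1)
## [1] 3
calls
## [1] 2
```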

For functions with side effects, it is good practice to return the input invisibly:

side_effect_function <- function(x) {
  file <- tempfile()
  # Write to the same file that is reported in the message below
  writeLines(format(x), file)
  print(x)
  message(x, " written to ", file)

  invisible(x)
}

side_effect_function(2)
## [1] 2
## 2 written to /tmp/RtmpCquLue/file2db818c0ce09
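What “invisibly” means can be checked with base R's withVisible(). The sketch uses a hypothetical `log_value()` that follows the same pattern but only emits a message instead of writing a file:

```r
# Hypothetical side-effect function following the same pattern,
# but without writing a file
log_value <- function(x) {
  message("Value: ", x)
  invisible(x)
}

# At the console, only the message appears; the result is not auto-printed
log_value(3)

# withVisible() reports the returned value and whether it is visible
res <- withVisible(log_value(3))
res$value
## [1] 3
res$visible
## [1] FALSE

# The invisible result can still be assigned and used
y <- log_value(3) + 1
y
## [1] 4
```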

Separating pure functions from functions with side effects helps isolate the side effects. If side-effect functions return their input, they remain composable with pure functions, for example in a pipe:

5 %>%
  pure_function() %>%
  side_effect_function() %>%
  pure_function()
## [1] 6
## 6 written to /tmp/RtmpCquLue/file2db81bfd1887
## [1] 7
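The following sketch shows what happens when a side-effect function does not return its input. It uses nested calls instead of `%>%` so that it runs with base R alone; `report_only()` and `report_and_return()` are hypothetical:

```r
pure_function <- function(x) {
  x + 1
}

# Hypothetical side-effect function that does NOT return its input:
# message() returns NULL invisibly, and so does this function
report_only <- function(x) {
  message("Value: ", x)
}

# Hypothetical side-effect function that returns its input invisibly
report_and_return <- function(x) {
  message("Value: ", x)
  invisible(x)
}

# The NULL result breaks the chain: NULL + 1 is numeric(0)
pure_function(report_only(pure_function(5)))
## Value: 6
## numeric(0)

# Returning the input keeps the chain intact
pure_function(report_and_return(pure_function(5)))
## Value: 6
## [1] 7
```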

5.2.1 Exercises

  1. In the above example, which part of the pipe triggers the display of 6 and 7, respectively?

  2. How do you create a function that returns more than one value?

  3. Implement your own purely functional version of sum() by using reduce(). (Hint: `+` is a function that takes two arguments and returns the sum.)

    reduce(1:5, ___)
    ## [1] 15
  4. Implement your own purely functional version of cumsum() by using accumulate().

    accumulate(1:5, ___)
    ## [1]  1  3  6 10 15
  5. Implement your own purely functional version of cumsum() by using reduce() only. (Hint: Use tail(., 1) to access the last element of a vector.)

    reduce(1:5, ~ _____)
    ## [1]  1  3  6 10 15

5.3 Control flow


library(tidyverse)
library(here)

weather_path <- function(filename) {
  # Returned value
  here("data/weather", filename)
}

read_weather_file <- function(filename) {
  readxl::read_excel(weather_path(filename))
}

We start once more with the functions weather_path() from section “Arguments” and read_weather_file() from section “Intermediate variables”.

One way to control the flow of execution is an if () statement:

read_weather_data <- function(omit_zurich = FALSE, omit_toronto = FALSE) {
  # Create ensemble dataset from files on disk
  weather_data <- bind_rows(
    berlin = read_weather_file("berlin.xlsx"),
    toronto = read_weather_file("toronto.xlsx"),
    tel_aviv = read_weather_file("tel_aviv.xlsx"),
    zurich = read_weather_file("zurich.xlsx"),
    .id = "city_code"
  )

  # Filter, conditionally
  if (omit_zurich) {
    weather_data <-
      weather_data %>%
      filter(city_code != "zurich")
  }

  if (omit_toronto) {
    weather_data <-
      weather_data %>%
      filter(city_code != "toronto")
  }

  # Return result
  weather_data
}

read_weather_data(omit_toronto = TRUE, omit_zurich = TRUE) %>%
  count(city_code)
## # A tibble: 2 x 2
##   city_code     n
##   <chr>     <int>
## 1 berlin       49
## 2 tel_aviv     49
read_weather_data(omit_toronto = TRUE, omit_zurich = FALSE) %>%
  count(city_code)
## # A tibble: 3 x 2
##   city_code     n
##   <chr>     <int>
## 1 berlin       49
## 2 tel_aviv     49
## 3 zurich       49
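The same conditional filtering can be tried without the weather files. This sketch uses hypothetical toy data (`weather_toy`, with a dummy `value` column) and base R subsetting instead of filter(); `omit_cities()` is a hypothetical name:

```r
# Hypothetical toy data standing in for the weather files
weather_toy <- data.frame(
  city_code = c("berlin", "toronto", "tel_aviv", "zurich"),
  value = 1:4,
  stringsAsFactors = FALSE
)

omit_cities <- function(data, omit_zurich = FALSE, omit_toronto = FALSE) {
  # Each block runs only if its condition is TRUE
  if (omit_zurich) {
    data <- data[data$city_code != "zurich", , drop = FALSE]
  }

  if (omit_toronto) {
    data <- data[data$city_code != "toronto", , drop = FALSE]
  }

  data
}

omit_cities(weather_toy, omit_toronto = TRUE)$city_code
# returns c("berlin", "tel_aviv", "zurich")
omit_cities(weather_toy, omit_toronto = TRUE, omit_zurich = TRUE)$city_code
# returns c("berlin", "tel_aviv")
```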

Combined with return(), an if () statement allows an early return, for example when no filtering is needed and the data can be returned unchanged:

read_weather_data <- function(omit_zurich = FALSE, omit_toronto = FALSE) {
  # Create ensemble dataset from files on disk
  weather_data <- bind_rows(
    berlin = read_weather_file("berlin.xlsx"),
    toronto = read_weather_file("toronto.xlsx"),
    tel_aviv = read_weather_file("tel_aviv.xlsx"),
    zurich = read_weather_file("zurich.xlsx"),
    .id = "city_code"
  )

  # Can keep original data?
  if (!omit_zurich && !omit_toronto) {
    return(weather_data)
  }

  # Filter, conditionally
  if (omit_zurich) {
    weather_data <-
      weather_data %>%
      filter(city_code != "zurich")
  }

  if (omit_toronto) {
    weather_data <-
      weather_data %>%
      filter(city_code != "toronto")
  }

  # Return result
  weather_data
}
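A self-contained sketch of the early return, again on hypothetical toy data and with a hypothetical `omit_cities_early()`. The message makes visible whether the code after return() was reached:

```r
# Hypothetical toy data standing in for the weather files
weather_toy <- data.frame(
  city_code = c("berlin", "toronto", "tel_aviv", "zurich"),
  value = 1:4,
  stringsAsFactors = FALSE
)

omit_cities_early <- function(data, omit_zurich = FALSE, omit_toronto = FALSE) {
  # Nothing to omit: return() exits right away, the rest is skipped
  if (!omit_zurich && !omit_toronto) {
    return(data)
  }

  message("Filtering...")

  if (omit_zurich) {
    data <- data[data$city_code != "zurich", , drop = FALSE]
  }
  if (omit_toronto) {
    data <- data[data$city_code != "toronto", , drop = FALSE]
  }
  data
}

# No message: the function returned early with the unchanged data
identical(omit_cities_early(weather_toy), weather_toy)
## [1] TRUE

# Emits "Filtering..." before returning the filtered data
omit_cities_early(weather_toy, omit_zurich = TRUE)$city_code
# returns c("berlin", "toronto", "tel_aviv")
```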

Conditional branching is also possible with if-else chains. (This is just for illustration; you should not implement code like this! Every additional option would double the number of branches.)

read_weather_data <- function(omit_zurich = FALSE, omit_toronto = FALSE) {
  # Create ensemble dataset from files on disk
  weather_data <- bind_rows(
    berlin = read_weather_file("berlin.xlsx"),
    toronto = read_weather_file("toronto.xlsx"),
    tel_aviv = read_weather_file("tel_aviv.xlsx"),
    zurich = read_weather_file("zurich.xlsx"),
    .id = "city_code"
  )

  # Filter, conditionally, and return
  if (!omit_zurich && !omit_toronto) {
    weather_data
  } else if (omit_zurich && !omit_toronto) {
    weather_data %>%
      filter(city_code != "zurich")
  } else if (!omit_zurich && omit_toronto) {
    weather_data %>%
      filter(city_code != "toronto")
  } else {
    # Filter both
    weather_data %>%
      filter(city_code != "zurich") %>%
      filter(city_code != "toronto")
  }
}

read_weather_data(omit_toronto = TRUE) %>%
  count(city_code)
## # A tibble: 3 x 2
##   city_code     n
##   <chr>     <int>
## 1 berlin       49
## 2 tel_aviv     49
## 3 zurich       49
read_weather_data(omit_zurich = TRUE) %>%
  count(city_code)
## # A tibble: 3 x 2
##   city_code     n
##   <chr>     <int>
## 1 berlin       49
## 2 tel_aviv     49
## 3 toronto      49
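One possible way to avoid enumerating every combination of options is to collect the cities to omit first and filter only once. This is a sketch on hypothetical toy data, not code from the original; `omit_cities_flat()` is a hypothetical name and `%in%` replaces the separate filter() calls:

```r
# Hypothetical toy data standing in for the weather files
weather_toy <- data.frame(
  city_code = c("berlin", "toronto", "tel_aviv", "zurich"),
  value = 1:4,
  stringsAsFactors = FALSE
)

omit_cities_flat <- function(data, omit_zurich = FALSE, omit_toronto = FALSE) {
  # Collect the cities to omit; each flag adds at most one entry
  omit <- character()
  if (omit_zurich) omit <- c(omit, "zurich")
  if (omit_toronto) omit <- c(omit, "toronto")

  # A single filter step covers all four combinations;
  # with an empty `omit`, all rows are kept
  data[!(data$city_code %in% omit), , drop = FALSE]
}

omit_cities_flat(weather_toy, omit_toronto = TRUE)$city_code
# returns c("berlin", "tel_aviv", "zurich")
omit_cities_flat(weather_toy, omit_zurich = TRUE, omit_toronto = TRUE)$city_code
# returns c("berlin", "tel_aviv")
```

Each new option adds one if () line rather than doubling the number of branches.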

5.3.1 Exercises

  1. Implement a function that branches over an argument and returns the sum or the product of the input, respectively.

    agg <- function(_____) {
      if (fun == "___") {
        sum(x)
      } else if (_____) {
        prod(___)
      } else {
        rlang::abort('`fun` must be "sum" or "prod".')
      }
    }
    agg(1:4, "sum")
    ## [1] 10
    agg(1:4, "prod")
    ## [1] 24

5.4 Closures


library(tidyverse)
library(here)

weather_path <- function(filename) {
  # Returned value
  here("data/weather", filename)
}

read_weather_file <- function(filename) {
  readxl::read_excel(weather_path(filename))
}

get_weather_file_for <- function(city_code) {
  paste0(city_code, ".xlsx")
}

get_weather_data_for <- function(city_code) {
  read_weather_file(get_weather_file_for(city_code))
}

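A closure is a function that keeps access to the variables of the environment in which it was created, even after the function that created it has returned. Among other things, closures can be used to create new functions programmatically.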

We start once more with the functions weather_path() from section “Arguments” and read_weather_file() from section “Intermediate variables”.

Here we create a function factory: a function that returns a new function, which in turn loads a particular dataset:

make_read_weather_file <- function(filename) {
  # Avoid odd effects due to lazy evaluation
  force(filename)

  # This function (closure) accesses the filename from the
  # outer function
  f <- function() {
    read_weather_file(filename)
  }

  f
}

read_berlin <- make_read_weather_file("berlin.xlsx")
read_toronto <- make_read_weather_file("toronto.xlsx")
read_tel_aviv <- make_read_weather_file("tel_aviv.xlsx")
read_zurich <- make_read_weather_file("zurich.xlsx")

read_berlin
## function() {
##     read_weather_file(filename)
##   }
## <environment: 0x7a47180>
read_toronto
## function() {
##     read_weather_file(filename)
##   }
## <environment: 0x78f2b18>
read_berlin()
## # A tibble: 49 x 18
##   time                summary icon  precipIntensity precipProbabili…
##   <dttm>              <chr>   <chr>           <dbl>            <dbl>
## 1 2019-04-28 15:00:00 Mostly… part…               0                0
## 2 2019-04-28 16:00:00 Mostly… part…               0                0
## 3 2019-04-28 17:00:00 Mostly… part…               0                0
## # … with 46 more rows, and 13 more variables: temperature <dbl>,
## #   apparentTemperature <dbl>, dewPoint <dbl>, humidity <dbl>,
## #   pressure <dbl>, windSpeed <dbl>, windGust <dbl>, windBearing <dbl>,
## #   cloudCover <dbl>, uvIndex <dbl>, visibility <dbl>, ozone <dbl>,
## #   precipType <chr>
read_toronto()
## # A tibble: 49 x 18
##   time                summary icon  precipIntensity precipProbabili…
##   <dttm>              <chr>   <chr>           <dbl>            <dbl>
## 1 2019-04-28 15:00:00 Partly… part…               0                0
## 2 2019-04-28 16:00:00 Clear   clea…               0                0
## 3 2019-04-28 17:00:00 Clear   clea…               0                0
## # … with 46 more rows, and 13 more variables: temperature <dbl>,
## #   apparentTemperature <dbl>, dewPoint <dbl>, humidity <dbl>,
## #   pressure <dbl>, windSpeed <dbl>, windGust <dbl>, windBearing <dbl>,
## #   cloudCover <dbl>, uvIndex <dbl>, visibility <dbl>, ozone <dbl>,
## #   precipType <chr>
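The same mechanism, including the reason for force(), can be shown without any data files. In this sketch, `make_adder()` and `make_adder_lazy()` are hypothetical function factories:

```r
# Hypothetical function factory: returns a closure that remembers `n`
make_adder <- function(n) {
  force(n)
  function(x) x + n
}

add_1 <- make_adder(1)
add_10 <- make_adder(10)
add_1(5)
## [1] 6
add_10(5)
## [1] 15

# Each closure has its own enclosing environment holding its own `n`
environment(add_10)$n
## [1] 10

# Without force(), `n` is evaluated only when the closure is first called
make_adder_lazy <- function(n) {
  function(x) x + n
}

k <- 1
add_k_lazy <- make_adder_lazy(k)
add_k <- make_adder(k)
k <- 100

add_k_lazy(5)  # `n` is evaluated now, after k has changed
## [1] 105
add_k(5)       # `n` was fixed to 1 when add_k was created
## [1] 6
```

This lazy evaluation of arguments is the kind of “odd effect” that the force(filename) call in make_read_weather_file() guards against.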

Closures can also wrap other functions (verbs) and modify their behavior; such wrapper functions are also called “adverbs”:

loudly <- function(f) {
  force(f)

  function(...) {
    args <- list(...)
    msg <- paste0(length(args), " argument(s)")
    message(msg)

    f(...)
  }
}

read_loudly <- loudly(read_weather_file)
read_loudly
## function(...) {
##     args <- list(...)
##     msg <- paste0(length(args), " argument(s)")
##     message(msg)
## 
##     f(...)
##   }
## <environment: 0x7b40f88>
read_loudly("berlin.xlsx")
## 1 argument(s)
## # A tibble: 49 x 18
##   time                summary icon  precipIntensity precipProbabili…
##   <dttm>              <chr>   <chr>           <dbl>            <dbl>
## 1 2019-04-28 15:00:00 Mostly… part…               0                0
## 2 2019-04-28 16:00:00 Mostly… part…               0                0
## 3 2019-04-28 17:00:00 Mostly… part…               0                0
## # … with 46 more rows, and 13 more variables: temperature <dbl>,
## #   apparentTemperature <dbl>, dewPoint <dbl>, humidity <dbl>,
## #   pressure <dbl>, windSpeed <dbl>, windGust <dbl>, windBearing <dbl>,
## #   cloudCover <dbl>, uvIndex <dbl>, visibility <dbl>, ozone <dbl>,
## #   precipType <chr>
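Because a closure keeps its enclosing environment, an adverb can also carry state from one call to the next. A sketch with a hypothetical adverb `counting()` that numbers the calls of the wrapped function:

```r
# Hypothetical adverb: wraps any function and counts how often it is called
counting <- function(f) {
  force(f)
  n_calls <- 0

  function(...) {
    # `<<-` updates n_calls in the enclosing environment of this closure
    n_calls <<- n_calls + 1
    message("Call #", n_calls)
    f(...)
  }
}

counted_sqrt <- counting(sqrt)
counted_sqrt(4)
## Call #1
## [1] 2
counted_sqrt(9)
## Call #2
## [1] 3

# The state lives in the closure's environment
environment(counted_sqrt)$n_calls
## [1] 2
```

Each call to counting() creates a fresh environment, so two counted functions do not share their counters.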

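The safely() function from the purrr package is another adverb: the wrapped function no longer throws an error but returns a list with the components result and error: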

cities <- list("berlin", "toronto", "milan", "tel_aviv")
try(map(cities, get_weather_data_for))
## Error : `path` does not exist: '/home/travis/build/krlmlr/tidyprog/data/weather/milan.xlsx'
map(cities, safely(get_weather_data_for))
## [[1]]
## [[1]]$result
## # A tibble: 49 x 18
##   time                summary icon  precipIntensity precipProbabili…
##   <dttm>              <chr>   <chr>           <dbl>            <dbl>
## 1 2019-04-28 15:00:00 Mostly… part…               0                0
## 2 2019-04-28 16:00:00 Mostly… part…               0                0
## 3 2019-04-28 17:00:00 Mostly… part…               0                0
## # … with 46 more rows, and 13 more variables: temperature <dbl>,
## #   apparentTemperature <dbl>, dewPoint <dbl>, humidity <dbl>,
## #   pressure <dbl>, windSpeed <dbl>, windGust <dbl>, windBearing <dbl>,
## #   cloudCover <dbl>, uvIndex <dbl>, visibility <dbl>, ozone <dbl>,
## #   precipType <chr>
## 
## [[1]]$error
## NULL
## 
## 
## [[2]]
## [[2]]$result
## # A tibble: 49 x 18
##   time                summary icon  precipIntensity precipProbabili…
##   <dttm>              <chr>   <chr>           <dbl>            <dbl>
## 1 2019-04-28 15:00:00 Partly… part…               0                0
## 2 2019-04-28 16:00:00 Clear   clea…               0                0
## 3 2019-04-28 17:00:00 Clear   clea…               0                0
## # … with 46 more rows, and 13 more variables: temperature <dbl>,
## #   apparentTemperature <dbl>, dewPoint <dbl>, humidity <dbl>,
## #   pressure <dbl>, windSpeed <dbl>, windGust <dbl>, windBearing <dbl>,
## #   cloudCover <dbl>, uvIndex <dbl>, visibility <dbl>, ozone <dbl>,
## #   precipType <chr>
## 
## [[2]]$error
## NULL
## 
## 
## [[3]]
## [[3]]$result
## NULL
## 
## [[3]]$error
## <simpleError: `path` does not exist: '/home/travis/build/krlmlr/tidyprog/data/weather/milan.xlsx'>
## 
## 
## [[4]]
## [[4]]$result
## # A tibble: 49 x 17
##   time                summary icon  precipIntensity precipProbabili…
##   <dttm>              <chr>   <chr>           <dbl>            <dbl>
## 1 2019-04-28 15:00:00 Partly… part…               0                0
## 2 2019-04-28 16:00:00 Clear   clea…               0                0
## 3 2019-04-28 17:00:00 Clear   clea…               0                0
## # … with 46 more rows, and 12 more variables: temperature <dbl>,
## #   apparentTemperature <dbl>, dewPoint <dbl>, humidity <dbl>,
## #   pressure <dbl>, windSpeed <dbl>, windGust <dbl>, windBearing <dbl>,
## #   cloudCover <dbl>, uvIndex <dbl>, visibility <dbl>, ozone <dbl>
## 
## [[4]]$error
## NULL
safely(get_weather_data_for)
## function (...) 
## capture_error(.f(...), otherwise, quiet)
## <bytecode: 0x93fe5d0>
## <environment: 0x9094e30>
map(cities, ~ safely(get_weather_data_for)(.))
## [[1]]
## [[1]]$result
## # A tibble: 49 x 18
##   time                summary icon  precipIntensity precipProbabili…
##   <dttm>              <chr>   <chr>           <dbl>            <dbl>
## 1 2019-04-28 15:00:00 Mostly… part…               0                0
## 2 2019-04-28 16:00:00 Mostly… part…               0                0
## 3 2019-04-28 17:00:00 Mostly… part…               0                0
## # … with 46 more rows, and 13 more variables: temperature <dbl>,
## #   apparentTemperature <dbl>, dewPoint <dbl>, humidity <dbl>,
## #   pressure <dbl>, windSpeed <dbl>, windGust <dbl>, windBearing <dbl>,
## #   cloudCover <dbl>, uvIndex <dbl>, visibility <dbl>, ozone <dbl>,
## #   precipType <chr>
## 
## [[1]]$error
## NULL
## 
## 
## [[2]]
## [[2]]$result
## # A tibble: 49 x 18
##   time                summary icon  precipIntensity precipProbabili…
##   <dttm>              <chr>   <chr>           <dbl>            <dbl>
## 1 2019-04-28 15:00:00 Partly… part…               0                0
## 2 2019-04-28 16:00:00 Clear   clea…               0                0
## 3 2019-04-28 17:00:00 Clear   clea…               0                0
## # … with 46 more rows, and 13 more variables: temperature <dbl>,
## #   apparentTemperature <dbl>, dewPoint <dbl>, humidity <dbl>,
## #   pressure <dbl>, windSpeed <dbl>, windGust <dbl>, windBearing <dbl>,
## #   cloudCover <dbl>, uvIndex <dbl>, visibility <dbl>, ozone <dbl>,
## #   precipType <chr>
## 
## [[2]]$error
## NULL
## 
## 
## [[3]]
## [[3]]$result
## NULL
## 
## [[3]]$error
## <simpleError: `path` does not exist: '/home/travis/build/krlmlr/tidyprog/data/weather/milan.xlsx'>
## 
## 
## [[4]]
## [[4]]$result
## # A tibble: 49 x 17
##   time                summary icon  precipIntensity precipProbabili…
##   <dttm>              <chr>   <chr>           <dbl>            <dbl>
## 1 2019-04-28 15:00:00 Partly… part…               0                0
## 2 2019-04-28 16:00:00 Clear   clea…               0                0
## 3 2019-04-28 17:00:00 Clear   clea…               0                0
## # … with 46 more rows, and 12 more variables: temperature <dbl>,
## #   apparentTemperature <dbl>, dewPoint <dbl>, humidity <dbl>,
## #   pressure <dbl>, windSpeed <dbl>, windGust <dbl>, windBearing <dbl>,
## #   cloudCover <dbl>, uvIndex <dbl>, visibility <dbl>, ozone <dbl>
## 
## [[4]]$error
## NULL
safe_get_weather_data_for <- safely(get_weather_data_for)
map(cities, ~ safe_get_weather_data_for(.))
## [[1]]
## [[1]]$result
## # A tibble: 49 x 18
##   time                summary icon  precipIntensity precipProbabili…
##   <dttm>              <chr>   <chr>           <dbl>            <dbl>
## 1 2019-04-28 15:00:00 Mostly… part…               0                0
## 2 2019-04-28 16:00:00 Mostly… part…               0                0
## 3 2019-04-28 17:00:00 Mostly… part…               0                0
## # … with 46 more rows, and 13 more variables: temperature <dbl>,
## #   apparentTemperature <dbl>, dewPoint <dbl>, humidity <dbl>,
## #   pressure <dbl>, windSpeed <dbl>, windGust <dbl>, windBearing <dbl>,
## #   cloudCover <dbl>, uvIndex <dbl>, visibility <dbl>, ozone <dbl>,
## #   precipType <chr>
## 
## [[1]]$error
## NULL
## 
## 
## [[2]]
## [[2]]$result
## # A tibble: 49 x 18
##   time                summary icon  precipIntensity precipProbabili…
##   <dttm>              <chr>   <chr>           <dbl>            <dbl>
## 1 2019-04-28 15:00:00 Partly… part…               0                0
## 2 2019-04-28 16:00:00 Clear   clea…               0                0
## 3 2019-04-28 17:00:00 Clear   clea…               0                0
## # … with 46 more rows, and 13 more variables: temperature <dbl>,
## #   apparentTemperature <dbl>, dewPoint <dbl>, humidity <dbl>,
## #   pressure <dbl>, windSpeed <dbl>, windGust <dbl>, windBearing <dbl>,
## #   cloudCover <dbl>, uvIndex <dbl>, visibility <dbl>, ozone <dbl>,
## #   precipType <chr>
## 
## [[2]]$error
## NULL
## 
## 
## [[3]]
## [[3]]$result
## NULL
## 
## [[3]]$error
## <simpleError: `path` does not exist: '/home/travis/build/krlmlr/tidyprog/data/weather/milan.xlsx'>
## 
## 
## [[4]]
## [[4]]$result
## # A tibble: 49 x 17
##   time                summary icon  precipIntensity precipProbabili…
##   <dttm>              <chr>   <chr>           <dbl>            <dbl>
## 1 2019-04-28 15:00:00 Partly… part…               0                0
## 2 2019-04-28 16:00:00 Clear   clea…               0                0
## 3 2019-04-28 17:00:00 Clear   clea…               0                0
## # … with 46 more rows, and 12 more variables: temperature <dbl>,
## #   apparentTemperature <dbl>, dewPoint <dbl>, humidity <dbl>,
## #   pressure <dbl>, windSpeed <dbl>, windGust <dbl>, windBearing <dbl>,
## #   cloudCover <dbl>, uvIndex <dbl>, visibility <dbl>, ozone <dbl>
## 
## [[4]]$error
## NULL
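To see roughly how such an adverb can work, here is a simplified, hypothetical `my_safely()` built on base R's tryCatch(). It is a sketch of the idea, not purrr's actual implementation, which differs in its details:

```r
# Hypothetical, simplified safely()-like adverb based on tryCatch()
my_safely <- function(f, otherwise = NULL) {
  force(f)
  force(otherwise)

  function(...) {
    tryCatch(
      list(result = f(...), error = NULL),
      # On error, return the fallback value and the captured condition
      error = function(e) list(result = otherwise, error = e)
    )
  }
}

safe_log <- my_safely(log)

results <- lapply(list(1, "a", 100), safe_log)

# Which calls succeeded?
vapply(results, function(r) is.null(r$error), logical(1))
## [1]  TRUE FALSE  TRUE

# The failed call kept its error condition for later inspection
conditionMessage(results[[2]]$error)
```

As with safely(), a failing input does not abort the whole iteration; the error is stored next to a NULL result.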

5.4.1 Exercises

  1. Review the help and the implementation of safely() and possibly().

    safely
    ## function (.f, otherwise = NULL, quiet = TRUE) 
    ## {
    ##     .f <- as_mapper(.f)
    ##     function(...) capture_error(.f(...), otherwise, quiet)
    ## }
    ## <bytecode: 0x93fead8>
    ## <environment: namespace:purrr>
    possibly
    ## function (.f, otherwise, quiet = TRUE) 
    ## {
    ##     .f <- as_mapper(.f)
    ##     force(otherwise)
    ##     function(...) {
    ##         tryCatch(.f(...), error = function(e) {
    ##             if (!quiet) 
    ##                 message("Error: ", e$message)
    ##             otherwise
    ##         }, interrupt = function(e) {
    ##             stop("Terminated by user", call. = FALSE)
    ##         })
    ##     }
    ## }
    ## <bytecode: 0x7c3b2e8>
    ## <environment: namespace:purrr>

[3] Unix philosophy, originated by Ken Thompson.