5 Scoping and flow control
This chapter discusses a few details regarding functions: scope, pure functions and side effects, control flow, and closures.
5.1 Scope
What happens if a function defines a variable that has the same name as a variable in the global environment?
We start with a variable defined in the global environment:
a <- 5
A function can access global variables:
f <- function() {
a
}
f()
## [1] 5
On the other hand, a variable that is defined inside a function is local to that function. It is not known outside of that function, and consequently it does not overwrite the value of a global variable with the same name.
f <- function() {
a <- 2
a
}
f()
## [1] 2
a
## [1] 5
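A quick check that a local variable is not known outside of its function. This is a sketch with a made-up function g() and a made-up variable name local_only:

```r
# Hypothetical function that defines a local variable
g <- function() {
  local_only <- 10  # exists only while g() runs
  local_only
}

g()
# exists() looks in the global environment (and the search path) here
exists("local_only")  # FALSE: the variable was never created outside g()
```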
Global variables are a (hidden) part of a function’s interface. Ideally, functions are self-contained, independent of global variables. Notable exceptions are objects that are used across your entire analysis, such as “the dataset”. (Otherwise you would need to pass them through many layers of function calls.)
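The following sketch illustrates why a global variable acts as a hidden input: the same call gives a different result once the global variable changes. The names keep_large_global(), keep_large() and threshold are made up for this illustration:

```r
threshold <- 10

# Depends on the global variable `threshold`, a hidden part of its interface
keep_large_global <- function(x) {
  x[x > threshold]
}

# Self-contained: every input is an argument
keep_large <- function(x, threshold) {
  x[x > threshold]
}

keep_large_global(c(5, 15, 25))  # 15 25
threshold <- 20
keep_large_global(c(5, 15, 25))  # 25: same call, different result

keep_large(c(5, 15, 25), threshold = 10)  # 15 25, independent of the global
```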
5.1.1 Exercises
Double-check what happens if two functions declare/use a variable of the same name.
# Variables in different functions
f1 <- function() {
  a <- 3
  a + f2()
}
f2 <- function() {
  a
}
f1()
f2()
a
5.2 Pure functions and side effects
library(tidyverse)
Functions should do one thing, and do it well.[3]
A pure function is one that is called for its return value and which has no side effects:
pure_function <- function(x) {
x + 1
}
pure_function(1)
## [1] 2
For functions with side effects, it is good practice to return the input invisibly:
side_effect_function <- function(x) {
# Write to the same temporary file that is reported in the message
file <- tempfile()
writeLines(format(x), file)
print(x)
message(x, " written to ", file)
invisible(x)
}
side_effect_function(2)
## [1] 2
## 2 written to /tmp/RtmpCquLue/file2db818c0ce09
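invisible() only suppresses automatic printing at the console; the value is still returned and can be assigned or piped further. A minimal sketch with a hypothetical function return_invisibly():

```r
# Hypothetical function that returns its input invisibly
return_invisibly <- function(x) {
  invisible(x)
}

return_invisibly(3)       # prints nothing at the console
y <- return_invisibly(3)  # the value can still be assigned
y                         # prints [1] 3

# withVisible() reports whether a value would be printed automatically
withVisible(return_invisibly(3))$visible  # FALSE
```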
Separating pure functions from functions with side effects helps isolate the side effects. If functions with side effects return their input, they remain composable with pure functions:
5 %>%
pure_function() %>%
side_effect_function() %>%
pure_function()
## [1] 6
## 6 written to /tmp/RtmpCquLue/file2db81bfd1887
## [1] 7
5.2.1 Exercises
In the above example, which part of the pipe triggers the display of 6 and 7, respectively?
How do you create a function that returns more than one value?
Implement your own purely functional version of sum() by using reduce(). (Hint: `+` is a function that takes two arguments and returns their sum.)
reduce(1:5, ___)
## [1] 15
Implement your own purely functional version of cumsum() by using accumulate().
accumulate(1:5, ___)
## [1] 1 3 6 10 15
Implement your own purely functional version of cumsum() by using reduce() only. (Hint: Use tail(., 1) to access the last element of a vector.)
reduce(1:5, ~ _____)
## [1] 1 3 6 10 15
5.3 Control flow
library(tidyverse)
library(here)
weather_path <- function(filename) {
# Returned value
here("data/weather", filename)
}
read_weather_file <- function(filename) {
readxl::read_excel(weather_path(filename))
}
We start once more with the functions weather_path() from section “Arguments” and read_weather_file() from section “Intermediate variables”.
One way to regulate the control flow is by using if ():
read_weather_data <- function(omit_zurich = FALSE, omit_toronto = FALSE) {
# Create ensemble dataset from files on disk
weather_data <- bind_rows(
berlin = read_weather_file("berlin.xlsx"),
toronto = read_weather_file("toronto.xlsx"),
tel_aviv = read_weather_file("tel_aviv.xlsx"),
zurich = read_weather_file("zurich.xlsx"),
.id = "city_code"
)
# Filter, conditionally
if (omit_zurich) {
weather_data <-
weather_data %>%
filter(city_code != "zurich")
}
if (omit_toronto) {
weather_data <-
weather_data %>%
filter(city_code != "toronto")
}
# Return result
weather_data
}
read_weather_data(omit_toronto = TRUE, omit_zurich = TRUE) %>%
count(city_code)
## # A tibble: 2 x 2
## city_code n
## <chr> <int>
## 1 berlin 49
## 2 tel_aviv 49
read_weather_data(omit_toronto = TRUE, omit_zurich = FALSE) %>%
count(city_code)
## # A tibble: 3 x 2
## city_code n
## <chr> <int>
## 1 berlin 49
## 2 tel_aviv 49
## 3 zurich 49
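Checking the conditions up front also allows an early return: if no city needs to be omitted, the function returns the combined data right away and skips the filtering steps: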
read_weather_data <- function(omit_zurich = FALSE, omit_toronto = FALSE) {
# Create ensemble dataset from files on disk
weather_data <- bind_rows(
berlin = read_weather_file("berlin.xlsx"),
toronto = read_weather_file("toronto.xlsx"),
tel_aviv = read_weather_file("tel_aviv.xlsx"),
zurich = read_weather_file("zurich.xlsx"),
.id = "city_code"
)
# Can keep original data?
if (!omit_zurich && !omit_toronto) {
return(weather_data)
}
# Filter, conditionally
if (omit_zurich) {
weather_data <-
weather_data %>%
filter(city_code != "zurich")
}
if (omit_toronto) {
weather_data <-
weather_data %>%
filter(city_code != "toronto")
}
# Return result
weather_data
}
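The same early-return pattern in isolation, as a sketch that does not need the weather files. drop_values() is a hypothetical function; the guard clause at the top handles the case in which there is nothing to do:

```r
# Hypothetical function: removes the elements listed in `drop` from `x`
drop_values <- function(x, drop = character()) {
  # Early return: nothing to drop, so skip the remaining code
  if (length(drop) == 0) {
    return(x)
  }
  x[!(x %in% drop)]
}

city_codes <- c("berlin", "toronto", "tel_aviv", "zurich")
drop_values(city_codes)                                # all four codes
drop_values(city_codes, drop = c("zurich", "toronto")) # "berlin" "tel_aviv"
```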
Conditional branching is also possible with if-else logic. (This is just for illustration; you should not implement code like this! Every additional flag doubles the number of branches.)
read_weather_data <- function(omit_zurich = FALSE, omit_toronto = FALSE) {
# Create ensemble dataset from files on disk
weather_data <- bind_rows(
berlin = read_weather_file("berlin.xlsx"),
toronto = read_weather_file("toronto.xlsx"),
tel_aviv = read_weather_file("tel_aviv.xlsx"),
zurich = read_weather_file("zurich.xlsx"),
.id = "city_code"
)
# Filter, conditionally, and return
if (!omit_zurich && !omit_toronto) {
weather_data
} else if (omit_zurich && !omit_toronto) {
weather_data %>%
filter(city_code != "zurich")
} else if (!omit_zurich && omit_toronto) {
weather_data %>%
filter(city_code != "toronto")
} else {
# Filter both
weather_data %>%
filter(city_code != "zurich") %>%
filter(city_code != "toronto")
}
}
read_weather_data(omit_toronto = TRUE) %>%
count(city_code)
## # A tibble: 3 x 2
## city_code n
## <chr> <int>
## 1 berlin 49
## 2 tel_aviv 49
## 3 zurich 49
read_weather_data(omit_zurich = TRUE) %>%
count(city_code)
## # A tibble: 3 x 2
## city_code n
## <chr> <int>
## 1 berlin 49
## 2 tel_aviv 49
## 3 toronto 49
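One way to avoid the growing number of branches is to turn the flags into a vector of cities to drop and filter once. This sketch uses a small made-up data frame instead of the weather files; omit_cities() is a hypothetical helper written in base R:

```r
# Hypothetical helper: drops the flagged cities in a single step,
# no matter which combination of flags is set
omit_cities <- function(data, omit_zurich = FALSE, omit_toronto = FALSE) {
  # Select the city codes whose flag is TRUE (character(0) if none is set)
  omit <- c("zurich", "toronto")[c(omit_zurich, omit_toronto)]
  data[!(data$city_code %in% omit), , drop = FALSE]
}

# Made-up stand-in for the ensemble weather dataset
toy <- data.frame(
  city_code = c("berlin", "toronto", "tel_aviv", "zurich"),
  stringsAsFactors = FALSE
)

omit_cities(toy, omit_toronto = TRUE)$city_code
omit_cities(toy, omit_zurich = TRUE, omit_toronto = TRUE)$city_code
```

Adding a third flag then means extending the two vectors in the first line of the function rather than doubling the number of branches. In the tidyverse style of this chapter, the last line would correspond to filter(!city_code %in% omit).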
5.3.1 Exercises
Implement a function that branches over an argument and returns the sum or the product of the input, respectively.
agg <- function(_____) {
  if (fun == "___") {
    sum(x)
  } else if (_____) {
    prod(___)
  } else {
    rlang::abort('`fun` must be "sum" or "prod".')
  }
}
agg(1:4, "sum")
## [1] 10
agg(1:4, "prod")
## [1] 24
5.4 Closures
library(tidyverse)
library(here)
weather_path <- function(filename) {
# Returned value
here("data/weather", filename)
}
read_weather_file <- function(filename) {
readxl::read_excel(weather_path(filename))
}
get_weather_file_for <- function(city_code) {
paste0(city_code, ".xlsx")
}
get_weather_data_for <- function(city_code) {
read_weather_file(get_weather_file_for(city_code))
}
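A closure is a function that keeps access to the variables of the environment in which it was created. Closures can, for example, be used to define new functions programmatically.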
We start once more with the functions weather_path() from section “Arguments” and read_weather_file() from section “Intermediate variables”.
Here we create a function that returns a new function, which loads a particular dataset:
make_read_weather_file <- function(filename) {
# Avoid odd effects due to lazy evaluation
force(filename)
# This function (closure) accesses the filename from the
# outer function
f <- function() {
read_weather_file(filename)
}
f
}
read_berlin <- make_read_weather_file("berlin.xlsx")
read_toronto <- make_read_weather_file("toronto.xlsx")
read_tel_aviv <- make_read_weather_file("tel_aviv.xlsx")
read_zurich <- make_read_weather_file("zurich.xlsx")
read_berlin
## function() {
## read_weather_file(filename)
## }
## <environment: 0x7a47180>
read_toronto
## function() {
## read_weather_file(filename)
## }
## <environment: 0x78f2b18>
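The <environment: ...> line in the printouts shows that each closure carries its own environment, which holds the value captured when it was created. A sketch with a hypothetical factory make_greeter(), analogous to make_read_weather_file() but without any file access, makes this visible with environment():

```r
# Hypothetical function factory, analogous to make_read_weather_file()
make_greeter <- function(name) {
  force(name)
  function() {
    paste("Hello,", name)
  }
}

greet_berlin <- make_greeter("Berlin")
greet_berlin()                  # "Hello, Berlin"

# The closure's enclosing environment holds the captured argument
environment(greet_berlin)$name  # "Berlin"
ls(environment(greet_berlin))   # "name"
```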
read_berlin()
## # A tibble: 49 x 18
## time summary icon precipIntensity precipProbabili…
## <dttm> <chr> <chr> <dbl> <dbl>
## 1 2019-04-28 15:00:00 Mostly… part… 0 0
## 2 2019-04-28 16:00:00 Mostly… part… 0 0
## 3 2019-04-28 17:00:00 Mostly… part… 0 0
## # … with 46 more rows, and 13 more variables: temperature <dbl>,
## # apparentTemperature <dbl>, dewPoint <dbl>, humidity <dbl>,
## # pressure <dbl>, windSpeed <dbl>, windGust <dbl>, windBearing <dbl>,
## # cloudCover <dbl>, uvIndex <dbl>, visibility <dbl>, ozone <dbl>,
## # precipType <chr>
read_toronto()
## # A tibble: 49 x 18
## time summary icon precipIntensity precipProbabili…
## <dttm> <chr> <chr> <dbl> <dbl>
## 1 2019-04-28 15:00:00 Partly… part… 0 0
## 2 2019-04-28 16:00:00 Clear clea… 0 0
## 3 2019-04-28 17:00:00 Clear clea… 0 0
## # … with 46 more rows, and 13 more variables: temperature <dbl>,
## # apparentTemperature <dbl>, dewPoint <dbl>, humidity <dbl>,
## # pressure <dbl>, windSpeed <dbl>, windGust <dbl>, windBearing <dbl>,
## # cloudCover <dbl>, uvIndex <dbl>, visibility <dbl>, ozone <dbl>,
## # precipType <chr>
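The force(filename) call in make_read_weather_file() guards against lazy evaluation: an argument is only evaluated when it is first used. A sketch with two hypothetical function factories shows the difference when the variable passed in changes before the closure is first called:

```r
# Without force(): `n` stays unevaluated until the closure first uses it
make_adder_lazy <- function(n) {
  function(x) x + n
}

# With force(): `n` is evaluated when the closure is created
make_adder <- function(n) {
  force(n)
  function(x) x + n
}

k <- 1
add_lazy <- make_adder_lazy(k)
add_forced <- make_adder(k)
k <- 10

add_lazy(1)    # 11: `n` was evaluated only now, after k changed
add_forced(1)  # 2: `n` was fixed to 1 at creation time
```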
Closures can also wrap other functions (“verbs”); functions that take a function and return a modified version of it are also called “adverbs”:
loudly <- function(f) {
force(f)
function(...) {
args <- list(...)
msg <- paste0(length(args), " argument(s)")
message(msg)
f(...)
}
}
read_loudly <- loudly(read_weather_file)
read_loudly
## function(...) {
## args <- list(...)
## msg <- paste0(length(args), " argument(s)")
## message(msg)
##
## f(...)
## }
## <environment: 0x7b40f88>
read_loudly("berlin.xlsx")
## 1 argument(s)
## # A tibble: 49 x 18
## time summary icon precipIntensity precipProbabili…
## <dttm> <chr> <chr> <dbl> <dbl>
## 1 2019-04-28 15:00:00 Mostly… part… 0 0
## 2 2019-04-28 16:00:00 Mostly… part… 0 0
## 3 2019-04-28 17:00:00 Mostly… part… 0 0
## # … with 46 more rows, and 13 more variables: temperature <dbl>,
## # apparentTemperature <dbl>, dewPoint <dbl>, humidity <dbl>,
## # pressure <dbl>, windSpeed <dbl>, windGust <dbl>, windBearing <dbl>,
## # cloudCover <dbl>, uvIndex <dbl>, visibility <dbl>, ozone <dbl>,
## # precipType <chr>
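An adverb like loudly() is not tied to read_weather_file(); it works with any function, and the wrapper passes named arguments through via .... The sketch below repeats loudly() so that it runs on its own and applies it to two base R functions:

```r
# Same adverb as above, repeated so that this block runs on its own
loudly <- function(f) {
  force(f)
  function(...) {
    args <- list(...)
    msg <- paste0(length(args), " argument(s)")
    message(msg)
    f(...)
  }
}

loud_sum <- loudly(sum)
loud_sum(1, 2, 3)  # message "3 argument(s)", returns 6

# Named arguments are counted and passed through as well
loud_paste <- loudly(paste)
loud_paste("a", "b", sep = "-")  # message "3 argument(s)", returns "a-b"
```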
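The safely() function from the purrr package is another example of an adverb. The function it returns does not raise errors; instead, it returns a list with the components result and error (with the default settings, one of the two is NULL):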
cities <- list("berlin", "toronto", "milan", "tel_aviv")
try(map(cities, get_weather_data_for))
## Error : `path` does not exist: '/home/travis/build/krlmlr/tidyprog/data/weather/milan.xlsx'
map(cities, safely(get_weather_data_for))
## [[1]]
## [[1]]$result
## # A tibble: 49 x 18
## time summary icon precipIntensity precipProbabili…
## <dttm> <chr> <chr> <dbl> <dbl>
## 1 2019-04-28 15:00:00 Mostly… part… 0 0
## 2 2019-04-28 16:00:00 Mostly… part… 0 0
## 3 2019-04-28 17:00:00 Mostly… part… 0 0
## # … with 46 more rows, and 13 more variables: temperature <dbl>,
## # apparentTemperature <dbl>, dewPoint <dbl>, humidity <dbl>,
## # pressure <dbl>, windSpeed <dbl>, windGust <dbl>, windBearing <dbl>,
## # cloudCover <dbl>, uvIndex <dbl>, visibility <dbl>, ozone <dbl>,
## # precipType <chr>
##
## [[1]]$error
## NULL
##
##
## [[2]]
## [[2]]$result
## # A tibble: 49 x 18
## time summary icon precipIntensity precipProbabili…
## <dttm> <chr> <chr> <dbl> <dbl>
## 1 2019-04-28 15:00:00 Partly… part… 0 0
## 2 2019-04-28 16:00:00 Clear clea… 0 0
## 3 2019-04-28 17:00:00 Clear clea… 0 0
## # … with 46 more rows, and 13 more variables: temperature <dbl>,
## # apparentTemperature <dbl>, dewPoint <dbl>, humidity <dbl>,
## # pressure <dbl>, windSpeed <dbl>, windGust <dbl>, windBearing <dbl>,
## # cloudCover <dbl>, uvIndex <dbl>, visibility <dbl>, ozone <dbl>,
## # precipType <chr>
##
## [[2]]$error
## NULL
##
##
## [[3]]
## [[3]]$result
## NULL
##
## [[3]]$error
## <simpleError: `path` does not exist: '/home/travis/build/krlmlr/tidyprog/data/weather/milan.xlsx'>
##
##
## [[4]]
## [[4]]$result
## # A tibble: 49 x 17
## time summary icon precipIntensity precipProbabili…
## <dttm> <chr> <chr> <dbl> <dbl>
## 1 2019-04-28 15:00:00 Partly… part… 0 0
## 2 2019-04-28 16:00:00 Clear clea… 0 0
## 3 2019-04-28 17:00:00 Clear clea… 0 0
## # … with 46 more rows, and 12 more variables: temperature <dbl>,
## # apparentTemperature <dbl>, dewPoint <dbl>, humidity <dbl>,
## # pressure <dbl>, windSpeed <dbl>, windGust <dbl>, windBearing <dbl>,
## # cloudCover <dbl>, uvIndex <dbl>, visibility <dbl>, ozone <dbl>
##
## [[4]]$error
## NULL
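To see how such an adverb can work, here is a simplified sketch built on base R's tryCatch(). my_safely() is a hypothetical name, and this is not purrr's actual implementation (which uses an internal helper, as the printout of safely() shows):

```r
# Hypothetical, simplified safely()-like adverb; not purrr's implementation
my_safely <- function(f, otherwise = NULL) {
  force(f)
  force(otherwise)
  function(...) {
    tryCatch(
      list(result = f(...), error = NULL),
      # On error, return the fallback value together with the condition
      error = function(e) list(result = otherwise, error = e)
    )
  }
}

safe_log <- my_safely(log)
safe_log(1)    # $result is 0, $error is NULL
safe_log("a")  # $result is NULL, $error holds the error condition
```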
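Printing the wrapped function shows that safely() returns a new function, a closure that calls the original function .f:
safely(get_weather_data_for)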
## function (...)
## capture_error(.f(...), otherwise, quiet)
## <bytecode: 0x93fe5d0>
## <environment: 0x9094e30>
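The wrapper can also be created inside a lambda. This gives the same result, but creates a new wrapper for each element:
map(cities, ~ safely(get_weather_data_for)(.))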
## [[1]]
## [[1]]$result
## # A tibble: 49 x 18
## time summary icon precipIntensity precipProbabili…
## <dttm> <chr> <chr> <dbl> <dbl>
## 1 2019-04-28 15:00:00 Mostly… part… 0 0
## 2 2019-04-28 16:00:00 Mostly… part… 0 0
## 3 2019-04-28 17:00:00 Mostly… part… 0 0
## # … with 46 more rows, and 13 more variables: temperature <dbl>,
## # apparentTemperature <dbl>, dewPoint <dbl>, humidity <dbl>,
## # pressure <dbl>, windSpeed <dbl>, windGust <dbl>, windBearing <dbl>,
## # cloudCover <dbl>, uvIndex <dbl>, visibility <dbl>, ozone <dbl>,
## # precipType <chr>
##
## [[1]]$error
## NULL
##
##
## [[2]]
## [[2]]$result
## # A tibble: 49 x 18
## time summary icon precipIntensity precipProbabili…
## <dttm> <chr> <chr> <dbl> <dbl>
## 1 2019-04-28 15:00:00 Partly… part… 0 0
## 2 2019-04-28 16:00:00 Clear clea… 0 0
## 3 2019-04-28 17:00:00 Clear clea… 0 0
## # … with 46 more rows, and 13 more variables: temperature <dbl>,
## # apparentTemperature <dbl>, dewPoint <dbl>, humidity <dbl>,
## # pressure <dbl>, windSpeed <dbl>, windGust <dbl>, windBearing <dbl>,
## # cloudCover <dbl>, uvIndex <dbl>, visibility <dbl>, ozone <dbl>,
## # precipType <chr>
##
## [[2]]$error
## NULL
##
##
## [[3]]
## [[3]]$result
## NULL
##
## [[3]]$error
## <simpleError: `path` does not exist: '/home/travis/build/krlmlr/tidyprog/data/weather/milan.xlsx'>
##
##
## [[4]]
## [[4]]$result
## # A tibble: 49 x 17
## time summary icon precipIntensity precipProbabili…
## <dttm> <chr> <chr> <dbl> <dbl>
## 1 2019-04-28 15:00:00 Partly… part… 0 0
## 2 2019-04-28 16:00:00 Clear clea… 0 0
## 3 2019-04-28 17:00:00 Clear clea… 0 0
## # … with 46 more rows, and 12 more variables: temperature <dbl>,
## # apparentTemperature <dbl>, dewPoint <dbl>, humidity <dbl>,
## # pressure <dbl>, windSpeed <dbl>, windGust <dbl>, windBearing <dbl>,
## # cloudCover <dbl>, uvIndex <dbl>, visibility <dbl>, ozone <dbl>
##
## [[4]]$error
## NULL
It is usually clearer to create the wrapped function once and give it a name:
safe_get_weather_data_for <- safely(get_weather_data_for)
map(cities, ~ safe_get_weather_data_for(.))
## [[1]]
## [[1]]$result
## # A tibble: 49 x 18
## time summary icon precipIntensity precipProbabili…
## <dttm> <chr> <chr> <dbl> <dbl>
## 1 2019-04-28 15:00:00 Mostly… part… 0 0
## 2 2019-04-28 16:00:00 Mostly… part… 0 0
## 3 2019-04-28 17:00:00 Mostly… part… 0 0
## # … with 46 more rows, and 13 more variables: temperature <dbl>,
## # apparentTemperature <dbl>, dewPoint <dbl>, humidity <dbl>,
## # pressure <dbl>, windSpeed <dbl>, windGust <dbl>, windBearing <dbl>,
## # cloudCover <dbl>, uvIndex <dbl>, visibility <dbl>, ozone <dbl>,
## # precipType <chr>
##
## [[1]]$error
## NULL
##
##
## [[2]]
## [[2]]$result
## # A tibble: 49 x 18
## time summary icon precipIntensity precipProbabili…
## <dttm> <chr> <chr> <dbl> <dbl>
## 1 2019-04-28 15:00:00 Partly… part… 0 0
## 2 2019-04-28 16:00:00 Clear clea… 0 0
## 3 2019-04-28 17:00:00 Clear clea… 0 0
## # … with 46 more rows, and 13 more variables: temperature <dbl>,
## # apparentTemperature <dbl>, dewPoint <dbl>, humidity <dbl>,
## # pressure <dbl>, windSpeed <dbl>, windGust <dbl>, windBearing <dbl>,
## # cloudCover <dbl>, uvIndex <dbl>, visibility <dbl>, ozone <dbl>,
## # precipType <chr>
##
## [[2]]$error
## NULL
##
##
## [[3]]
## [[3]]$result
## NULL
##
## [[3]]$error
## <simpleError: `path` does not exist: '/home/travis/build/krlmlr/tidyprog/data/weather/milan.xlsx'>
##
##
## [[4]]
## [[4]]$result
## # A tibble: 49 x 17
## time summary icon precipIntensity precipProbabili…
## <dttm> <chr> <chr> <dbl> <dbl>
## 1 2019-04-28 15:00:00 Partly… part… 0 0
## 2 2019-04-28 16:00:00 Clear clea… 0 0
## 3 2019-04-28 17:00:00 Clear clea… 0 0
## # … with 46 more rows, and 12 more variables: temperature <dbl>,
## # apparentTemperature <dbl>, dewPoint <dbl>, humidity <dbl>,
## # pressure <dbl>, windSpeed <dbl>, windGust <dbl>, windBearing <dbl>,
## # cloudCover <dbl>, uvIndex <dbl>, visibility <dbl>, ozone <dbl>
##
## [[4]]$error
## NULL
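The list of result/error pairs usually needs to be split into successes and failures. A base R sketch on a made-up, named stand-in for the output above (names make the failures easy to identify; purrr's transpose() can also turn such a list inside out):

```r
# Made-up stand-in for the output of map(cities, safely(...)):
# each element holds a `result` and an `error`, one of which is NULL
outcomes <- list(
  berlin = list(result = 1, error = NULL),
  milan = list(result = NULL, error = simpleError("`path` does not exist")),
  toronto = list(result = 3, error = NULL)
)

# Which calls failed?
failed <- vapply(outcomes, function(o) !is.null(o$error), logical(1))
names(failed)[failed]  # "milan"

# Keep the successful results, drop the failures
results <- lapply(outcomes[!failed], `[[`, "result")
names(results)         # "berlin" "toronto"
```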
5.4.1 Exercises
Review the help and the implementation of safely() and possibly().
safely
## function (.f, otherwise = NULL, quiet = TRUE)
## {
##     .f <- as_mapper(.f)
##     function(...) capture_error(.f(...), otherwise, quiet)
## }
## <bytecode: 0x93fead8>
## <environment: namespace:purrr>
possibly
## function (.f, otherwise, quiet = TRUE)
## {
##     .f <- as_mapper(.f)
##     force(otherwise)
##     function(...) {
##         tryCatch(.f(...), error = function(e) {
##             if (!quiet)
##                 message("Error: ", e$message)
##             otherwise
##         }, interrupt = function(e) {
##             stop("Terminated by user", call. = FALSE)
##         })
##     }
## }
## <bytecode: 0x7c3b2e8>
## <environment: namespace:purrr>
[3] Unix philosophy, originated by Ken Thompson.