Late flights

View all flights that arrived after 10:00 PM. Solve this three ways: with an intermediate variable, with a nested expression, and with the pipe. Which appeals most to you?

flights_after_10 <- filter(flights, ___)
View(flights_after_10)

View(filter(flights, ___))

flights %>%
  filter(___) %>%
  View()
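The three forms are interchangeable because `x %>% f(y)` is evaluated as `f(x, y)`. A minimal sketch on the built-in `mtcars` data (an illustrative stand-in, not part of the exercise; the variable names are hypothetical) suggests one way to check that the results agree:

```r
library(dplyr)

# Intermediate variable: store the filtered data under a name
four_cyl <- filter(mtcars, cyl == 4)

# Nested expression: the filtered data is passed straight to nrow()
n_nested <- nrow(filter(mtcars, cyl == 4))

# Pipe: the left-hand side becomes the first argument of filter()
four_cyl_piped <- mtcars %>%
  filter(cyl == 4)

# TRUE if both forms return the same rows
identical(four_cyl, four_cyl_piped)
```

Any of the results can be wrapped in `View()` for interactive inspection, as in the skeletons above.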

Fly United

Extend the three solutions to view all "UA" flights that arrived after 10:00 PM.

flights_after_10 <- filter(flights, ___)
ua_flights_after_10 <- ...
View(___)

View(filter(filter(flights, ___), ___))

flights %>%
  filter(___) %>%
  filter(___) %>%
  View()

Ad infinitum, 1

Extend the three solutions to view all "UA" flights that departed before 6:00 AM and arrived after 10:00 PM.

Ad infinitum, 2

Extend the three solutions to view all "UA" flights that departed before 6:00 AM and arrived after 10:00 PM and had a delay of more than two hours.

Ad infinitum, 3

Extend the three solutions to view all "UA" flights that departed before 6:00 AM and arrived after 10:00 PM and had a delay of more than two hours, originating in one of New York City’s airports.

Ad infinitum, 4

Extend the three solutions to view all "UA" flights that departed before 6:00 AM and arrived after 10:00 PM and had a delay of more than two hours, originating in one of New York City’s airports but excluding flights to Honolulu International Airport.

Hint: Consult the airports dataset and use a filter with the predicate stringr::str_detect(name, "^Honolulu").
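The lookup suggested by the hint could be sketched as follows, assuming the dplyr, stringr, and nycflights13 packages are installed. The variable name `honolulu` is only illustrative.

```r
library(dplyr)
library(nycflights13)

# Keep only airports whose name starts with "Honolulu"
honolulu <- airports %>%
  filter(stringr::str_detect(name, "^Honolulu"))

# The faa column holds the airport codes that also appear
# in the origin and dest columns of flights
honolulu %>%
  select(faa, name)
```

The code found in the `faa` column is the value to compare against `dest` when excluding those flights.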

Ad infinitum, 5

Sort the result by distance.

► Solution:

Intermediate variables

Naming is hard!

early_flights <- filter(flights, dep_time <= 600)
early_late_flights <-
  filter(early_flights, arr_time >= 2200)
early_late_ua_flights <-
  filter(early_late_flights, carrier == "UA")
early_late_late_ua_flights <-
  filter(early_late_ua_flights, arr_delay > 120)
early_late_late_ua_flights_not_honolulu <-
  filter(early_late_late_ua_flights, dest != "HNL")
early_late_late_ua_flights_not_honolulu_sorted <-
  arrange(
    early_late_late_ua_flights_not_honolulu,
    distance
  )
View(early_late_late_ua_flights_not_honolulu_sorted)
## # A tibble: 0 x 19
## # ... with 19 variables: year <int>, month <int>, day <int>,
## #   dep_time <int>, sched_dep_time <int>, dep_delay <dbl>, arr_time <int>,
## #   sched_arr_time <int>, arr_delay <dbl>, carrier <chr>, flight <int>,
## #   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>,
## #   distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>

Nested expressions

Difficult to read.

View(
  arrange(
    filter(
      filter(
        filter(
          filter(
            filter(
              flights,
              dep_time <= 600
            ),
            arr_time >= 2200
          ),
          carrier == "UA"
        ),
        arr_delay > 120
      ),
      dest != "HNL"
    ),
    distance
  )
)
## # A tibble: 0 x 19
## # ... with 19 variables: year <int>, month <int>, day <int>,
## #   dep_time <int>, sched_dep_time <int>, dep_delay <dbl>, arr_time <int>,
## #   sched_arr_time <int>, arr_delay <dbl>, carrier <chr>, flight <int>,
## #   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>,
## #   distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
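Some of the nesting can be avoided because `filter()` accepts several conditions at once and keeps only the rows that satisfy all of them. A sketch of that variant, assuming the dplyr and nycflights13 packages are installed; the variable name `result` is hypothetical:

```r
library(dplyr)
library(nycflights13)

# One filter() call with five conditions, combined with a logical AND,
# replaces the five nested filter() calls
result <- arrange(
  filter(
    flights,
    dep_time <= 600,
    arr_time >= 2200,
    carrier == "UA",
    arr_delay > 120,
    dest != "HNL"
  ),
  distance
)

nrow(result)
```

This should select the same rows as the nested version, since applying filters one after another is also a logical AND of their conditions.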

Pipe

flights %>% 
  filter(dep_time <= 600) %>% 
  filter(arr_time >= 2200) %>% 
  filter(carrier == "UA") %>% 
  filter(arr_delay > 120) %>% 
  filter(dest != "HNL") %>%
  arrange(distance) %>%
  View()
## # A tibble: 0 x 19
## # ... with 19 variables: year <int>, month <int>, day <int>,
## #   dep_time <int>, sched_dep_time <int>, dep_delay <dbl>, arr_time <int>,
## #   sched_arr_time <int>, arr_delay <dbl>, carrier <chr>, flight <int>,
## #   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>,
## #   distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>

The original data is never updated! You still need to assign the result of a pipe to a variable:

late_late_ua_flights_not_honolulu <-
  flights %>% 
  filter(dep_time <= 600) %>% 
  filter(arr_time >= 2200) %>% 
  filter(carrier == "UA") %>% 
  filter(arr_delay > 120) %>% 
  filter(dest != "HNL") %>%
  arrange(distance)
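That the input stays untouched can be checked directly. A small sketch, assuming dplyr and nycflights13 are installed; `n_before` and `ua_flights` are illustrative names:

```r
library(dplyr)
library(nycflights13)

# Remember the size of the original data
n_before <- nrow(flights)

# The pipe returns a new data frame; flights itself is not modified
ua_flights <- flights %>%
  filter(carrier == "UA")

# TRUE: flights still has all of its rows
nrow(flights) == n_before

# TRUE: only the assigned result is restricted to "UA" flights
nrow(ua_flights) < nrow(flights)
```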

Copyright © 2018 Kirill Müller. Licensed under CC BY-NC 4.0.