Viewing

View the flights dataset in RStudio’s data pane. Look up the meaning of the variables in the help.

Hint: You need to load the nycflights13 package.

view(___)
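
As a non-interactive complement to the data pane, the sketch below loads the package and prints a compact overview of the columns. It assumes the nycflights13 and dplyr packages are installed; glimpse() is used only as an illustration and is not required by the exercise.

```r
library(dplyr)
library(nycflights13)

if (interactive()) {
  # Open the help page that documents every variable in the dataset.
  help("flights", package = "nycflights13")
}

# One line per column: name, type, and the first few values.
glimpse(flights)

# Number of rows and columns.
dim(flights)
```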

All flights on this day x years ago

Find all flights that departed on this day of the year six years ago, that is, in 2013, the only year in the dataset.

flights %>%
  filter(month ___, day ___)

► Solution: Be careful to use the equality operator == and not =, which filter() interprets as a named argument:

flights %>% 
  filter(month = 6, day = 2)
## Error: `month` (`month = 6`), `day` (`day = 2`) must not be named, do you need `==`?
flights %>% 
  filter(month == 6, day == 2)
## # A tibble: 911 x 19
##     year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
##    <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
##  1  2013     6     2       14           2359        15      339            341
##  2  2013     6     2       20           2155       145      222             15
##  3  2013     6     2       24           2245        99      133              1
##  4  2013     6     2       33           2059       214      150           2224
##  5  2013     6     2       35           2130       185      332             17
##  6  2013     6     2       36           1914       322      223           2121
##  7  2013     6     2       44           2359        45      420            350
##  8  2013     6     2      128           2159       209      325             10
##  9  2013     6     2      131           2146       225      229           2251
## 10  2013     6     2      219           2055       324      322           2230
## # … with 901 more rows, and 11 more variables: arr_delay <dbl>, carrier <chr>,
## #   flight <int>, tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>,
## #   distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
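
filter() combines comma-separated conditions with a logical AND. The following sketch, which assumes dplyr and nycflights13 are installed, writes the same query both ways; the names with_comma and with_and are chosen here for illustration only.

```r
library(dplyr)
library(nycflights13)

# Conditions separated by commas must all be TRUE for a row to be kept.
with_comma <- flights %>%
  filter(month == 6, day == 2)

# The same query with an explicit logical AND.
with_and <- flights %>%
  filter(month == 6 & day == 2)

# Both spellings keep the same number of rows.
nrow(with_comma)
nrow(with_and)
```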

Can we make a dynamic query?

flights %>% 
  filter(
    month == lubridate::month(Sys.Date()),
    day == lubridate::day(Sys.Date())
  )
## # A tibble: 661 x 19
##     year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
##    <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
##  1  2013    11    29      518            515         3      749            808
##  2  2013    11    29      544            540         4      850            850
##  3  2013    11    29      545            550        -5     1023           1027
##  4  2013    11    29      552            600        -8      851            856
##  5  2013    11    29      555            600        -5     1054           1043
##  6  2013    11    29      556            600        -4      812            825
##  7  2013    11    29      558            600        -2      712            730
##  8  2013    11    29      609            615        -6      750            817
##  9  2013    11    29      610            615        -5      749            818
## 10  2013    11    29      612            615        -3      904            920
## # … with 651 more rows, and 11 more variables: arr_delay <dbl>, carrier <chr>,
## #   flight <int>, tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>,
## #   distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>

Can we store the values in variables first?

month_ <- lubridate::month(Sys.Date())
day_ <- lubridate::day(Sys.Date())
flights %>% 
  filter(
    month == month_,
    day == day_
  )
## # A tibble: 661 x 19
##     year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
##    <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
##  1  2013    11    29      518            515         3      749            808
##  2  2013    11    29      544            540         4      850            850
##  3  2013    11    29      545            550        -5     1023           1027
##  4  2013    11    29      552            600        -8      851            856
##  5  2013    11    29      555            600        -5     1054           1043
##  6  2013    11    29      556            600        -4      812            825
##  7  2013    11    29      558            600        -2      712            730
##  8  2013    11    29      609            615        -6      750            817
##  9  2013    11    29      610            615        -5      749            818
## 10  2013    11    29      612            615        -3      904            920
## # … with 651 more rows, and 11 more variables: arr_delay <dbl>, carrier <chr>,
## #   flight <int>, tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>,
## #   distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
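
The trailing underscore in month_ and day_ keeps the variables distinct from the columns month and day; writing month == month would compare the column with itself. One possible next step, sketched here under the assumption that dplyr, lubridate, and nycflights13 are installed, is to wrap the query in a function. The name flights_on_day is hypothetical and not part of any package.

```r
library(dplyr)
library(nycflights13)

# Hypothetical helper: all flights whose month and day match `date`.
flights_on_day <- function(date = Sys.Date()) {
  # Local names differ from the column names, so the comparison is
  # between a column and a value, not between a column and itself.
  m <- lubridate::month(date)
  d <- lubridate::day(date)
  flights %>%
    filter(month == m, day == d)
}

# A fixed date makes the result reproducible; only month and day are used.
june_2 <- flights_on_day(as.Date("2019-06-02"))
nrow(june_2)
```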

All flights between 8:00 AM and 10:00 PM

Find all flights that departed between 8:00 AM and 10:00 PM.

flights %>%
  filter(between(dep_time, ___, ___))

► Solution:

flights %>% 
  filter(dep_time >= 800, dep_time <= 2200)
## # A tibble: 267,608 x 19
##     year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
##    <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
##  1  2013     1     1      800            800         0     1022           1014
##  2  2013     1     1      800            810       -10      949            955
##  3  2013     1     1      801            805        -4      900            919
##  4  2013     1     1      803            810        -7      903            925
##  5  2013     1     1      803            800         3     1132           1144
##  6  2013     1     1      804            810        -6     1103           1116
##  7  2013     1     1      805            805         0     1015           1005
##  8  2013     1     1      805            800         5     1118           1106
##  9  2013     1     1      805            815       -10     1006           1010
## 10  2013     1     1      807            810        -3     1043           1043
## # … with 267,598 more rows, and 11 more variables: arr_delay <dbl>,
## #   carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,
## #   air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
flights %>% 
  filter(between(dep_time, 800, 2200))
## # A tibble: 267,608 x 19
##     year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
##    <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
##  1  2013     1     1      800            800         0     1022           1014
##  2  2013     1     1      800            810       -10      949            955
##  3  2013     1     1      801            805        -4      900            919
##  4  2013     1     1      803            810        -7      903            925
##  5  2013     1     1      803            800         3     1132           1144
##  6  2013     1     1      804            810        -6     1103           1116
##  7  2013     1     1      805            805         0     1015           1005
##  8  2013     1     1      805            800         5     1118           1106
##  9  2013     1     1      805            815       -10     1006           1010
## 10  2013     1     1      807            810        -3     1043           1043
## # … with 267,598 more rows, and 11 more variables: arr_delay <dbl>,
## #   carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,
## #   air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
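
Both solutions return the same rows because between() includes both endpoints. The small sketch below uses a made-up vector of departure times (not taken from the dataset) to show the boundary behaviour and what happens with a missing value; it assumes dplyr is installed.

```r
library(dplyr)

# Departure times are integers in HHMM form, e.g. 800 means 8:00 AM.
# These values are invented for illustration.
dep_time <- c(759L, 800L, 2200L, 2201L, NA)

# between(x, left, right) is shorthand for x >= left & x <= right.
inside <- between(dep_time, 800, 2200)
inside
```

Because filter() keeps only rows where the condition is TRUE, rows with a missing dep_time are dropped along with those outside the range.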

Flights in winter months

Find all flights that departed in the three winter months.

flights %>%
  filter(month ___ c(___))

► Solution:

flights %>%
  filter(month %in% c(12, 1, 2))
## # A tibble: 80,090 x 19
##     year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
##    <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
##  1  2013     1     1      517            515         2      830            819
##  2  2013     1     1      533            529         4      850            830
##  3  2013     1     1      542            540         2      923            850
##  4  2013     1     1      544            545        -1     1004           1022
##  5  2013     1     1      554            600        -6      812            837
##  6  2013     1     1      554            558        -4      740            728
##  7  2013     1     1      555            600        -5      913            854
##  8  2013     1     1      557            600        -3      709            723
##  9  2013     1     1      557            600        -3      838            846
## 10  2013     1     1      558            600        -2      753            745
## # … with 80,080 more rows, and 11 more variables: arr_delay <dbl>,
## #   carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,
## #   air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
winter_months <- c(12, 1, 2)
flights %>%
  filter(month %in% winter_months)
## # A tibble: 80,090 x 19
##     year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
##    <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
##  1  2013     1     1      517            515         2      830            819
##  2  2013     1     1      533            529         4      850            830
##  3  2013     1     1      542            540         2      923            850
##  4  2013     1     1      544            545        -1     1004           1022
##  5  2013     1     1      554            600        -6      812            837
##  6  2013     1     1      554            558        -4      740            728
##  7  2013     1     1      555            600        -5      913            854
##  8  2013     1     1      557            600        -3      709            723
##  9  2013     1     1      557            600        -3      838            846
## 10  2013     1     1      558            600        -2      753            745
## # … with 80,080 more rows, and 11 more variables: arr_delay <dbl>,
## #   carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,
## #   air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
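
The operator matters here: == compares element by element and recycles the shorter vector, whereas %in% tests membership. The sketch below uses a short made-up month vector in base R to show the difference; the names wrong and right are illustrative only.

```r
# Invented values for illustration; every entry is a winter month.
month <- c(1, 2, 12, 1, 2, 12)
winter_months <- c(12, 1, 2)

# == lines the vectors up position by position (1 vs 12, 2 vs 1, 12 vs 2, ...),
# so here no position matches even though every month is a winter month.
wrong <- month == winter_months

# %in% asks, for each month, whether it appears anywhere in winter_months.
right <- month %in% winter_months

wrong
right
```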

Departure time later than arrival time

Are there any flights where departure time is later than arrival time? What does this mean?

flights %>%
  filter(_____)

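► Solution: Yes. These are typically overnight flights: dep_time and arr_time are local clock times in HHMM form, so a flight that lands after midnight has a smaller arrival time than departure time.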

flights %>% 
  filter(dep_time > arr_time)
## # A tibble: 10,633 x 19
##     year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
##    <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
##  1  2013     1     1     1929           1920         9        3              7
##  2  2013     1     1     1939           1840        59       29           2151
##  3  2013     1     1     2058           2100        -2        8           2359
##  4  2013     1     1     2102           2108        -6      146            158
##  5  2013     1     1     2108           2057        11       25             39
##  6  2013     1     1     2120           2130       -10       16             18
##  7  2013     1     1     2121           2040        41        6           2323
##  8  2013     1     1     2128           2135        -7       26             50
##  9  2013     1     1     2134           2045        49       20           2352
## 10  2013     1     1     2136           2145        -9       25             39
## # … with 10,623 more rows, and 11 more variables: arr_delay <dbl>,
## #   carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,
## #   air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
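
To make the midnight wrap-around concrete, the following base R sketch converts HHMM clock times to minutes and adds one day when the difference is negative. The helpers hhmm_to_minutes and elapsed_minutes are hypothetical, and the calculation assumes the arrival is less than 24 hours after the departure and ignores time zone differences between origin and destination.

```r
# Hypothetical helper: convert an HHMM clock time to minutes since midnight.
hhmm_to_minutes <- function(hhmm) {
  (hhmm %/% 100) * 60 + hhmm %% 100
}

# Hypothetical helper: minutes from departure to arrival on the clock.
elapsed_minutes <- function(dep, arr) {
  diff <- hhmm_to_minutes(arr) - hhmm_to_minutes(dep)
  # A negative difference means the clock passed midnight: add one day.
  ifelse(diff < 0, diff + 24 * 60, diff)
}

# First row of the output above: departs 19:29, arrives 00:03.
elapsed_minutes(1929, 3)

# A daytime flight for comparison: departs 8:00, arrives 10:22.
elapsed_minutes(800, 1022)
```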

Copyright © 2019 Kirill Müller. Licensed under CC BY-NC 4.0.