View the flights dataset in RStudio’s data pane. Look up the meaning of the variables in the help.
Hint: You need to load the nycflights13 package.
view(___)
Find all flights that departed today 6 years ago.
flights %>%
  filter(month ___, day ___)
► Solution:
Be careful with the equality operator ==:
flights %>%
  filter(month = 6, day = 2)
## Error: `month` (`month = 6`), `day` (`day = 2`) must not be named, do you need `==`?
flights %>%
  filter(month == 6, day == 2)
## # A tibble: 911 x 19
## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
## <int> <int> <int> <int> <int> <dbl> <int> <int>
## 1 2013 6 2 14 2359 15 339 341
## 2 2013 6 2 20 2155 145 222 15
## 3 2013 6 2 24 2245 99 133 1
## 4 2013 6 2 33 2059 214 150 2224
## 5 2013 6 2 35 2130 185 332 17
## 6 2013 6 2 36 1914 322 223 2121
## 7 2013 6 2 44 2359 45 420 350
## 8 2013 6 2 128 2159 209 325 10
## 9 2013 6 2 131 2146 225 229 2251
## 10 2013 6 2 219 2055 324 322 2230
## # … with 901 more rows, and 11 more variables: arr_delay <dbl>, carrier <chr>,
## # flight <int>, tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>,
## # distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
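The difference between = and == can be tried out on a small table first. The sketch below uses a made-up toy_flights data frame (not part of nycflights13) and assumes dplyr is installed; it shows that == returns a logical vector, that filter() keeps the rows where all conditions are TRUE, and that a named argument is rejected.

```r
library(dplyr)

# Toy stand-in for the flights table (hypothetical data, not from nycflights13)
toy_flights <- data.frame(month = c(6, 6, 7), day = c(2, 3, 2))

# `==` compares element by element and returns a logical vector
is_june <- toy_flights$month == 6

# filter() keeps the rows for which every condition is TRUE
june_2 <- toy_flights %>%
  filter(month == 6, day == 2)

# `=` creates a named argument, which filter() rejects;
# tryCatch() records the error instead of stopping the script
named_arg_fails <- tryCatch(
  {
    toy_flights %>% filter(month = 6)
    FALSE
  },
  error = function(e) TRUE
)
```

Here is_june is TRUE TRUE FALSE, june_2 has one row, and named_arg_fails is TRUE.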
Can we make a dynamic query?
flights %>%
  filter(
    month == lubridate::month(Sys.Date()),
    day == lubridate::day(Sys.Date())
  )
## # A tibble: 661 x 19
## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
## <int> <int> <int> <int> <int> <dbl> <int> <int>
## 1 2013 11 29 518 515 3 749 808
## 2 2013 11 29 544 540 4 850 850
## 3 2013 11 29 545 550 -5 1023 1027
## 4 2013 11 29 552 600 -8 851 856
## 5 2013 11 29 555 600 -5 1054 1043
## 6 2013 11 29 556 600 -4 812 825
## 7 2013 11 29 558 600 -2 712 730
## 8 2013 11 29 609 615 -6 750 817
## 9 2013 11 29 610 615 -5 749 818
## 10 2013 11 29 612 615 -3 904 920
## # … with 651 more rows, and 11 more variables: arr_delay <dbl>, carrier <chr>,
## # flight <int>, tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>,
## # distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
Can we store the values in variables first?
month_ <- lubridate::month(Sys.Date())
day_ <- lubridate::day(Sys.Date())
flights %>%
  filter(
    month == month_,
    day == day_
  )
## # A tibble: 661 x 19
## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
## <int> <int> <int> <int> <int> <dbl> <int> <int>
## 1 2013 11 29 518 515 3 749 808
## 2 2013 11 29 544 540 4 850 850
## 3 2013 11 29 545 550 -5 1023 1027
## 4 2013 11 29 552 600 -8 851 856
## 5 2013 11 29 555 600 -5 1054 1043
## 6 2013 11 29 556 600 -4 812 825
## 7 2013 11 29 558 600 -2 712 730
## 8 2013 11 29 609 615 -6 750 817
## 9 2013 11 29 610 615 -5 749 818
## 10 2013 11 29 612 615 -3 904 920
## # … with 651 more rows, and 11 more variables: arr_delay <dbl>, carrier <chr>,
## # flight <int>, tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>,
## # distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
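Once the values live in variables, the query can also be wrapped in a function. The following is only a sketch: flights_on() and toy_flights are hypothetical names introduced for illustration, and the month and day are extracted with base R's format() instead of lubridate to keep the example self-contained.

```r
library(dplyr)

# Toy stand-in for the flights table (hypothetical data, not from nycflights13)
toy_flights <- data.frame(
  month = c(6, 6, 11, 11),
  day = c(2, 3, 29, 29),
  carrier = c("UA", "AA", "DL", "B6")
)

# Hypothetical helper: keep the rows whose month and day match `date`
flights_on <- function(data, date = Sys.Date()) {
  date <- as.Date(date)
  target_month <- as.integer(format(date, "%m"))
  target_day <- as.integer(format(date, "%d"))
  data %>%
    filter(month == target_month, day == target_day)
}

nov_29 <- flights_on(toy_flights, "2019-11-29")
```

The argument names target_month and target_day deliberately differ from the column names, so there is no ambiguity inside filter() about which is the column and which is the variable.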
Find all flights that departed between 8:00 AM and 10:00 PM.
flights %>%
  filter(between(dep_time, ___, ___))
► Solution:
flights %>%
  filter(dep_time >= 800, dep_time <= 2200)
## # A tibble: 267,608 x 19
## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
## <int> <int> <int> <int> <int> <dbl> <int> <int>
## 1 2013 1 1 800 800 0 1022 1014
## 2 2013 1 1 800 810 -10 949 955
## 3 2013 1 1 801 805 -4 900 919
## 4 2013 1 1 803 810 -7 903 925
## 5 2013 1 1 803 800 3 1132 1144
## 6 2013 1 1 804 810 -6 1103 1116
## 7 2013 1 1 805 805 0 1015 1005
## 8 2013 1 1 805 800 5 1118 1106
## 9 2013 1 1 805 815 -10 1006 1010
## 10 2013 1 1 807 810 -3 1043 1043
## # … with 267,598 more rows, and 11 more variables: arr_delay <dbl>,
## # carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,
## # air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
flights %>%
  filter(between(dep_time, 800, 2200))
## # A tibble: 267,608 x 19
## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
## <int> <int> <int> <int> <int> <dbl> <int> <int>
## 1 2013 1 1 800 800 0 1022 1014
## 2 2013 1 1 800 810 -10 949 955
## 3 2013 1 1 801 805 -4 900 919
## 4 2013 1 1 803 810 -7 903 925
## 5 2013 1 1 803 800 3 1132 1144
## 6 2013 1 1 804 810 -6 1103 1116
## 7 2013 1 1 805 805 0 1015 1005
## 8 2013 1 1 805 800 5 1118 1106
## 9 2013 1 1 805 815 -10 1006 1010
## 10 2013 1 1 807 810 -3 1043 1043
## # … with 267,598 more rows, and 11 more variables: arr_delay <dbl>,
## # carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,
## # air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
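Both solutions return the same rows because between() includes both boundaries. A short check on a made-up vector of departure times (assuming dplyr is installed) illustrates this, and also shows that a missing departure time yields NA, which filter() then drops.

```r
library(dplyr)

# Made-up departure times in HHMM format, including a missing value
dep_time <- c(759, 800, 1430, 2200, 2201, NA)

in_range_between <- between(dep_time, 800, 2200)
in_range_manual <- dep_time >= 800 & dep_time <= 2200

# Both approaches agree element by element, including the NA
same_result <- identical(in_range_between, in_range_manual)

# filter() keeps only rows where the condition is TRUE, so the NA row is dropped
kept <- data.frame(dep_time = dep_time) %>%
  filter(between(dep_time, 800, 2200))
```

In this sketch, same_result is TRUE and kept contains the three rows 800, 1430 and 2200.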
Find all flights that departed in the three winter months.
flights %>%
  filter(month ___ c(___))
► Solution:
flights %>%
  filter(month %in% c(12, 1, 2))
## # A tibble: 80,090 x 19
## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
## <int> <int> <int> <int> <int> <dbl> <int> <int>
## 1 2013 1 1 517 515 2 830 819
## 2 2013 1 1 533 529 4 850 830
## 3 2013 1 1 542 540 2 923 850
## 4 2013 1 1 544 545 -1 1004 1022
## 5 2013 1 1 554 600 -6 812 837
## 6 2013 1 1 554 558 -4 740 728
## 7 2013 1 1 555 600 -5 913 854
## 8 2013 1 1 557 600 -3 709 723
## 9 2013 1 1 557 600 -3 838 846
## 10 2013 1 1 558 600 -2 753 745
## # … with 80,080 more rows, and 11 more variables: arr_delay <dbl>,
## # carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,
## # air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
winter_months <- c(12, 1, 2)
flights %>%
  filter(month %in% winter_months)
## # A tibble: 80,090 x 19
## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
## <int> <int> <int> <int> <int> <dbl> <int> <int>
## 1 2013 1 1 517 515 2 830 819
## 2 2013 1 1 533 529 4 850 830
## 3 2013 1 1 542 540 2 923 850
## 4 2013 1 1 544 545 -1 1004 1022
## 5 2013 1 1 554 600 -6 812 837
## 6 2013 1 1 554 558 -4 740 728
## 7 2013 1 1 555 600 -5 913 854
## 8 2013 1 1 557 600 -3 709 723
## 9 2013 1 1 557 600 -3 838 846
## 10 2013 1 1 558 600 -2 753 745
## # … with 80,080 more rows, and 11 more variables: arr_delay <dbl>,
## # carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,
## # air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
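Why %in% and not == here? With a vector on the right-hand side, == compares position by position and recycles the shorter vector, which is rarely what is intended. A small base R sketch with made-up month values shows the difference.

```r
# Made-up month values; every one of them is a winter month
month <- c(1, 2, 12, 1, 2, 12)
winter_months <- c(12, 1, 2)

# %in% asks, for each month, whether it occurs anywhere in winter_months
with_in <- month %in% winter_months

# == recycles winter_months and compares 1 with 12, 2 with 1, 12 with 2, ...
with_eq <- month == winter_months
```

Here with_in is TRUE for every element, while with_eq is FALSE for every element, even though all six months are winter months.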
Are there any flights where departure time is later than arrival time? What does this mean?
flights %>%
  filter(_____)
► Solution:
flights %>%
  filter(dep_time > arr_time)
## # A tibble: 10,633 x 19
## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
## <int> <int> <int> <int> <int> <dbl> <int> <int>
## 1 2013 1 1 1929 1920 9 3 7
## 2 2013 1 1 1939 1840 59 29 2151
## 3 2013 1 1 2058 2100 -2 8 2359
## 4 2013 1 1 2102 2108 -6 146 158
## 5 2013 1 1 2108 2057 11 25 39
## 6 2013 1 1 2120 2130 -10 16 18
## 7 2013 1 1 2121 2040 41 6 2323
## 8 2013 1 1 2128 2135 -7 26 50
## 9 2013 1 1 2134 2045 49 20 2352
## 10 2013 1 1 2136 2145 -9 25 39
## # … with 10,623 more rows, and 11 more variables: arr_delay <dbl>,
## # carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,
## # air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
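A plausible reading of this result: dep_time and arr_time are clock times in HHMM format, so a departure time later than the arrival time indicates a flight that landed after midnight, on the following day. The sketch below makes this concrete with base R. The helper names hhmm_to_minutes() and clock_minutes_elapsed() are hypothetical, and the calculation ignores time zone differences between origin and destination, so the result is elapsed clock time, not necessarily time in the air.

```r
# Convert a clock time stored as HHMM (e.g. 1929 for 19:29) to minutes since midnight
hhmm_to_minutes <- function(hhmm) {
  (hhmm %/% 100) * 60 + hhmm %% 100
}

# Hypothetical helper: clock minutes from departure to arrival,
# wrapping around midnight when the arrival falls on the next day
clock_minutes_elapsed <- function(dep_time, arr_time) {
  (hhmm_to_minutes(arr_time) - hhmm_to_minutes(dep_time)) %% (24 * 60)
}

# The first two pairs are taken from the output above; the third is a daytime flight
dep_time <- c(1929, 2058, 800)
arr_time <- c(3, 8, 1022)

elapsed <- clock_minutes_elapsed(dep_time, arr_time)
# elapsed is 274, 190 and 142 minutes
```

The modulo operation is what handles the overnight case: a negative difference such as 3 - 1169 minutes is mapped back into the range of one day.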
Copyright © 2019 Kirill Müller. Licensed under CC BY-NC 4.0.