Use spread()
to convert table2
to table1
. What is the meaning of the key
and value
arguments?
table2 %>%
spread(_____)
Use gather()
to convert table1
to table2
. Try an inclusive and an exclusive selection. Do you need an extra transformation to make the result fully identical? Can you reuse key
and value
from the previous result?
table1 %>%
gather(_____, ___:___)
table1 %>%
gather(_____, -___:-___)
Visualize the data: plot cases, population, and both. Which of table1
or table2
is more suitable in which case?
___ %>%
ggplot(aes(___)) +
geom_col()
___ %>%
ggplot(aes(___)) +
geom_col() +
facet_grid(___ ~ ___, scales = "free")
___ %>%
ggplot(aes(___, ___)) +
geom_point()
Use gather()
to convert table4a
and table4b
to table2
. Can you do the same with just one gather()
call?
Hint: Use bind_rows()
to combine similar tibbles.
cases_tbl <-
table4a %>%
gather(_____) %>%
mutate(type = "cases")
population_tbl <-
table4a %>%
gather(_____) %>%
mutate(___)
bind_rows(_____) %>%
_____ %>%
_____
Create a scatterplot from the mpg
dataset that shows both highway and city fuel economy against engine displacement with two different colors using only one geom_point()
call.
mpg %>%
_____ %>%
ggplot(aes(x = displ, y = ___)) +
geom_point()
Find more exercises in Section 12.3.3 of r4ds.
Convert table3
to table1
and table2
.
table3 %>%
separate(
___,
into = c("___", "___"),
convert = TRUE
) %>%
_____ %>%
_____
Convert table2
to table3
.
table2 %>%
_____ %>%
unite(
___,
___, ___,
sep = "/"
)
Count the flights for each relation in the flights
dataset, using just one grouping variable.
flights %>%
unite(
relation,
___, ___,
sep = " -> "
) %>%
count(___)
Find more exercises in Section 12.4.3 of r4ds.
How are the flights
, airlines
, and airports
datasets connected? Which are primary, which are foreign keys?
Hint: Use count()
to support your hypothesis.
flights %>%
count(carrier) %>%
filter(n > 1)
airlines %>%
count(___) %>%
filter(___)
airports %>%
_____ %>%
_____
Compute a list of all flights shorter than 300 miles. Use explicit names for the carriers and the destinations. How do you turn off the joining messages? Describe the column names in the result.
Hint: Use by = c("dest" = "faa")
.
flights %>%
filter(distance < 300) %>%
count(dest, carrier) %>%
left_join(airlines) %>%
left_join(airports, by = c("___" = "___"))
Count the number of observations per airline per destination, and convert to wide form using nice labels for better use of screen space. Do you use spread()
or gather()
? How do you replace the NA
values with zeros?
# The name of the `name` variable isn't very useful,
# need to rename it
flights %>%
filter(distance < 300) %>%
count(dest, carrier) %>%
left_join(_____) %>%
rename(carrier_name = ___) %>%
left_join(_____) %>%
rename(_____) %>%
count(_____) %>%
_____(_____)
Change the code from the last example to use count()
right after filter()
. What additional steps do you need?
airline_names <-
airlines %>%
_____()
dest_airport_names <-
_____ %>%
_____()
flights %>%
filter(distance < 300) %>%
count(_____) %>%
left_join(_____, by = "___") %>%
select(-___) %>%
left_join(_____, by = "___") %>%
_____(_____)
Plot a heat map of destination by airline for all flights shorter than 300 miles, with explicit names. Do you use geom_raster()
or geom_bin2d()
?
Hint: Use by = c("dest" = "faa")
.
short_distance_destinations <-
_____
short_distance_destinations %>%
ggplot() +
geom____(aes(___))
Find more exercises in Section 13.4.6 of r4ds.
Find the airports that are serviced by at least one flight. Which airports did not have direct connections in 2013?
airports %>%
semi_join(flights, by = c(_____))
airports %>%
anti_join(flights, by = c(_____))
Find more exercises in Section 13.5.1 of r4ds.
Copyright © 2018 Kirill Müller. Licensed under CC BY-NC 4.0.