dplyr exercises part 2

Spreading and gathering

Use spread() to convert table2 to table1. What is the meaning of the key and value arguments?
```
table2 %>%
  spread(_____)
```
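
To make the roles of the two arguments concrete, here is a minimal sketch on a made-up tibble rather than on the exercise data. The names `long`, `measure`, and `reading` are hypothetical and chosen only for illustration.

```r
library(tidyr)
library(tibble)

# Hypothetical toy data in "long" form: one row per (city, measure) pair.
long <- tribble(
  ~city, ~measure, ~reading,
  "A",   "temp",   20,
  "A",   "rain",    5,
  "B",   "temp",   25,
  "B",   "rain",    1
)

# key   = the column whose values become the new column names
# value = the column whose values fill those new columns
wide <- spread(long, key = measure, value = reading)
wide
```

The result should have one row per city and one column per distinct value of `measure`.
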
Use gather() to convert table1 to table2. Try an inclusive and an exclusive selection. Do you need an extra transformation to make the result fully identical? Can you reuse key and value from the previous result?
```
table1 %>%
  gather(_____, ___:___)

table1 %>%
  gather(_____, -___:-___)
```
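
The difference between an inclusive and an exclusive selection can be sketched on a small made-up tibble. The names `wide`, `measure`, and `reading` are hypothetical; the exercise tables use different columns.

```r
library(tidyr)
library(tibble)

# Hypothetical toy data in "wide" form: one column per measure.
wide <- tribble(
  ~city, ~rain, ~temp,
  "A",   5,     20,
  "B",   1,     25
)

# Inclusive selection: name the columns that should be gathered.
long_incl <- gather(wide, key = "measure", value = "reading", rain:temp)

# Exclusive selection: name the columns that should be left alone.
long_excl <- gather(wide, key = "measure", value = "reading", -city)

long_incl
long_excl
```

Both calls should describe the same four observations. Note that gather() stacks the gathered columns one after another, so the row order of the result follows the column order of the input.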

Visualize the data: plot cases, then population, then both together. Which of table1 and table2 is more suitable for each plot?

```
___ %>%
  ggplot(aes(___)) +
  geom_col()

___ %>%
  ggplot(aes(___)) +
  geom_col() +
  facet_grid(___ ~ ___, scales = "free")

___ %>%
  ggplot(aes(___, ___)) +
  geom_point()
```
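
As a rough illustration of why the shape of the data matters for plotting, the following sketch facets a made-up long-format tibble by its key column. All names (`long`, `measure`, `reading`) are hypothetical and unrelated to the exercise tables.

```r
library(ggplot2)
library(tibble)

# Hypothetical long-format toy data: one row per (city, measure) pair.
long <- tribble(
  ~city, ~measure, ~reading,
  "A",   "temp",   20,
  "A",   "rain",    5,
  "B",   "temp",   25,
  "B",   "rain",    1
)

# Because the measure name is itself a column, it can drive a facet
# (or a color) without any further reshaping.
p <- ggplot(long, aes(x = city, y = reading)) +
  geom_col() +
  facet_grid(measure ~ ., scales = "free")
p
```

With wide data, each measure would instead need its own aesthetic mapping or its own plot.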

Use gather() to convert table4a and table4b to table2. Can you do the same with just one gather() call?

Hint: Use bind_rows() to combine similar tibbles.

```
cases_tbl <-
  table4a %>%
  gather(_____) %>%
  mutate(type = "cases")

population_tbl <-
  table4b %>%
  gather(_____) %>%
  mutate(___)

bind_rows(_____) %>%
  _____ %>%
  _____
```
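
The bind_rows() hint can be sketched on two made-up tibbles that share the same columns. The names `temp_tbl`, `rain_tbl`, and `measure` are hypothetical.

```r
library(dplyr)
library(tibble)

# Hypothetical toy tibbles with identical columns, each tagged with its origin.
temp_tbl <- tibble(city = c("A", "B"), reading = c(20, 25)) %>%
  mutate(measure = "temp")
rain_tbl <- tibble(city = c("A", "B"), reading = c(5, 1)) %>%
  mutate(measure = "rain")

# bind_rows() stacks the tibbles on top of each other, matching columns by name.
combined <- bind_rows(temp_tbl, rain_tbl)
combined
```

Tagging each tibble before stacking keeps track of where every row came from.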

Create a scatterplot from the mpg dataset that shows both highway and city fuel economy against engine displacement with two different colors using only one geom_point() call.
```
mpg %>%
  _____ %>%
  ggplot(aes(x = displ, y = ___)) +
  geom_point()
```
Find more exercises in Section 12.3.3 of r4ds.

Separating and uniting

Convert table3 to table1 and table2.

```
table3 %>%
  separate(
    ___,
    into = c("___", "___"),
    convert = TRUE
  ) %>%
  _____ %>%
  _____
```
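
For reference, this is a minimal sketch of separate() with `convert = TRUE` on made-up data. The tibble `scores` and its columns are hypothetical, and the separator is given explicitly here.

```r
library(tidyr)
library(tibble)

# Hypothetical toy data: two numbers packed into one character column.
scores <- tibble(match = c("x", "y"), result = c("3:1", "10:12"))

# Split `result` at the colon; convert = TRUE turns the pieces
# from character into a more suitable type (here: integer).
split_scores <- separate(
  scores,
  result,
  into = c("home", "away"),
  sep = ":",
  convert = TRUE
)
split_scores
```

Without `convert = TRUE`, the new columns would stay character vectors.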

Convert table2 to table3.

```
table2 %>%
  _____ %>%
  unite(
    ___,
    ___, ___,
    sep = "/"
  )
```

Count the flights for each relation (each pair of origin and destination) in the flights dataset, using just one grouping variable.
```
flights %>%
  unite(
    relation,
    ___, ___,
    sep = " -> "
  ) %>%
  count(___)
```
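
A minimal sketch of what unite() does, using a made-up tibble. The names `trips`, `route`, `from`, and `to` are hypothetical.

```r
library(tidyr)
library(tibble)

# Hypothetical toy data: two columns that together identify a route.
trips <- tibble(
  from = c("ZRH", "ZRH", "GVA"),
  to   = c("LHR", "CDG", "LHR")
)

# unite() pastes the selected columns into one new column
# and, by default, drops the originals.
routes <- unite(trips, route, from, to, sep = "-")
routes
```

The united column can then serve as a single grouping variable.
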
Find more exercises in Section 12.4.3 of r4ds.

Keys and mutating joins

How are the flights, airlines, and airports datasets connected? Which columns are primary keys, and which are foreign keys?

Hint: Use count() to support your hypothesis.
```
flights %>%
  count(carrier) %>%
  count(n)

airlines %>%
  count(_____) %>%
  _____
```
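
The idea behind the count() hint is that a primary key must identify each row uniquely. A sketch on made-up data, where `people`, `id`, and `name` are hypothetical:

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: `id` is unique, `name` is not.
people <- tibble(id = c(1, 2, 3), name = c("Ann", "Bob", "Ann"))

# A column is a candidate primary key if no value occurs more than once,
# i.e. if filtering the counts for n > 1 returns zero rows.
dup_id <- people %>% count(id) %>% filter(n > 1)
dup_name <- people %>% count(name) %>% filter(n > 1)

dup_id
dup_name
```

An empty result supports the hypothesis that the column is a key; any remaining rows are counterexamples.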

Plot a heat map of destination by airline for all flights shorter than 300 miles. Use explicit names for the carriers and the destinations. Does the result change if you use a full join? Do you use geom_raster() or geom_bin2d()?

Hint: Use by = c("dest" = "faa").

```
flights %>%
  filter(distance < 300) %>%
  count(dest, carrier) %>%
  left_join(airlines) %>%
  left_join(airports, by = c(___))

# The name of the `name` variable isn't very useful,
# need to rename it before plotting
flights %>%
  filter(distance < 300) %>%
  count(dest, carrier) %>%
  left_join(_____) %>%
  rename(___) %>%
  left_join(_____) %>%
  rename(___) %>%
  ggplot() +
  geom_raster(aes(___))
```
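
The pattern of joining on differently named key columns and then renaming a generic `name` column can be sketched on made-up data. The tibbles `orders` and `shops` and all their columns are hypothetical.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: the key is called `shop` in one table and `code` in the other.
orders <- tibble(order_id = 1:3, shop = c("s1", "s2", "s9"))
shops <- tibble(code = c("s1", "s2"), name = c("North", "South"))

# by = c("left_name" = "right_name") matches columns with different names.
# A left join keeps every row of `orders`; unmatched rows get NA.
joined <- orders %>%
  left_join(shops, by = c("shop" = "code")) %>%
  rename(shop_name = name)
joined
```

Renaming right after each join avoids ending up with two columns that are both called `name`.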

Find more exercises in Section 13.4.6 of r4ds.

Filtering joins

Find the airports that are serviced by at least one flight. Which airports did not have direct connections in 2013?
```
airports %>%
  semi_join(flights, by = c(_____))

airports %>%
  anti_join(flights, by = c(_____))
```
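
As a small illustration of how the two filtering joins complement each other, here is a sketch on made-up data. The tibbles `shops` and `orders` are hypothetical.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: shop "s3" never appears in `orders`.
shops <- tibble(code = c("s1", "s2", "s3"))
orders <- tibble(order_id = 1:3, shop = c("s1", "s1", "s2"))

# semi_join() keeps rows of `shops` that have at least one match in `orders`;
# anti_join() keeps the rows that have none. Neither adds columns or duplicates rows.
with_orders <- semi_join(shops, orders, by = c("code" = "shop"))
without_orders <- anti_join(shops, orders, by = c("code" = "shop"))

with_orders
without_orders
```

Together, the two results partition the rows of the first table.
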
Find more exercises in Section 13.5.1 of r4ds.

Kirill Müller
