Spreading and gathering

  1. Use spread() to convert table2 to table1. What is the meaning of the key and value arguments?

    table2 %>%
      spread(_____)
  2. Use gather() to convert table1 to table2. Try an inclusive and an exclusive selection. Do you need an extra transformation to make the result fully identical? Can you reuse key and value from the previous result?

    table1 %>%
      gather(_____, ___:___)
    
    table1 %>%
      gather(_____, -___:-___)
  3. Use gather() to convert table4a and table4b to table2. Can you do the same with just one gather() call?

    Hint: Use bind_rows() to combine similar tibbles.

    cases_tbl <-
      table4a %>%
      gather(_____) %>%
      mutate(type = "cases")
    
    population_tbl <-
      table4a %>%
      gather(_____) %>%
      mutate(___)
    
    bind_rows(_____) %>%
      _____ %>%
      _____
  4. Find more exercises in Section 12.3.3 of r4ds.

Separating and uniting

  1. Convert table3 to table1 and table2.

    table3 %>%
      separate(
        ___,
        into = c("___", "___"),
        convert = TRUE
      ) %>%
      _____ %>%
      _____
  2. Convert table2 to table3.

    table2 %>%
      _____ %>%
      unite(
        ___,
        ___, ___,
        sep = "/"
      )
  3. Count the flights for each relation in the flights dataset, using just one grouping variable.

    flights %>%
      unite(
        relation,
        ___, ___,
        sep = " -> "
      ) %>%
      count(___)
  4. Find more exercises in Section 12.4.3 of r4ds.

Keys and mutating joins

  1. How are the flights, airlines, and airports datasets connected? Which are primary, which are foreign keys?

    Hint: Use count() to support your hypothesis.

    flights %>%
      count(carrier) %>%
      filter(n > 1)
    
    airlines %>%
      count(___) %>%
      filter(___)
    
    airports %>%
      _____ %>%
      _____
  2. Compute a list of all flights shorter than 300 miles. Use explicit names for the carriers and the destinations. How do you turn off the joining messages? Describe the column names in the result.

    Hint: Use by = c("dest" = "faa").

    flights %>% 
      filter(distance < 300) %>%
      left_join(airlines) %>%
      left_join(airports, by = c("___" = "___"))
  3. Count the number of observations per airline per destination, and convert to wide form using nice labels for better use of screen space. Do you use spread() or gather()? How do you replace the NA values with zeros?

    # The name of the `name` variable isn't very useful,
    # need to rename it
    flights %>% 
      filter(distance < 300) %>%
      left_join(_____) %>%
      rename(carrier_name = ___) %>%
      left_join(_____) %>%
      rename(_____) %>%
      count(_____) %>%
      _____(_____)
  4. Change the code from the last example to use count() right after filter(). What additional steps do you need?

    airline_names <-
      airlines %>%
      _____()
    
    dest_airport_names <-
      _____ %>%
      _____()
    
    flights %>% 
      filter(distance < 300) %>%
      count(_____) %>% 
      left_join(_____, by = "___") %>%
      select(-___) %>% 
      left_join(_____, by = "___") %>% 
      _____(_____)
  5. Find more exercises in Section 13.4.6 of r4ds.

Filtering joins

  1. Find the airports that are serviced by at least one flight. Which airports did not have direct connections in 2013?

    airports %>%
      semi_join(flights, by = c(_____))
    
    airports %>%
      anti_join(flights, by = c(_____))
  2. Find more exercises in Section 13.5.1 of r4ds.

Copyright © 2018 Kirill Müller. Licensed under CC BY-NC 4.0.