Spreading and gathering

  1. Use spread() to convert table2 to table1. What is the meaning of the key and value arguments?

  2. Use gather() to convert table1 to table2. Do you need an extra transformation to make the result fully identical? Can you reuse key and value from the previous result?

  3. Use a bar chart to visualise the data. Which of table1 or table2 is more suitable for plotting?

  4. Use gather() to convert table4a and table4b to table2.

    Hint: Use bind_rows() to combine similar tibbles.

  5. Create a scatterplot from the mpg dataset that shows both highway and city fuel economy against engine displacement with two different colors using only one geom_point() call.

  6. Find more exercises in Section 12.3.3 of r4ds.

Separating and uniting

  1. Convert table3 to table1 and table2.

  2. Convert table2 to table3.

  3. Use separate() to compute departure and arrival hours and minutes in the flights dataset.

  4. Find more exercises in Section 12.4.3 of r4ds.

Keys and mutating joins

  1. Do you see a problem in the presidential dataset? Can you tell how it affects the following bar plot without actually running the code?

    presidential %>%
      mutate(term = end - start) %>%
      ggplot() +
      geom_col(aes(name, term))

  2. How are the flights, carriers, and airports datasets connected? Which are primary, which are foreign keys?

    Hint: Use count() to support your hypothesis.

  3. Plot a heat map of destination by airline for all flights shorter than 300 miles. Use explicit names for the carriers and the destinations. Does the result change if you use a full join? Do you use geom_raster() or geom_bin2d()?

    Hint: Use by = c("dest" = "faa").

  4. Find more exercises in Section 13.4.6 of r4ds.

Filtering joins

  1. Find the airports that are serviced by at least one flight. Which airports did not have direct connections in 2013?

  2. Find more exercises in Section 13.5.1 of r4ds.

Assignment 5

  1. Create a single plot that shows time trends of mortality or incidence rates for HIV, TB, and malaria cases for all countries in all high-impact regions, compared against the year 2000 baseline. Each panel should show the data for one region.

    Hint: Use first() with a grouped mutate().

  2. Create a single plot that shows time trends of the number of people living with HIV/AIDS in need of ART (people_living_with_hiv_aids_number and the corresponding lower and upper uncertainty ranges) for all countries in all high-impact regions. Each panel should show the data for one country.

  3. Create a single plot that shows time trends of the number of people living with HIV/AIDS in need of ART, aggregated over all countries in a region. Each panel should show the data for one region.

  4. Create a stacked bar graph with three bars, one per disease, where the indicator is the number of deaths from that disease in 2015. Express the stacks as percentages, split into the three categories of eligibility. Across Global Fund supported countries, this shows for each of the three diseases what share of deaths is not covered because of ineligibility.

  5. Load data from sheets 1 to 9 in the Excel file and bring them into a tidy format.

    Hint: The path to the Excel file on the RStudio server is "/data/r-course/courswork_data_tgf.xlsx". Before reading the file, create a link to the "/data/r-course" directory using file.symlink(). Use readxl::read_excel() and look at the documentation of the sheet and range arguments to that function. How do you specify column names?

Case study

  1. Use your own data to answer a question about it using the tools you have learned in this course.

    Hint: To import, use the “readr” (CSV), “readxl” (Excel), or “haven” (SPSS/SAS/Stata) packages. Use the internet to find out how to import other kinds of data. Call as_tibble() right after importing to get consistent printing.

  2. Alternatively, create a plot of the total number of tuberculosis cases per year for eight countries of your choice. Can you also plot the share (relative to the overall population of the country)?

Copyright © 2017 Kirill Müller. Licensed under CC BY-NC 4.0.