Scatterplots

  1. Load ggplot2 with library(ggplot2), then print the mpg dataset to your console.

  2. View mpg in RStudio’s data viewer (via View()). Display all Audis, sorted by year and then by number of cylinders. What is the maximum highway fuel economy (in miles per gallon) in this dataset?

    View(___)
  3. Create a few scatterplots using the following template (here with hwy versus displ):

    ggplot(data = mpg) +
      geom_point(mapping = aes(x = displ, y = hwy))

    Share particularly interesting or boring examples you might encounter. Did you find anything surprising?

  4. Can you plot highway fuel economy given as liters per 100 kilometers against engine displacement?

    Hint: Use the formula 235 / hwy to convert from miles per gallon.

    ggplot(data = ___) +
      geom_point(mapping = aes(x = displ, y = ___))
  5. How can you reduce overplotting? Use your favorite search engine to find out.

    ggplot(data = ___) +
      geom____(_____)
  6. Find more exercises in Section 3.2.4 of the “R for Data Science” book (http://r4ds.had.co.nz).
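
One possible sketch for exercises 4 and 5, assuming ggplot2 is installed. The axis labels and the jitter amounts (width = 0.1, height = 0.3) are arbitrary choices for illustration; transparency via alpha is another valid way to reduce overplotting.

```r
library(ggplot2)

# Exercise 4: 235 / hwy converts miles per (US) gallon into liters per 100 km
p_lkm <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = 235 / hwy)) +
  labs(x = "Engine displacement (l)", y = "Highway consumption (l/100 km)")

# Exercise 5: hwy and displ are rounded, so many cars share exactly the same point.
# geom_jitter() shifts each point by a small random amount to reveal hidden duplicates.
p_jitter <- ggplot(data = mpg) +
  geom_jitter(mapping = aes(x = displ, y = hwy), width = 0.1, height = 0.3)

print(p_lkm)
print(p_jitter)
```

Because the conversion is written directly inside aes(), the mpg dataset itself stays unchanged.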

Scatterplots with three or more variables

  1. In the hwy vs. displ plot, map an additional variable to the “color” aesthetic. Which cars consume more fuel than expected by the general trend?

    ggplot(_____) +
      geom_point(mapping = aes(x = ___, y = ___, color = ___))
  2. Experiment with the “color”, “shape”, “size”, and “alpha” aesthetics. Which combinations of variable type (categorical/continuous) and aesthetic work well, and which don’t? Expand on the more surprising examples from the previous exercises.

    Hint: Use factor(), e.g. factor(year), to convert a numeric variable with only a few distinct values into a categorical variable.

    ggplot(_____) +
      geom_point(mapping = aes(x = ___, y = ___, ___ = ___))
  3. Can you change both color and shape at the same time? What about the other aesthetics?

    ggplot(_____) +
      geom_point(mapping = aes(x = ___, y = ___, ___ = ___, ___ = ___))
  4. What happens if you map the same variable to more than one aesthetic?

    ggplot(_____) +
      geom_point(mapping = aes(x = ___, y = ___, ___ = ___, ___ = ___))
  5. Find more exercises in Section 3.3.1 of r4ds.
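
A minimal sketch of mapping extra variables, assuming ggplot2 is installed. It colors points by car class and maps the model year to both color and shape; the choice of variables is only an illustration.

```r
library(ggplot2)

# Color by car class: a categorical variable gets one hue per level and a legend
p_class <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))

# Same variable on two aesthetics: factor() turns the two model years into categories
# (shape accepts only discrete variables); identical legend titles let ggplot2 merge
# the color and shape legends into one
p_year <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy,
                           color = factor(year), shape = factor(year)))

print(p_class)
print(p_year)
```

Mapping a continuous variable such as cty to shape instead stops with an error, which is one of the combinations exercise 2 asks about.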

Manual aesthetics

  1. Plot hwy vs. displ with approx. 1/3 opacity for each point, in blue.

    Hint: Enclose the color name in double quotes (").

    ggplot(_____) +
      geom_point(
        mapping = aes(_____),
        alpha = ___,
        color = "___"
      )
  2. What happens if you assign a variable, e.g. year, to an aesthetic outside the aes() call?

    ggplot(_____) +
      geom_point(
        mapping = aes(_____),
        color = year
      )
  3. What happens if you assign a constant, e.g. 3, to the “size” aesthetic inside the aes() call?

    ggplot(_____) +
      geom_point(
        mapping = aes(_____, size = 3)
      )
  4. What values are valid for color, alpha, shape and size?

    ggplot(_____) +
      geom_point(
        mapping = aes(_____),
        ___ = ___
      )
  5. Find out how to update the default point color for all scatterplots in the help for update_geom_defaults(). Does the setting persist between plots? Is it still active after restarting R (Session → Restart R, or Ctrl + Shift + F10; Cmd + Shift + F10 on the Mac)?

  6. Find out how to update the default color for all label texts in the help for theme(). How can you set or update the theme for all subsequent plots?

    ggplot(_____) +
      _____ +
      theme(___)
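
A sketch of setting aesthetics manually, assuming ggplot2 is installed. The colors "darkgreen" and "grey30" and the choice of theme_bw() are arbitrary examples. Note that the last two calls change global state for the rest of the R session.

```r
library(ggplot2)

# Constants go outside aes(): every point is blue with roughly 1/3 opacity
p_blue <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), alpha = 1/3, color = "blue")

# Change the default color of geom_point() for the rest of this R session only;
# nothing is saved to disk, so the original default is back after restarting R
update_geom_defaults("point", list(colour = "darkgreen"))
p_green <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

# theme_set() replaces the theme for all subsequent plots and returns the old one;
# theme_update() modifies single elements of the current theme, here all text
old_theme <- theme_set(theme_bw())
theme_update(text = element_text(colour = "grey30"))

print(p_blue)
print(p_green)
```

Calling theme_set(old_theme) afterwards restores the previous theme.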

Other point geoms

  1. Try geom_smooth(). What do the se and method arguments of geom_smooth() change?

    ggplot(data = mpg) +
      geom_smooth(
        mapping = aes(x = displ, y = hwy),
        se = ___,
        method = ___
      )
  2. What does geom_rug() do? Try to reduce overplotting of the rug lines with transparency or by adding position = "jitter". How do you reduce overplotting for the points layer?

    ggplot(data = mpg) +
      geom_point(
        mapping = aes(x = displ, y = hwy),
        ___ = ___
      ) +
      geom_rug(
        mapping = aes(x = displ, y = hwy),
        ___ = ___
      )
  3. How does the order of the geom_...() calls affect the display?

    ggplot(data = ___, mapping = aes(_____)) +
      geom_point() +
      geom_smooth()
    ggplot(data = ___, mapping = aes(_____)) +
      geom_smooth() +
      geom_point()
  4. Can you plot both highway and city fuel economy in one plot?

    Hint: The solution to this exercise is not the recommended way of doing this in ggplot2. We’ll find a better way in a subsequent exercise.

    ggplot(_____) +
      geom_point(mapping = _____, color = "___") +
      geom_point(mapping = _____, color = "___")
  5. Use a bar plot to find out how many cars of each drivetrain (front/rear/4wd) the mpg dataset contains. Which aesthetic mappings do you need to specify?

    Hint: Find the relevant geom by typing geom_ in the console or in your script file and browsing the autocompletion list.

    ggplot(_____, aes(_____)) +
      geom_bar()
  6. Does the appearance of the plot change when you add y = ..count.. to the aes() call? Why/why not?

    ggplot(_____, aes(_____, y = ..count..)) +
      geom_bar()
  7. What happens if you instead use y = ..prop.., group = 1 in the aes() call? What happens if you omit group = 1? Why?

    Hint: The section “Computed variables” in the help for geom_bar() offers a brief explanation.

    ggplot(_____, aes(_____, y = ..prop.., group = 1)) +
      geom_bar()
  8. Visualize the overall distribution of fuel economy in the dataset with a histogram. Compare this with a frequency polygon; use a second layer if you like.

    ggplot(data = mpg, mapping = aes(x = hwy)) +
      geom_____()
  9. How do you get rid of the message about the number of bins in the previous example?

    ggplot(_____) +
      geom_____(binwidth = ___)
  10. Visualize the distribution of fuel economy in the dataset per drivetrain. Do you prefer a histogram or a frequency polygon?

  11. Find more exercises in Sections 3.6.1 and 3.7.1 of r4ds.
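
One possible sketch for the bar plot and distribution exercises, assuming ggplot2 3.3.0 or later, where after_stat() is available as the newer spelling of the ..count.. and ..prop.. notation. The bin widths and the red color are arbitrary choices.

```r
library(ggplot2)

# Counts per drivetrain: geom_bar() counts the rows itself, so only x is mapped
p_drv <- ggplot(data = mpg, mapping = aes(x = drv)) +
  geom_bar()

# Proportions instead of counts; group = 1 puts all bars into a single group,
# so each proportion refers to the whole dataset instead of to its own bar
p_prop <- ggplot(data = mpg, mapping = aes(x = drv, y = after_stat(prop), group = 1)) +
  geom_bar()

# Histogram plus frequency polygon sharing an explicit bin width of 1 mpg;
# setting binwidth also silences the "using bins = 30" message
p_hist <- ggplot(data = mpg, mapping = aes(x = hwy)) +
  geom_histogram(binwidth = 1) +
  geom_freqpoly(binwidth = 1, color = "red")

# One frequency polygon per drivetrain; overlaid lines are easy to compare
p_drv_hwy <- ggplot(data = mpg, mapping = aes(x = hwy, color = drv)) +
  geom_freqpoly(binwidth = 2)

print(p_drv)
print(p_prop)
print(p_hist)
print(p_drv_hwy)
```

Overlaid histograms per drivetrain would stack or hide each other, which is one argument for the frequency polygon in exercise 10.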

Position adjustments

  1. What’s the most prevalent number of gears for manual or automatic transmissions?

    ggplot(data = mpg, aes(x = trans)) +
      geom_bar()
  2. Which aesthetic can you map to further discriminate by car class? Which position adjustment is most useful to detect missing combinations of drivetrain and car class?

    ggplot(data = mpg, mapping = aes(x = class, _____)) +
      geom_bar(position = "___")
  3. Draw a boxplot of highway fuel economy versus drivetrain. Is fuel economy also affected by the number of cylinders?

    Hint: Use factor() as necessary.

  4. Find more exercises in Section 3.8.1 of r4ds.
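
A sketch of position adjustments and boxplots, assuming ggplot2 is installed. position = "dodge" is one candidate answer to exercise 2, not the only one.

```r
library(ggplot2)

# Fill bars by drivetrain; "dodge" places the bars of each class side by side,
# so a class with fewer than three bars lacks some drivetrains
p_dodge <- ggplot(data = mpg, mapping = aes(x = class, fill = drv)) +
  geom_bar(position = "dodge")

# Highway fuel economy per drivetrain, split further by number of cylinders;
# factor() makes cyl categorical so that each cylinder count gets its own box
p_box <- ggplot(data = mpg, mapping = aes(x = drv, y = hwy, fill = factor(cyl))) +
  geom_boxplot()

print(p_dodge)
print(p_box)
```

With the default stacked bars, a missing combination is simply an absent segment, which is much harder to spot.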

Faceting

  1. Has fuel economy changed considerably between 1999 and 2008? Perhaps there is a difference if you also consider the car class? Experiment with facet_wrap(), facet_grid(), aesthetic mappings, and smoothing layers.

    ggplot(_____) +
      geom_point() +
      facet_wrap(~___)
    ggplot(_____) +
      geom_point() +
      geom_smooth() + 
      facet_grid(___ ~ ___)
  2. What changes if you add the argument labeller = "label_both" to the facet_wrap() call?

  3. Is it possible to use different x and y scales for each facet? How?

  4. Experiment with other arguments to facet_wrap() and facet_grid().

  5. Find more exercises in Section 3.5.1 of r4ds.
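
A minimal faceting sketch, assuming ggplot2 is installed. The choice of class, year, and drivetrain as facet variables and the linear smoother (method = "lm") are illustrative only.

```r
library(ggplot2)

# One panel per car class, colored by model year; labeller = "label_both" shows
# "class: compact" etc., and scales = "free" gives each panel its own x and y range
p_wrap <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = factor(year))) +
  geom_point() +
  facet_wrap(~class, labeller = "label_both", scales = "free")

# Grid of model year (rows) by drivetrain (columns) with a linear trend per panel;
# passing formula explicitly avoids the message about the chosen formula
p_grid <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  facet_grid(year ~ drv)

print(p_wrap)
print(p_grid)
```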

Copyright © 2018 Kirill Müller. Licensed under CC BY-NC 4.0.