Print the mpg dataset to your console.
View mpg in RStudio’s dataset pane (via View()). Display all Audis sorted by year then cylinder. What is the maximum number of highway miles per gallon in this dataset?
Create a few scatterplots using the following template (here with hwy versus displ):
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))
Share particularly interesting or boring examples you might encounter. Did you find anything surprising?
Can you plot highway fuel economy given as liters per 100 kilometers against engine displacement?
Hint: Use the formula 235 / hwy to convert from miles per gallon.
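As a minimal sketch of the hint above, the conversion can be written directly inside aes(), since aesthetics accept expressions as well as plain column names. The object name p is only illustrative, and the factor 235 is an approximation of the exact conversion constant.

```r
library(ggplot2)

# 235 / hwy turns miles per gallon into (approximately) liters per 100 km.
# The expression is evaluated for every row of mpg when the plot is built.
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = 235 / hwy))

p
```

Note that the direction of the trend flips compared to the hwy plot, because higher values now mean higher consumption.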
Find more examples in Section 3.2.4 of the “R for Data Science” book (http://r4ds.had.co.nz).
In the hwy vs. displ plot, map an additional variable to the “color” aesthetic. Which cars consume more fuel than expected by the general trend?
Experiment with the “color”, “shape”, “size”, and “alpha” aesthetics. Which combinations of attribute class (categorical/continuous) and aesthetics work well, and which don’t? Expand on the more surprising examples in the previous exercises.
Hint: Use factor(year) to convert continuous variables with a limited set of values to categorical variables.
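The hint above can be sketched as follows. In mpg, year is stored as a number, so ggplot2 treats it as continuous unless it is wrapped in factor(). The choice of the color aesthetic and the name p are assumptions made for illustration only.

```r
library(ggplot2)

# year only takes the values 1999 and 2008 in mpg; factor() makes ggplot2
# use a discrete color scale with one color per year instead of a gradient.
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = factor(year)))

p
```

Replacing factor(year) with plain year in this sketch shows the continuous scale for comparison.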
Can you change both color and shape at the same time? What about the other aesthetics?
What happens if you map the same variable to more than one aesthetic?
Find more exercises in Section 3.3.1 of r4ds.
Plot hwy vs. displ with approx. 1/3 opacity for each point, in blue.
Hint: Use quotes (") around the color name.
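One possible way to approach this exercise is sketched below: constants are passed to the geom as arguments outside aes(), so they are set for all points rather than mapped from the data. The name p is illustrative.

```r
library(ggplot2)

# color and alpha are given outside aes(), so every point gets the same
# fixed appearance and no legend is drawn for them.
p <- ggplot(data = mpg) +
  geom_point(
    mapping = aes(x = displ, y = hwy),
    color = "blue",
    alpha = 1 / 3
  )

p
```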
What happens if you assign a variable, e.g. year, to an aesthetic outside the aes() call?
What happens if you assign a constant, e.g. 3, to the “size” aesthetic inside the aes() call?
What values are valid for color, alpha, shape and size?
What do the arguments se and method to geom_smooth() change?
What does geom_rug() do? Try to reduce overplotting with transparency or by adding position = "jitter". How do you reduce overplotting for the points layer?
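A minimal sketch of the two techniques mentioned above, combined in one plot. Putting the mapping into ggplot() so that both layers share it is a choice made here for brevity, not something the exercise prescribes, and the alpha value is arbitrary.

```r
library(ggplot2)

# Both layers inherit x and y from ggplot(). Jittering adds a small amount
# of random noise to each position, so the plot looks slightly different
# on every run; alpha makes stacked points and rug lines visible as
# darker areas.
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(position = "jitter", alpha = 1 / 3) +
  geom_rug(position = "jitter", alpha = 1 / 3)

p
```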
How does the order of the geom_...() calls affect the display?
Can you plot both highway and city economy in one plot?
Hint: The solution to this exercise is not the recommended way of doing this in ggplot2. We’ll find a better way in a subsequent exercise.
Find more exercises in Section 3.6.1 of r4ds.
Use a bar plot to find out how many cars of each drivetrain (front/rear/4wd) the mpg dataset contains. Which aesthetic mappings do you need to specify?
Hint: Find the relevant geom by typing geom_ on the console or in your script file.
Does the appearance of the plot change when you add y = ..count.. to the aes() call? Why/why not?
What happens if you instead use y = ..prop.., group = 1 in the aes() call? What happens if you omit group = 1? Why?
Hint: The “Computed variables” section of the geom_bar() help page offers a brief explanation.
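The following sketch shows how a computed variable can be mapped to y. It uses after_stat(prop), which recent ggplot2 versions (3.3.0 and later) offer as a replacement for the ..prop.. notation used in the exercises; on older versions, ..prop.. would be needed instead. The names p_count and p_prop are illustrative.

```r
library(ggplot2)

# Default behaviour: geom_bar() counts the rows per drivetrain itself.
p_count <- ggplot(data = mpg) +
  geom_bar(mapping = aes(x = drv))

# prop is computed by the stat within each group; group = 1 puts all rows
# into a single group, so the bar heights add up to 1.
p_prop <- ggplot(data = mpg) +
  geom_bar(mapping = aes(x = drv, y = after_stat(prop), group = 1))

p_count
p_prop
```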
Visualize the overall distribution of fuel economy in the dataset with a histogram. Compare this with a frequency polygon; use a second layer if you like.
Visualize the distribution of fuel economy in the dataset per drivetrain. Do you prefer a histogram or a frequency polygon?
Find more exercises in Section 3.7.1 of r4ds.
What’s the most prevalent number of gears for manual or automatic transmissions?
Which aesthetic can you map to further discriminate by car class? Which position adjustment is most useful to detect missing combinations of drivetrain and car class?
Draw a boxplot of highway fuel economy versus drivetrain. Is fuel economy also affected by the number of cylinders?
Hint: Use factor() as necessary.
Find more exercises in Section 3.8.1 of r4ds.
Has fuel economy changed considerably between 1999 and 2008? Perhaps there is a difference if you also consider the car class? Experiment with facet_wrap(), facet_grid(), aesthetic mappings, and smoothing layers.
What changes if you add the argument labeller = "label_both" to the facet_wrap() call?
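As a starting point for experimenting, a faceted version of the scatterplot template might look like the sketch below. Faceting by year is just one of the options the exercise suggests, and the name p is illustrative.

```r
library(ggplot2)

# One panel per value of year. With labeller = "label_both" the strip
# above each panel shows the variable name together with its value.
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ year, labeller = "label_both")

p
```

Removing the labeller argument from this sketch shows the default strip labels for comparison.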
Is it possible to use a different x and y scale for each facet? How?
Experiment with other arguments to facet_wrap() and facet_grid().
Find more exercises in Section 3.5.1 of r4ds.
Copyright © 2018 Kirill Müller. Licensed under CC BY-NC 4.0.