What’s the most prevalent number of gears for manual or automatic transmissions?
ggplot(data = mpg, aes(x = trans)) +
geom_bar()
► Solution:
Transmission type and number of gears are encoded in the same variable, a simple bar plot helps:
ggplot(data = mpg) +
geom_bar(mapping = aes(x = trans))
Which aesthetic can you map to further discriminate by car class? Which position adjustment is most useful to detect missing combinations of drivetrain and car class?
ggplot(data = mpg, mapping = aes(x = class, _____)) +
geom_bar(position = "___")
► Solution:
ggplot(data = mpg) +
geom_bar(mapping = aes(x = class, fill = trans))
SUVs have automatic mostly, 2-seaters favor six-gear manual, compact and subcompact mostly have five-gear manual transmission. This is much easier to see with position = "dodge"
, the different width of the bars indicates the number of different subgroups for each car class:
ggplot(data = mpg) +
geom_bar(
aes(x = class, fill = trans),
position = "dodge"
)
To use uniform width, specify position_dodge(preserve = "single")
:
ggplot(data = mpg) +
geom_bar(
aes(x = class, fill = trans),
position = position_dodge(preserve = "single")
)
Draw a boxplot of highway fuel economy versus drivetrain. Is fuel economy also affected by the number of cylinders?
Hint: Use factor()
as necessary.
► Solution:
I’m using liters per 100 km as measure for fuel economy here.
ggplot(data = mpg) +
geom_boxplot(mapping = aes(x = drv, y = 235 / hwy))
Forward drivetrains seem much more economical. Does the number of cylinders play a role? I’ll try the “fill” aesthetic:
ggplot(data = mpg) +
geom_boxplot(mapping = aes(x = drv, y = 235 / hwy, fill = cyl))
No dice. Do I also need group =
?
ggplot(data = mpg) +
geom_boxplot(
mapping = aes(
x = drv,
y = 235 / hwy,
fill = cyl,
group = cyl
)
)
The legend reveals that cyl
is a continuous variable. I’ll use its categorical equivalent, because the range is very limited.
ggplot(data = mpg) +
geom_boxplot(
mapping = aes(
x = drv,
y = 235 / hwy,
fill = factor(cyl)
)
)
The default position
setting looks good, let’s try "identity"
:
ggplot(data = mpg) +
geom_boxplot(
mapping = aes(
x = drv,
y = 235 / hwy,
fill = factor(cyl)
),
position = "identity"
)
Find more exercises in Section 3.8.1 of r4ds.
Copyright © 2018 Kirill Müller. Licensed under CC BY-NC 4.0.