What’s the most prevalent number of gears for manual or automatic transmissions?
ggplot(data = mpg, aes(x = trans)) +
  geom_bar()
► Solution:
Transmission type and number of gears are encoded in the same variable, so a simple bar plot helps:
ggplot(data = mpg) +
  geom_bar(mapping = aes(x = trans))
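The answer can also be read off without comparing bar heights by tabulating the same counts. This is a minimal sketch using base R's table() on the mpg data shipped with ggplot2; the names trans_counts, auto_counts and manual_counts are chosen here for illustration only.

```r
library(ggplot2)  # provides the mpg dataset

# Count the cars per transmission value, most frequent first
trans_counts <- sort(table(mpg$trans), decreasing = TRUE)
trans_counts

# Most frequent value among automatic and among manual transmissions
auto_counts <- trans_counts[startsWith(names(trans_counts), "auto")]
manual_counts <- trans_counts[startsWith(names(trans_counts), "manual")]
names(auto_counts)[1]
names(manual_counts)[1]
```

The digit at the end of each label (for example the 5 in manual(m5)) is the number of gears.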
Use the fill aesthetic to further discriminate by car class. Can you explain why the parts representing transmission types are stacked on top of each other?
ggplot(data = mpg, mapping = aes(x = class, _____))
► Solution:
ggplot(data = mpg) +
  geom_bar(mapping = aes(x = class, fill = trans))
The default value of the position argument to geom_bar() is "stack". This means that bars sharing the same x value are stacked on top of each other.
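One way to convince yourself of this default is to spell it out and compare the result with the plot that omits the argument. The sketch below assumes that layer_data() from ggplot2 is an acceptable way to compare the computed bar coordinates; the names p_default and p_stack are illustrative.

```r
library(ggplot2)

# Default position adjustment of geom_bar()
p_default <- ggplot(data = mpg) +
  geom_bar(mapping = aes(x = class, fill = trans))

# Same plot with the position adjustment spelled out
p_stack <- ggplot(data = mpg) +
  geom_bar(mapping = aes(x = class, fill = trans), position = "stack")

# layer_data() returns the computed bar coordinates; both plots should agree
all.equal(layer_data(p_default), layer_data(p_stack))
```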
Apply a position adjustment to make it easier to detect missing combinations of transmission type and car class.
ggplot(data = mpg, mapping = aes(x = class, _____)) +
  geom_bar(position = "___")
► Solution:
ggplot(data = mpg) +
  geom_bar(
    aes(x = class, fill = trans), 
    position = "dodge"
  )
By default, bars in classes with fewer transmission types are drawn wider. To give all bars a uniform width, specify position_dodge(preserve = "single"):
ggplot(data = mpg) +
  geom_bar(
    aes(x = class, fill = trans), 
    position = position_dodge(preserve = "single")
  ) 
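The gaps visible in the dodged plot can also be listed explicitly. The following sketch cross-tabulates the two variables used in the plot (class and trans) with base R and keeps the combinations whose count is zero; combos and missing_combos are illustrative names.

```r
library(ggplot2)  # provides the mpg dataset

# Cross-tabulate car class against transmission
combos <- table(class = mpg$class, trans = mpg$trans)

# Combinations that never occur in the data have a count of zero
missing_combos <- subset(as.data.frame(combos), Freq == 0)
nrow(missing_combos)
head(missing_combos)
```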
Draw a boxplot of highway fuel economy versus drivetrain. Is fuel economy also affected by the number of cylinders?
Hint: Use factor() as necessary.
► Solution:
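I’m using liters per 100 km as the measure of fuel economy here. Dividing 235 by the miles-per-gallon value approximates the conversion, and lower values mean better economy.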
ggplot(data = mpg) +
  geom_boxplot(mapping = aes(x = drv, y = 235 / hwy))
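The constant 235 follows from the unit definitions. As a short sketch, assuming hwy is given in miles per US gallon (the helper name mpg_to_l_per_100km is made up for this illustration):

```r
# Unit definitions: 1 US gallon = 3.785411784 L, 1 mile = 1.609344 km
liters_per_gallon <- 3.785411784
km_per_mile <- 1.609344

# L/100 km = 100 * (liters per gallon) / (km per mile * miles per gallon)
mpg_to_l_per_100km <- function(miles_per_gallon) {
  100 * liters_per_gallon / (km_per_mile * miles_per_gallon)
}

mpg_to_l_per_100km(1)  # the conversion constant, about 235.2
```

Rounding the constant to 235 changes the plotted values by less than 0.1 percent.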
Front-wheel drive seems much more economical. Does the number of cylinders play a role? I’ll try the “fill” aesthetic:
ggplot(data = mpg) +
  geom_boxplot(mapping = aes(x = drv, y = 235 / hwy, fill = cyl))
No dice. Do I also need group = ?
ggplot(data = mpg) +
  geom_boxplot(
    mapping = aes(
      x = drv,
      y = 235 / hwy,
      fill = cyl,
      group = cyl
    )
  )
The legend reveals that cyl is a continuous variable. I’ll convert it to a categorical variable with factor(), because it takes only a few distinct values.
ggplot(data = mpg) +
  geom_boxplot(
    mapping = aes(
      x = drv,
      y = 235 / hwy,
      fill = factor(cyl)
    )
  )
The default position setting looks good; we use position_dodge(preserve = "single") again for uniform width:
ggplot(data = mpg) +
  geom_boxplot(
    mapping = aes(
      x = drv,
      y = 235 / hwy,
      fill = factor(cyl)
    ),
    position = position_dodge(preserve = "single")
  )
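To back up the visual impression with numbers, the medians shown in the boxplots can be computed directly. This is a sketch using base R's aggregate() with the same 235 / hwy approximation; the data frame consumption and its column l_per_100km are names introduced only for this example.

```r
library(ggplot2)  # provides the mpg dataset

# Fuel consumption in L/100 km, same approximation as in the plots
consumption <- data.frame(
  drv = mpg$drv,
  cyl = factor(mpg$cyl),
  l_per_100km = 235 / mpg$hwy
)

# Median consumption for each observed drivetrain/cylinder combination
medians <- aggregate(l_per_100km ~ drv + cyl, data = consumption, FUN = median)
medians[order(medians$drv, medians$cyl), ]
```

Comparing rows with the same drv shows how consumption changes with the number of cylinders within one drivetrain type.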
Find more exercises in Section 3.8.1 of r4ds.
Copyright © 2019 Kirill Müller. Licensed under CC BY-NC 4.0.