RStudio walkthrough

  1. Evaluate a few arithmetic expressions in the console. Do operator precedence rules hold?
    • Also try the modulo operator %%, the integer division %/%, and functions such as floor(), sin(), max().
  2. Practice:
    • repeating and searching expressions in the history,
    • aborting and restarting the entry of an expression
  3. What happens if you type 5 + <enter> 3 <enter>? Is the result different from 5 <enter> + 3 <enter>? Why?

  4. Practice:
    • restarting the R session
    • other shortcuts from the RStudio cheat sheet
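
As a possible starting point for the first exercise, the sketch below lists a few expressions to type into the console. The specific numbers are arbitrary choices for illustration and are not part of the original exercise.

```r
# Multiplication binds tighter than addition.
2 + 3 * 4        # 14
(2 + 3) * 4      # 20

# Exponentiation binds tighter than unary minus.
-2^2             # -4

# Modulo and integer division.
7 %% 3           # 1
7 %/% 3          # 2
-7 %% 3          # 2: the result takes the sign of the divisor
-7 %/% 3         # -3: integer division rounds towards minus infinity

# A few built-in functions.
floor(2.7)       # 2
sin(pi / 2)      # 1
max(3, 7, 5)     # 7
```

Try changing the operands or adding parentheses to check whether the results match your expectations.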

Setup

  1. Choose “File → New Project… → New Directory → New Project” from the menu.

  2. Type a name for your new directory, and choose a directory on your local or network drive.

  3. OK.

  4. Close and reopen RStudio. Double-check that the project is still active.

  5. Type and run getwd() in the console, and double-check that the output matches the path to your newly created directory.

  6. Choose “Tools → Global Options…” from the menu, and there uncheck the fourth checkbox (“Restore .RData into workspace on startup”) and select “Never” in the combobox below (“Save workspace to .RData on exit”).

Getting help

  1. Look up the help for the following objects and functions:
    • diamonds
    • geom_boxplot()
    • near()

    Hint: Make sure you have loaded the tidyverse package with library(tidyverse).

  2. Copy-paste the example for near() in your console.

  3. Run the example for geom_boxplot().

    Hint: Look up in the help what example() does.

  4. Find the example for geom_boxplot() on http://ggplot2.tidyverse.org. Do you have a preference for either?

  5. The example() function is part of base R. Can you find it on https://rdocumentation.org? Can you also find the other functions you looked up earlier in this exercise?
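
The help lookups above might look like the following in the console. This is a minimal sketch, assuming the tidyverse package is installed; the package = "dplyr" argument is added here only to make the lookup explicit.

```r
library(tidyverse)

# Open help pages; ?topic and help("topic") are equivalent.
?diamonds
help("geom_boxplot")
?near

# Run the examples from a help page instead of copy-pasting them.
example("near", package = "dplyr")

# near() compares numbers with a tolerance, unlike ==.
sqrt(2)^2 == 2
near(sqrt(2)^2, 2)
```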

Scatterplots

  1. Print the mpg dataset to your console.

  2. View mpg in RStudio’s dataset pane (via View()). Display all Audis, sorted by year and then by number of cylinders. What is the maximum number of highway miles per gallon in this dataset?

  3. Create a few scatterplots using the following template (here with hwy versus displ):

    ggplot(data = mpg) +
      geom_point(mapping = aes(x = displ, y = hwy))

    Share particularly interesting or boring examples you might encounter. Did you find anything surprising?

  4. Can you plot highway fuel economy given as liters per 100 kilometers against engine displacement?

    Hint: Use the formula 235 / hwy to convert from miles per gallon.

  5. Find more examples in Section 3.2.4 of the “R for data science” book (http://r4ds.had.co.nz).
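
For the unit conversion in exercise 4, one possible approach is to compute the converted value directly inside aes(). This is a sketch built on the template above, not the only way to do it.

```r
library(tidyverse)

# 235 / hwy converts miles per gallon to (approximately) liters per 100 km.
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = 235 / hwy))
```

Note that the direction of the y axis flips in meaning: higher values now indicate higher fuel consumption.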

Scripts

  1. Choose “File → New file → R script” to create a new script.

  2. Save it, pick a file name. The extension .R will be added automatically.

  3. Add code to load the tidyverse package on the first line. Save the script.

  4. Copy some expressions and plots you have run before from the history to your script. Save the script.

  5. Use Ctrl + Enter to source a single expression from the script.

  6. Use Ctrl + Shift + Enter to source the entire script.

  7. Restart the R session (via Ctrl + Shift + F10 or via “Session → Restart R”). Source the entire script.

  8. Choose “File → Knit document” to render the entire output to a single document.

  9. Everything after # is a comment. Add a regular comment and one prefixed with #' (hash apostrophe). Knit the document. Can you see the difference?

  10. Assign a plot to a variable by changing ggplot(... to variable <- ggplot(..., choosing a meaningful name instead of variable. Knit the document. What changes?

  11. In a separate line, write print(variable) (or simply variable), using the name you have chosen above.
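
After working through these steps, a script might look roughly like the sketch below. The variable name hwy_vs_displ and the comment texts are hypothetical; use whatever names fit your own plots.

```r
library(tidyverse)

#' A comment prefixed with hash apostrophe, for comparison when knitting.
# A regular comment.

# Assign the plot to a variable instead of displaying it right away.
hwy_vs_displ <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

# Display the plot stored in the variable.
print(hwy_vs_displ)
```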

Scatterplots with three or more variables

  1. In the hwy vs. displ plot, map an additional variable to the “color” aesthetic. Which cars consume more fuel than expected by the general trend?

  2. Experiment with the “color”, “shape”, “size”, and “alpha” aesthetics. Which combinations of attribute class (categorical/continuous) and aesthetics work well, which don’t? Expand on the more surprising examples in the previous exercises.

    Hint: Use factor(year) to convert continuous variables with a limited set of values to categorical variables.

  3. Can you change both color and shape at the same time? What about the other aesthetics?

  4. What happens if you map the same variable to more than one aesthetic?

  5. Find more exercises in Section 3.3.1 of r4ds.
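
A minimal sketch of mapping additional variables, assuming the same hwy vs. displ template as before. The choice of class, year, and drv is only one of many combinations worth trying.

```r
library(tidyverse)

# Map a categorical variable to color.
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))

# Treat year as categorical via factor(), and map a second variable to shape.
ggplot(data = mpg) +
  geom_point(
    mapping = aes(x = displ, y = hwy, color = factor(year), shape = drv)
  )
```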

Manual aesthetics

  1. Plot hwy vs. displ with approx. 1/3 opacity for each point, in blue.

    Hint: Use quotes " for the color.

  2. What happens if you assign a variable, e.g. year, to an aesthetic outside the aes() call?

  3. What happens if you assign a constant, e.g. 3, to the “size” aesthetic inside the aes() call?

  4. What values are valid for color, alpha, shape and size?
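
For the first exercise, constant values go outside the aes() call, as arguments to the geom. The sketch below shows one way to write this; the variable name blue_points is hypothetical.

```r
library(tidyverse)

# color and alpha are set to constants, so they are passed outside aes().
blue_points <- ggplot(data = mpg) +
  geom_point(
    mapping = aes(x = displ, y = hwy),
    color = "blue",
    alpha = 1 / 3
  )

blue_points
```

Comparing this with the variants from exercises 2 and 3 shows how ggplot2 distinguishes between mapping a variable and setting a fixed value.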

Other point geoms

  1. What do the arguments se and method to geom_smooth() change?

  2. What does geom_rug() do? Try to reduce overplotting with transparency or by adding position = "jitter". How do you reduce overplotting for the points layer?

  3. How does the order of the geom_...() calls affect the display?

  4. Can you plot both highway and city economy in one plot?

    Hint: The solution to this exercise is not the recommended way of doing this in ggplot2. We’ll find a better way in a subsequent exercise.

  5. Find more exercises in Section 3.6.1 of r4ds.
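
The following sketch combines several layers as a starting point for experiments. The argument values (method = "lm", se = FALSE, the alpha level, and the two colors) are arbitrary choices for illustration.

```r
library(tidyverse)

# Points, a smoothing line, and a rug in one plot.
# Layers are drawn in the order in which they are added.
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), position = "jitter") +
  geom_smooth(mapping = aes(x = displ, y = hwy), method = "lm", se = FALSE) +
  geom_rug(
    mapping = aes(x = displ, y = hwy),
    position = "jitter",
    alpha = 1 / 3
  )

# Highway and city economy in one plot, using one point layer for each.
# As the hint says, this is not the recommended way of doing this.
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue") +
  geom_point(mapping = aes(x = displ, y = cty), color = "red")
```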

Statistical transformations

  1. Use a bar plot to find out how many cars of each drivetrain (front/rear/4wd) the mpg dataset contains. Which aesthetic mappings do you need to specify?

    Hint: Find the relevant geom by typing geom_ on the console or in your script file.

  2. Does the appearance of the plot change when you add y = ..count.. to the aes() call? Why/why not?

  3. What happens if you instead use y = ..prop.., group = 1 in the aes() call? What happens if you omit group = 1? Why?

    Hint: The section “Computed variables” in the help for geom_bar() offers a brief explanation.

  4. Visualize the overall distribution of fuel economy in the dataset with a histogram. Compare this with a frequency polygon, use a second layer if you like.

  5. Visualize the distribution of fuel economy in the dataset per drivetrain. Do you prefer a histogram or a frequency polygon?

  6. Find more exercises in Section 3.7.1 of r4ds.
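
A sketch of the plots these exercises ask for, under two assumptions: it uses after_stat(prop), which ggplot2 3.3.0 and later offer as the replacement for the ..prop.. notation used above, and it picks binwidth = 2 arbitrarily.

```r
library(tidyverse)

# Bar plot: geom_bar() counts the rows per drivetrain by itself.
ggplot(data = mpg) +
  geom_bar(mapping = aes(x = drv))

# Proportions instead of counts; after_stat(prop) corresponds to ..prop..
ggplot(data = mpg) +
  geom_bar(mapping = aes(x = drv, y = after_stat(prop), group = 1))

# Histogram with a frequency polygon as a second layer.
ggplot(data = mpg) +
  geom_histogram(mapping = aes(x = hwy), binwidth = 2) +
  geom_freqpoly(mapping = aes(x = hwy), binwidth = 2)

# Distribution per drivetrain, one colored line for each.
ggplot(data = mpg) +
  geom_freqpoly(mapping = aes(x = hwy, color = drv), binwidth = 2)
```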

Position adjustments

  1. What’s the most prevalent number of gears for manual or automatic transmissions?

  2. Which aesthetic can you map to further discriminate by car class? Which position adjustment is most useful to detect missing combinations of drivetrain and car class?

  3. Draw a boxplot of highway fuel economy versus drivetrain. Is fuel economy also affected by the number of cylinders?

    Hint: Use factor() as necessary.

  4. Find more exercises in Section 3.8.1 of r4ds.
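
As a hedged starting point, the sketch below shows one position adjustment and one boxplot variant. Using position = "dodge" and the fill aesthetic are choices made for this example; other adjustments such as "stack" or "fill" are worth comparing.

```r
library(tidyverse)

# Number of cars per transmission type.
ggplot(data = mpg) +
  geom_bar(mapping = aes(x = trans))

# Drivetrain by car class, with the bars placed side by side.
ggplot(data = mpg) +
  geom_bar(mapping = aes(x = drv, fill = class), position = "dodge")

# Boxplot of highway fuel economy by drivetrain, split by cylinders.
# factor() turns the numeric cyl column into a categorical variable.
ggplot(data = mpg) +
  geom_boxplot(mapping = aes(x = drv, y = hwy, fill = factor(cyl)))
```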

Faceting

  1. Has fuel economy changed considerably between 1999 and 2008? Perhaps there is a difference if you also consider the car class? Experiment with facet_wrap(), facet_grid(), aesthetic mappings, and smoothing layers.

  2. What changes if you add the argument labeller = "label_both" to the facet_wrap() call?

  3. Is it possible to use a different x and y scale for each facet? How?

  4. Experiment with other arguments to facet_wrap() and facet_grid().

  5. Find more exercises in Section 3.5.1 of r4ds.
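
A minimal sketch of both faceting functions, assuming the hwy vs. displ plot from earlier. The faceting variables and scales = "free" are example choices, not the only reasonable ones.

```r
library(tidyverse)

# One panel per car class, with the year mapped to color.
# scales = "free" gives each panel its own x and y scale.
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = factor(year))) +
  facet_wrap(~ class, labeller = "label_both", scales = "free")

# A grid of panels: years in rows, drivetrains in columns.
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(year ~ drv)
```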

Copyright © 2017 Kirill Müller. Licensed under CC BY-NC 4.0.