1 R and RStudio

1.1 R as a toolkit

  • Scriptability \(\rightarrow\) R
  • Literate programming (code, narrative, output in one place) \(\rightarrow\) R Markdown
  • Version control \(\rightarrow\) Git / GitHub

1.1.1 Why R and RStudio?

1.1.2 Some R basics

  • You will load packages at the start of every new R session.
    • “Base” R comes with tons of useful built-in functions. It also provides all the tools necessary for you to write your own functions.
    • However, many of R’s best data science functions and tools come from external packages written by other users.
  • R parallelizes easily, and at no extra cost.
    • Compare the cost of a Stata/MP license, never mind the fact that you effectively pay per core…
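
A minimal sketch of the parallelization point above, using the parallel package that ships with R. The toy function slow_square and the choice of two workers are illustrative assumptions, not part of the original notes.

```r
library(parallel)  # ships with R, no installation needed

# Hypothetical toy task: square a number after a short pause
slow_square <- function(x) {
  Sys.sleep(0.1)
  x^2
}

# Start two worker processes (this approach works on Windows, macOS and Linux)
cl <- makeCluster(2)

# Apply the function to 1..8, spreading the work across the workers
res <- parLapply(cl, 1:8, slow_square)

# Always shut the workers down when done
stopCluster(cl)

unlist(res)
```

The result is the squares of 1 to 8, in the original order. How much time is actually saved depends on the task and on the number of cores available.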

1.2 R code examples

1.2.1 Linear regression

fit <- lm(dist ~ 1 + speed, data = cars)
summary(fit)
## 
## Call:
## lm(formula = dist ~ 1 + speed, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.069  -9.525  -2.272   9.215  43.201 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *  
## speed         3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12
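
Once the model is fitted, the pieces of the summary can also be pulled out individually. The following is a small sketch using base R functions; the speed value of 15 mph is chosen purely for illustration.

```r
fit <- lm(dist ~ 1 + speed, data = cars)

# Named vector with the two estimates from the coefficient table
coef(fit)

# Predicted stopping distance (in feet) at a speed of 15 mph
pred <- predict(fit, newdata = data.frame(speed = 15))
pred

# 95% confidence intervals for intercept and slope
confint(fit)
```

With the estimates shown above, the prediction works out to roughly -17.58 + 3.93 × 15 ≈ 41.4.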

1.2.2 Base R plot

plot(cars, pch = 19, col = "darkgray")
abline(fit, lwd = 2)

1.2.3 ggplot2

library(ggplot2)
library(gapminder) ## For the gapminder data

ggplot(
  data = gapminder,
  mapping = aes(x = gdpPercap, y = lifeExp)
) +
  geom_point()
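
One possible extension of the plot above, to show how ggplot2 layers build on each other. It assumes the same gapminder data; the colour mapping, transparency, log scale and axis labels are illustrative choices rather than part of the original example.

```r
library(ggplot2)
library(gapminder)

p <- ggplot(
  data = gapminder,
  mapping = aes(x = gdpPercap, y = lifeExp, color = continent)
) +
  geom_point(alpha = 0.5) +   # semi-transparent points reduce overplotting
  scale_x_log10() +           # GDP per capita is heavily right-skewed
  labs(
    x = "GDP per capita (log scale)",
    y = "Life expectancy",
    color = "Continent"
  )

p  # printing the object draws the plot
```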

1.2.4 gganimate

1.3 R vs. RStudio

  • R is a statistical programming language
  • RStudio is a convenient interface for R (an integrated development environment, IDE)
  • At its simplest:
    • R is like a car’s engine
    • RStudio is like a car’s dashboard
Engine vs. dashboard

1.4 R vs. R packages

  • R packages extend the functionality of R by providing additional functions, data, and documentation.

  • They are written by a worldwide community of R users and can be downloaded at no cost.
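
In practice, the difference between installing a package (once per machine) and loading it (once per session) looks roughly like this. ggplot2 is used only as an example package; the install step is commented out because it needs an internet connection.

```r
# One-time step per machine, shown commented out:
# install.packages("ggplot2")

# Check whether a package is installed without loading it
has_ggplot2 <- requireNamespace("ggplot2", quietly = TRUE)

# Load it for the current session so its functions can be called directly
if (has_ggplot2) {
  library(ggplot2)
}

# Packages that ship with R only need to be loaded, never installed
library(stats4)

# List everything that is currently attached
search()
```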

R versus R packages

1.5 R packages

  • CRAN (Comprehensive R Archive Network): The central repository for R packages, maintained by a group of people who check that packages fulfill certain standards

  • Mirror: A location on the web from which to download R packages. Because many thousands of people download packages daily, the load is distributed across different machines. Pick one that is geographically close to you.

  • R base/recommended packages: The base installation of R ships with a bunch of default packages. In addition, there are some more packages listed as “recommended”.

“Base” packages are maintained by the R Core Team and are only updated with each R release.

Packages listed as “recommended” are widely used and have a long history in the R community.

##     Package Priority
## 1      base     base
## 2  compiler     base
## 3  datasets     base
## 4  graphics     base
## 5 grDevices     base
## 6      grid     base
## 7   methods     base
## 8  parallel     base
##       Package    Priority
## 1        boot recommended
## 2       class recommended
## 3     cluster recommended
## 4   codetools recommended
## 5     foreign recommended
## 6  KernSmooth recommended
## 7     lattice recommended
## 8        MASS recommended
## 9      Matrix recommended
## 10       mgcv recommended
##  [ reached 'max' / getOption("max.print") -- omitted 2 rows ]
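
One way to obtain such a listing on your own machine is the priority argument of installed.packages(). This is a sketch and not necessarily the code that produced the output above; the exact set of packages depends on your R installation.

```r
# Names of installed packages, selected by priority
base_pkgs <- rownames(installed.packages(priority = "base"))
rec_pkgs  <- rownames(installed.packages(priority = "recommended"))

base_pkgs
rec_pkgs
```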

1.6 .Rprofile

  • File in your home directory ~/.Rprofile

  • Is executed at the start of every R session

  • Useful for setting global options and for loading frequently used packages
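
A minimal sketch of what such a file might contain. All settings below are illustrative assumptions; the mirror URL is the generic CRAN cloud mirror, and you may prefer a mirror closer to you.

```r
# Hypothetical contents of ~/.Rprofile

# Use a fixed CRAN mirror so install.packages() does not ask every time
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Print fewer significant digits by default
options(digits = 4)

# Only print a greeting in interactive sessions, not when running scripts
if (interactive()) {
  message("Loaded ~/.Rprofile")
}
```

Keep in mind that anything set here silently affects every session, so scripts that rely on it may behave differently on other machines.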

1.7 .Renviron

  • File in your home directory ~/.Renviron

  • Used to set environment variables

  • Used to store access tokens (GitHub, CI provider) and similar settings (e.g. C++ flags)
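
The file itself consists of plain KEY=value lines. A sketch of how a value stored there can be read from R; the variable name MY_API_TOKEN and its value are hypothetical.

```r
# Hypothetical line in ~/.Renviron:
#   MY_API_TOKEN=abc123

# After restarting R, read the value instead of hard-coding it in a script.
# unset = NA makes a missing variable easy to detect.
token <- Sys.getenv("MY_API_TOKEN", unset = NA)

if (is.na(token)) {
  message("MY_API_TOKEN is not set; add it to ~/.Renviron and restart R")
}
```

This keeps secrets out of scripts that are shared or put under version control.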

1.8 RStudio

\(\rightarrow\) Exists to boost your productivity

\(\rightarrow\) Change the defaults to your liking so you actually can be productive

\(\rightarrow\) Keybindings = productivity

Since RStudio v1.3, settings are stored in a portable JSON file.

If you want to have sane settings without much hassle, you can execute the following R code: source("https://bit.ly/rstudio-pat")

This code will change/overwrite your existing RStudio settings and

  • set custom keybindings

  • move the console panel to the top-right (by default bottom-left)

  • enable/disable some core settings for a better overall experience


R scripts (source code) are written in the Source pane (Editor).

Source pane

(Source of all following RStudio screenshots: https://github.com/edrubin/EC525S19)


You can use the menubar or ⇧+⌘+N / ⇧+CTRL+N to create new R scripts.

New script


To execute commands from your R script, use ⌘+Enter / CTRL+Enter.

Execute commands

RStudio will execute the command in the console.

Console output

You can see the new object in the Environment pane.

Environment pane


The History tab records your old commands.

History pane


The Files pane is the file explorer.

Files pane


The Plots pane/tab shows… plots.

Plots pane


The Packages tab shows installed packages.

Packages pane


The Packages tab shows installed packages and whether they are loaded.

Loaded and installed packages


The Help tab shows help documentation (also accessible via ?).

Help pane
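
The same documentation can be requested from the console. A short sketch using base R help functions; lm is just an example topic.

```r
# Look up the help page for lm(); typing ?lm does the same
h <- help("lm")

# Printing the object opens the page (in RStudio: in the Help tab)
h

# Search all installed help pages for a keyword; same as ??regression
# help.search("regression")
```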


Finally, you can customize the actual layout.

Customize layout

1.9 RStudio addins

RStudio can be further enhanced by so-called “addins”. These are clickable snippets that execute certain actions in RStudio.

They aim to make repetitive tasks easier and to save you time. There is an addin called addinslist which lists all available addins. It can be installed as a normal package from CRAN:

install.packages("addinslist")

To have an addin available in RStudio after installation, RStudio needs to be restarted.

1.10 RStudio projects

Without a project, you need to write long absolute file paths that only exist on your machine.

sample_df <- read.csv("/Users/<yourname>/somewhere/on/this/machine/sample.csv")

With a project, R automatically references the project’s folder as the current working directory.

From there on, you can use relative paths to point to files.

sample_df <- read.csv("sample.csv")

Double-plus bonus: The here package takes the RStudio project philosophy further and also helps when you are not using RStudio (e.g. on the command line).
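
A brief sketch of how here is typically used, assuming the package is installed and R is started somewhere inside the project. The folder name "data" is hypothetical.

```r
# install.packages("here")  # one-time step, shown commented out

# here() builds a path starting from the project root
path <- here::here("sample.csv")
path

# Subfolders are passed as separate arguments
here::here("data", "sample.csv")

# Typical use (assumes the file exists in the project root):
# sample_df <- read.csv(here::here("sample.csv"))
```

Because the path is built from the project root rather than the current working directory, the same line works from scripts in subfolders and from the command line.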

Figure 1.1: © Allison Horst

1.11 Alternatives to RStudio

  • Using R directly in the terminal via radian (optimized R console interpreter)

  • R is supported in other general-purpose editors and IDEs (VS Code, Sublime Text, Atom, Vim, etc.)