class: center, middle, inverse, title-slide

# Reproducible workflows with the drake R package
## make for R
### Kirill Müller
### 2019-04-14

---

# Why *drake*?

1. too many slow R scripts?
1. *rmarkdown* code chunks too heavy?

.right[
# Save time and ensure reproducibility!

## data analysis → set of data transformations
]

---
---

```r
library(drake)
library(tidyverse)

create_plot <- function(data) {
  ggplot(data, aes(x = Petal.Width, fill = Species)) +
    geom_histogram()
}

plan <- drake_plan(
  raw_data = readxl::read_xlsx(file_in("raw-data.xlsx")),
  data = raw_data %>%
    mutate(Species = forcats::fct_inorder(Species)),
  hist = create_plot(data),
  fit = lm(Sepal.Width ~ Petal.Width + Species, data),
  report = rmarkdown::render(
    knitr_in("report.Rmd"),
    output_file = file_out("report.pdf"),
    quiet = TRUE
  )
)
```

---

```r
plan
## # A tibble: 5 x 2
##   target   command
##   <chr>    <expr>
## 1 raw_data readxl::read_xlsx(file_in("raw-data.xlsx"))          …
## 2 data     raw_data %>% mutate(Species = forcats::fct_inorder(Sp…
## 3 hist     create_plot(data)                                    …
## 4 fit      lm(Sepal.Width ~ Petal.Width + Species, data)        …
## 5 report   rmarkdown::render(knitr_in("report.Rmd"), output_file…
```

---

.pull-left[
```r
make(plan)
## target raw_data
## target data
## target fit
## target hist
## target report
```
]

.pull-right[
<img src="index_files/figure-html/make-report-1.png" width="1727" />
]

---

```r
summary(readd(fit))
##
## Call:
## lm(formula = Sepal.Width ~ Petal.Width + Species, data = data)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1.17017 -0.19105  0.00793  0.19173  0.85172
##
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)
## (Intercept)        3.23587    0.05194  62.295  < 2e-16 ***
## Petal.Width        0.78102    0.12121   6.443 1.59e-09 ***
## Speciesversicolor -1.50150    0.14407 -10.422  < 2e-16 ***
## Speciesvirginica  -1.84421    0.22399  -8.234 9.35e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3008 on 146 degrees of freedom
## Multiple R-squared:  0.5335, Adjusted R-squared:  0.5239
## F-statistic: 55.65 on 3 and 146 DF,  p-value: < 2.2e-16
```

---

```r
loadd(data, hist)
data
## # A tibble: 150 x 6
##    index Sepal.Length Sepal.Width Petal.Length Petal.Width
##    <dbl>        <dbl>       <dbl>        <dbl>       <dbl>
##  1     1          5.1         3.5          1.4         0.2
##  2     2          4.9         3            1.4         0.2
##  3     3          4.7         3.2          1.3         0.2
##  4     4          4.6         3.1          1.5         0.2
##  5     5          5           3.6          1.4         0.2
##  6     6          5.4         3.9          1.7         0.4
##  7     7          4.6         3.4          1.4         0.3
##  8     8          5           3.4          1.5         0.2
##  9     9          4.4         2.9          1.4         0.2
## 10    10          4.9         3.1          1.5         0.1
## # … with 140 more rows, and 1 more variable: Species <fct>
```

---

```r
create_plot <- function(data) {
  ggplot(data, aes(x = Petal.Width, fill = Species)) +
    geom_histogram()
}

plan <- drake_plan(
  raw_data = readxl::read_xlsx(file_in("raw-data.xlsx")),
  data = raw_data %>%
    mutate(Species = forcats::fct_inorder(Species)) %>%
*   select(Sepal.Length:Species),
  hist = create_plot(data),
  fit = lm(Sepal.Width ~ Petal.Width + Species, data),
  report = rmarkdown::render(
    knitr_in("report.Rmd"),
    output_file = file_out("report.pdf"),
    quiet = TRUE
  )
)
```

---
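
A minimal sketch of previewing which targets the next `make()` would rebuild after a command changes. The built-in `iris` data and a temporary cache are assumptions that stand in for `raw-data.xlsx` and the default cache, so the block runs without any files. It also assumes a *drake* release in which `outdated()` accepts the plan directly; older releases may expect a `drake_config()` object instead.

```r
library(drake)

# Assumption: a throwaway cache, so nothing is written to the project folder.
cache <- new_cache(tempfile())

# Assumption: built-in iris replaces the slides' raw-data.xlsx.
plan <- drake_plan(
  data = iris,
  fit = lm(Sepal.Width ~ Petal.Width + Species, data)
)
make(plan, cache = cache)

# Change the command of `data`, similar in spirit to adding select() above.
plan <- drake_plan(
  data = iris[, c("Sepal.Width", "Petal.Width", "Species")],
  fit = lm(Sepal.Width ~ Petal.Width + Species, data)
)

# List the targets that the next make() would rebuild.
stale <- outdated(plan, cache = cache)
print(stale)
```

The changed target and everything downstream of it should be reported; targets that do not depend on it stay up to date.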
---

```r
make(plan)
## target data
## target fit
## target hist
## target report
readd(data)
## # A tibble: 150 x 5
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##           <dbl>       <dbl>        <dbl>       <dbl> <fct>
##  1          5.1         3.5          1.4         0.2 setosa
##  2          4.9         3            1.4         0.2 setosa
##  3          4.7         3.2          1.3         0.2 setosa
##  4          4.6         3.1          1.5         0.2 setosa
##  5          5           3.6          1.4         0.2 setosa
##  6          5.4         3.9          1.7         0.4 setosa
##  7          4.6         3.4          1.4         0.3 setosa
##  8          5           3.4          1.5         0.2 setosa
##  9          4.4         2.9          1.4         0.2 setosa
## 10          4.9         3.1          1.5         0.1 setosa
## # … with 140 more rows
```

---

```r
readd(hist)
## `stat_bin()` using `bins = 30`. Pick better value with
## `binwidth`.
```

![](index_files/figure-html/show-hist-1.png)<!-- -->

---

```r
create_plot <- function(data) {
  ggplot(data, aes(x = Petal.Width, fill = Species)) +
*   geom_histogram(binwidth = 0.25) +
*   theme_gray(20)
}

plan <- drake_plan(
  raw_data = readxl::read_xlsx(file_in("raw-data.xlsx")),
  data = raw_data %>%
    mutate(Species = forcats::fct_inorder(Species)) %>%
    select(Sepal.Length:Species),
  hist = create_plot(data),
  fit = lm(Sepal.Width ~ Petal.Width + Species, data),
  report = rmarkdown::render(
    knitr_in("report.Rmd"),
    output_file = file_out("report.pdf"),
    quiet = TRUE
  )
)
```

---

.pull-left[
```r
make(plan)
## target hist
## target report
```
]

.pull-right[
<img src="index_files/figure-html/make-report-fixed-hist-1.png" width="1727" />
]

---
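
A minimal sketch of why editing only a function body invalidates just the targets that call it. `describe_width()` is a hypothetical helper standing in for `create_plot()`, and the built-in `iris` data and temporary cache are assumptions so the block runs without `raw-data.xlsx`. As before, it assumes a *drake* release in which `outdated()` accepts the plan directly.

```r
library(drake)

# Assumption: a throwaway cache, so nothing is written to the project folder.
cache <- new_cache(tempfile())

# Hypothetical helper standing in for create_plot().
describe_width <- function(data) {
  mean(data$Petal.Width)
}

plan <- drake_plan(
  data = iris,
  width = describe_width(data),
  fit = lm(Sepal.Width ~ Petal.Width + Species, data)
)
make(plan, cache = cache)

# Edit only the function body; the plan itself is untouched.
describe_width <- function(data) {
  median(data$Petal.Width)
}

# List the targets that depend on the edited function.
stale <- outdated(plan, cache = cache)
print(stale)
```

Only the target that calls the edited function should be listed; `data` and `fit` do not depend on it and remain up to date.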
---
class: inverse, middle, center