class: center, middle, inverse, title-slide # DBI ## Recent developments in R’s database interface ### Kirill Müller,
@krlmlr
### 2018-09-03, R meetup Zurich,
https://bit.ly/2wFJix3
--- background-image: url(img/cynkra.png) background-size: 80% class: bottom, center ### (Commercial) --- # adminR: R in Swiss Official Statistics .pull-left[ **R users in Swiss public adminis-tration**, or institutions with ties to it - Get to know each other - Share our experience - Learn about different use cases of R in public statistics **Thursday, September 20, 2018**<br> 5:15 PM to 7:15 PM State Secretariat for Economic Affairs SECO · Holzikofenweg 36 · **Bern** ] .pull-right[ ![](img/map.png) ] .center[ ### (Commercial) ] --- class: inverse, center # Who has ... <iframe src="https://giphy.com/embed/l41lJGJEyjYTKMxag" width="351" height="480" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/afvbabies-babies-afv-aha-l41lJGJEyjYTKMxag">via GIPHY</a></p> ??? Hi! Welcome everybode to my presentation. Today I'll talk about R and databases. Before we start, I'd like to conduct a short survey. --- class: inverse, center, middle # ... worked with R? --- class: inverse, center, middle # ... used a data frame (or _tibble_) in R? --- class: inverse, center, middle # ... worked with a database? ??? A separate server whose only purpose is to store and retrieve data. --- class: inverse, center, middle # ... written a SQL query? ??? ... or "an Ess-Cue-Ell query"? --- class: inverse, center, middle # ... connected to a database from R? ??? to retrieve some data --- class: inverse, center, middle # ... connected to a database from R # using _DBI_ ? ??? Did your code use `dbConnect()` ? --- class: inverse, center, middle # ... defined a database schema? ??? If you don't know what a schema is, you probably haven't. But I'll talk about it later. --- class: inverse, center, middle # ... implemented a _DBI_ backend? ??? If you don't know what a DBI backend is, you probably haven't. --- class: center, middle <center> <table> <tr> <td> ![:scale 200px](img/db-schema.png) </td> <td> <div style="font-size:100px"> ↔ </div> </td> <td> ![:scale 200px](img/Rlogo.svg) </td> </tr> </table> </center> ??? Thanks. Today's presentation will be about connecting *relational* databases to R using the _DBI_ package. I'd like to make this presentation useful for you, no matter if you're new to databases or if you use them daily. I'm going to start with some basic concepts, but I'm also including more advanced examples. - storage for a set of rectangular tables - rows and columns, atomic columns - similar to data frames - "relational": some tables have column with the same meaning (or name), rows from different tables with the same value in these columns are related - databases store their data on disk, optimized for retrieval without reading everything into memory - Postgres, MySQL/MariaDB, SQLite (open-source) - Microsoft SQL Server, Oracle (commercial) - No non-relational: Mongo DB or ElasticSearch, see _nodbi_ by Scott - No other solutions such as the _RODBC_ package --- # Connect to a Postgres database ```r library(RPostgres) con <- dbConnect(Postgres()) dbGetQuery(con, "SELECT 'Hello, world!' AS hi") ``` ``` ## hi ## 1 Hello, world! ``` ```r dbDisconnect(con) ``` ??? This talk will use a lot of code examples, please bear with me. - Explain code line by line - Query string: SQL --- # Connect to a MariaDB database ```r library(RMariaDB) con <- dbConnect(MariaDB()) dbGetQuery(con, "SELECT 'Hello, world!' AS hi") ``` ``` ## hi ## 1 Hello, world! ``` ```r dbDisconnect(con) ``` ??? Mention SQL dialects --- # Connect to a SQLite database ```r library(RSQLite) con <- dbConnect(SQLite()) dbGetQuery(con, "SELECT 'Hello, world!' AS hi") ``` ``` ## hi ## 1 Hello, world! ``` ```r dbDisconnect(con) ``` --- # _DataBase Interface_ ```r library(DBI) *con <- dbConnect(RSQLite::SQLite()) # con <- dbConnect(RMariaDB::MariaDB()) # con <- dbConnect(RPostgres::Postgres()) dbGetQuery(con, "SELECT 'Hello, world!' AS hi") ``` ``` ## hi ## 1 Hello, world! ``` ```r dbDisconnect(con) ``` ??? Not only syntactic similarity Attach DBI instead of individual database packages, only specify database in `dbConnect()` Use only functions from DBI to work with the database, only refer to database backends when connecting. - S4 classes and methods - method = function defined in DBI --- background-image: url(img/deps.svg) background-size: 100% ??? What is a DBI backend? More packages involved, omitted here ODBC is covered as part of this framework Extra packages - DBI: Function definition - Can use database-specific backends --- background-image: url(img/dbi-old.png) background-size: 100% ??? DBI has been around for ages, almost as old as R itself. --- background-image: url(img/network-cables-cable-patch-patch-cable-46237.jpeg) # Interfaces: syntax vs. semantics ??? DBI 0.x had the "syntax" bit right Subtle differences in semantics between packages - Data types - 64-bit integers - Character encoding - Time zones - Behavior in corner cases (zero rows) - Optional syntax --- class: center <iframe src="https://cran.r-project.org/web/packages/DBI/vignettes/spec.html" width="800" height="600" frameBorder="0" allowFullScreen></iframe> --- class: middle background-image: url(img/checklist.svg) background-size: 36% background-position: 50% 57% .center[ # DBI specification ] .pull-left[ ## Implementers ### Test suite ### Clear expectations ### Boilerplate ] .pull-right[ .right[ ## Users ### Interchangeable backends ### Reliable behavior ### Precise documentation ] ] .center[ ] ??? - Simplify user's life - Only learn one interface - Simplify implementer's life - We can't force DBI backends, but we provide a test suite - Project funded by R Consortium - First two parts done --- class: inverse ??? - Walk through DBI's features - Pointing to new features - Showcase interesting developments --- background-image: url(img/deps.svg) background-size: 65% background-position: 100% 100% # R's DataBase Interface ### ![](img/if_play_alt_118620.png) connect/disconnect ### ![](img/if_search_322497.png) discover ### ![](img/if_arrows-24_808453.png) read/write ### ![](img/if_interface-40_809302.png) query/execute ### ![](img/if_75_111116.png) quote/parametrize ### ![](img/if_git-commit_298786.png) transact ??? --- class: inverse, center, middle # Connecting <iframe src="https://giphy.com/embed/Ap7t9YOLvTv6U" width="720" height="386" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/back-to-the-future-Ap7t9YOLvTv6U">via GIPHY</a></p> ??? First step when working with a database Can be frustrating when things don't work, but needs to be done only once Only thing that is backend-specific: connection parameters --- ```r library(DBI) con_sqlite <- dbConnect( RSQLite::SQLite(), * path = "..." ) con_mariadb <- dbConnect( RMariaDB::MariaDB(), * default.file = "..." ) con_postgres <- dbConnect( RPostgres::Postgres(), * host = "...", * port = "..." ) dbDisconnect(con_postgres) dbDisconnect(con_mariadb) dbDisconnect(con_sqlite) ``` --- background-image: url(img/render-doc.gif) background-size: 100% --- class: inverse, center, middle # Discovering <iframe src="https://giphy.com/embed/26uf6NTG5BpEkXouQ" width="720" height="495" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/hulu-firefly-fox-television-classics-26uf6NTG5BpEkXouQ">via GIPHY</a></p> --- ```r library(DBI) con <- dbConnect(RPostgres::Postgres()) dbListTables(con) ``` ``` ## [1] "geography_columns" "geometry_columns" "spatial_ref_sys" ## [4] "raster_columns" "raster_overviews" "topology" ## [7] "layer" "geom_test" "temp" ``` --- ```r dbListObjects(con) ``` ``` ## table is_prefix ## 1 <Id> table = geography_columns FALSE ## 2 <Id> table = geometry_columns FALSE ## 3 <Id> table = spatial_ref_sys FALSE ## 4 <Id> table = raster_columns FALSE ## 5 <Id> table = raster_overviews FALSE ## 6 <Id> table = topology FALSE ## 7 <Id> table = layer FALSE ## 8 <Id> table = geom_test FALSE ## 9 <Id> table = temp FALSE ## 10 <Id> schema = topology TRUE ## 11 <Id> schema = my_schema TRUE ## 12 <Id> schema = information_schema TRUE ## 13 <Id> schema = pg_catalog TRUE ## 14 <Id> schema = nycflights13 TRUE ## 15 <Id> schema = public TRUE ``` ??? Schema: like a directory in the file system --- ```r flights_obj <- dbListObjects(con, Id(schema = "nycflights13")) flights_obj ``` ``` ## table is_prefix ## 1 <Id> schema = nycflights13, table = airlines FALSE ## 2 <Id> schema = nycflights13, table = airports FALSE ## 3 <Id> schema = nycflights13, table = flights FALSE ## 4 <Id> schema = nycflights13, table = planes FALSE ## 5 <Id> schema = nycflights13, table = weather FALSE ``` ```r flights_obj$table[[3]] ``` ``` ## <Id> schema = nycflights13, table = flights ``` --- background-image: url(img/dbi-issue-table.png) background-size: 100% --- class: center, middle # Reading <iframe src="https://giphy.com/embed/bAPIlJ3w57SjS" width="480" height="480" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/gif-art-tumblr-featured-bAPIlJ3w57SjS">via GIPHY</a></p> --- ```r spatial_ref_sys <- dbReadTable(con, "spatial_ref_sys") class(spatial_ref_sys) ``` ``` ## [1] "data.frame" ``` ```r tibble::as_tibble(spatial_ref_sys) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #555555;'># A tibble: 5,435 x 5</span><span> ## </span><span style='font-weight: bold;'>srid</span><span> </span><span style='font-weight: bold;'>auth_name</span><span> </span><span style='font-weight: bold;'>auth_srid</span><span> </span><span style='font-weight: bold;'>srtext</span><span> </span><span style='font-weight: bold;'>proj4text</span><span> ## </span><span style='color: #555555;font-style: italic;'><int></span><span> </span><span style='color: #555555;font-style: italic;'><chr></span><span> </span><span style='color: #555555;font-style: italic;'><int></span><span> </span><span style='color: #555555;font-style: italic;'><chr></span><span> </span><span style='color: #555555;font-style: italic;'><chr></span><span> ## </span><span style='color: #555555;'> 1</span><span> </span><span style='text-decoration: underline;'>3</span><span>819 EPSG </span><span style='text-decoration: underline;'>3</span><span>819 </span><span style='color: #555555;'>"</span><span>GEOGCS[\"HD1909… </span><span style='color: #555555;'>"</span><span>+proj=longlat +e… ## </span><span style='color: #555555;'> 2</span><span> </span><span style='text-decoration: underline;'>3</span><span>821 EPSG </span><span style='text-decoration: underline;'>3</span><span>821 </span><span style='color: #555555;'>"</span><span>GEOGCS[\"TWD67\… </span><span style='color: #555555;'>"</span><span>+proj=longlat +e… ## </span><span style='color: #555555;'> 3</span><span> </span><span style='text-decoration: underline;'>3</span><span>824 EPSG </span><span style='text-decoration: underline;'>3</span><span>824 </span><span style='color: #555555;'>"</span><span>GEOGCS[\"TWD97\… </span><span style='color: #555555;'>"</span><span>+proj=longlat +e… ## </span><span style='color: #555555;'> 4</span><span> </span><span style='text-decoration: underline;'>3</span><span>889 EPSG </span><span style='text-decoration: underline;'>3</span><span>889 </span><span style='color: #555555;'>"</span><span>GEOGCS[\"IGRS\"… </span><span style='color: #555555;'>"</span><span>+proj=longlat +e… ## </span><span style='color: #555555;'> 5</span><span> </span><span style='text-decoration: underline;'>3</span><span>906 EPSG </span><span style='text-decoration: underline;'>3</span><span>906 </span><span style='color: #555555;'>"</span><span>GEOGCS[\"MGI 19… </span><span style='color: #555555;'>"</span><span>+proj=longlat +e… ## </span><span style='color: #555555;'> 6</span><span> </span><span style='text-decoration: underline;'>4</span><span>001 EPSG </span><span style='text-decoration: underline;'>4</span><span>001 </span><span style='color: #555555;'>"</span><span>GEOGCS[\"Unknow… </span><span style='color: #555555;'>"</span><span>+proj=longlat +e… ## </span><span style='color: #555555;'> 7</span><span> </span><span style='text-decoration: underline;'>4</span><span>002 EPSG </span><span style='text-decoration: underline;'>4</span><span>002 </span><span style='color: #555555;'>"</span><span>GEOGCS[\"Unknow… </span><span style='color: #555555;'>"</span><span>+proj=longlat +e… ## </span><span style='color: #555555;'> 8</span><span> </span><span style='text-decoration: underline;'>4</span><span>003 EPSG </span><span style='text-decoration: underline;'>4</span><span>003 </span><span style='color: #555555;'>"</span><span>GEOGCS[\"Unknow… </span><span style='color: #555555;'>"</span><span>+proj=longlat +e… ## </span><span style='color: #555555;'> 9</span><span> </span><span style='text-decoration: underline;'>4</span><span>004 EPSG </span><span style='text-decoration: underline;'>4</span><span>004 </span><span style='color: #555555;'>"</span><span>GEOGCS[\"Unknow… </span><span style='color: #555555;'>"</span><span>+proj=longlat +e… ## </span><span style='color: #555555;'>10</span><span> </span><span style='text-decoration: underline;'>4</span><span>005 EPSG </span><span style='text-decoration: underline;'>4</span><span>005 </span><span style='color: #555555;'>"</span><span>GEOGCS[\"Unknow… </span><span style='color: #555555;'>"</span><span>+proj=longlat +a… ## </span><span style='color: #555555;'># ... with 5,425 more rows</span><span> </span></CODE></PRE> ??? This presentation is a "pure" presentation --- ```r flights <- dbReadTable( con, * Id(schema = "nycflights13", table = "flights") ) tibble::as_tibble(flights) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #555555;'># A tibble: 336,776 x 19</span><span> ## </span><span style='font-weight: bold;'>year</span><span> </span><span style='font-weight: bold;'>month</span><span> </span><span style='font-weight: bold;'>day</span><span> </span><span style='font-weight: bold;'>dep_time</span><span> </span><span style='font-weight: bold;'>sched_dep_time</span><span> </span><span style='font-weight: bold;'>dep_delay</span><span> </span><span style='font-weight: bold;'>arr_time</span><span> ## </span><span style='color: #555555;font-style: italic;'><int></span><span> </span><span style='color: #555555;font-style: italic;'><int></span><span> </span><span style='color: #555555;font-style: italic;'><int></span><span> </span><span style='color: #555555;font-style: italic;'><int></span><span> </span><span style='color: #555555;font-style: italic;'><int></span><span> </span><span style='color: #555555;font-style: italic;'><dbl></span><span> </span><span style='color: #555555;font-style: italic;'><int></span><span> ## </span><span style='color: #555555;'> 1</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 517 515 2 830 ## </span><span style='color: #555555;'> 2</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 533 529 4 850 ## </span><span style='color: #555555;'> 3</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 542 540 2 923 ## </span><span style='color: #555555;'> 4</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 544 545 -</span><span style='color: #BB0000;'>1</span><span> </span><span style='text-decoration: underline;'>1</span><span>004 ## </span><span style='color: #555555;'> 5</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 554 600 -</span><span style='color: #BB0000;'>6</span><span> 812 ## </span><span style='color: #555555;'> 6</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 554 558 -</span><span style='color: #BB0000;'>4</span><span> 740 ## </span><span style='color: #555555;'> 7</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 555 600 -</span><span style='color: #BB0000;'>5</span><span> 913 ## </span><span style='color: #555555;'> 8</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 557 600 -</span><span style='color: #BB0000;'>3</span><span> 709 ## </span><span style='color: #555555;'> 9</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 557 600 -</span><span style='color: #BB0000;'>3</span><span> 838 ## </span><span style='color: #555555;'>10</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 558 600 -</span><span style='color: #BB0000;'>2</span><span> 753 ## </span><span style='color: #555555;'># ... with 336,766 more rows, and 12 more variables: ## # </span><span style='color: #555555;font-weight: bold;'>sched_arr_time</span><span style='color: #555555;'> </span><span style='color: #555555;font-style: italic;'><int></span><span style='color: #555555;'>, </span><span style='color: #555555;font-weight: bold;'>arr_delay</span><span style='color: #555555;'> </span><span style='color: #555555;font-style: italic;'><dbl></span><span style='color: #555555;'>, </span><span style='color: #555555;font-weight: bold;'>carrier</span><span style='color: #555555;'> </span><span style='color: #555555;font-style: italic;'><chr></span><span style='color: #555555;'>, ## # </span><span style='color: #555555;font-weight: bold;'>flight</span><span style='color: #555555;'> </span><span style='color: #555555;font-style: italic;'><int></span><span style='color: #555555;'>, </span><span style='color: #555555;font-weight: bold;'>tailnum</span><span style='color: #555555;'> </span><span style='color: #555555;font-style: italic;'><chr></span><span style='color: #555555;'>, </span><span style='color: #555555;font-weight: bold;'>origin</span><span style='color: #555555;'> </span><span style='color: #555555;font-style: italic;'><chr></span><span style='color: #555555;'>, </span><span style='color: #555555;font-weight: bold;'>dest</span><span style='color: #555555;'> </span><span style='color: #555555;font-style: italic;'><chr></span><span style='color: #555555;'>, ## # </span><span style='color: #555555;font-weight: bold;'>air_time</span><span style='color: #555555;'> </span><span style='color: #555555;font-style: italic;'><dbl></span><span style='color: #555555;'>, </span><span style='color: #555555;font-weight: bold;'>distance</span><span style='color: #555555;'> </span><span style='color: #555555;font-style: italic;'><dbl></span><span style='color: #555555;'>, </span><span style='color: #555555;font-weight: bold;'>hour</span><span style='color: #555555;'> </span><span style='color: #555555;font-style: italic;'><dbl></span><span style='color: #555555;'>, </span><span style='color: #555555;font-weight: bold;'>minute</span><span style='color: #555555;'> </span><span style='color: #555555;font-style: italic;'><dbl></span><span style='color: #555555;'>, ## # </span><span style='color: #555555;font-weight: bold;'>time_hour</span><span style='color: #555555;'> </span><span style='color: #555555;font-style: italic;'><dttm></span><span> </span></CODE></PRE> --- class: inverse, center, middle # Querying <iframe src="https://giphy.com/embed/uHEqSttWHv476" width="480" height="438" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/magic-power-charms-uHEqSttWHv476">via GIPHY</a></p> --- ```r query <- 'SELECT * FROM "nycflights13"."flights" WHERE month = 1' flights <- dbGetQuery(con, query) dim(flights) ``` ``` ## [1] 27004 19 ``` --- class: inverse, center, middle # Quoting <iframe src="https://giphy.com/embed/qs6ev2pm8g9dS" width="720" height="381" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/austin-powers-dr-evil-air-quotes-qs6ev2pm8g9dS">via GIPHY</a></p> ??? Quoting identifiers and literals Dynamic SQL Evil code! --- ```r *month <- 1 query <- paste0( 'SELECT * FROM "nycflights13"."flights" WHERE month = ', month ) cat(query) ``` ``` ## SELECT * FROM "nycflights13"."flights" WHERE month = 1 ``` --- ```r month <- 1 query <- paste0( 'SELECT * FROM "nycflights13"."flights" WHERE month = ', * dbQuoteLiteral(con, month) ) cat(query) ``` ``` ## SELECT * FROM "nycflights13"."flights" WHERE month = 1::float8 ``` --- ```r *month_var <- "month" month <- 1 query <- paste0( 'SELECT * FROM "nycflights13"."flights" WHERE ', month_var, ' = ', dbQuoteLiteral(con, month) ) cat(query) ``` ``` ## SELECT * FROM "nycflights13"."flights" WHERE month = 1::float8 ``` --- ```r month_var <- "month" month <- 1 query <- paste0( 'SELECT * FROM "nycflights13"."flights" WHERE ', * dbQuoteIdentifier(con, month_var), ' = ', dbQuoteLiteral(con, month) ) cat(query) ``` ``` ## SELECT * FROM "nycflights13"."flights" WHERE "month" = 1::float8 ``` --- ```r *table_name <- Id(schema = "nycflights13", table = "flights") month_var <- "month" month <- 1 query <- paste0( 'SELECT * FROM ', * dbQuoteIdentifier(con, table_name), ' WHERE ', dbQuoteIdentifier(con, month_var), ' = ', dbQuoteLiteral(con, month) ) cat(query) ``` ``` ## SELECT * FROM "nycflights13"."flights" WHERE "month" = 1::float8 ``` ```r flights <- dbGetQuery(con, query) dim(flights) ``` ``` ## [1] 27004 19 ``` ??? Much more interesting when quoting strings, datetime, ... --- ```r table_name <- Id(schema = "nycflights13", table = "flights") month_var <- "month" month <- 1 query <- paste0( 'SELECT * FROM ', dbQuoteIdentifier(con, table_name), ' WHERE ', dbQuoteIdentifier(con, month_var), * ' = $1' ) cat(query) ``` ``` ## SELECT * FROM "nycflights13"."flights" WHERE "month" = $1 ``` ```r flights <- dbGetQuery( con, query, * params = list(1) ) dim(flights) ``` ``` ## [1] 27004 19 ``` --- class: middle, center <iframe src="https://giphy.com/embed/xUPGcvE4VNePqiexOM" width="720" height="576" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/nope-stop-motion-claymation-xUPGcvE4VNePqiexOM">via GIPHY</a></p> --- ```r library(dplyr) *library(dbplyr) src <- src_dbi(con) flights <- tbl(con, ident_q('"nycflights13"."flights"')) flights ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #555555;'># Source: table<"nycflights13"."flights"> [?? x 19]</span><span> ## </span><span style='color: #555555;'># Database: postgres [kirill@/var/run/postgresql:5432/kirill]</span><span> ## </span><span style='font-weight: bold;'>year</span><span> </span><span style='font-weight: bold;'>month</span><span> </span><span style='font-weight: bold;'>day</span><span> </span><span style='font-weight: bold;'>dep_time</span><span> </span><span style='font-weight: bold;'>sched_dep_time</span><span> </span><span style='font-weight: bold;'>dep_delay</span><span> </span><span style='font-weight: bold;'>arr_time</span><span> ## </span><span style='color: #555555;font-style: italic;'><int></span><span> </span><span style='color: #555555;font-style: italic;'><int></span><span> </span><span style='color: #555555;font-style: italic;'><int></span><span> </span><span style='color: #555555;font-style: italic;'><int></span><span> </span><span style='color: #555555;font-style: italic;'><int></span><span> </span><span style='color: #555555;font-style: italic;'><dbl></span><span> </span><span style='color: #555555;font-style: italic;'><int></span><span> ## </span><span style='color: #555555;'> 1</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 517 515 2 830 ## </span><span style='color: #555555;'> 2</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 533 529 4 850 ## </span><span style='color: #555555;'> 3</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 542 540 2 923 ## </span><span style='color: #555555;'> 4</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 544 545 -</span><span style='color: #BB0000;'>1</span><span> </span><span style='text-decoration: underline;'>1</span><span>004 ## </span><span style='color: #555555;'> 5</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 554 600 -</span><span style='color: #BB0000;'>6</span><span> 812 ## </span><span style='color: #555555;'> 6</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 554 558 -</span><span style='color: #BB0000;'>4</span><span> 740 ## </span><span style='color: #555555;'> 7</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 555 600 -</span><span style='color: #BB0000;'>5</span><span> 913 ## </span><span style='color: #555555;'> 8</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 557 600 -</span><span style='color: #BB0000;'>3</span><span> 709 ## </span><span style='color: #555555;'> 9</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 557 600 -</span><span style='color: #BB0000;'>3</span><span> 838 ## </span><span style='color: #555555;'>10</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 558 600 -</span><span style='color: #BB0000;'>2</span><span> 753 ## </span><span style='color: #555555;'># ... with more rows, and 12 more variables: ## # </span><span style='color: #555555;font-weight: bold;'>sched_arr_time</span><span style='color: #555555;'> </span><span style='color: #555555;font-style: italic;'><int></span><span style='color: #555555;'>, </span><span style='color: #555555;font-weight: bold;'>arr_delay</span><span style='color: #555555;'> </span><span style='color: #555555;font-style: italic;'><dbl></span><span style='color: #555555;'>, </span><span style='color: #555555;font-weight: bold;'>carrier</span><span style='color: #555555;'> </span><span style='color: #555555;font-style: italic;'><chr></span><span style='color: #555555;'>, ## # </span><span style='color: #555555;font-weight: bold;'>flight</span><span style='color: #555555;'> </span><span style='color: #555555;font-style: italic;'><int></span><span style='color: #555555;'>, </span><span style='color: #555555;font-weight: bold;'>tailnum</span><span style='color: #555555;'> </span><span style='color: #555555;font-style: italic;'><chr></span><span style='color: #555555;'>, </span><span style='color: #555555;font-weight: bold;'>origin</span><span style='color: #555555;'> </span><span style='color: #555555;font-style: italic;'><chr></span><span style='color: #555555;'>, </span><span style='color: #555555;font-weight: bold;'>dest</span><span style='color: #555555;'> </span><span style='color: #555555;font-style: italic;'><chr></span><span style='color: #555555;'>, ## # </span><span style='color: #555555;font-weight: bold;'>air_time</span><span style='color: #555555;'> </span><span style='color: #555555;font-style: italic;'><dbl></span><span style='color: #555555;'>, </span><span style='color: #555555;font-weight: bold;'>distance</span><span style='color: #555555;'> </span><span style='color: #555555;font-style: italic;'><dbl></span><span style='color: #555555;'>, </span><span style='color: #555555;font-weight: bold;'>hour</span><span style='color: #555555;'> </span><span style='color: #555555;font-style: italic;'><dbl></span><span style='color: #555555;'>, </span><span style='color: #555555;font-weight: bold;'>minute</span><span style='color: #555555;'> </span><span style='color: #555555;font-style: italic;'><dbl></span><span style='color: #555555;'>, ## # </span><span style='color: #555555;font-weight: bold;'>time_hour</span><span style='color: #555555;'> </span><span style='color: #555555;font-style: italic;'><dttm></span><span> </span></CODE></PRE> --- ```r flights %>% select(year, month, day, ends_with("delay")) %>% filter(month == 2) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #555555;'># Source: lazy query [?? x 5]</span><span> ## </span><span style='color: #555555;'># Database: postgres [kirill@/var/run/postgresql:5432/kirill]</span><span> ## </span><span style='font-weight: bold;'>year</span><span> </span><span style='font-weight: bold;'>month</span><span> </span><span style='font-weight: bold;'>day</span><span> </span><span style='font-weight: bold;'>dep_delay</span><span> </span><span style='font-weight: bold;'>arr_delay</span><span> ## </span><span style='color: #555555;font-style: italic;'><int></span><span> </span><span style='color: #555555;font-style: italic;'><int></span><span> </span><span style='color: #555555;font-style: italic;'><int></span><span> </span><span style='color: #555555;font-style: italic;'><dbl></span><span> </span><span style='color: #555555;font-style: italic;'><dbl></span><span> ## </span><span style='color: #555555;'> 1</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 2 1 -</span><span style='color: #BB0000;'>4</span><span> 4 ## </span><span style='color: #555555;'> 2</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 2 1 -</span><span style='color: #BB0000;'>5</span><span> -</span><span style='color: #BB0000;'>4</span><span> ## </span><span style='color: #555555;'> 3</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 2 1 -</span><span style='color: #BB0000;'>3</span><span> 8 ## </span><span style='color: #555555;'> 4</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 2 1 -</span><span style='color: #BB0000;'>8</span><span> -</span><span style='color: #BB0000;'>10</span><span> ## </span><span style='color: #555555;'> 5</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 2 1 0 9 ## </span><span style='color: #555555;'> 6</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 2 1 -</span><span style='color: #BB0000;'>8</span><span> -</span><span style='color: #BB0000;'>1</span><span> ## </span><span style='color: #555555;'> 7</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 2 1 -</span><span style='color: #BB0000;'>8</span><span> 9 ## </span><span style='color: #555555;'> 8</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 2 1 -</span><span style='color: #BB0000;'>8</span><span> -</span><span style='color: #BB0000;'>14</span><span> ## </span><span style='color: #555555;'> 9</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 2 1 -</span><span style='color: #BB0000;'>7</span><span> 18 ## </span><span style='color: #555555;'>10</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 2 1 -</span><span style='color: #BB0000;'>7</span><span> -</span><span style='color: #BB0000;'>4</span><span> ## </span><span style='color: #555555;'># … with more rows</span><span> </span></CODE></PRE> --- ```r flights %>% group_by(year, month, day) %>% summarize(mean(arr_delay)) ``` ``` ## Warning: Missing values are always removed in SQL. ## Use `AVG(x, na.rm = TRUE)` to silence this warning ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #555555;'># Source: lazy query [?? x 4]</span><span> ## </span><span style='color: #555555;'># Database: postgres [kirill@/var/run/postgresql:5432/kirill]</span><span> ## </span><span style='color: #555555;'># Groups: year, month</span><span> ## </span><span style='font-weight: bold;'>year</span><span> </span><span style='font-weight: bold;'>month</span><span> </span><span style='font-weight: bold;'>day</span><span> </span><span style='font-weight: bold;'>`mean(arr_delay)`</span><span> ## </span><span style='color: #555555;font-style: italic;'><int></span><span> </span><span style='color: #555555;font-style: italic;'><int></span><span> </span><span style='color: #555555;font-style: italic;'><int></span><span> </span><span style='color: #555555;font-style: italic;'><dbl></span><span> ## </span><span style='color: #555555;'> 1</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 12.7 ## </span><span style='color: #555555;'> 2</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 2 12.7 ## </span><span style='color: #555555;'> 3</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 3 5.73 ## </span><span style='color: #555555;'> 4</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 4 -</span><span style='color: #BB0000;'>1.93</span><span> ## </span><span style='color: #555555;'> 5</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 5 -</span><span style='color: #BB0000;'>1.53</span><span> ## </span><span style='color: #555555;'> 6</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 6 4.24 ## </span><span style='color: #555555;'> 7</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 7 -</span><span style='color: #BB0000;'>4.95</span><span> ## </span><span style='color: #555555;'> 8</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 8 -</span><span style='color: #BB0000;'>3.23</span><span> ## </span><span style='color: #555555;'> 9</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 9 -</span><span style='color: #BB0000;'>0.264</span><span> ## </span><span style='color: #555555;'>10</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 10 -</span><span style='color: #BB0000;'>5.90</span><span> ## </span><span style='color: #555555;'># … with more rows</span><span> </span></CODE></PRE> --- ```r flights %>% group_by(year, month, day) %>% summarize(mean_delay = mean(arr_delay, na.rm = TRUE)) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #555555;'># Source: lazy query [?? x 4]</span><span> ## </span><span style='color: #555555;'># Database: postgres [kirill@/var/run/postgresql:5432/kirill]</span><span> ## </span><span style='color: #555555;'># Groups: year, month</span><span> ## </span><span style='font-weight: bold;'>year</span><span> </span><span style='font-weight: bold;'>month</span><span> </span><span style='font-weight: bold;'>day</span><span> </span><span style='font-weight: bold;'>mean_delay</span><span> ## </span><span style='color: #555555;font-style: italic;'><int></span><span> </span><span style='color: #555555;font-style: italic;'><int></span><span> </span><span style='color: #555555;font-style: italic;'><int></span><span> </span><span style='color: #555555;font-style: italic;'><dbl></span><span> ## </span><span style='color: #555555;'> 1</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 12.7 ## </span><span style='color: #555555;'> 2</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 2 12.7 ## </span><span style='color: #555555;'> 3</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 3 5.73 ## </span><span style='color: #555555;'> 4</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 4 -</span><span style='color: #BB0000;'>1.93</span><span> ## </span><span style='color: #555555;'> 5</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 5 -</span><span style='color: #BB0000;'>1.53</span><span> ## </span><span style='color: #555555;'> 6</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 6 4.24 ## </span><span style='color: #555555;'> 7</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 7 -</span><span style='color: #BB0000;'>4.95</span><span> ## </span><span style='color: #555555;'> 8</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 8 -</span><span style='color: #BB0000;'>3.23</span><span> ## </span><span style='color: #555555;'> 9</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 9 -</span><span style='color: #BB0000;'>0.264</span><span> ## </span><span style='color: #555555;'>10</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 10 -</span><span style='color: #BB0000;'>5.90</span><span> ## </span><span style='color: #555555;'># … with more rows</span><span> </span></CODE></PRE> --- ```r flights %>% group_by(year, month, day) %>% summarize(mean_delay = mean(arr_delay, na.rm = TRUE)) %>% * sql_render() ``` ``` ## <SQL> SELECT "year", "month", "day", AVG("arr_delay") AS "mean_delay" ## FROM "nycflights13"."flights" ## GROUP BY "year", "month", "day" ``` --- ```r flights %>% group_by(year, month, day) %>% summarize(mean_delay = mean(arr_delay, na.rm = TRUE)) %>% * collect() ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #555555;'># A tibble: 365 x 4</span><span> ## </span><span style='color: #555555;'># Groups: year, month [12]</span><span> ## </span><span style='font-weight: bold;'>year</span><span> </span><span style='font-weight: bold;'>month</span><span> </span><span style='font-weight: bold;'>day</span><span> </span><span style='font-weight: bold;'>mean_delay</span><span> ## </span><span style='color: #555555;font-style: italic;'><int></span><span> </span><span style='color: #555555;font-style: italic;'><int></span><span> </span><span style='color: #555555;font-style: italic;'><int></span><span> </span><span style='color: #555555;font-style: italic;'><dbl></span><span> ## </span><span style='color: #555555;'> 1</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 2 9 6.64 ## </span><span style='color: #555555;'> 2</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 5 18 -</span><span style='color: #BB0000;'>10.7</span><span> ## </span><span style='color: #555555;'> 3</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 8 20 -</span><span style='color: #BB0000;'>8.21</span><span> ## </span><span style='color: #555555;'> 4</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 2 24 1.09 ## </span><span style='color: #555555;'> 5</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 4 27 -</span><span style='color: #BB0000;'>12.0</span><span> ## </span><span style='color: #555555;'> 6</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 31 32.6 ## </span><span style='color: #555555;'> 7</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 11 15 -</span><span style='color: #BB0000;'>4.68</span><span> ## </span><span style='color: #555555;'> 8</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 11 4 3.46 ## </span><span style='color: #555555;'> 9</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 10 19 -</span><span style='color: #BB0000;'>1.08</span><span> ## </span><span style='color: #555555;'>10</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 12 19 9.01 ## </span><span style='color: #555555;'># … with 355 more rows</span><span> </span></CODE></PRE> --- ```r model <- lm(arr_delay ~ factor(origin) + dep_delay, nycflights13::flights) summary(model) ``` ``` ## ## Call: ## lm(formula = arr_delay ~ factor(origin) + dep_delay, data = nycflights13::flights) ## ## Residuals: ## Min 1Q Median 3Q Max ## -107.429 -11.042 -1.684 8.804 202.748 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -6.1975535 0.0539274 -114.924 < 2e-16 *** ## factor(origin)JFK -0.5112923 0.0758116 -6.744 1.54e-11 *** ## factor(origin)LGA 1.4919371 0.0773890 19.278 < 2e-16 *** ## dep_delay 1.0196879 0.0007865 1296.428 < 2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 18.01 on 327342 degrees of freedom ## (9430 observations deleted due to missingness) ## Multiple R-squared: 0.8372, Adjusted R-squared: 0.8372 ## F-statistic: 5.612e+05 on 3 and 327342 DF, p-value: < 2.2e-16 ``` --- ```r library(tidypredict) *sql <- tidypredict_sql(model, con) sql ``` ``` ## <SQL> -6.19755349842104 + (CASE WHEN ("origin" = 'JFK') THEN (1.0) WHEN NOT("origin" = 'JFK') THEN (0.0) END) * (-0.511292266035181) + (CASE WHEN ("origin" = 'LGA') THEN (1.0) WHEN NOT("origin" = 'LGA') THEN (0.0) END) * (1.49193714231315) + ("dep_delay") * (1.01968794076906) ``` ```r cat(strwrap(sql), sep = "\n") ``` ``` ## -6.19755349842104 + (CASE WHEN ("origin" = 'JFK') THEN ## (1.0) WHEN NOT("origin" = 'JFK') THEN (0.0) END) * ## (-0.511292266035181) + (CASE WHEN ("origin" = 'LGA') THEN ## (1.0) WHEN NOT("origin" = 'LGA') THEN (0.0) END) * ## (1.49193714231315) + ("dep_delay") * (1.01968794076906) ``` --- ### _rquery_ Alternative for generating SQL queries ### _dbx_ Useful DML operations (update, upsert, ...) ### _sqlr_ Database-agnostic DDL ### _arkdb_ Database-agnostic import and export ### _sf_ (Support for spatial databases) ### *bigrquery*, *civis*, *MonetDBLite*, *odbc*, *RClickhouse*, ... (Other backends) ### ... ? --- class: inverse, middle, center # Consistent ## Interface to # Work with databases in R ??? Thank contributors! FIXME: Word cloud --- background-image: url(img/contributors.svg) background-size: 130% --- class: inverse, middle, center <iframe src="https://giphy.com/embed/du578VUmFzEkiT6eTl" width="720" height="588" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/nfl-allen-buffalo-bills-josh-du578VUmFzEkiT6eTl">via GIPHY</a></p> ??? Become a contributor today! --- class: middle ```r con <- RPostgres::Postgres()@connect() con@get_query("SELECT 'Hello, world!' AS hi") ``` --- background-image: url(img/RConsortium_Horizontal_Pantone.png) background-size: 80% class: bottom, center # https://bit.ly/2wFJix3 ## https://r-dbi.org ### @krlmlr --- class: inverse --- class: inverse, center # Writing and executing --- ```r dbWriteTable(con, "spatial_ref_sys", spatial_ref_sys) dbCreateTable(con, "spatial_ref_sys", spatial_ref_sys) dbAppendTable(con, "spatial_ref_sys", spatial_ref_sys) ``` --- ```r dbExecute(con, "DROP TABLE ...") dbBegin(con) dbExecute(con, "UPDATE TABLE ...") dbExecute(con, "INSERT TABLE ...") dbCommit(con) dbWithTransaction( con, { dbExecute(con, "UPDATE TABLE ...") dbExecute(con, "INSERT TABLE ...") } ) ```