class: center, middle, inverse, title-slide # Connecting to open-source databases ### Kirill Müller,
@krlmlr
### 2018-02-02 --- background-image: url(img/deps.svg) background-size: 70% background-position: 100% 100% # R's DataBase Interface ### ![](img/if_play_alt_118620.png) connect/disconnect ### ![](img/if_arrows-24_808453.png) read/write ### ![](img/if_interface-40_809302.png) execute ### ![](img/if_interface-39_809306.png) parameters ### ![](img/if_git-commit_298786.png) transactions ??? `rstudio::conf(2018)`, San Diego - Project funded by R Consortium - First two parts done, third part *almost* done - what is DBI - idea: common S4 interface for accessing databases - implement interface: wrap specific DBMS library - use interface: retrieve data from a database, write to database - has been around since 2001, current version 0.7 - quite a few packages that implement and use this interface - odbc (Jim), implyr (Ian) - loose specification - corner cases --- class: middle background-image: url(img/hand-1311786_640.png) background-size: 25% .center[ # Explicit specification of behavior ] .pull-left[ ## Implementers ### Test suite ### Clear expectations ### Boilerplate ] .pull-right[ .right[ ## Users ### Reliable behavior ### Better documentation ### Interchangeable backends ] ] .center[ ] ??? - Interface specification needs an unambiguous description of the classes, methods, and their behavior - Ongoing work, mostly done - Simplify user's life - Simplify implementer's life --- background-image: url(img/deps-dbi.svg) background-size: 50% background-position: 100% 100% # Recent improvements ### Documentation updates ### Quoting literals ### Browsing schemas ### Constructing identifiers ??? - stable - quoting of identifiers: friction - queries vs. statements: warnings - parametrized: robust generation --- background-image: url(img/render-doc.gif) background-size: 100% --- # Quoting literals ```r library(RPostgres) con <- dbConnect(Postgres()) *dbQuoteLiteral(con, 1.5) ``` ``` ## <SQL> 1.5::float8 ``` ```r *dbQuoteLiteral(con, "string") ``` ``` ## <SQL> 'string' ``` ```r *dbQuoteLiteral(con, Sys.time()) ``` ``` ## <SQL> '2018-02-07 00:02:20'::timestamp ``` ```r dbDisconnect(con) ``` ??? - How do I communicate a constant to the database? --- # Quoting literals ```r library(RMariaDB) *con <- dbConnect(MariaDB()) dbQuoteLiteral(con, 1.5) ``` ``` ## <SQL> 1.5 ``` ```r dbQuoteLiteral(con, "string") ``` ``` ## <SQL> 'string' ``` ```r dbQuoteLiteral(con, Sys.time()) ``` ``` ## <SQL> '20180206230221' ``` ```r dbDisconnect(con) ``` ??? - How do I communicate a constant to the database? --- # Browsing schemas ```r con <- dbConnect(Postgres()) *dbListObjects(con) %>% as_tibble() ``` ``` ## # A tibble: 15 x 3 ## table id is_prefix ## <I(list)> <S4: SQL> <lgl> ## 1 <S4: Id> '"oylxnqbetu"' FALSE ## 2 <S4: Id> '"largetable"' FALSE ## 3 <S4: Id> '"kzoxivxdyr"' FALSE ## 4 <S4: Id> '"foo"' FALSE ## 5 <S4: Id> '"asbwgbmlva"' FALSE ## 6 <S4: Id> '"coaleofwrw"' FALSE ## 7 <S4: Id> '"shopmapping"' FALSE ## 8 <S4: Id> '"lsizzrwaup"' FALSE ## 9 <S4: Id> '"weather".' TRUE ## 10 <S4: Id> '"information_schema".' TRUE ## 11 <S4: Id> '"pg_catalog".' TRUE ## 12 <S4: Id> '"weather_test".' TRUE ## 13 <S4: Id> '"public".' TRUE ## 14 <S4: Id> '"nycflights13".' TRUE ## 15 <S4: Id> '"dbo".' TRUE ``` ```r dbDisconnect(con) ``` ??? - Look up objects --- # Browsing schemas ```r con <- dbConnect(Postgres()) *id <- Id(schema = "nycflights13") *dbListObjects(con, id) %>% as_tibble() ``` ``` ## # A tibble: 5 x 3 ## table id is_prefix ## <I(list)> <S4: SQL> <lgl> ## 1 <S4: Id> '"nycflights13"."airlines"' FALSE ## 2 <S4: Id> '"nycflights13"."airports"' FALSE ## 3 <S4: Id> '"nycflights13"."flights"' FALSE ## 4 <S4: Id> '"nycflights13"."planes"' FALSE ## 5 <S4: Id> '"nycflights13"."weather"' FALSE ``` ```r dbDisconnect(con) ``` ??? - Look up objects in schemas --- # Constructing identifiers ```r con <- dbConnect(Postgres()) *(id <- Id(schema = "nycflights13", table = "flights")) ``` ``` ## <Table> schema = nycflights13, table = flights ``` ```r (quoted <- dbQuoteIdentifier(con, id)) ``` ``` ## <SQL> "nycflights13"."flights" ``` ```r dbUnquoteIdentifier(con, quoted) ``` ``` ## [[1]] ## <Table> schema = nycflights13, table = flights ``` ```r dbDisconnect(con) ``` ??? - If you know the name of an object --- # Using identifiers ```r con <- dbConnect(Postgres()) id <- Id(schema = "nycflights13", table = "flights") *dbReadTable(con, id) %>% as_tibble() ``` ``` ## # A tibble: 336,776 x 19 ## year month day dep_time sched_dep_time dep_delay arr_time ## <int> <int> <int> <int> <int> <dbl> <int> ## 1 2013 1 1 517 515 2.00 830 ## 2 2013 1 1 533 529 4.00 850 ## 3 2013 1 1 542 540 2.00 923 ## 4 2013 1 1 544 545 -1.00 1004 ## 5 2013 1 1 554 600 -6.00 812 ## 6 2013 1 1 554 558 -4.00 740 ## 7 2013 1 1 555 600 -5.00 913 ## 8 2013 1 1 557 600 -3.00 709 ## 9 2013 1 1 557 600 -3.00 838 ## 10 2013 1 1 558 600 -2.00 753 ## # ... with 336,766 more rows, and 12 more variables: ## # sched_arr_time <int>, arr_delay <dbl>, carrier <chr>, ## # flight <int>, tailnum <chr>, origin <chr>, dest <chr>, ## # air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, ## # time_hour <dttm> ``` ```r dbDisconnect(con) ``` ??? - If you know the name of an object --- background-image: url(img/deps-non-dbi.svg) background-size: 100% ??? - packages created or updated as part of the project - green: done - yellow: stable, work in progress - red: todo - dotted: "Suggests" - walk through the different packages --- background-image: url(img/deps-dbitest.svg) background-size: 50% background-position: 100% 100% # Specifying method behavior ```r #' @return #' `dbSendQuery()` returns res <- expect_visible(dbSendQuery(con, "SELECT 1")) #' an S4 object that inherits from [DBIResult-class]. expect_s4_class(res, "DBIResult") #' The result set can be used with [dbFetch()] to extract records. expect_equal(check_df(dbFetch(res))[[1]], 1) #' Once you have finished using a result, make sure to clear it #' with [dbClearResult()]. dbClearResult(res) ``` ??? - explain how these sources define documentation, spec, and tests - keep test close to the specs: update specs, update text --- background-image: url(img/doc.png) background-size: cover ??? - update documentation of all DBI methods --- background-image: url(img/spec-1.png) background-size: cover ??? - DBI specification as vignette - example shows `dbSendQuery()` - similar to previous slide, no coincidence - want to keep all sources for documentation in one place --- background-image: url(img/spec-2.png) background-size: cover ??? - Value and Specification section: detailed description of behavior - available in both method doc and spec vignette - actually, there's a test for each of these assertions --- background-image: url(img/deps-rsqlite.svg) background-size: 50% background-position: 100% 100% class: middle .large[ ### File-based database ### Supports different data types per column ### ~130 downstream dependencies ] ??? - File-based, popular on Bioconductor - One of very few RDBMS that support different data types in a column - 64-bit data - On-the-fly conversion to `integer64` - Revdep checks - proxy of what may go wrong in user scripts - `row.names = 1` - ```name = "`table.name`"``` - performance problems - compatibility tweaks - fully DBI compliant release without compatibility tweaks after 5 months - Lesson learned: don't --- background-image: url(img/deps-rmariadb-rpostgres.svg) background-size: 50% background-position: 100% 100% class: middle .large[ ### Server-based databases ### New packages ### Shared codebase ] ??? --- background-image: url(img/deps-boilerplate.svg) background-size: 50% background-position: 100% 100% class: middle .large[ ### Use this to implement a new DBI backend from scratch! ### *bigrquery*, *civis*, *MonetDBLite*, *odbc*, *RClickhouse*, ... ] --- background-image: url(img/deps-helper.svg) background-size: 50% background-position: 100% 100% class: middle .large[ ### Simple S3 classes for new data types ] ??? Data types that are common in database engines but not so common in R --- background-image: url(img/deps-todo.svg) background-size: 50% background-position: 100% 100% class: middle .large[ ### Ongoing maintenance ### Onboarding ### Test coverage ### Hooks for data types ] ??? --- background-image: url(img/RConsortium_Horizontal_Pantone.png) background-size: 80% class: bottom, center # https://goo.gl/8sJLLp ## https://r-dbi.org ### @krlmlr