Insert, update or remove rows in a database dm

Introduction

This tutorial introduces the methods {dm} provides for modifying the data in the tables of a relational model. There are 6 methods:

dm_rows_insert() - adds new rows
dm_rows_update() - changes values in rows
dm_rows_patch() - fills in missing values
dm_rows_upsert() - adds new rows or changes values if pre-existing
dm_rows_delete() - deletes rows
dm_rows_truncate() - removes all rows, leaving table structure intact

The dm_rows_* process

All six methods take the same arguments and using them follows the same process:

Create a temporary changeset dm that defines the intended changes on the RDBMS
If desired, simulate changes with in_place = FALSE to double-check
Apply changes with in_place = TRUE.

To start, a dm object is created containing the tables, and rows, that you want to change. This changeset dm is then copied into the same source as the dm you want to modify. With the dm in the same RDBMS as the destination dm, you call the appropriate method, such as dm_rows_insert(), to make your planned changes, along with an argument of in_place = FALSE so you can confirm you achieve the changes that you want.

This verification can be done visually, looking at row counts and the like, or using {dm}’s constraint checking method, dm_examine_constraints(). The biggest danger is damaging key relations between data spread across multiple tables by deleting or duplicating rows and their keys. dm_examine_constraints() will catch errors where primary keys are duplicated or foreign keys do not have a matching primary key (unless the foreign key value is NA).

With the changes confirmed, you execute the method again, this time with the argument in_place = TRUE to make the changes permanent. Note that in_place = FALSE is the default: you must opt in to actually change data on the database.

Each method has its own requirements in order to maintain database consistency. These involve constraints on primary key values as they are how rows are identified.

Method	Requirements
`dm_rows_insert()`	The primary keys must differ from existing records.
`dm_rows_update()`	Primary keys must match for all records to be updated.
`dm_rows_patch()`	Updates missing values in existing records. Primary keys must match for all records to be patched.
`dm_rows_upsert()`	Updates existing records and adds new records, based on the primary key.
`dm_rows_delete()`	Removes matching records based on the primary key.
`dm_rows_truncate()`	Removes all records, only for tables in the changeset dm.

To ensure the integrity of all relations during the process, all methods automatically determine the correct processing order for the tables involved. For operations that create records, parent tables are processed before child tables. For dm_rows_delete() and dm_rows_truncate(), child tables are processed before their parent tables. For more details on this see vignette("howto-dm-theory") and vignette("howto-dm-db").

Usage

To demonstrate the use of these table modifying methods we will create a simple dm object with two tables linked by a foreign key. Note the foreign key of NA in the child table.

library(tidyverse)

#> ── Attaching packages ────────────────────────────────── tidyverse 1.3.0 ──

#> ✔ ggplot2 3.3.3     ✔ purrr   0.3.4
#> ✔ tibble  3.0.6     ✔ dplyr   1.0.4
#> ✔ tidyr   1.1.2     ✔ stringr 1.4.0
#> ✔ readr   1.4.0     ✔ forcats 0.5.1

#> ── Conflicts ───────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks dm::filter(), stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()

library(dm)
parent <- tibble(value = c("A", "B", "C"), pk = 1:3)
parent

#> # A tibble: 3 x 2
#>   value    pk
#>   <chr> <int>
#> 1 A         1
#> 2 B         2
#> 3 C         3

child <- tibble(value = c("a", "b", "c"), pk = 1:3, fk = c(1, 1, NA))
child

#> # A tibble: 3 x 3
#>   value    pk    fk
#>   <chr> <int> <dbl>
#> 1 a         1     1
#> 2 b         2     1
#> 3 c         3    NA

demo_dm <-
  dm(parent = parent, child = child) %>%
  dm_add_pk(parent, pk) %>%
  dm_add_pk(child, pk) %>%
  dm_add_fk(child, fk, parent)

demo_dm %>% 
  dm_draw(view_type = "full")

{dm} doesn’t check your key values when you create a dm, we add this check:¹

dm_examine_constraints(demo_dm)

#> ℹ All constraints satisfied.

Then we copy demo_dm into an SQLite database. Note: the default for the method used, copy_dm_to(), is to create temporary tables that will be automatically deleted when your session ends. As demo_sql will be the destination dm for the examples, the argument temporary = FALSE is used to make this distinction apparent.

library(DBI)
sqlite_db <- DBI::dbConnect(RSQLite::SQLite())
demo_sql <- copy_dm_to(sqlite_db, demo_dm, temporary = FALSE)
demo_sql

#> ── Table source ───────────────────────────────────────────────────────────
#> src:  sqlite 3.34.1 []
#> ── Metadata ───────────────────────────────────────────────────────────────
#> Tables: `parent`, `child`
#> Columns: 5
#> Primary keys: 2
#> Foreign keys: 1

{dm}’s table modification methods can be piped together to create a repeatable sequence of operations that returns a dm incorporating all the changes required. This is a common use case for {dm} – building by hand a sequence of operations using temporary results until it is complete and correct, then committing the result.

`dm_rows_insert()`

To demonstrate dm_rows_insert() we create a dm with tables containing the rows to insert and copy it to sqlite_db, the same source as demo_sql. For all of the dm_rows_* methods the source and destination dm objects must be in the same RDBMS. You will get an error message if this is not the case.

The code below adds parent and child table entries for the letter “D”. First, the changeset dm is created and temporarily copied to the database:

new_parent <- tibble(value = "D", pk = 4)
new_parent

#> # A tibble: 1 x 2
#>   value    pk
#>   <chr> <dbl>
#> 1 D         4

new_child <- tibble(value = "d", pk = 4, fk = 4)
new_child

#> # A tibble: 1 x 3
#>   value    pk    fk
#>   <chr> <dbl> <dbl>
#> 1 d         4     4

dm_insert_in <-
  dm(parent = new_parent, child = new_child) %>%
  copy_dm_to(sqlite_db, ., temporary = TRUE)

The changeset dm is then used as an argument to dm_rows_insert().

dm_insert_out <-
  demo_sql %>%
  dm_rows_insert(dm_insert_in)
#> Not persisting, use `in_place = FALSE` to turn off this message.

This gives us a warning that changes will not be persisted. Inspecting the child table of the resulting dm_insert_out and demo_sql, we can see that’s exactly what happened. {dm} returned to us a dm object with our inserted rows in place, but the underlying database has not changed.

dm_insert_out$child

#> # Source:   lazy query [?? x 3]
#> # Database: sqlite 3.34.1 []
#>   value    pk    fk
#>   <chr> <dbl> <dbl>
#> 1 a         1     1
#> 2 b         2     1
#> 3 c         3    NA
#> 4 d         4     4

demo_sql$child

#> # Source:   table<`child`> [?? x 3]
#> # Database: sqlite 3.34.1 []
#>   value    pk    fk
#>   <chr> <int> <dbl>
#> 1 a         1     1
#> 2 b         2     1
#> 3 c         3    NA

We repeat the operation, this time with the argument in_place = TRUE and the changes now persist in demo_sql.

dm_insert_out <-
  demo_sql %>%
  dm_rows_insert(dm_insert_in, in_place = TRUE)

demo_sql$child

#> # Source:   table<`child`> [?? x 3]
#> # Database: sqlite 3.34.1 []
#>   value    pk    fk
#>   <chr> <int> <dbl>
#> 1 a         1     1
#> 2 b         2     1
#> 3 c         3    NA
#> 4 d         4     4

`dm_rows_update()`

dm_rows_update() works the same as dm_rows_insert(). We create the dm object and copy it to the same source as the destination. Here we will change the foreign key for the row in child containing “b” to point to the correct row in parent. And we will persist the changes.

updated_child <- tibble(value = "b", pk = 2, fk = 2)
updated_child

#> # A tibble: 1 x 3
#>   value    pk    fk
#>   <chr> <dbl> <dbl>
#> 1 b         2     2

dm_update_in <-
  dm(child = updated_child) %>%
  copy_dm_to(sqlite_db, ., temporary = TRUE)

dm_update_out <-
  demo_sql %>%
  dm_rows_update(dm_update_in, in_place = TRUE)

demo_sql$child

#> # Source:   table<`child`> [?? x 3]
#> # Database: sqlite 3.34.1 []
#>   value    pk    fk
#>   <chr> <int> <dbl>
#> 1 a         1     1
#> 2 b         2     2
#> 3 c         3    NA
#> 4 d         4     4

`dm_rows_delete()`

dm_rows_delete() is not currently implemented to work with an RDBMS, so we will shift our demonstrations back to the local R environment. We’ve made changes to demo_sql so we use collect() to copy the current tables out of SQLite. Note that persistence is not a concern with local dm objects. Every operation returns a new dm object containing the changes made.

local_dm <- collect(demo_sql)

local_dm$parent

#> # A tibble: 4 x 2
#>   value    pk
#>   <chr> <int>
#> 1 A         1
#> 2 B         2
#> 3 C         3
#> 4 D         4

local_dm$child

#> # A tibble: 4 x 3
#>   value    pk    fk
#>   <chr> <int> <dbl>
#> 1 a         1     1
#> 2 b         2     2
#> 3 c         3    NA
#> 4 d         4     4

dm_deleted <-
  dm(parent = new_parent, child = new_child) %>%
  dm_rows_delete(local_dm, .)
#> Not persisting, use `in_place = FALSE` to turn off this message.
#> Ignoring extra columns: value, fk
#> Ignoring extra columns: value

dm_deleted$child

#> # A tibble: 3 x 3
#>   value    pk    fk
#>   <chr> <int> <dbl>
#> 1 a         1     1
#> 2 b         2     2
#> 3 c         3    NA

`dm_rows_patch()`

dm_rows_patch() updates missing values in existing records. We use it here to fix the missing foreign key in the child table.

patched_child <- tibble(value = "c", pk = 3, fk = 3)
patched_child

#> # A tibble: 1 x 3
#>   value    pk    fk
#>   <chr> <dbl> <dbl>
#> 1 c         3     3

dm_patched <-
  dm(child = patched_child) %>%
  dm_rows_patch(dm_deleted, .)
#> Not persisting, use `in_place = FALSE` to turn off this message.

dm_patched$child

#> # A tibble: 3 x 3
#>   value    pk    fk
#>   <chr> <int> <dbl>
#> 1 a         1     1
#> 2 b         2     2
#> 3 c         3     3

`dm_rows_upsert()`

dm_rows_upsert() updates rows with supplied values if they exist or inserts the supplied values as new rows if they don’t. In this example we add the letter “D” back to our dm, and update the foreign key for “b”.

upserted_parent <- tibble(value = "D", pk = 4)
upserted_parent

#> # A tibble: 1 x 2
#>   value    pk
#>   <chr> <dbl>
#> 1 D         4

upserted_child <- tibble(value = c("b", "d"), pk = c(2, 4), fk = c(3, 4))
upserted_child

#> # A tibble: 2 x 3
#>   value    pk    fk
#>   <chr> <dbl> <dbl>
#> 1 b         2     3
#> 2 d         4     4

dm_upserted <-
  dm(parent = upserted_parent, child = upserted_child) %>%
  dm_rows_upsert(dm_patched, .)
#> Not persisting, use `in_place = FALSE` to turn off this message.

dm_upserted$parent

#> # A tibble: 4 x 2
#>   value    pk
#>   <chr> <dbl>
#> 1 A         1
#> 2 B         2
#> 3 C         3
#> 4 D         4

dm_upserted$child

#> # A tibble: 4 x 3
#>   value    pk    fk
#>   <chr> <dbl> <dbl>
#> 1 a         1     1
#> 2 b         2     3
#> 3 c         3     3
#> 4 d         4     4

`dm_rows_truncate()`

dm_rows_truncate() deletes all the rows in a table while leaving all other related information intact, including column names, column types, and key relations. The function derives its name from the SQL TRUNCATE TABLE statement, so we will return to our SQLite database to demonstrate its use. The example below truncates only the child table. Note how a modified version of the destination dm is used as “changeset dm”: the rows in the changeset dm do not matter here.

dm_trunc_in <-
  demo_sql %>%
  dm_select_tbl(child)
dm_trunc_in

#> ── Table source ───────────────────────────────────────────────────────────
#> src:  sqlite 3.34.1 []
#> ── Metadata ───────────────────────────────────────────────────────────────
#> Tables: `child`
#> Columns: 3
#> Primary keys: 1
#> Foreign keys: 0

dm_trunc_out <-
  demo_sql %>%
  dm_rows_truncate(dm_trunc_in, in_place = TRUE)

demo_sql$child

#> # Source:   table<`child`> [?? x 3]
#> # Database: sqlite 3.34.1 []
#> # … with 3 variables: value <chr>, pk <int>, fk <dbl>

Conclusion

The dm_rows_* methods give you row-level granularity over the modifications you need to make to your relational model. By using the in_place argument they all share you can construct and verify your modifications before committing them. There are a few limitations, as mentioned in the tutorial, but these will be addressed in future updates to {dm}.

Next Steps

If this tutorial answered some questions, but opened others, these resources might be of assistance.

Is your data in an RDBMS? vignette("howto-dm-db") offers a detailed look at working with an existing relational data model.

If your data is in data frames, then you may want to read vignette("howto-dm-df") next.

If you feel you need to know more about relational data models in order to get the most out of dm, check out vignette("howto-dm-theory").

If you’re familiar with relational data models but want to know how to work with them in dm, then any of vignette("tech-dm-join"), vignette("tech-dm-filter"), or vignette("tech-dm-zoom") is a good next step.

Be aware that when using dm_examine_constraints() NULL (NA) foreign keys are allowed and will be counted as a match. In some cases this doesn’t make sense and non-NULL columns should be enforced by the RDBMS. Currently {dm} does not specify or check non-NULL constraints for columns.↩︎

James Wondrasek

2021-02-14