vignettes/howto-dm-rows.Rmd
howto-dm-rows.Rmd
This tutorial introduces the methods {dm} provides for modifying the data in the tables of a relational model. There are 6 methods:
dm_rows_insert()
- adds new rowsdm_rows_update()
- changes values in rowsdm_rows_patch()
- fills in missing valuesdm_rows_upsert()
- adds new rows or changes values if pre-existingdm_rows_delete()
- deletes rowsdm_rows_truncate()
- removes all rows, leaving table structure intactAll six methods take the same arguments and using them follows the same process:
in_place = FALSE
to double-checkin_place = TRUE
.To start, a dm object is created containing the tables, and rows, that you want to change. This changeset dm is then copied into the same source as the dm you want to modify. With the dm in the same RDBMS as the destination dm, you call the appropriate method, such as dm_rows_insert()
, to make your planned changes, along with an argument of in_place = FALSE
so you can confirm you achieve the changes that you want.
This verification can be done visually, looking at row counts and the like, or using {dm}’s constraint checking method, dm_examine_constraints()
. The biggest danger is damaging key relations between data spread across multiple tables by deleting or duplicating rows and their keys. dm_examine_constraints()
will catch errors where primary keys are duplicated or foreign keys do not have a matching primary key (unless the foreign key value is NA
).
With the changes confirmed, you execute the method again, this time with the argument in_place = TRUE
to make the changes permanent. Note that in_place = FALSE
is the default: you must opt in to actually change data on the database.
Each method has its own requirements in order to maintain database consistency. These involve constraints on primary key values as they are how rows are identified.
Method | Requirements |
---|---|
dm_rows_insert() |
The primary keys must differ from existing records. |
dm_rows_update() |
Primary keys must match for all records to be updated. |
dm_rows_patch() |
Updates missing values in existing records. Primary keys must match for all records to be patched. |
dm_rows_upsert() |
Updates existing records and adds new records, based on the primary key. |
dm_rows_delete() |
Removes matching records based on the primary key. |
dm_rows_truncate() |
Removes all records, only for tables in the changeset dm. |
To ensure the integrity of all relations during the process, all methods automatically determine the correct processing order for the tables involved. For operations that create records, parent tables are processed before child tables. For dm_rows_delete()
and dm_rows_truncate()
, child tables are processed before their parent tables. For more details on this see vignette("howto-dm-theory")
and vignette("howto-dm-db")
.
To demonstrate the use of these table modifying methods we will create a simple dm object with two tables linked by a foreign key. Note the foreign key of NA
in the child
table.
#> # A tibble: 3 x 2
#> value pk
#> <chr> <int>
#> 1 A 1
#> 2 B 2
#> 3 C 3
#> # A tibble: 3 x 3
#> value pk fk
#> <chr> <int> <dbl>
#> 1 a 1 1
#> 2 b 2 1
#> 3 c 3 NA
demo_dm <-
dm(parent = parent, child = child) %>%
dm_add_pk(parent, pk) %>%
dm_add_pk(child, pk) %>%
dm_add_fk(child, fk, parent)
demo_dm %>%
dm_draw(view_type = "full")
{dm} doesn’t check your key values when you create a dm, we add this check:1
dm_examine_constraints(demo_dm)
Then we copy demo_dm
into an SQLite database. Note: the default for the method used, copy_dm_to()
, is to create temporary tables that will be automatically deleted when your session ends. As demo_sql
will be the destination dm for the examples, the argument temporary = FALSE
is used to make this distinction apparent.
library(DBI)
sqlite_db <- DBI::dbConnect(RSQLite::SQLite())
demo_sql <- copy_dm_to(sqlite_db, demo_dm, temporary = FALSE)
demo_sql
#> ── Table source ───────────────────────────────────────────────────────────
#> src: sqlite 3.34.1 []
#> ── Metadata ───────────────────────────────────────────────────────────────
#> Tables: `parent`, `child`
#> Columns: 5
#> Primary keys: 2
#> Foreign keys: 1
{dm}’s table modification methods can be piped together to create a repeatable sequence of operations that returns a dm incorporating all the changes required. This is a common use case for {dm} – building by hand a sequence of operations using temporary results until it is complete and correct, then committing the result.
dm_rows_insert()
To demonstrate dm_rows_insert()
we create a dm with tables containing the rows to insert and copy it to sqlite_db
, the same source as demo_sql
. For all of the dm_rows_*
methods the source and destination dm objects must be in the same RDBMS. You will get an error message if this is not the case.
The code below adds parent
and child
table entries for the letter “D”. First, the changeset dm is created and temporarily copied to the database:
new_parent <- tibble(value = "D", pk = 4)
new_parent
#> # A tibble: 1 x 2
#> value pk
#> <chr> <dbl>
#> 1 D 4
new_child <- tibble(value = "d", pk = 4, fk = 4)
new_child
#> # A tibble: 1 x 3
#> value pk fk
#> <chr> <dbl> <dbl>
#> 1 d 4 4
dm_insert_in <-
dm(parent = new_parent, child = new_child) %>%
copy_dm_to(sqlite_db, ., temporary = TRUE)
The changeset dm is then used as an argument to dm_rows_insert()
.
dm_insert_out <-
demo_sql %>%
dm_rows_insert(dm_insert_in)
#> Not persisting, use `in_place = FALSE` to turn off this message.
This gives us a warning that changes will not be persisted. Inspecting the child
table of the resulting dm_insert_out
and demo_sql
, we can see that’s exactly what happened. {dm} returned to us a dm object with our inserted rows in place, but the underlying database has not changed.
dm_insert_out$child
#> # Source: lazy query [?? x 3]
#> # Database: sqlite 3.34.1 []
#> value pk fk
#> <chr> <dbl> <dbl>
#> 1 a 1 1
#> 2 b 2 1
#> 3 c 3 NA
#> 4 d 4 4
demo_sql$child
#> # Source: table<`child`> [?? x 3]
#> # Database: sqlite 3.34.1 []
#> value pk fk
#> <chr> <int> <dbl>
#> 1 a 1 1
#> 2 b 2 1
#> 3 c 3 NA
We repeat the operation, this time with the argument in_place = TRUE
and the changes now persist in demo_sql
.
dm_insert_out <-
demo_sql %>%
dm_rows_insert(dm_insert_in, in_place = TRUE)
demo_sql$child
#> # Source: table<`child`> [?? x 3]
#> # Database: sqlite 3.34.1 []
#> value pk fk
#> <chr> <int> <dbl>
#> 1 a 1 1
#> 2 b 2 1
#> 3 c 3 NA
#> 4 d 4 4
dm_rows_update()
dm_rows_update()
works the same as dm_rows_insert()
. We create the dm object and copy it to the same source as the destination. Here we will change the foreign key for the row in child
containing “b” to point to the correct row in parent
. And we will persist the changes.
updated_child <- tibble(value = "b", pk = 2, fk = 2)
updated_child
#> # A tibble: 1 x 3
#> value pk fk
#> <chr> <dbl> <dbl>
#> 1 b 2 2
dm_update_in <-
dm(child = updated_child) %>%
copy_dm_to(sqlite_db, ., temporary = TRUE)
dm_update_out <-
demo_sql %>%
dm_rows_update(dm_update_in, in_place = TRUE)
demo_sql$child
#> # Source: table<`child`> [?? x 3]
#> # Database: sqlite 3.34.1 []
#> value pk fk
#> <chr> <int> <dbl>
#> 1 a 1 1
#> 2 b 2 2
#> 3 c 3 NA
#> 4 d 4 4
dm_rows_delete()
dm_rows_delete()
is not currently implemented to work with an RDBMS, so we will shift our demonstrations back to the local R environment. We’ve made changes to demo_sql
so we use collect()
to copy the current tables out of SQLite. Note that persistence is not a concern with local dm objects. Every operation returns a new dm object containing the changes made.
local_dm <- collect(demo_sql)
local_dm$parent
#> # A tibble: 4 x 2
#> value pk
#> <chr> <int>
#> 1 A 1
#> 2 B 2
#> 3 C 3
#> 4 D 4
local_dm$child
#> # A tibble: 4 x 3
#> value pk fk
#> <chr> <int> <dbl>
#> 1 a 1 1
#> 2 b 2 2
#> 3 c 3 NA
#> 4 d 4 4
dm_deleted <-
dm(parent = new_parent, child = new_child) %>%
dm_rows_delete(local_dm, .)
#> Not persisting, use `in_place = FALSE` to turn off this message.
#> Ignoring extra columns: value, fk
#> Ignoring extra columns: value
dm_deleted$child
#> # A tibble: 3 x 3
#> value pk fk
#> <chr> <int> <dbl>
#> 1 a 1 1
#> 2 b 2 2
#> 3 c 3 NA
dm_rows_patch()
dm_rows_patch()
updates missing values in existing records. We use it here to fix the missing foreign key in the child
table.
patched_child <- tibble(value = "c", pk = 3, fk = 3)
patched_child
#> # A tibble: 1 x 3
#> value pk fk
#> <chr> <dbl> <dbl>
#> 1 c 3 3
dm_patched <-
dm(child = patched_child) %>%
dm_rows_patch(dm_deleted, .)
#> Not persisting, use `in_place = FALSE` to turn off this message.
dm_patched$child
#> # A tibble: 3 x 3
#> value pk fk
#> <chr> <int> <dbl>
#> 1 a 1 1
#> 2 b 2 2
#> 3 c 3 3
dm_rows_upsert()
dm_rows_upsert()
updates rows with supplied values if they exist or inserts the supplied values as new rows if they don’t. In this example we add the letter “D” back to our dm, and update the foreign key for “b”.
upserted_parent <- tibble(value = "D", pk = 4)
upserted_parent
#> # A tibble: 1 x 2
#> value pk
#> <chr> <dbl>
#> 1 D 4
#> # A tibble: 2 x 3
#> value pk fk
#> <chr> <dbl> <dbl>
#> 1 b 2 3
#> 2 d 4 4
dm_upserted <-
dm(parent = upserted_parent, child = upserted_child) %>%
dm_rows_upsert(dm_patched, .)
#> Not persisting, use `in_place = FALSE` to turn off this message.
dm_upserted$parent
#> # A tibble: 4 x 2
#> value pk
#> <chr> <dbl>
#> 1 A 1
#> 2 B 2
#> 3 C 3
#> 4 D 4
dm_upserted$child
#> # A tibble: 4 x 3
#> value pk fk
#> <chr> <dbl> <dbl>
#> 1 a 1 1
#> 2 b 2 3
#> 3 c 3 3
#> 4 d 4 4
dm_rows_truncate()
dm_rows_truncate()
deletes all the rows in a table while leaving all other related information intact, including column names, column types, and key relations. The function derives its name from the SQL TRUNCATE TABLE
statement, so we will return to our SQLite database to demonstrate its use. The example below truncates only the child
table. Note how a modified version of the destination dm is used as “changeset dm”: the rows in the changeset dm do not matter here.
dm_trunc_in <-
demo_sql %>%
dm_select_tbl(child)
dm_trunc_in
#> ── Table source ───────────────────────────────────────────────────────────
#> src: sqlite 3.34.1 []
#> ── Metadata ───────────────────────────────────────────────────────────────
#> Tables: `child`
#> Columns: 3
#> Primary keys: 1
#> Foreign keys: 0
dm_trunc_out <-
demo_sql %>%
dm_rows_truncate(dm_trunc_in, in_place = TRUE)
demo_sql$child
#> # Source: table<`child`> [?? x 3]
#> # Database: sqlite 3.34.1 []
#> # … with 3 variables: value <chr>, pk <int>, fk <dbl>
The dm_rows_*
methods give you row-level granularity over the modifications you need to make to your relational model. By using the in_place
argument they all share you can construct and verify your modifications before committing them. There are a few limitations, as mentioned in the tutorial, but these will be addressed in future updates to {dm}.
If this tutorial answered some questions, but opened others, these resources might be of assistance.
Is your data in an RDBMS? vignette("howto-dm-db")
offers a detailed look at working with an existing relational data model.
If your data is in data frames, then you may want to read vignette("howto-dm-df")
next.
If you feel you need to know more about relational data models in order to get the most out of dm, check out vignette("howto-dm-theory")
.
If you’re familiar with relational data models but want to know how to work with them in dm, then any of vignette("tech-dm-join")
, vignette("tech-dm-filter")
, or vignette("tech-dm-zoom")
is a good next step.
Be aware that when using dm_examine_constraints()
NULL (NA
) foreign keys are allowed and will be counted as a match. In some cases this doesn’t make sense and non-NULL columns should be enforced by the RDBMS. Currently {dm} does not specify or check non-NULL constraints for columns.↩︎