Questioning lifecycle

Filtering a table of a dm object may affect other tables that are connected to it directly or indirectly via foreign key relations.

dm_filter() can be used to define filter conditions for tables using syntax that is similar to dplyr::filter(). These conditions will be stored in the dm, and executed immediately for the tables that they are referring to.

With dm_apply_filters(), all tables will be updated according to the filter conditions and the foreign key relations.

dm_apply_filters_to_tbl() retrieves one specific table of the dm that is updated according to the filter conditions and the foreign key relations.

dm_filter(dm, table, ...)

dm_apply_filters(dm)

dm_apply_filters_to_tbl(dm, table)

Arguments

dm

A dm object.

table

A table in the dm.

...

Logical predicates defined in terms of the variables in .data, passed on to dplyr::filter(). Multiple conditions are combined with & or ,. Only the rows where the condition evaluates to TRUE are kept.

The arguments in ... are automatically quoted and evaluated in the context of the data frame. They support unquoting and splicing. See vignette("programming", package = "dplyr") for an introduction to these concepts.

Value

For dm_filter: an updated dm object (filter executed for given table, and condition stored).

For dm_apply_filters: an updated dm object (filter effects evaluated for all tables).

For dm_apply_filters_to_tbl(), a table.

Details

The effect of the stored filter conditions on the tables related to the filtered ones is only evaluated in one of the following scenarios:

  1. Calling dm_apply_filters() or compute() (method for dm objects) on a dm: each filtered table potentially reduces the rows of all other tables connected to it by foreign key relations (cascading effect), leaving only the rows with corresponding key values. Tables that are not connected to any table with an active filter are left unchanged. This results in a new dm class object without any filter conditions.

  2. Calling dm_apply_filters_to_tbl(): the remaining rows of the requested table are calculated by performing a sequence of semi-joins (dplyr::semi_join()) starting from each table that has been filtered to the requested table (similar to 1. but only for one table).

Several functions of the dm package will throw an error if filter conditions exist when they are called.

Life cycle

These functions are marked "questioning" because it feels wrong to tightly couple filtering with the data model. On the one hand, an overview of active filters is useful when specifying the base data set for an analysis in terms of column selections and row filters. However, these filter condition should be only of informative nature and never affect the results of other operations. We are working on formalizing the semantics of the underlying operations in order to present them in a cleaner interface.

Use dm_zoom_to() and dplyr::filter() to filter rows without registering the filter.

Examples

dm_nyc_filtered <- dm_nycflights13() %>% dm_filter(airports, name == "John F Kennedy Intl") dm_apply_filters_to_tbl(dm_nyc_filtered, flights)
#> # A tibble: 111,279 x 19 #> year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time #> <int> <int> <int> <int> <int> <dbl> <int> <int> #> 1 2013 1 1 542 540 2 923 850 #> 2 2013 1 1 544 545 -1 1004 1022 #> 3 2013 1 1 557 600 -3 838 846 #> 4 2013 1 1 558 600 -2 849 851 #> 5 2013 1 1 558 600 -2 853 856 #> 6 2013 1 1 558 600 -2 924 917 #> 7 2013 1 1 559 559 0 702 706 #> 8 2013 1 1 606 610 -4 837 845 #> 9 2013 1 1 611 600 11 945 931 #> 10 2013 1 1 613 610 3 925 921 #> # … with 111,269 more rows, and 11 more variables: arr_delay <dbl>, #> # carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>, #> # air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
dm_nycflights13() %>% dm_filter(airports, name == "John F Kennedy Intl") %>% dm_apply_filters()
#> ── Table source ──────────────────────────────────────────────────────────────── #> src: <environment: R_GlobalEnv> #> ── Metadata ──────────────────────────────────────────────────────────────────── #> Tables: `airlines`, `airports`, `flights`, `planes`, `weather` #> Columns: 53 #> Primary keys: 3 #> Foreign keys: 3
# If you want to keep only those rows in the parent tables # whose primary key values appear as foreign key values in # `flights`, you can set a `TRUE` filter in `flights`: dm_nycflights13() %>% dm_filter(flights, 1 == 1) %>% dm_apply_filters() %>% dm_nrow()
#> airlines airports flights planes weather #> 16 3 336776 3322 26115
# note that in this example, the only affected table is # `airports` because the departure airports in `flights` are # only the three New York airports. dm_nycflights13() %>% dm_filter(flights, month == 3) %>% dm_apply_filters()
#> ── Table source ──────────────────────────────────────────────────────────────── #> src: <environment: R_GlobalEnv> #> ── Metadata ──────────────────────────────────────────────────────────────────── #> Tables: `airlines`, `airports`, `flights`, `planes`, `weather` #> Columns: 53 #> Primary keys: 3 #> Foreign keys: 3
dm_nycflights13() %>% dm_filter(planes, engine %in% c("Reciprocating", "4 Cycle")) %>% compute()
#> ── Table source ──────────────────────────────────────────────────────────────── #> src: <environment: R_GlobalEnv> #> ── Metadata ──────────────────────────────────────────────────────────────────── #> Tables: `airlines`, `airports`, `flights`, `planes`, `weather` #> Columns: 53 #> Primary keys: 3 #> Foreign keys: 3
dm_nycflights13() %>% dm_filter(flights, month == 3) %>% dm_apply_filters_to_tbl(planes)
#> # A tibble: 2,619 x 9 #> tailnum year type manufacturer model engines seats speed engine #> <chr> <int> <chr> <chr> <chr> <int> <int> <int> <chr> #> 1 N10156 2004 Fixed wing m… EMBRAER EMB-1… 2 55 NA Turbo-… #> 2 N102UW 1998 Fixed wing m… AIRBUS INDUST… A320-… 2 182 NA Turbo-… #> 3 N103US 1999 Fixed wing m… AIRBUS INDUST… A320-… 2 182 NA Turbo-… #> 4 N104UW 1999 Fixed wing m… AIRBUS INDUST… A320-… 2 182 NA Turbo-… #> 5 N10575 2002 Fixed wing m… EMBRAER EMB-1… 2 55 NA Turbo-… #> 6 N105UW 1999 Fixed wing m… AIRBUS INDUST… A320-… 2 182 NA Turbo-… #> 7 N107US 1999 Fixed wing m… AIRBUS INDUST… A320-… 2 182 NA Turbo-… #> 8 N108UW 1999 Fixed wing m… AIRBUS INDUST… A320-… 2 182 NA Turbo-… #> 9 N109UW 1999 Fixed wing m… AIRBUS INDUST… A320-… 2 182 NA Turbo-… #> 10 N110UW 1999 Fixed wing m… AIRBUS INDUST… A320-… 2 182 NA Turbo-… #> # … with 2,609 more rows