Filtering a table of a
dm object may affect other tables that are connected to it
directly or indirectly via foreign key relations.
dm_filter() can be used to define filter conditions for tables using syntax that is similar to
These conditions will be stored in the
dm, and executed immediately for the tables that they are referring to.
dm_apply_filters(), all tables will be updated according to the filter conditions and the foreign key relations.
dm_apply_filters_to_tbl() retrieves one specific table of the
dm that is updated according to the filter conditions and the foreign key relations.
dm_filter(dm, table, ...) dm_apply_filters(dm) dm_apply_filters_to_tbl(dm, table)
A table in the
Logical predicates defined in terms of the variables in
The arguments in ... are automatically quoted and evaluated in the context of
the data frame. They support unquoting and splicing.
dm_filter(): an updated
dm object (filter executed for given table, and condition stored).
dm_apply_filters(): an updated
dm object (filter effects evaluated for all tables).
dm_apply_filters_to_tbl(), a table.
The effect of the stored filter conditions on the tables related to the filtered ones is only evaluated in one of the following scenarios:
compute() (method for
dm objects) on a
dm: each filtered table potentially
reduces the rows of all other tables connected to it by foreign key relations (cascading effect), leaving only the rows
with corresponding key values.
Tables that are not connected to any table with an active filter are left unchanged.
This results in a new
dm class object without any filter conditions.
dm_apply_filters_to_tbl(): the remaining rows of the requested table are calculated by performing a sequence
of semi-joins (
dplyr::semi_join()) starting from each table that has been filtered to the requested table
(similar to 1. but only for one table).
Several functions of the dm package will throw an error if filter conditions exist when they are called.
These functions are marked "questioning" because it feels wrong to tightly couple filtering with the data model. On the one hand, an overview of active filters is useful when specifying the base data set for an analysis in terms of column selections and row filters. However, these filter condition should be only of informative nature and never affect the results of other operations. We are working on formalizing the semantics of the underlying operations in order to present them in a cleaner interface.
dm_nyc <- dm_nycflights13() dm_nyc_filtered <- dm_nycflights13() %>% dm_filter(airports, name == "John F Kennedy Intl") dm_apply_filters_to_tbl(dm_nyc_filtered, flights)#> # A tibble: 3,661 x 19 #> year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time #> <int> <int> <int> <int> <int> <dbl> <int> <int> #> 1 2013 1 10 3 2359 4 426 437 #> 2 2013 1 10 16 2359 17 447 444 #> 3 2013 1 10 531 540 -9 832 850 #> 4 2013 1 10 535 540 -5 1015 1017 #> 5 2013 1 10 553 600 -7 711 715 #> 6 2013 1 10 557 600 -3 855 912 #> 7 2013 1 10 557 600 -3 933 925 #> 8 2013 1 10 558 600 -2 844 906 #> 9 2013 1 10 559 600 -1 653 658 #> 10 2013 1 10 607 601 6 932 918 #> # … with 3,651 more rows, and 11 more variables: arr_delay <dbl>, #> # carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>, #> # air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>dm_nyc_filtered %>% dm_apply_filters()#> ── Table source ──────────────────────────────────────────────────────────────── #> src: <environment: R_GlobalEnv> #> ── Metadata ──────────────────────────────────────────────────────────────────── #> Tables: `airlines`, `airports`, `flights`, `planes`, `weather` #> Columns: 53 #> Primary keys: 3 #> Foreign keys: 3# If you want to keep only those rows in the parent tables # whose primary key values appear as foreign key values in # `flights`, you can set a `TRUE` filter in `flights`: dm_nyc %>% dm_filter(flights, 1 == 1) %>% dm_apply_filters() %>% dm_nrow()#> airlines airports flights planes weather #> 16 3 11227 2345 861# note that in this example, the only affected table is # `airports` because the departure airports in `flights` are # only the three New York airports. dm_nyc %>% dm_filter(planes, engine %in% c("Reciprocating", "4 Cycle")) %>% compute()#> ── Table source ──────────────────────────────────────────────────────────────── #> src: <environment: R_GlobalEnv> #> ── Metadata ──────────────────────────────────────────────────────────────────── #> Tables: `airlines`, `airports`, `flights`, `planes`, `weather` #> Columns: 53 #> Primary keys: 3 #> Foreign keys: 3