Filtering one table of a dm object affects all tables connected to it via one or more foreign key relations. First, define filter conditions for one or more tables with cdm_filter(), using a syntax similar to dplyr::filter(). These conditions are stored in the dm and are not executed immediately. Then, cdm_apply_filters() updates all tables according to the filter conditions and the foreign key relations.

cdm_filter(dm, table, ...)

cdm_apply_filters(dm)

Arguments

dm

A dm object.

table

A table in the dm.

...

Logical predicates defined in terms of the variables in table, passed on to dplyr::filter(). Multiple conditions are combined with & or ,. Only rows where the condition evaluates to TRUE are kept.

The arguments in ... are automatically quoted and evaluated in the context of the data frame. They support unquoting and splicing. See vignette("programming", package = "dplyr") for an introduction to these concepts.
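As a minimal sketch of what unquoting and splicing look like, the snippet below uses dplyr::filter() directly on a small, hypothetical airports table (the table and the variable names wanted and conds are assumptions made for illustration). Since ... is passed on to dplyr::filter(), the same patterns would be expected to work inside cdm_filter(), but that is not verified here.

```r
library(dplyr)

# Hypothetical toy table; not part of any dm.
airports <- tibble(
  faa  = c("JFK", "LGA", "EWR"),
  name = c("John F Kennedy Intl", "La Guardia", "Newark Liberty Intl")
)

# Unquoting: !! inserts the value of a local variable into the condition.
wanted <- "La Guardia"
one <- airports %>% filter(name == !!wanted)

# Splicing: !!! expands a list of conditions into separate arguments,
# which are then combined with & just like comma-separated conditions.
conds <- rlang::exprs(faa != "EWR", name != "La Guardia")
both <- airports %>% filter(!!!conds)
```

Here one holds the La Guardia row, and both holds the single row that satisfies both spliced conditions.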

Details

cdm_filter() allows you to set one or more filter conditions for one table of a dm object. These conditions are stored in the dm until they are needed. Once executed, the filtering affects all tables connected to the filtered one by foreign key constraints, leaving only the rows with the corresponding key values. The filtering takes place implicitly once a table is requested from the dm via tbl(), [[.dm(), or $.dm().

With cdm_apply_filters() all set filter conditions are applied and their combined cascading effect on each table of the dm is taken into account, producing a new dm object. This function is called by the compute() method for dm class objects.
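The cascading effect described above can be pictured as a chain of semi-joins along the foreign keys. The sketch below reproduces that idea with plain dplyr on three small, hypothetical tables (all table contents and the *_f names are assumptions made for illustration). It shows the intended result of the cascade, not necessarily how the dm package implements it internally.

```r
library(dplyr)

# Hypothetical toy tables standing in for the tables of a dm.
airports <- tibble(
  faa  = c("JFK", "LGA", "EWR"),
  name = c("John F Kennedy Intl", "La Guardia", "Newark Liberty Intl")
)
flights <- tibble(
  flight  = 1:5,
  origin  = c("JFK", "LGA", "JFK", "EWR", "LGA"),  # foreign key to airports$faa
  carrier = c("AA", "UA", "DL", "UA", "AA")        # foreign key to airlines$carrier
)
airlines <- tibble(
  carrier = c("AA", "DL", "UA", "VX"),
  name    = c("American", "Delta", "United", "Virgin America")
)

# Step 1: apply the condition to the table it was defined on.
airports_f <- airports %>% filter(name == "John F Kennedy Intl")

# Step 2: keep only flights whose foreign key matches a remaining airport.
flights_f <- flights %>% semi_join(airports_f, by = c("origin" = "faa"))

# Step 3: cascade further, keeping only airlines referenced by remaining flights.
airlines_f <- airlines %>% semi_join(flights_f, by = "carrier")
```

In this toy data, filtering airports down to one row leaves two flights and the two airlines that operate them, mirroring how the airlines table shrinks in the examples below.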

Examples

library(dplyr)
dm_nyc_filtered <-
  cdm_nycflights13() %>%
  cdm_filter(airports, name == "John F Kennedy Intl")
tbl(dm_nyc_filtered, "flights")
#> # A tibble: 111,279 x 19
#>     year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
#>    <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
#>  1  2013     1     1      542            540         2      923            850
#>  2  2013     1     1      544            545        -1     1004           1022
#>  3  2013     1     1      557            600        -3      838            846
#>  4  2013     1     1      558            600        -2      849            851
#>  5  2013     1     1      558            600        -2      853            856
#>  6  2013     1     1      558            600        -2      924            917
#>  7  2013     1     1      559            559         0      702            706
#>  8  2013     1     1      606            610        -4      837            845
#>  9  2013     1     1      611            600        11      945            931
#> 10  2013     1     1      613            610         3      925            921
#> # … with 111,269 more rows, and 11 more variables: arr_delay <dbl>,
#> #   carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,
#> #   air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
dm_nyc_filtered[["planes"]]
#> # A tibble: 1,381 x 9
#>    tailnum  year type          manufacturer   model  engines seats speed engine
#>    <chr>   <int> <chr>         <chr>          <chr>    <int> <int> <int> <chr>
#>  1 N102UW   1998 Fixed wing m… AIRBUS INDUST… A320-…       2   182    NA Turbo-…
#>  2 N103US   1999 Fixed wing m… AIRBUS INDUST… A320-…       2   182    NA Turbo-…
#>  3 N104UW   1999 Fixed wing m… AIRBUS INDUST… A320-…       2   182    NA Turbo-…
#>  4 N105UW   1999 Fixed wing m… AIRBUS INDUST… A320-…       2   182    NA Turbo-…
#>  5 N107US   1999 Fixed wing m… AIRBUS INDUST… A320-…       2   182    NA Turbo-…
#>  6 N108UW   1999 Fixed wing m… AIRBUS INDUST… A320-…       2   182    NA Turbo-…
#>  7 N109UW   1999 Fixed wing m… AIRBUS INDUST… A320-…       2   182    NA Turbo-…
#>  8 N110UW   1999 Fixed wing m… AIRBUS INDUST… A320-…       2   182    NA Turbo-…
#>  9 N111US   1999 Fixed wing m… AIRBUS INDUST… A320-…       2   182    NA Turbo-…
#> 10 N112US   1999 Fixed wing m… AIRBUS INDUST… A320-…       2   182    NA Turbo-…
#> # … with 1,371 more rows
dm_nyc_filtered$airlines
#> # A tibble: 10 x 2
#>    carrier name
#>    <chr>   <chr>
#>  1 9E      Endeavor Air Inc.
#>  2 AA      American Airlines Inc.
#>  3 B6      JetBlue Airways
#>  4 DL      Delta Air Lines Inc.
#>  5 EV      ExpressJet Airlines Inc.
#>  6 HA      Hawaiian Airlines Inc.
#>  7 MQ      Envoy Air
#>  8 UA      United Air Lines Inc.
#>  9 US      US Airways Inc.
#> 10 VX      Virgin America
cdm_nycflights13() %>%
  cdm_filter(airports, name == "John F Kennedy Intl") %>%
  cdm_apply_filters()
#> ── Table source ────────────────────────────────────────────────────────────────
#> src: <package: nycflights13>
#> ── Data model ──────────────────────────────────────────────────────────────────
#> Data model object:
#> 5 tables: airlines, airports, flights, planes ...
#> 53 columns
#> 3 primary keys
#> 3 references
#> ── Filters ─────────────────────────────────────────────────────────────────────
#> None
cdm_nycflights13() %>%
  cdm_filter(flights, month == 3) %>%
  cdm_apply_filters()
#> ── Table source ────────────────────────────────────────────────────────────────
#> src: <package: nycflights13>
#> ── Data model ──────────────────────────────────────────────────────────────────
#> Data model object:
#> 5 tables: airlines, airports, flights, planes ...
#> 53 columns
#> 3 primary keys
#> 3 references
#> ── Filters ─────────────────────────────────────────────────────────────────────
#> None
library(dplyr)
cdm_nycflights13() %>%
  cdm_filter(planes, engine %in% c("Reciprocating", "4 Cycle")) %>%
  compute()
#> ── Table source ────────────────────────────────────────────────────────────────
#> src: <package: nycflights13>
#> ── Data model ──────────────────────────────────────────────────────────────────
#> Data model object:
#> 5 tables: airlines, airports, flights, planes ...
#> 53 columns
#> 3 primary keys
#> 3 references
#> ── Filters ─────────────────────────────────────────────────────────────────────
#> None