Flatten a part of a dm into a wide table — dm_flatten_to

dm_flatten_to_tbl() and dm_squash_to_tbl() gather all information of interest in one place in a wide table. Both functions perform a disambiguation of column names and a cascade of joins.

dm_flatten_to_tbl(dm, start, ..., join = left_join)

dm_squash_to_tbl(dm, start, ..., join = left_join)

Arguments

dm	A `dm` object.
start	The table from which all outgoing foreign key relations are considered when establishing a processing order for the joins. An interesting choice could be for example a fact table in a star schema.
...	Unquoted names of the tables to be included in addition to the `start` table. The order of the tables here determines the order of the joins. If the argument is empty, all tables that can be reached will be included. If this includes tables that are not direct neighbors of `start`, it will only work with `dm_squash_to_tbl()` (given one of the allowed join-methods). `tidyselect` is supported, see `dplyr::select()` for details on the semantics.
join	The type of join to be performed, see `dplyr::join()`.

Value

A single table that results from consecutively joining all affected tables to the start table.

Details

With ... left empty, this function will join together all the tables of your dm object that can be reached from the start table, in the direction of the foreign key relations (pointing from the child tables to the parent tables), using the foreign key relations to determine the argument by for the necessary joins. The result is one table with unique column names. Use the ... argument if you would like to control which tables should be joined to the start table.

How does filtering affect the result?

Case 1, either no filter conditions are set in the dm, or set only in the part that is unconnected to the start table: The necessary disambiguations of the column names are performed first. Then all involved foreign tables are joined to the start table successively, with the join function given in the join argument.

Case 2, filter conditions are set for at least one table that is connected to start: First, disambiguation will be performed if necessary. The start table is then calculated using tbl(dm, "start"). This implies that the effect of the filters on this table is taken into account. For right_join, full_join and nest_join, an error is thrown if any filters are set because filters will not affect the right hand side tables and the result will therefore be incorrect in general (calculating the effects on all RHS-tables would also be time-consuming, and is not supported; if desired, call dm_apply_filters() first to achieve that effect). For all other join types, filtering only the start table is enough because the effect is passed on by successive joins.

Mind that calling dm_flatten_to_tbl() with join = right_join and no table order determined in the ... argument will not lead to a well-defined result if two or more foreign tables are to be joined to start. The resulting table would depend on the order the tables that are listed in the dm. Therefore, trying this will result in a warning.

Since join = nest_join() does not make sense in this direction (LHS = child table, RHS = parent table: for valid key constraints each nested column entry would be a tibble of one row), an error will be thrown if this method is chosen.

Examples

dm_nycflights13() %>%
  dm_select_tbl(-weather) %>%
  dm_flatten_to_tbl(flights)
#> Renamed columns:
#> * year -> flights.year, planes.year
#> * name -> airlines.name, airports.name
#> # A tibble: 11,227 x 35
#>    flights.year month   day dep_time sched_dep_time dep_delay arr_time
#>           <int> <int> <int>    <int>          <int>     <dbl>    <int>
#>  1         2013     1    10        3           2359         4      426
#>  2         2013     1    10       16           2359        17      447
#>  3         2013     1    10      450            500       -10      634
#>  4         2013     1    10      520            525        -5      813
#>  5         2013     1    10      530            530         0      824
#>  6         2013     1    10      531            540        -9      832
#>  7         2013     1    10      535            540        -5     1015
#>  8         2013     1    10      546            600       -14      645
#>  9         2013     1    10      549            600       -11      652
#> 10         2013     1    10      550            600       -10      649
#> # … with 11,217 more rows, and 28 more variables: sched_arr_time <int>,
#> #   arr_delay <dbl>, carrier <chr>, flight <int>, tailnum <chr>, origin <chr>,
#> #   dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>,
#> #   time_hour <dttm>, airlines.name <chr>, airports.name <chr>, lat <dbl>,
#> #   lon <dbl>, alt <dbl>, tz <dbl>, dst <chr>, tzone <chr>, planes.year <int>,
#> #   type <chr>, manufacturer <chr>, model <chr>, engines <int>, seats <int>,
#> #   speed <int>, engine <chr>

Flatten a part of a `dm` into a wide table

Arguments

Value

Details

See also

Examples