dm_flatten_to_tbl() and dm_squash_to_tbl() gather all information of interest in one place in a wide table. Both functions perform a disambiguation of column names and a cascade of joins.

dm_flatten_to_tbl(dm, start, ..., join = left_join)

dm_squash_to_tbl(dm, start, ..., join = left_join)

Arguments

dm

A dm object.

start

The table from which all outgoing foreign key relations are considered when establishing a processing order for the joins. An interesting choice could be for example a fact table in a star schema.

...

Unquoted names of the tables to be included in addition to the start table. The order of the tables here determines the order of the joins. If the argument is empty, all tables that can be reached will be included. If this includes tables that are not direct neighbours of start, it will only work with dm_squash_to_tbl() (given one of the allowed join-methods). tidyselect is supported, see dplyr::select() for details on the semantics.

join

The type of join to be performed, see dplyr::join().

Value

A single table that results from consecutively joining all affected tables to the start table.

Details

With ... left empty, this function will join together all the tables of your dm object that can be reached from the start table, in the direction of the foreign key relations (pointing from the child tables to the parent tables), using the foreign key relations to determine the argument by for the necessary joins. The result is one table with unique column names. Use the ... argument if you would like to control which tables should be joined to the start table.

How does filtering affect the result?

Case 1, either no filter conditions are set in the dm, or set only in the part that is unconnected to the start table: The necessary disambiguations of the column names are performed first. Then all involved foreign tables are joined to the start table successively, with the join function given in the join argument.

Case 2, filter conditions are set for at least one table that is connected to start: First, disambiguation will be performed if necessary. The start table is then calculated using tbl(dm, "start"). This implies that the effect of the filters on this table is taken into account. For right_join, full_join and nest_join, an error is thrown if any filters are set because filters will not affect the right hand side tables and the result will therefore be incorrect in general (calculating the effects on all RHS-tables would also be time-consuming, and is not supported; if desired, call dm_apply_filters() first to achieve that effect). For all other join types, filtering only the start table is enough because the effect is passed on by successive joins.

Mind that calling dm_flatten_to_tbl() with join = right_join and no table order determined in the ... argument will not lead to a well-defined result if two or more foreign tables are to be joined to start. The resulting table would depend on the order the tables that are listed in the dm. Therefore, trying this will result in a warning.

Since join = nest_join() does not make sense in this direction (LHS = child table, RHS = parent table: for valid key constraints each nested column entry would be a tibble of one row), an error will be thrown if this method is chosen.

See also

Other flattening functions: dm_join_to_tbl()

Examples

dm_nycflights13() %>% dm_select_tbl(-weather) %>% dm_flatten_to_tbl(flights)
#> Renamed columns: #> * year -> flights$flights.year, planes$planes.year #> * name -> airlines$airlines.name, airports$airports.name
#> # A tibble: 336,776 x 35 #> flights.year month day dep_time sched_dep_time dep_delay arr_time #> <int> <int> <int> <int> <int> <dbl> <int> #> 1 2013 1 1 517 515 2 830 #> 2 2013 1 1 533 529 4 850 #> 3 2013 1 1 542 540 2 923 #> 4 2013 1 1 544 545 -1 1004 #> 5 2013 1 1 554 600 -6 812 #> 6 2013 1 1 554 558 -4 740 #> 7 2013 1 1 555 600 -5 913 #> 8 2013 1 1 557 600 -3 709 #> 9 2013 1 1 557 600 -3 838 #> 10 2013 1 1 558 600 -2 753 #> # … with 336,766 more rows, and 28 more variables: sched_arr_time <int>, #> # arr_delay <dbl>, carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, #> # dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, #> # time_hour <dttm>, airlines.name <chr>, airports.name <chr>, lat <dbl>, #> # lon <dbl>, alt <dbl>, tz <dbl>, dst <chr>, tzone <chr>, planes.year <int>, #> # type <chr>, manufacturer <chr>, model <chr>, engines <int>, seats <int>, #> # speed <int>, engine <chr>