dm_flatten_to_tbl()
and dm_squash_to_tbl()
gather all information of interest in one place in a wide table.
Both functions perform a disambiguation of column names and a cascade of joins.
dm_flatten_to_tbl(dm, start, ..., join = left_join) dm_squash_to_tbl(dm, start, ..., join = left_join)
dm | A |
---|---|
start | The table from which all outgoing foreign key relations are considered when establishing a processing order for the joins. An interesting choice could be for example a fact table in a star schema. |
... | Unquoted names of the tables to be included in addition to the |
join | The type of join to be performed, see |
A single table that results from consecutively joining all affected tables to the start
table.
With ...
left empty, this function will join together all the tables of your dm
object that can be reached from the start
table, in the direction of the foreign key relations
(pointing from the child tables to the parent tables), using the foreign key relations to
determine the argument by
for the necessary joins.
The result is one table with unique column names.
Use the ...
argument if you would like to control which tables should be joined to the start
table.
How does filtering affect the result?
Case 1, either no filter conditions are set in the dm
, or set only in the part that is unconnected to the start
table:
The necessary disambiguations of the column names are performed first.
Then all involved foreign tables are joined to the start
table successively, with the join function given in the join
argument.
Case 2, filter conditions are set for at least one table that is connected to start
:
First, disambiguation will be performed if necessary. The start
table is then calculated using tbl(dm, "start")
.
This implies
that the effect of the filters on this table is taken into account.
For right_join
, full_join
and nest_join
, an error
is thrown if any filters are set because filters will not affect the right hand side tables and the result will therefore be
incorrect in general (calculating the effects on all RHS-tables would also be time-consuming, and is not supported;
if desired, call dm_apply_filters()
first to achieve that effect).
For all other join types, filtering only the start
table is enough because the effect is passed on by
successive joins.
Mind that calling dm_flatten_to_tbl()
with join = right_join
and no table order determined in the ...
argument
will not lead to a well-defined result if two or more foreign tables are to be joined to start
.
The resulting
table would depend on the order the tables that are listed in the dm
.
Therefore, trying this will result in a warning.
Since join = nest_join()
does not make sense in this direction (LHS = child table, RHS = parent table: for valid key constraints
each nested column entry would be a tibble of one row), an error will be thrown if this method is chosen.
Other flattening functions:
dm_join_to_tbl()
#>#> #>#> # A tibble: 11,227 x 35 #> flights.year month day dep_time sched_dep_time dep_delay arr_time #> <int> <int> <int> <int> <int> <dbl> <int> #> 1 2013 1 10 3 2359 4 426 #> 2 2013 1 10 16 2359 17 447 #> 3 2013 1 10 450 500 -10 634 #> 4 2013 1 10 520 525 -5 813 #> 5 2013 1 10 530 530 0 824 #> 6 2013 1 10 531 540 -9 832 #> 7 2013 1 10 535 540 -5 1015 #> 8 2013 1 10 546 600 -14 645 #> 9 2013 1 10 549 600 -11 652 #> 10 2013 1 10 550 600 -10 649 #> # … with 11,217 more rows, and 28 more variables: sched_arr_time <int>, #> # arr_delay <dbl>, carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, #> # dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, #> # time_hour <dttm>, airlines.name <chr>, airports.name <chr>, lat <dbl>, #> # lon <dbl>, alt <dbl>, tz <dbl>, dst <chr>, tzone <chr>, planes.year <int>, #> # type <chr>, manufacturer <chr>, model <chr>, engines <int>, seats <int>, #> # speed <int>, engine <chr>