Questioning lifecycle

Determine which columns would be good candidates to be used as foreign keys of a table, to reference the primary key column of another table of the dm object.

dm_enum_fk_candidates(dm, table, ref_table)

enum_fk_candidates(zoomed_dm, ref_table)

Arguments

dm

A dm object.

table

The table whose columns should be tested for suitability as foreign keys.

ref_table

A table with a primary key.

zoomed_dm

A dm with a zoomed table.

Value

A tibble with the following columns:

columns

columns of table,

candidate

boolean: are these columns a candidate for a foreign key,

why

if not a candidate for a foreign key, explanation for for this.

Details

dm_enum_fk_candidates() first checks if ref_table has a primary key set, if not, an error is thrown.

If ref_table does have a primary key, then a join operation will be tried using that key as the by argument of join() to match it to each column of table. Attempting to join incompatible columns triggers an error.

The outcome of the join operation determines the value of the why column in the result:

  • an empty value for a column of table that is a suitable foreign key candidate

  • the count and percentage of missing matches for a column that is not suitable

  • the error message triggered for unsuitable candidates that may include the types of mismatched columns

enum_fk_candidates() works like dm_enum_fk_candidates() with the zoomed table as table.

Life cycle

These functions are marked "questioning" because we are not yet sure about the interface, in particular if we need both dm_enum...() and enum...() variants. Changing the interface later seems harmless because these functions are most likely used interactively.

See also

Other foreign key functions: dm_add_fk(), dm_get_all_fks(), dm_get_fk(), dm_has_fk()

Examples

dm_nycflights13() %>% dm_enum_fk_candidates(flights, airports)
#> # A tibble: 19 x 3 #> columns candidate why #> <keys> <lgl> <chr> #> 1 origin TRUE "" #> 2 dest FALSE "242 entries (2.2%) of `flights$dest` not in `airport… #> 3 tailnum FALSE "11080 entries (98.7%) of `flights$tailnum` not in `a… #> 4 carrier FALSE "11227 entries (100%) of `flights$carrier` not in `ai… #> 5 air_time FALSE "Can't join on `x$value` x `y$value` because of incom… #> 6 arr_delay FALSE "Can't join on `x$value` x `y$value` because of incom… #> 7 arr_time FALSE "Can't join on `x$value` x `y$value` because of incom… #> 8 day FALSE "Can't join on `x$value` x `y$value` because of incom… #> 9 dep_delay FALSE "Can't join on `x$value` x `y$value` because of incom… #> 10 dep_time FALSE "Can't join on `x$value` x `y$value` because of incom… #> 11 distance FALSE "Can't join on `x$value` x `y$value` because of incom… #> 12 flight FALSE "Can't join on `x$value` x `y$value` because of incom… #> 13 hour FALSE "Can't join on `x$value` x `y$value` because of incom… #> 14 minute FALSE "Can't join on `x$value` x `y$value` because of incom… #> 15 month FALSE "Can't join on `x$value` x `y$value` because of incom… #> 16 sched_arr_t… FALSE "Can't join on `x$value` x `y$value` because of incom… #> 17 sched_dep_t… FALSE "Can't join on `x$value` x `y$value` because of incom… #> 18 time_hour FALSE "Can't join on `x$value` x `y$value` because of incom… #> 19 year FALSE "Can't join on `x$value` x `y$value` because of incom…
dm_nycflights13() %>% dm_zoom_to(flights) %>% enum_fk_candidates(airports)
#> # A tibble: 19 x 3 #> columns candidate why #> <keys> <lgl> <chr> #> 1 origin TRUE "" #> 2 dest FALSE "242 entries (2.2%) of `flights$dest` not in `airport… #> 3 tailnum FALSE "11080 entries (98.7%) of `flights$tailnum` not in `a… #> 4 carrier FALSE "11227 entries (100%) of `flights$carrier` not in `ai… #> 5 air_time FALSE "Can't join on `x$value` x `y$value` because of incom… #> 6 arr_delay FALSE "Can't join on `x$value` x `y$value` because of incom… #> 7 arr_time FALSE "Can't join on `x$value` x `y$value` because of incom… #> 8 day FALSE "Can't join on `x$value` x `y$value` because of incom… #> 9 dep_delay FALSE "Can't join on `x$value` x `y$value` because of incom… #> 10 dep_time FALSE "Can't join on `x$value` x `y$value` because of incom… #> 11 distance FALSE "Can't join on `x$value` x `y$value` because of incom… #> 12 flight FALSE "Can't join on `x$value` x `y$value` because of incom… #> 13 hour FALSE "Can't join on `x$value` x `y$value` because of incom… #> 14 minute FALSE "Can't join on `x$value` x `y$value` because of incom… #> 15 month FALSE "Can't join on `x$value` x `y$value` because of incom… #> 16 sched_arr_t… FALSE "Can't join on `x$value` x `y$value` because of incom… #> 17 sched_dep_t… FALSE "Can't join on `x$value` x `y$value` because of incom… #> 18 time_hour FALSE "Can't join on `x$value` x `y$value` because of incom… #> 19 year FALSE "Can't join on `x$value` x `y$value` because of incom…