Determine which columns would be good candidates to be used as foreign keys of a table,
to reference the primary key column of another table of the dm
object.
dm_enum_fk_candidates(dm, table, ref_table) enum_fk_candidates(zoomed_dm, ref_table)
dm | A |
---|---|
table | The table whose columns should be tested for suitability as foreign keys. |
ref_table | A table with a primary key. |
zoomed_dm | A |
A tibble with the following columns:
columns
columns of table
,
candidate
boolean: are these columns a candidate for a foreign key,
why
if not a candidate for a foreign key, explanation for for this.
dm_enum_fk_candidates()
first checks if ref_table
has a primary key set,
if not, an error is thrown.
If ref_table
does have a primary key, then a join operation will be tried using
that key as the by
argument of join() to match it to each column of table
.
Attempting to join incompatible columns triggers an error.
The outcome of the join operation determines the value of the why
column in the result:
an empty value for a column of table
that is a suitable foreign key candidate
the count and percentage of missing matches for a column that is not suitable
the error message triggered for unsuitable candidates that may include the types of mismatched columns
enum_fk_candidates()
works like dm_enum_fk_candidates()
with the zoomed table as table
.
These functions are marked "questioning" because we are not yet sure about
the interface, in particular if we need both dm_enum...()
and enum...()
variants.
Changing the interface later seems harmless because these functions are
most likely used interactively.
Other foreign key functions:
dm_add_fk()
,
dm_get_all_fks()
,
dm_get_fk()
,
dm_has_fk()
#> # A tibble: 19 x 3 #> columns candidate why #> <keys> <lgl> <chr> #> 1 origin TRUE "" #> 2 dest FALSE "242 entries (2.2%) of `flights$dest` not in `airport… #> 3 tailnum FALSE "11080 entries (98.7%) of `flights$tailnum` not in `a… #> 4 carrier FALSE "11227 entries (100%) of `flights$carrier` not in `ai… #> 5 air_time FALSE "Can't join on `x$value` x `y$value` because of incom… #> 6 arr_delay FALSE "Can't join on `x$value` x `y$value` because of incom… #> 7 arr_time FALSE "Can't join on `x$value` x `y$value` because of incom… #> 8 day FALSE "Can't join on `x$value` x `y$value` because of incom… #> 9 dep_delay FALSE "Can't join on `x$value` x `y$value` because of incom… #> 10 dep_time FALSE "Can't join on `x$value` x `y$value` because of incom… #> 11 distance FALSE "Can't join on `x$value` x `y$value` because of incom… #> 12 flight FALSE "Can't join on `x$value` x `y$value` because of incom… #> 13 hour FALSE "Can't join on `x$value` x `y$value` because of incom… #> 14 minute FALSE "Can't join on `x$value` x `y$value` because of incom… #> 15 month FALSE "Can't join on `x$value` x `y$value` because of incom… #> 16 sched_arr_t… FALSE "Can't join on `x$value` x `y$value` because of incom… #> 17 sched_dep_t… FALSE "Can't join on `x$value` x `y$value` because of incom… #> 18 time_hour FALSE "Can't join on `x$value` x `y$value` because of incom… #> 19 year FALSE "Can't join on `x$value` x `y$value` because of incom…#> # A tibble: 19 x 3 #> columns candidate why #> <keys> <lgl> <chr> #> 1 origin TRUE "" #> 2 dest FALSE "242 entries (2.2%) of `flights$dest` not in `airport… #> 3 tailnum FALSE "11080 entries (98.7%) of `flights$tailnum` not in `a… #> 4 carrier FALSE "11227 entries (100%) of `flights$carrier` not in `ai… #> 5 air_time FALSE "Can't join on `x$value` x `y$value` because of incom… #> 6 arr_delay FALSE "Can't join on `x$value` x `y$value` because of incom… #> 7 arr_time FALSE "Can't join on `x$value` x `y$value` because of incom… #> 8 day FALSE "Can't join on `x$value` x `y$value` because of incom… #> 9 dep_delay FALSE "Can't join on `x$value` x `y$value` because of incom… #> 10 dep_time FALSE "Can't join on `x$value` x `y$value` because of incom… #> 11 distance FALSE "Can't join on `x$value` x `y$value` because of incom… #> 12 flight FALSE "Can't join on `x$value` x `y$value` because of incom… #> 13 hour FALSE "Can't join on `x$value` x `y$value` because of incom… #> 14 minute FALSE "Can't join on `x$value` x `y$value` because of incom… #> 15 month FALSE "Can't join on `x$value` x `y$value` because of incom… #> 16 sched_arr_t… FALSE "Can't join on `x$value` x `y$value` because of incom… #> 17 sched_dep_t… FALSE "Can't join on `x$value` x `y$value` because of incom… #> 18 time_hour FALSE "Can't join on `x$value` x `y$value` because of incom… #> 19 year FALSE "Can't join on `x$value` x `y$value` because of incom…