
enum_pk_candidates() checks, for each column of a table, whether the column contains only unique values and is therefore a suitable candidate for the table's primary key.

dm_enum_pk_candidates() performs these checks for a table in a dm object.

enum_pk_candidates(table)

dm_enum_pk_candidates(dm, table)

Arguments

table

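A table. For dm_enum_pk_candidates(), a table in the dm; for enum_pk_candidates(), the table itself, such as a data frame.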

dm

A dm object.

Value

A tibble with the following columns:

columns

columns of table,

candidate

logical: whether the column is a candidate for a primary key,

why

if the column is not a candidate for a primary key, the reason why; otherwise an empty string.
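The result described above can be approximated in base R. The following is only a minimal sketch, not the dm implementation: sketch_pk_candidates() and the toy data frame are hypothetical names introduced for illustration, the wording of the why messages is only loosely modelled on the real output, and ignoring missing values when looking for duplicates is an assumption.

```r
# Hypothetical helper, not part of dm: a base R approximation of the check.
sketch_pk_candidates <- function(table) {
  check_column <- function(x) {
    has_missing <- anyNA(x)
    # Assumption: missing values are ignored when looking for duplicates.
    has_duplicates <- anyDuplicated(x[!is.na(x)]) > 0
    problems <- c(
      if (has_missing) "has missing values",
      if (has_duplicates) "has duplicate values"
    )
    # Yields "" when no problem was found.
    paste(problems, collapse = ", and ")
  }

  why <- vapply(table, check_column, character(1), USE.NAMES = FALSE)
  data.frame(
    columns = names(table),
    candidate = why == "",
    why = why,
    stringsAsFactors = FALSE
  )
}

# Toy data: only `id` is unique and free of missing values.
toy <- data.frame(
  id = 1:4,
  code = c("a", "b", "b", "c"),
  score = c(0.5, NA, 0.7, 0.7)
)
sketch_pk_candidates(toy)
```

In this sketch a column is a candidate exactly when its why entry is empty, which mirrors the empty strings shown for candidate columns in the examples.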

Life cycle

These functions are marked "questioning" because we are not yet sure about the interface, in particular whether we need both the dm_enum...() and enum...() variants. Changing the interface later seems harmless because these functions are most likely used interactively.

See also

Other primary key functions: dm_add_pk(), dm_get_all_pks(), dm_get_pk(), dm_has_pk()

Examples

nycflights13::flights %>% enum_pk_candidates()
#> # A tibble: 19 x 3
#>    columns      candidate why
#>    <keys>       <lgl>     <chr>
#>  1 air_time     FALSE     has missing values, and duplicate values: 20, 21, 22,…
#>  2 arr_delay    FALSE     has missing values, and duplicate values: -75, -71, -…
#>  3 arr_time     FALSE     has missing values, and duplicate values: 1, 2, 3, 4,…
#>  4 carrier      FALSE     has duplicate values: 9E, AA, AS, B6, DL, …
#>  5 day          FALSE     has duplicate values: 1, 2, 3, 4, 5, …
#>  6 dep_delay    FALSE     has missing values, and duplicate values: -25, -24, -…
#>  7 dep_time     FALSE     has missing values, and duplicate values: 1, 2, 3, 4,…
#>  8 dest         FALSE     has duplicate values: ABQ, ACK, ALB, ANC, ATL, …
#>  9 distance     FALSE     has duplicate values: 80, 94, 96, 116, 143, …
#> 10 flight       FALSE     has duplicate values: 1, 2, 3, 4, 5, …
#> 11 hour         FALSE     has duplicate values: 5, 6, 7, 8, 9, …
#> 12 minute       FALSE     has duplicate values: 0, 1, 2, 3, 4, …
#> 13 month        FALSE     has duplicate values: 1, 2, 3, 4, 5, …
#> 14 origin       FALSE     has duplicate values: EWR, JFK, LGA
#> 15 sched_arr_t… FALSE     has duplicate values: 1, 2, 3, 4, 5, …
#> 16 sched_dep_t… FALSE     has duplicate values: 500, 505, 510, 515, 516, …
#> 17 tailnum      FALSE     has missing values, and duplicate values: D942DN, N0E…
#> 18 time_hour    FALSE     has duplicate values: 2013-01-01 05:00:00, 2013-01-01…
#> 19 year         FALSE     has duplicate values: 2013
dm_nycflights13() %>% dm_enum_pk_candidates(airports)
#> # A tibble: 8 x 3
#>   columns candidate why
#>   <keys>  <lgl>     <chr>
#> 1 faa     TRUE      ""
#> 2 lon     TRUE      ""
#> 3 alt     FALSE     "has duplicate values: 0, 1, 3, 4, 5, …"
#> 4 dst     FALSE     "has duplicate values: A, N, U"
#> 5 lat     FALSE     "has duplicate values: 38.88944, 40.63975"
#> 6 name    FALSE     "has duplicate values: All Airports, Capital City Airport, …
#> 7 tz      FALSE     "has duplicate values: -10, -9, -8, -7, -6, …"
#> 8 tzone   FALSE     "has missing values, and duplicate values: America/Anchorag…