When loading both plyr and dplyr, the package loaded last masks symbols exported by the package loaded first:
library(plyr)
library(dplyr)
Currently, the following symbols are affected:
plyr_exports <- ls("package:plyr")
dplyr_exports <- ls("package:dplyr")
(both_exports <- intersect(plyr_exports, dplyr_exports))
## [1] "arrange" "count" "desc" "failwith" "id" "mutate"
## [7] "rename" "summarise" "summarize"
This means that existing projects that use plyr cannot simply load dplyr using library(dplyr) without potentially breaking existing code. There are workarounds, but all of them seem to have specific disadvantages:
- Use fully qualified names (e.g. dplyr::summarise).
- Load dplyr after plyr, and modify usage of conflicting symbols in the code.
- [...] dplyr primitives with plyr
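As a rough illustration of the namespace-qualified workaround, the sketch below calls dplyr functions through `::` without attaching the package, so nothing exported by plyr is masked. It assumes dplyr is installed; the variable name `by_cyl` is only illustrative.

```r
# Call dplyr verbs through the namespace instead of attaching the package.
# Nothing is added to the search path, so plyr's exports stay untouched.
by_cyl <- dplyr::summarise(
  dplyr::group_by(mtcars, cyl),
  mean_mpg = mean(mpg)
)

# One row per distinct value of cyl
nrow(by_cyl)
```

The obvious cost is verbosity: every call to a conflicting function has to carry its prefix.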
This document explores alternative solutions.
Let’s take a closer look at the interface of the functions exported from both packages:
name | plyr | dplyr | data_first | identical |
---|---|---|---|---|
mutate | .data, … | .data, … | TRUE | TRUE |
summarise | .data, … | .data, … | TRUE | TRUE |
summarize | .data, … | .data, … | TRUE | TRUE |
arrange | df, … | .data, … | TRUE | FALSE |
count | df, vars, wt_var | x, …, wt, sort | TRUE | FALSE |
rename | x, replace, warn_missing, warn_duplicated | .data, … | TRUE | FALSE |
desc | x | x | FALSE | TRUE |
failwith | default, f, quiet | default, f, quiet | FALSE | TRUE |
id | .variables, drop | .variables, drop | FALSE | TRUE |
We can split the overlaps into two groups, according to whether the first argument is a data frame (the data_first column above):
name | plyr | dplyr | identical |
---|---|---|---|
mutate | .data, … | .data, … | TRUE |
summarise | .data, … | .data, … | TRUE |
summarize | .data, … | .data, … | TRUE |
arrange | df, … | .data, … | FALSE |
count | df, vars, wt_var | x, …, wt, sort | FALSE |
rename | x, replace, warn_missing, warn_duplicated | .data, … | FALSE |
Here, mutate, summari[sz]e and arrange mean the same in both plyr and dplyr, although plyr::summari[sz]e on a grouped tbl_df seems to destroy the grouping. The count function is similar, too, except that plyr::count is closer to dplyr::count_, and that for some reason the first argument of dplyr::count is called x and not .data. Also, plyr::rename seems to be closer to dplyr::rename_.
name | plyr | dplyr |
---|---|---|
desc | x | x |
failwith | default, f, quiet | default, f, quiet |
id | .variables, drop | .variables, drop |
Here, we take a look at the bodies. We compare them for textual identity.
body_l %>%
llply(. %>% { identical(as.character(.$plyr), as.character(.$dplyr)) } ) %>%
ldply(. %>% data.frame(identical_body = .), .id = "name")
name | identical_body |
---|---|
desc | TRUE |
failwith | FALSE |
id | TRUE |
Only failwith is different, but the effect seems to be the same:
body_l$failwith$plyr
{
f <- match.fun(f)
function(...) try_default(f(...), default, quiet = quiet)
}
body_l$failwith$dplyr
{
function(...) {
out <- default
try(out <- f(...), silent = quiet)
out
}
}
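To make the comparison concrete, here is a minimal base R sketch of the two variants side by side. The names `failwith_a`, `failwith_b` and `try_default_sketch` are hypothetical; in particular, `try_default_sketch` is only an assumed stand-in for plyr's internal `try_default` helper, not its actual implementation.

```r
# Assumed stand-in for plyr's internal try_default() helper
try_default_sketch <- function(expr, default, quiet = FALSE) {
  result <- default
  # expr is evaluated lazily here; on error, result keeps the default
  try(result <- expr, silent = quiet)
  result
}

# Variant modelled on the plyr body shown above
failwith_a <- function(default = NULL, f, quiet = FALSE) {
  f <- match.fun(f)
  function(...) try_default_sketch(f(...), default, quiet = quiet)
}

# Variant modelled on the dplyr body shown above
failwith_b <- function(default = NULL, f, quiet = FALSE) {
  function(...) {
    out <- default
    try(out <- f(...), silent = quiet)
    out
  }
}

safe_log_a <- failwith_a(NA, log, quiet = TRUE)
safe_log_b <- failwith_b(NA, log, quiet = TRUE)

safe_log_a(1)    # success: value of log(1)
safe_log_b(1)
safe_log_a("x")  # log("x") fails: the default is returned
safe_log_b("x")
```

One visible difference is that the first variant resolves `f` with `match.fun`, so it also accepts a function name given as a string, while the second calls `f` as passed.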
For a fully seamless transition between plyr and dplyr, a compatibility package looks like a possible option. It would provide the union of the functions exported by both packages, plus compatibility wrappers for the two functions count and rename that need special attention.
Deprecating the functions count and rename in the plyr package seems simpler. For an even more radical solution, all overlapping functions could be deprecated, referring to identical functionality in dplyr:
- mutate -> dplyr::mutate
- summari[sz]e -> dplyr::summari[sz]e
- arrange(df, ...) -> dplyr::arrange(.data, ...)
- count(df, vars, wt_var) -> dplyr::count_(x, vars, wt, sort = FALSE)
  - the n column needs to be renamed to freq
- rename(x, replace, warn_missing) -> dplyr::rename_(.data, replace)
  - replace needs to be massaged
  - warn_missing is always TRUE
- desc(), failwith(), id() -> dplyr::...
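The two adjustments mentioned for count and rename can be sketched in base R. This is only an illustration under assumptions: it supposes that the target function wants new = "old" pairs (the orientation dplyr::rename uses), and the `counts` data frame is a hand-built stand-in for a dplyr-style count result rather than real output. The `cyl = "cylinders"` pair is made up for the example.

```r
# plyr style: names are the old column names, values are the new ones
replace_plyr <- c(mpg = "miles_per_gallon", cyl = "cylinders")

# Swap names and values to get the new = "old" orientation
replace_swapped <- setNames(names(replace_plyr), unname(replace_plyr))
replace_swapped

# Stand-in for a dplyr-style count result, which stores counts in column n
counts <- data.frame(gear = c(3, 4, 5), n = c(15L, 12L, 5L))

# Rename n to freq so the result matches what plyr::count returns
names(counts)[names(counts) == "n"] <- "freq"
counts
```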
For practical use, a thin compatibility layer seems to work reasonably well for a project that was created using plyr and is now transitioning towards dplyr:
attach(pdlyr::dplyr_compat)
## The following objects are masked from package:dplyr:
##
## count, mutate, rename
##
## The following objects are masked from package:plyr:
##
## count, mutate, rename
Tests in the pdlyr package will ensure that this package can be safely loaded with warn.conflicts = FALSE. The original plyr implementations can be accessed via shortcuts (prefix p).
A few usage examples:
mtcars %>% mutate(lphkm = 100 * 3.785411784 / 1.609344 / mpg) %>% head
## Warning in mutate(., lphkm = 100 * 3.785411784/1.609344/mpg): Row names
## will be lost
mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb | lphkm |
---|---|---|---|---|---|---|---|---|---|---|---|
21.0 | 6 | 160 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 | 11.20069 |
21.0 | 6 | 160 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 | 11.20069 |
22.8 | 4 | 108 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 | 10.31643 |
21.4 | 6 | 258 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 | 10.99134 |
18.7 | 8 | 360 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 | 12.57832 |
18.1 | 6 | 225 | 105 | 2.76 | 3.460 | 20.22 | 1 | 0 | 3 | 1 | 12.99528 |
mtcars %>% plyr::mutate(lphkm = 100 * 3.785411784 / 1.609344 / mpg) %>% head
| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb | lphkm |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 | 11.20069 |
Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 | 11.20069 |
Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 | 10.31643 |
Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 | 10.99134 |
Hornet Sportabout | 18.7 | 8 | 360 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 | 12.57832 |
Valiant | 18.1 | 6 | 225 | 105 | 2.76 | 3.460 | 20.22 | 1 | 0 | 3 | 1 | 12.99528 |
mtcars %>% ddply("cyl", summarize, mean_mpg = mean(mpg))
cyl | mean_mpg |
---|---|
4 | 26.66364 |
6 | 19.74286 |
8 | 15.10000 |
mtcars %>% ddply("cyl", plyr::summarise, mean_mpg = mean(mpg))
cyl | mean_mpg |
---|---|
4 | 26.66364 |
6 | 19.74286 |
8 | 15.10000 |
mtcars %>% arrange(-wt) %>% head
mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
---|---|---|---|---|---|---|---|---|---|---|
10.4 | 8 | 460.0 | 215 | 3.00 | 5.424 | 17.82 | 0 | 0 | 3 | 4 |
14.7 | 8 | 440.0 | 230 | 3.23 | 5.345 | 17.42 | 0 | 0 | 3 | 4 |
10.4 | 8 | 472.0 | 205 | 2.93 | 5.250 | 17.98 | 0 | 0 | 3 | 4 |
16.4 | 8 | 275.8 | 180 | 3.07 | 4.070 | 17.40 | 0 | 0 | 3 | 3 |
19.2 | 8 | 400.0 | 175 | 3.08 | 3.845 | 17.05 | 0 | 0 | 3 | 2 |
13.3 | 8 | 350.0 | 245 | 3.73 | 3.840 | 15.41 | 0 | 0 | 3 | 4 |
mtcars %>% plyr::arrange(-wt) %>% head
mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
---|---|---|---|---|---|---|---|---|---|---|
10.4 | 8 | 460.0 | 215 | 3.00 | 5.424 | 17.82 | 0 | 0 | 3 | 4 |
14.7 | 8 | 440.0 | 230 | 3.23 | 5.345 | 17.42 | 0 | 0 | 3 | 4 |
10.4 | 8 | 472.0 | 205 | 2.93 | 5.250 | 17.98 | 0 | 0 | 3 | 4 |
16.4 | 8 | 275.8 | 180 | 3.07 | 4.070 | 17.40 | 0 | 0 | 3 | 3 |
19.2 | 8 | 400.0 | 175 | 3.08 | 3.845 | 17.05 | 0 | 0 | 3 | 2 |
13.3 | 8 | 350.0 | 245 | 3.73 | 3.840 | 15.41 | 0 | 0 | 3 | 4 |
mtcars %>% count("gear")
## Warning: 'count' is deprecated.
## Use 'dplyr::count_' instead.
## See help("Deprecated") and help("plyr-deprecated").
gear | freq |
---|---|
3 | 15 |
4 | 12 |
5 | 5 |
mtcars %>% plyr::count("gear")
gear | freq |
---|---|
3 | 15 |
4 | 12 |
5 | 5 |
mtcars %>% rename(list(mpg = "miles_per_gallon")) %>% extract(1:2) %>% head
## Warning: 'rename' is deprecated.
## Use 'dplyr::rename_' instead.
## See help("Deprecated") and help("plyr-deprecated").
| | miles_per_gallon | cyl |
---|---|---|
Mazda RX4 | 21.0 | 6 |
Mazda RX4 Wag | 21.0 | 6 |
Datsun 710 | 22.8 | 4 |
Hornet 4 Drive | 21.4 | 6 |
Hornet Sportabout | 18.7 | 8 |
Valiant | 18.1 | 6 |
mtcars %>% plyr::rename(list(mpg = "miles_per_gallon")) %>% extract(1:2) %>% head
| | miles_per_gallon | cyl |
---|---|---|
Mazda RX4 | 21.0 | 6 |
Mazda RX4 Wag | 21.0 | 6 |
Datsun 710 | 22.8 | 4 |
Hornet 4 Drive | 21.4 | 6 |
Hornet Sportabout | 18.7 | 8 |
Valiant | 18.1 | 6 |
Last changed: 2015-05-27 11:36:17 UTC