Derek Jones recently discussed a possible future for the R
ecosystem in “StatsModels: the first nail in R’s coffin”.
This got me thinking on the future of CRAN
(which I consider vital to R
, and vital in distributing our work) in the era of super-popular meta-packages. Meta-packages are convenient, but they have a profoundly negative impact on the packages they exclude.
For example: tidyverse
advertises a popular R
universe where the vital package data.table
never existed.
And now tidymodels
is shaping up to be a popular universe where our own package vtreat
never existed, except possibly as a footnote to embed
.
Users currently (with some luck) discover packages like ours and then (because they trust CRAN
) feel able to try them. With popular walled gardens that becomes much less likely. It is one thing for a standard package to duplicate another package (it is actually hard to avoid, and how work legitimately competes), it is quite another for a big-brand meta-package to pre-pick winners (and losers).
All I can say is: please give vtreat
a chance and a try. It is a package for preparing messy real-world data for predictive modeling. In addition to re-coding high cardinality categorical variables (into what we call effect-codes after Cohen, or impact-codes), it deals with missing values, can be parallelized, can be run on databases, and has years of production experience baked in.
Some places to start with vtreat
:
Like this:
Like Loading…
Related