Our R package roundup

And yet again, it’s that time of the year when one eats too much and gets in a reflective mood! 2016 is nearly over, and we bloggers here at opiateforthemass.es thought it would be nice to argue endlessly about which R package was the best/neatest/most fun/most useful/most whatever this year! By now, it’s become a tradition.

Like last year, we will each present our top pick for this year’s R package, a purely subjective list of packages we (and Chuck Norris) approve of.

But do not despair, dear reader! We have also pulled hard data on R package popularity from CRAN, and will present this first.

Let’s start with some factual data before we go into our personal favourites of 2016. We’ll pull the titles of the new 2016 R packages from cranberries. Our approach from last year no longer works, since rvest::read_html chokes on the <doi> tags in the page – they must have been added during this year. Instead, we read the RSS feed of Dirk’s page and go a little crazy on the grep. Always a healthy thing, I say. Any problem can be solved if you throw enough grep at it. Randall agrees.
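To give a flavour of the grep approach, here is a minimal sketch in base R. It is not the code we actually ran: the item titles below are hand-written stand-ins, and the wording "New package ... with initial version ..." is an assumption about how the feed phrases its entries.

```r
# Hypothetical RSS item titles; the wording of the real feed may differ.
titles <- c(
  "New package tibble with initial version 1.0",
  "Package dplyr updated to version 0.5.0 with previous version 0.4.3",
  "New package forcats with initial version 0.1.0"
)

# Keep only the announcements of brand-new packages ...
is_new <- grepl("^New package ", titles)

# ... and capture the package name, i.e. the first token after "New package".
new_packages <- sub("^New package ([^ ]+) with initial version.*$", "\\1",
                    titles[is_new])

new_packages
```

Updates to existing packages are dropped by the first pattern, so only first releases end up in the list.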

As mentioned last year, using downloads per day as a ranking metric could have the problem that earlier package releases have had more time to create a buzz and shift up the average downloads per day, skewing the data in favour of older releases. Or, it could have the complication that younger package releases are still on the early “hump” part of the downloads curve (let’s assume downloads over time look roughly log-normal, a quick rise followed by a long, slowly decaying tail, as most of these things do), thus skewing the data in favour of younger releases. We still don’t know, since we still haven’t bothered to dig deeper into this topic. Perhaps we’ll manage next year, but I’m not going to make any New Year’s resolutions…

For now, let’s again assume that average downloads per day is a relatively stable metric to gauge package success with. With some quick dplyr and ggplot magic, these are the top 20 new CRAN packages from 2016, by average number of daily downloads:
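The ranking step itself is short. The sketch below shows the idea with dplyr on a made-up download log; the package names, dates and counts are invented for illustration, and the real analysis (see the code on github) works on the actual CRAN logs.

```r
library(dplyr)

# Made-up download log: one row per package and day (not real CRAN numbers).
downloads <- data.frame(
  package = c("pkgA", "pkgA", "pkgA", "pkgB", "pkgB"),
  date    = as.Date(c("2016-12-01", "2016-12-02", "2016-12-03",
                      "2016-12-02", "2016-12-03")),
  count   = c(100, 200, 300, 500, 700),
  stringsAsFactors = FALSE
)

# Average downloads per day for each package, highest first, top 20 only.
top_packages <- downloads %>%
  group_by(package) %>%
  summarise(avg_daily = mean(count)) %>%
  arrange(desc(avg_daily)) %>%
  head(20)

top_packages
```

Note that taking the mean over the rows assumes the log has one row for every day since release; days without any downloads would have to be filled in with zeros first.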

The full code and data are available on github, of course.

As we can see, the main bias does not come from our choice of ranking metric, but from the fact that some packages are more “under the hood” and are pulled in by many packages as dependencies, thus inflating the download statistics.

tibble is Hadley’s re-take of the venerable data.frame and is pulled in as a dependency by practically all of his packages. It had long been part of dplyr, where it was known as tbl, and has now got its own package to keep things clean and isolated. tidyverse is a meta-package containing all of Hadley’s “tidy data” packages, and allows for easier loading of these. We thought about picking these as our top packages, but we settled for naming them “honorable, because obvious mentions”. We want to put some focus on the not-so-universally-known packages as well!

The packages (sourcetools, hms, plogr, backports) are mainly technical packages, racking up downloads by being pulled in as dependencies by popular packages or by being helpful in running R code “deeply”. ModelMetrics provides fast metrics for evaluating models, implemented with Rcpp. rsconnect and miniUI are packages used together with shiny, hence their appearance high in the list.

Excluding these, the clear winners among the “frontline” packages are rprojroot, which lets you define your project root directory by matching a certain criterion (for instance, a certain file is contained in that directory), forcats, which makes working with categorical variables easier, and modelr, which enhances the pipe workflow established by magrittr. I must admit that I had not known of the rprojroot package, probably because I’m a stickler for clean directory setups.
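For those who, like me, had not seen rprojroot before, here is a small sketch of the "root by criterion" idea. The throwaway project and the marker file name demo.marker are made up for this example; in a real project you would match on something like a DESCRIPTION file or an .Rproj file instead.

```r
library(rprojroot)

# Build a throwaway project: a root with a marker file and a nested folder.
root <- file.path(tempdir(), "demo-project")
dir.create(file.path(root, "analysis", "scripts"),
           recursive = TRUE, showWarnings = FALSE)
file.create(file.path(root, "demo.marker"))  # hypothetical marker file name

# Criterion: the root is the closest parent directory containing "demo.marker".
criterion <- has_file("demo.marker")

# Start deep inside the project and walk upwards until the criterion matches.
found <- find_root(criterion, path = file.path(root, "analysis", "scripts"))

found
```

The point is that the same call works no matter which subdirectory a script is run from, so file paths can always be built relative to the project root.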

An interesting find is ggrepel, as it’s the only “gg-package” (of 23 in total) that made it into the top 20. Since it became possible to build extensions for ggplot2, there has been an explosion of specialised and fantastic packages to build even more, and better, graphs in R. In fact, the new version of ggplot2 should be listed as our top “new” package for 2016, if we counted new major versions as new packages. Also this year, RStudio finally got to its 1.0.0 version, with many cool new features.

But now, this blog’s authors’ personal top four of new R packages for 2016:

(safferli’s pick)

bookdown is a specialised markdown package by the venerable Yihui, of knitr fame. It’s still early in development, but already very usable in practice. It’s used to build actual books from Rmarkdown documents. It can build gitbook, pdf, mobile, and html books, all from a single markdown source.

Being a huge LaTeX fan, I’m a little sad that, with me working more with data in R and with the advent of knitr and now bookdown, I actually never get to use it directly any more. For extremely specialised (and static) books or reports I would probably still move back to LaTeX, but the power and flexibility of Rmarkdown to build dynamic html reports outweighs the detailed configuration possibilities and pure beauty of direct LaTeX. Especially if you are using data in your reports.

A big thumbs up for this package. If you’re looking to write long-ish documentation or a report, currently nothing beats gitbook, and bookdown gives you the power of R and Rmarkdown to work with.

(Yuki’s pick)

Yuki focuses more on Python now, and will continue blogging on the blog about Python-stuff and data science topics. To honor this, we picked feather for Yuki, a package that makes storing dataframes to disk lightning fast and shareable between R and Python.
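A round trip through a feather file is about as simple as it gets. This is a minimal sketch using the built-in mtcars data and a temporary file; the Python line in the comment is only a pointer to the counterpart on that side and is not run here.

```r
library(feather)

# Write a data frame to a temporary feather file ...
path <- tempfile(fileext = ".feather")
write_feather(mtcars, path)

# ... and read it back; the result comes back as a tibble.
cars_back <- read_feather(path)

# On the Python side the same file can be opened with pandas,
# e.g. pandas.read_feather(path).
dim(cars_back)
```

One thing to keep in mind is that row names are not stored, so the car names in mtcars would need to be copied into a regular column first if you want to keep them.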

(Kirill’s pick)

I discovered my choice of the year quite late (2 weeks ago), so maybe I am biased, because I am still amazed by it. I was building interactive graphs with highcharter and wanted to share the results with my colleagues. Since I had a couple of charts, I thought I would make an Rmarkdown document and insert them there. That’s where I discovered flexdashboard. It is basically an Rmarkdown format for making static dashboards. Just build some interactive graphs with any htmlwidget-based package (or any package you like), arrange them easily with flexdashboard, et voilà, you have an easy static dashboard. I even realized that in many cases, when people want to share interactive graphs with others, the go-to tool is shiny, but this is often overengineering if all it does is filter the data by date or category. The beauty of flexdashboard is that you get a single standalone html file, which doesn’t need server-side logic or any complex stuff in the backend.

(Jess’s pick)

I spent 2016 discovering the beauty of shinydashboard, building quite a few dashboards with many many tabs and many many more plots. Eventually I wanted the values to be displayed when hovering over a point with the mouse. This is when ggiraph came to the rescue, a package written by David Gohel that extends the ggplot functionality and makes it easy to add interactivity to ggplots. Take a look at the website and browse through the examples to see what ggiraph is capable of. Super easy to use with plain ggplot as well as with shiny!
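Here is a minimal sketch of the hover tooltips, using mtcars as stand-in data. It is written against a recent ggiraph, where the widget is built with girafe(); as far as I remember, the 2016 release used a function called ggiraph() for that step, so treat the last line as version-dependent.

```r
library(ggplot2)
library(ggiraph)

# Stand-in data: mtcars with the car names as a regular column.
cars <- mtcars
cars$model <- rownames(mtcars)

# Same as geom_point(), plus a tooltip and an id for hover highlighting.
p <- ggplot(cars, aes(x = wt, y = mpg)) +
  geom_point_interactive(aes(tooltip = model, data_id = model), size = 3)

# Turn the ggplot into an htmlwidget; printing it opens the interactive plot.
widget <- girafe(ggobj = p)
```

Since the result is an htmlwidget, it can be dropped into an Rmarkdown document or a shiny app like any other widget.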