Your and my 2019 R goals

Here we go again, using a Twitter trend as blog fodder! Colin Fay launched an inspiring movement by sharing his R goals for 2019.

It’s been quite interesting reading the objectives of other tweeps: what they want to learn and make, how they want to get involved in the community, etc. As Mike Kearney, rtweet’s maintainer, underlined, it is excellent reading material!

… but also blogging material! Let me fetch and tokenize these tweets to summarize them!

Disclaimer: I later saw that Jason Baik had the same idea and was faster than I was; find his analysis here.

If you’re using rtweet for the first time, check out its website for information about setup and usage, and also refer to the Twitter API docs themselves to learn more about rate limits and, e.g., the fact that the search endpoint won’t return tweets older than 6-9 days.
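If you haven’t set up API access yet, the one-time token creation looks roughly like this — a sketch using rtweet’s create_token(), where the app name and all keys are placeholders for your own:

```r
# One-time setup: register your Twitter app's credentials with rtweet
# (all values below are placeholders, not real credentials)
token <- rtweet::create_token(
  app = "my_r_app",
  consumer_key = "XXXX",
  consumer_secret = "XXXX",
  access_token = "XXXX",
  access_secret = "XXXX"
)
```

With credentials in place, the search itself is short: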

```r
tweets <- rtweet::search_tweets("Rstats goals 2019",
                                include_rts = FALSE)
```

I obtained 87 tweets from 85 unique users. Definitely not big data, but not bad!
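For reference, those counts can be read off the returned data frame directly; screen_name is one of the columns rtweet provides (a quick illustrative check, not from the original analysis):

```r
# How many tweets, and how many distinct tweet authors?
nrow(tweets)
dplyr::n_distinct(tweets$screen_name)
```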

I then set out to tokenize the tweets into words using the specific tokenizers::tokenize_tweets() tokenizer via the tidytext package. If you’re new to tidytext I’d recommend reading the book written by its authors. A token in natural language processing can be a word, a line, etc., which is a totally different concept from a token for rtweet functions (your API credentials).

The tweet tokenization is a “tokenization by word that preserves usernames, hashtags, and URLS”. So awesome, and today is the first time I’ve found an occasion to use it! I also removed stopwords.
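To see what this tokenizer does, here’s a toy call of mine (an illustrative example, not from the original post):

```r
# Hashtags, usernames and URLs survive as single tokens
tokenizers::tokenize_tweets(
  "Learning #rstats with @someone, see https://www.r-project.org"
)
```

And here’s the actual pipeline: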

```r
library("magrittr")

stopwords <- rcorpora::corpora("words/stopwords/en")$stopWords

tokens <- tweets %>%
  dplyr::select(text) %>%
  tidytext::unnest_tokens(token, text,
                          token = "tweets",
                          drop = FALSE) %>%
  dplyr::filter(!token %in% stopwords)
```

Most mentioned topics

I was first able to draw a figure similar to Jason Baik’s, showing the most common tokens. Like him, I removed digits.

```r
library("ggalt")

tokens %>%
  # strip non-ASCII characters (mostly emojis), then digits
  dplyr::mutate(token = stringr::str_remove_all(token, "[^\x01-\x7F]")) %>%
  dplyr::mutate(token = stringr::str_remove_all(token, "[[:digit:]]")) %>%
  dplyr::filter(!token %in% c("", "#rstats", "goals")) %>%
  dplyr::count(token, sort = TRUE) %>%
  dplyr::mutate(token = reorder(token, n)) %>%
  head(n = 18) %>%
  ggplot() +
  geom_lollipop(aes(token, n),
                size = 1.5, col = "salmon") +
  hrbrthemes::theme_ipsum(base_size = 12,
                          axis_title_size = 12) +
  coord_flip()
```

In this figure I identify verbs like learn, finish, write, build and contribute. Let me look at a sample of lines for each of them.

```r
# This time, tokenize tweets into lines
lines <- tweets %>%
  dplyr::select(text) %>%
  tidytext::unnest_tokens(line, text,
                          token = "lines")

# Sample 3 lines containing a given verb
sample_verb <- function(verb, lines){
  set.seed(42)
  dplyr::filter(lines, stringr::str_detect(line, paste0(verb, " "))) %>%
    dplyr::sample_n(3)
}

samples <- purrr::map_df(c("learn", "finish", "write", "build", "contribute"),
                         sample_verb, lines)

knitr::kable(samples)
```

| line |
|:-----|
| 3️⃣ learn how to make r packages and write my code so it could be made into an r package more easily |
| 1. learn how to do spatial analysis in r |
| 2️⃣ learn better way to automate feature engineering (neural nets) for text |
| 1️⃣ finally finish all the courses and certifications i started last year on #coursera and #datacamp |
| 2️⃣ finish my track and field r package |
| 3). finish that text mining project i started in october |
| 4️⃣ write an advanced shiny book with bookdown |
| 3️⃣ learn how to make r packages and write my code so it could be made into an r package more easily |
| 1️⃣ write the htmlwidgets book |
| – build my first #rstats package (aiming for 2 but 1 would be great :d) |
| 2️⃣ build a shiny web app to explore tx staar data |
| – use f(x) regularly & build own package. cease patching. |
| 5️⃣ contribute to foss https://t.co/oh7mwcq50r |
| 2 contribute more to #rstats community through #scicomm, #stackoverflow, etc |
| 3) contribute to #swdchallenge (with r, duh) |
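A small design note on the sampling code above: set.seed(42) inside sample_verb resets the seed for every verb. If you’d rather seed once and still get reproducible (but verb-independent) draws, a variant could look like this (my sketch, not the original post’s code):

```r
# Seed once, then sample 3 matching lines per verb
set.seed(42)
samples <- purrr::map_df(
  c("learn", "finish", "write", "build", "contribute"),
  function(verb) {
    lines %>%
      dplyr::filter(stringr::str_detect(line, paste0(verb, " "))) %>%
      dplyr::sample_n(3)
  }
)
```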

These actions are quite varied, e.g. writing is applied to software as well as reading material. My goal was to summarize tweets, but I keep thinking reading all of them is interesting!

Packages?

I wondered how many of the tokens correspond to a package name. I limited myself to CRAN packages, by using the available.packages() function, but one could have a look at the source code of the available package to get an idea of how to find names of packages from Bioconductor and GitHub.

```r
# Names of all packages currently on CRAN
cran_pkgs <- as.character(
  available.packages(contrib.url('https://cran.r-project.org', 'source'))[, "Package"])

# Keep tokens that match a CRAN package name, hashtags stripped
pkg_tokens <- dplyr::mutate(tokens,
                            token = gsub("#", "", token)) %>%
  dplyr::filter(token %in% cran_pkgs)
```
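As an aside, Bioconductor repositories follow the same CRAN-style layout, so available.packages() could in principle be pointed at them too (a sketch under that assumption; the URL is mine, not from the post):

```r
# Bioconductor's release repo also serves a CRAN-style PACKAGES index
bioc_repo <- "https://bioconductor.org/packages/release/bioc"
bioc_pkgs <- rownames(
  available.packages(contrib.url(bioc_repo, "source"))
)
```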

Using the data, I’ll look at the tweets mentioning the most packages, and at the most frequent packages.

```r
pkg_tokens %>%
  dplyr::group_by(text) %>%
  dplyr::mutate(pkg_text = paste(toString(token), text)) %>%
  dplyr::count(pkg_text, sort = TRUE) %>%
  head(n = 3) %>%
  dplyr::pull(pkg_text)
```

```r
## [1] "portfolio, blogdown, rmarkdown, knitr, shiny, maps I really like seeing all these #rstats 2019 goals. My own, in order of urgency:\n1) Finish my personal website and online portfolio using blogdown\n2) Get rolling with project workflows, rmarkdown, and knitr \n3) Create shiny apps for custom interactive maps"
## [2] "inference, projects, import, rvest, httr, xml2 My #rstats 2019 goals:\n1. Improve my statistical modeling and inference skills\n2. Develop business literacy and apply it in data analysis projects\n3. Continue to post on my blog (1 post every 2 months)\n4. Learn to import data using DBI, rvest, httr, and xml2"
## [3] "shiny, templates, shiny, shiny, bookdown, shiny My #RStats goals for 2019: \n\n1 Improve shinydashboardPlus, bs4Dash and argonDash .. \n\n2 Release new shiny templates \n3 Open a consulting service for https://t.co/k3PAbxyVMa about shiny \n4 Write an advanced shiny book with bookdown \n#rstats #shiny #consulting https://t.co/Fyc7MhaeW8"
```

There are false positives, e.g. projects was here meant as a word, not a package name; a hedged idea for reducing them is sketched below.
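For instance, one could drop matches that are also everyday English words, at the cost of losing genuinely ambiguous package names (a hypothetical filter of mine, assuming rcorpora ships a “words/common” word list):

```r
# Hypothetical filter: discard package-name matches that are
# also common English words (would also drop real hits like "maps")
common_words <- rcorpora::corpora("words/common")$commonWords
pkg_tokens_strict <- dplyr::filter(pkg_tokens, !token %in% common_words)
```

What about the most popular packages among the tweets?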

```r
dplyr::count(pkg_tokens, token, sort = TRUE)
```

```r
## # A tibble: 66 x 2
##    token         n
##    <chr>     <int>
##  1 shiny        16
##  2 blogdown      9
##  3 projects      9
##  4 rmarkdown     9
##  5 tidyverse     6
##  6 bookdown      5
##  7 purrr         4
##  8 track         4
##  9 caret         3
## 10 markdown      3
## # ... with 56 more rows
```

In this table, we get a glimpse at currently popular packages, apart from “projects”, “track” and “markdown”. If I’m reading the list correctly, they’re all developed at RStudio!

In this post I followed an approach similar to Jason Baik’s to summarize tweets about 2019 R goals announced on Twitter: I collected tweets with rtweet and then used tidytext and the tidyverse to summarize them. Goals often included learning about stuff, building packages (find my list of resources and don’t miss this offer by Steph de Silva), and mentions of RStudio packages.

What about my own R goals, which I haven’t tweeted? I have not made any list, but I have exciting projects at work, and hope to keep semi-consistently posting on this blog. In January I’ll also get to start 2019 by giving two R talks, one at R-Ladies Paris and a remote one at ConectaR 2019! Happy 2019, I hope you can meet your own R goals!
