Advent of Code: Most Popular Languages

You might have heard of the Advent of Code, a 25-day challenge involving a programming puzzle a day, to be solved with the language of your choice. I’ve noted the popularity of this activity in my Twitter timeline but also in my GitHub timeline, where I’ve seen the creation of a few repositories named advent-of-code or similar.

If I were to participate one year, I’d probably use R. A tweet by Jenny Bryan inspired me to try and gauge the popularity of languages used in the Advent of Code. To do that, in this post, I shall use the search endpoint of the GitHub V3 API to identify Advent of Code 2018 repos.

Study design 😉

GitHub’s V3 API offers a search endpoint that, however, gives you fewer results than the same search via the web interface, even when using pagination right (or at least, when believing I use pagination right!). I’m nonetheless willing to use that sub-sample as the basis for my study of language popularity. It’s actually a sub-sub-sample, since I’m only looking at Advent of Code projects published on GitHub.
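A minimal sketch of the page arithmetic behind such a paginated search. It assumes, from GitHub’s REST API documentation rather than from this post, that pages hold 30 results by default (up to 100 when per_page is set) and that a single search exposes at most 1000 results. The helper n_pages and the total of 2500 are hypothetical.

```r
# Number of pages needed to walk through the reachable search results.
# Assumptions (GitHub REST API docs, not this post): per_page defaults
# to 30, can be raised to 100, and one search exposes at most 1000 results.
n_pages <- function(total_count, per_page = 30, max_results = 1000) {
  reachable <- min(total_count, max_results)
  ceiling(reachable / per_page)
}

n_pages(2500)                  # 34 pages of 30 results
n_pages(2500, per_page = 100)  # 10 pages of 100 results
```

Under these assumptions, any search with more than 1000 hits can only be sampled through the API, whatever the page size.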

In order to circumvent the sub-sub-sampling a bit, I’ll do the search in two steps:

Searching for Advent of Code 2018 in general among repos, and extracting the language of the repos.

Searching for Advent of Code 2018 for each of these languages separately and extracting the total count of hits.

Note that I am not filtering the repos by activity, so some of them could very well have been created for a few days only. If they are empty though, they do not get assigned a language.

Regarding the language of repos, GitHub assigns a language to each repository. This information can be wrong, which is mentioned e.g. in rOpenSci’s development guide. Furthermore, my using this piece of information means I’m disregarding the fact that some people actually use a mix of technologies to solve the puzzles.

Actual queries

I first defined a function to search the API whilst respecting the rate limiting. I even erred on the side of caution and queried very slowly.

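# Search repositories matching "adventofcode 2018", one page of results at a time
.search <- function(page){
  gh::gh("GET /search/repositories",
         q = "adventofcode 2018",
         page = page,
         fork = FALSE)
}

# Rate-limited version: at most 10 calls per 60 seconds
search <- ratelimitr::limit_rate(.search,
                                 ratelimitr::rate(10, 60))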
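To make the meaning of rate(10, 60) concrete, here is a minimal base R sketch of a sliding-window limit of n calls per period. It is an illustration only, not ratelimitr’s actual implementation, and wait_time is a hypothetical helper.

```r
# Hypothetical helper: seconds to wait before another call is allowed,
# given the timestamps (in seconds) of earlier calls and the current time,
# for a limit of n calls within any window of `period` seconds.
wait_time <- function(call_times, now, n = 10, period = 60) {
  # Calls that still count against the current window
  recent <- call_times[call_times > now - period]
  if (length(recent) < n) {
    return(0)
  }
  # The n-th most recent call must leave the window first
  nth_most_recent <- sort(recent, decreasing = TRUE)[n]
  nth_most_recent + period - now
}

wait_time(numeric(0), now = 0)  # 0: nothing called yet
wait_time(1:10, now = 10)       # 51: ten calls already made in the window
wait_time(1:10, now = 61)       # 0: the call made at t = 1 has expired
```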

I then wrote two other functions to help me rectangle the API output for each repository.

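# Replace a missing (NULL) field with an empty string
empty_null <- function(x){
  if(is.null(x)){
    ""
  }else{
    x
  }
}

# Turn one repository item into a one-row tibble
rectangle <- function(item){
  tibble::tibble(full_name = item$full_name,
                 language = empty_null(item$language))
}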
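As a quick illustration of the rectangling step, the sketch below applies the same idea to two mock items, one of them without a language. The item values are made up, and rectangle_base is a hypothetical base R stand-in for the tibble-based function above, so that the example runs without extra packages.

```r
# Mock items shaped like entries of the `items` field of a search response
# (made-up values; only the two fields used in this post)
items <- list(
  list(full_name = "someuser/aoc-2018", language = "R"),
  list(full_name = "otheruser/adventofcode", language = NULL)
)

# Replace a missing (NULL) field with an empty string
empty_null <- function(x) if (is.null(x)) "" else x

# Base R stand-in for rectangle(): one row per repository
rectangle_base <- function(item) {
  data.frame(full_name = item$full_name,
             language = empty_null(item$language),
             stringsAsFactors = FALSE)
}

repos <- do.call(rbind, lapply(items, rectangle_base))
repos$language  # "R" and "" (empty string for the repo without a language)
```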

I created a function putting these two pieces together.

get_page <- function(page){
  results <- try(search(page), silent = TRUE)

  # an early return if the request failed
  if(inherits(results, "try-error")){
    return(NULL)
  }

  # one row per repository on this page
  purrr::map_df(results$items,
                rectangle)
}
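The early-return pattern can be tried without touching the API. In this sketch, fetch is a hypothetical function that fails on purpose for one page; the failed page returns NULL and simply drops out when the pages are bound together (here with base R’s rbind rather than purrr::map_df).

```r
# Hypothetical stand-in for an API call that fails for page 2
fetch <- function(page) {
  if (page == 2) stop("simulated API failure")
  data.frame(page = page, stringsAsFactors = FALSE)
}

# Same pattern as get_page(): catch the error and return NULL instead
safe_fetch <- function(page) {
  res <- try(fetch(page), silent = TRUE)
  if (inherits(res, "try-error")) {
    return(NULL)
  }
  res
}

# NULL results are ignored when binding rows
pages <- do.call(rbind, lapply(1:3, safe_fetch))
pages$page  # 1 and 3
```

The price of this convenience is that failed pages disappear silently, so the final table may hold fewer repos than expected.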

And I then ran the following pipeline.

total_count <- search(1)$total_count
pages <- 1:(ceiling(total_count/100))

results <- purrr::map_df(pages, get_page)
results <- unique(results)

languages <- unique(results$language)
languages <- languages[languages != ""]

This got me 814 repos, with 46 non-empty languages. Repo names are quite varied: rdmueller/aoc-2018, petertseng/adventofcode-rb-2018, NiXXeD/adventofcode, Arxcis/adventofcode2018, Stupremee/adventofcode-2018, phaazon/advent-of-code-2k18.

With that information obtained, I was able to run a query by language.

# Count the repositories matching the search for one language
.get_one_language_count <- function(language){
  count <- gh::gh("GET /search/repositories",
                  q = glue::glue("adventofcode 2018&language:{language}"),
                  fork = FALSE)$total_count
  tibble::tibble(language = language,
                 count = count)
}

# Rate-limited version: at most 10 calls per 60 seconds
get_one_language_count <- ratelimitr::limit_rate(.get_one_language_count,
                                                 ratelimitr::rate(10, 60))

counts <- purrr::map_df(languages,
                        get_one_language_count)
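GitHub’s search syntax documentation writes qualifiers such as language: after the keywords, separated by a space, and puts quotes around values that contain spaces. Below is a minimal sketch of building a query string in that form, as an alternative to the & used above. build_query is a hypothetical helper, and whether both forms return the same counts has not been checked here.

```r
# Hypothetical helper (not in the post): build a search string with a
# language qualifier, quoting language names that contain spaces.
build_query <- function(language, keywords = "adventofcode 2018") {
  if (grepl(" ", language, fixed = TRUE)) {
    language <- paste0('"', language, '"')
  }
  paste0(keywords, " language:", language)
}

build_query("Rust")              # adventofcode 2018 language:Rust
build_query("Jupyter Notebook")  # adventofcode 2018 language:"Jupyter Notebook"
```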

In total, the counts table contains information about 2080 repositories, a bit less than half the number of Advent of Code 2018 repositories I’d find via the web interface.

I’ll concentrate on the 15 most popular languages in the sample, which automatically excludes R with… 8 repositories only.
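The two summaries used here, the total number of repositories and the most popular languages, boil down to a sum and a sort. A minimal base R sketch on made-up counts (toy_counts is not the post’s data, and top_n_languages is a hypothetical helper):

```r
# Made-up counts for illustration only (not the post's data)
toy_counts <- data.frame(
  language = c("Python", "R", "Rust", "Go"),
  count = c(40, 2, 15, 9),
  stringsAsFactors = FALSE
)

# Total number of repositories covered by the per-language counts
total <- sum(toy_counts$count)

# Keep the n languages with the highest counts
top_n_languages <- function(counts, n) {
  head(counts[order(counts$count, decreasing = TRUE), ], n)
}

top_n_languages(toy_counts, 3)$language  # "Python" "Rust" "Go"
```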

library("ggplot2")
library("ggalt")
library("hrbrthemes")
library("magrittr")

counts %>%
  dplyr::arrange(- count) %>%
  head(n = 15) %>%
  dplyr::mutate(language = reorder(language, count)) %>%
  ggplot() +
  geom_lollipop(aes(language, count),
                size = 2, col = "salmon") +
  hrbrthemes::theme_ipsum(base_size = 16,
                          axis_title_size = 16) +
  coord_flip() +
  ggtitle("Advent of Code Languages",
          subtitle = "Among a sample of GitHub repositories, with language information from GitHub linguist")

The results are not surprising after reading e.g. the insights from Stack Overflow’s 2018 survey, although the Python domination is crazy! I find it interesting to reflect on the fact that Jenny Bryan says that the challenge is best for C or C++, which are not the most popular languages in this sample… but still more popular than R, ok.

In this post I used the GitHub V3 API to get a glimpse of the popularity of languages used to solve the Advent of Code. Further work could include looking at the completion of the challenge by language, potentially using the GitHub activity of each repo as an (imperfect) proxy.

I do not take part in the challenge myself, my main Advent-specific activity being instead the lazy and delightful watching of the Swedish TV channel SVT Adventskalender for kids. Incidentally, this year’s storyline includes a Christmas competition, which however features competitive eating of saffron buns and gingerbread house building rather than programming puzzles… Are you participating in Advent of Code this year? If so, with which language and why?
