This dance, it’s like a weapon: Radiohead’s and Beck’s danceability, valence, popularity, and more from the LastFM and Spotify APIs

Giddy up, giddy it up

Wanna move into a fool’s gold room

With my pulse on the animal jewels

Of the rules that you choose to use to get loose

With the luminous moves

Bored of these limits, let me get, let me get it like

Wow!

When it comes to surreal lyrics and videos, I’m always thinking of Beck. Above, I cited the beginning of the song “Wow” from his latest album “Colors” which has received rather mixed reviews. In this post, I want to show you what I have done with Spotify’s API. We will also see how “Colors” compares to all the other albums by Beck on a range of the API’s measures. This is what we gonna do:

  • Set up the R packages to access the Spotify and Last.FM APIs
  • Get chart information on a specific user (in this case: me) from Last.FM
  • Get information on these top artists from Spotify
  • Visualize some of the information with ggplot2
  • Get information on all the songs by Beck and Radiohead
  • Again: Visualize this information

First, let’s set things up. If you want to download the RLastFM package (it’s not on CRAN), you can follow these instruction. If you need an account for the Spotify API, you can get one on this page. Get an account for the Last.FM API with this form.

library(spotifyr)library(RLastFM)library(dplyr)# Spotifys.id <- “< API ID >”s.sc <- “< API Secret”# LastFMl.ky <- “< API Key >”l.ap <- “http://ws.audioscrobbler.com/2.0/”Sys.setenv(SPOTIFY_CLIENT_ID = s.id)Sys.setenv(SPOTIFY_CLIENT_SECRET = s.sc)access_token <- get_spotify_access_token()I slightly adapted the user.getTopArtists() function in the RLastFM package because I wanted to use the “limit” parameter of the Last.FM API.user.getTopArt2 <- function (username, period = NA, key = l.ky, parse = TRUE, limit = 20) {  params = list(method = “user.gettopartists”, user = username,                 period = period, api_key = key, limit = limit)  params = params[!as.logical(lapply(params, is.na))]  ret = getForm(l.ap, .params = params)  doc = xmlParse(ret, asText = TRUE)  if (parse)     doc = RLastFM:::p.user.gettopartists(doc)  return(doc) }

Also, I adapted the function get_artist_audio_features() from the spotifyr package to override the dialog where I am required to type the name of the artist in interactive sessions.

get.art.audio.feats <- function (artist_name, access_token = get_spotify_access_token()) {  artists <- get_artists(artist_name)  selected_artist <- artists$artist_name[1]  artist_uri <- artists$artist_uri[artists$artist_name ==                                      selected_artist]  albums <- get_albums(artist_uri)  if (nrow(albums) > 0) {    albums <- select(albums, -c(base_album_name, base_album,                                 num_albums, num_base_albums, album_rank))  }  else {    stop(paste0(“Cannot find any albums for \””, selected_artist,                 “\” on Spotify”))  }  album_popularity <- get_album_popularity(albums)  tracks <- get_album_tracks(albums)  track_features <- get_track_audio_features(tracks)  track_popularity <- get_track_popularity(tracks)  tots <- albums %>% left_join(album_popularity, by = “album_uri”) %>%     left_join(tracks, by = “album_name”) %>% left_join(track_features,                                                        by = “track_uri”) %>% left_join(track_popularity, by = “track_uri”)  return(tots) }

Now, we getting the relevant data. First, we are accessing the current top 20 artists (of all time) of a specific Last.FM user.

top.arts <- as.data.frame(user.getTopArt2(“< user name >”, period = “overall”, limit = 20))head(top.arts)                     artist playcount                                 mbid rank1                  Menomena      1601 ad386705-fb8c-40ec-94d7-e690e079e979    12                      Beck      1594 a8baaa41-50f1-4f63-979e-717c14979dfb    23                 Radiohead      1556 a74b1b7f-71a5-4011-9441-d0b5e4122711    34                 PJ Harvey      1546 e795e03d-b5d5-4a5f-834d-162cfb308a2c    45                    Prince      1501 cdc0fff7-54cf-4052-a283-319b648670fd    56 Nick Cave & The Bad Seeds      1243 172e1f1a-504d-4488-b053-6344ba63e6d0    6Now, we are accessing the “artist audio features” for each one of the top 20 artists. I’m including a Sys.sleep(5) here, because I don’t want to access Spotify’s API too frequently. With 5 seconds between each artist, we should be on the safe side – I guess 2 seconds would be enough.

Please note that get.art.audio.feats() returns a tibble that contains one row for each of the artist’s tracks. That is why I have to aggregate the data by artist. Hence, the “audio features” of an artist are the mean values of all the tracks by this artist.

art.info.list <- list()for (i in 1:nrow(top.arts)) {  cat(i, “\n”)  new.art.info <- get.art.audio.feats(top.arts[i, “artist”])  new.art.info$artist <- top.arts[i, “artist”]  art.info.list[[length(art.info.list)+1]] <- new.art.info  Sys.sleep(5)}art.info <- do.call(“rbind”, art.info.list)agg.art.info <- aggregate(cbind(danceability, energy, speechiness,                                acousticness, instrumentalness, liveness,                                valence, tempo, track_popularity) ~ artist,                          data = art.info, FUN = mean)Now, let’s merge Last.FM’s playcount information with the information we got from Spotify’s API. We are also sorting the dataframe by my playcount on Last.FM.agg.art.info <- merge(agg.art.info, top.arts[,c(“playcount”, “rank”, “artist”)], by = “artist”, all.x = T, all.y = F)agg.art.info <- agg.art.info[order(agg.art.info$playcount, decreasing = T),]saveRDS(agg.art.info, file = “agg.art.info.Rds”)What we get is this:                      artist danceability    energy speechiness acousticness instrumentalness  liveness7                   Menomena    0.5397246 0.5475217  0.05414783    0.2678826       0.28483706 0.14512463                       Beck    0.6043865 0.6106874  0.06503006    0.2653964       0.20943021 0.187254013                 Radiohead    0.4095833 0.5755077  0.05504551    0.3131903       0.39465664 0.202622411                 PJ Harvey    0.5155636 0.4578315  0.09909212    0.4328224       0.13406858 0.173983612                    Prince    0.7078350 0.7295825  0.04883786    0.2464166       0.04265596 0.14054958  Nick Cave & The Bad Seeds    0.4147489 0.5152961  0.05414545    0.3465942       0.07602337 0.2952866     valence    tempo track_popularity playcount rank7  0.3084768 131.4496          5.84058      1601    13  0.4795632 115.6165         21.64417      1594    213 0.2858109 118.7648         47.62821      1556    311 0.4331848 118.7181         21.32727      1546    412 0.7672718 126.3280         25.81553      1501    58  0.3521593 119.5002         27.47619      1243    6Just a few quick notes concerning the audio features given by the Spotify API (I’m citing Spotify’s webpage here):

  • danceability: Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.

  • energy: Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.

  • speechiness: Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks.

  • acousticness: A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.

  • instrumentalness: Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly “vocal”. The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0.

  • liveness: Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live.

  • valence: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).

  • tempo: The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.

  • track_popularity: The popularity of a track is a value between 0 and 100, with 100 being the most popular. The popularity is calculated by algorithm and is based, in the most part, on the total number of plays the track has had and how recent those plays are.Generally speaking, songs that are being played a lot now will have a higher popularity than songs that were played a lot in the past. Duplicate tracks (e.g. the same track from a single and an album) are rated independently. 

I am only using a few of these features for visualizations here, namely danceability, valence, and track_popularity. Let’s start with a scatter plot. ggplot2 and the really great package “ggrepel” already do a good job here.library(ggplot2)library(ggrepel)library(tidyr)library(scales)library(RColorBrewer)data <- readRDS(“agg.art.info.Rds”)

ggplot(data, aes(x = danceability, y = playcount, size = track_popularity, col = valence)) +

  geom_point() +

  geom_label_repel(aes(label = artist)) +

  labs(x = “Danceability”, y = “Playcount”, size = “Mean popularity”,

       col = “Valence”, title = “Popularity, Valence, and Danceability”,

       subtitle = “TOP 20 Last.FM artists – with track information from Spotify”) +

  scale_color_continuous(low = “darkred”, high = “green”) +

  scale_size(range = c(2, 6)) +

  theme_minimal()

(please click to enlarge)

It’s quite a lot information in this plot. Prince – along with the not very popular Nikka Costa – is definitely the most danceable and most positive (in terms of emotional valence) in my top artists. There is no clear relationship between danceability and my playcount. Also, “Menomena”, my top artist, is really not popular on Spotify – I guess you can call that a hipster factor.

Now, I want to have a semantic profile plot for my top 5 artists. It’s a little bit of coding work before we get this.for (var in c(“tempo”, “track_popularity”, “playcount”)) {  data[, paste0(“norm.”, var)] <- rescale(data[, var], to = c(0, 1))}data.long <- gather(data, key = variable, value = value, c(“danceability”, “energy”,                    “speechiness”, “acousticness”, “instrumentalness”, “liveness”, “valence”))substr(data.long$variable, 1, 1) <- toupper(substr(data.long$variable, 1, 1))data.long$artist <- factor(data.long$artist, levels = data$artist)ggplot(data.long[data.long$rank %in% 1:5,], aes(y = value, x = variable, group = artist, col = artist)) +  geom_line(size = 2, lineend = “round”, alpha = .8) +  scale_color_manual(values = brewer.pal(5, “Spectral”)) +  scale_y_continuous(breaks = NULL) +  labs(x = “”, y = “”, col = “Artist”) +  coord_flip() +  theme_dark() + theme(axis.text.y = element_text(size = 15),

                       panel.grid.major.y = element_line(size = 1))

(please click to enlarge)

It’s still not optimal, but we get a good clue of the different profiles of the artists. We can read this plot horizontally to get a ranking of artists for one feature. But we can also read it vertically to get a clue of the overall distribution of features.

Now, let’s move on to a more detailed analysis of Beck and Radiohead. First, I am analyzing all the albums by Beck with their year of release on the x-axis, their popularity on the y-axis, danceability color-coded and valence coded by size.

beck <- as.data.frame(get.art.audio.feats(“Beck”))

beck.alb <- aggregate(cbind(danceability, energy, speechiness,

                            acousticness, instrumentalness, liveness,

                            valence, tempo, track_popularity) ~ album_name,

                      data = beck, FUN = mean)

beck.alb <- merge(beck.alb, unique(beck[, c(“album_name”, “album_release_year”, “album_popularity”)]), by = “album_name”, all.x = T, all.y = F)

beck.alb$album_release_year <- substr(beck.alb$album_release_year, 1, 4)

beck.alb$n.tracks <- sapply(beck.alb$album_name, USE.NAMES = F, FUN = function (x) table(beck$album_name)[x])

beck.alb[beck.alb$album_name == “One Foot In The Grave (Expanded Edition)”, “album_name”] <- “One Foot In The Grave”

beck.alb <- beck.alb[order(beck.alb$album_release_year),]

ggplot(beck.alb, aes(x = album_release_year, y = album_popularity,

                     label = album_name, size = valence,

                     color = danceability)) +

  geom_point() +

  geom_label_repel() +

  scale_color_continuous(low = “darkred”, high = “green”) +

  scale_size(range = c(3, 10)) +

  labs(x = “Release year”, size = “Valence”, y = “Album popularity”, color = “Danceability”,

       title = “Albums by Beck”, subtitle = “Values taken from Spotify API”) +

  theme_minimal()

(please click to enlarge)

Another thing I wanted to show you is some kind of “album profile” or – more precisely – an answer to the question whether there are tracks on albums that are very salient in terms of track popularity. We can easily do this with a bit of transformation and facet wrapping. I am coding track popularity with both the height of the bar and the color of the bar to get an impression of the popularities of the tracks compared to the overall dataset (because the axes are allowed to very freely for the facets).

beck %>% group_by(album_name) %>%

  mutate(track.no = 1:length(track_name)) %>%

  ungroup() %>% as.data.frame() -> beck

ggplot(beck, aes(x = factor(track.no), y = track_popularity, fill = track_popularity)) +

  geom_bar(stat = “identity”) +

  labs(x = “Track”, y = “Track popularity”, fill = “Track popularity”,

       title = “Popularity of tracks on albums by Beck”,

       subtitle = “Values taken from Spotify API”) +

  facet_wrap(~ album_name, scales = c(“free”))

(please click to enlarge)

See the rather high first bar for “Mellow Gold”? That’s “Loser”, still the most popular of Beck’s tracks with the rest of the tracks on “Mellow Gold” rather in “Guero” territory in terms of popularity. Also, “Colors” – Beck’s latest album – is rather balanced in terms of popularity.

And the same for Radiohead – only that I am excluding two albums (see in the code below) and am using the tempo feature for the coloring.

radiohead <- as.data.frame(get.art.audio.feats(“Radiohead”))

radiohead <- radiohead[!(radiohead$album_name %in% c(“OK Computer OKNOTOK 1997 2017”, “TKOL RMX 1234567”)),]

radiohead.alb <- aggregate(cbind(danceability, energy, speechiness,

                            acousticness, instrumentalness, liveness,

                            valence, tempo, track_popularity) ~ album_name,

                      data = radiohead, FUN = mean)

radiohead.alb <- merge(radiohead.alb, unique(radiohead[, c(“album_name”, “album_release_year”, “album_popularity”)]), by = “album_name”, all.x = T, all.y = F)

radiohead.alb$album_release_year <- substr(radiohead.alb$album_release_year, 1, 4)

radiohead.alb$n.tracks <- sapply(radiohead.alb$album_name, USE.NAMES = F, FUN = function (x) table(radiohead$album_name)[x])

radiohead.alb <- radiohead.alb[order(radiohead.alb$album_release_year),]

ggplot(radiohead.alb, aes(x = album_release_year, y = album_popularity,

                     label = album_name, size = valence,

                     color = tempo)) +

  geom_point() +

  geom_label_repel() +

  scale_color_continuous(low = “darkred”, high = “green”) +

  scale_size(range = c(3, 10)) +

  labs(x = “Release year”, size = “Valence”, y = “Album popularity”, color = “Tempo”,

       title = “Albums by Radiohead”, subtitle = “Values taken from Spotify API”) +

  theme_minimal()

It’s a shame how Disk 2 of “In Rainbows” compares to Disk 1 because it contains really great songs. Also, it’s interesting that “OK Computer” still not reaches the popularity of “Pablo Honey” – as the second plot suggests, this is because of the popularity of “Creep”. For the second plot, I am not double-coding track popularity with color but am using the valence feature for the color.

radiohead %>% group_by(album_name) %>%

  mutate(track.no = 1:length(track_name)) %>%

  ungroup() %>% as.data.frame() -> radiohead

ggplot(radiohead, aes(x = factor(track.no), y = track_popularity, fill = valence)) +

  geom_bar(stat = “identity”) +

  labs(x = “Track”, y = “Track popularity”, fill = “Valence”,

       title = “Popularity and valence of tracks on albums by Radiohead”,

       subtitle = “Values taken from Spotify API”) +

  facet_wrap(~ album_name, scales = c(“free”))

Here we see that this valence feature certainly has some problems. “Fitter Happier” is one of the most positive songs by Radiohead with a value of 0.74??? I beg to differ…

Related