Can You Read My Mind? Analyzing The Killers’ Discography with NLP

The Killers have released five full-length albums since 2004, starting with the euro-influenced Hot Fuss and, most recently, Wonderful Wonderful, which leans more toward heartland rock. I’ve been a huge fan of The Killers over the years, though less so of the latest album. Nevertheless, it felt like a good opportunity to branch Data Meets Media out into music lyric analysis.

For this article, we aim to do two things:

  1. Word frequency analysis per album

  2. Lyrics embedding analysis

Word Frequency Analysis

We check which words appear most frequently across the songs of each album. We do not count repeated lyrics within the same song: multiple occurrences of the word ‘baby’ in a song are counted only once. The frequency of a word x therefore refers to the number of songs in which x appears. We pass all the words through a stop word filter (NLTK’s English stop word list) and consider only nouns in our analysis.
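As a rough illustration of this counting rule, here is a minimal sketch. It is not the code behind the visualization: the tiny stop word set stands in for NLTK’s English list, the noun filter is left out for brevity, and the song titles and lyrics are made-up placeholders.

```python
import re
from collections import Counter

# Tiny stand-in for NLTK's English stop word list (assumption for this sketch).
STOP_WORDS = {"a", "an", "and", "the", "i", "you", "my", "me",
              "it", "is", "in", "of", "to", "on"}

def song_frequency(songs):
    """For each word, count the number of songs it appears in."""
    counts = Counter()
    for lyrics in songs.values():
        tokens = re.findall(r"[a-z']+", lyrics.lower())
        # Converting to a set collapses repeats within a single song.
        unique_words = set(tokens) - STOP_WORDS
        counts.update(unique_words)
    return counts

# Hypothetical album with placeholder lyrics (not real Killers lyrics).
album = {
    "Song A": "baby baby the night is on my mind",
    "Song B": "my heart and my mind",
    "Song C": "baby it is the night",
}

freq = song_frequency(album)
top_words = freq.most_common(3)
```

Here ‘baby’ appears twice in Song A but contributes only one count for that song, so its frequency across the album is 2.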

Below is a visualization of the top 10 words per album. The size of each word is proportional to its frequency in the album’s songs.

Interestingly, the top words in Hot Fuss and Day and Age are the most similar to each other, as are those in Sam’s Town and Battle Born. Personally, I feel like Hot Fuss and Day and Age are quite similar since they both contain dance-y tunes, whereas Sam’s Town and Battle Born steer more towards rock. The first pair talks more about time, mind, and tonight, while the second pair talks about heart and the world. Wonderful Wonderful focuses more on man and money, quite different from the previous records.

Lyrics Embedding Analysis

Next, let’s look at the semantic similarity of the songs and albums. For this, we’ll be using the 300-dimensional slim Word2vec model.

For those of you who aren’t familiar with Word2vec, it is a type of “embedding” model whose goal is to transform words into N-dimensional dense vectors. This representation has been found to retain the semantics and associations of the words, so arithmetic on these vectors tends to preserve analogies. For example, adding the vectors of “king” and “woman” and subtracting the vector of “man” gives a vector whose nearest neighbor is typically “queen”.
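To make the analogy concrete, here is a small sketch with hand-made 3-dimensional toy vectors. These numbers are invented for illustration and are not taken from any trained Word2vec model; a real model would have hundreds of dimensions and a large vocabulary.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors (assumption): dimensions loosely mean royalty, maleness, femaleness.
toy = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}

# king + woman - man, computed element by element.
target = [k + w - m for k, w, m in zip(toy["king"], toy["woman"], toy["man"])]

# Exclude the query words themselves, as is common practice for analogy lookups.
candidates = {w: v for w, v in toy.items() if w not in {"king", "woman", "man"}}
best = max(candidates, key=lambda w: cosine(target, candidates[w]))
```

With these toy numbers, `best` comes out as "queen", since the resulting vector points almost exactly in the direction of the “queen” vector.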

Word2vec takes a word as input, but what we actually want is a “song” embedding that converts a whole song into a dense vector. In this case, we will simply average the vectors of the words in a song to get a “song” vector. Personally, I find this method a bit crude, since averaging discards a great deal of information (word order, for one), but I haven’t really found a better way of going from word to song. There is an embedding called Doc2vec which transforms documents to vectors, but considering our small sample set and the fact that we would have to train our own Doc2vec model, I think averaging the word vectors is our current best bet.
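The averaging step can be sketched as follows. The `word_vectors` dictionary is a hypothetical stand-in for a loaded Word2vec model, the 2-dimensional vectors are toy values, and skipping out-of-vocabulary words is my own assumption about how to handle missing entries.

```python
def song_vector(lyrics, word_vectors):
    """Average the vectors of all in-vocabulary words in a song.

    Returns None when no word of the song is in the vocabulary.
    """
    tokens = lyrics.lower().split()
    # Words missing from the model are skipped (assumption for this sketch).
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]

# Toy 2-dimensional "model" (hypothetical values, not real embeddings).
toy_vectors = {
    "heart": [1.0, 0.0],
    "world": [0.0, 1.0],
    "night": [1.0, 1.0],
}

vec = song_vector("heart of the world", toy_vectors)  # averages "heart" and "world"
empty = song_vector("la la la", toy_vectors)          # no known words
```

Note that "heart of the world" and "world of the heart" would produce the same vector, which is exactly the kind of information loss mentioned above.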

OK, given “song” vectors, which are 300-dimensional, we have to find a way to project them onto a 2-dimensional space for our viewing pleasure. For this, we will be using a nonlinear dimensionality reduction technique called t-SNE (t-distributed stochastic neighbor embedding).
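A projection like this can be sketched with scikit-learn’s `TSNE`. This is an assumption on my part about tooling, since the article does not say which implementation was used, and the input here is random data standing in for real song vectors. The `perplexity` value is also a guess; it must be smaller than the number of songs.

```python
import numpy as np
from sklearn.manifold import TSNE

# Random stand-ins for 30 songs with 300-dimensional vectors (not real data).
rng = np.random.default_rng(0)
song_vectors = rng.normal(size=(30, 300))

# perplexity must be less than the number of samples; 5 is an arbitrary choice.
tsne = TSNE(n_components=2, perplexity=5, init="pca", random_state=0)
coords = tsne.fit_transform(song_vectors)  # one (x, y) pair per song
```

Each row of `coords` can then be plotted as a point and labeled with its song title. Results can shift noticeably with different `perplexity` and `random_state` values, so it is worth trying a few settings.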

Below is a t-SNE visualization of the songs and albums. For the album vector, I basically took a mean over the song vectors in the album. Since they’re averaged-out vectors, they appear in the center.

How do you read the graph? Songs near the center can be thought of as “usual” in terms of semantic content. Songs far from the center have “unusual” semantic content. Songs that are close to one another have semantically similar lyrics.
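The same reading can be expressed numerically. The sketch below takes a handful of 2-dimensional points, computes their center as the mean, and ranks the songs by distance from it. The coordinates and song names are made up, and using Euclidean distance is my own choice for illustration.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def euclidean(u, v):
    """Straight-line distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical 2-d coordinates for four songs (not taken from the actual plot).
points = {
    "Song A": [0.0, 0.0],
    "Song B": [2.0, 0.0],
    "Song C": [1.0, 0.0],
    "Song D": [1.0, 8.0],
}

center = mean_vector(list(points.values()))

# Most "unusual" songs first: sort by distance from the center, descending.
by_distance = sorted(points,
                     key=lambda name: euclidean(points[name], center),
                     reverse=True)
```

In this toy setup, Song D sits far from the other three, so it ranks as the most “unusual”, while Song C is closest to the center.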

Thus, we see here that Sam’s Town’s Enterlude and Exitlude are close to each other and far from the center, that Somebody Told Me and Jenny Was a Friend of Mine are close together, as are Heart of a Girl and Runaways, and so on.

We also see here how Battle Born songs tend to be close to one another, while songs from Wonderful Wonderful are all over the place. Personally, I think that Battle Born was a conceptually cohesive album, while Wonderful Wonderful was not.

All in all, I think that the analysis is still a bit raw. One big hurdle is going from word vector to song vector, since the current method throws out a lot of information. Nevertheless, this is a good first effort, and a Part II would certainly explore a different song representation.

For the same analysis but done on Lana Del Rey lyrics, check out my companion piece.