What distinguishes one U2 album from another, from a lyrical perspective? If you were to search U2’s catalogue of albums, what search terms would best capture what makes each album unique?
TF-IDF (Term Frequency – Inverse Document Frequency) is a technique used by search engines to present the web pages most relevant to a user. It attempts to identify the most important words in a document relative to a collection of documents: a word scores highly when it appears often in that document but rarely in the others.
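One common way to write this down is shown below. It is a standard formulation, not necessarily the exact weighting used in this analysis. The symbols are introduced here for illustration: $n_{t,d}$ is the number of times word $t$ occurs on album $d$, $N$ is the number of albums, and the denominator of the IDF term counts the albums that contain $t$ at least once.

```latex
\mathrm{tf}(t,d) = \frac{n_{t,d}}{\sum_{t'} n_{t',d}}, \qquad
\mathrm{idf}(t) = \ln \frac{N}{\lvert \{ d : n_{t,d} > 0 \} \rvert}, \qquad
\text{tf-idf}(t,d) = \mathrm{tf}(t,d) \cdot \mathrm{idf}(t)
```

A word that appears on every album has $\mathrm{idf}(t) = \ln 1 = 0$, so its TF-IDF is zero no matter how often it is sung.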
In our case, we will look at each U2 album and identify its top 3 TF-IDF words, treating all 14 U2 albums in our dataset as the collection.
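Here is a minimal Python sketch of that per-album ranking. It assumes each album's lyrics are stored as a single string, and it uses the share-of-words TF with an unsmoothed ln(N/df) IDF. The function names and the toy input are hypothetical. The actual analysis may have tokenized, removed stop words or weighted terms differently.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters/apostrophes. A real analysis would
    # usually also remove stop words such as "the" and "and".
    return re.findall(r"[a-z']+", text.lower())


def top_tfidf_words(albums: dict[str, str], k: int = 3) -> dict[str, list[str]]:
    """Return the k highest-scoring TF-IDF words for each album.

    `albums` maps an album title to all of its lyrics as a single string.
    """
    counts = {title: Counter(tokenize(text)) for title, text in albums.items()}
    n_albums = len(counts)

    # Document frequency: on how many albums does each word appear at least once?
    df = Counter()
    for c in counts.values():
        df.update(c.keys())

    top = {}
    for title, c in counts.items():
        total = sum(c.values())
        # tf = the word's share of the album's words; idf = ln(N / df).
        scores = {w: (n / total) * math.log(n_albums / df[w]) for w, n in c.items()}
        # Highest score first; ties broken alphabetically so the output is stable.
        top[title] = sorted(scores, key=lambda w: (-scores[w], w))[:k]
    return top


if __name__ == "__main__":
    # Made-up word lists for illustration only, not real U2 lyrics.
    toy = {
        "Album A": "walk away walk tall stories twilight",
        "Album B": "walk know away secret station",
        "Album C": "know away walk zoo zoo",
    }
    for title, words in top_tfidf_words(toy).items():
        print(title, words)
```

In the toy input, ‘walk’ and ‘away’ appear on all three albums. Their IDF is therefore ln(3/3) = 0, and they can never rank highly, however often they occur.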
For example, the TF-IDF analysis of the U2 album Boy finds that the words ‘stories’, ‘tall’ and ‘twilight’ are the most distinctive to that album when compared with all 14 albums.
The words ‘walk’, ‘know’ and ‘away’ show up more frequently in Boy than the words in the graph, but they also show up throughout U2’s catalogue. In fact, there are only 5 cases in which one of ‘walk’, ‘know’ or ‘away’ fails to appear on one of U2’s albums. Clearly, searching for ‘walk’, ‘know’ and ‘away’ would be far less helpful in pinpointing a specific U2 album than searching for ‘stories’, ‘tall’ and ‘twilight’.
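The IDF factor on its own shows why frequent but widespread words like ‘walk’, ‘know’ and ‘away’ lose out. This small sketch assumes the unsmoothed ln(N/df) form. The document frequencies below are hypothetical and are not counts from the U2 dataset.

```python
import math

N_ALBUMS = 14  # albums in the dataset, as stated in the text


def idf(albums_with_word: int, n_albums: int = N_ALBUMS) -> float:
    """Unsmoothed inverse document frequency, ln(N / df)."""
    return math.log(n_albums / albums_with_word)


# Hypothetical document frequencies, chosen for illustration:
# a word on every album, a word on most albums, and a word on a single album.
for df in (14, 12, 1):
    print(f"on {df:2d} of {N_ALBUMS} albums -> idf = {idf(df):.3f}")
# on 14 of 14 albums -> idf = 0.000
# on 12 of 14 albums -> idf = 0.154
# on  1 of 14 albums -> idf = 2.639
```

TF-IDF is TF multiplied by IDF. A word found on all 14 albums therefore scores zero however often it is sung, while a word confined to one album is weighted far more heavily.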
The TF-IDF analysis does a pretty good job of distilling the mood, feel and themes of each U2 album into just 3 or 4 words. If we were to play a game and I simply said ‘secret’, ‘station’ and ‘zoo’, would you know which album I was talking about?
Songs of Experience, to me, is an amazing example of the power of search engine analytics. The album was written during a time of great nationalistic fervor across the globe, from Brexit to Trump to Duterte. Its top three words could be emblazoned on a red hat, summing up the mood of the times: “Hey! my Flag is Best”.
U2 Love and Logic