Pointillism and Georges Seurat: such a simple idea, delivered in a precise manner, led to amazing beauty and paradox. Up close, there is nothing more to observe than colorful dots on a canvas, but step back and the beauty is breathtaking.
A Lexical Diversity and Density Analysis can be similar: it takes very simple ideas and creates some unique visual observations.
The first step is to look at the Lexical Diversity of U2 songs. Below are examples of songs with the lowest Lexical Diversity. The song “Scarlet” only has one lyric repeated throughout the song.
Plotting all the songs on a graph helps identify trends – if they exist. Do the songs of U2 get more diverse in their lexicon over time?
Again, this is simply the number of unique words for any song.
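As a minimal sketch of that count, the snippet below lowercases a lyric, splits it into words, and counts the distinct ones. The tokenization rule (letters, digits, and apostrophes; punctuation dropped) and the function names `tokenize` and `lexical_diversity` are assumptions for illustration, not necessarily how the numbers in this post were produced. The sample line is a placeholder, not a real lyric.

```python
import re


def tokenize(lyrics: str) -> list[str]:
    """Lowercase the text and split it into words.

    Assumption: a word is a run of letters, digits, or apostrophes
    (straight or curly); all other punctuation is discarded.
    """
    return re.findall(r"[a-z0-9'’]+", lyrics.lower())


def lexical_diversity(lyrics: str) -> int:
    """Number of distinct words in a song."""
    return len(set(tokenize(lyrics)))


# Placeholder text, not an actual lyric.
sample = "One two, two THREE three three"
print(lexical_diversity(sample))  # 3
```

A different tokenizer (for example, one that strips apostrophes or merges word forms) would shift the counts slightly, so comparisons only make sense when every song goes through the same rule.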
To create a continuous feel for the data, I ordered every song by album and track order: “I Will Follow” has an Order of 1, and “13 (There is a Light)” has an Order of 162. Each album is represented by its own color to help identify songs. The two yellow dots represent “Silver and Gold (Live)” with 182 unique words, and “Bullet the Blue Sky (Live)” with 170 unique words. (Note: only 161 songs are represented, because “4th of July”, from The Unforgettable Fire, is an instrumental.)
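The ordering step can be sketched as a sort on (album, track) followed by a running count. The `Song` record, its field names, and the placeholder titles are hypothetical, used only to show the idea.

```python
from typing import NamedTuple


class Song(NamedTuple):
    album_index: int  # release order of the album, 1 = earliest (assumed field)
    track: int        # track number within the album
    title: str


def assign_order(songs: list[Song]) -> list[tuple[int, Song]]:
    """Sort by album, then track, and number the songs from 1."""
    ranked = sorted(songs, key=lambda s: (s.album_index, s.track))
    return list(enumerate(ranked, start=1))


# Placeholder records, apart from the first track named in the text.
songs = [
    Song(2, 1, "Placeholder B1"),
    Song(1, 2, "Placeholder A2"),
    Song(1, 1, "I Will Follow"),
]
for order, song in assign_order(songs):
    print(order, song.title)
```

Numbering before dropping instrumentals keeps the gap in the sequence, which would be consistent with the last song having an Order of 162 while only 161 songs are plotted.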
Taking these raw counts of unique words found in each song and dividing by the total number of words in the song will give us the Lexical Density of each song.
For example, the song “October” has a total of 28 words. From above, there are 22 unique words, so the Lexical Density is 22/28 or 0.7857.
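The same arithmetic, as a small sketch: the function name `lexical_density` is an assumption, and the word list is made of placeholder tokens (22 distinct words plus 6 repeats), not the actual lyrics of “October”.

```python
def lexical_density(words: list[str]) -> float:
    """Unique words divided by total words; 0.0 for an empty song."""
    if not words:
        return 0.0
    return len(set(words)) / len(words)


# Placeholder tokens: 22 distinct words, 6 of them used twice, 28 in total.
words = [f"w{i}" for i in range(22)] + [f"w{i}" for i in range(6)]
print(len(words), len(set(words)))        # 28 22
print(round(lexical_density(words), 4))   # 0.7857
```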
Although it has one of the lowest Lexical Diversity outcomes, “October” has the highest Lexical Density number.
Creating a similar graph for Lexical Density, we can see that although the songs are getting wordier (as seen in the Lexical Diversity graph above), the overall average density trends flat.
The album October, represented by darker pink dots, had both the highest-density song, “October”, and the lowest, “Scarlet”.
Putting this information together across albums provides some interesting insights.
There is a lot going on with this graph, so let’s take it step by step.
The albums are in sequential order on the x-axis. The Lexical Density is plotted on the y-axis; here it is simply the total number of unique words on an album divided by the total number of words on that album.
The size of the circle represents the number of unique words. October’s 11 songs use only 254 unique words across a total of 1,329 words for the album, a Lexical Density of about 0.19. Songs of Innocence has a Lexical Density score similar to that of October, so it is plotted on the same horizontal line. But SOI has 563 unique words, so it is represented by a much larger circle.
The large pinkish circle represents Pop, which has the most unique words at 744. A close second is Rattle and Hum at 700 words. These two circles are similar in size and also share a similar color; the size and color of a circle represent similar Lexical Diversity characteristics. Similarly, the two blue-tinted circles for Achtung Baby and All That You Can’t Leave Behind have roughly the same number of unique words, 640 and 627 respectively.
Although Songs of Experience has the second highest number of words used on an album, it has the lowest Lexical Density score. In short, SOE appears to be the most repetitive album based on words.
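For the album-level numbers, one detail is worth spelling out: the unique-word count is taken over the pooled vocabulary of all songs, so a word shared by two songs counts once. The sketch below assumes each song is already a list of words; the function name and the toy album are hypothetical. The last line simply reproduces the October figure from the counts quoted above.

```python
def album_lexical_density(songs: list[list[str]]) -> tuple[int, int, float]:
    """Return (unique words, total words, density) for a whole album.

    Unique words are pooled across songs, not summed per song.
    """
    vocabulary: set[str] = set()
    total = 0
    for words in songs:
        vocabulary.update(words)
        total += len(words)
    density = len(vocabulary) / total if total else 0.0
    return len(vocabulary), total, density


# Toy album with two songs sharing the word "b".
album = [["a", "b", "a"], ["b", "c"]]
print(album_lexical_density(album))  # (3, 5, 0.6)

# October, from the counts given in the text: 254 unique of 1,329 total.
print(round(254 / 1329, 4))  # 0.1911
```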
Thank you, Art Institute of Chicago!
U2 Love and Logic