Spotify’s popularity data held the promise of greatness, but in the end, the data available to the average user held few meaningful insights.
When a user pulls Spotify data, they are pulling data calculated at that particular point in time. Imagine if a spaceship landed on Earth and took a photograph of New York City. It would seem like there would be much to study – the people, the buildings, the parks, the movement, etc. But how did the city evolve? From that one photo, it would be hard to see where all the people came from and what keeps them there. This is the problem with Spotify’s point-in-time database: an analyst cannot answer the question of how a song became popular. Slowly? Overnight? Or because there was nothing else available?
Although the goal is always to provide meaningful analysis, perhaps I could use this data to build out my R skills. Ahhh, the journey.
Popularity – Albums
Spotify’s definition of Album Popularity:
The popularity of the album. The value will be between 0 and 100, with 100 being the most popular. The popularity is calculated from the popularity of the album’s individual tracks.
On the date that I pulled the data for album popularity, mid-March 2019, “The Joshua Tree (Super Deluxe)” was the most popular U2 album. Is anyone surprised by this? Probably not. But I would really like to know how long “Achtung Baby (Deluxe Edition)” has held its spot as U2’s second most popular album on Spotify.
Here’s hoping that Bono’s other alter egos, “The Fly” or “Mirror Ball Man”, make a comeback in 2022 for the 30th Anniversary of the Zoo TV Tour. (Argh – The only tour I have ever missed since 1984!)
I could come up with a long list of things I would like to know about album popularity, but unfortunately, Spotify just isn’t sharing.
But I am proud of that great-looking graph…lots of good data! Album popularity is on the x-axis and all the album names are neatly arranged on the graph. The color of the point represents the decade of the album. And for some of you thinking the red dots seem off…I’ll go over the sloppy nature of Spotify’s database at the end of this blog in the Appendix.
Popularity – Tracks
Spotify’s definition of Track Popularity:
The popularity of a track is a value between 0 and 100, with 100 being the most popular. The popularity is calculated by algorithm and is based, in the most part, on the total number of plays the track has had and how recent those plays are. Generally speaking, songs that are being played a lot now will have a higher popularity than songs that were played a lot in the past. Duplicate tracks (e.g. the same track from a single and an album) are rated independently. Artist and album popularity is derived mathematically from track popularity. Note that the popularity value may lag actual popularity by a few days: the value is not updated in real time.
Okay, lots of black box calculations going on in that number.
Simply taking the top ten U2 songs (in the US) on Spotify as of mid-March 2019, we have the following graph:
Again, I am not sure that I am surprised by this list, but I would like to see how this changes over time. Inside Spotify’s proprietary algorithms, there is a time component. These are all decades-old songs, so is there a different way to calculate the ‘Top Ten’? Is the time parameter here slightly different?
Another fun data fact: the popularity of a track found on multiple albums is calculated separately for each version. So we have a variety of popularity levels for “New Year’s Day” across albums and mixes.
Again, I need more insight into how Spotify calculates its popularity value, and now I also need to know how songs are offered to users. Why is the Single Edit version the most popular? And why is the remastered version of “War” – the basic album – so underperforming the original version?
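For anyone who wants to line up the versions of one song side by side, here is a minimal sketch in Python (my own analysis was in R, so this is an illustration, not the code behind the graph). The rows and popularity values are invented, the compilation name is hypothetical, and the rule that a version label follows " - " in the track name is an assumption about naming, not something Spotify documents.

```python
from collections import defaultdict

# Hypothetical rows: the popularity values are invented for illustration
# and are NOT real Spotify numbers.
rows = [
    {"track": "New Year's Day", "album": "War", "popularity": 55},
    {"track": "New Year's Day - Remastered", "album": "War (Remastered)", "popularity": 38},
    {"track": "New Year's Day - Single Edit", "album": "Example Best-Of (hypothetical)", "popularity": 61},
    {"track": "Sunday Bloody Sunday", "album": "War", "popularity": 58},
]


def base_title(track_name):
    """Drop a ' - <version>' suffix (an assumed naming convention)."""
    return track_name.split(" - ")[0].strip()


def versions_by_song(rows):
    """Group every version of a song together, most popular version first."""
    grouped = defaultdict(list)
    for row in rows:
        key = base_title(row["track"])
        grouped[key].append((row["track"], row["album"], row["popularity"]))
    return {
        song: sorted(versions, key=lambda v: v[2], reverse=True)
        for song, versions in grouped.items()
    }


versions = versions_by_song(rows)
for name, album, popularity in versions["New Year's Day"]:
    print(f"{popularity:>3}  {name}  [{album}]")
```

Grouping on a cleaned-up title is crude, and a real pull would likely need fuzzier matching, but it is enough to see how far apart the versions of a single song can sit.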
Finally, I have seen blogs attempt to create a ‘time series’ using Spotify data. But because Spotify data is a single point in time, which is not a series at all, this is not possible. What most Spotify data users have created instead are graphs that use ‘album release date’ as a proxy for time. Below, I have plotted the popularity of the remastered tracks for all 14 U2 albums. The only real surprise is the popularity of Songs of Experience in comparison to The Joshua Tree or All That You Can’t Leave Behind. Because recency of play is a driver of popularity and SOE is the most recent album, its popularity is driven up. In other words, unlike Billboard chart rankings, there is no way to compare the popularity of an album, say, 10 weeks after its release against other albums at that same point in their lives.
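The only way I can see to get a real time series is to build it yourself: pull the data on a schedule and stamp every pull with its date. Here is a minimal sketch of that idea in Python. The function names are my own, the popularity values are invented, and the actual call that fetches the numbers from Spotify is left out entirely.

```python
from datetime import date


def add_snapshot(history, pulled_on, popularity_by_album):
    """Append one dated pull to the running history (a list of dicts)."""
    for album, popularity in popularity_by_album.items():
        history.append(
            {"pulled_on": pulled_on, "album": album, "popularity": popularity}
        )
    return history


def series_for(history, album):
    """Return (pull date, popularity) pairs for one album, oldest pull first."""
    points = [
        (row["pulled_on"], row["popularity"])
        for row in history
        if row["album"] == album
    ]
    return sorted(points)


history = []
# Two hypothetical pulls a month apart; the values are made up.
add_snapshot(history, date(2019, 3, 15), {"Songs of Experience": 60, "The Joshua Tree": 70})
add_snapshot(history, date(2019, 4, 15), {"Songs of Experience": 57, "The Joshua Tree": 71})

print(series_for(history, "Songs of Experience"))
```

It only tells you what happens from the first pull onward, of course. The history before that stays locked inside Spotify.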
Popularity is so much more than a point in time.
U2 Love and Logic
Appendix – Spotify’s Data
There were some strange data anomalies throughout the Spotify database, but I will just cover one inconsistency that came up regularly – dating albums. There were 30 U2 albums in the US market. I did not pull compilations or singles for this analysis, so these albums represented original releases and reissues. However, the naming and dating of the albums left a little to be desired, and I suspect these issues could affect the popularity of a song. I am not sure I understand why the remastered versions of the album October are so underperforming the original masters. And I do not understand why Spotify is date stamping two of the remastered albums with the original album issue date.
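A check for this kind of dating problem could be sketched as follows, in Python. The album names and dates are placeholders, not real Spotify rows, and the assumption that a reissue is marked by a trailing parenthetical such as "(Remastered)" or "(Deluxe Edition)" is mine.

```python
import re

# Placeholder rows: names and dates are invented for illustration only.
albums = [
    {"name": "Album A", "release_date": "1981-01-01"},
    {"name": "Album A (Remastered)", "release_date": "1981-01-01"},
    {"name": "Album B", "release_date": "1983-01-01"},
    {"name": "Album B (Remastered)", "release_date": "2008-01-01"},
]


def base_name(name):
    """Strip one trailing parenthetical, e.g. ' (Remastered)' (assumed convention)."""
    return re.sub(r"\s*\([^)]*\)\s*$", "", name)


def reissues_with_original_date(albums):
    """List reissues whose release date matches the original album's date."""
    originals = {
        a["name"]: a["release_date"]
        for a in albums
        if base_name(a["name"]) == a["name"]
    }
    flagged = []
    for a in albums:
        base = base_name(a["name"])
        if base != a["name"] and originals.get(base) == a["release_date"]:
            flagged.append(a["name"])
    return flagged


flagged = reissues_with_original_date(albums)
print(flagged)
```

A flagged album is not necessarily wrong, but it is a reissue worth a second look before its release date goes on a graph.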