Can Twitter predict future music charts?

This week, I’ll be presenting our paper “Can Microblogs Predict Music Charts? An Analysis of the Relationship Between #Nowplaying Tweets and Music Charts” at the 17th International Society for Music Information Retrieval Conference (ISMIR) in New York.

In essence, we examined to what extent Twitter is a useful source for predicting music charts (specifically, the Billboard Hot 100) and focused on the following research questions (RQ):

– RQ1: To what extent do #nowplaying tweets resemble the Billboard Hot 100?
– RQ2: How are #nowplaying tweets and the Billboard Hot 100 temporally related?
– RQ3: How can Twitter data be exploited for predicting music charts?

For our analysis, we crawled the Billboard Hot 100 charts of 2014 and 2015 (886 tracks) and used #nowplaying tweets sent in the same timeframe. In #nowplaying tweets, users share the music they are currently listening to, as in the following example: “#nowplaying The Beatles – Hey Jude”.

We extracted roughly 111 million #nowplaying tweets from our #nowplaying dataset and computed the overlapping tracks occurring both in the tweets and on the charts to serve as the dataset underlying our analyses.
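As a rough illustration of how such an overlap could be computed, the following sketch parses tweets of the form shown above and intersects them with a set of chart tracks. The regular expression, the `parse_nowplaying` and `overlapping_tracks` helpers, and the lowercase normalization are assumptions made for illustration; they are not the matching procedure used in the paper.

```python
import re

# Assumed tweet layout: "#nowplaying <artist> – <title>" (en dash or hyphen).
NOWPLAYING = re.compile(r"#nowplaying\s+(.+?)\s+[–-]\s+(.+)", re.IGNORECASE)


def parse_nowplaying(tweet):
    """Return a normalized (artist, title) pair, or None if the tweet does not match."""
    match = NOWPLAYING.search(tweet)
    if match is None:
        return None
    artist, title = match.groups()
    return artist.strip().lower(), title.strip().lower()


def overlapping_tracks(tweets, chart_tracks):
    """Return the tracks that occur both in the tweets and on the charts."""
    tweeted = {parse_nowplaying(t) for t in tweets}
    tweeted.discard(None)  # drop tweets that could not be parsed
    charted = {(artist.lower(), title.lower()) for artist, title in chart_tracks}
    return tweeted & charted


# Hypothetical toy data, not taken from the actual dataset.
tweets = ["#nowplaying The Beatles – Hey Jude", "good morning everyone"]
charts = [("The Beatles", "Hey Jude"), ("Some Artist", "Some Song")]
print(overlapping_tracks(tweets, charts))
```

Real tweets are far noisier than this pattern assumes, so a production pipeline would likely need fuzzier matching.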

For RQ1, we computed, for every track and every week in our dataset, the correlation between the number of #nowplaying tweets about the track and its chart rank. We tested different ways of aggregating the daily #nowplaying tweet counts into a weekly Twitter-based score for each track (e.g., the mean, median, or sum) and found only a mild correlation between #nowplaying tweets and the charts data.
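To make the RQ1 setup concrete, here is a minimal sketch of aggregating daily tweet counts into a weekly score and correlating that score with chart ranks. The helper names (`weekly_score`, `pearson`) and the toy numbers are hypothetical; the sketch only assumes that a Pearson correlation is computed between the weekly Twitter score and the chart rank.

```python
from math import sqrt
from statistics import mean, median

# The three aggregation methods mentioned in the text.
AGGREGATORS = {"mean": mean, "median": median, "sum": sum}


def weekly_score(daily_counts, how="sum"):
    """Collapse one week of daily tweet counts into a single score."""
    return AGGREGATORS[how](daily_counts)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # correlation is undefined for a constant series
    return cov / sqrt(vx * vy)


# Hypothetical data for one track: four weeks of daily counts and chart ranks.
daily = [
    [5, 7, 6, 8, 9, 10, 12],
    [12, 14, 13, 15, 18, 20, 22],
    [25, 24, 26, 30, 28, 33, 35],
    [40, 38, 42, 45, 41, 50, 48],
]
chart_rank = [40, 30, 22, 15]

scores = [weekly_score(week, "sum") for week in daily]
# A lower rank is a better chart position, so rising tweet counts paired
# with an improving rank yield a negative coefficient.
r = pearson(scores, chart_rank)
print(scores, r)
```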

For RQ2 and RQ3, we model our data as time series. In a first experiment, we look into whether Twitter data would allow for predicting future charts from a temporal perspective, i.e., whether trends appear on Twitter earlier than they are reflected in the charts. To this end, we perform a cross-correlation analysis of the Twitter and charts time series for each track. This analysis showed that for 41% of all tracks, there is a negative lag, and hence these tracks appear and possibly trend earlier on Twitter. However, for 48% of all tracks we observe a positive lag, and for 11% we do not observe any lag.
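The lag analysis can be sketched as follows. This is a simplified illustration, not the paper's implementation: it assumes the lag is the shift that maximizes the Pearson correlation over the overlapping part of the two series, and it uses a sign convention chosen to match the description above (a negative lag means the Twitter series leads the charts). The function names and the toy series are hypothetical, and the chart series is assumed to be expressed as a score where higher is better.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # correlation is undefined for a constant series
    return cov / sqrt(vx * vy)


def best_lag(twitter, charts, max_lag=4, min_overlap=3):
    """Return (lag, correlation) for the shift with the highest correlation.

    For a lag k, twitter[t + k] is paired with charts[t]. A negative k
    therefore means that the Twitter series moves |k| weeks before the charts.
    """
    n = min(len(twitter), len(charts))
    best_k, best_r = None, float("-inf")
    for k in range(-max_lag, max_lag + 1):
        pairs = [(twitter[t + k], charts[t]) for t in range(n) if 0 <= t + k < n]
        if len(pairs) < min_overlap:
            continue  # too few overlapping weeks to correlate
        r = pearson([p[0] for p in pairs], [p[1] for p in pairs])
        if r > best_r:
            best_k, best_r = k, r
    return best_k, best_r


# Hypothetical weekly series: the charts repeat the Twitter signal two weeks later.
twitter = [1, 3, 7, 4, 2, 8, 5, 1, 6, 2, 9, 3]
charts = [0, 0] + twitter[:-2]

lag, corr = best_lag(twitter, charts)
print(lag, corr)
```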

Figure: Cross-correlation results, lags in weeks.

For those 41% of all tracks, we propose three different prediction models for future charts:

– Autoregression based on Billboard time series (BB)
– Autoregression based on the difference between the Twitter and Billboard time series after shifting by the lag extracted from the cross-correlation analysis (T).
– Multivariate model based on Twitter and Billboard time series (V).
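To illustrate the difference between the BB and V models, the sketch below fits two one-step least-squares predictors: one that uses only last week's chart value, and one that also uses last week's Twitter score. This is a deliberately simplified stand-in (a first-order regression solved via the normal equations), not the models as specified in the paper, and it omits the T model. All function names and the toy data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        rest = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x


def fit_least_squares(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def fit_bb(charts):
    """BB-style model: predict this week's chart value from last week's."""
    rows = [[1.0, charts[t - 1]] for t in range(1, len(charts))]
    return fit_least_squares(rows, charts[1:])


def fit_v(charts, twitter):
    """V-style model: additionally use last week's Twitter score."""
    rows = [[1.0, charts[t - 1], twitter[t - 1]] for t in range(1, len(charts))]
    return fit_least_squares(rows, charts[1:])


def predict(weights, features):
    """Apply fitted weights; the intercept term is prepended automatically."""
    return sum(w * f for w, f in zip(weights, [1.0] + list(features)))


# Hypothetical weekly series, constructed so that
# charts[t] = 2 + 0.5 * charts[t - 1] + twitter[t - 1].
twitter = [3, 1, 4, 1, 5, 9, 2, 6]
charts = [10, 10, 8, 10, 8, 11, 16.5, 12.25]

w_bb = fit_bb(charts)
w_v = fit_v(charts, twitter)
print(predict(w_bb, [charts[-1]]), predict(w_v, [charts[-1], twitter[-1]]))
```

Because the toy chart series was generated from the Twitter series, the V-style fit recovers the generating coefficients here; real data would of course not behave this cleanly.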

Evaluating these models by computing the root mean squared error (RMSE) of the predictions shows that the multivariate model works best; hence, Twitter information can be useful for predicting charts. However, relying solely on Twitter leads to a substantially higher RMSE than relying on charts data or on the multivariate model containing both Twitter and charts data. The RMSE distribution can be seen in the following boxplot (the T model is not shown here).

Boxplot of RMSE of Billboard-based model (BB) and the multivariate model (V).

Our evaluations showed that the multivariate model not only significantly decreases the RMSE in comparison to the other models (p < 0.05; Mann-Whitney), but also significantly decreases the variance of the RMSE (p < 0.05; Levene).
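The evaluation steps can be sketched as follows: an RMSE per prediction series, followed by a Mann-Whitney U test comparing two RMSE samples. This is a minimal illustration under stated assumptions, not the paper's evaluation code. The two-sided p-value uses the plain normal approximation without continuity or tie correction, the function names are hypothetical, and Levene's test is not covered. In practice, a statistics library such as SciPy (`scipy.stats.mannwhitneyu`, `scipy.stats.levene`) would typically be preferable.

```python
from math import erfc, sqrt


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))


def mann_whitney_u(xs, ys):
    """Return (U statistic of xs, approximate two-sided p-value)."""
    combined = sorted([(v, 0) for v in xs] + [(v, 1) for v in ys])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # tied values share their average rank
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    n1, n2 = len(xs), len(ys)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    return u, p


# Hypothetical per-track RMSE values for two models.
rmse_bb = [4.1, 5.3, 6.0, 7.2, 8.5]
rmse_v = [2.0, 2.4, 3.1, 3.3, 3.9]
print(mann_whitney_u(rmse_v, rmse_bb))
```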

For us, the major takeaways from these analyses are:

– From a temporal perspective, there is a positive lag for 48% of all tracks. 11% do not feature a lag.
– 41% of all tracks feature negative lag and would hence allow for a prediction.
– The multivariate model based on Twitter and Billboard charts data significantly reduces the RMSE (p < 0.05; Mann-Whitney) when compared to the Billboard-based model.
– The RMSE of the multivariate model shows significantly lower variance than that of the Billboard-based model (p < 0.05; Levene).
– There is a mild correlation between track playcounts on Twitter and the Billboard charts (p < 0.01; Pearson).

 

  • E. Zangerle, M. Pichl, B. Hupfauf, and G. Specht, “Can Microblogs Predict Music Charts? An Analysis of the Relationship between #Nowplaying Tweets and Music Charts,” in Proceedings of the 17th International Society for Music Information Retrieval Conference 2016 (ISMIR 2016), 2016.
    @inproceedings{ismir16,
    title = {Can Microblogs Predict Music Charts? An Analysis of the Relationship between #Nowplaying Tweets and Music Charts},
    author = {Eva Zangerle and Martin Pichl and Benedikt Hupfauf and G\"{u}nther Specht},
    url = {https://www.evazangerle.at/wp-content/uploads/2017/06/ismir16.pdf
    https://wp.nyu.edu/ismir2016/event/proceedings/},
    year = {2016},
    booktitle = {Proceedings of the 17th International Society for Music Information Retrieval Conference 2016 (ISMIR 2016)},
    publisher = {ISMIR},
    abstract = {Twitter is one of the leading social media platforms, where hundreds of millions of tweets cover a wide range of topics, including the music a user is listening to. Such #nowplaying tweets may serve as an indicator for future charts, however, this has not been thoroughly studied yet. Therefore, we investigate to which extent such tweets correlate with the Billboard Hot 100 charts and whether they allow for music charts prediction. The analysis is based on #nowplaying tweets and the Billboard charts of the years 2014 and 2015. We analyze three different aspects in regards to the time series representing #nowplaying tweets and the Billboard charts: (i) the correlation of Twitter and the Billboard charts, (ii) the temporal relation between those two and (iii) the prediction performance in regards to charts positions of tracks. We find that while there is a mild correlation between tweets and the charts, there is a temporal lag between these two time series for 90% of all tracks. As for the predictive power of Twitter, we find that incorporating Twitter information in a multivariate model results in a significant decrease of both the mean RMSE as well as the variance of rank predictions.},
    }
