Introduction

Taylor Swift is an American singer-songwriter who has had a major impact on pop culture and the music industry. She is one of the best-selling musicians, with over 200 million records sold worldwide, and is also the most-played artist on Spotify and Apple Music. As of 2023, she has released 10 albums, and her style has changed considerably over time. She first became well known for her country pop songs and later explored hip-hop elements. In 2020, she embraced indie folk and alternative rock.

Taylor Swift is undoubtedly successful, which raises several questions: “How popular is Taylor Swift outside of music streaming services?”, “What makes her so popular?”, and “Do her fans differ across websites and applications?”. Answering these questions reveals the composition of her fan base, the topics fans discuss most, and more. Such findings can be used to identify trending topics related to a celebrity, to increase that celebrity’s popularity, or to adjust a personal style to appeal to a wider audience. All in all, they could translate into substantial commercial value.

Figure 1: Taylor’s Album 1989

Our group built the project around these questions using Reddit data. Reddit is an American discussion website that covers a wide range of topics. Users create discussion communities around areas of interest, called “subreddits”. Registered users (also called “Redditors”) can publish posts with photos, videos, links, and text in a subreddit. Once a post is published, other users can comment on it. Reddit also has a voting system in which Redditors can upvote or downvote posts and comments. In this project, we chose the Taylor Swift subreddit and music-related subreddits for analysis.

The analysis is divided into three sections. First, we conducted a preliminary exploration of the Reddit data using figures and tables, addressing questions such as the number of mentions over time and the most active users in the subreddits. Next, focusing on the text posted by Redditors, we studied the text associated with Taylor Swift, the topics linked to her, and the sentiment expressed in the texts. Finally, we built models to analyze the posts and see what fans are more willing to view on Reddit.

Figure 2: Reddit

Analytical Questions

  1. Geographic Trends in Taylor Swift Subreddit
  • Business Goal:
    • Analyzing the geographical trends within the Taylor Swift subreddit aims to gain insights into the composition of the fan base, the level of fan engagement, and Taylor’s popularity in various countries. Providing timely and effective information to active fans is intended to enhance the subreddit’s popularity, thereby expanding the fan community and international influence.
  • Technical Proposal:
    • Use regular expressions to extract the countries mentioned in the text of both submissions and comments. This gives insight into the proportion of international fans within the Taylor Swift subreddit and allows a detailed analysis of the activity levels of fans from different countries. In addition, use the Python langdetect package for language analysis; combined with variables such as score and text length, this offers a fuller picture of how non-English-speaking fans behave on the forum.
  1. Taylor Swift Popularity Over Time on Reddit
  • Business Goal:
    • Understand how Taylor Swift’s popularity and public perception have changed over the three years on Reddit.
  • Technical Proposal:
    • Visualize the number of mentions of Taylor Swift in both submissions and comments over the selected time period. Provide a daily plot and a monthly trend plot. Introduce the external Taylor Swift Spotify dataset to explain the possible changes in trends associated with the release of albums over time.
  1. Fan Engagement and Community Analysis
  • Business Goal:
    • Explore the level of engagement and community involvement in discussions related to Taylor Swift.
  • Technical Proposal:
    • Calculate engagement metrics such as the number of comments, upvotes, and downvotes on submissions related to Taylor Swift. Identify the subreddits with the most activity and engagement. Analyze the language and content of comments to understand the nature of fan engagement and community sentiment.
  1. Reddit User Sentiment and Taylor Swift Album Popularity
  • Business Goal:
    • Sentiment analysis provides valuable insights into the public perception of Taylor Swift and the feedback on albums, significantly contributing to shaping the artist’s development trajectory and the stylistic direction of albums.
  • Technical Proposal:
    • Apply a pre-trained model for sentiment analysis to text data from both the Taylor Swift subreddit and music-related subreddits. The sentiment analysis will classify each text into one of two categories: positive or negative. Merge the results with the Taylor Swift Spotify data and investigate how the popularity of albums is associated with the comments and posts related to those albums.
  1. Language and Word Usage in Taylor Swift Discussions
  • Business Goal:
    • Explore the language and specific words used most frequently in discussions about Taylor Swift.
  • Technical Proposal:
    • Apply NLP techniques to analyze the text of submissions and comments mentioning Taylor Swift. Use word frequency analysis and word cloud visualizations to identify the most commonly used words and phrases.
  1. Popular Topics Associated with Taylor Swift
  • Business Goal:
    • Identify the most frequently discussed topics related to Taylor Swift on Reddit during this period.
  • Technical Proposal:
    • Utilize NLP techniques to extract key themes and topics from submissions and comments mentioning Taylor Swift. Use topic modeling algorithms like Latent Dirichlet Allocation (LDA) to categorize these texts into distinct topics. Analyze the distribution of these topics to understand which are most prevalent.
  1. Predicting Post Popularity Using Machine Learning Models
  • Business Goal:
    • Popularity analysis aims to provide insights into the characteristics of popular posts. Accurately predicting and promoting popular posts can enhance user engagement on Reddit, increase submission views, and further amplify Reddit’s influence in connection with Taylor Swift.
  • Technical Proposal:
    • Consider variables such as sentiment, text content, posting time, number of comments, originality, and cross-posting as potential features for predicting submission scores. Fit several supervised regression models to the data and tune their hyperparameters for optimal performance. Each model will be evaluated on a held-out test dataset, and the model with the lowest RMSE will be identified as the best machine learning model for predicting popularity.
  1. Trending Topics and Viral Posts
  • Business Goal:
    • Identify and understand the characteristics of Taylor Swift-related posts that become trending or go viral on Reddit.
  • Technical Proposal:
    • Define metrics for trending or viral posts (e.g., rapid increase in engagement, high upvote ratios). Use machine learning classification to predict which Taylor Swift-related posts will trend or go viral based on their content and initial engagement metrics.
  1. Analysis of Taylor Swift’s Fan Community Growth
  • Business Goal:
    • Evaluate the growth and activity levels of Taylor Swift fan communities on Reddit over the three-month period.
  • Technical Proposal:
    • Identify subreddits dedicated to Taylor Swift or her work. Analyze the change in subscriber counts, post volumes, and engagement levels over time. Investigate any factors that might contribute to growth or declines in activity.
  1. Cross-Platform Comparison
  • Business Goal:
    • Compare discussions about Taylor Swift on Reddit with discussions on other social media platforms.
  • Technical Proposal:
    • Use additional datasets from external sources (Kaggle, Spotify, etc.) to extract discussions about Taylor Swift beyond Reddit. Compare the volume, sentiment, and themes of discussions across platforms. Analyze any unique patterns or discrepancies between platforms.
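
The regular-expression step proposed for the geographic question can be sketched as follows. This is a minimal illustration, not the project's actual code: the short `COUNTRIES` list and the function names `extract_countries` and `count_country_mentions` are assumptions made for the example, and a real run would need a complete country list plus aliases such as "UK" or "USA".

```python
import re
from collections import Counter

# Hypothetical, deliberately short country list used only for illustration.
COUNTRIES = ["Canada", "Brazil", "Japan", "Australia", "Germany", "United Kingdom"]

# One case-insensitive pattern; word boundaries stop "Japan" matching "Japanese".
COUNTRY_RE = re.compile(
    r"\b(" + "|".join(re.escape(c) for c in COUNTRIES) + r")\b",
    flags=re.IGNORECASE,
)

# Map lower-cased matches back to their canonical spelling.
_CANONICAL = {c.lower(): c for c in COUNTRIES}


def extract_countries(text):
    """Return the distinct countries mentioned in a text, sorted alphabetically."""
    found = {_CANONICAL[m.lower()] for m in COUNTRY_RE.findall(text)}
    return sorted(found)


def count_country_mentions(texts):
    """Count how many texts mention each country (once per text per country)."""
    counts = Counter()
    for text in texts:
        counts.update(extract_countries(text))
    return counts
```

Counting each country at most once per text keeps a single long post from dominating the totals; counting every occurrence instead would be an equally defensible choice.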
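
For the popularity-over-time question, the daily and monthly mention counts behind the two proposed plots could be computed along these lines. The sketch assumes each record is a `(created_utc, text)` pair with a Unix timestamp in seconds, and that a simple case-insensitive keyword match defines a "mention"; both the record shape and the `KEYWORDS` tuple are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime, timezone

# Assumed definition of a "mention": a case-insensitive substring match.
KEYWORDS = ("taylor swift",)


def mentions_taylor(text):
    """Return True if the text contains any of the assumed keywords."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in KEYWORDS)


def mention_counts(records):
    """Aggregate mentions per day and per month.

    records: iterable of (created_utc, text) pairs, created_utc in Unix seconds.
    Returns (daily, monthly) Counters keyed by "YYYY-MM-DD" and "YYYY-MM".
    """
    daily, monthly = Counter(), Counter()
    for created_utc, text in records:
        if not mentions_taylor(text):
            continue
        day = datetime.fromtimestamp(created_utc, tz=timezone.utc).date()
        daily[day.isoformat()] += 1
        monthly[day.strftime("%Y-%m")] += 1
    return daily, monthly
```

The resulting counters can be passed to any plotting library, and album release dates from the Spotify dataset can then be overlaid as vertical markers.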
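
The word frequency analysis proposed for the language question reduces to tokenizing, filtering, and counting. The sketch below uses only the standard library; the tiny `STOPWORDS` set and the simple letters-and-apostrophes tokenizer are assumptions, and a full analysis would likely use a proper NLP stop word list and tokenizer.

```python
import re
from collections import Counter

# Assumed, intentionally small stop word list for illustration only.
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "is", "in", "it",
    "this", "that", "on", "for", "so", "her",
}

# Simplistic tokenizer: runs of lowercase letters and apostrophes.
TOKEN_RE = re.compile(r"[a-z']+")


def top_words(texts, n=10):
    """Return the n most common non-stop-word tokens as (word, count) pairs."""
    counts = Counter()
    for text in texts:
        tokens = TOKEN_RE.findall(text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS and len(t) > 1)
    return counts.most_common(n)
```

The `(word, count)` pairs returned here are also the usual input for a word cloud.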
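
The model selection rule in the prediction question, choosing the regression model with the lowest RMSE on the test set, can be written out explicitly. In this sketch the helper names `rmse` and `best_model_by_rmse` are hypothetical, and the predictions are assumed to have already been produced by whichever models were trained.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    squared_error = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared_error / len(y_true))


def best_model_by_rmse(y_true, predictions):
    """Pick the model with the lowest test RMSE.

    predictions: dict mapping a model name to its predicted scores on the test set.
    Returns (best_model_name, best_rmse).
    """
    scores = {name: rmse(y_true, preds) for name, preds in predictions.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]
```

Because RMSE is expressed in the same units as the submission score, it can be read directly as a typical prediction error in upvotes.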

Data Source

  1. Reddit Archive Dataset
  • This dataset is derived from the publicly available Reddit API and covers a variety of topics and statistics.
  • Link
  1. Taylor Swift Spotify Dataset
  • This dataset consists of data from Spotify’s API on all Taylor Swift albums listed on Spotify, including album, release date, acousticness, danceability, energy, and popularity.
  • Link