BERT: Sentiment Analysis on YouTube Comments

Pewdiepie has become the biggest English-speaking channel on YouTube, currently with more than 95 million subscribers. With such a huge audience, it is interesting to see how viewers react to his videos.

In our paper, we analyse the comments of 50 recent Pewdiepie videos and evaluate their polarity and toxicity, leveraging the TextBlob library and the pre-trained BERT model.

Repository on GitHub

Datasets

  • 200,000 top comment threads scraped from Pewdiepie videos with the YouTube API (4,000 per video, 50 videos); a sketch of this scraping step follows the list
  • The Jigsaw Unintended Bias in Toxicity Classification (JUBTC) dataset, which contains sentences annotated with a toxicity score and multiple labels ("toxic", "severe toxic", "obscene", "threat", "insult", "identity hate") indicating the type of toxicity. From this dataset, we used the first 90,000 entries of the training set to fine-tune BERT
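
The following is a hypothetical sketch of the scraping step from the first bullet, using the YouTube Data API v3 through the google-api-python-client library. The API key, the example video ID, and the 4,000-comment cap are placeholders matching the description above, not our exact script.

```python
# Hypothetical sketch: fetch the top comment threads of one video via the
# YouTube Data API v3 (google-api-python-client). Not the exact scraping script.
from googleapiclient.discovery import build

API_KEY = "YOUR_API_KEY"       # placeholder: a valid YouTube Data API key
VIDEO_ID = "qPnTTA8BC8A"       # example: one of the 50 sampled videos
MAX_COMMENTS = 4000            # per-video cap described above

def fetch_top_comments(api_key, video_id, max_comments):
    youtube = build("youtube", "v3", developerKey=api_key)
    comments, page_token = [], None
    while len(comments) < max_comments:
        response = youtube.commentThreads().list(
            part="snippet",
            videoId=video_id,
            order="relevance",     # top comment threads first
            maxResults=100,        # API maximum per page
            textFormat="plainText",
            pageToken=page_token,
        ).execute()
        for item in response["items"]:
            comments.append(item["snippet"]["topLevelComment"]["snippet"]["textDisplay"])
        page_token = response.get("nextPageToken")
        if not page_token:
            break
    return comments[:max_comments]
```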

Experiment

To analyse the comments, we used the TextBlob and Pattern libraries to score the sentiment polarity of each comment and then averaged the scores per video to obtain a general polarity score for that video. We then fine-tuned BERT with PyTorch on the JUBTC dataset in Google Colab to score the toxicity of each video (our fine-tuned model reached 0.908 accuracy on the validation set).
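
Below is a minimal sketch of the polarity step, assuming the scraped comments are grouped per video in a dict like {video_id: [comment, ...]} (the comments_by_video name is hypothetical). TextBlob and Pattern both return a polarity in [-1, 1], which we average over a video's comments.

```python
# Minimal sketch of the polarity step; comments_by_video is a hypothetical
# {video_id: [comment, ...]} mapping built from the scraped data.
from textblob import TextBlob
from pattern.en import sentiment as pattern_sentiment

def video_polarity(comments):
    textblob_scores = [TextBlob(c).sentiment.polarity for c in comments]
    pattern_scores = [pattern_sentiment(c)[0] for c in comments]  # (polarity, subjectivity)
    return sum(pattern_scores) / len(comments), sum(textblob_scores) / len(comments)

polarity_by_video = {vid: video_polarity(cs) for vid, cs in comments_by_video.items()}
```

For the toxicity model, the following is a hedged sketch of the fine-tuning step rather than our exact Colab notebook: it binarises the Jigsaw "target" column and fine-tunes bert-base-uncased for sequence classification with a plain PyTorch loop. The file name, column names, hyperparameters, and output directory are assumptions.

```python
# Hedged sketch: fine-tune BERT on the first 90,000 JUBTC rows with a plain
# PyTorch loop (file name, columns, hyperparameters and paths are assumptions).
import pandas as pd
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import BertTokenizer, BertForSequenceClassification

device = "cuda" if torch.cuda.is_available() else "cpu"

df = pd.read_csv("jigsaw_train.csv", nrows=90_000)           # first 90,000 training rows
texts = df["comment_text"].tolist()
labels = (df["target"] >= 0.5).astype(int).tolist()          # binarise: toxic if target >= 0.5

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
enc = tokenizer(texts, truncation=True, padding=True, max_length=128, return_tensors="pt")
loader = DataLoader(TensorDataset(enc["input_ids"], enc["attention_mask"],
                                  torch.tensor(labels)), batch_size=32, shuffle=True)

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for input_ids, attention_mask, y in loader:                  # one epoch for illustration
    optimizer.zero_grad()
    loss = model(input_ids.to(device), attention_mask=attention_mask.to(device),
                 labels=y.to(device)).loss
    loss.backward()
    optimizer.step()

model.save_pretrained("bert-toxicity-finetuned")             # placeholder output directory
tokenizer.save_pretrained("bert-toxicity-finetuned")
```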

Results

Polarity results

Categorized videos with the 5 highest and 5 lowest polarity scores (higher is more positive):

| Rank | Video ID    | Category       | Pattern Polarity | TextBlob Polarity |
|------|-------------|----------------|------------------|-------------------|
| 1    | qPnTTA8BC8A | Book review    | 0.4933           | 0.4956            |
| 2    | C2fRC55rA8w | Travel vlog    | 0.3267           | 0.3277            |
| 3    | PGbAWTqUuxQ | Gameplay       | 0.3185           | 0.3218            |
| 4    | QNLARCvIATo | Travel vlog    | 0.2999           | 0.3009            |
| 5    | OEUsKLW1th4 | Gameplay       | 0.2640           | 0.2656            |
| 46   | WOSC6uGtBFw | Meme review    | 0.0935           | 0.0964            |
| 47   | rdaQsl9jqmw | Gameplay       | 0.0901           | 0.0899            |
| 48   | wFxCAWqvmBE | Meme review    | 0.0628           | 0.0635            |
| 49   | zYZ1Fd7iH90 | Cringe Tuesday | 0.0581           | 0.0587            |
| 50   | DCkydkdhL8M | Meme review    | 0.0422           | 0.0448            |

Toxicity results

Categorized 5 most toxic and 5 least toxic videos (higher is more toxic):

| Rank | Video ID    | Category       | Toxicity |
|------|-------------|----------------|----------|
| 1    | JLREgYXXdB8 | Cringe Tuesday | 0.2964   |
| 2    | eHYkTUmsJlY | Pew News       | 0.1592   |
| 3    | JxAUHg8AguA | Cringe Tuesday | 0.1536   |
| 4    | 4QnLRnKwFM0 | Pew News       | 0.1501   |
| 5    | 3m4mF9-7L-Y | Pew News       | 0.1368   |
| 46   | rc1VR54nHV0 | Collab.        | 0.0612   |
| 47   | OEUsKLW1th4 | Gameplay       | 0.0604   |
| 48   | wFxCAWqvmBE | Meme review    | 0.0522   |
| 49   | C2fRC55rA8w | Travel vlog    | 0.0498   |
| 50   | qPnTTA8BC8A | Book review    | 0.0482   |

Analysis

  • Comments are biased. For example, the gameplay of "Happy Wheels" is the 3rd most positive video while the gameplay of "The Walking Dead" is in 49th place. The word 'happy' occurs far more often because it is part of the game's name, which raises the polarity score, while the opposite happens with the word 'dead':

    | Rank | 'Happy' frequency | 'Dead' frequency | Polarity (Pattern) |
    |------|-------------------|------------------|--------------------|
    | 3    | 438               | 130              | 0.3185             |
    | 49   | 63                | 173              | 0.0581             |
  • From the full results, we found that book review videos are more positive than other categories, and travel vlogs also tend to have higher polarity, while meme reviews tend to have lower polarity. The polarity of a gameplay video can differ drastically depending on the game.
  • The Pew News and Cringe Tuesday categories dominate the most toxic videos. There could be multiple explanations for this; one that we found in our results is that the model is biased and misclassifies certain sentences. For example, the most toxic video, "I broke my ass", contains misclassified comments like "I love you and your broken ass" with a toxicity score of 0.9687 (a minimal scoring sketch follows this list).
  • While toxicity and polarity are two different attributes, we found that 5 of the top 10 most positive videos are also among the 10 least toxic videos. Furthermore, 4 of the most negative videos are among the 10 most toxic videos. The differences between the top-10 lists can mainly be explained by the bias and the different focus of the algorithms: the polarity of a comment can be low if it is sad while it can still be classified as not toxic.
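
To illustrate the misclassification example above, here is a hedged sketch of how a single comment can be scored with the fine-tuned model. It assumes Hugging Face transformers, a binary toxic/non-toxic head, and the placeholder bert-toxicity-finetuned directory from the fine-tuning sketch; it is not guaranteed to reproduce the exact 0.9687 score.

```python
# Hedged sketch: score one comment with the fine-tuned BERT classifier.
# "bert-toxicity-finetuned" is the placeholder directory from the earlier sketch.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-toxicity-finetuned")
model = BertForSequenceClassification.from_pretrained("bert-toxicity-finetuned")
model.eval()

def toxicity_score(comment):
    inputs = tokenizer(comment, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()   # probability of the "toxic" class (index 1)

print(toxicity_score("I love you and your broken ass"))  # the misclassified comment above
```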

Conclusion

Based on the results, we can conclude that, in general, the comments on Pewdiepie's videos are more positive than negative, and in 80% of the sampled videos less than 10% of the comments are toxic (Table 7). We also found that sentiment polarity and toxicity correlate somewhat for the videos in the top 10th percentile. Finally, after analysing the results, we discovered that the models were not free of bias, and further research is recommended.

Resources

BERT fine-tuned weights (>400 MB)

Rodrigo Alejandro Chávez Mulsa