Michael Lutz, Sanjana Gadaginmath, Natraj Vairavan, Sriram Srivatsan, Advisor: Phil Mui
1Aspiring Scholars Directed Research Program, 43505 Mission Blvd, Fremont, CA 94539
Given the reach of YouTube as a proliferator of contemporary news and ideas, it is important to understand how YouTube reflects (either magnifies, minimises, or preserves) pre-existing political bias. Moreover, it remains imperative to examine whether YouTube’s reflection of political bias disproportionately favors distinct political groups and whether it encourages the confirmation of biases. Previous research has examined user data and ideological homogeneity within social media groups (see introduction); however, the extent to which YouTube reflects its user base’s political bias remains relatively unexplored. Principally, our research aims to provide a novel understanding of YouTube’s reflection of user base political bias within its search and video recommendation algorithms. Thus, we created two experiments to understand each of the aforementioned systems. Experiment 1 examines the relationship between videos’ political bias values, measured by applying an optimised BERT NLP regression model to video transcripts, and the order in which the videos are displayed. BERT is a Transformer-based machine learning technique for natural language processing (NLP) developed by Google. By applying a pre-trained BERT model from The Bipartisan Press, we can quantify the bias in video transcripts. This experiment scraped the top 200 search results for each of 30 politically charged terms under four distinct political profiles (left, right, combination, null). Experiment 2 examines the progression of bias when repeatedly clicking the “Up-Next” recommended video. This experiment used 1,200 “seed” videos from our first experiment, 200 per category (extreme-right, moderate-right, minimal-right, minimal-left, moderate-left, extreme-left), and examined 10 subsequent click cycles per seed video.
Our first experiment finds that YouTube disproportionately ranks left-leaning videos above right-leaning videos within the top 3 search results. Our second experiment finds that YouTube in fact minimises the magnitude of bias over subsequent cycles of video recommendations. However, we also find that YouTube has a greater minimisation effect on right-leaning minimally biased seed videos than on their left-leaning counterparts, and a greater minimisation effect on left-leaning extremely biased videos than on their right-leaning counterparts. Ultimately, our results provide a nuanced understanding of YouTube’s reflection of political bias and carry potentially far-reaching implications.
The idea of free speech in the United States has fundamentally changed as a result of the rise of social media, especially YouTube, an online video sharing platform whose audience has grown enormously over the last several years. According to YouTube itself, the platform has over 2 billion users, roughly one-third of all internet users. One no longer needs to be a journalist or a television commentator to express political beliefs; a camera, an internet connection, and an opinion suffice. This democratisation of political reporting has created a newfound sense of political awareness among the public; on the other hand, it has given YouTube extensive influence over public opinion, and it is important that this power be used responsibly. Given that YouTube is a public forum and not a publisher, YouTube must guard against unjustifiable disfavoring of certain political opinions so that the right to freedom of expression is protected for all its video creators. However, in the past few years, YouTube has garnered backlash from both its creators and the public, as many believe it is abandoning its original position as a public forum and acting more as a publisher. In particular, many say its algorithms and employees selectively choose which videos are recommended to certain people or, alternatively, which YouTube channels get suspended, shadow banned, or have their videos demonetised under the pretense that the videos or channel did not follow the website’s community guidelines. For example, channels that post LGBTQ+ activism have had their videos demonetised, and other politically inclined channels like PragerU have reported having videos removed. Because of YouTube’s vast impact on its user base, it is important to understand how YouTube reflects bias in its user base and to establish metrics of fairness for doing so.
With crucial elections approaching and a constant need to verify that what we hear and see is true, we must examine, rather than simply accept, the assumption that all media is biased.
Defining Fairness within Algorithms
To fully understand whether YouTube reflects its user base’s bias fairly, this paper initially defines fairness with three metrics: uniform distribution, demographic parity, and uniform outcome. Under the uniform distribution framework, an algorithm is considered fair if it provides equal distributions of ranking amongst all involved groups. Under the fairness metric of demographic parity, an algorithm should rank or recommend videos in direct proportion to the political demographics of the user base (refer to methods). The third way of defining fairness is the notion of uniform outcome. Under this framework, videos should receive placement without regard to the political bias of the video. However, because we do not have access to YouTube’s algorithm, we only define fairness using uniform distribution and demographic parity.
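As a concrete illustration, the first two metrics reduce to simple proportion checks. The sketch below is ours, not part of the measurement pipeline; the function names and any counts passed to them are hypothetical:

```python
def uniform_distribution_gap(left_count: int, right_count: int) -> float:
    """Deviation of the left-leaning share from a 50/50 split (0.0 = fair)."""
    total = left_count + right_count
    return abs(left_count / total - 0.5)

def demographic_parity_gap(left_share_ranked: float,
                           left_share_userbase: float) -> float:
    """Gap between the left-leaning share among top-ranked videos and the
    left-leaning share of the user base (0.0 = fair)."""
    return abs(left_share_ranked - left_share_userbase)
```

Under uniform distribution, fairness demands that the ranked split itself be even; under demographic parity, it only needs to match the user base’s own split.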
Another principal avenue of exploration this paper takes is understanding whether YouTube actively promotes the creation of echo chambers through search results and recommendations. For the purposes of this paper, we define political echo chambers as the confirmation of pre-existing beliefs and the exclusion of contradictory beliefs within an individual. In essence, individuals hear the same political opinions over and over again until they believe opposing viewpoints are incorrect without ever hearing their case. Prior work cites the phenomenon of confirmation bias, or bias towards information that reinforces existing beliefs, as the source of echo chambers amongst individuals. However, our paper attempts to understand whether YouTube’s algorithm actively facilitates the creation of the aforementioned echo chambers within its “Up-Next” recommendation algorithm, later explored in Experiment 2. Moreover, we also examine the existence of systematic echo chambers, which we define as a system creating a feedback loop in which it actively promotes the confirmation of the beliefs of the system’s users. In the context of this paper, Experiment 1 looks for systematic echo chambers by examining whether YouTube’s search algorithm favors videos whose political lean aligns with the beliefs of the majority of the user base.
Prior to this paper, a large body of research has examined algorithmic bias. A relevant example is the COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) algorithm. This algorithm, according to The Atlantic, was created as a risk-assessment tool “to predict hot spots of violent crime, determine the types of supervision that inmates might need, or… provide information that might be useful in sentencing.” However, an investigation published by ProPublica found that the algorithm was biased against African Americans, reporting that “Blacks are almost twice as likely as whites to be labeled a higher risk but not actually re-offend.”
Previous studies have also explored political bias within modern media. A study by Julien Phalip utilised sentiment analysis to measure the overall emotional charge of video titles. Scraping YouTube metadata using the YouTube Application Programming Interface (API) and keywords like “Trump” or “Democrat”, Phalip quantified how biased channels like Fox News and MSNBC are. Drawing inspiration from this study, our team used sentiment analysis as well. Another study, by researchers at UC Berkeley and in Australia, found that YouTube’s algorithm steered away from directing people to conspiracy theories or radicalised content and instead pushed them towards mainstream media channels. Using the YouTube API and an additional scraping algorithm, they examined 816 selected channels, each with over 10,000 subscribers and more than 30% political content, and categorised each channel by its average views per month. Ocelot AI conducted a study in which they published many empty videos expressing different ethical views through multiple unbiased websites. They examined the decisions that YouTube’s algorithm makes and how it impacts certain content creators, finding that YouTube celebrates these individuals to make its platform appealing but takes no action when its algorithm demonetises LGBTQ+ content creators. These findings further motivated us to investigate whether or not there was any bias in YouTube’s algorithm.
Experiment 1: Search Engine Analysis
For our first experiment, our team scraped the top 200 search results for 30 politically charged terms (gun control, abortion, Democrats, etc., with 15 left-leaning and 15 right-leaning terms) and then measured the political bias of each video to understand the relationship between bias and search engine ranking. We chose the top 200 results so that we would have a sufficiently large data set for statistical significance, knowing that most users will not scroll beyond the 200th result anyway. Our team created the list of politically charged terms by looking at polling results from ISideWith.com on numerous political topics from self-identifying conservative and liberal participants. We chose this website because it is unbiased and has polled over 55 million voters. If, for instance, a poll asked about “allowing abortion,” and more than half of those who identified as liberal voted yes while more than half of those who identified as conservative voted no, it would be coined a liberal search term.
Table 1 shows the conservative and liberal buzzwords we used for our research.
Moreover, before scraping, our team made four new, distinct YouTube accounts: a liberal, a conservative, a neutral, and a null YouTube account. On each of these accounts, our team members watched a list of 20 videos measured by our bias metric (discussed in depth in the following paragraph) to be significantly left- or right-leaning on a certain keyword. This was done to best mimic average users’ YouTube experiences across the political spectrum, allowing us to understand how search results differ for each profile whilst also gaining a holistic understanding of YouTube’s search engine.
To scrape the publicly available results, our team used Selenium with Python to collect each video’s title, video ID, views, channel, duration, and date of publication. We later used the YouTube Transcript API to retrieve each video’s transcript and compiled the raw transcripts and aforementioned metrics into a comma-separated values file. We thus created a dataset of 24,000 video results: 6,000 for each of the four bias profiles. After examining multiple alternatives, including running Google’s sentiment analysis on video titles, our group settled on Bipartisan Press’s RoBERTa API to measure the political sway of video transcripts. The Bipartisan Press bias calculator, which uses AI and natural language processing to predict political bias, was trained on a set of around 100K articles from the AllTheNews dataset and utilises the RoBERTa-Large model, an extension of BERT’s language-masking strategy, to further enhance accuracy. In language masking, 15% of the words in each sequence are replaced with a [MASK] token, and the model attempts to predict the original masked words from the context provided by the surrounding, non-masked words. The calculator returns a bias metric for a given article or text ranging from -42 to 42 (negative values are left-leaning, positive values are right-leaning). Ultimately, the Bipartisan Press algorithm achieved 82% accuracy when classifying a text body’s political sway, corresponding to a ±7.56 mean absolute error when regressing the numerical bias values. Because of this error, and because we had a large data set, we pruned the data to a sizable but tractable amount before determining bias.
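The scraping pipeline above can be sketched roughly as follows. Only the two pure helpers (building the search URL and flattening a transcript) are shown as runnable code; the Selenium portion is indicated in comments because YouTube’s page structure and element IDs change over time, so treat it as illustrative rather than as our exact scraper:

```python
from urllib.parse import quote_plus

def search_url(term: str) -> str:
    """Build the public YouTube search-results URL for a query term."""
    return "https://www.youtube.com/results?search_query=" + quote_plus(term)

def join_transcript(snippets: list) -> str:
    """Flatten the snippet list returned by the YouTube Transcript API
    (each item carries a 'text' key) into one raw transcript string."""
    return " ".join(s["text"] for s in snippets)

# Browser portion (illustrative; requires selenium and a browser driver):
#   from selenium import webdriver
#   from selenium.webdriver.common.by import By
#   driver = webdriver.Chrome()
#   driver.get(search_url("gun control"))
#   titles = [e.get_attribute("title")
#             for e in driver.find_elements(By.ID, "video-title")]
```

The flattened transcript string is what gets written to the CSV alongside the metadata and later sent for bias scoring.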
We decided to look at the top three videos per search term since, according to a 2019 Backlinko analysis of 5,079,491 search queries, there is a significant decrease in Google Search result clicks after the third result, with the top three results making up 75.1% of all clicks. Because YouTube involves a similar search engine, we reasoned that a majority of clicks from YouTube searches were also made on the top three results.
Experiment 2: Up-Next Recommendation Analysis
We had 11,990 total videos in the data frame; 10 were missing because one of the videos had been taken down prior to the analysis. We also had 10,379 Bipartisan Press scores (BERT results), since some videos were missing transcripts. We then randomly sampled 1,200 videos across the six lean categories (extreme right, moderate right, minimal right, extreme left, moderate left, and minimal left), i.e., 200 randomly chosen videos per category. Each of the 1,200 videos was a seed video, and for each we cycled through the Up-Next recommendation 10 times.
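The stratified sampling step can be sketched with pandas as below, assuming a DataFrame with a `bias` column of Bipartisan Press scores. The 7.5-unit category cut-offs follow our methods; the column and function names are our own for illustration:

```python
import pandas as pd

def categorise(bias: float) -> str:
    """Map a Bipartisan Press score to one of the six lean categories."""
    side = "left" if bias < 0 else "right"
    magnitude = abs(bias)
    if magnitude < 7.5:
        level = "minimal"
    elif magnitude < 15:
        level = "moderate"
    else:
        level = "extreme"
    return f"{level}-{side}"

def sample_seeds(df: pd.DataFrame, per_category: int = 200,
                 seed: int = 0) -> pd.DataFrame:
    """Draw an equal number of seed videos from each lean category."""
    df = df.assign(category=df["bias"].map(categorise))
    return df.groupby("category").sample(n=per_category, random_state=seed)
```

Fixing `random_state` makes the 200-per-category draw reproducible across runs.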
We removed confounding factors by turning off YouTube search history and YouTube video history, clearing the cache, and using alternating VPNs to change the apparent location. For the first few videos, we tested different locations using the VPNs and got the same recommendations, which justifies both using VPNs and clearing video history. By not signing in, we also replicated the experience of a new YouTube user.
The Ethics of Scraping
Scraping web data from websites like YouTube is protected under the law, and we do not need prior permission from these websites to scrape data that is publicly available. Moreover, the scraping our team has done is not only legal but actually helps protect rights listed in the Constitution. If YouTube, as a public forum, were biased towards left-leaning videos over right-leaning videos, this would violate the principle of free speech in this nation, and our research may shed light on this bias, if there is any. Therefore, the scraping our team has done serves to defend the right to free speech in the long run for every creator on YouTube, as well as on other social media platforms.
Experiment 1: Search Engine Analysis
With four accounts, 30 terms per account, and 200 videos per term, we extracted a dataset of 24,000 videos.
We then had to get the transcripts for each of these videos. Even though we had 24,000 videos in our dataset, we could not use all of them: some videos lacked a transcript from YouTube. Because the Bipartisan Press API requires text in order to produce a political sentiment score, a video without a transcript cannot be scored. Using the YouTube Transcript API, we therefore refined our dataset to contain only videos with transcripts. The excluded results could in principle have affected our analysis, but after examining the distribution of videos with and without transcripts, we did not find much of a difference. As seen in the next graph (figure 1), the white lines showing videos without transcripts are distributed evenly throughout the entirety of the dataset, meaning the absence of these results would have had a minimal effect on the overall outcome of the research. After removing the videos without transcripts, the 24,000 video results shrank to 18,091 videos, all of which could now receive a political sentiment score from Bipartisan Press.
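The filtering step amounts to dropping rows whose transcript could not be retrieved before scoring; a minimal sketch, with hypothetical column names:

```python
import pandas as pd

def keep_scoreable(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows whose transcript was retrieved, so every remaining
    video can be sent to the Bipartisan Press API for a bias score."""
    return df.dropna(subset=["transcript"]).reset_index(drop=True)
```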
Figure 1 shows the count of videos at each page order in a YouTube search result throughout the dataset of 18,091 values. The graph has no overall trend, showing that each page order has an equal likelihood of containing missing values.
Note that a majority of the videos sat in the unbiased category with a magnitude of 0-7, and a smaller but noticeable amount lay on the extremes with a magnitude of 20+. There seems to be no video on YouTube with a political sentiment score of magnitude greater than 25, which might be the result of YouTube setting bounds for its media platform. Below is a graph of the data (figure 2), with the y-axis representing the percentage of the data and the x-axis representing the bias score given by Bipartisan Press’s API.
Figure 2 shows the frequency of specific political sentiments in decimal notation. The trimodal, logistic Cauchy-like distribution allows us to examine how bias is spread across YouTube’s videos.
The graph’s symmetrical shape places the median and mean of the data at around 0. The distribution is trimodal, with the middle mode following a logistic Cauchy-like distribution and the first and last modes following a smaller but similar pattern, so each mode rises and falls in a roughly exponential fashion. The data contains no explicit outliers. Quite peculiarly, a majority of the videos have a magnitude less than 10 or greater than 20, with very few videos falling between minimally biased and extremely biased (figure 2). Looking at the top 25 results per page and their views as depicted in the graph below (figure 3), we see that, other than for the first video, YouTube does not rank videos by their view count.
Figure 3 shows the average number of views of a video based on its page order. The interquartile range is denoted by the black lines on each bar. This suggests that YouTube’s algorithm does not rank videos by view count.
It might seem intuitive that videos are sorted in order of view count, but the results are quite different. The graph shows that the first video per page has by far the highest view count, but the second video has one of the lowest. Similarly, videos in positions 9-12 and 18-25 have relatively low views (figure 3). The views per page order do not follow a set trend but appear rather random, suggesting that YouTube places little importance on view count when ranking videos.
Another noticeable result is that accounts with different biases made little difference to the lean of the videos they were shown.
Figure 4 shows the average political sentiment of recommended videos based on the account the video was recommended to. This graph suggests that YouTube shows no bias in recommending videos based on the user’s predetermined bias.
The average political sentiment of recommended videos per account amounts to around the same number, with a difference between accounts of ±0.1 on a scale of -42 to 42, around a measly 0.11% difference (figure 4). This is an interesting observation, since it seems intuitive that the algorithm would feed users’ interests. Based on this information, we can deduce that recommended videos do not relate to the political leaning of the viewer. All search results in the dataset are therefore pooled in subsequent analyses, since none of the data relates to the bias of the person searching the terms.
Understanding YouTube’s Interaction with Political Bias
Figure 5: Towards the beginning of the results, left-leaning videos are recommended in larger numbers than right-leaning videos; towards the middle, the opposite holds; by the end, the numbers of left-leaning and right-leaning videos are around the same.
With the null hypothesis H(0) that the top 3 results contain similar numbers of left-leaning and right-leaning videos, we can reject H(0) and show that our results are statistically significant, with p < 0.001 under a two-tailed z-test. Essentially, there are 49.2% more left-leaning videos in the top three results than right-leaning videos (figure 5). By this analysis, YouTube’s search results exhibit a liberal bias in how videos are displayed.
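A test of this shape can be reproduced with the standard normal approximation to the binomial; the sketch below is our own illustration, and any counts passed to it are placeholders rather than our actual figures:

```python
import math

def two_tailed_z_pvalue(successes: int, n: int, p0: float = 0.5) -> float:
    """Two-tailed p-value for H0: the true proportion equals p0,
    using the normal approximation to the binomial."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    # 2 * P(Z > |z|) for a standard normal Z equals erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))
```

For example, if 90 of 150 top-3 slots held left-leaning videos, `two_tailed_z_pvalue(90, 150)` would test whether that split could plausibly have arisen from a 50/50 process.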
Experiment 2: Up-Next Recommendation Analysis
Holistic understanding of data
We began with the same data we had in Experiment 1. Then, using random sampling, which allows us to generalise about the data, we selected 1,200 videos across the six lean categories (extreme right, moderate right, minimal right, extreme left, moderate left, and minimal left), i.e., 200 randomly chosen videos per category. Each of the 1,200 videos was a seed video, and for each we cycled through the Up-Next recommendation 10 times. Graphed below is how many videos in each cycle had a transcript, showing whether any cycle had more missing data than the others.
Figure 6: The number of videos with a transcript per cycle number. Almost all cycles have an even chance of having a transcript, so no significant missing data is concentrated in any one cycle; this lets us analyse nearly the full set of videos, excluding only those without transcripts.
This represents the number of videos per cycle. The counts past the seed cycle are around the same, showing that there was no significant missing data amongst the different cycle numbers; the trend is essentially constant after the seed (figure 6).
Figure 7 depicts the polymodal relationship between the Bipartisan Press score and the relative occurrences of that score.
In the distribution plot of the Bipartisan Press scores for the seed videos (figure 7), there exist five distinct score clusters. Referring to our methods, this clustering is expected because we chose 200 videos within each 7.5-unit category interval. The cluster at 0 comprises two adjoining clusters, from the minimal-right and minimal-left categories. Moreover, within the extreme categories, video counts build towards a magnitude of 20. Similarly, within the minimally biased seed videos, video counts build towards a magnitude of 0, merging the two minimal-bias groups into a single visual cluster (figure 7). Notably, the trends observed above are almost symmetrical about the vertical line where the Bipartisan Press score equals 0. This means that our sampling did not favor one lean or category, allowing us to further analyse YouTube’s interaction with bias within the specific video leans and categories.
Understanding YouTube’s Interaction with Seed Video Bias
Figure 8 gives the average political sentiment of videos by cycle number. As shown, all cycles have similar average Bipartisan Press scores.
Overall, the average bias score did not change significantly across cycles; the differences were not statistically significant.
Figure 9 shows the average Bipartisan Press score per cycle for conservative-biased seed videos.
Figure 10 shows the average Bipartisan Press score per cycle for liberal-biased seed videos.
There is an evident minimising effect (figures 9 & 10). The seed videos start at bias scores of 10 and -10 respectively. The first cycle drops drastically, to scores of roughly 4 and -4, and the magnitude continues to decrease along an exponential-decay-like curve. By the 10th cycle, the bias magnitude sits around 2, a drastic decrease from the seed score. This is evidence against an echo-chamber effect. The result is statistically significant: under a one-tailed test with H(0) that the final Bipartisan Press value equals the seed value, we obtain p < 0.01 (figures 9 & 10).
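A one-tailed test of this kind can be sketched as a large-sample z-test of the 10th-cycle scores against the seed value. The sketch is illustrative, not our exact procedure, and any score lists fed to it are synthetic:

```python
import math
import statistics

def one_tailed_pvalue(scores, seed_bias):
    """One-tailed p-value for H0: the mean 10th-cycle bias is at least
    seed_bias. Returns P(Z <= z) under a standard normal (large-sample z-test)."""
    n = len(scores)
    se = statistics.stdev(scores) / math.sqrt(n)
    z = (statistics.fmean(scores) - seed_bias) / se
    # Phi(z) = 0.5 * erfc(-z / sqrt(2))
    return 0.5 * math.erfc(-z / math.sqrt(2))
```

A small p-value indicates the recommendation chain has drifted significantly below the seed’s bias magnitude, as the exponential-decay-like curves suggest.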
Figures 11-13 show the average political sentiment of the up-next videos based on their seed. The three panels, in order, are minimal right, moderate right, and extreme right.
Figures 14-16 show the average political sentiment of the up-next videos based on their seed. The three panels, in order, are extreme left, moderate left, and minimal left.
Examining the distinct seed-video bias categories shows statistically significant differences between the seed bias score and the 10th-cycle score for all categories except minimal left, each with a p-value smaller than 0.01 (figures 11-16). Ultimately, bias scores decrease through the 10th cycle, which contradicts the echo-chamber theory: if YouTube were fostering echo chambers, the more videos a user clicked on, the more biased the recommended videos would become.
However, there does seem to be a difference in the minimising effect between categories. At the 10th cycle, there are no statistically significant differences between the left and right Bipartisan Press (BP) scores overall, but there are statistically significant differences when comparing left and right within categories (figures 11-16). In the extreme category, right-leaning seed videos ended up with 94.5% higher BP values by the 10th cycle (figures 13-14); under H(0) that the right-leaning BP score is not greater than the left-leaning BP score, p < 0.05. In the moderate category, left-leaning BP scores were 21.2% higher, but not statistically significantly so (figures 12 & 15). Finally, in the minimal category, right seed videos started at 2.58 and decreased by 127.1% (crossing zero), while left seed videos started at 2.51 and decreased by 23.5% (figures 11 & 16). Right-leaning videos saw a greater minimisation of bias by the 10th cycle, with an average bias of roughly -0.69 (figures 11-13). The difference of the two means is also statistically significant (p < 0.001) under H(0) that left-seed videos have a BP magnitude greater than right-seed videos.
Experiment 1: Search Engine Analysis
In this experiment, we aimed to understand how YouTube reflects the overall bias of its user base in its search results. To characterise YouTube’s user base, we looked at the average political lean of all videos and found that the average lean throughout our dataset was 0.11% (p < 0.001), indicating that the videos within the top 200 search results represent a roughly neutrally biased user base. To examine how YouTube reflects bias, we can analyse whether YouTube overrepresents a certain political ideology in the top three search results. If so, this would imply that YouTube magnifies the bias of that ideology by displaying videos carrying it at the top of search results. This can be quantified by directly comparing the average bias metrics of videos in the top 3 results to those of the top 200 results.
When examining the top three results (refer to methods), we noticed that there were 49.2% more left-leaning videos than right-leaning videos. Assuming that a majority of YouTube users only click on the top 3 results, there is a statistically significant excess of left-leaning videos over right-leaning videos. To determine how YouTube reflects user base bias, we must examine it under both metrics of user base bias. Under the user base metric of views, YouTube appears to severely overrepresent liberal videos in the top three results: despite liberal videos receiving 31.8% fewer views than conservative videos, there were 49.2% more left-leaning videos in the top three positions. Moreover, under the user base metric of video count, YouTube still appears to overrepresent liberal videos in the top three results, because the 49.2% surplus of liberal videos in the top three is much greater than the 6.7% surplus across all 200 results. Looking at the means of each metric, it ultimately appears that YouTube overrepresents the liberal user base in the top three videos.
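The overrepresentation figures above are simple relative surpluses; a minimal sketch (the counts in the test are made up, not our data):

```python
def left_surplus_pct(left_count: int, right_count: int) -> float:
    """Percentage by which left-leaning videos outnumber right-leaning ones;
    comparing this surplus in the top 3 against the top 200 is the
    magnification check described above."""
    return (left_count - right_count) / right_count * 100.0
```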
In the context of these findings, our paper aims to further understand whether YouTube is truly biased. As mentioned in our introduction, we examine our results under the fairness metrics of demographic parity and uniform distribution. Our discussion of how YouTube reflects user base bias maps directly onto the metric of demographic parity: for the algorithm to be considered “fair” in this framework, YouTube would have to place left- and right-leaning videos in proportion to the user base metrics (views and video count). Our results with respect to YouTube’s user base suggest that YouTube fails the demographic parity framework of fairness. With respect to uniform distribution, YouTube also appears to fail the test of fairness: in a fair world under this framework, there would be equal amounts of liberal and conservative videos in the top three results, yet there are 49.2% more left-leaning videos than right-leaning videos.
Another unexpected finding is that YouTube sometimes ranks videos with fewer views above videos with more (refer to figure 3). As shown in our results section, videos in the first position receive, on average, four times as many views as those in the second position. Whilst initially unintuitive, it is likely that YouTube considers many factors in its ranking aside from views, including click-through rate (CTR), average watch length, date of publication, and relevance. Notably, this finding also suggests that YouTube does not create a view-related feedback loop, which could favor videos simply because they already receive more views rather than because of other descriptive metrics. More importantly, a view-related feedback loop, as described in a prior study, could lead to the homogenisation of user behavior and the creation of politically biased echo chambers.
Despite coming to statistically significant conclusions, this paper has several potential limitations. Principally, we only examined 30 total search terms (refer to Table 1), and the polarisation of certain terms has the potential to drastically influence our results. For example, a search term might surface a large amount of extreme left-leaning content whilst surfacing only minimal right-leaning content (or vice versa) due to the political nature of YouTube. Referring to the distribution of Bipartisan Press scores in figure 2, it is unlikely that one side had substantially more politically charged content than the other; however, this potential limitation should be accounted for in future studies by determining the polarisation of each political term and ensuring the sets of left-leaning and right-leaning terms have comparable polarisation. Moreover, a future study should increase the number of search terms to expand the data set and cover more potential topics.
Secondly, our research is predicated on the accuracy of a third-party model created by the Bipartisan Press. Despite the model achieving high predictive capability on its training set, there are multiple unknowns as to its accuracy, especially when applied to video transcripts. To minimise this error, a future study should include multiple bias-determining algorithms, including ones trained on video transcripts from YouTube.
Thirdly, our results simply show that YouTube disproportionately ranks more left-leaning videos than right-leaning videos in the top three results and do not offer insight into the reasoning behind this disparity. Because our team does not have direct access to YouTube’s algorithm, we cannot conclude whether YouTube has nefarious motives, arbitrary selection, or whether this disparity is due to the content produced by politically inclined channels. One potential reason is that left-leaning videos might simply appear more relevant to the search term: YouTube’s search engine is predicated on the relevance of videos, and perhaps a large majority of left-leaning videos are more relevant to the queried topics. To confirm this hypothesis, a future project would have to train its own relevance model and attempt to make it as impartial as possible. Another explanation is that a larger portion of right-leaning content may include derogatory content, which in prior studies has led to an increase in comment moderation; YouTube might be actively penalising hateful content by ranking derogatory videos lower in the search results. To test this hypothesis, a future project would have to train another model to assess the derogatory nature of individual videos and relate it to political lean and search ranking.
Experiment 2: Up-Next Recommendation Analysis
In the second experiment, we aimed to understand how YouTube reflects the bias of a user in their recommendations and understand whether YouTube will actively direct the creation of an “echo chamber” by recommending content with a greater bias magnitude. To fully interpret our results, we must consider the following scenarios: magnification of bias, minimisation of bias, or preservation of bias throughout the different cycles. Ultimately, our results suggest that YouTube follows a consistent minimisation of bias for both left and right leaning seed videos throughout all subsequent cycles.
Moreover, our results suggest that, solely within YouTube's Up-Next recommendations under the model of a new user, YouTube does not facilitate the creation of an echo chamber, but rather facilitates a progressive reduction in bias magnitude as cycles progress from the seed video. One possible explanation is that YouTube is actively working to prevent echo chambers from forming and has programmed its algorithm to minimise bias with each subsequent recommendation. Another possible explanation is that YouTube might simply recommend videos from the same channel or other videos without political context, resulting in the lack of bias. Regardless, our results offer a novel perspective on the field of social media echo chambers, which are widely agreed to exist. Past research has examined the empirics of user trends on particular videos and attempted to deduce algorithmic facilitation of echo chamber creation. However, our research illustrates that the creation of these echo chambers is not directly facilitated by YouTube's recommendations following users' first few videos on the site, but is rather dependent on users actively searching for biased content to support their existing beliefs. While YouTube has been shown to develop echo-chamber-enforcing user profiles, it will not develop these profiles by sending users down a chain of increasingly biased videos.
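The click-cycle procedure from Experiment 2 can be sketched abstractly. In the sketch below, `get_up_next_bias` is a hypothetical stand-in for the real pipeline (scraping the Up-Next recommendation and scoring its transcript with the Bipartisan Press model), and the 30% decay rate in the toy recommender is purely illustrative, not a measured property of YouTube's algorithm.

```python
def follow_up_next(seed_bias, get_up_next_bias, cycles=10):
    """Follow the Up-Next chain for `cycles` clicks from a seed video,
    recording the bias score at each step (seed included)."""
    trajectory = [seed_bias]
    for _ in range(cycles):
        trajectory.append(get_up_next_bias(trajectory[-1]))
    return trajectory

def toy_recommender(bias):
    # Illustrative stand-in: each recommended video carries 70% of the
    # previous video's bias, mimicking the minimisation effect we observed.
    return bias * 0.7

path = follow_up_next(20.0, toy_recommender, cycles=10)
print(round(path[-1], 2))  # 0.56 -- an extreme seed decays toward neutral
```

Under this simplified decay model, an extreme-magnitude seed (bias 20.0) falls below 1.0 within ten cycles, mirroring the progressive reduction in bias magnitude described above.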
Moreover, within the left and right categories, the minimisation of bias appears nearly symmetrical; this accounts for the nearly flat line in Figure 13, which aggregates the subsequent biases across all seed videos, because the opposing biases cancel when aggregated. However, when examining the individual categories, unlike the overall grouping of left and right bias, there were significant differences in the magnitude of minimisation. Among seed videos with extreme magnitude (15-22.5) on both sides of the political spectrum, YouTube's recommendations led extreme left-leaning seed videos to have, on average, 5.5% lower Bipartisan Press scores by the 10th cycle than extreme right-leaning seed videos. This means that YouTube's minimisation effect actually favors right-leaning content in the extreme category. Moreover, in the minimal bias category (0-7.5), YouTube's algorithm seemed to favor left-leaning videos much more than right-leaning videos. In fact, right-leaning videos saw a reversal of bias, with the average seed video bias being 2.51 and the average 10th-cycle video bias being -0.69, compared to an average of -1.92 at the 10th cycle for left-leaning videos. This number is particularly pertinent: referring to Figure 2 from Experiment 1, a large majority of search results have minimal bias. Thus, if a new user follows YouTube's Up-Next recommendations from a political search and begins with a minimally biased video, they are likely to end up at a minimally biased video with a slight left lean by the 10th cycle, regardless of the original lean of the video. However, it is equally important to note that the difference between -1.92 and -0.69 is small and will likely have no impact upon a user's political beliefs.
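The cancellation effect behind the flat aggregate line can be shown with a small sketch. The trajectories below are fabricated, perfectly symmetric examples chosen to make the cancellation exact; real trajectories would only be approximately symmetric.

```python
from statistics import mean

def mean_bias_per_cycle(trajectories):
    """Average bias at each click cycle across many seed trajectories.
    Opposing left (negative) and right (positive) trajectories cancel
    when aggregated, flattening the combined curve."""
    return [mean(step) for step in zip(*trajectories)]

# Symmetric toy trajectories: one left-leaning and one right-leaning seed,
# each minimised toward zero over three clicks.
left = [-15.0, -10.0, -6.0, -3.0]
right = [15.0, 10.0, 6.0, 3.0]
print(mean_bias_per_cycle([left, right]))  # [0.0, 0.0, 0.0, 0.0]
```

Each individual trajectory shrinks in magnitude, yet the aggregate sits at zero for every cycle, which is why per-category breakdowns are needed to expose the asymmetries discussed above.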
The largest weakness of this experiment is the method of determining bias. Despite the original search results from experiment 1 reflecting a roughly neutral Bipartisan Press score distribution, and despite the Bipartisan Press stating that they have a high-accuracy model, there is the possibility that it produced biased results. A future paper might utilise additional political sentiment models to ensure the accuracy of the results.
From the results our team has gathered, YouTube has been shown to exhibit a slight left-leaning bias in search results. However, YouTube counteracts this force by minimising the bias of subsequently recommended videos after a user clicks a video. In other words, a user will not be led down a positive feedback click cycle of increasingly biased videos.
However, that is not to say that YouTube does not create echo chambers – other studies have already documented the site’s building of user profiles that cater to individual biases. Our study simply analyzes the most formative part of these echo chambers: YouTube’s reflection of bias for new users. We found that YouTube will not send new users down an increasingly biased series of recommended videos that silently shape an individual’s beliefs and simultaneously form a biased user profile. If echo chambers form, it is because users actively search for videos with bias.
Given that YouTube has an immense influence on public opinion, it can have monumental impacts on productive, thought-provoking political discourse. In a time when millions receive their news from YouTube, it remains especially pertinent to analyze and scrutinize how the site reflects the biases of its user base. Our paper aims to add a new voice to the scientific discourse and research on social media sites like YouTube. We ultimately hope that further analysis will continue to be conducted as YouTube and other social media sites continue to evolve their algorithms. Otherwise, millions might find their opinions and beliefs shaped by a virtual “bubble.”
We would like to thank ASDRP as well as our advisor Dr. Mui for guiding and educating us through the process of constructing this research paper. We are grateful for their time and commitment towards all the research groups under this program. We would also like to thank Winston Wang and the Bipartisan Press as a whole for giving us access to their RoBERTa bias-measuring model.
All of the members included on the paper contributed to the project from July 2020 until August 2020. Specifically, Michael Lutz worked on the scraping and analysis of the experiments and contributed to team discussion and paper writing. Sanjana Gadaginmath worked significantly on the writing in the paper, idea generation during meetings, and helped create the presentation. Natraj Vairavan contributed significantly to the paper writing and meeting ideas. Sriram Srivatsan proposed numerous ideas, assisted in scraping, contributed to the paper, and created the presentation.
1. YouTube. “YouTube About.” YouTube, YouTube, www.youtube.com/about/press/.
2. Fredenburg, Jill Nicole, et al. “YouTube as an Ally of Convenience: The Platform’s Building and Breaking with the LGBTQ+ Community.” Georgetown University, Georgetown University, 2020, pp. 1–41.
3. Yong, Ed. “A Popular Algorithm Is No Better at Predicting Crimes Than Random People.” The Atlantic, Atlantic Media Company, 29 Jan. 2018, www.theatlantic.com/technology/archive/2018/01/equivant-compas-algorithm/550646/.
4. Casad, Bettina J. “Confirmation Bias.” Encyclopedia Britannica, 9 Oct. 2019, https://www.britannica.com/science/confirmation-bias. Accessed 30 January 2021.
5. Phalip, Julien. “Identifying Bias In…”. Julien Phalip,
6. Ledwich, Mark, and Anna Zaitsev. “Algorithmic Extremism: Examining YouTube’s Rabbit Hole of Radicalization.” First Monday, 2020, doi:10.5210/fm.v25i3.10419, https://arxiv.org/pdf/1912.11211.pdf.
7. Fredenburg, Jill Nicole, et al. “YouTube as an Ally of Convenience: The Platform’s Building and Breaking with the LGBTQ+ Community.” https://repository.library.georgetown.edu/bitstream/handle/10822/1059448/Fredenburg_georgetown_0076M_14645.pdf?sequence=1&isAllowed=y
8. “Political Issues of 2020.” ISideWith, www.isidewith.com/polls.
9. “We Analyzed 5 Million Google Search Results. Here’s What We Learned About Organic CTR.” Backlinko, 27 Aug. 2019, backlinko.com/google-ctr-stats.
10. HIQ LABS, INC., Plaintiff-Appellee, v. LINKEDIN CORPORATION, Defendant-Appellant. No. 17-16783, 9 Sept. 2019, pp. 1-38. https://globalfreedomofexpression.columbia.edu/wcontent/uploads/2020/03/HIQ-v-LinkedIn.pdf.
11. Chaney, Allison J.B., et al. How Algorithmic Confounding in Recommendation Systems Increases Homogeneity and Decreases Utility. Princeton University, 27 Nov. 2018, arxiv.org/pdf/1710.11214.pdf.
12. Jiang, Shan, et al. “Bias Misperceived: The Role of Partisanship and Misinformation in YouTube Comment Moderation.” Proceedings of the International AAAI Conference on Web and Social Media, 6 July 2019, www.aaai.org/ojs/index.php/ICWSM/article/view/3229.
13. Quattrociocchi, Walter, et al. “Echo Chambers on Facebook.” SSRN, 15 June 2016, papers.ssrn.com/sol3/papers.cfm?abstract_id=2795110.
14. Bessi, Alessandro, et al. “Users Polarization on Facebook and YouTube.” Plos One, vol. 11, no. 8, 2016, doi:10.1371/journal.pone.0159641.