Some may take comfort in how artificial intelligence (AI) has crept into our lives and made them more convenient, through products such as self-driving cars and robot vacuum cleaners. However, it comes as a surprise to many that the AI-based products from tech giants that we often rely on carry biases, notably in their facial recognition and analysis systems. The problem shows up in multiple scenarios, from African Americans having a much lower chance of receiving the same medical treatment as their white counterparts to surveillance systems misidentifying the gender and age of people with darker skin tones.
What’s wrong with the data?
So what causes this partiality? Evidently, machines are not organisms and do not have inherent prejudice. Nevertheless, the humans responsible for training machine learning systems can make a program biased. Around the early 2010s, Joy Buolamwini, a graduate student at the MIT Media Lab, was developing a project called Upbeat Walls. The project incorporated the same facial-analysis technology used by large tech companies, and during its testing stages it could not track the movements of her darker-skinned team members. This prompted her to investigate, one after another, the data sets behind the commercial facial-analysis software from IBM, Microsoft, and Face++. Essentially, the studies found that the images used to feed these neural networks, which pass an input through many layers to categorize the subject as either male or female, were more than 77 percent male and more than 83 percent white. She tested the three programs and found that the error rate was less than one percent when identifying the gender of a white man. In contrast, for darker-skinned women, the error rate leaped to more than 20 percent in one product and more than 34 percent in the other two.
Another notable example of inequality in AI systems came to light in 2012, when scientists at Stanford, Princeton, and the University of North Carolina took part in a project called ImageNet, which enabled computers to recognize all sorts of visual images, from plants to skiers. While this no doubt expanded AI's potential vastly, researchers pointed out flaws hidden in its depths. In the image labels, white men were associated with jobs like “programmer” while women were linked to careers like “teacher.” In addition, racial slurs like “negro” were included.
Whether or not they are a technological continuation of intentional prejudice, incomplete data remains a main cause of AI bias. In both cases, under-representation is at the root of the issue. Both projects mentioned earlier would benefit from training data that is well balanced in terms of gender, age, and skin tone. To confront the bias in ImageNet, the team used crowdsourcing to remove derogatory words and replace them with objective labels. To expand demographic diversity, the team also developed a way to load more varied images, so that certain genders are no longer stereotyped with specific occupations.
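To make the idea of under-representation concrete, here is a minimal Python sketch of a composition check on a data set's metadata. The sample records, the attribute names, and the 40 percent threshold are all made up for illustration; they are not taken from ImageNet or from any of the studies mentioned above.

```python
from collections import Counter

def composition(samples, attribute):
    """Return {value: share of samples} for one demographic attribute."""
    counts = Counter(sample[attribute] for sample in samples)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}

def underrepresented(shares, minimum_share):
    """List attribute values whose share falls below `minimum_share`."""
    return sorted(value for value, share in shares.items() if share < minimum_share)

# Hypothetical metadata for a tiny image set (illustrative only).
samples = [
    {"gender": "male", "skin_type": "lighter"},
    {"gender": "male", "skin_type": "lighter"},
    {"gender": "male", "skin_type": "lighter"},
    {"gender": "female", "skin_type": "darker"},
]

gender_shares = composition(samples, "gender")
# The 0.4 threshold is an arbitrary assumption, not a recommended value.
flagged = underrepresented(gender_shares, minimum_share=0.4)
print(gender_shares, flagged)
```

A check like this only reports how lopsided the data is. Deciding what counts as balanced enough is still a human judgment.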
To tailor her invention to people of all races, Buolamwini, the founder of the Upbeat Walls project mentioned above, also worked on varying the racial and gender representation in her image set. The aim was to increase the model's accuracy in its binary task of deciding whether the person pictured is a man or a woman. Diversifying the images would progressively work against discrimination toward any demographic group using her product, as it gives the software a wider variety of features to consider, such as the distance between the eyes and the shape of the eyebrows. Buolamwini took her investigation one step further by working with a dermatologist to assign scores from I to VI to people of different genders and skin tones according to the Fitzpatrick scale (I being the lightest skin tone and VI the darkest). She then ran the three applications from billion-dollar tech companies again and, as expected, found a clear pattern of higher gender-classification error rates for women and darker-skinned subjects. Most astonishing of all, the algorithms might as well have been guessing at random when identifying women assigned a score of VI, where error rates reached 46.5 and 46.8 percent.
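The kind of per-group audit described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not Buolamwini's actual method or code: the records, group names, and labels below are invented, and a real audit would use a properly labeled benchmark.

```python
from collections import defaultdict

def error_rates_by_group(records):
    """Return {group: fraction of misclassified records} for each group."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, true_label, predicted_label in records:
        totals[group] += 1
        if true_label != predicted_label:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}

# Hypothetical toy results: (group, true label, predicted label).
records = [
    ("lighter-skinned men", "male", "male"),
    ("lighter-skinned men", "male", "male"),
    ("lighter-skinned men", "male", "male"),
    ("lighter-skinned men", "male", "male"),
    ("darker-skinned women", "female", "female"),
    ("darker-skinned women", "female", "male"),
    ("darker-skinned women", "female", "female"),
    ("darker-skinned women", "female", "male"),
]

rates = error_rates_by_group(records)
# A single overall figure hides the gap between the two groups.
overall = sum(true != pred for _, true, pred in records) / len(records)
print(rates, overall)
```

In this toy data the overall error rate looks moderate, while one group sees no errors and the other is misclassified half the time. Reporting results per group is what makes such a gap visible.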
Joy Buolamwini’s research set a standard for algorithms to serve all cultural groups accurately and fairly. She questioned whether a machine learning model can really be considered successful if its accuracy holds for only one group. After Buolamwini’s publication, IBM and Microsoft announced their commitment to raising accuracy for darker-skinned demographic groups in their facial-analysis systems.
Bias in Healthcare
Numerous applications have been built to provide additional resources and improve the level of medical supervision for the roughly 200 million individuals they serve in the United States each year. Although statistics are hard to pool because of patient and data privacy, small reports have been gathered on unfair decisions made by algorithms in areas ranging from our education system to the fabric of healthcare. Persistent racial disparity has led to a lack of proper treatment for African Americans and Latinos, and thus to an absence of diverse data to feed these applications, which in turn leads the systems to favor the white population.
All physicians swear the Hippocratic Oath before practicing, but studies have shown just how challenging it is to carry out. Most doctors intend to care for their patients to the best of their abilities, but unconscious prejudices, known as “implicit bias,” can shape their everyday actions and still harm minorities. While this in no way justifies the unequal service that Latino and Black patients receive, the cause of the discrepancy extends much further, to the policies and perceptions that shape healthcare and together amount to institutional racism. This is where artificial intelligence can shine a light and give doctors insight, as it can compare the treatment each physician gives to patients of different backgrounds, with clear percentages and error rates.
Before the pandemic hit and brought a temporary halt to citizens’ lives and many industries, a predictive algorithm was found to disadvantage African Americans. Optum, a branch of UnitedHealth Group, had built a system that recommends which patients should be seen first by assigning each a risk score. However, this priority list is built from how much patients had spent on health care in the past. Although it seems reasonable that higher health costs might correlate with greater health needs, Black patients spend around $1,800 less annually than white patients, and not because they are generally healthier. In fact, the average African American is more prone to developing conditions such as diabetes, anemia, and high blood pressure, due to genetics and other factors. So what is this issue stemming from? Again, the answer is systemic racism: Black patients receive less care, so less is spent on them. As a result, they were assigned lower risk scores and had to be sicker than white patients with the same medical conditions in order to get treated. The pandemic exposed the same pattern. Findings have reported that when a white patient and a Black patient walked into the emergency room with the same Covid-19 symptoms, the white patient was far more likely to be tested. As the pandemic continues, Black people are still dying at three times the rate of white people because of delayed treatment.
It is always a challenge for programmers to find another variable that can mitigate the prejudice. The application might have served minorities better if it had accounted for other aspects such as age, gender, and ethnic background, but the long-standing prejudice is still apparent in society.
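The problem with using past spending as a stand-in for health need can be shown with a small Python sketch. The three patients and all of their numbers are hypothetical, and the ranking function is a deliberately simple illustration. It is not how Optum's system actually computes its risk scores.

```python
def rank_by(patients, key):
    """Return patient ids ordered from highest to lowest priority on `key`."""
    ordered = sorted(patients, key=lambda patient: patient[key], reverse=True)
    return [patient["id"] for patient in ordered]

# Hypothetical patients: A and B are equally sick, but less was spent on B.
patients = [
    {"id": "A", "chronic_conditions": 4, "past_cost": 5000},
    {"id": "B", "chronic_conditions": 4, "past_cost": 3200},
    {"id": "C", "chronic_conditions": 1, "past_cost": 4000},
]

by_cost = rank_by(patients, "past_cost")           # spending as the proxy
by_need = rank_by(patients, "chronic_conditions")  # a more direct health measure
print(by_cost, by_need)
```

Ranked by cost, patient B drops below the much healthier patient C. Ranked by number of chronic conditions, B sits alongside A at the top. The choice of what the score is built from decides who gets help first.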
How can we fix this?
Harvard Business Review has proposed a solution that might overcome the bias found in machine learning models, which it refers to as a “blind test.” In this test, the algorithm is first trained on all the data. Next, it is run again without the variable suspected of causing bias. In the case of the software made by Optum, a factor like race might be excluded on the second run to see whether the system still makes the same predictions. In the end, researchers found that removing biased variables can reduce the partiality by 84%.
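A blind test of this kind might look roughly like the Python sketch below. It is one reading of the idea, built on a toy "model" that simply predicts the average outcome of matching rows. The field names (`conditions`, `group`, `score`) and all values are assumptions made for illustration, and neither Harvard Business Review nor Optum is the source of this code.

```python
from collections import defaultdict
from statistics import mean

def fit_group_means(rows, features, target):
    """Toy model: predict the mean target of rows with the same feature values."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[tuple(row[f] for f in features)].append(row[target])
    table = {key: mean(values) for key, values in buckets.items()}
    return lambda row: table[tuple(row[f] for f in features)]

def blind_test(rows, features, suspect, target):
    """Pair each row's prediction with and without the suspect variable."""
    full = fit_group_means(rows, features, target)
    blind = fit_group_means(rows, [f for f in features if f != suspect], target)
    return [(full(row), blind(row)) for row in rows]

# Hypothetical records: same conditions, different scores across groups.
rows = [
    {"conditions": 3, "group": "x", "score": 80},
    {"conditions": 3, "group": "y", "score": 40},
    {"conditions": 1, "group": "x", "score": 30},
    {"conditions": 1, "group": "y", "score": 10},
]

results = blind_test(rows, ["conditions", "group"], suspect="group", target="score")
# A large shift means the full model was leaning on the suspect variable.
largest_shift = max(abs(full - blind) for full, blind in results)
print(results, largest_shift)
```

If the predictions barely move once the variable is hidden, the model was not relying on it. If they move a lot, as in this toy data, that variable deserves a closer look.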
A relatively less technical method comes from the AI Now Institute at NYU, which has offered a similar approach called algorithmic impact assessments (AIAs). These incorporate multiple rounds of auditing: internal, external, and public. As a self-check, developer teams can write a “bias impact statement” that lists possible ways the system could cause discrimination. Next, the product’s target audience can be brought in to give feedback. Finally, the AIA works with the government to support users in their freedom to speak up about unfair algorithmic decisions. Clearly, a company has reason to examine how its product might displease customers, since this would affect sales and could also bring legal trouble. Therefore, it is essential to start by reviewing legal protections around health care, fair housing, employment, and more, to get an idea of what needs special attention during the design and testing stages.
The need for humans to work diligently hand in hand with AI algorithms is greater than ever as machine learning continues to expand its capacity to bring convenience into our lives. Although this will revolutionize technology, bias will not go away until we commit to auditing software over and over, a process called algorithmic hygiene, until no group is excluded.
- Hardesty, Larry. “Study Finds Gender and Skin-type Bias in Commercial Artificial-intelligence Systems.” MIT News, February 2018. Accessed July 28, 2021. https://news.mit.edu/2018/study-finds-gender-skin-type-bias-artificial-intelligence-systems-0212.
- Lee, Nicol Turner, Paul Resnick, and Genie Barton. “Algorithmic Bias Detection and Mitigation: Best Practices and Policies to Reduce Consumer Harms.” Brookings. Accessed July 28, 2021. https://www.brookings.edu/research/algorithmic-bias-detection-and-mitigation-best-practices-and-policies-to-reduce-consumer-harms/.
- Knight, Will. “AI Is Biased. Here’s How Scientists Are Trying to Fix It.” Wired, December 2019. Accessed July 28, 2021. https://www.wired.com/story/ai-biased-how-scientists-trying-fix/.
- Ledford, Heidi. “Millions of Black People Affected by Racial Bias in Health-care Algorithms.” Nature, October 24, 2019. Accessed July 28, 2021. https://www.nature.com/articles/d41586-019-03228-6.
- Pearl, Robert, M.D. “How AI Can Remedy Racial Disparities in Healthcare.” Forbes, February 16, 2021. Accessed July 28, 2021. https://www.forbes.com/sites/robertpearl/2021/02/16/how-ai-can-remedy-racial-disparities-in-healthcare/?sh=65b1e31130f6.
- Uzzi, Brian. “A Simple Tactic That Could Help Reduce Bias in AI.” Harvard Business Review, November 4, 2020. Accessed July 28, 2021. https://hbr.org/2020/11/a-simple-tactic-that-could-help-reduce-bias-in-ai.