From the course: Data Science Foundations: Fundamentals
Bias
- [Instructor] You're doing wonderful technical work. You're working with a great computer, you've got good code, and yet sometimes things just glitch, fall apart, and kind of explode on you. Things can go wrong in data science just as they can everywhere else, and there are really too many examples to list. There are social media bots that start behaving very badly, very quickly. There are criminal justice applications that seem to perpetuate stereotypes. There are credit, loan, and financial applications processed by machine learning that seem to exacerbate existing inequities. And there are problems of social representation in social media and other forms of data science. Now, some of these are due to technical issues. Maybe the training data sets had limited variability: they didn't cover all the possible cases, or certainly not in a balanced way with an appropriate number of cases per group. There may also have been statistical artifacts from small samples; you can get a correlation in a small sample that isn't there in the overall group. And you can focus on overall accuracy and ignore important differences between subgroups (there's a short sketch of this after the transcript). These are technical issues, and much of the time they can be dealt with in a technical way. There are also, however, failures that have more to do with the people carrying out the project. There's the failure to gather a diverse training set that represents all the people and all the situations your algorithm might encounter. There's the failure to capture diversity in data labels: what is in this picture? Is it a good thing or a bad thing? The answer varies from person to person, and it also varies from one culture or group to another. And there's the failure to use more flexible algorithms that can look not just at large-scale, macro, overall patterns, but also at the more micro, nuanced, individual subgroups and exceptional cases. One of the consequences is that this can lead to self-fulfilling prophecies. For instance, there is research showing that women get shown job ads for lower-paying jobs. A woman who is shown ads for lower-paying jobs is more likely to apply for, and get, one of those lower-paying jobs. Her job is then taken as data into the algorithm that makes the recommendations, which reinforces and exacerbates the problem. This is a case where a lack of awareness about how the algorithms work can, in fact, create the very problem you wanted to avoid in the first place and should have avoided. Now, there are a few things that you can do. Number one, you can, of course, check for biased output, and this requires judges from a very wide range of perspectives to look at the data. You can consult with all parties: the people who are going to be using the algorithm and the people whose lives will be affected by it should be included in the evaluation of the algorithm itself. And, of course, include diversity, not just demographic and cultural diversity, but lots of different methods, lots of different ways of looking at the data and evaluating the impact of your work. Now, this is all true for predictive AI, which is generally trying to predict how a human would act in the same situation. But generative AI introduces even greater risks. For instance, when making images or writing text, there's the risk of either overrepresenting or underrepresenting different groups.
You don't want to end up with a whole bunch of clones, and you don't want algorithms that specifically avoid things people know historically to be true. There's also the fact that generative AI, while it's built on amazing programming and amazing mathematical calculations, is frequently really bad at doing math on its own. And it's subject to hallucinations, where it says things that were never in its training data; it simply makes things up. So you've got to be really careful with that. And then finally, because it's a computer and because these algorithms come from big technical organizations, they tend to encourage a lot of trust, arguably too much trust in certain situations where people should keep their wits about them and be a little more skeptical about what the algorithm is giving them. Then there's the effect of people realizing that the text or audio you sent them was generated by AI, usually because something seems a little off. Aside from not being professional quality, and really, people expect better, it leads to an erosion of trust. If you messed up this one very simple thing, how do we know we can trust you with things that are more important? That kind of trust is very difficult to rebuild, so be cautious with what you do. And if you want to learn more about this, we have another course here in the LinkedIn Learning library, AI Accountability Essential Training, that addresses a lot of the issues that come up in data science and AI, and how you can better anticipate them, prepare for them, and respond to them, so you can use AI more effectively to get the insight you need from your work without bringing in new, unexpected, and potentially troublesome difficulties.
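To make the earlier point about subgroup accuracy concrete, here is a minimal sketch of how a single overall accuracy number can hide much weaker performance on an underrepresented group. It is not part of the course material: the data, column names, and group labels are hypothetical and exist only for illustration, and it assumes pandas is available.

```python
# Minimal sketch (not from the course): overall accuracy vs. per-group accuracy.
# All data, column names, and group labels here are hypothetical.
import pandas as pd

# Hypothetical evaluation results: true labels, model predictions, and a
# group attribute for each case. Group B is deliberately underrepresented.
results = pd.DataFrame({
    "group":     ["A"] * 80 + ["B"] * 20,
    "actual":    [1] * 40 + [0] * 40 + [1] * 10 + [0] * 10,
    "predicted": [1] * 38 + [0] * 42 + [1] * 3  + [0] * 17,
})

# Overall accuracy looks strong...
overall = (results["actual"] == results["predicted"]).mean()
print(f"Overall accuracy: {overall:.1%}")   # 91.0%

# ...but splitting the same metric by group shows the model does far worse
# on the smaller group, which the single overall number completely hides.
per_group = (
    results.assign(correct=results["actual"] == results["predicted"])
           .groupby("group")["correct"]
           .mean()
)
print(per_group)   # A: 0.975, B: 0.65
```

In practice you would run the same kind of breakdown for every group you care about, and for metrics beyond accuracy, as one simple way of checking for biased output before anyone's life is affected by it.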