PDXPLORES

Developing Accessible, Equitable and Evolving AI Language Tools with Ameeta Agrawal and Antonie Jetter

Ameeta Agrawal and Antonie Jetter Season 2 Episode 29

On this episode of PDXPLORES, Ameeta Agrawal, an Assistant Professor of Computer Science at the Maseeh College of Engineering & Computer Science, and Antonie Jetter, the Associate Dean for Research at the Maseeh College and a Professor of Engineering and Technology Management, discuss applying interdisciplinary research techniques to build artificial intelligence (AI) language tools that fairly and accurately serve diverse communities. Through the Compassionate Computing (CoCo) Lab, they seek to mitigate text-based biases and unfairness within AI and to help this fast-moving technology evolve toward natural, equitable and representative digital language.

Follow PSU research on Twitter: @psu_research and Instagram: @portlandstateresearch

Welcome to PDXPLORES, a Portland State research podcast featuring scholarship, innovations and discoveries pushing the boundaries of knowledge, practice, and what is possible for the benefit of our communities and the world.
My name is Antonie Jetter. I'm a professor in engineering and technology management and also the Associate Dean for Research in the Maseeh College of Engineering and Computer Science.
My name is Ameeta Agrawal, and I'm an assistant professor in the Department of Computer Science in the Maseeh College of Engineering and Computer Science here at PSU. I'm interested in developing efficient and accessible computational tools, mostly artificial intelligence tools. Within artificial intelligence, my research focuses on natural language processing, which is the field of making computers understand human language, not just in English, but in many other languages.
In terms of grants, I've been fortunate to receive a few recently. The first, an NSF grant along with Antonie, is the one that got the whole thing started. It focuses on strengthening American infrastructure. It's a planning grant, and we're working with transportation experts. The second is an NSF CRII grant, which is for increasing diversity and fairness in summarization models. And very recently, again along with Antonie and our other partner from the CoCo Lab, the Compassionate Computing Lab, we were lucky to get another grant, which is going to study trust in cryptocurrency.
I'm what's called a participatory system modeler, which is a very big word, but fundamentally what I'm trying to do is represent very complex systems, usually systems where technology and people interact, in a way that we can make sense of them and make good decisions about them. And the participatory part means that I'm talking to people who know something about these systems because they're either a part of the system or they're experts in this aspect.
And I've always been super interested in using technology to do this. And so when I met Ameeta, I immediately said, "Well, that's perfect. Here's somebody who does natural language processing, and we need to team up." And hence the grants we've worked on.
Compassionate computing was an idea born in 2021, and then it took a couple more months for it to take shape and become concrete. It was basically about understanding our existing computational tools: How well do they work for the diverse communities that actually use them? Do these computational tools actually benefit the people who need them the most? It was largely inspired by a lot of discussion around fairness in AI and biases in many of our artificial intelligence models. So that was one aspect of it: mitigating the biases and the unfairness. But the other aspect is how you make these tools available more broadly and make them more accessible, and also educate communities about these evolving tools, of which there are so many these days. Along with Minu, Mrinalini Tankha from the Department of Anthropology, also an assistant professor, we decided to found the Compassionate Computing Lab, or CoCo, and there's been no looking back.
Yeah, and we had a lot of discussions around what to call this lab. And we finally settled on Compassionate Computing because we were looking for compassion in technologies. They should be compassionate to the users who have to put up with them. Technical systems should also be compassionate towards different aspects of society, different goals, different constraints. And so we figured that best reflects our vision for this lab.
We are an interdisciplinary lab. As such, we are kind of virtual, because we have students sitting in the Department of Engineering and Technology Management or in Anthropology or in Computer Science. But we have met, and we have had students work together in the same space, and we've done hackathons and things like that.
But we're mostly virtual.
So natural language processing, or NLP for short, is a subfield of artificial intelligence, or AI. And it boils down to making computers understand as well as generate text in a more natural form, the way humans speak. And of course, this is very difficult, given the very complex nuances of our languages, and not just English but so many other languages. By some estimates, there are almost 7,000 languages in the world. And at the moment, these models roughly cater to about 100 or so. So we have a long way to go. And so that's natural language processing.
Yeah, and computational system modeling is simply representing a complex system in a computer so that you can do quantitative analysis and forecast how the system will likely behave in the future. And one example that my research has absolutely nothing to do with, but that everybody's familiar with, is climate models.
As you must have heard, large language models, or LLMs for short, are a form of generative AI. These models are humongous, gigantic, massive. They have billions to trillions of parameters, which of course makes them very costly to develop. We're talking millions of dollars. But that's only one part of it. Given their immense sizes, even deploying these models for regular users with limited technical expertise or resources is a huge challenge. So that is definitely one of the barriers to making these computational tools a little more accessible. And given the scaling trends these days, we don't foresee these models getting smaller anytime soon. However, there is great interest in making these models more efficient. And there's a huge body of work that is looking at more efficient AI or efficient NLP models and how to either make these models smaller or make them more affordable for inference. So, there's hope.
Yeah, my field is ultimately management. I understand business, and I understand the motivations of companies building these large language models. Very often the motivation is large markets, things that are usable in very, very broad contexts. And very often good enough is good enough. So it doesn't have to be super precise. It also doesn't have to represent everybody and everybody's opinion. Good enough pays the bills. And that sometimes is a barrier to being truly compassionate towards different groups and different users. I think market forces certainly factor into this, because when we look particularly at those large language models, while the fundamental technology came out of research, they are really industry defined. And with all the billions of dollars that need to be spent on this, they don't happen in universities anymore. And so markets shape what companies invest in and what they care about and where they might say, "Well, we're not quite that hung up on quality or precision." So I think one way to close the gap is to simply think deeply about where gaps might be and to do research to understand these gaps. And I think another piece is allowing people to use these tools in the context of their needs, and empowering them to fully understand the technology, to see the limitations, to advocate for improvements, and to leverage what's out there for their own goals and their own projects and have access to it.
So, Compassionate Computing. Basically, we try to have two priorities. One is to educate the communities by involving them in our discussions and making them aware of all the latest technologies and how they can use them in their context, as Antonie mentioned. And the other one is hearing their voices and their concerns, and hopefully developing tools that work for them. That means learning from communities and understanding what their priorities are, so we can focus on making tools that are actually usable. So, it goes both ways.
I feel it's changing quite rapidly right now, because I think what we are seeing is that these large language models will also be off-the-shelf capabilities that people can just call for their own projects. And so I think with a fairly limited understanding of computer science basics, people will be able to build models and use these technologies. The problem, of course, is that if you don't know that much about it, it is very difficult to judge, "Am I doing the right thing? Might I even be harming my community?" And of course, even this very basic computer science knowledge is certainly not equally distributed across all populations. And so there will always be people who have more access to the experts than others. And that's one of the goals of the education piece in the CoCo Lab: to work towards fixing that.
As I mentioned earlier, we have thousands of languages in this world. And these models are trained on data that has been scraped from the Internet. Some languages have a lot of data available. So if you think of languages such as English, French, Spanish, Mandarin, they already have a lot of data available on the web. Hence, these models are predominantly trained on data in those languages. For the other languages, there is less data. And for a lot of languages, the data is non-existent. So this imbalanced distribution of languages in the model training itself makes the model a little bit better at processing and understanding English text. So that's just one problem. But the bigger problem is, given the massive scale of training and the investment that goes into this, how do we even get to a point where we will have the other languages represented in our training data? Who has the motivation to make that happen? And motivations aside, there are also technical challenges. In the foreseeable future, we don't think we will have enough data for those languages, but we would ideally like our models to serve all communities.
So that's a technical challenge as well as a financial challenge, as well as a question of who has the power to make this happen. So all of that really, really complicates the scenario. But you have a lot of other communities who are taking off-the-shelf models and trying to come up with innovative solutions that work for them in their context. The other part of that English-centric view is that even when you do try to have these other languages represented, that predominant view still persists. And that's a question that remains open, and somebody hopefully will have a solution to it.
The Internet consists of a lot of technical texts, a lot of proper English newspaper articles. Certainly male voices are more represented than female voices. If you want to train on cat videos, you're in great shape. If you want to train on other things, maybe not so much. And so I think, because of how the Internet has evolved, we don't have equal amounts of data and representation even within the English language. And so I think there's a lot of space for exploring how this might affect different groups differently and how to fix this.
You may see that the models are not answering the way you were expecting them to. Sometimes, even when you ask a model to answer in language X, it might respond in language Y, especially if X is an extremely low-resource language. So there are inaccuracies right there. The other risk is that it might actually reflect some stereotypical biases in its responses. And that's because when there is less data to begin with, it also is not of very high quality. Sometimes it's plain gibberish or has a lot of strong stereotypes. So those get reflected in its responses. This is not just true for other languages. Even for English, the response can be inaccurate or biased. But this effect is more extreme in those low-resource and less represented languages.
I think that's probably not even a specific feature of big language models.
Whenever people have conversations, they of course influence each other's thinking. But now we build technology where I'm having the conversation with an AI. And that impacts my thinking. And the AI was trained on data that might have a certain worldview. And of course, yeah, that does cause people to maybe adopt this worldview a little bit more and reinforce stereotypes. I think that's exactly the big change that is happening. Suddenly, these models talk to me and I talk to them as if I were talking to a real person.
The landscape of Portland is very diverse. We have lots of multicultural communities here, and that shows in our student base, for instance. But some non-native students, as smart as they are, are now starting to lean on ChatGPT-type models to write their emails. So even though they are perfectly capable of writing an email and expressing the things they want to talk to you about, they will still pass it through ChatGPT once. And of course it is taking away their own style and way of expressing themselves. And everything is starting to look more and more similar. So that's just a change I've noticed as an instructor in the last few months: a lot of these ChatGPT emails. And not to mention that they are unnecessarily long. What could have been said in just a couple of sentences before is now three paragraphs long. So, there's that concern. The other concern is that all of this AI-generated or machine-generated text, because it looks so polished on the surface (almost fluent and free of any grammatical errors), also sort of limits our ability to distinguish between what's real and what might not be genuine. Just because it's packaged in a nice way, we tend to believe it a little bit more than we would've believed something else in the past.
And I think the other interesting question in this is not only that things start to look very much the same, but that as more and more content on the web is produced by AI, and the next-generation AI model is then trained on this content, what happens to the quality of these models? What happens to their ability to express nuance if a large portion of the training data is already AI generated? So I think we're looking at some pretty interesting developments within the next few years. I don't think there has been a technology that hasn't impacted our language. I mean, we've used terms like, "I got my wires crossed." Shakespeare certainly didn't use that term, right? So, of course, yeah, I think AI will change language over time.
All communities are affected by technology. Honoring community voice would simply be understanding what fears and concerns people have about technology (how they want to use it, where they think it might be beneficial to them) and making this part of the technology planning to make good decisions. And also looking at things like, "Well, do we need to regulate? Do we need to incentivize certain things? What do we want as a society, and what is it we don't want?"
In an ideal world, we would want AI models that are unbiased and fair and do not have these stereotypical biases. But it's a challenge to get there. The biases in our AI models? For instance, again in the context of NLP, assume you have a task such as sentiment analysis, where given a piece of text, you would like the model to categorize it as positive or negative. And it can happen, again because of the training data (and sometimes the biases are actually amplified during the training process), that the model treats text written by certain communities or in certain dialects, or just the presence of certain ethnic names in the text, as enough to classify that text as negative.
And so these are some superficial associations that the model has picked up over time, and they're extremely hard to undo. And often, given the scale of our data, we may not even notice them for a long time. So that's one thing. That's in the context of positive and negative. But how does that really affect the communities? Let's say you're on a social media platform, and they're using content moderation tools, where if text is deemed to be toxic or harmful, it may be flagged, removed, or deleted. So again, the same biased sentiment analysis models that were labeling text as negative may also label it as harmful or toxic, and ultimately there are voices that are being removed or suppressed on the social media platforms, even if they were not toxic. Even if they were not harmful. So that is a very real example of how these biased models may end up unfairly changing what we hear, the narratives that are being expressed. The detection of these biases is where we actually need community involvement. A very small set of researchers alone doesn't have the knowledge or the resources to really identify these biases. And that is precisely where community involvement in using these models and developing these models is critical. Communities will be able to identify when these models are being biased or unfair. And that's one of the goals of CoCo, to get communities more involved in the process. Identifying the biases is step one, but mitigating them is even harder. Because we're mostly using these off-the-shelf models, we don't always have control over the parameters of the model. So you're not easily able to change the model. And so there's a lot of active work going on in the fairness community on how to mitigate some of these biases.
I think there's even a weird trade-off, or potential trade-off, between being biased and being a powerful, useful model.
We know it's a value to not stereotype, yet if I have a big suitcase and I want to get it on the MAX, I will not ask the little old lady to help me with this. I will probably look to some young man who looks strong and athletic. Because the stereotype is quite useful in this situation. And so, as you start de-biasing models, you might run into situations where the model just doesn't perform that well anymore. And that's why I think it's so important to research bias in the wild, in the context of real applications, so that you can really identify, "This is bias that hurts people and needs to be removed. And this is bias that doesn't do anything bad and helps produce powerful models."
Given our prior work and many other studies in this area, we were interested in looking at how our summarization models behave. A summarization model is something that takes a really long document and generates a shorter, more concise version of it, or takes a set of multiple documents and tries to generate a coherent summary. Summarization is a very interesting problem to think about because our input can be very, very diverse. It can come from different communities or different dialects. And so the question that we're exploring in this project is, "When you provide very diverse input to your models, is that diversity of input actually reflected in the summary that is generated?" Our preliminary analysis showed us that the answer was "no." And so as scientists, we had to quantify it. So we first created a new data set. That's what we call DivSumm. We collected text from three different dialects (African American Vernacular English, Hispanic English, and White English) and we created a summarization data set. So you have all of these dialects represented in the text. And the question is, "First of all, how would humans summarize such diverse data?"
So that brought us back to community involvement, where we relied on the help of many diverse students, self-identified as fluent in these different dialects, to summarize the text. And it was very, very satisfying to see that humans generate very balanced summaries. In the summaries that were generated by humans, we saw a proportional distribution of these dialects, which is great. It gives us hope that we can go ahead and actually try to do this computationally. And like I mentioned before, the off-the-shelf models are not really good at generating balanced summaries. So this project explores this problem and tries to propose new summarization models that can generate more balanced, more representative summaries.
And one of the ideas that went into this project actually came from the world of participatory system modeling, where there is this notion that if you ask a lot of folks in the community and they all contribute to the model, the more knowledge you embed, the better the model. And in our own work, we found that that is not always true. Sometimes the collective intelligence of way too many people causes our system models to focus on the evident stuff that everybody can agree on, and to lose all of the diversity and knowledge about system aspects that not everybody has observed. So I posed the question to Ameeta and said, "Well, doesn't that happen in NLP too? Is more not always better?" Because currently, NLP models are trained on the largest amount of data available. So the assumption is that it always gets better as you add data points. And at least for summarization, we found that some of the tricks that work in the system modeling approach to fix this problem of crowding out important information also apply to summarization and NLP models.
So as educators, we've noticed that students get really, really, really enthusiastic and excited to try out any new technology.
So when ChatGPT was introduced, it was very natural for CoCo to start thinking about how we could come up with an activity, an event, centered around all of these new AI tools. And from there, we decided to have a one-day event, which we called the Ideathon, based on what Antonie had done in the past. And also at that point, I realized that the word "hackathon," which I'm more familiar with as a computer scientist, doesn't have the same connotation in other communities and disciplines. So, we settled on Ideathon, and it was AI tools for creative storytelling. And of course, you need to bound your problem a little bit, so the scope was climate change in Oregon. Something that we're all very, very passionate about, worried about. So, we brainstormed for a couple of weeks, and thanks to a lot of help from our students, from computer science as well as ETM, we came up with this very neatly planned one-day event. And it was open to students from any discipline at PSU. And in the end we had about 10 to 12 different disciplines represented that day. It was amazing, to say the least, to see what the students came up with in a span of mere hours. So of course we had breakfast and a team-building activity, and then lunch. The students were put into very diverse teams. One of the constraints that we put on the Ideathon was that there should be at least three different disciplines represented in each team. And so that was good. And these teams went off, brainstormed a couple of ideas, came back for lunch, let these ideas simmer, and then went back again and executed those ideas, which was followed by presentations by the students. And it was mind-blowing to see what they had created. So there was a team that did a mock-up of a phone application, or an app, for showing what the world, or specifically Oregon, would look like if we don't address the effects of climate change versus if we did.
So, would you still have Mount Hood to go to? What would the Oregon coast look like? And so they were showing you all of these doomsday scenarios versus what would happen if we took better care of our planet. And of course, I should clarify that ChatGPT is just one of the AI tools that we have. It does text and is increasingly starting to do visuals as well, but there are also many other tools that the students used that day. One of them is called Midjourney, which generates images, and then there are several other AI tools that can help you generate something. So a combination of tools was used that day. Another team did a chatbot, a conversational model, except that it speaks like an Oregon farmer. Because, you know, when farmers have questions and they would like to possibly use one of these technologies, how can they get more contextualized responses? For example, what does it mean to ask how to take care of strawberries in June in Oregon versus in Florida? Another team created a storybook for children, because naturally they're the future, and how can we convey climate change to young minds in a more creative, visual, more interesting way? And that was fabulous. It was really, really cool to see the visuals that they came up with, and the characters were also generated using AI. So you had a beaver, Mount Hood and a Doug fir as the main characters in that storybook. And then my personal favorite was a play. Again, the screenplay, the writing, the backdrops and the narratives were all generated using multiple AI tools. And the motivation for that was, "Well, climate change affects farmers and rural communities severely. But how do we convey this to the community?" And of course, they're not reading research papers or technical articles. They go to community fairs and learn from and talk with friends and family. So this was a play to convey the effects and possible mitigation strategies, adaptation strategies.
And that was my personal favorite. But the teams, they blew my mind. It's something that I'm still thinking about a couple of months later. And it was amazing. And I can't wait to do another one.
We can't wait.
Of course, all this was done not just by CoCo. We also had some fabulous faculty mentors who were very generous with their time and their ideas. They came to the event, and I'd like to give special thanks to Dr. Brianne Suldovsky from the Science Communication Department and Dr. Kathi Inman Berens from the English Literature Department, who really helped the students think about and emphasize not just what they're developing, but first, who they're developing it for, which actually shapes what you develop.
My name is Antonie Jetter, and in my research I try to make sense of super complex systems with the help of what people know about them. And increasingly, I'm trying to do this with NLP.
And my name is Ameeta Agrawal, and I'm interested in developing AI models that actually work for people, that reflect their voices, and also in educating people about these AI tools. We have this CoCo Lab, and under this umbrella of interdisciplinary research, we are able to do things that we most likely were not going to be able to do otherwise.
Thank you for listening to PDXPLORES. If you liked what you heard on this episode, please rate and follow the show anywhere you get your podcasts.