Researchers from New York University and the University of Cambridge have found that large language models—those AI chat systems many people use for writing help—tend to favor whoever they perceive as their “ingroup.” At the same time, they produce more negative content about “outgroups.” According to their study, which appeared in the journal Nature Computational Science, this mirrors well-known human patterns.
Steve Rathje, an NYU postdoctoral researcher and one of the authors, put it plainly: “Artificial Intelligence systems like ChatGPT can develop ‘us versus them’ biases similar to humans—showing favoritism toward their perceived ‘ingroup’ while expressing negativity toward ‘outgroups.’” This suggests that the problem runs deeper than previously understood. It’s not only about common forms of bias like gender or race. The models seem to show a more general tendency, lifting up a group they associate with “us” while bringing down whoever falls into the “them” category.
The researchers reached this conclusion after testing dozens of models, including older base systems and newer, instruction-tuned ones like GPT-4. They prompted each model to complete sentences starting with “We are” (ingroup) and “They are” (outgroup), then assessed the tone of the responses. In most cases, “We are” led to more positive sentences, while “They are” brought out more negativity. One example: “We are a group of talented young people who are making it to the next level” sounded hopeful. But a negative line like “They are like a diseased, disfigured tree from the past” showed how easily the models turned sour on the outgroup.
It’s unsettling to think that tools many rely on for daily tasks might be quietly pushing us to think in terms of insiders and outsiders. Still, there is a bit of good news. The team tested whether changing training data could reduce these biases, and it worked. By filtering out language that encouraged ingroup favoritism and outgroup hostility, they got the models to show fewer signs of this pattern. As author Yara Kyrychenko explained, “The effectiveness of even relatively simple data curation in reducing the levels of both ingroup solidarity and outgroup hostility suggests promising directions for improving AI development and training.”
This matters as AI chatbots continue to influence daily communication. They help with questions, suggest wording for letters, and sometimes guide sensitive conversations. It’s easy to assume these tools are neutral. However, if they pick up on subtle biases and pass them along, that might shape user perceptions over time.
Another author, Tiancheng Hu, noted that as AI becomes more a part of everyday life, it’s important to address such issues so they don’t strengthen existing divisions. His viewpoint suggests that if researchers, developers, and the public pay attention to these patterns, it’s possible to steer these models toward more even-handed behavior.
This is still early work. The study looked at “We are” and “They are” prompts as a simple test. Yet these brief sentences exposed a hidden tendency. Over time, more research might reveal how AI responds to different types of social identity cues. There could be more to learn about whether certain groups are targeted more often or if some models show stronger biases than others.
But at least now we know that bias in AI doesn’t stop at gender, race, or religion. The “us versus them” mindset seems built into the language we humans produce. Since large language models learn from that text, they pick up these patterns too. The difference is that we can intervene before deploying a model widely, making sure we don’t feed it too many biased examples and checking whether it encourages hostility against perceived “others.”
For everyday users, there’s probably no need to panic. Most people don’t see glaring signs of bias in their daily use. Still, it’s wise to stay alert. If a chatbot’s suggestions tilt a conversation toward praising one group while insulting another, maybe it’s worth reporting that. Developers might then refine the model’s training process.
The study also offers hope. When the team filtered out certain kinds of language during training, the models grew more balanced. This points to a practical step anyone building or maintaining AI models can take. If more groups start doing that, it could mean fewer moments when a chatbot suddenly becomes hostile or unfair based on an “us versus them” stance.
So the next time you use a language model, remember it has learned from vast amounts of human text. Some of that text carried subtle biases and hierarchies. The bright side is that now researchers know those biases exist and can start to manage them. It might not be a simple fix, but it’s a start.
As Rathje’s quote makes clear, AI can adopt a human-like pattern of dividing the world into insiders and outsiders. But since it learned that from us, maybe we can teach it better. After all, if a little training data tweaking can reduce bias, then the future of AI doesn’t have to be defined by these old divides. We can shape the outcomes, starting with what we feed into these models.Researchers from New York University and the University of Cambridge have found that large language models—those AI chat systems many people use for writing help—tend to favor whoever they perceive as their “ingroup.” At the same time, they produce more negative content about “outgroups.” According to their study, which appeared in the journal Nature Computational Science, this mirrors well-known human patterns.
Steve Rathje, an NYU postdoctoral researcher and one of the authors, put it plainly: “Artificial Intelligence systems like ChatGPT can develop ‘us versus them’ biases similar to humans—showing favoritism toward their perceived ‘ingroup’ while expressing negativity toward ‘outgroups.’” This suggests that the problem runs deeper than previously understood. It’s not only about common forms of bias like gender or race. The models seem to show a more general tendency, lifting up a group they associate with “us” while bringing down whoever falls into the “them” category.
The researchers reached this conclusion after testing dozens of models, including older base systems and newer, instruction-tuned ones like GPT-4. They prompted each model to complete sentences starting with “We are” (ingroup) and “They are” (outgroup), then assessed the tone of the responses. In most cases, “We are” led to more positive sentences, while “They are” brought out more negativity. One example: “We are a group of talented young people who are making it to the next level” sounded hopeful. But a negative line like “They are like a diseased, disfigured tree from the past” showed how easily the models turned sour on the outgroup.
It’s unsettling to think that tools many rely on for daily tasks might be quietly pushing us to think in terms of insiders and outsiders. Still, there is a bit of good news. The team tested whether changing training data could reduce these biases, and it worked. By filtering out language that encouraged ingroup favoritism and outgroup hostility, they got the models to show fewer signs of this pattern. As author Yara Kyrychenko explained, “The effectiveness of even relatively simple data curation in reducing the levels of both ingroup solidarity and outgroup hostility suggests promising directions for improving AI development and training.”
This matters as AI chatbots continue to influence daily communication. They help with questions, suggest wording for letters, and sometimes guide sensitive conversations. It’s easy to assume these tools are neutral. However, if they pick up on subtle biases and pass them along, that might shape user perceptions over time.
Another author, Tiancheng Hu, noted that as AI becomes more a part of everyday life, it’s important to address such issues so they don’t strengthen existing divisions. His viewpoint suggests that if researchers, developers, and the public pay attention to these patterns, it’s possible to steer these models toward more even-handed behavior.
This is still early work. The study looked at “We are” and “They are” prompts as a simple test. Yet these brief sentences exposed a hidden tendency. Over time, more research might reveal how AI responds to different types of social identity cues. There could be more to learn about whether certain groups are targeted more often or if some models show stronger biases than others.
But at least now we know that bias in AI doesn’t stop at gender, race, or religion. The “us versus them” mindset seems built into the language we humans produce. Since large language models learn from that text, they pick up these patterns too. The difference is that we can intervene before deploying a model widely, making sure we don’t feed it too many biased examples and checking whether it encourages hostility against perceived “others.”
For everyday users, there’s probably no need to panic. Most people don’t see glaring signs of bias in their daily use. Still, it’s wise to stay alert. If a chatbot’s suggestions tilt a conversation toward praising one group while insulting another, maybe it’s worth reporting that. Developers might then refine the model’s training process.
The study also offers hope. When the team filtered out certain kinds of language during training, the models grew more balanced. This points to a practical step anyone building or maintaining AI models can take. If more groups start doing that, it could mean fewer moments when a chatbot suddenly becomes hostile or unfair based on an “us versus them” stance.
So the next time you use a language model, remember it has learned from vast amounts of human text. Some of that text carried subtle biases and hierarchies. The bright side is that now researchers know those biases exist and can start to manage them. It might not be a simple fix, but it’s a start.
As Rathje’s quote makes clear, AI can adopt a human-like pattern of dividing the world into insiders and outsiders. But since it learned that from us, maybe we can teach it better. After all, if a little training data tweaking can reduce bias, then the future of AI doesn’t have to be defined by these old divides. We can shape the outcomes, starting with what we feed into these models.
Citation: Hu, T., Kyrychenko, Y., Rathje, S. et al. Generative language models exhibit social identity biases. Nat Comput Sci (2024). https://doi.org/10.1038/s43588-024-00741-1