On the Relationship between Truth and Political Bias in Language Models

Even when language reward models are trained only on true versus false data, they can still display a political bias.

Political bias is well documented in large language models. However, our study shows that this bias can emerge even when reward models are trained only on objectively true versus false statements. This has interesting implications for AI alignment, as it suggests that biases already present in the pretrained models are exacerbated during fine-tuning.
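To make the setup concrete, here is a minimal, hypothetical sketch of the kind of probe described above: a reward model scores statements for truthfulness, and its scores on paired left- and right-leaning statements are compared. The base model name, the example statements, and the omitted training step are all placeholders, not the authors' actual code or data.

```python
# A minimal sketch (not the study's code) of probing a truthfulness reward
# model for political bias. The model and statements below are placeholders.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "distilbert-base-uncased"  # placeholder base model

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=1  # single scalar head, used as a reward score
)
model.eval()

def reward(statement: str) -> float:
    """Score a statement with the reward model (higher = judged more true)."""
    inputs = tokenizer(statement, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return model(**inputs).logits.item()

# After fine-tuning on objectively true/false statement pairs (omitted here),
# compare scores on paired politically charged statements; a systematic gap
# would indicate political bias despite truth-only training.
left = reward("The government should expand access to public healthcare.")
right = reward("Taxes on businesses should be lowered to spur growth.")
print(f"left-leaning score: {left:+.3f}, right-leaning score: {right:+.3f}")
```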

LLM Bias and Equity, Political Bias, AI Alignment



In the news

Study: Some language reward models exhibit political bias
Research from the MIT Center for Constructive Communication finds this effect occurs even when reward models are trained on factual data.

Large language models (LLMs) that drive generative artificial intelligence apps, such as ChatGPT, have been proliferating at lightning speed and have improved to the point that it is often impossible to distinguish text written by generative AI from human-composed text. However, these models can also sometimes generate false statements or display political bias.

12.12.2024 | MIT News