Study: Some language reward models exhibit political bias

Research from the MIT Center for Constructive Communication finds this effect occurs even when reward models are trained on factual data.

MIT News | 12.12.2024