Understanding LLM cultural and demographic knowledge

Probing LLMs to understand gaps in their knowledge around certain traits (gender, race, etc.)

Recent advances in large language models (LLMs) have led to their direct use as informational knowledge assistants. However, their knowledge, drawn from large undocumented web corpora, is unevenly and poorly understood in its representation of different groups of people. In this work we isolate protected identity characteristics, including gender, sexual orientation, ethnicity, religion, and geography, and evaluate large language models' knowledge across these groups. We find stark asymmetries not only in knowledge, but also in the propensity toward, and types of, hallucination across identity groups. We plan to release our benchmark and evaluation tools so that future work can probe misrepresentation of identity and culture.
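The evaluation described above can be sketched as a simple scoring loop: pose identity-linked factual probes to a model and tally, per group, whether it answers correctly, abstains, or hallucinates. This is a minimal illustrative sketch, not the project's released tooling; the probe triples, the `toy_model` function, and the abstain-as-`None` convention are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical probe set: (identity_group, question, gold_answer) triples.
# These placeholders stand in for a curated benchmark, not real data.
PROBES = [
    ("group_a", "q1", "a1"),
    ("group_a", "q2", "a2"),
    ("group_b", "q3", "a3"),
    ("group_b", "q4", "a4"),
]

def evaluate(model_fn, probes):
    """Tally per-group outcomes for a question-answering model.

    A return value of None is treated as an abstention; any other
    incorrect answer is counted as a hallucination.
    """
    stats = defaultdict(lambda: {"correct": 0, "hallucinated": 0, "abstained": 0})
    for group, question, gold in probes:
        answer = model_fn(question)
        if answer is None:
            stats[group]["abstained"] += 1
        elif answer == gold:
            stats[group]["correct"] += 1
        else:
            stats[group]["hallucinated"] += 1
    return dict(stats)

# Toy stand-in for an LLM call: knows group_a's facts,
# guesses wrongly or abstains on group_b's.
def toy_model(question):
    known = {"q1": "a1", "q2": "a2", "q3": "wrong"}
    return known.get(question)

report = evaluate(toy_model, PROBES)
```

Comparing the per-group tallies surfaces exactly the asymmetry the abstract describes: one group may show mostly correct answers while another shows elevated hallucination or abstention rates on comparable questions.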

Program: Maps
Phase: In Progress
Tags: LLM Bias and Equity