Pollster introduces a new approach for measuring the effects of media consumption and for predicting media-driven public opinion. Our approach leverages advances in deep neural network-based language modeling to model populations with specific media diets. We validate our approach using ground-truth surveys in two domains: attitudes towards COVID-19 and consumer confidence. This approach could be used to supplement existing surveys, help public health officials and others improve their messaging, and ultimately inform policy makers.
Media has a large impact on society: it controls the diffusion of information, sets national agendas, frames key political and economic issues, and activates public expression. Yet measuring and predicting the effects of consuming particular media diets remains a difficult task. One important effect that encapsulates and mediates several of these outcomes is the media's influence on public opinion. Public opinion can serve as a “thermostat” of public will, with shifts in public opinion predicting shifts in public policy and spurring social movements. The need to better understand and predict the relationship between media and public opinion has garnered increasing attention in recent years, as concerns mount about a misinformed public, fake news, and echo chambers, and about their effect on the functioning of a healthy democracy. Despite this, surveys, the traditional tool for understanding public opinion and media effects, are expensive, rigid, and difficult to connect to the content of media messaging.
Recent natural language processing research has explored “probing” neural language models for factual information. We extend this concept to develop “media diet models”: pretrained language models adapted to news, TV broadcast, and radio show content that can be probed with fill-in-the-blank, cloze-style prompts to predict the beliefs of someone who consumes a particular media diet. Our approach addresses several shortcomings in media effects research and surveying simultaneously: (1) a large pretrained language model such as BERT models the semantic content of media messages, (2) probing connects that content representation to specific questions, and (3) the model can be probed repeatedly with an arbitrary number of questions. We validate our approach by predicting beliefs measured in ground-truth, nationally representative surveys on COVID-19 and consumer confidence.
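To make the cloze-style probing step concrete, the following is a minimal sketch rather than the implementation used in this work. It assumes a hypothetical `fill_mask` callable that returns a probability for each candidate word in the `[MASK]` slot of a prompt; in practice this could wrap a masked language model such as BERT adapted to one media diet. The prompt, the candidate words, the toy probabilities, and the choice to renormalize over the candidate set are all illustrative assumptions.

```python
from typing import Callable, Dict, Sequence

# Hypothetical interface (an assumption, not from the source): maps a prompt
# containing "[MASK]" to a probability for each candidate word.
FillMask = Callable[[str, Sequence[str]], Dict[str, float]]


def probe(fill_mask: FillMask, prompt: str,
          candidates: Sequence[str]) -> Dict[str, float]:
    """Return candidate probabilities renormalized over the candidate set."""
    if "[MASK]" not in prompt:
        raise ValueError("prompt must contain a [MASK] slot")
    raw = fill_mask(prompt, candidates)
    total = sum(raw[c] for c in candidates)
    if total <= 0:
        raise ValueError("candidates have zero total probability")
    return {c: raw[c] / total for c in candidates}


def toy_fill_mask(prompt: str, candidates: Sequence[str]) -> Dict[str, float]:
    """Stand-in for a real media diet model: fixed, made-up probabilities."""
    table = {"necessary": 0.03, "unnecessary": 0.01}
    return {c: table.get(c, 0.0) for c in candidates}


# Illustrative prompt only; not taken from the surveys described above.
scores = probe(
    toy_fill_mask,
    "Wearing a mask in public is [MASK].",
    ["necessary", "unnecessary"],
)
```

Swapping `toy_fill_mask` for models adapted to different media diets would, under these assumptions, yield one score per diet for the same prompt, which could then be compared against survey responses.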