Extracting Protest Events from Newspaper Articles with ChatGPT

Abstract

This research note examines the ability of a large language model (LLM), ChatGPT, to extract structured data on protest events from media accounts. In an analysis of 500 articles on Black Lives Matter protests, we find that, after iterative prompt refinement on a training dataset, ChatGPT can produce data comparable to or better than hand coding, with an enormous reduction in time and at minimal cost. While the technique has limitations, LLMs show promise for protest event analysis and deserve further study.

Publication
SocArXiv
Neal Caren
Associate Professor of Sociology

My research interests include social movements, protest events, web scraping, and text analysis.