PhD Defense: Developing and Measuring Latent Constructs in Text

Talk
Alexander Hoyle
Time: 08.01.2024 11:00 to 13:00
Location: 

Constructs such as inflation, populism, or paranoia are of fundamental concern to social science. Constructs are the vocabulary over which theory operates, and so a central activity is the development and measurement of latent constructs from observable data. Although the social sciences comprise fields with different epistemological norms, they share a concern for valid operationalizations that transparently map between data and measurement. Economists at the US Bureau of Labor Statistics, for example, follow a hundred-page handbook to sample the egg prices that contribute to CPI-U; clinical psychologists rely on suites of psychometric tests to diagnose schizophrenia.

In many fields, this observable data takes the form of language: as a social phenomenon, language data can encode many of the latent social constructs that social scientists care about. As language technologies have grown more sophisticated and the amount of available data has increased, a "text-as-data" paradigm has emerged, aimed at "amplifying and augmenting" the analyses that contribute to research. At the same time, Natural Language Processing (NLP), the field from which these analysis tools originate, tends to remain separate from real-world problems and guiding theories.

This dissertation focuses on NLP methods and evaluations that facilitate two core activities in the social sciences: the development and measurement of latent constructs from natural language. These efforts remain sensitive to the need for interpretability and validity. The work comprises new methods to facilitate the inductive conceptualization and human-centered measurement of constructs; it also includes the validation of existing methods in the context of this use case.