
Ideological Books Corpus

The Ideological Books Corpus (IBC) consists of 4,062 sentences annotated for political ideology at a sub-sentential level, as described in our paper (Iyyer et al., 2014). Specifically, it contains 2,025 liberal sentences, 1,701 conservative sentences, and 600 neutral sentences. Each sentence is represented by a parse tree in which annotated nodes are associated with a label in {liberal, conservative, neutral}.
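To make the tree representation concrete, here is a minimal Python sketch of a parse tree whose nodes may carry an ideology label, plus a traversal that collects every annotated phrase. The `Node` class and the `annotated_phrases` function are hypothetical, written only for illustration. They are not the classes or functions used in the sample script, whose API may differ. The toy sentence and its labels are invented and are not drawn from the IBC.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    # Hypothetical parse-tree node (not the IBC sample script's class).
    # label is "liberal", "conservative", "neutral", or None if unannotated.
    label: Optional[str] = None
    word: Optional[str] = None  # set only on leaf nodes
    children: List["Node"] = field(default_factory=list)

    def words(self) -> List[str]:
        """Return the words spanned by this node, left to right."""
        if self.word is not None:
            return [self.word]
        out: List[str] = []
        for child in self.children:
            out.extend(child.words())
        return out


def annotated_phrases(root: Node) -> List[Tuple[str, str]]:
    """Collect (phrase, label) pairs for every annotated node, in pre-order."""
    pairs: List[Tuple[str, str]] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.label is not None:
            pairs.append((" ".join(node.words()), node.label))
        # Push children in reverse so they are popped left to right.
        stack.extend(reversed(node.children))
    return pairs


def leaf(w: str) -> Node:
    return Node(word=w)


# Invented toy sentence: the root (whole sentence) and one phrase are labeled.
tree = Node(label="conservative", children=[
    Node(children=[leaf("the"), leaf("bill")]),
    Node(label="neutral", children=[leaf("raises"), leaf("taxes")]),
])

for phrase, label in annotated_phrases(tree):
    print(f"{label}: {phrase}")
# prints:
# conservative: the bill raises taxes
# neutral: raises taxes
```

Because the traversal is pre-order, the sentence-level (root) label comes first, followed by the labels of nested phrases in left-to-right order; unannotated nodes such as "the bill" are skipped.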

A 150-sentence sample of the data can be found here, along with a Python script that shows how to access the sentences, phrases, and annotations.

To obtain the full dataset, or with any questions or comments about the data, please send me an email at

If you use the IBC in your research, please cite the original IBC paper in addition to ours (e.g., "we used the Ideological Books Corpus (Sim et al., 2013) with sub-sentential annotations (Iyyer et al., 2014) for our work..."):