CORPS: A CORpus of tagged Political Speeches
January 2011: Thanks to the collaboration with CELCT, a new release of CORPS is now available.
CORPS is a corpus of political speeches tagged with specific audience reactions, such as APPLAUSE or LAUGHTER.
In collecting this corpus, we relied on the hypothesis that such tags about audience reaction indicate hotspots where a persuasion attempt succeeded or, at least, was recognized as such by the audience. This corpus can be usefully employed in many fields, such as:
- Qualitative analysis of political communication
- NLP-based mining of persuasive expressions
- Automatic production of persuasive communication
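As an illustration of the hotspot hypothesis, the sketch below pulls out the words that immediately precede each audience reaction tag. It is only a minimal example: the tag format (an uppercase label in curly braces, such as {APPLAUSE}), the function name extract_hotspots, and the window parameter are assumptions made for illustration and are not part of the corpus documentation.

```python
import re

# Assumption: a reaction tag is an uppercase label in curly braces, e.g. {APPLAUSE}.
TAG_PATTERN = re.compile(r"\{([A-Z]+)\}")

def extract_hotspots(text, window=10):
    """Return (tag, preceding_words) pairs, one per audience reaction tag."""
    hotspots = []
    for match in TAG_PATTERN.finditer(text):
        # Drop earlier tags so that only the speaker's words remain as context.
        preceding = TAG_PATTERN.sub(" ", text[:match.start()]).split()
        hotspots.append((match.group(1), " ".join(preceding[-window:])))
    return hotspots

# Invented sample text, not taken from the corpus.
sample = "We will not give up. {APPLAUSE} And my opponent knows it. {LAUGHTER}"
for tag, context in extract_hotspots(sample, window=4):
    print(tag, "<-", context)
```

The contexts collected this way could then serve as candidate persuasive expressions for qualitative inspection or as input to further mining.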
At present, the corpus contains more than 3,600 speeches, about 7.9 million words, and more than 67 thousand tags about audience reaction.
All the speeches are in native English, and all represent monological situations (i.e., a single speaker addresses an audience).
The speeches were collected from the Internet, and the audience reaction tags were automatically converted to make them homogeneous in formalism and labeling.
Metadata for each speech have also been added (title, event, speaker, date, description).
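The kind of tag conversion described above can be sketched as follows. This is not the procedure actually used to build CORPS: the raw annotation styles, the CANONICAL mapping, and the {LABEL} output format are hypothetical and serve only to show how heterogeneous annotations might be mapped onto one labeling scheme and then counted.

```python
import re
from collections import Counter

# Hypothetical mapping from raw annotations to one canonical label.
CANONICAL = {
    "applause": "APPLAUSE",
    "laughter": "LAUGHTER",
    "laughs": "LAUGHTER",
}

# A single word in parentheses, square brackets, or braces,
# optionally followed by a period or exclamation mark.
RAW_TAG = re.compile(r"[(\[{]\s*([A-Za-z]+)[.!]?\s*[)\]}]")

def normalize_tags(text):
    """Rewrite known reaction annotations as {LABEL}; leave everything else untouched."""
    def replace(match):
        label = CANONICAL.get(match.group(1).lower())
        return "{" + label + "}" if label else match.group(0)
    return RAW_TAG.sub(replace, text)

def count_tags(text):
    """Count normalized tags by label."""
    return Counter(re.findall(r"\{([A-Z]+)\}", text))

# Invented sample text, not taken from the corpus.
raw = "Thank you. (Applause.) That was a joke. [laughter] Really (sic)."
clean = normalize_tags(raw)
print(clean)
print(count_tags(clean))
```

Bracketed material that is not in the mapping, such as "(sic)", is left as it is, so ordinary parenthetical remarks in a transcript are not mistaken for audience reactions.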
How to obtain it
CORPS is freely available for research purposes. For further information, please write to guerini[at]fbk.eu and strappa[at]fbk.eu
When referring to this resource, please cite one of the following papers.
Guerini M., Strapparava C. & Stock O. “CORPS: A Corpus of Tagged Political Speeches for Persuasive Communication Processing”. Journal of Information Technology & Politics, 5(1): 19-32, Routledge, 2008.
Guerini M., Strapparava C. & Stock O. “Audience Reactions for information extraction about persuasive language in political communication”. In M. Maybury (ed.), Multimodal Information Extraction, MIT Press, to appear.