A Computational System for Poetry Style Analysis and an Expressive Poetry Reader
Wednesday, 26 February, 2014 - 15:00
We present a system that computes a large number of linguistic parameters to evaluate a single poem or a collection of poems, and compares the results. The system can also compare the collections of two or more poets and select similar poems on the basis of Pearson's correlation coefficients. Parameters are extracted from a number of complementary fields that coexist in poetry and set it apart from ordinary written text.
The first field consists of surface quantitative measures, long used in Corpus Linguistics, such as number of characters per word, number of words per sentence and number of sentences per poem. More specific measures are derived from semantic lexical classification: number of rare words; abstract vs. concrete words; number of adjectives, verbs, nouns and conjunctions; and words expressing affectivity or emotions.
A second field is derived from text complexity measures, which are also a topic of the PITR workshops on text readability. These come from syntactic and semantic analysis at sentence and text level, and measure how many complex noun groups there are, how many complex and subordinate sentences there are, etc. More specific measures regard discourse structure computed at clause level, which is then essential for modifying the intonational parameters passed as input to the Text-To-Speech (TTS) system.
Finally, the most specific field regards how a poem sounds: this is derived from a complete transcription of the poem into phonetic symbols. The transcription relies on existing phonetic dictionaries for English, which can be taken off the shelf (in this case from CMU); but it is complicated by the need to supply transcriptions for out-of-vocabulary words (OOVWs), which are quite frequent and are handled in our case by a dedicated phonological parser.
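As an illustration of the poem comparison based on Pearson's correlation coefficients, the following is a minimal sketch and not the system's actual code. It assumes each poem has already been reduced to a vector of numeric parameters in a fixed order; the function names `pearson` and `most_similar` and the sample values in the usage note are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


def most_similar(target, candidates):
    """Return the name of the candidate poem whose parameter vector
    correlates most strongly with the target vector.

    `candidates` maps a poem name to its parameter vector (hypothetical layout).
    """
    return max(candidates, key=lambda name: pearson(target, candidates[name]))
```

For example, `most_similar([1, 2, 3, 4], {"a": [2, 4, 6, 8.5], "b": [4, 3, 2, 1]})` selects `"a"`. Since the parameters are measured on very different scales, a real comparison would probably normalize each parameter first; that step is left out here.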
As a more specific additional analysis, the system also provides standardized theoretical duration measures in milliseconds at syllable level: these are partly derived from a database, but require further algorithms for missing syllables, which are supplied by the same parser. All these measures are then evaluated statistically to derive mean values and standard deviations, and are used to decide how similar lines are organized into stanzas, or simply how many repetitions there are and how well these are distributed in the poem. Phonetic transcription is then used to search for rhetorical devices such as rhymes, assonances, alliterations, etc., which are used in poetry to characterize the poet's style and set the poem apart from other poems.
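To make the search for rhymes over a phonetic transcription concrete, here is a small sketch under stated assumptions; it is not the system's method. It assumes CMU-style ARPAbet transcriptions, in which vowels carry a stress digit, and it treats two words as rhyming when they share the same phones from the last stressed vowel onward. The tiny `LEXICON` and the function names are hypothetical, and out-of-vocabulary words are simply marked `?` instead of being passed to a phonological parser.

```python
# Tiny hand-made lexicon in CMU-style ARPAbet (illustrative only).
LEXICON = {
    "night": ["N", "AY1", "T"],
    "light": ["L", "AY1", "T"],
    "day": ["D", "EY1"],
    "away": ["AH0", "W", "EY1"],
}


def rhyme_part(phones):
    """Phones from the last stressed vowel to the end, stress digits removed."""
    start = 0
    for i in range(len(phones) - 1, -1, -1):
        if phones[i][-1] in "12":  # primary or secondary stress
            start = i
            break
    else:
        # No stressed vowel found: fall back to the last vowel of any stress.
        for i in range(len(phones) - 1, -1, -1):
            if phones[i][-1].isdigit():
                start = i
                break
    return tuple(p.rstrip("012") for p in phones[start:])


def rhyme_scheme(last_words, lexicon=LEXICON):
    """Label line-final words with letters; equal rhyme parts share a letter."""
    labels = {}
    scheme = []
    for word in last_words:
        phones = lexicon.get(word.lower())
        if phones is None:
            scheme.append("?")  # out-of-vocabulary word, left unresolved here
            continue
        key = rhyme_part(phones)
        if key not in labels:
            # Sketch only: supports up to 26 distinct rhymes.
            labels[key] = chr(ord("A") + len(labels))
        scheme.append(labels[key])
    return "".join(scheme)
```

With this lexicon, `rhyme_scheme(["night", "day", "light", "away"])` returns `"ABAB"`. Assonance and alliteration could be approached in the same way by comparing only the vowels or only the initial consonants of the transcription.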
We use these measures to build a number of macro-indices, which are visualized graphically and allow the user to compare two or more different poems at a glance. They are: Poetic Rhetoric Devices, Metrical Length, Semantic Density, Prosodic Structure Dispersion, Deep Conceptual Index, and Rhyming Scheme Comparison. Parameter values, standard deviation, skewness and kurtosis are then used to distribute poems in a space and to form clusters of similar poems: in this case too, outliers can be identified visually.
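The summary statistics named above can be computed per parameter as in the following sketch. The abstract does not say which estimators the system uses, so population moments and excess kurtosis (0 for a normal distribution) are assumptions here, and the function name `describe` is hypothetical.

```python
from math import sqrt


def describe(values):
    """Mean, standard deviation, skewness and excess kurtosis of a sample,
    computed from population (biased) central moments."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    if m2 == 0:
        raise ValueError("skewness and kurtosis are undefined for constant data")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "std": sqrt(m2),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,  # excess kurtosis
    }
```

Each poem could then be placed in a space whose coordinates are these four values for a chosen parameter, for instance words per line, before any clustering is applied.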
Finally, all the previously derived parameters, together with the phonetic transcription and the durational prosodic measures, are used to tell a TTS system how to pronounce the poem so as to make its reading pleasant and attractive, by adding expressive features to the appropriate word, line and sentence.
A demo will be presented.
A description of the system is available here
Rodolfo Delmonte has been Associate Professor of Computational Linguistics at Ca' Foscari University of Venice since 1985, and he teaches courses at all levels for students from both humanities and computer science. He graduated in Venice and received his Ph.D. from Monash University, Melbourne, Australia in 1976. He started his career collaborating with the Faculty of Engineering of the University of Padua, where he worked on the core of a Text-to-Speech system which in the '80s was funded by Digital Equipment for the Italian version of DecTalk. Another important achievement, at the beginning of the '90s, was a multimedia and multilingual system for language learning called SLIM, which incorporated ASR and Speech Synthesis.
He has more than 150 publications, including 8 books. He is a referee for a number of international journals, including Speech Communication, where in 2000 he published a paper on the importance of prosody for speech recognition systems in the field of language learning. His latest work includes two books with the title Computational Linguistic Text Processing (NovaScience Publishers, New York): the first is dedicated to sentence-level processing, and the second to discourse-level processing, with chapters on the use of semantic and pragmatic approaches to text processing. Among his latest publications is a chapter entitled "Getting Past the Language Gap: Innovations in Machine Translation", in which he describes the state of the art and best practices for future improvements in the field of MT.
He was an invited professor in Boulder, Colorado in 2004, and at UTD, Texas in 2006. In 2007 he was invited by the University of Besançon. In 2010 he was keynote speaker at the ALTA conference in Sydney, where he delivered the talk "OPINION MINING, SUBJECTIVITY and FACTUALITY". In 2011 he was invited by German Rigau to give a course on semantics in NLP at IXA, San Sebastian. Every year he serves on the Scientific and Program Committees of six or seven conferences and workshops on advanced themes in computational linguistics, including ACL.