Arianna Bisazza (FBK - irst)

21 Jan 2010, 10:00 - 11:00

Morphological Pre-Processing for Turkish-to-English Statistical Machine Translation

Sala Grande Palazza B, Thursday, 21st Jan 2010, h:10.00 

Morphology plays a fundamental role in any NLP application involving agglutinative languages. This is particularly true for statistical machine translation (SMT) from Turkish into English, because of the severe mismatch between the word formation mechanisms of the two languages. We approached this problem through morphological segmentation of Turkish, taking advantage of linguistic knowledge of both the source and target languages. In particular, we compared different segmentation rule sets in order to find an effective preprocessing scheme for the Turkish-English task organized by the IWSLT09 workshop. By minimizing the differences between the lexical granularities of the source and target languages, we obtained more refined alignments and better modeling of the translation task, which resulted in a considerable improvement in translation quality. This work shows how specific linguistic preprocessing can benefit a purely statistical, language-independent NLP application like SMT.
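
As a rough illustration of what comparing segmentation rule sets can look like, the toy Python sketch below splits a Turkish word into tokens according to which suffix types a rule set marks as separable. Everything in it is an assumption made for illustration: the hand-written analyses, the tag names (PL, P1SG, LOC), the rule set names, and the "+" marker on split suffixes are hypothetical and are not the rule sets or tools used in this work.

```python
# Toy illustration only: the analyses, tags and rule sets are hypothetical,
# not the ones evaluated in the talk.

# Each word maps to a hand-written list of (surface morph, tag) pairs.
ANALYSES = {
    # ev-ler-im-de: "in my houses"
    "evlerimde": [("ev", "ROOT"), ("ler", "PL"), ("im", "P1SG"), ("de", "LOC")],
    # okul-da: "at school"
    "okulda": [("okul", "ROOT"), ("da", "LOC")],
}

# A rule set names the tags whose morphs become separate tokens.
RULE_SETS = {
    "none": set(),
    "case_only": {"LOC"},
    "case_and_poss": {"LOC", "P1SG"},
}


def segment(word, split_tags):
    """Split `word` into tokens; morphs whose tag is not in `split_tags`
    stay glued to the preceding token."""
    morphs = ANALYSES.get(word)
    if morphs is None:
        return [word]  # unknown word: leave it untouched
    tokens = [morphs[0][0]]
    for surface, tag in morphs[1:]:
        if tag in split_tags:
            tokens.append("+" + surface)  # new token, marked as a suffix
        else:
            tokens[-1] += surface  # glue onto the previous token
    return tokens


if __name__ == "__main__":
    for name, tags in RULE_SETS.items():
        print(name, segment("evlerimde", tags))
```

Under the hypothetical "case_and_poss" rule set, "evlerimde" becomes three tokens, the same number of words as its English counterpart "in my houses", which is the kind of granularity matching the abstract describes.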