
Technology

Research results lead to open source software and solutions for the industry.

The HLT unit develops state-of-the-art technology in all of its main research areas. The group has performed consistently well in several international evaluations and is currently engaged in international open source software development projects (e.g. the Moses platform for statistical machine translation). Its research on speech recognition also meets the highest standards and has reached the application market on several occasions.
Moreover, members of the unit are key players in many international evaluation and benchmarking initiatives. HLT provides technological support and high-level services to optimize the activities of the Research Unit. Providing a shared, efficient environment tailored to HLT needs ranges from managing special hardware equipment and software tools to creating and managing large-scale linguistic resources.

Software

  • EDITS (Edit Distance Textual Entailment Suite) is an open source software package aimed at recognizing entailment relations between two portions of text
  • TextPro is a suite of modular Natural Language Processing (NLP) tools for analysis of Italian and English texts.
  • Moses is a phrase-based decoder for statistical machine translation
  • IRSTLM is a toolkit for statistical language modeling
  • jLSI is an open source Java tool for Latent Semantic Indexing
  • jSRE is an open source Java tool for Relation Extraction
  • jWeb1T is an open source Java tool for efficiently searching the Web 1T 5-gram corpus
  • The Tool-box for lexicographers: a web-based application for accessing and updating lexical resources
  • TIES: Trainable Information Extraction System
  • jFex and jInFil: Java tools for Feature Extraction and Instance Filtering
  • XIG: a system for generating Italian sentences from an interlingua representation (Interchange Format)
  • jExSLI is an open source Java tool for language identification.

Databases

  • MultiWordNet: a Multilingual (English/Italian) Lexical Database
  • WordNet Domains: a systematic labelling of WordNet synsets with domain labels

Corpora

  • Textual Entailment Specialized Data Sets: 90 RTE-5 Test Set pairs annotated with linguistic phenomena, plus 203 monothematic pairs (i.e. pairs where only one linguistic phenomenon is relevant to the entailment relation) created from the 90 annotated pairs. Provided jointly with CELCT.
  • MultiSemCor: an English/Italian parallel corpus
  • CORPS: CORpus of tagged Political Speeches
  • I-CAB: Italian Content Annotation Bank
  • QALL-ME Benchmark: annotated spoken requests in the tourism domain (Italian, Spanish, English and German)

Electronic Dictionaries/Spell Checkers

Demos

  • TextPro: a suite of tools for analysis of English and Italian texts