
Technology

Research results lead to open source software and solutions for the industry.

The HLT unit develops state-of-the-art technology in all the main research areas in which it operates. The group has performed consistently well in several international evaluations, and is currently engaged in international projects for open source software development (e.g. the Moses platform for statistical machine translation). Research on speech recognition also meets the highest standards, and has reached the application market on several occasions.
Moreover, members of the unit are key players in many international initiatives on evaluation and benchmarking. HLT provides technological support and high-level services in order to optimize the activities of the Research Unit. Providing a shared and efficient environment tailored to HLT needs ranges from the management of special hardware equipment and software tools to the creation and management of large-scale linguistic resources.

Software

  • EDITS (Edit Distance Textual Entailment Suite) is an open source software package aimed at recognizing entailment relations between two portions of text
  • TextPro is a suite of modular Natural Language Processing (NLP) tools for analysis of Italian and English texts.
  • Moses is a phrase-based decoder for statistical machine translation
  • IRSTLM is a toolkit for statistical language modeling
  • jLSI is an open source Java tool for Latent Semantic Indexing
  • jSRE is an open source Java tool for Relation Extraction
  • jWeb1T is an open source Java tool for efficiently searching the Web 1T 5-gram corpus
  • The Tool-box for lexicographers: a web-based application for accessing and updating lexical resources
  • TIES: Trainable Information Extraction System
  • jFex and jInFil: Java tools for Feature Extraction and Instance Filtering
  • XIG: a system for generating Italian sentences from an interlingua representation (Interchange Format)

Databases

  • MultiWordNet: a Multilingual (English/Italian) Lexical Database
  • WordNet Domains: a systematic labelling of WordNet synsets with domain labels

Corpora

  • MultiSemCor: an English/Italian parallel corpus
  • CORPS: CORpus of tagged Political Speeches
  • I-CAB: Italian Content Annotation Bank
  • QALL-ME Benchmark: annotated spoken requests in the tourism domain (Italian, Spanish, English and German)

Electronic Dictionaries/Spell Checkers

Demos

  • TextPro: a suite of tools for analysis of English and Italian texts