WP3 Extend basic dictionaries


Dictionaries will be derived from the parallel corpora acquired in WP1, using established techniques for automatic sentence alignment and word alignment. Some parallel corpora may already be sentence-aligned. Word alignment will be performed with GIZA++, an implementation of translation Models 1 to 5 as proposed by the IBM research group that pioneered statistical machine translation.
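To illustrate the idea behind the simplest of these models, the following is a minimal sketch of IBM Model 1 trained with expectation-maximisation. It is not the GIZA++ implementation: the function names, the toy corpus and the omission of the NULL word are assumptions made for brevity.

```python
from collections import defaultdict


def train_ibm_model1(bitext, iterations=10):
    """Estimate t[(f, e)] = P(target word f | source word e) with EM.

    bitext is a list of (source_tokens, target_tokens) pairs.
    Simplification: no NULL source word is added to the sentences.
    """
    target_vocab = {f for _, tgt in bitext for f in tgt}
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)  # uniform initialisation

    for _ in range(iterations):
        count = defaultdict(float)  # expected counts for (f, e)
        total = defaultdict(float)  # expected counts for e
        for src, tgt in bitext:
            for f in tgt:
                # E-step: distribute one unit of f over all source words
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalise the expected counts per source word
        t = defaultdict(
            float, {(f, e): c / total[e] for (f, e), c in count.items()}
        )
    return t


def best_translation(t, source_word):
    """Return the target word with the highest probability for source_word."""
    candidates = {f: p for (f, e), p in t.items() if e == source_word}
    return max(candidates, key=candidates.get)


# Toy corpus (hypothetical data, for illustration only)
toy_bitext = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
model = train_ibm_model1(toy_bitext)
```

Calling best_translation(model, "das") on the toy corpus picks the target word that the model considers most probable for "das". Models 2 to 5 add position, fertility and distortion parameters on top of this lexical table.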

Source documents

Aligned Europarl texts

The Europarl source texts for each language pair were aligned at sentence level using the sentence aligner from the Europarl project. Word alignments were then derived from GIZA++ alignments using the Moses system.
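As a rough sketch of how a dictionary can be read off word-aligned text, the code below counts alignment links and keeps, for the most frequent source words, the target word they are linked to most often. It assumes one alignment line per sentence pair with links written as "i-j" (source index, target index), and it uses relative link frequency as the probability estimate. The function name and the sample data are hypothetical, and this is not necessarily the procedure used for the files listed here.

```python
from collections import Counter, defaultdict


def build_dictionary(src_sents, tgt_sents, alignments, top_n=200):
    """Return (source word, best translation, relative frequency) triples
    for the top_n most frequent source words.

    Assumed input: whitespace-tokenised sentences and alignment lines
    made of "i-j" links, with 0-based source index i and target index j.
    """
    src_freq = Counter()
    links = defaultdict(Counter)  # source word -> linked target words

    for src, tgt, align in zip(src_sents, tgt_sents, alignments):
        src_tok = src.split()
        tgt_tok = tgt.split()
        src_freq.update(src_tok)
        for link in align.split():
            i, j = map(int, link.split("-"))
            links[src_tok[i]][tgt_tok[j]] += 1

    entries = []
    for word, _ in src_freq.most_common(top_n):
        linked = links.get(word)
        if not linked:
            continue  # frequent word that was never aligned
        best, n = linked.most_common(1)[0]
        entries.append((word, best, n / sum(linked.values())))
    return entries


# Hypothetical sample data, for illustration only
sample = build_dictionary(
    ["das haus", "das buch"],
    ["the house", "the book"],
    ["0-0 1-1", "0-0 1-1"],
    top_n=1,
)
```

With real data the three inputs would be read line by line from the UTF-8 corpus and alignment files, and top_n=200 corresponds to a list of the 200 most frequent words.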


Statistical dictionaries from Europarl

The 200 most frequent words, each with its most probable translation (all files in UTF-8):