WP4 Improve dictionary coverage for single words


To expand the dictionaries using a set of monolingual comparable corpora, the basic approach pioneered by Fung & McKeown and ourselves will be further developed and refined so as to obtain a practical tool that can be used in an industrial context. The basic assumption underlying the approach is that across languages there is a correlation between the co-occurrences of words that are translations of each other. If, for example, two words A and B co-occur more often than expected by chance in a text of one language, then in a text of another language the translations of A and B should also co-occur more frequently than expected. As this method requires a basic dictionary, we will use the dictionaries from WP 3 as input. We will apply the method to the monolingual comparable corpora compiled in WP 1 and use the results to extend the dictionaries.
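The co-occurrence assumption can be illustrated with a minimal sketch. It is not the tool planned for this work package: the function names, the toy corpora and the seed dictionary are hypothetical. It also uses raw co-occurrence counts and cosine similarity as simplifying assumptions, where a practical system would more likely use an association measure that reflects "more often than expected by chance".

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """For every word, count the words that occur within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_translations(source_word, src_vectors, tgt_vectors, seed_dict):
    """Rank target words that are not yet in the dictionary as translation candidates."""
    # Translate the source word's context vector with the seed dictionary.
    mapped = Counter()
    for context_word, count in src_vectors[source_word].items():
        if context_word in seed_dict:
            mapped[seed_dict[context_word]] += count
    known_targets = set(seed_dict.values())
    scores = {
        target: cosine(mapped, vector)
        for target, vector in tgt_vectors.items()
        if target not in known_targets
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy corpora (comparable, not parallel) and seed dictionary.
english = [["doctor", "hospital"], ["doctor", "hospital"],
           ["nurse", "hospital"], ["teacher", "school"]]
german = [["arzt", "krankenhaus"], ["arzt", "krankenhaus"],
          ["krankenschwester", "krankenhaus"], ["lehrer", "schule"]]
seed = {"doctor": "arzt", "nurse": "krankenschwester", "teacher": "lehrer"}

en_vectors = cooccurrence_vectors(english)
de_vectors = cooccurrence_vectors(german)

hospital_candidates = rank_translations("hospital", en_vectors, de_vectors, seed)
school_candidates = rank_translations("school", en_vectors, de_vectors, seed)
print(hospital_candidates[0][0], school_candidates[0][0])
```

In this toy setting the words missing from the seed dictionary ("hospital", "school") are matched to the target words whose translated contexts are most similar, which is the kind of result that would be used to extend the dictionaries.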

  • R. Rapp, S. Sharoff and B. Babych (2012). Identifying Word Translations from Comparable Documents Without a Seed Lexicon. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), Istanbul, Turkey.