    UTACLIR @ CLEF 2001: New features for handling compound words and untranslatable proper names

    We participated in CLEF 2001 with four automated bilingual runs. UTACLIR is an automatic query translation and construction system for cross-language information retrieval (CLIR). The system automatically extracts topical information from request sentences written in one of the source languages and constructs a target-language query from the translations given by a translation dictionary. The new features for the CLIR process from Finnish, Swedish and German to English are improved matching of compound words and a new n-gram based technique for matching proper names and other untranslatable words. The results for all four runs are good: average precision over all the queries shows clear improvements. For German-English we tested two types of dictionaries (two runs). The first included all translations from the standard dictionary. The second contained the same data, except that all direct translations of compounds were excluded. The test with two dictionaries for the German runs indicates that the new features in the UTACLIR process also work well with a limited dictionary.
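
The abstract does not describe how the n-gram matching of proper names is implemented. As a rough illustration of the general idea only, the sketch below scores a source-language word against candidate target-language index terms using the Dice coefficient over character bigrams. The function names, the padding character, the bigram size and the sample words are all assumptions made for this example, not details taken from UTACLIR.

```python
def char_ngrams(word, n=2):
    """Return the set of character n-grams of a word, padded at both ends.

    The '_' padding and n=2 are illustrative assumptions.
    """
    padded = f"_{word.lower()}_"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def dice(a, b, n=2):
    """Dice coefficient between the n-gram sets of two words (0.0 to 1.0)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def best_matches(source_word, index_terms, k=3, n=2):
    """Rank target-language index terms by n-gram similarity to source_word.

    Ties are broken alphabetically so the ranking is deterministic.
    """
    scored = sorted(
        ((dice(source_word, term, n), term) for term in index_terms),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return [(term, score) for score, term in scored[:k]]


if __name__ == "__main__":
    # Hypothetical example: a Finnish spelling of a proper name matched
    # against a small, made-up list of English index terms.
    for term, score in best_matches("Jeltsin", ["Yeltsin", "Helsinki", "election"]):
        print(f"{term}\t{score:.3f}")
```

In a sketch like this, a word with no dictionary translation would be replaced in the target query by its top-ranked index terms, which lets spelling variants of the same name match without a dictionary entry.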