    Sheffield University CLEF 2000 submission - bilingual track: German to English

    We investigated dictionary-based cross-language information retrieval using lexical triangulation. Lexical triangulation combines the results of different transitive translations. Transitive translation uses a pivot language to translate between two languages when no direct translation resource is available. We took German queries and translated them via Spanish or Dutch into English. We compared the results of retrieval experiments using these queries with other versions created by combining the transitive translations or by direct translation. Direct dictionary translation of a query introduces considerable ambiguity that damages retrieval: average precision was 79% below monolingual in this research. Transitive translation introduces more ambiguity, giving results worse than 88% below direct translation. We have shown that lexical triangulation between two transitive translations can eliminate much of the additional ambiguity introduced by transitive translation.
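
The idea of lexical triangulation can be illustrated with a minimal sketch. It assumes that "combining" two transitive translations means keeping only the English terms that both pivot routes produce (a set intersection); the abstract does not spell out the exact combination rule, so treat this as one plausible reading. The tiny dictionaries and the function names `transitive_translate` and `triangulate` are invented for illustration and are not taken from the submission.

```python
# Toy bilingual dictionaries (hypothetical entries, for illustration only).
DE_TO_ES = {"Bank": ["banco"]}
ES_TO_EN = {"banco": ["bank", "bench", "shoal"]}
DE_TO_NL = {"Bank": ["bank"]}
NL_TO_EN = {"bank": ["bank", "bench", "couch"]}


def transitive_translate(term, source_to_pivot, pivot_to_target):
    """Translate a term through a pivot language, collecting every target sense."""
    targets = set()
    for pivot_term in source_to_pivot.get(term, []):
        targets.update(pivot_to_target.get(pivot_term, []))
    return targets


def triangulate(term, routes):
    """Keep only the target terms produced by every pivot route (assumed rule)."""
    candidate_sets = [
        transitive_translate(term, source_to_pivot, pivot_to_target)
        for source_to_pivot, pivot_to_target in routes
    ]
    return set.intersection(*candidate_sets)


routes = [(DE_TO_ES, ES_TO_EN), (DE_TO_NL, NL_TO_EN)]
via_spanish = transitive_translate("Bank", DE_TO_ES, ES_TO_EN)
via_dutch = transitive_translate("Bank", DE_TO_NL, NL_TO_EN)
combined = triangulate("Bank", routes)
print(sorted(via_spanish), sorted(via_dutch), sorted(combined))
```

In this toy case each pivot route adds a sense the other lacks ("shoal" via Spanish, "couch" via Dutch), and the intersection discards both, which is the kind of ambiguity reduction the abstract describes.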

    Introduction to the special issue on cross-language algorithms and applications

    With the increasingly global nature of our everyday interactions, the need for multilingual technologies to support efficient and effective information access and communication cannot be overemphasized. Computational modeling of language has been the focus of Natural Language Processing, a subdiscipline of Artificial Intelligence. One of the current challenges for this discipline is to design methodologies and algorithms that are cross-language in order to create multilingual technologies rapidly. The goal of this JAIR special issue on Cross-Language Algorithms and Applications (CLAA) is to present leading research in this area, with emphasis on developing unifying themes that could lead to the development of the science of multi- and cross-lingualism. In this introduction, we provide the reader with the motivation for this special issue and summarize the contributions of the papers that have been included. The selected papers cover a broad range of cross-lingual technologies including machine translation, domain and language adaptation for sentiment analysis, cross-language lexical resources, dependency parsing, information retrieval and knowledge representation. We anticipate that this special issue will serve as an invaluable resource for researchers interested in topics of cross-lingual natural language processing.

    Categories and classifications in EuroWordNet

    In EuroWordNet we develop wordnets in 8 European languages, which are structured along the same lines as the Princeton WordNet. The wordnets are inter-linked in a multilingual database, where they can be compared. This comparison reveals many different lexicalizations of classes across the languages that also lead to important differences in the hierarchical structure of the wordnets. It is not feasible to include all these classes (the superset) in each language-specific wordnet and to reach consensus on the implicational effects across all the languages. Each wordnet is therefore limited to the lexicalized words and expressions of a language. The wordnets are thus autonomous language-specific structures that capture valuable information about the lexicalization of each language, which is important for information retrieval, machine translation and language generation. By connecting the wordnets to a separate ontology, semantic inferencing can still be guaranteed. Still, different types of classification schemes can be distinguished among the lexicalized classes. In this paper we further describe the properties of these different classes and discuss the advantages and effects of distinguishing them in wordnet-like structures.

    Web 2.0, language resources and standards to automatically build a multilingual named entity lexicon

    This paper proposes to advance the current state of the art in automatic Language Resource (LR) building by taking into consideration three elements: (i) the knowledge available in existing LRs, (ii) the vast amount of information available from the collaborative paradigm that has emerged from the Web 2.0 and (iii) the use of standards to improve interoperability. We present a case study in which a set of LRs for different languages (WordNet for English and Spanish and Parole-Simple-Clips for Italian) are extended with Named Entities (NE) by exploiting Wikipedia and the aforementioned LRs. The practical result is a multilingual NE lexicon connected to these LRs and to two ontologies: SUMO and SIMPLE. Furthermore, the paper addresses interoperability, an important problem currently affecting Computational Linguistics, by using the ISO LMF standard to encode this lexicon. The different steps of the procedure (mapping, disambiguation, extraction, NE identification and postprocessing) are comprehensively explained and evaluated. The resulting resource contains 974,567, 137,583 and 125,806 NEs for English, Spanish and Italian respectively. Finally, in order to check the usefulness of the constructed resource, we apply it to a state-of-the-art Question Answering system and evaluate its impact; the NE lexicon improves the system’s accuracy by 28.1%. Compared to previous approaches to building NE repositories, the current proposal represents a step forward in terms of automation, language independence, amount of NEs acquired and richness of the information represented.

    EuroWordNet: final report

    EuroWordNet: building a multilingual database with Wordnets for European languages

    Automatic sense clustering in EuroWordNet

    This paper addresses ways in which we envisage reducing the fine-grainedness of WordNet and expressing in a more systematic way the relations between its numerous sense distinctions. In the EuroWordNet project, we have distinguished various automatic methods for grouping senses into more coarse-grained sense groups. The resulting clusters reflect aspects of lexical organization, displaying a variety of semantic regularities or generalizations. In this way, the compatibility of the language-specific wordnets in the EuroWordNet multilingual knowledge base is increased.