Sheffield University CLEF 2000 submission - bilingual track: German to English
We investigated dictionary-based cross-language information
retrieval using lexical triangulation. Lexical triangulation combines the results
of different transitive translations. Transitive translation uses a pivot language
to translate between two languages when no direct translation resource is
available. We took German queries and translated them via Spanish or Dutch
into English. We compared the results of retrieval experiments using these
queries with other versions created by combining the transitive translations or
by direct translation. Direct dictionary translation of a query introduces
considerable ambiguity that damages retrieval: in this research, average precision
was 79% below monolingual retrieval. Transitive translation introduces more ambiguity,
giving results more than 88% below direct translation. We have shown that
lexical triangulation between two transitive translations can eliminate much of
the additional ambiguity introduced by transitive translation.
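The idea of combining two transitive translations can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy dictionaries, the use of set intersection as the combination step, and the fallback to the union when the pivot routes share no candidate are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# A bilingual dictionary maps a source word to its candidate translations.
Dictionary = Dict[str, Set[str]]

def translate(word: str, dictionary: Dictionary) -> Set[str]:
    """Return every candidate translation of a word (empty set if unknown)."""
    return set(dictionary.get(word, set()))

def transitive_translate(word: str, src_to_pivot: Dictionary,
                         pivot_to_tgt: Dictionary) -> Set[str]:
    """Translate via a pivot language, collecting all target candidates."""
    candidates: Set[str] = set()
    for pivot_word in translate(word, src_to_pivot):
        candidates |= translate(pivot_word, pivot_to_tgt)
    return candidates

def triangulate(word: str,
                routes: List[Tuple[Dictionary, Dictionary]]) -> Set[str]:
    """Keep only the candidates that every pivot route agrees on.

    Assumption: if the routes share no candidate, fall back to the union
    of all candidates rather than dropping the query term entirely.
    """
    candidate_sets = [transitive_translate(word, a, b) for a, b in routes]
    non_empty = [s for s in candidate_sets if s]
    if not non_empty:
        return set()
    common = set.intersection(*non_empty)
    return common if common else set.union(*non_empty)

# Toy dictionaries, invented for illustration only.
DE_ES: Dictionary = {"Bank": {"banco"}}
ES_EN: Dictionary = {"banco": {"bank", "bench", "shoal"}}
DE_NL: Dictionary = {"Bank": {"bank"}}
NL_EN: Dictionary = {"bank": {"bank", "couch"}}

via_spanish = transitive_translate("Bank", DE_ES, ES_EN)
via_dutch = transitive_translate("Bank", DE_NL, NL_EN)
combined = triangulate("Bank", [(DE_ES, ES_EN), (DE_NL, NL_EN)])
```

In this toy example each pivot route alone yields several English candidates, while the triangulated set retains only the candidate that both routes produce, which is the kind of ambiguity reduction the abstract describes.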
Introduction to the special issue on cross-language algorithms and applications
With the increasingly global nature of our everyday interactions, the need for multilingual technologies to support efficient and effective information access and communication cannot be overemphasized. Computational modeling of language has been the focus of Natural Language Processing, a subdiscipline of Artificial Intelligence. One of the current challenges for this discipline is to design methodologies and algorithms that are cross-language in order to create multilingual technologies rapidly. The goal of this JAIR special issue on Cross-Language Algorithms and Applications (CLAA) is to present leading research in this area, with emphasis on developing unifying themes that could lead to the development of the science of multi- and cross-lingualism. In this introduction, we provide the reader with the motivation for this special issue and summarize the contributions of the papers that have been included. The selected papers cover a broad range of cross-lingual technologies including machine translation, domain and language adaptation for sentiment analysis, cross-language lexical resources, dependency parsing, information retrieval and knowledge representation. We anticipate that this special issue will serve as an invaluable resource for researchers interested in topics of cross-lingual natural language processing.
Categories and classifications in EuroWordNet
In EuroWordNet we develop wordnets in 8 European languages, which are structured along the same lines as the Princeton WordNet. The wordnets are inter-linked in a multilingual database, where they can be compared. This comparison reveals many different lexicalizations of classes across the languages that also lead to important differences in the hierarchical structure of the wordnets. It is not feasible to include all these classes (the superset) in each language-specific wordnet and to reach consensus on the implicational effects across all the languages. Each wordnet is therefore limited to the lexicalized words and expressions of a language. The wordnets are thus autonomous language-specific structures that capture valuable information about the lexicalization of each language, which is important for information retrieval, machine translation and language generation. By connecting the wordnets to a separate ontology, semantic inferencing can still be guaranteed. Still, different types of classification schemes can be distinguished among the lexicalized classes. In this paper we will further describe the properties of these different classes and discuss the advantages and effects of distinguishing them in wordnet-like structures.
Web 2.0, language resources and standards to automatically build a multilingual named entity lexicon
This paper proposes to advance the current state of the art in automatic Language Resource (LR) building by taking into consideration three elements: (i) the knowledge available in existing LRs, (ii) the vast amount of information available from the collaborative paradigm that has emerged from the Web 2.0 and (iii) the use of standards to improve interoperability. We present a case study in which a set of LRs for different languages (WordNet for English and Spanish and Parole-Simple-Clips for Italian) are extended with Named Entities (NE) by exploiting Wikipedia and the aforementioned LRs. The practical result is a multilingual NE lexicon connected to these LRs and to two ontologies: SUMO and SIMPLE. Furthermore, the paper addresses interoperability, an important problem currently affecting the Computational Linguistics area, by making use of the ISO LMF standard to encode this lexicon. The different steps of the procedure (mapping, disambiguation, extraction, NE identification and postprocessing) are comprehensively explained and evaluated. The resulting resource contains 974,567, 137,583 and 125,806 NEs for English, Spanish and Italian respectively. Finally, in order to check the usefulness of the constructed resource, we apply it to a state-of-the-art Question Answering system and evaluate its impact; the NE lexicon improves the system’s accuracy by 28.1%. Compared to previous approaches to building NE repositories, the current proposal represents a step forward in terms of automation, language independence, amount of NEs acquired and richness of the information represented.
Automatic sense clustering in EuroWordNet
This paper addresses ways in which we envisage reducing the fine-grainedness of WordNet and expressing the relations between its numerous sense distinctions in a more systematic way. In the EuroWordNet project, we have distinguished various automatic methods for grouping senses into more coarse-grained sense groups. The resulting clusters reflect aspects of lexical organization, displaying a variety of semantic regularities or generalizations. In this way, the compatibility of the language-specific wordnets in the EuroWordNet multilingual knowledge base is increased.