    Using WordNet for Building WordNets

    This paper summarises a set of methodologies and techniques for the fast construction of multilingual WordNets. The English WordNet is used in this approach as a backbone for Catalan and Spanish WordNets and as a lexical knowledge resource for several subtasks. Comment: 8 pages, PostScript file. In workshop on Usage of WordNet in NL

    Lexical typology : a programmatic sketch

    The present paper is an attempt to lay the foundation for Lexical Typology as a new kind of linguistic typology. The goal of Lexical Typology is to investigate crosslinguistically significant patterns of interaction between lexicon and grammar.

    Bilingualism and the single route/dual route debate

    The debate between single-route and dual-route accounts of cognitive processes has been generated predominantly by the application of connectionist modeling techniques to two areas of psycholinguistics. This paper draws an analogy between this debate and bilingual language processing. A prominent question in bilingual word recognition is whether bilinguals have functionally separate lexicons for each language or a single system able to recognize the words of both languages. Empirical evidence has been taken to support a model with two separate lexicons working in parallel (Smith, 1991; Gerard and Scarborough, 1989). However, a range of interference effects has been found between the bilingual’s two sets of lexical knowledge (Thomas, 1997a). Connectionist models have been put forward which suggest that a single representational resource may account for these data, provided that words are coded according to language membership (Thomas, 1997a, 1997b; Dijkstra and van Heuven, 1998). This paper discusses the criteria that might be used to differentiate single-route and dual-route models. An empirical study is introduced to address one of these criteria, parallel access, with regard to bilingual word recognition. The study fails to find support for the dual-route model.

    Combining Multiple Methods for the Automatic Construction of Multilingual WordNets

    This paper explores the automatic construction of a multilingual Lexical Knowledge Base from preexisting lexical resources. First, we describe a set of automatic and complementary techniques for linking Spanish words, collected from monolingual and bilingual machine-readable dictionaries (MRDs), to English WordNet synsets. Second, we show how the data produced by each method are combined to build a preliminary version of a Spanish WordNet with an accuracy of over 85%. Combining the methods increases the number of extracted connections by 40% without loss of accuracy. Both coarse-grained (class level) and fine-grained (synset assignment level) confidence ratios are used and evaluated. Finally, the results for the whole process are presented. Comment: 7 pages, 4 PostScript figures
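    The combination step can be illustrated with a minimal sketch, assuming each linking method outputs a set of (Spanish word, synset) pairs and that a link is kept when at least a chosen number of methods agree. The function name `combine_links`, the `min_votes` threshold, the method variables and the synset labels are all hypothetical; this simple voting scheme is not necessarily how the paper combines its methods or applies its confidence ratios.

```python
from collections import Counter


def combine_links(method_outputs, min_votes=2):
    """Keep (word, synset) links proposed by at least `min_votes` methods.

    Hypothetical voting combiner: `method_outputs` is a list holding one
    iterable of (spanish_word, synset_id) pairs per linking method.
    """
    votes = Counter()
    for links in method_outputs:
        # set() ensures each method votes at most once per link
        votes.update(set(links))
    return {link for link, count in votes.items() if count >= min_votes}


# Usage with made-up outputs from three hypothetical methods
method_a = {("banco", "bank.n.01"), ("banco", "bench.n.01")}
method_b = {("banco", "bank.n.01"), ("perro", "dog.n.01")}
method_c = {("perro", "dog.n.01")}
print(sorted(combine_links([method_a, method_b, method_c])))
# [('banco', 'bank.n.01'), ('perro', 'dog.n.01')]
```

    Raising `min_votes` trades coverage for precision; setting it to 1 simply takes the union of all methods.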

    Multilingual domain modeling in Twenty-One: automatic creation of a bi-directional translation lexicon from a parallel corpus

    Within the project Twenty-One, which aims at the effective dissemination of information on ecology and sustainable development, a system is developed that supports cross-language information retrieval in any of four languages: Dutch, English, French and German. Knowledge of this application domain is needed to enhance existing translation resources for the purpose of lexical disambiguation. This paper describes an algorithm for the automatic acquisition of a translation lexicon from a parallel corpus. What is new about the algorithm is the statistical language model it uses. Because the algorithm is based on a symmetric translation model, it can identify one-to-many and many-to-one relations between the words of a language pair. We claim that the method has two advantages over previously published algorithms. First, because the translation model is more powerful, the resulting bilingual lexicon will be more accurate. Second, the resulting bilingual lexicon can be used to translate in both directions between a language pair. Different versions of the algorithm were evaluated on the Dutch and English versions of the Agenda 21 corpus, a UN document on the application domain of sustainable development.
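    The abstract does not spell out the statistical model, so the following is only a sketch of the bidirectional idea using a swapped-in technique: a Dice coefficient computed over sentence-aligned pairs. Because the Dice score is symmetric in the two languages, one set of scored word pairs can be queried in either direction. The function names, the 0.8 threshold and the toy Dutch-English corpus are assumptions for illustration, and this co-occurrence score is far cruder than the paper's translation model.

```python
from collections import Counter
from itertools import product


def dice_lexicon(sentence_pairs, threshold=0.8):
    """Score word pairs by the Dice coefficient over aligned sentence pairs.

    Swapped-in technique for illustration, not the paper's model.
    Returns {(source_word, target_word): score} for scores >= threshold.
    """
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in sentence_pairs:
        src_words, tgt_words = set(src), set(tgt)
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        pair_count.update(product(src_words, tgt_words))
    lexicon = {}
    for (s, t), both in pair_count.items():
        # Dice = 2 * cooc / (freq_s + freq_t), symmetric in s and t
        score = 2 * both / (src_count[s] + tgt_count[t])
        if score >= threshold:
            lexicon[(s, t)] = score
    return lexicon


def translations(lexicon, word, reverse=False):
    """Look up a word in either direction of the same lexicon."""
    if reverse:
        return sorted(s for (s, t) in lexicon if t == word)
    return sorted(t for (s, t) in lexicon if s == word)


corpus = [
    (["het", "huis"], ["the", "house"]),
    (["het", "boek"], ["the", "book"]),
    (["een", "huis"], ["a", "house"]),
]
lexicon = dice_lexicon(corpus)
print(translations(lexicon, "huis"))                 # ['house']
print(translations(lexicon, "house", reverse=True))  # ['huis']
```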

    Preserving Vernaculars in Indonesia: A Bilingual Vernacular-English Dictionary Approach

    English learners in Indonesia learn English through Indonesian, the language of instruction in the country's education system, even though 80% of the country's population speaks a vernacular as their mother tongue. Learning materials, including bilingual dictionaries, therefore follow this convention, while bilingual dictionaries for learners who speak a vernacular natively are barely available. This situation requires every Indonesian to understand Indonesian before learning English, although theories of foreign language learning suggest otherwise. Moreover, the use of Indonesia's vernaculars is declining, yet bilingual dictionaries linking the vernaculars with a widely known language such as English are still lacking. This article discusses (1) English vocabulary learning and (2) the maintenance of Indonesia's vernaculars, drawing on Butzkamm's theory and UNESCO's suggestion on foreign language learning, Nation's New General Service List as the core of English vocabulary, and the application of technology to bilingual lexicography. Taking the Cirebon dialect of Javanese as an example, the article suggests that it is possible to provide a bilingual dictionary that serves both as reference material for English vocabulary learning and as documentation for vernacular maintenance.

    Automatic Construction of Clean Broad-Coverage Translation Lexicons

    Word-level translational equivalences can be extracted from parallel texts by surprisingly simple statistical techniques. However, these techniques are easily fooled by indirect associations, that is, pairs of unrelated words whose statistical properties resemble those of mutual translations. Indirect associations pollute the resulting translation lexicons, drastically reducing their precision. This paper presents an iterative lexicon cleaning method. On each iteration, most of the remaining incorrect lexicon entries are filtered out without significant degradation in recall. This lexicon cleaning technique can produce translation lexicons with recall and precision both exceeding 90%, as well as dictionary-sized translation lexicons that are over 99% correct. Comment: PostScript file, 10 pages. To appear in Proceedings of AMTA-9
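    The cleaning idea can be sketched as follows, assuming a one-to-one correspondence between words in aligned sentence pairs and a greedy "competitive linking" step: within each sentence pair, the highest-scoring candidate pairs are linked first and each word may be linked only once, so a word already claimed by its true translation cannot also be linked to an indirect associate. Scores are then re-estimated as the fraction of co-occurrences that ended up linked. This is a simplified illustration; the initial Dice scores, function names, iteration count and threshold are assumptions, not the paper's exact statistics.

```python
from collections import Counter
from itertools import product


def competitive_link(src, tgt, score):
    """Greedily link the best-scoring word pairs, using each word at most once."""
    # Sort by descending score; the pair itself breaks ties deterministically.
    candidates = sorted(product(set(src), set(tgt)),
                        key=lambda p: (-score.get(p, 0.0), p))
    used_src, used_tgt, links = set(), set(), []
    for s, t in candidates:
        if s not in used_src and t not in used_tgt:
            links.append((s, t))
            used_src.add(s)
            used_tgt.add(t)
    return links


def clean_lexicon(sentence_pairs, iterations=3, threshold=0.5):
    """Iteratively re-estimate scores from competitive links (simplified sketch)."""
    src_count, tgt_count, cooc = Counter(), Counter(), Counter()
    for src, tgt in sentence_pairs:
        src_count.update(set(src))
        tgt_count.update(set(tgt))
        cooc.update(product(set(src), set(tgt)))
    # Initial association score: Dice coefficient (an assumption).
    score = {(s, t): 2 * n / (src_count[s] + tgt_count[t])
             for (s, t), n in cooc.items()}
    for _ in range(iterations):
        linked = Counter()
        for src, tgt in sentence_pairs:
            linked.update(competitive_link(src, tgt, score))
        # New score: share of co-occurrences that were actually linked.
        score = {pair: linked[pair] / n for pair, n in cooc.items()}
    return {pair for pair, value in score.items() if value >= threshold}


corpus = [
    (["het", "huis"], ["the", "house"]),
    (["het", "boek"], ["the", "book"]),
    (["een", "huis"], ["a", "house"]),
]
print(sorted(clean_lexicon(corpus)))
# [('boek', 'book'), ('een', 'a'), ('het', 'the'), ('huis', 'house')]
```

    In this toy corpus, a plain Dice threshold of 0.5 would also accept the indirect association (het, house), which scores exactly 0.5 because "het" and "huis" co-occur; the competitive linking step drops it.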