    Building A Semantic-Primitive-Based Lexical Consultation System.

    The paper describes the design of a semantic-primitive-based lexical consultation system and the processes that could be performed on a machine-readable dictionary (MRD) and a corpus to produce a machine-tractable dictionary.

    TRANSDUCER FOR AUTO-CONVERT OF ARCHAIC TO PRESENT DAY ENGLISH FOR MACHINE READABLE TEXT: A SUPPORT FOR COMPUTER ASSISTED LANGUAGE LEARNING

    Some English literary works still use archaic words that are markedly distinct from Present Day English (PDE). Some archaic words have undergone regular patterns of change: for instance, archaic modal verbs like mightst, darest and wouldst lost their historical -st ending, yielding might, dare and would (wouldst > would). Other archaic words underwent distinct processes with unpredictable results; archaic English pronouns like thee ‘you’, thy ‘your’ and thyself ‘yourself’ occur quite frequently. Students who are non-native speakers of English may face many difficulties when they encounter English texts containing these kinds of archaic words. How might a computer help such students? This paper aims to provide support from the perspective of Computer Assisted Language Learning (CALL). It proposes designs for lexicon transducers using Local Grammar Graphs (LGG) to convert archaic words automatically to PDE in a machine-readable literary text. The transducer is applied to a machine-readable text taken from Sir Walter Scott’s Ivanhoe, and the archaic words in the corpus are converted automatically to PDE. The transducer also allows presentation of both forms (archaic and PDE), the PDE lexicons only, or the original archaic form only. This will help students understand English literary works better. All the linguistic resources here are machine readable, ready to use, maintainable and open to further development. The method could also be adapted to build lexicon transducers for other languages.
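The two conversion strategies described above (a regular rule stripping the -st ending from modal verbs, plus a lookup table for unpredictable forms like the pronouns) can be sketched as a small dictionary-plus-rule transducer. This is only an illustrative Python sketch under assumed lexicon entries; the paper's actual resources are Local Grammar Graphs, not this code.

```python
import re

# Irregular archaic forms with no predictable pattern (hypothetical subset)
IRREGULAR = {"thee": "you", "thy": "your", "thyself": "yourself", "thou": "you"}

# Regular pattern: archaic modal verbs lose the -st ending
# (mightst -> might, wouldst -> would, darest -> dare)
REGULAR_ST = re.compile(r"^(might|would|dare|could|should)e?st$")

def to_pde(word):
    """Convert a single archaic token to PDE, or return it unchanged."""
    lower = word.lower()
    if lower in IRREGULAR:
        return IRREGULAR[lower]
    m = REGULAR_ST.match(lower)
    if m:
        return m.group(1)
    return word

def transduce(text, mode="both"):
    """Convert a text; mode is 'pde', 'archaic', or 'both' (archaic/PDE)."""
    out = []
    for token in text.split():
        pde = to_pde(token)
        if mode == "pde":
            out.append(pde)
        elif mode == "archaic" or pde == token:
            out.append(token)
        else:  # show both forms side by side
            out.append(f"{token}/{pde}")
    return " ".join(out)

print(transduce("wouldst thou help thy friend", mode="pde"))
# -> would you help your friend
```

The `mode` parameter mirrors the three presentation options described in the abstract (both forms, PDE only, or the original archaic form only).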

    Wiktionnaire's Wikicode GLAWIfied: a Workable French Machine-Readable Dictionary

    GLAWI is a free, large-scale and versatile Machine-Readable Dictionary (MRD) that has been extracted from the French language edition of Wiktionary, called Wiktionnaire. In (Sajous and Hathout, 2015), we introduced GLAWI, gave the rationale behind the creation of this lexicographic resource and described the extraction process, focusing on the conversion and standardization of the heterogeneous data provided by this collaborative dictionary. In the current article, we describe the content of GLAWI and illustrate how it is structured. We also suggest various applications, ranging from linguistic studies and NLP applications to psycholinguistic experimentation, all of which can take advantage of the diversity of the lexical knowledge available in GLAWI. Besides this diversity and extensive lexical coverage, GLAWI is also remarkable because it is the only free lexical resource of contemporary French that contains definitions. This unique material opens the way to the renewal of MRD-based methods, notably the automated extraction and acquisition of semantic relations.

    Sense Tagging: Semantic Tagging with a Lexicon

    Full text link
    Sense tagging, the automatic assignment of the appropriate sense from some lexicon to each of the words in a text, is a specialised instance of the general problem of semantic tagging by category or type. We discuss which recent word sense disambiguation algorithms are appropriate for sense tagging. It is our belief that sense tagging can be carried out effectively by combining several simple, independent methods, and we include the design of such a tagger. A prototype of this system has been implemented, correctly tagging 86% of polysemous word tokens in a small test set, providing evidence that our hypothesis is correct. Comment: 6 pages, uses aclap LaTeX style file. Also in Proceedings of the SIGLEX Workshop "Tagging Text with Lexical Semantics".
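The combination strategy described above (several simple, independent methods whose proposals are merged) can be sketched as a voting combiner. The heuristic functions and sense labels below are hypothetical toy examples, not the paper's actual modules.

```python
from collections import Counter

def combine_taggers(word, context, taggers):
    """Each tagger maps (word, context) to a candidate sense or None;
    the combined tagger picks the most frequently proposed sense."""
    votes = Counter()
    for tagger in taggers:
        sense = tagger(word, context)
        if sense is not None:
            votes[sense] += 1
    return votes.most_common(1)[0][0] if votes else None

# Two toy heuristics for the ambiguous word "bank"
def collocation_tagger(word, context):
    # Assumed collocation cue: nearby "water" suggests the river sense
    return "bank/river" if "water" in context else "bank/finance"

def frequency_tagger(word, context):
    # Most-frequent-sense baseline, ignoring context
    return "bank/finance"

sense = combine_taggers("bank", ["money", "loan"],
                        [collocation_tagger, frequency_tagger])
print(sense)  # -> bank/finance
```

Because each method is independent, new heuristics can be added to the tagger list without touching the combiner.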

    Converting Language Computation into Mathematical Operations


    Sheffield University CLEF 2000 submission - bilingual track: German to English

    We investigated dictionary-based cross-language information retrieval using lexical triangulation. Lexical triangulation combines the results of different transitive translations. Transitive translation uses a pivot language to translate between two languages when no direct translation resource is available. We took German queries and translated them via Spanish or Dutch into English. We compared the results of retrieval experiments using these queries with other versions created by combining the transitive translations or created by direct translation. Direct dictionary translation of a query introduces considerable ambiguity that damages retrieval, with average precision 79% below monolingual performance in this research. Transitive translation introduces more ambiguity, giving results more than 88% below direct translation. We have shown that lexical triangulation between two transitive translations can eliminate much of the additional ambiguity introduced by transitive translation.
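The triangulation idea above (translate via two different pivot languages and keep only the English candidates produced by both routes) can be sketched as set intersection over composed bilingual dictionaries. The toy dictionary entries below are hypothetical illustrations, not the resources used in the paper.

```python
def transitive(term, src_to_pivot, pivot_to_en):
    """Compose two bilingual dictionaries: source -> pivot -> English."""
    english = set()
    for pivot_word in src_to_pivot.get(term, []):
        english.update(pivot_to_en.get(pivot_word, []))
    return english

# Toy dictionaries for German "Bank" (bench / financial institution)
de_es = {"Bank": ["banco", "banca"]}
es_en = {"banco": ["bank", "bench"], "banca": ["banking", "bank"]}
de_nl = {"Bank": ["bank"]}
nl_en = {"bank": ["bank", "couch", "sofa"]}

via_spanish = transitive("Bank", de_es, es_en)  # {'bank', 'bench', 'banking'}
via_dutch = transitive("Bank", de_nl, nl_en)    # {'bank', 'couch', 'sofa'}

# Lexical triangulation: intersect the two transitive translation sets
triangulated = via_spanish & via_dutch
print(triangulated)  # -> {'bank'}
```

Each single transitive route inflates the candidate set (the ambiguity the paper measures), while the intersection discards translations that only one pivot route produced.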

    Exploiting MeSH indexing in MEDLINE to generate a data set for word sense disambiguation

    Background: Evaluation of Word Sense Disambiguation (WSD) methods in the biomedical domain is difficult because the available resources are either too small or too focused on specific types of entities (e.g. diseases or genes). We present a method that can be used to automatically develop a WSD test collection using the Unified Medical Language System (UMLS) Metathesaurus and the manual MeSH indexing of MEDLINE. We demonstrate the use of this method by developing such a data set, called MSH WSD.
    Methods: In our method, the Metathesaurus is first screened to identify ambiguous terms whose possible senses consist of two or more MeSH headings. We then use each ambiguous term and its corresponding MeSH headings to extract MEDLINE citations where the term and only one of the MeSH headings co-occur. The term found in the MEDLINE citation is automatically assigned the UMLS Concept Unique Identifier (CUI) linked to that MeSH heading. We compare the characteristics of the MSH WSD data set to the previously existing NLM WSD data set.
    Results: The resulting MSH WSD data set consists of 106 ambiguous abbreviations, 88 ambiguous terms and 9 which are a combination of both, for a total of 203 ambiguous entities. For each ambiguous term/abbreviation, the data set contains a maximum of 100 instances per sense obtained from MEDLINE. We evaluated the reliability of the MSH WSD data set using existing knowledge-based methods and compared their performance to the results previously obtained by these algorithms on the pre-existing NLM WSD data set. We show that the knowledge-based methods achieve different results but keep their relative performance, except for the Journal Descriptor Indexing (JDI) method, whose performance is below the other methods.
    Conclusions: The MSH WSD data set allows the evaluation of WSD algorithms in the biomedical domain. Compared to previously existing data sets, MSH WSD contains a larger number of biomedical terms/abbreviations and covers the largest set of UMLS Semantic Types. Furthermore, the MSH WSD data set has been generated automatically by reusing already existing annotations and, therefore, can be regenerated from subsequent UMLS versions.
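The two-step extraction procedure described in the Methods section (screen for terms whose senses cover two or more MeSH headings, then keep citations where the term co-occurs with exactly one of those headings) can be sketched as follows. The term, heading, and CUI values here are placeholder stand-ins for the UMLS and MEDLINE, not real identifiers.

```python
MAX_PER_SENSE = 100  # the data set caps instances per sense at 100

def ambiguous_terms(term_to_headings):
    """Step 1: keep terms whose possible senses cover >= 2 MeSH headings."""
    return {t: hs for t, hs in term_to_headings.items() if len(hs) >= 2}

def label_citations(term, headings, citations, heading_to_cui):
    """Step 2: keep citations mentioning the term and exactly one of its
    MeSH headings; assign the CUI linked to that heading."""
    labeled = {}
    for cit in citations:
        if term not in cit["text"]:
            continue
        present = [h for h in headings if h in cit["mesh"]]
        if len(present) == 1:  # exactly one candidate heading indexed
            cui = heading_to_cui[present[0]]
            labeled.setdefault(cui, [])
            if len(labeled[cui]) < MAX_PER_SENSE:
                labeled[cui].append(cit["pmid"])
    return labeled

terms = ambiguous_terms({"cold": ["Common Cold", "Cold Temperature"],
                         "fever": ["Fever"]})  # "fever" is filtered out
cits = [{"pmid": "1", "text": "cold symptoms", "mesh": ["Common Cold"]},
        {"pmid": "2", "text": "cold storage", "mesh": ["Cold Temperature"]},
        {"pmid": "3", "text": "cold exposure",
         "mesh": ["Common Cold", "Cold Temperature"]}]  # ambiguous: skipped
cuis = {"Common Cold": "C-0000001", "Cold Temperature": "C-0000002"}  # placeholders
print(label_citations("cold", terms["cold"], cits, cuis))
# -> {'C-0000001': ['1'], 'C-0000002': ['2']}
```

The co-occurrence constraint is what makes the labeling automatic: a citation indexed with both candidate headings gives no unambiguous sense evidence, so it is discarded.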