2,202 research outputs found

    Lexical typology : a programmatic sketch

    The present paper is an attempt to lay the foundation for Lexical Typology as a new kind of linguistic typology. The goal of Lexical Typology is to investigate cross-linguistically significant patterns of interaction between lexicon and grammar.

    Natural language processing

    Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems - text summarization, information extraction, information retrieval, etc., including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.

    Automated system for the creation and replenishment of users' electronic lexicographical resources

    This article proposes a solution for improving the efficiency of automated generation of electronic lexicographical resources, based on the processing of strongly structured electronic information arrays. The automated information system developed for creating and replenishing lexicographical resources is described in this article, and several of its supporting subsystems are characterized. The effectiveness of the information system has been evaluated.

    Marrying Universal Dependencies and Universal Morphology

    The Universal Dependencies (UD) and Universal Morphology (UniMorph) projects each present schemata for annotating the morphosyntactic details of language. Each project also provides corpora of annotated text in many languages - UD at the token level and UniMorph at the type level. As each corpus is built by different annotators, language-specific decisions hinder the goal of universal schemata. With compatibility of tags, each project's annotations could be used to validate the other's. Additionally, the availability of both type- and token-level resources would be a boon to tasks such as parsing and homograph disambiguation. To ease this interoperability, we present a deterministic mapping from Universal Dependencies v2 features into the UniMorph schema. We validate our approach by lookup in the UniMorph corpora and find a macro-average of 64.13% recall. We also note incompatibilities due to paucity of data on either side. Finally, we present a critical evaluation of the foundations, strengths, and weaknesses of the two annotation projects. Comment: UDW1
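    A deterministic feature mapping of the kind the abstract describes can be sketched as follows. This is a minimal illustration under assumed names: the mapping table is a tiny hypothetical fragment, not the actual table from the paper, and `convert` is an illustrative helper.

    ```python
    # Sketch of a deterministic mapping from Universal Dependencies v2
    # morphological features (e.g. "Number=Sing|Tense=Past") to a
    # UniMorph-style tag bundle. The table below is a small hypothetical
    # fragment for illustration only.

    UD_TO_UNIMORPH = {
        ("Number", "Sing"): "SG",
        ("Number", "Plur"): "PL",
        ("Tense", "Past"): "PST",
        ("Tense", "Pres"): "PRS",
        ("Person", "3"): "3",
    }

    def convert(ud_features: str) -> str:
        """Convert a UD feature string into a UniMorph-style tag bundle,
        silently skipping features absent from the mapping table."""
        tags = []
        for pair in ud_features.split("|"):
            key, value = pair.split("=")
            tag = UD_TO_UNIMORPH.get((key, value))
            if tag is not None:
                tags.append(tag)
        return ";".join(tags)

    print(convert("Number=Sing|Tense=Past"))  # SG;PST
    ```

    Features with no counterpart are simply dropped here, which mirrors one source of the incompatibilities and recall losses the abstract reports.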

    Recognition and translation Arabic-French of Named Entities: case of the Sport places

    The recognition of Arabic Named Entities (NE) is a problem in different domains of Natural Language Processing (NLP), such as machine translation. Indeed, NE translation allows access to multilingual information. This translation does not always lead to the expected result, especially when the NE contains a person name. For this reason, and in order to improve translation, we can transliterate some parts of the NE. In this context, we propose a method that integrates translation and transliteration together. We used the linguistic NooJ platform, which is based on local grammars and transducers. In this paper, we focus on the sport domain. We will first suggest a refinement of the typological model presented at the MUC conferences; then we will describe the integration of an Arabic transliteration module into the translation system. Finally, we will detail our method and give the results of the evaluation.
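    The translate-then-transliterate idea can be sketched roughly like this: known vocabulary in a named entity is translated via a bilingual lexicon, while unknown parts (typically person names) fall back to character-level transliteration. Both tables below are tiny illustrative assumptions, not the NooJ grammars and transducers the paper uses.

    ```python
    # Hypothetical sketch: translate known words of an Arabic named entity
    # into French, transliterate the rest. Both tables are illustrative
    # fragments, not real linguistic resources.

    LEXICON = {  # Arabic -> French translations for sport-place vocabulary
        "ملعب": "stade",
        "قاعة": "salle",
    }

    TRANSLIT = {  # very partial Arabic-to-Latin character map (assumption)
        "م": "m", "ح": "h", "د": "d", "أ": "a", "ع": "a", "ل": "l", "ب": "b",
    }

    def transliterate(word: str) -> str:
        """Character-by-character romanization; unmapped characters pass through."""
        return "".join(TRANSLIT.get(ch, ch) for ch in word)

    def translate_entity(words):
        out = []
        for w in words:
            if w in LEXICON:
                out.append(LEXICON[w])  # translate known vocabulary
            else:
                out.append(transliterate(w).capitalize())  # transliterate names
        return " ".join(out)

    print(translate_entity(["ملعب", "محمد"]))  # stade Mhmd
    ```

    A real system would of course use proper vowelization and context-sensitive rules; the point here is only the division of labour between the two modules.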

    Automatic indexing and retrieval as a tool to improve information and technology transfer

    During the last 20 years, linguistic data processing has mainly been seen as a tool to discover linguistic regularities (or detect irregularities) of a given natural language, especially when handling large textual databases ("corpora"). A second motivation for using a computer was to test theories or models of a language system (or a part of it) with a simulation program. As a result of both strategies, the "Saarbrücken Text Analysis System" has been implemented. At present, a very large lexical database is available for analysing written German texts morphologically and syntactically; the syntactic parser is able to handle any German sentence with more than 90% "correct" results. On the other hand, large (textual) databases in different fields (e.g. law, patent specifications, medicine) are growing rapidly. Therefore, a computer-aided indexing system ("Computergestützte Texterschließung", CTX) has been developed at Regensburg and Saarbrücken Universities to improve (even natural-language-oriented) access to textual data ("free text") by applying linguistic strategies to information retrieval processes. The main results of feasibility studies, especially in the field of German patent documentation, are presented.
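    The free-text indexing and retrieval workflow described above can be sketched, in much simplified form, as an inverted index over normalized tokens. The naive lowercasing and punctuation stripping below is a stand-in assumption for the morphological analysis the CTX system actually performs; document contents and IDs are invented for illustration.

    ```python
    # Minimal sketch of computer-aided free-text indexing: tokenize and
    # normalize documents, build an inverted index, and answer conjunctive
    # natural-language queries. Normalization here is a naive placeholder
    # for real morphological analysis.

    from collections import defaultdict

    def tokenize(text):
        """Lowercase tokens with surrounding punctuation stripped."""
        return [t.strip('.,;:"()').lower() for t in text.split() if t]

    def build_index(docs):
        index = defaultdict(set)
        for doc_id, text in docs.items():
            for term in tokenize(text):
                index[term].add(doc_id)
        return index

    def search(index, query):
        """Return IDs of documents containing every query term."""
        postings = [index.get(term, set()) for term in tokenize(query)]
        return set.intersection(*postings) if postings else set()

    docs = {1: "Patent specification for a retrieval system",
            2: "Medical free text database"}
    index = build_index(docs)
    print(search(index, "patent retrieval"))  # {1}
    ```

    Replacing the tokenizer with a morphological analyser (mapping inflected German forms to their lemmas) is what turns this toy into the "linguistic strategies" the abstract refers to.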