Lexical typology: a programmatic sketch
The present paper is an attempt to lay the foundation for Lexical Typology as a new kind of linguistic typology. The goal of Lexical Typology is to investigate crosslinguistically significant patterns of interaction between lexicon and grammar.
Natural language processing
Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems, such as text summarization, information extraction, and information retrieval, including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.
Automated system for the creation and replenishment of users' electronic lexicographical resources
This article proposes a solution to improve the efficiency of automated generation of electronic lexicographical resources based on the processing of strongly structured electronic information arrays. The automated information system developed for creating and replenishing lexicographical resources is described, and several of its supporting subsystems are characterized. The effectiveness of the information system is then evaluated.
Marrying Universal Dependencies and Universal Morphology
The Universal Dependencies (UD) and Universal Morphology (UniMorph) projects each present schemata for annotating the morphosyntactic details of language. Each project also provides corpora of annotated text in many languages: UD at the token level and UniMorph at the type level. As each corpus is built by different annotators, language-specific decisions hinder the goal of universal schemata. With compatible tags, each project's annotations could be used to validate the other's. Additionally, the availability of both type- and token-level resources would be a boon to tasks such as parsing and homograph disambiguation. To ease this interoperability, we present a deterministic mapping from Universal Dependencies v2 features into the UniMorph schema. We validate our approach by lookup in the UniMorph corpora and find a macro-average of 64.13% recall. We also note incompatibilities due to paucity of data on either side. Finally, we present a critical evaluation of the foundations, strengths, and weaknesses of the two annotation projects.
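The deterministic mapping described above can be sketched as a lookup table from UD v2 feature-value pairs into UniMorph tags. The correspondence table below is a toy illustration under assumed feature names, not the paper's actual (far larger, language-aware) mapping:

```python
# Minimal sketch of a deterministic UD-v2 -> UniMorph feature mapping.
# The table is illustrative only; the real mapping covers many more
# features and handles language-specific annotation decisions.
UD_TO_UNIMORPH = {
    ("Number", "Sing"): "SG",
    ("Number", "Plur"): "PL",
    ("Tense", "Past"): "PST",
    ("Tense", "Pres"): "PRS",
    ("Person", "1"): "1",
    ("Person", "2"): "2",
    ("Person", "3"): "3",
    ("VerbForm", "Fin"): "FIN",
}

def ud_to_unimorph(ud_feats: str) -> str:
    """Convert a UD feature string (e.g. 'Number=Sing|Tense=Past')
    into a semicolon-joined UniMorph tag bundle."""
    tags = []
    for pair in ud_feats.split("|"):
        name, value = pair.split("=")
        tag = UD_TO_UNIMORPH.get((name, value))
        if tag is not None:  # skip features absent from the toy table
            tags.append(tag)
    return ";".join(tags)

print(ud_to_unimorph("Number=Sing|Tense=Past|Person=3"))  # SG;PST;3
```

Because the table lookup is a pure function of the UD feature-value pair, the mapping is deterministic: the same UD bundle always yields the same UniMorph bundle, which is what makes cross-validation between the two corpora possible.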
Recognition and Arabic-French translation of Named Entities: the case of sport places
The recognition of Arabic Named Entities (NEs) is a problem in various domains of Natural Language Processing (NLP), such as machine translation. Indeed, NE translation gives access to multilingual information, but it does not always produce the expected result, especially when the NE contains a person name. For this reason, and in order to improve translation, parts of the NE can be transliterated. In this context, we propose a method that integrates translation and transliteration. We use the linguistic platform NooJ, which is based on local grammars and transducers, and focus on the sport domain. We first suggest a refinement of the typological model presented at the MUC conferences, then describe the integration of an Arabic transliteration module into the translation system. Finally, we detail our method and report the results of its evaluation.
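The combination of lexicon-based translation with a transliteration fallback can be sketched as follows. The paper implements this with NooJ local grammars and transducers; the dictionary-based character mapping and the sample lexicon below are purely illustrative assumptions:

```python
# Toy Arabic-to-Latin transliteration by character mapping. The paper uses
# NooJ transducers; this dictionary-based sketch is only illustrative and
# covers a handful of characters.
ARABIC_TO_LATIN = {
    "م": "m", "ح": "h", "د": "d", "ع": "a", "ل": "l", "ي": "i", "ا": "a",
}

def transliterate(word: str) -> str:
    """Map each Arabic character to a Latin equivalent, leaving
    unmapped characters unchanged."""
    return "".join(ARABIC_TO_LATIN.get(ch, ch) for ch in word)

def translate_ne(tokens, lexicon):
    """Combine translation and transliteration: look each token up in a
    bilingual Arabic-French lexicon; fall back to transliteration for
    out-of-lexicon tokens such as person names."""
    return " ".join(lexicon.get(t, transliterate(t)) for t in tokens)

# 'ملعب' (stadium) is found in the hypothetical lexicon; the person name
# 'محمد' is not, so it is transliterated instead.
print(translate_ne(["ملعب", "محمد"], {"ملعب": "stade"}))  # stade mhmd
```

The key design point is the per-token fallback: translation and transliteration are not alternatives applied to the whole NE, but are interleaved token by token, which is what handles mixed NEs such as "stadium of [person name]".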
Automatic indexing and retrieval as a tool to improve information and technology transfer
During the last 20 years, linguistic data processing has mainly been seen as a tool to describe the regularities (or detect the irregularities) of a given natural language, especially in handling large textual databases ("corpora"). A second motivation for using a computer was to test theories or models of a language system (or a part of it) by means of a simulation program. As a result of both strategies, the "Saarbrücken Text Analysis System" has been implemented. At present, a very large lexical database is available for analysing written German texts morphologically and syntactically; the syntactic parser is able to handle any German sentence with more than 90% "correct" results. On the other hand, large (textual) databases in different fields (e.g. law, patent specifications, medicine) are growing rapidly. Therefore, a computer-aided indexing system ("Computergestützte Texterschließung": CTX) has been developed at Regensburg and Saarbrücken University to improve natural-language-oriented access to textual data ("free text") by applying linguistic strategies to information retrieval processes.
Main results of feasibility studies, especially in the field of German Patent Documentation, are presented
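The idea of applying linguistic strategies to free-text indexing can be illustrated with a minimal sketch: tokens are reduced to a crude stem before being stored in an inverted index, so that inflectional variants retrieve the same documents. CTX itself rests on full German morphological analysis; the toy suffix rules below are an assumption for illustration only:

```python
# Minimal sketch of linguistically informed free-text indexing: each token
# is reduced to a crude stem before entering an inverted index. CTX uses
# full German morphological analysis; this rule set is purely illustrative.
from collections import defaultdict

def crude_stem(token: str) -> str:
    """Strip a few common German inflectional endings (toy rules)."""
    for suffix in ("ungen", "ung", "en", "er", "e"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def build_index(docs):
    """Map each stem to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[crude_stem(token)].add(doc_id)
    return index

docs = {"d1": "Erfindungen und Erfindung", "d2": "Patente"}
index = build_index(docs)
print(sorted(index["erfind"]))  # ['d1']
```

Indexing stems rather than surface forms is what lets a free-text query match morphological variants ("Erfindung", "Erfindungen") without the searcher enumerating them, which is the retrieval improvement the abstract describes.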