24 research outputs found

    Detección de la unidad central en dos géneros y lenguajes diferentes: un estudio preliminar en portugués brasileño y euskera

    The aim of this paper is to present the development of a rule-based automatic detector that determines the main idea, or most pertinent discourse unit, in two different languages, Basque and Brazilian Portuguese, and in two distinct genres, scientific abstracts and argumentative answers. The central unit (CU) may be of interest for understanding texts in terms of relational discourse structure, and it can be applied to Natural Language Processing (NLP) tasks such as automatic summarization, question-answering systems or sentiment analysis. In the argumentative answer genre, identifying the CU is an essential step towards an eventual automatic evaluator that takes the discourse structure of such texts into account. The theoretical background underlying the paper is Mann and Thompson’s (1988) Rhetorical Structure Theory (RST), which proceeds from discourse segmentation to CU annotation. Results show that CUs in different languages and genres are detected automatically with similar results, although there is room for improvement.

    Elaboration of a RST Chinese Treebank

    As a subfield of Artificial Intelligence (AI), Natural Language Processing (NLP) aims to process human languages automatically. Studies from different research fields have produced fruitful results for NLP. Among these fields, discourse analysis is becoming increasingly popular, and discourse information is crucial for NLP studies. As the language with the most speakers in the world, Chinese occupies a very important position in NLP analysis. This work therefore presents a discourse treebank for Chinese whose theoretical framework is Rhetorical Structure Theory (RST) (Mann and Thompson, 1988). The research corpus consists of 50 Chinese texts, annotated at three levels: segmentation, central unit (CU) and discourse structure. Finally, we create an open online interface for consulting the Chinese treebank.

    Studies in the linguistic sciences. 17-18 (1987-1988)

    Temporal Relations in English and German Narrative Discourse

    Institute for Communicating and Collaborative Systems. Understanding the temporal relations which hold between situations described in a narrative is a highly complex process. The main aim of this thesis is to investigate the factors we have to take into account in order to determine the temporal coherence of a narrative discourse. In particular, aspectual information, tense, and world and context knowledge have to be considered, and the interplay of all these factors must be specified. German is, aspectually speaking, an interesting language, because it does not possess a grammaticalised distinction between a perfective and an imperfective aspect. In this thesis I examine the German aspectual system and the interaction of the factors which influence the derived temporal relation for short discourse sequences. The analysis is carried out in two steps. First, the aspectual and temporal properties of German are investigated, following the cross-linguistic framework developed by Carlota S. Smith. An account of German is given which emphasises the properties peculiar to this language and explains why it has to be treated differently from, for example, English. The main result for the tense used in a narrative text, the Preterite, is that information regarding the end point of a described situation is based on our world knowledge and may be overridden when context knowledge forces us to do so. Next, the more complex level of discourse is taken into account in order to derive the temporal relations which hold between the described situations. This investigation provides insights into the interaction of different knowledge sources, such as aspectual information and world and context knowledge. It also gives rise to the need for a time logic capable of expressing fine as well as coarse (or underspecified) temporal relations between situations.
An account is presented to describe exhaustively all conceivable temporal relations within a computationally tractable reasoning system, based on the interval calculus by James Allen. However, in order to establish a coherent discourse for larger sequences, the hierarchical structure of a narrative has to be considered as well. I propose a Tree Description Grammar — a further development of Tree Adjoining Grammars — for parsing the given discourse structure, and stipulate discourse principles which give an explanation for the way a discourse should be processed. I furthermore discuss how a discourse grammar needs to distinguish between discourse structure and discourse processing. The latter term can be understood as navigating through a discourse tree, and reflects the process of how a discourse is comprehended. Finally, a small fragment of German is given which shows how the discourse grammar can be applied to short discourse sequences of four to seven sentences. The conclusion discusses the outcome of the analysis conducted in this thesis and proposes likely areas of future research

    Connective-Lex: A Web-Based Multilingual Lexical Resource for Connectives

    In this paper, we present a tangible outcome of the TextLink network: a joint online database project displaying and linking existing and newly created lexicons of discourse connectives in multiple languages. We discuss the definition and demarcation of the class of connectives that should be included in such a resource, and present the syntactic, semantic/pragmatic, and lexicographic information we collected. We then describe the technical implementation of the database and its search functionality. We discuss how the multilingual integration of several connective lexicons provides added value for linguistic researchers and other users interested in connectives, by allowing crosslinguistic comparison and direct linking between discourse relational devices in different languages. Finally, we provide pointers for possible future extensions both in breadth (i.e., by adding lexicons for additional languages) and in depth (by extending the information provided for each connective item and by strengthening the crosslinguistic links).

    Identifying nocuous ambiguity in natural language requirements

    This dissertation is an investigation into how ambiguity should be classified for authors and readers of text, and how this process can be automated. Usually, authors and readers disambiguate ambiguity, either consciously or unconsciously. However, disambiguation is not always appropriate. For instance, a linguistic construction may be read differently by different people, with no consensus about which reading is the intended one. This is particularly dangerous if they do not realise that other readings are possible. Misunderstandings may then occur. This is particularly serious in the field of requirements engineering. If requirements are misunderstood, systems may be built incorrectly, and this can prove very costly. Our research uses natural language processing techniques to address ambiguity in requirements. We develop a model of ambiguity, and a method of applying it, which represent a novel approach to the problem described here. Our model is based on the notion that human perception is the only valid criterion for judging ambiguity. If people perceive very differently how an ambiguity should be read, it will cause misunderstandings. Assigning a preferred reading to it is therefore unwise. In text, such ambiguities should be located and rewritten in a less ambiguous form; others need not be reformulated. We classify the former as nocuous and the latter as innocuous. We allow the dividing line between these two classifications to be adjustable. We term this the ambiguity threshold, and it represents a level of intolerance to ambiguity. A nocuous ambiguity can be an unacknowledged or an acknowledged ambiguity for a given set of readers. In the former case, they assign disparate readings to the ambiguity, but each is unaware that the others read it differently. In the latter case, they recognise that the ambiguity has more than one reading, but this fact may be unacknowledged by new readers. 
We present an automated approach to determine whether ambiguities in text are nocuous or innocuous. We use heuristics to distinguish ambiguities for which there is a strong consensus about how they should be read. These are innocuous ambiguities. The remaining nocuous ambiguities can then be rewritten at a later stage. We find consensus opinions about ambiguities by surveying human perceptions of them. Our heuristics try to predict these perceptions automatically. They utilise various types of linguistic information: generic corpus data, morphology and lexical subcategorisations are the most successful. We use coordination ambiguity as the test case for this research. This occurs where the scope of words such as "and" and "or" is unclear. Our research contributes to both the requirements engineering and the natural language processing literatures. Ambiguity is known to be a serious problem in requirements engineering, but has rarely been dealt with effectively and thoroughly. Our approach is an appropriate solution, and our flexible ambiguity threshold is a particularly useful concept. For instance, high ambiguity intolerance can be implemented when writing requirements for safety-critical systems. Coordination ambiguities are widespread and known to cause misunderstandings, but have received comparatively little attention. Our heuristics show that linguistic data can be used successfully to predict preferred readings of very diverse coordinations. Used in combination, these heuristics demonstrate that nocuous ambiguity can be distinguished from innocuous ambiguity under certain conditions. With appropriate ambiguity thresholds, an accuracy representing a 28% improvement on the baselines can be achieved.

    Language and Linguistics in a Complex World Data, Interdisciplinarity, Transfer, and the Next Generation. ICAME41 Extended Book of Abstracts

    This is a collection of papers, work-in-progress reports, and other contributions that were part of the ICAME41 digital conference
