Search CORE

3,392 research outputs found

Learning Recursive Segments for Discourse Parsing

Author: Afantenos Stergos
Danlos Laurence
Denis Pascal
Muller Philippe
Publication venue
Publication date: 28/03/2010
Field of study

Automatically detecting discourse segments is an important preliminary step towards full discourse parsing. Previous research on discourse segmentation have relied on the assumption that elementary discourse units (EDUs) in a document always form a linear sequence (i.e., they can never be nested). Unfortunately, this assumption turns out to be too strong, for some theories of discourse like SDRT allows for nested discourse units. In this paper, we present a simple approach to discourse segmentation that is able to produce nested EDUs. Our approach builds on standard multi-class classification techniques combined with a simple repairing heuristic that enforces global coherence. Our system was developed and evaluated on the first round of annotations provided by the French Annodis project (an ongoing effort to create a discourse bank for French). Cross-validated on only 47 documents (1,445 EDUs), our system achieves encouraging performance results with an F-score of 73% for finding EDUs.Comment: published at LREC 201

arXiv.org e-Print Archive

Scientific Publications of the University of Toulouse II Le Mirail

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Introduction to the CoNLL-2001 Shared Task: Clause Identification

Author: Dejean Herve
Sang Erik F. Tjong Kim
Publication venue
Publication date: 01/01/2001
Field of study

We describe the CoNLL-2001 shared task: dividing text into clauses. We give background information on the data sets, present a general overview of the systems that have taken part in the shared task and briefly discuss their performance

arXiv.org e-Print Archive

Institutional Repository Universiteit Antwerpen

Tilburg University Repository

EusEduSeg: Un Segmentador Discursivo para el Euskera Basado en Dependencias

Author: Iruskieta Quintian Mikel
Zapirain Benat
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural
Publication date: 01/01/2015
Field of study

We present the first discursive segmenter for Basque implemented by heuristics based on syntactic dependencies and linguistic rules. Preliminary experiments show F1 values of more than 85% in automatic EDU segmentation for Basque.Presentamos en este artículo el primer segmentador discursivo para el euskera (EusEduSeg) implementado con heurísticas basadas en dependencias sintácticas y reglas lingüísticas. Experimentos preliminares muestran resultados de más del 85 % F1 en el etiquetado de EDUs sobre el Basque RST TreeBank

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

A Corpus-based Evaluation of Centering Theory

Author: Cheng H
di Eugenio B
Hitzeman J
Poesio M
Stevenson R
Publication venue: CSM-369
Publication date: 01/04/2002
Field of study

University of Essex Research Repository

Centering, Anaphora Resolution, and Discourse Structure

Author: Walker Marilyn A.
Publication venue
Publication date: 11/08/1997
Field of study

Centering was formulated as a model of the relationship between attentional state, the form of referring expressions, and the coherence of an utterance within a discourse segment (Grosz, Joshi and Weinstein, 1986; Grosz, Joshi and Weinstein, 1995). In this chapter, I argue that the restriction of centering to operating within a discourse segment should be abandoned in order to integrate centering with a model of global discourse structure. The within-segment restriction causes three problems. The first problem is that centers are often continued over discourse segment boundaries with pronominal referring expressions whose form is identical to those that occur within a discourse segment. The second problem is that recent work has shown that listeners perceive segment boundaries at various levels of granularity. If centering models a universal processing phenomenon, it is implausible that each listener is using a different centering algorithm.The third issue is that even for utterances within a discourse segment, there are strong contrasts between utterances whose adjacent utterance within a segment is hierarchically recent and those whose adjacent utterance within a segment is linearly recent. This chapter argues that these problems can be eliminated by replacing Grosz and Sidner's stack model of attentional state with an alternate model, the cache model. I show how the cache model is easily integrated with the centering algorithm, and provide several types of data from naturally occurring discourses that support the proposed integrated model. Future work should provide additional support for these claims with an examination of a larger corpus of naturally occurring discourses.Comment: 35 pages, uses elsart12, lingmacros, named, psfi

arXiv.org e-Print Archive

CiteSeerX

Splitting Arabic Texts into Elementary Discourse Units

Author: Abdul-Mageed M.
Abu-Jbara A.
Afantenos S.
Afantenos S. D.
Al-Saif A.
Al-Saif A.
Belguith H. L.
Boujelben I.
Charoensuk J.
Da Cunha I.
Darwish K.
Diab M.
Diab M.
Eskander R.
Farah Benamara Zitoune
Fisher S.
Green S.
Gridach M.
Habash N.
Iskandar Keskes
Kamp H.
Keskes I.
Khalifa I.
Lamia Hadrich Belguith
Lüngen H.
Maamouri M.
Maamouri M.
Mourad A.
Nivre J.
Polanyi L.
Prasad A.
Sadat F.
Sawalha M.
Subba R.
Sumita K.
Tofiloski M.
Trigui O.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/06/2014
Field of study

International audienceIn this article, we propose the first work that investigates the feasibility of Arabic discourse segmentation into elementary discourse units within the segmented discourse representation theory framework. We first describe our annotation scheme that defines a set of principles to guide the segmentation process. Two corpora have been annotated according to this scheme: elementary school textbooks and newspaper documents extracted from the syntactically annotated Arabic Treebank. Then, we propose a multiclass supervised learning approach that predicts nested units. Our approach uses a combination of punctuation, morphological, lexical, and shallow syntactic features. We investigate how each feature contributes to the learning process. We show that an extensive morphological analysis is crucial to achieve good results in both corpora. In addition, we show that adding chunks does not boost the performance of our system

Crossref

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte