1,281 research outputs found
GumDrop at the DISRPT2019 Shared Task: A Model Stacking Approach to Discourse Unit Segmentation and Connective Detection
In this paper we present GumDrop, Georgetown University's entry at the DISRPT
2019 Shared Task on automatic discourse unit segmentation and connective
detection. Our approach relies on model stacking, creating a heterogeneous
ensemble of classifiers, which feed into a metalearner for each final task. The
system encompasses three trainable component stacks: one for sentence
splitting, one for discourse unit segmentation and one for connective
detection. The flexibility of each ensemble allows the system to generalize
well to datasets of different sizes and with varying levels of homogeneity.
Comment: Proceedings of Discourse Relation Parsing and Treebanking
(DISRPT2019
Splitting Arabic Texts into Elementary Discourse Units
In this article, we present the first study of the feasibility of segmenting Arabic discourse into elementary discourse units within the Segmented Discourse Representation Theory framework. We first describe our annotation scheme, which defines a set of principles to guide the segmentation process. Two corpora have been annotated according to this scheme: elementary school textbooks and newspaper documents extracted from the syntactically annotated Arabic Treebank. We then propose a multiclass supervised learning approach that predicts nested units. Our approach combines punctuation, morphological, lexical, and shallow syntactic features, and we investigate how each feature contributes to the learning process. We show that an extensive morphological analysis is crucial to achieving good results on both corpora. In addition, we show that adding chunks does not improve the performance of our system.
Natural language processing
Beginning with the basic issues of NLP, this chapter charts the major research activities in this area since the last ARIST chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems (text summarization, information extraction, information retrieval, etc.), including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.
Reassessing second language reading comprehension: Insights from the psycholinguistics notion of sentence processing
Theories and practices in second language reading pedagogy often overlook the psycholinguistic account of sentence processing. Second language reading comprehension is readily associated with vocabulary learning or discourse strategies. Yet such activities can lead to an unnatural way of reading, such as translating vocabulary or pointing out information on request, whereas authentic reading should encourage a natural stream of ideas to be interpreted from sentence to sentence. As the psycholinguistic notion of sentence processing suggests, syntax appears to be the key to effective and authentic reading, contrary to the general belief that semantic or discourse information is the primary concern. This article argues that understanding the architecture of sentence processing, with syntactic parsing at the core of the underlying mechanism, can offer insights into second language reading pedagogy. The concepts of syntactic parsing, reanalysis, and sentence processing models are described to show how sentence processing works. Additionally, a critical review of the differences between L1 and L2 sentence processing is presented, considering the recent debate on individual differences as significant indicators of nativelike L2 sentence processing. Lastly, implications for L2 reading pedagogy and potential implementation in instructional settings are discussed.
DiSeg: an automatic discourse segmenter for Spanish
Nowadays, automatic discourse parsing is a prominent research topic. However, no discourse parser exists for Spanish texts. The first step in developing such a tool is discourse segmentation. In this work, we present DiSeg, the first discourse segmenter for Spanish, which uses the framework of Rhetorical Structure Theory (Mann and Thompson, 1988) and is based on lexical and syntactic rules. We describe the system and evaluate its performance against a gold standard corpus, obtaining promising results.
Part of this work was funded by a postdoctoral mobility grant awarded to Iria da Cunha by the Spanish Ministry of Science and Innovation (Ministerio de Ciencia e Innovación; National Program for the Mobility of Research Human Resources; National Plan for Scientific Research, Development and Innovation 2008-2011).
Discourse Structure in Machine Translation Evaluation
In this article, we explore the potential of using sentence-level discourse
structure for machine translation evaluation. We first design discourse-aware
similarity measures, which use all-subtree kernels to compare discourse parse
trees in accordance with the Rhetorical Structure Theory (RST). Then, we show
that a simple linear combination with these measures can help improve various
existing machine translation evaluation metrics regarding correlation with
human judgments both at the segment- and at the system-level. This suggests
that discourse information is complementary to the information used by many of
the existing evaluation metrics, and thus it could be taken into account when
developing richer evaluation metrics, such as the WMT-14 winning combined
metric DiscoTKparty. We also provide a detailed analysis of the relevance of
various discourse elements and relations from the RST parse trees for machine
translation evaluation. In particular we show that: (i) all aspects of the RST
tree are relevant, (ii) nuclearity is more useful than relation type, and (iii)
the similarity of the translation RST tree to the reference tree is positively
correlated with translation quality.
Comment: machine translation, machine translation evaluation, discourse
analysis. Computational Linguistics, 201