Search CORE

1,281 research outputs found

GumDrop at the DISRPT2019 Shared Task: A Model Stacking Approach to Discourse Unit Segmentation and Connective Detection

Author: Gong Mackenzie
Liu Yan
Liu Yang
Peng Siyao
Yu Yue
Zeldes Amir
Zhu Yilun
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2019
Field of study

In this paper we present GumDrop, Georgetown University's entry at the DISRPT 2019 Shared Task on automatic discourse unit segmentation and connective detection. Our approach relies on model stacking, creating a heterogeneous ensemble of classifiers, which feed into a metalearner for each final task. The system encompasses three trainable component stacks: one for sentence splitting, one for discourse unit segmentation and one for connective detection. The flexibility of each ensemble allows the system to generalize well to datasets of different sizes and with varying levels of homogeneity.Comment: Proceedings of Discourse Relation Parsing and Treebanking (DISRPT2019

arXiv.org e-Print Archive

Crossref

Splitting Arabic Texts into Elementary Discourse Units

Author: Abdul-Mageed M.
Abu-Jbara A.
Afantenos S.
Afantenos S. D.
Al-Saif A.
Al-Saif A.
Belguith H. L.
Boujelben I.
Charoensuk J.
Da Cunha I.
Darwish K.
Diab M.
Diab M.
Eskander R.
Farah Benamara Zitoune
Fisher S.
Green S.
Gridach M.
Habash N.
Iskandar Keskes
Kamp H.
Keskes I.
Khalifa I.
Lamia Hadrich Belguith
Lüngen H.
Maamouri M.
Maamouri M.
Mourad A.
Nivre J.
Polanyi L.
Prasad A.
Sadat F.
Sawalha M.
Subba R.
Sumita K.
Tofiloski M.
Trigui O.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/06/2014
Field of study

International audienceIn this article, we propose the first work that investigates the feasibility of Arabic discourse segmentation into elementary discourse units within the segmented discourse representation theory framework. We first describe our annotation scheme that defines a set of principles to guide the segmentation process. Two corpora have been annotated according to this scheme: elementary school textbooks and newspaper documents extracted from the syntactically annotated Arabic Treebank. Then, we propose a multiclass supervised learning approach that predicts nested units. Our approach uses a combination of punctuation, morphological, lexical, and shallow syntactic features. We investigate how each feature contributes to the learning process. We show that an extensive morphological analysis is crucial to achieve good results in both corpora. In addition, we show that adding chunks does not boost the performance of our system

Crossref

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

Natural language processing

Author: Adams
Amsler
Bangalore
Barker
Benoît
Bian
Bondale
Carrick
Ceric
Chandrasekar
Chang
Charniak
Chen
Chowdhury
Chowdhury
Costantino
Cowie
Craven
Craven
Craven
Dogru
Evans
Feldman
Fernandez
Gaizauskas
Glasgow
Haas
Hayes
Hayes
Hedlund
Herath
Ide
Isahara
Jelinek
Jeong
Jurafsky
Kazakov
Kehler
Khoo
Kim
King
Lange
Lee
Lehmam
Lehtokangas
Lewis
Liddy
Liddy
Lovis
Ma
Magnini
Mani
Manning
Marquez
Martinez
Martinez
McMurchie
Meyer
Mihalcea
Mock
Moens
Morin
Narita
Nerbonne
Oard
Ogura
Oudet
Owei
Paris
Pasero
Pedersen
Perez-Carballo
Petreley
Pirkola
Poesio
Rosenfield
Roux
Say
Scarlett
Schenker
Silber
Smeaton
Smeaton
Smith
Sokol
Song
Sparck Jones
Staab
Stock
Tolle
Trybula
Tsuda
Vickery
Waldrop
Warner
Weigard
Wilks
Wong
Yang
Yang
Zadrozny
Zweigenbaum
Publication venue: 'Wiley'
Publication date: 01/01/2003
Field of study

Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST Chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems - text summarization, information extraction, information retrieval, etc., including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of www and digital libraries ; and (iv) evaluation of NLP systems

Crossref

University of Strathclyde Institutional Repository

OPUS - University of Technology Sydney

Reassessing second language reading comprehension: Insights from the psycholinguistics notion of sentence processing

Author: Ashadi Ashadi
Zudianto Hardian
Publication venue: 'Universitas Islam Sultan Agung'
Publication date: 28/02/2021
Field of study

Theories and practices in second language reading pedagogy often overlook the sentence processing description from the psycholinguistics perspective. Second language reading comprehension is easily associated with vocabulary learning or discourse strategy. Yet, such activities can lead to an unnatural way of reading such as translating vocabularies or pointing out information as required. Meanwhile the authentic way of reading should encourage a natural stream of ideas to be interpreted from sentence to sentence. As suggested by the sentence processing notion from the psycholinguistics point of view, syntax appears to be the key to effective and authentic reading as opposed to the general belief of semantic or discourse information being the primary concern. This article argues that understanding the architecture of sentence processing, with syntactic parsing at the core of the underlying mechanism, can offer insights into the second language reading pedagogy. The concepts of syntactic parsing, reanalysis, and sentence processing models are described to give the idea of how sentence processing works. Additionally, a critical review on the differences between L1 and L2 sentence processing is presented considering the recent debate on individual differences as significant indicators of nativelike L2 sentence processing. Lastly, implications for the L2 reading pedagogy and potential implementation in instructional setting are discussed

EduLite: Journal of English Education, Literature and Culture

Portal Jurnal Universitas Islam Sultan Agung (UNISSULA)

DiSeg: an automatic discourse segmenter for Spanish

Author: Castellón Masalles Irene
Cunha Fanego Iria da
Lloberes Salvatella Marina
SanJuan Eric
Torres Moreno Juan Manuel
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural
Publication date: 01/01/2010
Field of study

Hoy en día el análisis discursivo automático es un tema de investigación relevante. Sin embargo, no existen analizadores del discurso para textos en español. El primer paso para desarrollar esta herramienta es la segmentación discursiva. En este artículo presentamos DiSeg, el primer segmentador discursivo para el español que utiliza el marco de la Rhetorical Structure Theory (Mann y Thompson, 1988) y se basa en reglas léxicas y sintácticas. Describimos el sistema y evaluamos sus resultados con un corpus gold standard, obteniendo resultados prometedores.Nowadays discourse parsing is a very prominent research topic. However, there is not a discourse parser for Spanish texts. The first stage in order to develop this tool is discourse segmentation. In this work, we present DiSeg, the first discourse segmenter for Spanish that uses the framework of the Rhetorical Structure Theory (Mann and Thompson, 1988) and is based on lexical and syntactic rules. We describe the system and we evaluate its performance with a gold standard corpus, obtaining promising results.Parte de este trabajo ha sido financiado mediante una ayuda de movilidad posdoctoral otorgada por el Ministerio de Ciencia e Innovación de España (Programa Nacional de Movilidad de Recursos Humanos de Investigación; Plan Nacional de Investigación Científica, Desarrollo e Innovación 2008-2011) a Iria da Cunha

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Discourse Structure in Machine Translation Evaluation

Author: Guzmán Francisco
Joty Shafiq
Màrquez Lluís
Nakov Preslav
Publication venue
Publication date: 01/01/2017
Field of study

In this article, we explore the potential of using sentence-level discourse structure for machine translation evaluation. We first design discourse-aware similarity measures, which use all-subtree kernels to compare discourse parse trees in accordance with the Rhetorical Structure Theory (RST). Then, we show that a simple linear combination with these measures can help improve various existing machine translation evaluation metrics regarding correlation with human judgments both at the segment- and at the system-level. This suggests that discourse information is complementary to the information used by many of the existing evaluation metrics, and thus it could be taken into account when developing richer evaluation metrics, such as the WMT-14 winning combined metric DiscoTKparty. We also provide a detailed analysis of the relevance of various discourse elements and relations from the RST parse trees for machine translation evaluation. In particular we show that: (i) all aspects of the RST tree are relevant, (ii) nuclearity is more useful than relation type, and (iii) the similarity of the translation RST tree to the reference tree is positively correlated with translation quality.Comment: machine translation, machine translation evaluation, discourse analysis. Computational Linguistics, 201

arXiv.org e-Print Archive

Directory of Open Access Journals

DR-NTU (Digital Repository of NTU)