Search CORE

793 research outputs found

Towards Understanding Egyptian Arabic Dialogues

Author: Abdou Sherif M
Elmadany Abdelrahim A
Gheith Mervat
Publication venue: 'Foundation of Computer Science'
Publication date: 13/07/2015
Field of study

Labelling of user's utterances to understanding his attends which called Dialogue Act (DA) classification, it is considered the key player for dialogue language understanding layer in automatic dialogue systems. In this paper, we proposed a novel approach to user's utterances labeling for Egyptian spontaneous dialogues and Instant Messages using Machine Learning (ML) approach without relying on any special lexicons, cues, or rules. Due to the lack of Egyptian dialect dialogue corpus, the system evaluated by multi-genre corpus includes 4725 utterances for three domains, which are collected and annotated manually from Egyptian call-centers. The system achieves F1 scores of 70. 36% overall domains.Comment: arXiv admin note: substantial text overlap with arXiv:1505.0308

arXiv.org e-Print Archive

CiteSeerX

Learning Recursive Segments for Discourse Parsing

Author: Afantenos Stergos
Danlos Laurence
Denis Pascal
Muller Philippe
Publication venue
Publication date: 28/03/2010
Field of study

Automatically detecting discourse segments is an important preliminary step towards full discourse parsing. Previous research on discourse segmentation have relied on the assumption that elementary discourse units (EDUs) in a document always form a linear sequence (i.e., they can never be nested). Unfortunately, this assumption turns out to be too strong, for some theories of discourse like SDRT allows for nested discourse units. In this paper, we present a simple approach to discourse segmentation that is able to produce nested EDUs. Our approach builds on standard multi-class classification techniques combined with a simple repairing heuristic that enforces global coherence. Our system was developed and evaluated on the first round of annotations provided by the French Annodis project (an ongoing effort to create a discourse bank for French). Cross-validated on only 47 documents (1,445 EDUs), our system achieves encouraging performance results with an F-score of 73% for finding EDUs.Comment: published at LREC 201

arXiv.org e-Print Archive

Scientific Publications of the University of Toulouse II Le Mirail

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Cross-lingual and cross-domain discourse segmentation of entire documents

Author: Braud Chloé
Lacroix Ophélie
Søgaard Anders
Publication venue
Publication date: 01/01/2017
Field of study

Discourse segmentation is a crucial step in building end-to-end discourse parsers. However, discourse segmenters only exist for a few languages and domains. Typically they only detect intra-sentential segment boundaries, assuming gold standard sentence and token segmentation, and relying on high-quality syntactic parses and rich heuristics that are not generally available across languages and domains. In this paper, we propose statistical discourse segmenters for five languages and three domains that do not rely on gold pre-annotations. We also consider the problem of learning discourse segmenters when no labeled data is available for a language. Our fully supervised system obtains 89.5% F1 for English newswire, with slight drops in performance on other domains, and we report supervised and unsupervised (cross-lingual) results for five languages in total.Comment: To appear in Proceedings of ACL 201

arXiv.org e-Print Archive

Copenhagen University Research Information System

Discourse structure and language technology

Author: Agarwal
Al-Saif
Al-Saif
Asher
B. WEBBER
Baldridge
Barzilay
Barzilay
Bex
Buch-Kromann
Buch-Kromann
Bunt
Burchardt
Burstein
Callison-Birch
Chambers
Chen
Chiarcos
Choi
Dale
Daume
Do
Eales
Egg
Eisenstein
Elsner
Elsner
Elwell
Finlayson
Foster
Galley
Ghorbel
Ghosh
Ghosh
Grosz
Grosz
Grosz
Gu
Guo
Halliday
Hardmeier
Hardt
Hearst
Higgins
Hirohata
Holler
Hovy
Ide
Kan
Kingsbury
Koppel
Lee
Lee
Liakata
Lin
Lochbaum
Louis
M. EGG
Maamouri
Malioutov
Mandler
Marcu
Marcu
Marcu
Marcus
Martin
Maslennikov
McDonald
McKnight
Meyer
Mladová
Moore
Moore
Moore
Moser
Nagard
Oza
Palau
Pang
Pang
Paris
Patwardhan
Petukhova
Petukhova
Pitler
Pitler
Polanyi
Polanyi
Polanyi
Prasad
Prasad
Prasad
Prasad
Propp
Purver
Purver
Sagae
Sagae
Say
Schank
Sibun
Soricut
Stede
Subba
Taboada
Teufel
Thione
Tonelli
Turney
V. KORDONI
Versley
Voll
Walker
Wang
Webber
Webber
Wellner
Woods
Zeyrek
Zeyrek
Zeyrek
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 08/12/2011
Field of study

This publication is with permission of the rights owner freely accessible due to an Alliance licence and a national licence (funded by the DFG, German Research Foundation) respectively.An increasing number of researchers and practitioners in Natural Language Engineering face the prospect of having to work with entire texts, rather than individual sentences. While it is clear that text must have useful structure, its nature may be less clear, making it more difficult to exploit in applications. This survey of work on discourse structure thus provides a primer on the bases of which discourse is structured along with some of their formal properties. It then lays out the current state-of-the-art with respect to algorithms for recognizing these different structures, and how these algorithms are currently being used in Language Technology applications. After identifying resources that should prove useful in improving algorithm performance across a range of languages, we conclude by speculating on future discourse structure-enabled technology.Peer Reviewe

Crossref

Edinburgh Research Explorer

Dokumenten-Publikationsserver der Humboldt-Universität zu Berlin

Dialogue Act Recognition via CRF-Attentive Structured Network

Author: Ang Jeremy
Chen Yun-Nung
Chen Zheqian
Geertzen Jeroen
Kalchbrenner Nal
Lee Ji Young
Pan Boyuan
Publication venue
Publication date: 15/11/2017
Field of study

Dialogue Act Recognition (DAR) is a challenging problem in dialogue interpretation, which aims to attach semantic labels to utterances and characterize the speaker's intention. Currently, many existing approaches formulate the DAR problem ranging from multi-classification to structured prediction, which suffer from handcrafted feature extensions and attentive contextual structural dependencies. In this paper, we consider the problem of DAR from the viewpoint of extending richer Conditional Random Field (CRF) structural dependencies without abandoning end-to-end training. We incorporate hierarchical semantic inference with memory mechanism on the utterance modeling. We then extend structured attention network to the linear-chain conditional random field layer which takes into account both contextual utterances and corresponding dialogue acts. The extensive experiments on two major benchmark datasets Switchboard Dialogue Act (SWDA) and Meeting Recorder Dialogue Act (MRDA) datasets show that our method achieves better performance than other state-of-the-art solutions to the problem. It is a remarkable fact that our method is nearly close to the human annotator's performance on SWDA within 2% gap.Comment: 10 pages, 4figure

arXiv.org e-Print Archive

Crossref

Splitting Arabic Texts into Elementary Discourse Units

Author: Abdul-Mageed M.
Abu-Jbara A.
Afantenos S.
Afantenos S. D.
Al-Saif A.
Al-Saif A.
Belguith H. L.
Boujelben I.
Charoensuk J.
Da Cunha I.
Darwish K.
Diab M.
Diab M.
Eskander R.
Farah Benamara Zitoune
Fisher S.
Green S.
Gridach M.
Habash N.
Iskandar Keskes
Kamp H.
Keskes I.
Khalifa I.
Lamia Hadrich Belguith
Lüngen H.
Maamouri M.
Maamouri M.
Mourad A.
Nivre J.
Polanyi L.
Prasad A.
Sadat F.
Sawalha M.
Subba R.
Sumita K.
Tofiloski M.
Trigui O.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/06/2014
Field of study

International audienceIn this article, we propose the first work that investigates the feasibility of Arabic discourse segmentation into elementary discourse units within the segmented discourse representation theory framework. We first describe our annotation scheme that defines a set of principles to guide the segmentation process. Two corpora have been annotated according to this scheme: elementary school textbooks and newspaper documents extracted from the syntactically annotated Arabic Treebank. Then, we propose a multiclass supervised learning approach that predicts nested units. Our approach uses a combination of punctuation, morphological, lexical, and shallow syntactic features. We investigate how each feature contributes to the learning process. We show that an extensive morphological analysis is crucial to achieve good results in both corpora. In addition, we show that adding chunks does not boost the performance of our system

Crossref

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

Proceedings of the LREC workshop on partial parsing : between chunk parsing and deep parsing

Author: Kübler Sandra
Piskorski Jakub
Przepiorkowski Adam
Publication venue
Publication date: 03/11/2008
Field of study

Hochschulschriftenserver - Universität Frankfurt am Main

GumDrop at the DISRPT2019 Shared Task: A Model Stacking Approach to Discourse Unit Segmentation and Connective Detection

Author: Gong Mackenzie
Liu Yan
Liu Yang
Peng Siyao
Yu Yue
Zeldes Amir
Zhu Yilun
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2019
Field of study

In this paper we present GumDrop, Georgetown University's entry at the DISRPT 2019 Shared Task on automatic discourse unit segmentation and connective detection. Our approach relies on model stacking, creating a heterogeneous ensemble of classifiers, which feed into a metalearner for each final task. The system encompasses three trainable component stacks: one for sentence splitting, one for discourse unit segmentation and one for connective detection. The flexibility of each ensemble allows the system to generalize well to datasets of different sizes and with varying levels of homogeneity.Comment: Proceedings of Discourse Relation Parsing and Treebanking (DISRPT2019

arXiv.org e-Print Archive

Crossref