An Annotation Scheme of A Large-scale Multi-party Dialogues Dataset for Discourse Parsing and Machine Comprehension
In this paper, we propose a scheme for annotating large-scale multi-party
chat dialogues for discourse parsing and machine comprehension. The main goal
of this project is to help understand multi-party dialogues. Our dataset is
based on the Ubuntu Chat Corpus. For each multi-party dialogue, we annotate
the discourse structure and question-answer pairs. To the best of our
knowledge, this is the first large-scale corpus for discourse parsing of
multi-party dialogues, and we are the first to propose the task of machine
reading comprehension for multi-party dialogues.
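As an illustration only, one annotated dialogue might be represented in
Python as follows; the abstract does not specify a storage format, so every
field name and value below is a hypothetical assumption, not the dataset's
actual schema:

    # Hypothetical record for one annotated multi-party dialogue; the layout
    # and field names are illustrative assumptions, not the paper's format.
    annotated_dialogue = {
        "utterances": [
            {"id": 0, "speaker": "userA", "text": "How do I mount a USB drive?"},
            {"id": 1, "speaker": "userB", "text": "Plug it in, then check dmesg."},
            {"id": 2, "speaker": "userC", "text": "Or use the file manager."},
        ],
        # Discourse structure: (head utterance, dependent utterance, relation)
        "discourse_relations": [
            (0, 1, "Question-Answer-Pair"),
            (0, 2, "Question-Answer-Pair"),
        ],
        # Machine-reading-comprehension annotation: a question whose answer
        # is grounded in a specific utterance of the dialogue.
        "qa_pairs": [
            {"question": "Which command shows the detected drive?",
             "answer": "dmesg", "answer_utterance": 1},
        ],
    }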
Improving Online Forums Summarization via Unifying Hierarchical Attention Networks with Convolutional Neural Networks
Online discussion forums are prevalent and easily accessible, allowing
people to share ideas and opinions by posting messages in discussion
threads. Forum threads that grow significantly in length can become difficult
for participants, both newcomers and existing members, to grasp the main
ideas. This study
aims to create an automatic text summarizer for online forums to mitigate this
problem. We present a framework based on hierarchical attention networks,
unifying Bidirectional Long Short-Term Memory (Bi-LSTM) and Convolutional
Neural Network (CNN) to build sentence and thread representations for the forum
summarization. In this scheme, the Bi-LSTM derives a representation that
incorporates information from the whole sentence and the whole thread,
whereas the CNN recognizes high-level patterns of dominant units with respect
to the sentence and thread context. The attention mechanism is applied on top
of the CNN to further highlight the high-level representations that capture
the important units contributing to a desirable summary.
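As a rough sketch of such an encoder, the following PyTorch module chains a
Bi-LSTM, a CNN, and an attention layer into a sentence representation; all
hyperparameters, names, and dimensions are illustrative assumptions, not the
paper's actual implementation:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SentenceEncoder(nn.Module):
        """Bi-LSTM -> CNN -> attention sentence encoder: a minimal sketch of
        the hierarchical scheme described above. Kernel size and dimensions
        are illustrative assumptions, not the paper's values."""

        def __init__(self, vocab_size, emb_dim=100, hidden_dim=100,
                     n_filters=100, kernel_size=3):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            # Bi-LSTM gives each word position whole-sentence context.
            self.bilstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                                  bidirectional=True)
            # CNN over the Bi-LSTM states detects high-level local patterns.
            self.conv = nn.Conv1d(2 * hidden_dim, n_filters, kernel_size,
                                  padding=kernel_size // 2)
            # Attention scores each convolutional feature vector.
            self.attn = nn.Linear(n_filters, 1)

        def forward(self, tokens):                    # (batch, seq_len)
            x = self.embed(tokens)                    # (batch, seq_len, emb)
            h, _ = self.bilstm(x)                     # (batch, seq_len, 2*hid)
            c = F.relu(self.conv(h.transpose(1, 2)))  # (batch, filt, seq_len)
            c = c.transpose(1, 2)                     # (batch, seq_len, filt)
            weights = torch.softmax(self.attn(c).squeeze(-1), dim=1)
            # Attention-weighted sum of CNN features yields the sentence
            # vector; a thread-level encoder would repeat this same pattern
            # over the resulting sentence vectors.
            return torch.bmm(weights.unsqueeze(1), c).squeeze(1)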
Extensive performance evaluation based on three datasets, two of which are
real-life online forums and one of which is a news dataset, reveals that the
proposed model outperforms several competitive baselines.
Comment: 27 pages, 7 figures