Graph-based Neural Multi-Document Summarization
We propose a neural multi-document summarization (MDS) system that
incorporates sentence relation graphs. We employ a Graph Convolutional Network
(GCN) on the relation graphs, with sentence embeddings obtained from Recurrent
Neural Networks as input node features. Through multiple layer-wise
propagation, the GCN generates high-level hidden sentence features for salience
estimation. We then use a greedy heuristic to extract salient sentences while
avoiding redundancy. In our experiments on DUC 2004, we consider three types of
sentence relation graphs and demonstrate the advantage of combining sentence
relations in graphs with the representation power of deep neural networks. Our
model improves upon traditional graph-based extractive approaches and the
vanilla GRU sequence model with no graph, and it achieves competitive results
against other state-of-the-art multi-document summarization systems.
Comment: In CoNLL 2017.
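The abstract above describes layer-wise GCN propagation over a sentence relation graph followed by greedy, redundancy-aware extraction. The sketch below illustrates that pipeline in miniature; it is not the authors' implementation, and the function names, the symmetric normalisation `D^-1/2 (A+I) D^-1/2`, and the similarity cutoff `max_sim` are illustrative assumptions.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # symmetric normalisation
    return np.maximum(0.0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

def salience_scores(A, H, weights, w_out):
    """Stack GCN layers over sentence embeddings, then score each sentence."""
    for W in weights:
        H = gcn_layer(A, H, W)
    return (H @ w_out).ravel()              # one salience score per sentence

def greedy_extract(scores, sim, k, max_sim=0.7):
    """Pick top-scoring sentences, skipping any too similar to prior picks."""
    order = np.argsort(scores)[::-1]
    chosen = []
    for i in order:
        if len(chosen) == k:
            break
        if all(sim[i, j] < max_sim for j in chosen):
            chosen.append(i)
    return chosen
```

In the paper the input node features come from an RNN sentence encoder; here any fixed sentence embeddings stand in for them.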
A reinforcement learning formulation to the complex question answering problem
We use extractive multi-document summarization techniques to perform complex question answering, formulated as a reinforcement learning problem. Given a set of complex questions, a list of relevant documents per question, and the corresponding human-generated summaries (i.e., answers to the questions) as training data, the reinforcement learning module iteratively learns a number of feature weights in order to facilitate the automatic generation of summaries, i.e., answers to previously unseen complex questions. A reward function is used to measure the similarity between the candidate (machine-generated) summary sentences and the abstract summaries. In the training stage, the learner iteratively selects the important document sentences to be included in the candidate summary, analyzes the reward function and updates the related feature weights accordingly. The final weights are used to generate summaries as answers to unseen complex questions in the testing stage. Evaluation results show the effectiveness of our system. We also incorporate user interaction into the reinforcement learner to guide the candidate summary sentence selection process. Experiments reveal the positive impact of the user interaction component on the reinforcement learning framework.
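The training loop this abstract describes (select sentences, compute a reward against the reference summaries, update feature weights) can be sketched as follows. This is a minimal illustration under assumed choices, not the paper's formulation: the unigram-coverage reward, the linear scoring function, the running-average baseline, and all names are hypothetical.

```python
from collections import Counter

def unigram_reward(candidate, reference):
    """Fraction of reference unigrams covered by the candidate summary."""
    cand = Counter(" ".join(candidate).lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cand[w], ref[w]) for w in ref)
    return overlap / max(1, sum(ref.values()))

def train_weights(episodes, features, sentences, reference,
                  k=2, lr=0.1, dims=3):
    """Greedily select by w . f, then nudge the weights toward the features
    of selected sentences in proportion to the reward above a baseline."""
    w = [0.0] * dims
    baseline = 0.0
    for _ in range(episodes):
        scored = sorted(range(len(sentences)),
                        key=lambda i: -sum(wj * fj
                                           for wj, fj in zip(w, features[i])))
        picked = scored[:k]
        r = unigram_reward([sentences[i] for i in picked], reference)
        for i in picked:                      # reward-weighted update
            for d in range(dims):
                w[d] += lr * (r - baseline) * features[i][d]
        baseline = 0.9 * baseline + 0.1 * r   # running-average baseline
    return w
```

At test time, the learned `w` alone scores and selects sentences for unseen questions, mirroring the paper's testing stage.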
Learning to Create Sentence Semantic Relation Graphs for Multi-Document Summarization
Linking facts across documents is a challenging task, as the language used to
express the same information in a sentence can vary significantly, which
complicates the task of multi-document summarization. Consequently, existing
approaches heavily rely on hand-crafted features, which are domain-dependent
and hard to craft, or additional annotated data, which is costly to gather. To
overcome these limitations, we present a novel method, which makes use of two
types of sentence embeddings: universal embeddings, which are trained on a
large unrelated corpus, and domain-specific embeddings, which are learned
during training.
To this end, we develop SemSentSum, a fully data-driven model able to
leverage both types of sentence embeddings by building a sentence semantic
relation graph. SemSentSum achieves competitive results on two types of
summaries, 665 bytes and 100 words long. Unlike other state-of-the-art
models, it requires neither hand-crafted features nor additional annotated
data, and the method is easily adaptable to other tasks. To our
knowledge, we are the first to use multiple sentence embeddings for the task of
multi-document summarization.
Comment: 10 pages, 4 tables, 1 figure. Accepted at the 2019 Empirical Methods in Natural Language Processing Workshop on New Frontiers in Summarization.
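A sentence semantic relation graph of the kind this abstract builds can be sketched by connecting sentence pairs whose embedding similarity exceeds a threshold. The cosine-similarity measure and the `threshold` value here are illustrative assumptions, not SemSentSum's actual graph-construction procedure.

```python
import numpy as np

def sentence_relation_graph(embeddings, threshold=0.5):
    """Weighted adjacency matrix: edge (i, j) iff cosine sim > threshold."""
    E = np.asarray(embeddings, float)
    unit = E / np.linalg.norm(E, axis=1, keepdims=True)
    sim = unit @ unit.T                     # pairwise cosine similarity
    A = np.where(sim > threshold, sim, 0.0) # keep only strong relations
    np.fill_diagonal(A, 0.0)                # no self-edges
    return A
```

In the paper, both universal and domain-specific sentence embeddings feed into such a graph; either could serve as the `embeddings` argument here.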
Topic-focused multi-document summarization using an approximate oracle score
We consider the problem of producing a multi-document summary given a collection of documents. Since most successful methods of multi-document summarization are still largely extractive, in this paper, we explore just how well an extractive method can perform. We introduce an “oracle” score, based on the probability distribution of unigrams in human summaries. We then demonstrate that with the oracle score, we can generate extracts which score, on average, better than the human summaries, when evaluated with ROUGE. In addition, we introduce an approximation to the oracle score which produces a system with the best known performance for the 2005 Document Understanding Conference (DUC) evaluation.
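The oracle idea above, scoring a sentence by how likely its unigrams are to appear in a human summary, can be sketched as follows. This is a rough illustration of the concept: estimating term probabilities as the fraction of human summaries containing the term, and averaging over a sentence's tokens, are assumptions of this sketch rather than the paper's exact definition.

```python
from collections import Counter

def oracle_scores(doc_sentences, human_summaries):
    """Score each candidate sentence by the average probability that its
    unigrams appear in a human-written summary."""
    n = len(human_summaries)
    df = Counter()                  # in how many summaries each term occurs
    for summary in human_summaries:
        df.update(set(summary.lower().split()))
    p = {t: c / n for t, c in df.items()}
    scores = []
    for sent in doc_sentences:
        terms = sent.lower().split()
        scores.append(sum(p.get(t, 0.0) for t in terms) / max(1, len(terms)))
    return scores
```

The paper's practical contribution is an approximation of these probabilities without access to the human summaries; the sketch shows only the oracle side.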
Multi Domain Semantic Information Retrieval Based on Topic Model
Over the last decades, there have been remarkable shifts in the area of Information Retrieval (IR) as huge amount of information is increasingly accumulated on the Web. The gigantic information explosion increases the need for discovering new tools that retrieve meaningful knowledge from various complex information sources. Thus, techniques primarily used to search and extract important information from numerous database sources have been a key challenge in current IR systems.
Topic modeling is one of the most recent techniques that discover hidden thematic structures from large data collections without human supervision. Several topic models have been proposed in various fields of study and have been utilized extensively for many applications. Latent Dirichlet Allocation (LDA) is the most well-known topic model that generates topics from a large corpus of resources, such as text, images, and audio. It has been widely used in many areas of information retrieval and data mining, providing an efficient way of identifying latent topics among document collections. However, LDA has a drawback: topic cohesion within a concept is attenuated when estimating infrequently occurring words. Moreover, LDA does not consider the meaning of words, but rather infers hidden topics based on a statistical approach. These limitations can cause either a reduction in the quality of topic words or an increase in loose relations between topics.
In order to solve these problems, we propose a domain-specific topic model that combines domain concepts with LDA. Two domain-specific algorithms are suggested for solving the difficulties associated with LDA. The main strength of our proposed model comes from the fact that it narrows semantic concepts from broad domain knowledge to a specific one, which solves the unknown domain problem. Our proposed model is extensively tested on various applications (query expansion, classification, and summarization) to demonstrate its effectiveness. Experimental results show that the proposed model significantly increases the performance of these applications.
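To make the statistical inference the abstract attributes to LDA concrete, here is a toy collapsed Gibbs sampler for plain LDA over word-id documents. It illustrates standard LDA only, not the proposed domain-specific extension, and the hyperparameter values are arbitrary.

```python
import numpy as np

def lda_gibbs(docs, vocab_size, n_topics=2, iters=200,
              alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA; docs are lists of word ids."""
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), n_topics))    # doc-topic counts
    nkw = np.zeros((n_topics, vocab_size))   # topic-word counts
    nk = np.zeros(n_topics)                  # topic totals
    z = []                                   # topic assignment per token
    for d, doc in enumerate(docs):
        zd = rng.integers(n_topics, size=len(doc))
        z.append(zd)
        for w, t in zip(doc, zd):
            ndk[d, t] += 1; nkw[t, w] += 1; nk[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]                  # remove current assignment
                ndk[d, t] -= 1; nkw[t, w] -= 1; nk[t] -= 1
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) \
                    / (nk + beta * vocab_size)
                t = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = t                  # resample and restore counts
                ndk[d, t] += 1; nkw[t, w] += 1; nk[t] += 1
    theta = (ndk + alpha) / (ndk + alpha).sum(axis=1, keepdims=True)
    phi = (nkw + beta) / (nkw + beta).sum(axis=1, keepdims=True)
    return theta, phi                        # doc-topic, topic-word dists
```

The drawback the abstract raises is visible here: rarely occurring words contribute little to `nkw`, so their topic assignments are dominated by the smoothing prior `beta`.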
An Application of Natural Language Processing for Triangulation of Cognitive Load Assessments in Third Level Education
Work has been done to measure Mental Workload (MWL) in applications mainly related to ergonomics, human factors, and Machine Learning. The influence of Machine Learning reflects an increased use of new technologies in areas conventionally dominated by theoretical approaches. However, MWL research and Natural Language Processing techniques are rarely combined. The objective of this research is therefore to use Natural Language Processing techniques to contribute to the analysis of the relationship between subjective Mental Workload measures and Relative Frequency Ratios of keywords gathered during pre-tasks and post-tasks of MWL activities in third-level sessions under different topics and instructional designs. This research employs secondary, empirical and inductive methods to investigate Cognitive Load theory, instructional designs, Mental Workload foundations and measures, and Natural Language Processing techniques. Then, NASA-TLX, Workload Profile and Relative Frequency Ratios are calculated. Finally, the relationship between NASA-TLX and Workload Profile scores and Relative Frequency Ratios is analysed using parametric and non-parametric statistical techniques. Results show that the relationship between Mental Workload and Relative Frequency Ratios of keywords is only moderately correlated, or not correlated at all. Furthermore, it has been found that instructional designs based on hearing and seeing, and on interaction between participants, can outperform other approaches, such as those that use videos supported with images and text, or a lecturer's speech supported with slides.
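A Relative Frequency Ratio of the kind used above compares how often a keyword occurs in one corpus (e.g. post-task responses) relative to a reference corpus (e.g. pre-task responses). The sketch below is an assumed formulation with add-one smoothing, not necessarily the exact ratio computed in this study.

```python
from collections import Counter

def relative_frequency_ratios(target_text, reference_text):
    """RFR(w) = relative frequency of w in the target corpus divided by its
    relative frequency in the reference corpus, with add-one smoothing so
    words unseen in the reference do not divide by zero."""
    tgt = Counter(target_text.lower().split())
    ref = Counter(reference_text.lower().split())
    n_tgt = sum(tgt.values())
    n_ref = sum(ref.values())
    vocab = len(set(tgt) | set(ref))        # shared smoothing denominator
    return {w: ((c + 1) / (n_tgt + vocab)) /
               ((ref[w] + 1) / (n_ref + vocab))
            for w, c in tgt.items()}
```

Keywords with a ratio well above 1 are distinctive of the target corpus; values near 1 occur at comparable rates in both.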
Automatic Summarization
It has now been 50 years since the publication of Luhn’s seminal paper on automatic summarization. During these years the practical need for automatic summarization has become increasingly urgent, and numerous papers have been published on the topic. As a result, it has become harder to find a single reference that gives an overview of past efforts or a complete view of summarization tasks and necessary system components. This article attempts to fill this void by providing a comprehensive overview of research in summarization, including the more traditional efforts in sentence extraction as well as the most novel recent approaches for determining important content, for domain- and genre-specific summarization, and for evaluation of summarization. We also discuss the challenges that remain open, in particular the need for language generation and deeper semantic understanding of language that would be necessary for future advances in the field.