Search CORE

346 research outputs found

Towards Personalized and Human-in-the-Loop Document Summarization

Author: Ghodratnama Samira
Publication venue
Publication date: 30/09/2021
Field of study

The ubiquitous availability of computing devices and the widespread use of the internet have generated a large amount of data continuously. Therefore, the amount of available information on any given topic is far beyond humans' processing capacity to properly process, causing what is known as information overload. To efficiently cope with large amounts of information and generate content with significant value to users, we require identifying, merging and summarising information. Data summaries can help gather related information and collect it into a shorter format that enables answering complicated questions, gaining new insight and discovering conceptual boundaries. This thesis focuses on three main challenges to alleviate information overload using novel summarisation techniques. It further intends to facilitate the analysis of documents to support personalised information extraction. This thesis separates the research issues into four areas, covering (i) feature engineering in document summarisation, (ii) traditional static and inflexible summaries, (iii) traditional generic summarisation approaches, and (iv) the need for reference summaries. We propose novel approaches to tackle these challenges, by: i)enabling automatic intelligent feature engineering, ii) enabling flexible and interactive summarisation, iii) utilising intelligent and personalised summarisation approaches. The experimental results prove the efficiency of the proposed approaches compared to other state-of-the-art models. We further propose solutions to the information overload problem in different domains through summarisation, covering network traffic data, health data and business process data.Comment: PhD thesi

arXiv.org e-Print Archive

Creating language resources for under-resourced languages: methodologies, and experiments with Arabic

Author: A Roberts
A Schalley
Chris Fox
D Radev
D Wang
E Benmamoun
E Lloret
F Diehl
G Giannakopoulos
H Luhn
I Foster
I Hmeidi
J Yeh
K Dukes
L Abouenour
L Al-Sulaiti
M Baroni
M Diab
M Fattah
M Outahajala
M Poesio
M Sawalha
Mahmoud El-Haj
Udo Kruschwitz
W Banzhaf
Y Benajiba
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015
Field of study

Language resources are important for those working on computational methods to analyse and study languages. These resources are needed to help advancing the research in fields such as natural language processing, machine learning, information retrieval and text analysis in general. We describe the creation of useful resources for languages that currently lack them, taking resources for Arabic summarisation as a case study. We illustrate three different paradigms for creating language resources, namely: (1) using crowdsourcing to produce a small resource rapidly and relatively cheaply; (2) translating an existing gold-standard dataset, which is relatively easy but potentially of lower quality; and (3) using manual effort with appropriately skilled human participants to create a resource that is more expensive but of high quality. The last of these was used as a test collection for TAC-2011. An evaluation of the resources is also presented

University of Essex Research Repository

University of Regensburg Publication Server

Crossref

Lancaster E-Prints

Text and data mining for information extraction for scientific documents

Author: Muhammad Bello Aliyu
Publication venue
Publication date: 01/02/2021
Field of study

Coventry University Pure Portal

MultiGBS: A multi-layer graph approach to biomedical summarization

Author: Davoodijam Ensieh
Ghadiri Nasser
Rinaldi Fabio
Shahreza Maryam Lotfi
Publication venue: 'Elsevier BV'
Publication date: 01/01/2021
Field of study

Automatic text summarization methods generate a shorter version of the input text to assist the reader in gaining a quick yet informative gist. Existing text summarization methods generally focus on a single aspect of text when selecting sentences, causing the potential loss of essential information. In this study, we propose a domain-specific method that models a document as a multi-layer graph to enable multiple features of the text to be processed at the same time. The features we used in this paper are word similarity, semantic similarity, and co-reference similarity, which are modelled as three different layers. The unsupervised method selects sentences from the multi-layer graph based on the MultiRank algorithm and the number of concepts. The proposed MultiGBS algorithm employs UMLS and extracts the concepts and relationships using different tools such as SemRep, MetaMap, and OGER. Extensive evaluation by ROUGE and BERTScore shows increased F-measure values

arXiv.org e-Print Archive

Western Sydney ResearchDirect

COMPENDIUM: a text summarisation tool for generating summaries of multiple purposes, domains, and genres

Author: Lloret Elena
Palomar Manuel
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 01/01/2012
Field of study

In this paper, we present a Text Summarisation tool, compendium, capable of generating the most common types of summaries. Regarding the input, single- and multi-document summaries can be produced; as the output, the summaries can be extractive or abstractive-oriented; and finally, concerning their purpose, the summaries can be generic, query-focused, or sentiment-based. The proposed architecture for compendium is divided in various stages, making a distinction between core and additional stages. The former constitute the backbone of the tool and are common for the generation of any type of summary, whereas the latter are used for enhancing the capabilities of the tool. The main contributions of compendium with respect to the state-of-the-art summarisation systems are that (i) it specifically deals with the problem of redundancy, by means of textual entailment; (ii) it combines statistical and cognitive-based techniques for determining relevant content; and (iii) it proposes an abstractive-oriented approach for facing the challenge of abstractive summarisation. The evaluation performed in different domains and textual genres, comprising traditional texts, as well as texts extracted from the Web 2.0, shows that compendium is very competitive and appropriate to be used as a tool for generating summaries.This research has been supported by the project “Desarrollo de Técnicas Inteligentes e Interactivas de Minería de Textos” (PROMETEO/2009/119) and the project reference ACOMP/2011/001 from the Valencian Government, as well as by the Spanish Government (grant no. TIN2009-13391-C04-01)

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas