    Multi-document extractive summarization using semantic graph

    Automatic text summarization consists of synthesizing, in a short text, the most relevant information contained in documents, and it helps reduce the problems caused by information overload. This paper presents an unsupervised method for extractive multi-document summarization. In this proposal, the conceptualization and underlying semantic structure of the textual content are represented in a semantic graph built using WordNet, and a concept clustering algorithm is applied to identify the topics covered in the document set; these topics are then used to evaluate the relevance of the sentences when building the summary. The method was evaluated on text corpora from MultiLing 2015, and ROUGE metrics were used to measure the quality of the generated summaries.
The results were compared with those of other systems that participated in MultiLing 2015, showing improvements in most cases. This work was partially supported by the European Regional Development Fund (FEDER) and the Spanish Ministry of Economy and Competitiveness, under the grant for the project METODOS RIGUROSOS PARA EL INTERNET DEL FUTURO (MERINET), Ref. TIN2016-76843-C4-2-R (AEI/FEDER, UE).
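
The abstract above reports summary quality with ROUGE metrics. As a rough illustration of what such a score measures, the following is a minimal sketch of ROUGE-N recall (clipped n-gram overlap divided by the number of n-grams in the reference summary). It is not the evaluation toolkit used in the paper: the function names `ngrams` and `rouge_n_recall` are hypothetical, and whitespace tokenization with lowercasing is a simplifying assumption.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall: clipped n-gram overlap / number of reference n-grams.

    Assumption: plain whitespace tokenization and lowercasing, no stemming
    or stopword removal (real ROUGE toolkits offer such options).
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is matched at most as often as it occurs
    # in the candidate summary (clipping).
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


if __name__ == "__main__":
    generated = "the cat sat on the mat"
    gold = "the cat is on the mat"
    print(rouge_n_recall(generated, gold, n=1))
    print(rouge_n_recall(generated, gold, n=2))
```

Recall is shown here because it rewards a summary for covering the content of the reference; precision and F-measure variants follow the same counting with a different denominator.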

    Towards Personalized and Human-in-the-Loop Document Summarization

    The ubiquitous availability of computing devices and the widespread use of the internet continuously generate large amounts of data. As a result, the amount of information available on any given topic is far beyond what humans can properly process, causing what is known as information overload. To cope efficiently with large amounts of information and generate content of significant value to users, we need to identify, merge and summarise information. Data summaries can help gather related information and condense it into a shorter format that enables answering complicated questions, gaining new insight and discovering conceptual boundaries. This thesis focuses on three main challenges in alleviating information overload using novel summarisation techniques. It further intends to facilitate the analysis of documents to support personalised information extraction. This thesis separates the research issues into four areas, covering (i) feature engineering in document summarisation, (ii) traditional static and inflexible summaries, (iii) traditional generic summarisation approaches, and (iv) the need for reference summaries. We propose novel approaches to tackle these challenges by (i) enabling automatic intelligent feature engineering, (ii) enabling flexible and interactive summarisation, and (iii) utilising intelligent and personalised summarisation approaches. The experimental results demonstrate the efficiency of the proposed approaches compared to other state-of-the-art models. We further propose solutions to the information overload problem in different domains through summarisation, covering network traffic data, health data and business process data. Comment: PhD thesi