
    An Efficient Method of Summarizing Documents Using Impression Measurements

    Automatic generic document summarization based on unsupervised schemes is a useful approach because it does not require training data. Although techniques using latent semantic analysis (LSA) and non-negative matrix factorization (NMF) have been applied to determine the topics of documents, there has been no research on reducing the matrix size and speeding up the computation of the NMF method. To achieve this, this paper utilizes generic impressive expressions from newspapers to extract important sentences as a summary; consequently, it requires neither stemming nor stop-word filtering. Novels are typical documents that give readers a sentimental impression. Newspapers, however, deliver different impressions because they inform readers about current events through informative articles and diverse features. The proposed method introduces impressive expressions for newspapers, and their measurements are applied to the NMF method. In experiments on 100 KB of text data, the matrix size is reduced by 80% and the computation of the NMF method becomes 7 times faster than the original method, without degrading the relevancy of the extracted sentences.
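    The NMF step underlying such approaches can be sketched in a few lines. This is a minimal illustration, not the paper's impression-measurement pipeline: the toy term-by-sentence matrix, the topic count `k`, and the max-topic-weight sentence scoring are all assumptions for demonstration.

```python
import numpy as np

def nmf(A, k, iters=200, seed=0):
    """Factor a non-negative matrix A (terms x sentences) as W @ H."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k)) + 1e-4
    H = rng.random((k, n)) + 1e-4
    for _ in range(iters):
        H *= (W.T @ A) / (W.T @ W @ H + 1e-9)  # Lee & Seung multiplicative
        W *= (A @ H.T) / (W @ H @ H.T + 1e-9)  # updates keep factors >= 0
    return W, H

# Toy term-by-sentence matrix: 3 terms, 3 sentences (values are made up).
A = np.array([[2., 0., 1.],
              [1., 0., 2.],
              [0., 3., 0.]])
W, H = nmf(A, k=2)
scores = H.max(axis=0)         # strength of each sentence's dominant topic
ranking = np.argsort(-scores)  # candidate extraction order
```

    Ranking sentences by their strongest topic weight is one common NMF-based scoring choice; the speedup claimed above comes from shrinking the matrix before this factorization runs.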

    CRF based feature extraction applied for supervised automatic text summarization

    Feature extraction is a key issue to be addressed in algebraic Automatic Text Summarization (ATS) methods. The most vital role of any ATS system is the identification of the most important sentences in the given text, which is possible only when the correct features of the sentences are identified. Hence, this paper proposes a Conditional Random Field (CRF) based ATS that can identify and extract the correct features, which is the main issue with Non-negative Matrix Factorization (NMF) based ATS. This work proposes a trainable, supervised method. The results clearly indicate that the newly proposed approach identifies and segments sentences based on features more accurately than the existing method.
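    As a rough illustration of the feature-extraction step such a system depends on, the sketch below computes simple sentence-level features of the kind a CRF sequence labeler could consume. The feature set (relative position, length, digit presence, lead-sentence flag) is an assumption for demonstration, not the paper's actual feature set, and the CRF training itself is omitted.

```python
def sentence_features(sentences):
    """Per-sentence feature dicts for a sequence labeler (illustrative)."""
    n = len(sentences)
    feats = []
    for i, s in enumerate(sentences):
        tokens = s.lower().split()
        feats.append({
            "position": i / max(n - 1, 1),  # relative position in the document
            "length": len(tokens),          # sentence length in tokens
            "has_digit": any(t.isdigit() for t in tokens),
            "first": i == 0,                # lead sentences are often salient
        })
    return feats

feats = sentence_features(["First one.", "Second has 2 items.", "Last."])
```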

    Document Summarization Using NMF and Pseudo Relevance Feedback Based on K-Means Clustering

    With the growth of accessible text data on the internet, the need for automatic text document summarization has increased. However, automatic methods may perform poorly because of the semantic gap between a user's high-level summary requirements and the machine's low-level vector representation. In this paper, to overcome that problem, we propose a new document summarization method using pseudo relevance feedback based on a clustering method and non-negative matrix factorization (NMF). Relevance feedback is an effective technique for minimizing the semantic gap in information processing, but general relevance feedback requires user intervention. Additionally, a query refined by pseudo relevance feedback without user involvement may be biased. The proposed method provides automatic relevance judgment to reformulate the query, using clustering to minimize the bias of query expansion. The method also improves the quality of document summarization, since the summarized documents are influenced by both the semantic features of the documents and the expanded query. The experimental results demonstrate that the proposed method achieves better performance than other document summarization methods.
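    The clustering-based pseudo relevance feedback can be sketched as follows: cluster the sentence vectors with k-means, then move the query toward the centroid of its nearest cluster, Rocchio-style. The toy 2-D vectors and the 0.7/0.3 mixing weights are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=1):
    """Plain k-means: returns cluster labels and centroids."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)]  # copies of k rows
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = X[labels == j].mean(axis=0)
    return labels, C

# Toy 2-D "sentence vectors" forming two obvious groups.
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels, C = kmeans(X, k=2)

# Pseudo relevance feedback: pull the query toward its nearest centroid
# instead of asking the user which sentences are relevant.
q = np.array([1.0, 0.2])
nearest = np.argmin(((C - q) ** 2).sum(-1))
q_expanded = 0.7 * q + 0.3 * C[nearest]
```

    Mixing the query with a cluster centroid rather than with individual top-ranked sentences is what dampens the bias of plain query expansion.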

    A Novel ILP Framework for Summarizing Content with High Lexical Variety

    Summarizing content contributed by individuals can be challenging, because people make different lexical choices even when describing the same events. However, there remains a significant need to summarize such content. Examples include student responses to post-class reflective questions, product reviews, and news articles published by different news agencies about the same events. The high lexical diversity of these documents hinders a system's ability to effectively identify salient content and reduce summary redundancy. In this paper, we overcome this issue by introducing an integer linear programming based summarization framework. It incorporates a low-rank approximation of the sentence-word co-occurrence matrix to intrinsically group semantically similar lexical items. We conduct extensive experiments on datasets of student responses, product reviews, and news documents. Our approach compares favorably to a number of extractive baselines as well as a neural abstractive summarization system. The paper finally sheds light on when and why the proposed framework is effective at summarizing content with high lexical variety. (Accepted for publication in the journal Natural Language Engineering.)
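    The low-rank grouping step can be sketched with a truncated SVD of a toy sentence-word co-occurrence matrix; the ILP sentence-selection stage itself requires an integer solver and is omitted here. The matrix contents and the target rank are illustrative assumptions.

```python
import numpy as np

# Toy sentence-word co-occurrence matrix: rows are sentences, columns are
# words; different sentences use different words for overlapping content.
A = np.array([[1., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 0., 1., 1.]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = 2                                           # assumed target rank
A_lowrank = U[:, :r] @ np.diag(s[:r]) @ Vt[:r]  # best rank-2 approximation
```

    After this step, words that co-occur with the same sentences share latent dimensions, so an ILP objective scored on `A_lowrank` treats near-synonyms as partially interchangeable.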

    Calculating the Upper Bounds for Multi-Document Summarization using Genetic Algorithms

    Over the last years, several Multi-Document Summarization (MDS) methods have been presented at the Document Understanding Conference (DUC) workshops. Since DUC01, methods have been presented in approximately 268 state-of-the-art publications, which have allowed the continuous improvement of MDS; however, in most works the upper bounds were unknown. Recently, some works have focused on calculating the best sentence combinations for a set of documents, and in previous work we calculated the significance for the single-document summarization task on the DUC01 and DUC02 datasets. However, no significance analysis has been performed for the MDS task to rank the best multi-document summarization methods. In this paper, we describe a Genetic Algorithm-based method for calculating the best sentence combinations of the DUC01 and DUC02 datasets in MDS through a meta-document representation. Moreover, we have calculated three heuristics mentioned in several state-of-the-art works to rank the most recent MDS methods, through the calculation of upper and lower bounds.
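    A toy version of such a genetic search over sentence combinations might look like the following. Fitness here is plain unigram overlap with a reference summary, a stand-in for the ROUGE scores used against DUC gold summaries; the population size, mutation rate, and length budget are all illustrative assumptions.

```python
import random

def ga_upper_bound(sentences, reference, max_pick=2, pop=20, gens=40, seed=0):
    """Search bit masks over sentences for the best-scoring combination."""
    rng = random.Random(seed)
    ref = set(reference.lower().split())
    n = len(sentences)

    def fitness(mask):
        if sum(mask) > max_pick:          # enforce the summary length budget
            return -1
        words = set()
        for i, bit in enumerate(mask):
            if bit:
                words |= set(sentences[i].lower().split())
        return len(words & ref)           # unigram overlap with the reference

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop // 2]          # elitist selection
        children = []
        while len(children) < pop - len(survivors):
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]               # one-point crossover
            if rng.random() < 0.2:                  # bit-flip mutation
                j = rng.randrange(n)
                child[j] = 1 - child[j]
            children.append(child)
        population = survivors + children
    best = max(population, key=fitness)
    return best, fitness(best)

sentences = ["the cat sat", "dogs bark loudly", "the sun is hot"]
best, best_fit = ga_upper_bound(sentences, reference="cat sun hot")
```

    The best mask found this way is the empirical upper bound: no extractive method under the same length budget can score higher on the same reference.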

    A Framework for the Comparative Analysis of Text Summarization Techniques

    Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science.
    With the boom of information technology and the IoT (Internet of Things), the amount of information, which is basically data, is increasing at an alarming rate. This information can be harnessed and, if channeled in the right direction, can yield meaningful insight. The problem is that this data is not always numerical; in some cases the data is completely textual, and some meaning has to be derived from it. If one had to go through these texts manually, it would take hours or even days to extract concise and meaningful information. This is where the need for an automatic summarizer arises: it eases manual intervention, reducing time and cost while retaining the key information held by the texts. In recent years, new methods and approaches have been developed to do so. These approaches are implemented in many domains; for example, search engines provide snippets as document previews, while news websites produce shortened descriptions of news subjects, usually as headlines, to make browsing easier. Broadly speaking, there are two main approaches to text summarization: extractive and abstractive. Extractive summarization filters the most important sections out of the whole text to form a condensed version. Abstractive summarization, by contrast, interprets and examines the text as a whole and, after discerning its meaning, generates sentences that describe the important points in a concise way.
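    The extractive side of that contrast can be made concrete with a classic frequency-based sketch (a Luhn-style heuristic, shown only as an illustration; real systems add stop-word removal, stemming, and redundancy handling):

```python
from collections import Counter

def extractive_summary(sentences, k=1):
    """Keep the k sentences whose words are most frequent document-wide."""
    freq = Counter(w for s in sentences for w in s.lower().split())

    def score(s):
        words = s.lower().split()
        return sum(freq[w] for w in words) / len(words)  # length-normalized
    return sorted(sentences, key=score, reverse=True)[:k]

docs = ["apples are sweet", "apples and apples again", "bananas exist"]
summary = extractive_summary(docs)
```

    An abstractive system, by contrast, would generate new sentences rather than return members of `docs`, which is why it typically requires a trained generation model rather than a scoring heuristic.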