Extractive Text Summarization Using Machine Learning
We routinely encounter too much information in the form of social media posts, blogs, news articles, research papers, and other formats. This is an infeasible quantity of information to process, even for the purpose of selecting a more manageable subset. Text summarization is the process of condensing a large amount of text into a shorter form that still conveys the important ideas of the original document, and it is an active subfield of natural language processing. Extractive text summarization identifies and concatenates important sections of a document to form a shorter document that summarizes the contents of the original. We discuss, implement, and compare several unsupervised machine learning algorithms, including latent semantic analysis (LSA), latent Dirichlet allocation (LDA), and k-means clustering. The ROUGE-N metric was used to evaluate the summaries generated by these algorithms. Summaries generated using tf-idf as the feature extraction scheme together with latent semantic analysis achieved the highest ROUGE-N scores. This automatic assessment was validated with an empirical survey.
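The tf-idf-based extractive pipeline described above can be illustrated compactly. The following is a minimal, self-contained sketch that scores sentences by their summed tf-idf weight and extracts the top-scoring ones; it stands in for only one configuration of the paper's LSA/LDA/k-means comparison, and every function name here is our own illustrative choice, not the authors' implementation:

```python
import math
import re
from collections import Counter

def tfidf_sentence_scores(sentences):
    """Score each sentence by the sum of tf-idf weights of its terms."""
    docs = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    n = len(docs)
    # Document frequency: in how many sentences each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed idf; terms occurring in every sentence contribute zero.
        score = sum(
            (count / len(doc)) * math.log((1 + n) / (1 + df[t]))
            for t, count in tf.items()
        )
        scores.append(score)
    return scores

def extract_summary(text, k=1):
    """Return the k highest-scoring sentences, in original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    scores = tfidf_sentence_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return " ".join(sentences[i] for i in sorted(top))
```

In a full LSA variant, the tf-idf sentence-term matrix would additionally be decomposed with SVD before ranking sentences; the sketch above omits that step.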
On the Application of Generic Summarization Algorithms to Music
Several generic summarization algorithms were developed in the past and
successfully applied in fields such as text and speech summarization. In this
paper, we review and apply these algorithms to music. To evaluate this
summarization's performance, we adopt an extrinsic approach: we compare a Fado
Genre Classifier's performance using truncated contiguous clips against the
summaries extracted with those algorithms on two different datasets. We show that
Maximal Marginal Relevance (MMR), LexRank, and Latent Semantic Analysis (LSA)
all improve classification performance on both datasets used for testing.
Comment: 12 pages, 1 table; submitted to IEEE Signal Processing Letters
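Of the three algorithms named, Maximal Marginal Relevance is the simplest to sketch: it greedily selects items that are relevant to a query while penalizing redundancy with items already selected. The pure-Python illustration below operates on bag-of-words text; the `cosine` and `mmr_select` helpers are our own simplified stand-ins, not the paper's music-feature pipeline:

```python
import math
import re
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def mmr_select(sentences, query, k=2, lam=0.7):
    """Greedy MMR: maximize lam*relevance - (1-lam)*redundancy."""
    bows = [Counter(re.findall(r"\w+", s.lower())) for s in sentences]
    qbow = Counter(re.findall(r"\w+", query.lower()))
    selected, candidates = [], list(range(len(sentences)))
    while candidates and len(selected) < k:
        def mmr_score(i):
            rel = cosine(bows[i], qbow)
            red = max((cosine(bows[i], bows[j]) for j in selected), default=0.0)
            return lam * rel - (1 - lam) * red
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]
```

In the paper's setting, the text sentences would be replaced by audio segments or feature vectors; the greedy selection logic stays the same.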
Scalable Multi-document Summarization Using Natural Language Processing
In this age of the Internet, Natural Language Processing (NLP) techniques are the key sources for providing the information required by users. However, with the extensive usage of available data, a secondary level of wrappers that interact with NLP tools has become necessary. These tools must extract a concise summary from the primary data set retrieved. The main reason for using text summarization techniques is to obtain this secondary level of information. Text summarization using NLP techniques is an interesting area of research with various implications for information retrieval.
This report deals with the use of Latent Semantic Analysis (LSA) for generic text summarization and compares it with other available models. It proposes text summarization using LSA in conjunction with open-source NLP frameworks such as Mahout and Lucene, which allow the LSA algorithm to scale to multiple large documents. The performance of this algorithm is then compared with other models commonly used for summarization using Recall-Oriented Understudy for Gisting Evaluation (ROUGE) scores. This project implements a text summarization framework that uses available open-source tools and cloud resources to summarize documents in multiple languages, in the case of this study English and Hindi.
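Since both this report and the first abstract evaluate with ROUGE, a minimal sketch of ROUGE-N recall (overlapping n-grams between candidate and reference, divided by the number of reference n-grams) may be useful. This simplified version skips the stemming and stopword handling that full ROUGE toolkits apply, and the function name is our own:

```python
from collections import Counter

def rouge_n_recall(candidate, reference, n=2):
    """ROUGE-N recall: clipped overlapping n-grams / reference n-grams."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Clip each n-gram's count by its count in the candidate.
    overlap = sum(min(cand[g], ref[g]) for g in ref)
    total = sum(ref.values())
    return overlap / total if total else 0.0
```

Recall is the original ROUGE orientation; many evaluations also report the precision and F1 variants, which differ only in the denominator.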
Text Summarization Techniques: A Brief Survey
In recent years, there has been an explosion in the amount of text data from a
variety of sources. This volume of text is an invaluable source of information
and knowledge which needs to be effectively summarized to be useful. In this
review, the main approaches to automatic text summarization are described. We
review the different processes for summarization and describe the effectiveness
and shortcomings of the different methods.
Comment: Some reference formats have been updated
Automatic Text Summarization Approaches to Speed up Topic Model Learning Process
The number of documents available on the Internet grows every day. For this
reason, processing this amount of information effectively and efficiently
becomes a major concern for companies and scientists. Methods that represent a
textual document by a topic representation are widely used in Information
Retrieval (IR) to process big data such as Wikipedia articles. One of the main
difficulties in using topic models on huge data collections is the material
resources (CPU time and memory) required for model estimation. To deal
with this issue, we propose to build topic spaces from summarized documents. In
this paper, we present a study of topic space representation in the context of
big data. The topic space representation behavior is analyzed on different
languages. Experiments show that topic spaces estimated from text summaries are
as relevant as those estimated from the complete documents. The real advantage
of such an approach is the gain in processing time: we show that processing
time can be drastically reduced using summarized documents (by more than 60%
in general). This study finally points out the differences between the thematic
representations of documents depending on the targeted language, such as
English or the Latin languages.
Comment: 16 pages, 4 tables, 8 figures
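The core idea, estimating topic spaces from summaries rather than from full documents, can be sketched with a trivial lead-sentence summarizer. The helpers below are illustrative assumptions of ours (the paper uses proper summarization algorithms, not lead truncation), and a real system would feed the reduced corpus into an LDA estimator:

```python
import re

def lead_summary(doc, k=2):
    """Keep only the first k sentences as a cheap extractive summary."""
    sentences = re.split(r"(?<=[.!?])\s+", doc.strip())
    return " ".join(sentences[:k])

def token_reduction(corpus, k=2):
    """Fraction of tokens removed when topic models are fed summaries
    instead of full documents (a proxy for the estimation-time gain)."""
    full = sum(len(d.split()) for d in corpus)
    summarized = sum(len(lead_summary(d, k).split()) for d in corpus)
    return 1 - summarized / full
```

Because topic-model estimation cost grows with corpus token count, the reduction measured here translates roughly into the CPU-time savings the abstract reports.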