1,194 research outputs found

Extractive Text Summarization Using Machine Learning

Author: Acharya Swapnil
Publication venue: The Repository at St. Cloud State
Publication date: 01/04/2022

We routinely encounter too much information in the form of social media posts, blogs, news articles, research papers, and other formats. This represents an infeasible quantity of information to process, even for selecting a more manageable subset. Text summarization is the process of condensing a large amount of text data into a shorter form that still conveys the important ideas of the original document. It is an active subfield of natural language processing. Extractive text summarization identifies and concatenates important sections of a document to form a shorter document that summarizes the contents of the original. We discuss, implement, and compare several unsupervised machine learning algorithms, including latent semantic analysis, latent Dirichlet allocation, and k-means clustering. The ROUGE-N metric was used to evaluate the summaries generated by these algorithms. Summaries generated using tf-idf as the feature extraction scheme together with latent semantic analysis had the highest ROUGE-N scores. This computer-level assessment was validated using an empirical analysis survey.
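As an illustration of the evaluation metric named in this abstract, the following is a minimal sketch of ROUGE-N recall: the fraction of reference n-grams that also appear in the candidate summary, with clipped counts. The function names `ngrams` and `rouge_n_recall` and the whitespace tokenization are assumptions made for this example; the thesis may use a different tokenizer or a precision/F-score variant.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall: overlapping n-grams divided by reference n-grams.

    Tokenization is a plain lowercase whitespace split (an assumption
    for this sketch, not the thesis's preprocessing).
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clip each n-gram's match count at its count in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


print(rouge_n_recall("the cat sat", "the cat sat on the mat", n=1))  # 0.5
print(rouge_n_recall("the cat sat", "the cat sat on the mat", n=2))  # 0.4
```

In the first call, three of the six reference unigrams are matched; in the second, two of the five reference bigrams are matched.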

St. Cloud State University

Text Summarization Techniques: A Brief Survey

Author: Allahyari Mehdi
Assefi Mehdi
Gutierrez Juan B.
Kochut Krys
Pouriyeh Seyedamin
Safaei Saeid
Trippe Elizabeth D.
Publication date: 01/01/2017

In recent years, there has been an explosion in the amount of text data from a variety of sources. This volume of text is an invaluable source of information and knowledge which needs to be effectively summarized to be useful. In this review, the main approaches to automatic text summarization are described. We review the different processes for summarization and describe the effectiveness and shortcomings of the different methods.

arXiv.org e-Print Archive

Georgia Southern University: Digital Commons@Georgia Southern

Chinese Spoken Document Summarization Using Probabilistic Latent Topical Information

Author: Chen Berlin
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

The purpose of extractive summarization is to automatically select a number of indicative sentences, passages, or paragraphs from the original document according to a target summarization ratio and then sequence them to form a concise summary. In the paper, we proposed the use of probabilistic latent topical information for extractive summarization of spoken documents. Various kinds of modeling structures and learning approaches were extensively investigated. In addition, the summarization capabilities were verified by comparison with the conventional vector space model and latent semantic indexing model, as well as the HMM model. The experiments were performed on the Chinese broadcast news collected in Taiwan. Noticeable performance gains were obtained.
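The select-then-sequence procedure described at the start of this abstract can be sketched with the conventional vector space model that the paper uses as a baseline. This is not the paper's probabilistic latent topic method: each sentence is simply scored by cosine similarity between its term-frequency vector and that of the whole document. The function names, the regex-based sentence splitter, and the default ratio are assumptions made for this example.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter on ., !, ? followed by whitespace (an assumption)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def term_vector(text):
    """Raw term-frequency vector over lowercase word tokens."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def summarize(text, ratio=0.3):
    """Select the top-scoring sentences for a target summarization ratio.

    The number of sentences kept is ratio * sentence count, rounded up,
    with a minimum of one. Selected sentences are returned in their
    original document order.
    """
    sentences = split_sentences(text)
    if not sentences:
        return []
    doc_vec = term_vector(text)
    scores = [cosine(term_vector(s), doc_vec) for s in sentences]
    k = max(1, math.ceil(ratio * len(sentences)))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


text = (
    "Cats chase mice. Dogs chase cats. "
    "The weather is nice. Cats and dogs chase things."
)
print(summarize(text, ratio=0.5))
```

Re-sorting the selected indices before returning is what implements the "sequence them" step: the summary preserves the order in which the sentences appeared in the source.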

National Taiwan Normal University Repository