
    Automatic bilingual text document summarization.

    Lo Sau-Han Silvia. Thesis (M.Phil.), Chinese University of Hong Kong, 2002. Includes bibliographical references (leaves 137-143). Abstracts in English and Chinese.

    Chapter 1 --- Introduction
      1.1 Definition of a summary
      1.2 Definition of text summarization
      1.3 Previous work
        1.3.1 Extract-based text summarization
        1.3.2 Abstract-based text summarization
        1.3.3 Sophisticated text summarization
      1.4 Summarization evaluation methods
        1.4.1 Intrinsic evaluation
        1.4.2 Extrinsic evaluation
        1.4.3 The TIPSTER SUMMAC text summarization evaluation
        1.4.4 Text Summarization Challenge (TSC)
      1.5 Research contributions
        1.5.1 Text summarization based on thematic term approach
        1.5.2 Bilingual news summarization based on an event-driven approach
      1.6 Thesis organization
    Chapter 2 --- Text Summarization based on a Thematic Term Approach
      2.1 System overview
      2.2 Document preprocessor
        2.2.1 English corpus
        2.2.2 English corpus preprocessor
        2.2.3 Chinese corpus
        2.2.4 Chinese corpus preprocessor
      2.3 Corpus thematic term extractor
      2.4 Article thematic term extractor
      2.5 Sentence score generator
      2.6 Chapter summary
    Chapter 3 --- Evaluation for Summarization using the Thematic Term Approach
      3.1 Content-based similarity measure
      3.2 Experiments using content-based similarity measure
        3.2.1 English corpus and parameter training
        3.2.2 Experimental results using content-based similarity measure
      3.3 Average inverse rank (AIR) method
      3.4 Experiments using average inverse rank method
        3.4.1 Corpora and parameter training
        3.4.2 Experimental results using AIR method
      3.5 Comparison between the content-based similarity measure and the average inverse rank method
      3.6 Chapter summary
    Chapter 4 --- Bilingual Event-Driven News Summarization
      4.1 Corpora
      4.2 Topic and event definitions
      4.3 Architecture of bilingual event-driven news summarization system
      4.4 Bilingual event-driven approach summarization
        4.4.1 Dictionary-based term translation applying on English news articles
        4.4.2 Preprocessing for Chinese news articles
        4.4.3 Event clusters generation
        4.4.4 Cluster selection and summary generation
      4.5 Evaluation for summarization based on event-driven approach
      4.6 Experimental results on event-driven summarization
        4.6.1 Experimental settings
        4.6.2 Results and analysis
      4.7 Chapter summary
    Chapter 5 --- Applying Event-Driven Summarization to a Parallel Corpus
      5.1 Parallel corpus
      5.2 Parallel documents preparation
      5.3 Evaluation methods for the event-driven summaries generated from the parallel corpus
      5.4 Experimental results and analysis
        5.4.1 Experimental settings
        5.4.2 Results and analysis
      5.5 Chapter summary
    Chapter 6 --- Conclusions and Future Work
      6.1 Conclusions
      6.2 Future work
    Bibliography
    Appendix A --- English Stop Word List
    Appendix B --- Chinese Stop Word List
    Appendix C --- Event List Items on the Corpora
      C.1 Event list items for the topic "Upcoming Philippine election"
      C.2 Event list items for the topic "German train derail"
      C.3 Event list items for the topic "Electronic service delivery (ESD) scheme"
    Appendix D --- The sample of an English article (9505001.xml)

    Methods of Text Document Summarization

    This thesis deals with single-document summarization of text data. Part of the work is devoted to data preparation, mainly normalization: several stemming algorithms are presented, together with a description of lemmatization. The main part is devoted to Luhn's summarization method and its extension using the WordNet lexical database. Oswald's method is also described and implemented. The designed and implemented application automatically generates abstracts using these methods. A set of experiments was carried out, verifying the correct functioning of the application and of the extension of Luhn's summarization method.
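
    Luhn's method, on which the thesis builds, scores each sentence by its densest cluster of significant (frequent, non-stop) words. The Python sketch below is a minimal illustration under stated assumptions, not the thesis implementation: the stop list, the frequency threshold `min_freq`, the gap limit `max_gap=4`, and all function names are hypothetical, and the WordNet extension is not shown.

```python
import re
from collections import Counter

# Hypothetical minimal stop list; it stands in for Luhn's upper frequency cut-off.
STOP_WORDS = {"a", "an", "the", "of", "and", "to", "in", "is", "are", "for",
              "on", "with", "by", "it", "this", "that"}


def tokenize(text):
    """Lowercase alphabetic tokens (a stand-in for normalization/stemming)."""
    return re.findall(r"[a-z]+", text.lower())


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def significant_words(sentences, min_freq=2):
    """Non-stop words that occur at least min_freq times in the document."""
    counts = Counter(w for s in sentences for w in tokenize(s) if w not in STOP_WORDS)
    return {w for w, c in counts.items() if c >= min_freq}


def luhn_score(tokens, significant, max_gap=4):
    """Best cluster score in a sentence: (significant words)^2 / cluster span.

    A cluster starts and ends on a significant word, and consecutive
    significant words in it are separated by at most max_gap other words.
    """
    positions = [i for i, w in enumerate(tokens) if w in significant]
    if not positions:
        return 0.0
    best, start, prev, count = 0.0, positions[0], positions[0], 1
    for pos in positions[1:]:
        if pos - prev - 1 <= max_gap:
            count += 1  # still inside the current cluster
        else:
            best = max(best, count ** 2 / (prev - start + 1))
            start, count = pos, 1  # open a new cluster
        prev = pos
    return max(best, count ** 2 / (prev - start + 1))


def summarize(text, n=2, min_freq=2):
    """Return the n highest-scoring sentences, kept in document order."""
    sentences = split_sentences(text)
    significant = significant_words(sentences, min_freq)
    scores = [luhn_score(tokenize(s), significant) for s in sentences]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n]
    return [sentences[i] for i in sorted(top)]
```

    Luhn also applied an upper frequency cut-off to discard overly common words; here the stop list plays that role.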

    Enhanced sentence extraction through neuro-fuzzy technique for text document summarization

    A summarization system condenses text documents into a new, shorter form that conveys their essential content. Given the problem of document overload, obtaining the right information and well-developed summaries is essential for information retrieval. Reducing information allows users to find what they need quickly without reading the full document collection, particularly when it spans multiple documents. In recent years, soft-computing-based approaches have gained popularity for their ability to identify important information across documents. A number of studies have modelled summarization systems on fuzzy logic reasoning to select the important sentences to include in the summary. Although past studies support the benefits of fuzzy reasoning for extracting important sentences, the method has limitations: human or linguistic experts are required to define the rules of the fuzzy system, and the membership functions must be tuned manually. This can be a tedious and time-consuming process, and the performance of the fuzzy system depends on the choice of rules and membership-function parameters. This study therefore proposes a text summarization model based on classification using a neuro-fuzzy approach. A classifier is first trained to identify summary sentences; the proposed model is then used to score and filter high-quality summary sentences. We compare the performance of the proposed model with existing approaches based on fuzzy logic and neural network techniques. We also evaluate sentence scoring and clustering in the process of generating text summaries: the proposed neuro-fuzzy model was used to score the sentences, and clustering was performed using K-Means and Hierarchical Clustering (HC). The proposed approach showed improved results over previous techniques in terms of precision, recall, and F-measure on the Document Understanding Conference (DUC) data corpus. However, simply performing clustering was found not to improve the quality of the generated summaries.
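
    The abstract reports precision, recall, and F-measure on DUC data without stating how matches were counted. As an assumption, the sketch below compares the sentence IDs selected by a system against a human reference extract; the function name is hypothetical, and this is not necessarily the paper's evaluation protocol.

```python
def precision_recall_f1(selected, reference):
    """Sentence-level precision, recall and F1 of a system extract.

    selected, reference: iterables of sentence IDs chosen by the system
    and by the human (reference) extract, respectively.
    """
    selected, reference = set(selected), set(reference)
    hits = len(selected & reference)  # sentences both extracts agree on
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

    For example, if the system selects sentences 0, 2, 5, and 7 and the reference contains 0, 1, and 2, precision is 0.5, recall is 2/3, and F1 is 4/7 (about 0.571).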

    Hybrid Optimization Based Hindi Document Summarization Using Deep Learning Technique

    The proliferation of textual information today is a result of the recent growth of the internet, which is widely accessible to anybody at any time. Generally speaking, several Natural Language Processing (NLP) techniques can be used to analyze the textual information contained in text documents. In recent years, various text summarization techniques have been implemented for English text documents, but little work has been carried out on summarizing Hindi text documents. In this research, a Deep Recurrent Neural Network (DRNN) trained with the Coot Remora Optimization (CRO) technique is used to summarize Hindi documents. The CRO algorithm trains the DRNN, which computes the sentence scores; the highest-scoring sentences are included in the summary. Compared with recent optimization-based techniques such as MCRMR-SSO, Graph-based_PSO, Genetic Algorithms (GA), and the Political Elephant Herding Optimization (PEHO) based Deep Long Short Term Memory (DLSTM) algorithm, the developed method is shown to be superior. Three evaluation metrics (precision, recall, and F-measure) are used to analyze the performance of the CRO-based DRNN technique, which achieved high performance.

    Keyword Merging Based Multi Document Enhanced Summarization

    Automatic text summarization is a wide research area. Approaches to text summarization can be characterized in several ways: extractive or abstractive, and single-document or multi-document. A summary is text produced from one or more texts. Document summarization is the process of building a condensed version of a document that gives relevant information to the user, and multi-document summarization produces a summary conveying most of the information content of a set of documents about an implicit or explicit main topic. This paper describes a system for the summarization of multiple documents. The system produces multi-document summaries using data merging techniques. To group multiple documents on the same topic, the system uses the bisecting k-means algorithm, which works better than the basic k-means algorithm. The system uses an Enhanced Summarization algorithm to summarize multiple documents; this algorithm is applied separately to each cluster. According to the results, the system performs better than the NEWSUM algorithm. DOI: 10.17762/ijritcc2321-8169.150711
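
    Bisecting k-means builds k clusters by repeatedly splitting one cluster in two with plain 2-means. The abstract does not say which cluster is split or how documents are represented, so this sketch assumes documents are numeric vectors (for example, term weights) and always splits the cluster with the largest sum of squared errors (SSE); the helper names are hypothetical, and the Enhanced Summarization step applied to each cluster is not shown.

```python
def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def sse(points):
    """Sum of squared distances of the points to their centroid."""
    c = centroid(points)
    return sum(dist2(p, c) for p in points)


def split_in_two(points, iters=20):
    """2-means (Lloyd's algorithm) seeded with two far-apart points.

    Returns (left, right), or None if the points cannot be separated.
    """
    c = centroid(points)
    a = max(points, key=lambda p: dist2(p, c))  # farthest from the centroid
    b = max(points, key=lambda p: dist2(p, a))  # farthest from a
    if dist2(a, b) == 0:
        return None  # all points identical
    centers = [a, b]
    for _ in range(iters):
        left, right = [], []
        for p in points:
            (left if dist2(p, centers[0]) <= dist2(p, centers[1]) else right).append(p)
        if not left or not right:
            return None  # degenerate split
        new_centers = [centroid(left), centroid(right)]
        if new_centers == centers:
            break  # assignments have converged
        centers = new_centers
    return left, right


def bisecting_kmeans(points, k):
    """Split the cluster with the largest SSE until k clusters exist."""
    clusters = [list(points)]
    while len(clusters) < k:
        for cluster in sorted(clusters, key=sse, reverse=True):
            halves = split_in_two(cluster) if len(cluster) > 1 else None
            if halves:
                clusters.remove(cluster)
                clusters.extend(halves)
                break
        else:
            break  # no cluster can be split any further
    return clusters
```

    Seeding each split with two far-apart points keeps the sketch deterministic; implementations commonly use random seeds and keep the best of several trial splits instead.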

    Document Summarization Using NMF and Pseudo Relevance Feedback Based on K-Means Clustering

    As the amount of text data accessible on the internet grows, so does the need for automatic text document summarization. However, automatic methods may perform poorly because of the semantic gap between the user's high-level summary requirements and the machine's low-level vector representation. To overcome this problem, we propose a new document summarization method using pseudo relevance feedback based on clustering and NMF (non-negative matrix factorization). Relevance feedback is an effective technique for narrowing the semantic gap in information processing, but general relevance feedback requires user intervention. Moreover, a query refined by pseudo relevance feedback without user involvement may be biased. The proposed method provides automatic relevance judgments for reformulating the query, using clustering to minimize the bias of query expansion. The method can also improve the quality of document summarization, since the summarized documents are influenced by both the semantic features of the documents and the expanded query. The experimental results demonstrate that the proposed method achieves better performance than the other document summarization methods.
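
    The method relies on factorizing a non-negative term-by-sentence matrix A into semantic features W and per-sentence weights H. The sketch below shows only that NMF step, using the standard Lee-Seung multiplicative updates, plus a generic query-free sentence-ranking heuristic. It does not reproduce the paper's clustering-based pseudo relevance feedback or query expansion, and all names are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def nmf(a, rank, iters=200, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates for A ~ W H with W, H >= 0.

    a: non-negative term-by-sentence matrix (m terms x n sentences).
    Returns W (m x rank, semantic features) and H (rank x n, sentence weights).
    """
    rng = random.Random(seed)
    m, n = len(a), len(a[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(m)]
    h = [[rng.random() + eps for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, a), matmul(matmul(wt, w), h)
        # H <- H * (W^T A) / (W^T W H), element-wise
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(a, ht), matmul(w, matmul(h, ht))
        # W <- W * (A H^T) / (W H H^T), element-wise
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(m)]
    return w, h


def rank_sentences(a, rank, top=2):
    """Indices of the top sentences by a weighted sum of their NMF weights.

    Feature k gets weight = (row-k mass of H) / (total mass of H); sentence j
    scores sum_k weight_k * H[k][j]. This generic heuristic ignores any query.
    """
    _, h = nmf(a, rank)
    total = sum(map(sum, h)) or 1.0
    weights = [sum(row) / total for row in h]
    n = len(a[0])
    scores = [sum(weights[k] * h[k][j] for k in range(rank)) for j in range(n)]
    return sorted(range(n), key=lambda j: scores[j], reverse=True)[:top]
```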

    OPTIMAL APPROACH FOR TEXT SUMMARIZATION

    A large amount of unstructured information is available on the internet. Because of the sheer volume of data, retrieving relevant documents that contain the required information is difficult, which has made query-specific document summarization an important problem. Since many documents are available on any particular topic, it is difficult for the user to go through all of them. The algorithms will be evaluated on parameters such as precision, recall, time and space complexity, and summary quality. Based on this evaluation, a better algorithm for summarization will be suggested, helping to identify a better query-dependent clustering algorithm for text document summarization.
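
    The abstract does not name the algorithms under evaluation. As a hypothetical baseline for query-specific summarization, the sketch below ranks sentences by the cosine similarity between their raw term-frequency vectors and the query's; it performs no clustering, and all names are assumptions.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Raw term-frequency vector of lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def query_summary(sentences, query, n=1):
    """Pick the n sentences most similar to the query, in document order."""
    q = term_vector(query)
    scores = [cosine(term_vector(s), q) for s in sentences]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n]
    return [sentences[i] for i in sorted(top)]
```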

    Language Identification: Contrivance Learning Process Using Web Based Disquisition

    Language identification is the foremost task in the study of linguistics. Language identification and translation tools such as Google Translate, or any other hypothetical translator, work wonders, and the language detection these translators perform is remarkable. It is therefore important to study methods for identifying languages. This paper explains methods for recognizing some natural languages, namely English, Kannada, Hindi, and Telugu, based on the N-gram algorithm; the vowels and consonants of each language are retrieved and stored to build the syntactic structure of the corpus.
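
    The abstract says identification is based on an N-gram algorithm but gives no details. The sketch below assumes the widely used character n-gram profile approach with an out-of-place rank distance (after Cavnar and Trenkle); the profile size, the `_` padding character, and the function names are assumptions, and the vowel and consonant extraction described in the abstract is not shown. Each language profile would be built from a training sample in that language, for example `profile(hindi_text)`.

```python
from collections import Counter

PROFILE_SIZE = 300  # assumed profile length; also the out-of-place penalty


def profile(text, n_max=3, size=PROFILE_SIZE):
    """Most frequent character 1..n_max-grams, ranked by frequency.

    Each word is padded with '_' so that word boundaries form n-grams.
    """
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count; ties are broken by the n-gram string itself.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:size]]


def out_of_place(doc_profile, lang_profile, penalty=PROFILE_SIZE):
    """Sum of rank differences; n-grams absent from lang_profile cost `penalty`."""
    ranks = {gram: r for r, gram in enumerate(lang_profile)}
    return sum(abs(r - ranks[g]) if g in ranks else penalty
               for r, g in enumerate(doc_profile))


def identify(text, lang_profiles):
    """Return the language whose profile is closest to the text's profile."""
    doc = profile(text)
    return min(lang_profiles, key=lambda lang: out_of_place(doc, lang_profiles[lang]))
```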