19 research outputs found

    An Effective Sentence Ordering Approach For Multi-Document Summarization Using Text Entailment

    With the rapid development of modern technology, the amount of electronically available textual information has increased considerably. Manually summarizing information from unstructured text sources creates overhead for the user, so a systematic approach is required. Summarization provides the user with a condensed version of the original text, but real-time applications require extended summarization that draws on multiple documents. The main focus of multi-document summarization is sentence ordering and ranking, which arranges the sentences collected from multiple documents to generate a well-organized summary. An improper order of extracted sentences significantly degrades the readability and understandability of the summary. The existing system performs multi-document summarization by combining several preference experts (chronology, probabilistic, precedence, succession, and topical closeness) to calculate the preference value between sentences. This approach to sentence ordering and ranking does not address a context-based similarity measure between sentences, which is essential for effective summarization. The proposed system addresses this issue through a textual entailment expert system. The approach builds an entailment model that incorporates the cause and effect between sentences in the documents using a symmetric measure such as cosine similarity and non-symmetric measures such as unigram match, bigram match, longest common subsequence, skip-gram match, and stemming. The proposed system is efficient in providing the user with a contextual summary, which significantly improves the readability and understandability of the final coherent summary.
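
The distinction the abstract draws between symmetric and non-symmetric measures can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`cosine_similarity`, `ngram_match`, `lcs_match`), the whitespace tokenizer, and the choice to normalize by the hypothesis length are all assumptions made for illustration. Cosine similarity gives the same score in both directions, while the n-gram and longest-common-subsequence scores depend on which sentence is treated as the hypothesis.

```python
import math
from collections import Counter


def tokens(sentence):
    # Assumption: lowercase whitespace tokenization, no stemming.
    return sentence.lower().split()


def cosine_similarity(a, b):
    """Symmetric: cosine of the two bag-of-words count vectors."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def ngrams(seq, n):
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def ngram_match(text, hypothesis, n=1):
    """Non-symmetric: fraction of hypothesis n-grams found in text."""
    hyp = ngrams(tokens(hypothesis), n)
    if not hyp:
        return 0.0
    txt = set(ngrams(tokens(text), n))
    return sum(g in txt for g in hyp) / len(hyp)


def lcs_length(a, b):
    # Standard dynamic-programming longest common subsequence.
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]


def lcs_match(text, hypothesis):
    """Non-symmetric: LCS length divided by hypothesis length."""
    t, h = tokens(text), tokens(hypothesis)
    return lcs_length(t, h) / len(h) if h else 0.0


text = "the storm flooded the city streets"
hypothesis = "the storm flooded streets"
print(cosine_similarity(text, hypothesis), cosine_similarity(hypothesis, text))
print(ngram_match(text, hypothesis, n=1), ngram_match(hypothesis, text, n=1))
print(ngram_match(text, hypothesis, n=2))
print(lcs_match(text, hypothesis), lcs_match(hypothesis, text))
```

Under these assumptions, a high score in one direction and a lower score in the other is what lets an ordering system treat one sentence as more likely to follow from the other; how the paper combines the scores is not stated in the abstract.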

    Evaluating Centering for Information Ordering Using Corpora

    In this article we discuss several metrics of coherence defined using centering theory and investigate the usefulness of such metrics for information ordering in automatic text generation. We estimate empirically which is the most promising metric and how useful this metric is using a general methodology applied on several corpora. Our main result is that the simplest metric (which relies exclusively on NOCB transitions) sets a robust baseline that cannot be outperformed by other metrics which make use of additional centering-based features. This baseline can be used for the development of both text-to-text and concept-to-text generation systems.
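
A metric that relies only on NOCB transitions can be sketched as follows. In centering theory, a transition is NOCB when an utterance shares no entity with the one before it. This sketch assumes each sentence has already been reduced to a set of entity mentions; the `count_nocb` function and the sample entity sets are hypothetical and do not come from the article.

```python
def count_nocb(ordering):
    """Count adjacent sentence pairs that share no entity (NOCB transitions).

    `ordering` is a list of sets, one set of entity mentions per sentence.
    Fewer NOCB transitions is taken to indicate a more coherent ordering.
    """
    return sum(
        1 for prev, cur in zip(ordering, ordering[1:]) if not (prev & cur)
    )


# Hypothetical entity sets for four sentences.
a = {"storm", "city"}
b = {"city", "mayor"}
c = {"mayor"}
d = {"harvest"}

print(count_nocb([a, b, c, d]))  # one break: between c and d
print(count_nocb([a, c, b, d]))  # two breaks: a to c, and b to d
```

Candidate orderings can then be ranked by this count, with the lowest count preferred.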


    Most document summaries are produced extractively by taking important sentences from the document. Extractive summarization often does not consider the connections between sentences. A good sentence ordering should be aware of the rhetorical relations that exist between sentences, such as cause-effect relations, topical relevancy, and chronological sequence. To address this problem, we propose a new method for sentence ordering in multi-document summarization using cluster correlation and probability for English documents. Sentences from multiple documents are grouped into clusters based on similarity. Sentences extracted from each cluster form the summary and are ordered based on cluster correlation and probability. User evaluation showed that summaries produced by the proposed method are easier to understand than those of the previous method. ROUGE results also show an improvement in sentence arrangement compared to the previous method.

    Joint semantic discourse models for automatic multi-document summarization

    Automatic multi-document summarization aims at selecting the essential content of related documents and presenting it in a summary. In this paper, we propose some methods for automatic summarization based on Rhetorical Structure Theory and Cross-document Structure Theory. They are chosen in order to properly address the relevance of information, multi-document phenomena, and subtopical distribution in the source texts. The results show that using semantic discourse knowledge in strategies for content selection produces summaries that are more informative. Funding: FAPESP, CAPES.

