3 research outputs found

    A Review On Automatic Text Summarization Approaches

    Get PDF
    It has been more than 50 years since the initial investigation on automatic text summarization was started.Various techniques have been successfully used to extract the important contents from text document to represent document summary.In this study,we review some of the studies that have been conducted in this still-developing research area.It covers the basics of text summarization,the types of summarization,the methods that have been used and some areas in which text summarization has been applied.Furthermore,this paper also reviews the significant efforts which have been put in studies concerning sentence extraction,domain specific summarization and multi document summarization and provides the theoretical explanation and the fundamental concepts related to it.In addition,the advantages and limitations concerning the approaches commonly used for text summarization are also highlighted in this study

    Caracterização da complementaridade temporal: subsídios para sumarização automática multidocumento

    Get PDF
    Complementarity is a usual multi-document phenomenon that commonly occurs among news texts about the same event. From a set of sentence pairs (in Portuguese) manually annotated with CST (Cross-Document Structure Theory) relations (Historical background and Follow-up) that make explicit the temporal complementary among the sentences, we identified a potential set of linguistic attributes of such complementary. Using Machine Learning algorithms, we evaluate the capacity of the attributes to discriminate between Historical background and Follow-up. JRip learned a small set of rules with high accuracy. Based on a set of 5 rules, the classifier discriminates the CST relations with 80% of accuracy. According to the rules, the occurrence of temporal expression in sentence 2 is the most discriminative feature in the task. As a contribution, the JRip classifier can improve the performance of the CST-discourse parsers for Portuguese.A complementaridade é um fenômeno multidocumento comumente observado entre notícias que versam sobre um mesmo evento. A partir de um corpus em português composto por um conjunto de pares de sentenças manualmente anotadas com as relações da Cross-Document Structure Theory (CST) que explicitam a complementaridade temporal (Historical background e Follow-up), identificou-se um conjunto potencial de atributos linguísticos desse tipo de complementaridade. Por meio de algoritmos de Aprendizado de Máquina, testou-se o potencial dos atributos em distinguir as referidas relações. O classificador simbólico gerado pelo algoritmo JRip obteve o melhor desempenho ao se considerar a precisão e o tamanho reduzido do conjunto de regras. Somente com base em 5 regras, tal classificador identificou Follow-up e Historical background com precisão aproximada de 80%. Ademais, as regras do classificador indicam que o atributo ocorrência de expressão temporal na sentença 2 é o mais relevante para a tarefa. Como contribuição, salienta-se que o classificador JRip aqui gerado pode ser utilizado nos analisadores discursivos multidocumento para o português do Brasil que são baseados na CST

    Pertanika Journal of Science & Technology

    Get PDF
    corecore