45 research outputs found

    O desenvolvimento de um sistema computacional de sumarização multidocumento com base em um método linguisticamente motivado

    This paper presents studies in Natural Language Processing, specifically in automatic multi-document summarization. We describe the steps taken to produce a computational prototype, based on a linguistically motivated method, for summarizing news texts written in Portuguese. Funding: FAPESP; ICMC; Pró-reitoria de Pesquis

    NILC_USP: aspect extraction using semantic labels

    This paper details the NILC_USP system, which participated in the SemEval 2014 Aspect Based Sentiment Analysis task. The system uses a Conditional Random Field (CRF) algorithm to extract the aspects mentioned in the text. Our work added semantic labels to a basic feature set in order to measure how useful they are for aspect extraction. We used the semantic roles and the highest verb frame as features for the machine learning. Overall, our results show that this semantic information did not improve the system as a whole, although its precision increased. Funding: FAPES

    Building Contrastive Summaries of Subjective Text Via Opinion Ranking

    This article investigates methods for automatically comparing entities in opinionated text, helping users obtain important information from a large amount of data, a task known as “contrastive opinion summarization”. The task aims at generating contrastive summaries that highlight differences between entities, given opinionated text (written about each entity individually) in which opinions have been previously identified. These summaries are built by selecting sentences from the input data. The core of the problem is how to choose the most relevant sentences appropriately. The proposed method uses a heuristic that makes decisions according to the opinions found in the input text and to the traits a summary is expected to present. The evaluation measures three characteristics that contrastive summaries are expected to have: representativity (presence of opinions that are frequent in the input), contrastivity (presence of opinions that highlight differences between entities) and diversity (presence of different opinions to avoid redundancy). The new method is compared to previously published methods and performs significantly better than them according to the measures used. The main contributions of this work are: a comparative analysis of methods for contrastive opinion summarization, a systematic way to evaluate summaries, a new method that performs better than those previously known, and a dataset for the task.

    Enriching entity grids and graphs with discourse relations: the impact in local coherence evaluation

    This paper describes how discursive knowledge, given by the discourse theories RST (Rhetorical Structure Theory) and CST (Cross-document Structure Theory), may improve the automatic evaluation of local coherence in multi-document summaries. Two of the main coherence models from the literature were augmented with discursive information and obtained 91.3% accuracy, a gain of 53% in relation to the original results. Funding: FAPES

    Joint semantic discourse models for automatic multi-document summarization

    Automatic multi-document summarization aims at selecting the essential content of related documents and presenting it in a summary. In this paper, we propose methods for automatic summarization based on Rhetorical Structure Theory and Cross-document Structure Theory. These theories were chosen in order to properly address the relevance of information, multi-document phenomena and subtopical distribution in the source texts. The results show that using semantic discourse knowledge in content selection strategies produces more informative summaries. Funding: FAPESP; CAPE

    BuscaOpinioes: searching for opinions over the internet

    This paper describes the BuscaOpinioes website, a tool for searching for opinions over the internet. Our system uses the Google search engine to retrieve reviews from the internet and a lexicon-based sentiment analysis approach to identify opinions in these reviews. A web interface is available to visualize the results as well as some statistics. Funding: FAPES

    A discursive grid approach to model local coherence in multi-document summaries

    Multi-document summarization is currently a very important area of Natural Language Processing (NLP) because of the huge amount of data on the web. People want more and more information, and this information must be coherently organized and summarized. The main focus of this paper is the coherence of multi-document summaries. To that end, a model that uses discursive information to automatically evaluate local coherence in multi-document summaries has been developed. This model obtains 92.69% accuracy in distinguishing coherent from incoherent summaries, outperforming the state of the art in the area. Funding: CAPES; FAPESP; University of Goiá

    On the Development and Evaluation of a Brazilian Portuguese Discourse Parser

    In this paper we present the development process and the evaluation procedure of a Brazilian Portuguese discourse parser called DiZer. Based on Rhetorical Structure Theory, DiZer is a symbolic cue phrase-based analyzer that uses discourse templates learned from a corpus of scientific texts to identify and build the discourse structure of texts. The evaluation of DiZer shows satisfactory results for scientific and news texts, even though it was not designed for the latter, which demonstrates DiZer's portability.

    Exploring the subtopic-based relationship map strategy for multi-document summarization

    In this paper we adapt and explore strategies for generating multi-document summaries based on relationship maps, which represent texts as graphs (maps) of interrelated segments and apply different traversal techniques to produce the summaries. In particular, we focus on the Segmented Bushy Path, a sophisticated method that tries to represent the main subtopics of the source texts in a summary while keeping the summary informative. In addition, we investigate some well-known subtopic segmentation and clustering techniques in order to correctly select the most relevant information to compose the final summary. We show that this subtopic-based method outperforms other methods for multi-document summarization and achieves state-of-the-art results, competing with the most sophisticated deep summarization methods in the area.