4,680 research outputs found

    Multi-polarity text segmentation using graph theory

    Full text link
    Text segmentation, or named text binarization, is usually an essential step for text information extraction from images and videos. However, most existing text segmentation methods have difficulties in extracting multi-polarity texts, where multi-polarity texts mean those texts with multiple colors or intensities in the same line. In this paper, we propose a novel algorithm for multi-polarity text segmentation based on graph theory. By representing a text image with an undirected weighted graph and partitioning it iteratively, multi-polarity text image can be effectively split into several single-polarity text images. As a result, these text images are then segmented by single-polarity text segmentation algorithms. Experiments on thousands of multi-polarity text images show that our algorithm can effectively segment multi-polarity texts. ? 2008 IEEE.EI

    Information extraction from multimedia web documents: an open-source platform and testbed

    No full text
    The LivingKnowledge project aimed to enhance the current state of the art in search, retrieval and knowledge management on the web by advancing the use of sentiment and opinion analysis within multimedia applications. To achieve this aim, a diverse set of novel and complementary analysis techniques have been integrated into a single, but extensible software platform on which such applications can be built. The platform combines state-of-the-art techniques for extracting facts, opinions and sentiment from multimedia documents, and unlike earlier platforms, it exploits both visual and textual techniques to support multimedia information retrieval. Foreseeing the usefulness of this software in the wider community, the platform has been made generally available as an open-source project. This paper describes the platform design, gives an overview of the analysis algorithms integrated into the system and describes two applications that utilise the system for multimedia information retrieval

    Method for Aspect-Based Sentiment Annotation Using Rhetorical Analysis

    Full text link
    This paper fills a gap in aspect-based sentiment analysis and aims to present a new method for preparing and analysing texts concerning opinion and generating user-friendly descriptive reports in natural language. We present a comprehensive set of techniques derived from Rhetorical Structure Theory and sentiment analysis to extract aspects from textual opinions and then build an abstractive summary of a set of opinions. Moreover, we propose aspect-aspect graphs to evaluate the importance of aspects and to filter out unimportant ones from the summary. Additionally, the paper presents a prototype solution of data flow with interesting and valuable results. The proposed method's results proved the high accuracy of aspect detection when applied to the gold standard dataset

    Evaluation in Discourse: a Corpus-Based Study

    Get PDF
    This paper describes the CASOAR corpus, the first manually annotated corpus that explores the impact of discourse structure on sentiment analysis with a study of movie reviews in French and in English as well as letters to the editor in French. While annotating opinions at the expression, the sentence or the document level is a well-established task and relatively straightforward, discourse annotation remains difficult, especially for non-experts. Therefore, combining both annotations poses several methodological problems that we address here. We propose a multi-layered annotation scheme that includes: the complete discourse structure according to the Segmented Discourse Representation Theory, the opinion orientation of elementary discourse units and opinion expressions, and their associated features. We detail each layer, explore the interactions between them and discuss our results. In particular, we examine the correlation between discourse and semantic category of opinion expressions, the impact of discourse relations on both subjectivity and polarity analysis and the impact of discourse on the determination of the overall opinion of a document. Our results demonstrate that discourse is an important cue for sentiment analysis, at least for the corpus genres we have studied

    Multiple Instance Learning Networks for Fine-Grained Sentiment Analysis

    Get PDF
    We consider the task of fine-grained sentiment analysis from the perspective of multiple instance learning (MIL). Our neural model is trained on document sentiment labels, and learns to predict the sentiment of text segments, i.e. sentences or elementary discourse units (EDUs), without segment-level supervision. We introduce an attention-based polarity scoring method for identifying positive and negative text snippets and a new dataset which we call SPOT (as shorthand for Segment-level POlariTy annotations) for evaluating MIL-style sentiment models like ours. Experimental results demonstrate superior performance against multiple baselines, whereas a judgement elicitation study shows that EDU-level opinion extraction produces more informative summaries than sentence-based alternatives.Comment: Final published version. Please cite using appropriate date (2018). Link to journal: http://www.transacl.org/ojs/index.php/tacl/article/view/1225/27
    corecore