22,504 research outputs found

    Resources for Evaluation of Summarization Techniques

    Full text link
    We report on two corpora to be used in the evaluation of component systems for the tasks of (1) linear segmentation of text and (2) summary-directed sentence extraction. We present characteristics of the corpora, methods used in the collection of user judgments, and an overview of the application of the corpora to evaluating the component system. Finally, we discuss the problems and issues with construction of the test set which apply broadly to the construction of evaluation resources for language technologies.Comment: LaTeX source, 5 pages, US Letter, uses lrec98.st

    Living Knowledge

    Get PDF
    Diversity, especially manifested in language and knowledge, is a function of local goals, needs, competences, beliefs, culture, opinions and personal experience. The Living Knowledge project considers diversity as an asset rather than a problem. With the project, foundational ideas emerged from the synergic contribution of different disciplines, methodologies (with which many partners were previously unfamiliar) and technologies flowed in concrete diversity-aware applications such as the Future Predictor and the Media Content Analyser providing users with better structured information while coping with Web scale complexities. The key notions of diversity, fact, opinion and bias have been defined in relation to three methodologies: Media Content Analysis (MCA) which operates from a social sciences perspective; Multimodal Genre Analysis (MGA) which operates from a semiotic perspective and Facet Analysis (FA) which operates from a knowledge representation and organization perspective. A conceptual architecture that pulls all of them together has become the core of the tools for automatic extraction and the way they interact. In particular, the conceptual architecture has been implemented with the Media Content Analyser application. The scientific and technological results obtained are described in the following

    MIsA : multilingual 'IsA' extraction from Corpora

    Get PDF

    A Machine Learning Approach For Opinion Holder Extraction In Arabic Language

    Full text link
    Opinion mining aims at extracting useful subjective information from reliable amounts of text. Opinion mining holder recognition is a task that has not been considered yet in Arabic Language. This task essentially requires deep understanding of clauses structures. Unfortunately, the lack of a robust, publicly available, Arabic parser further complicates the research. This paper presents a leading research for the opinion holder extraction in Arabic news independent from any lexical parsers. We investigate constructing a comprehensive feature set to compensate the lack of parsing structural outcomes. The proposed feature set is tuned from English previous works coupled with our proposed semantic field and named entities features. Our feature analysis is based on Conditional Random Fields (CRF) and semi-supervised pattern recognition techniques. Different research models are evaluated via cross-validation experiments achieving 54.03 F-measure. We publicly release our own research outcome corpus and lexicon for opinion mining community to encourage further research

    Rude waiter but mouthwatering pastries! An exploratory study into Dutch aspect-based sentiment analysis

    Get PDF
    The fine-grained task of automatically detecting all sentiment expressions within a given document and the aspects to which they refer is known as aspect-based sentiment analysis. In this paper we present the first full aspect-based sentiment analysis pipeline for Dutch and apply it to customer reviews. To this purpose, we collected reviews from two different domains, i.e. restaurant and smartphone reviews. Both corpora have been manually annotated using newly developed guidelines that comply to standard practices in the field. For our experimental pipeline we perceive aspect-based sentiment analysis as a task consisting of three main subtasks which have to be tackled incrementally: aspect term extraction, aspect category classification and polarity classification. First experiments on our Dutch restaurant corpus reveal that this is indeed a feasible approach that yields promising results
    • …
    corecore