6 research outputs found

    A literature survey of methods for analysis of subjective language

    Subjective language is used to express attitudes and opinions towards things, ideas and people. While content- and topic-centred natural language processing is now part of everyday life, the analysis of subjective aspects of natural language has until recently been largely neglected by the research community. The explosive growth of personal blogs, consumer opinion sites and social network applications in recent years has, however, created increased interest in subjective language analysis. This paper provides an overview of recent research conducted in the area.

    Sistema de clasificación automática de críticas de cine (Automatic classification system for film reviews)

    Initially regarded as a subfield of document classification, opinion-based document classification (also known as sentiment classification, sentiment analysis or opinion mining) has attracted growing interest from the natural language processing research community in recent years. This interest in the automatic processing of opinions in text documents is partly a consequence of the exponential growth of user-generated content on the Web 2.0, and partly of the interest of companies and public administrations, among others, in automatically analysing, filtering or detecting the opinions expressed by their clients or citizens. The aim of this final-year project (Proyecto de Fin de Carrera) is the design and implementation of an automatic classification system for opinion texts, specifically film reviews written by internet users and collected from several websites dedicated to that purpose. Each document is classified into one of the categories defined in the system, according to the affective orientation of the review, by applying different natural language processing techniques: the kNN algorithm in one case and an affective dictionary in the other. An automatic classification system removes the need for human intervention and speeds up the processing of this kind of document. The project also examines and analyses the difficulties encountered when implementing an automatic classification system for texts that are opinionated in nature.
    Ingeniería de Telecomunicació

    Predicting Linguistic Structure with Incomplete and Cross-Lingual Supervision

    Contemporary approaches to natural language processing are predominantly based on statistical machine learning from large amounts of text, which has been manually annotated with the linguistic structure of interest. However, such complete supervision is currently only available for the world's major languages, in a limited number of domains and for a limited range of tasks. As an alternative, this dissertation considers methods for linguistic structure prediction that can make use of incomplete and cross-lingual supervision, with the prospect of making linguistic processing tools more widely available at a lower cost. An overarching theme of this work is the use of structured discriminative latent variable models for learning with indirect and ambiguous supervision; as instantiated, these models admit rich model features while retaining efficient learning and inference properties. The first contribution to this end is a latent-variable model for fine-grained sentiment analysis with coarse-grained indirect supervision. The second is a model for cross-lingual word-cluster induction and the application thereof to cross-lingual model transfer. The third is a method for adapting multi-source discriminative cross-lingual transfer models to target languages, by means of typologically informed selective parameter sharing. The fourth is an ambiguity-aware self- and ensemble-training algorithm, which is applied to target language adaptation and relexicalization of delexicalized cross-lingual transfer parsers. The fifth is a set of sequence-labeling models that combine constraints at the level of tokens and types, and an instantiation of these models for part-of-speech tagging with incomplete cross-lingual and crowdsourced supervision. In addition to these contributions, comprehensive overviews are provided of structured prediction with no or incomplete supervision, as well as of learning in the multilingual and cross-lingual settings. 
Through careful empirical evaluation, it is established that the proposed methods can be used to create substantially more accurate tools for linguistic processing, compared both to unsupervised methods and to recently proposed cross-lingual methods. The empirical support for this claim is particularly strong in the latter case; our models for syntactic dependency parsing and part-of-speech tagging achieve the hitherto best published results for a large number of target languages, in the setting where no annotated training data is available in the target language.