36 research outputs found

    Multilingual subjectivity analysis using machine translation

    Get PDF
    Although research in other languages is increasing, much of the work in subjectivity analysis has been applied to English data, mainly due to the large body of electronic resources and tools that are available for this language. In this paper, we propose and evaluate methods that can be employed to transfer a repository of subjectivity resources across languages. Specifically, we attempt to leverage on the resources available for English and, by employing machine translation, generate resources for subjectivity analysis in other languages. Through comparative evaluations on two different languages (Romanian and Spanish), we show that automatic translation is a viable alternative for the construction of resources and tools for subjectivity analysis in a new target language.

    Topic-dependent sentiment analysis of financial blogs

    Get PDF
    While most work in sentiment analysis in the financial domain has focused on the use of content from traditional finance news, in this work we concentrate on more subjective sources of information, blogs. We aim to automatically determine the sentiment of financial bloggers towards companies and their stocks. To do this we develop a corpus of financial blogs, annotated with polarity of sentiment with respect to a number of companies. We conduct an analysis of the annotated corpus, from which we show there is a significant level of topic shift within this collection, and also illustrate the difficulty that human annotators have when annotating certain sentiment categories. To deal with the problem of topic shift within blog articles, we propose text extraction techniques to create topic-specific sub-documents, which we use to train a sentiment classifier. We show that such approaches provide a substantial improvement over full documentclassification and that word-based approaches perform better than sentence-based or paragraph-based approaches

    Crowdsourced real-world sensing: sentiment analysis and the real-time web

    Get PDF
    The advent of the real-time web is proving both challeng- ing and at the same time disruptive for a number of areas of research, notably information retrieval and web data mining. As an area of research reaching maturity, sentiment analysis oers a promising direction for modelling the text content available in real-time streams. This paper reviews the real-time web as a new area of focus for sentiment analysis and discusses the motivations and challenges behind such a direction

    Going beyond traditional QA systems: challenges and keys in opinion question answering

    Get PDF
    The treatment of factual data has been widely studied in different areas of Natural Language Processing (NLP). However, processing subjective information still poses important challenges. This paper presents research aimed at assessing techniques that have been suggested as appropriate in the context of subjective - Opinion Question Answering (OQA). We evaluate the performance of an OQA with these new components and propose methods to optimally tackle the issues encountered. We assess the impact of including additional resources and processes with the purpose of improving the system performance on two distinct blog datasets. The improvements obtained for the different combination of tools are statistically significant. We thus conclude that the proposed approach is adequate for the OQA task, offering a good strategy to deal with opinionated questions.This paper has been partially supported by Ministerio de Ciencia e Innovación - Spanish Government (grant no. TIN2009-13391-C04-01), and Conselleria d'Educación - Generalitat Valenciana (grant no. PROMETEO/2009/119 and ACOMP/2010/286)

    The good, the bad and the implicit: a comprehensive approach to annotating explicit and implicit sentiment

    Get PDF
    We present a fine-grained scheme for the annotation of polar sentiment in text, that accounts for explicit sentiment (so-called private states), as well as implicit expressions of sentiment (polar facts). Polar expressions are annotated below sentence level and classified according to their subjectivity status. Additionally, they are linked to one or more targets with a specific polar orientation and intensity. Other components of the annotation scheme include source attribution and the identification and classification of expressions that modify polarity. In previous research, little attention has been given to implicit sentiment, which represents a substantial amount of the polar expressions encountered in our data. An English and Dutch corpus of financial newswire, consisting of over 45,000 words each, was annotated using our scheme. A subset of this corpus was used to conduct an inter-annotator agreement study, which demonstrated that the proposed scheme can be used to reliably annotate explicit and implicit sentiment in real-world textual data, making the created corpora a useful resource for sentiment analysis

    Attribution: a computational approach

    Get PDF
    Our society is overwhelmed with an ever growing amount of information. Effective management of this information requires novel ways to filter and select the most relevant pieces of information. Some of this information can be associated with the source or sources expressing it. Sources and their relation to what they express affect information and whether we perceive it as relevant, biased or truthful. In news texts in particular, it is common practice to report third-party statements and opinions. Recognizing relations of attribution is therefore a necessary step toward detecting statements and opinions of specific sources and selecting and evaluating information on the basis of its source. The automatic identification of Attribution Relations has applications in numerous research areas. Quotation and opinion extraction, discourse and factuality have all partly addressed the annotation and identification of Attribution Relations. However, disjoint efforts have provided a partial and partly inaccurate picture of attribution. Moreover, these research efforts have generated small or incomplete resources, thus limiting the applicability of machine learning approaches. Existing approaches to extract Attribution Relations have focused on rule-based models, which are limited both in coverage and precision. This thesis presents a computational approach to attribution that recasts attribution extraction as the identification of the attributed text, its source and the lexical cue linking them in a relation. Drawing on preliminary data-driven investigation, I present a comprehensive lexicalised approach to attribution and further refine and test a previously defined annotation scheme. The scheme has been used to create a corpus annotated with Attribution Relations, with the goal of contributing a large and complete resource than can lay the foundations for future attribution studies. Based on this resource, I developed a system for the automatic extraction of attribution relations that surpasses traditional syntactic pattern-based approaches. The system is a pipeline of classification and sequence labelling models that identify and link each of the components of an attribution relation. The results show concrete opportunities for attribution-based applications

    Unsupervised and knowledge-poor approaches to sentiment analysis

    Get PDF
    Sentiment analysis focuses upon automatic classiffication of a document's sentiment (and more generally extraction of opinion from text). Ways of expressing sentiment have been shown to be dependent on what a document is about (domain-dependency). This complicates supervised methods for sentiment analysis which rely on extensive use of training data or linguistic resources that are usually either domain-specific or generic. Both kinds of resources prevent classiffiers from performing well across a range of domains, as this requires appropriate in-domain (domain-specific) data. This thesis presents a novel unsupervised, knowledge-poor approach to sentiment analysis aimed at creating a domain-independent and multilingual sentiment analysis system. The approach extracts domain-specific resources from documents that are to be processed, and uses them for sentiment analysis. This approach does not require any training corpora, large sets of rules or generic sentiment lexicons, which makes it domain- and languageindependent but at the same time able to utilise domain- and language-specific information. The thesis describes and tests the approach, which is applied to diffeerent data, including customer reviews of various types of products, reviews of films and books, and news items; and to four languages: Chinese, English, Russian and Japanese. The approach is applied not only to binary sentiment classiffication, but also to three-way sentiment classiffication (positive, negative and neutral), subjectivity classifiation of documents and sentences, and to the extraction of opinion holders and opinion targets. Experimental results suggest that the approach is often a viable alternative to supervised systems, especially when applied to large document collections
    corecore