37 research outputs found

    Building Tagged Linguistic Unit Databases for Sentiment Detection

    Despite the obvious business value of visualizing similarities between elements of evolving information spaces and mapping these similarities e.g. onto geospatial reference systems, analysts are often more interested in how the semantic orientation (sentiment) towards an organization, a product or a particular technology is changing over time. Unfortunately, popular methods that process unstructured textual material to detect semantic orientation automatically based on tagged dictionaries are not capable of fulfilling this task, even when coupled with part-of-speech tagging, a standard component of most text processing toolkits that distinguishes grammatical categories such as article (AT), noun (NN), verb (VB), and adverb (RB). Small corpus size, ambiguity and subtle incremental change of tonal expressions between different versions of a document complicate the detection of semantic orientation and often prevent promising algorithms from being incorporated into commercial applications. Parsing grammatical structures, by contrast, outperforms dictionary-based approaches in terms of reliability, but usually suffers from poor scalability due to its computational complexity. This paper addresses this predicament by presenting an alternative approach based on automatically building Tagged Linguistic Unit (TLU) databases to overcome the restrictions of dictionaries with a limited set of tagged tokens.

    Rule-based Opinion Target and Aspect Extraction to Acquire Affective Knowledge

    Opinion holder and opinion target extraction are among the most popular and challenging problems tackled by opinion mining researchers, recognizing the significant business value of such components and their importance for applications such as media monitoring and Web intelligence. This paper describes an approach that combines opinion target extraction with aspect extraction using syntactic patterns. It expands previous work limited by sentence boundaries and includes a heuristic for anaphora resolution to identify targets across sentences. Furthermore, it demonstrates the application of concepts known from research on open information extraction to the identification of relevant opinion aspects. Qualitative analyses performed on a corpus of 100 000 Amazon product reviews show that the approach is promising. The extracted opinion targets and aspects are useful for enriching common knowledge resources and opinion mining ontologies, and help practitioners and researchers identify opinions in document collections.

    A Context-Dependent Supervised Learning Approach to Sentiment Detection in Large Textual Databases

    Sentiment detection automatically identifies emotions in textual data. The increasing amount of emotive documents available in corporate databases and on the World Wide Web calls for automated methods to process this important source of knowledge. Sentiment detection draws attention from researchers and practitioners alike - to enrich business intelligence applications, for example, or to measure the impact of customer reviews on purchasing decisions. Most sentiment detection approaches do not consider language ambiguity, despite the fact that one and the same sentiment term might differ in polarity depending on the context in which a statement is made. To address this shortcoming, this paper introduces a novel method that uses Naïve Bayes to identify ambiguous terms. A contextualized sentiment lexicon stores the polarity of these terms, together with a set of co-occurring context terms. A formal evaluation of the assigned polarities confirms that considering the usage context of ambiguous terms improves the accuracy of high-throughput sentiment detection methods. Such methods are a prerequisite for using sentiment as a metadata element in storage and distributed file-level intelligence applications, as well as in enterprise portals that provide a semantic repository of an organization's information assets.
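
    The idea of a contextualized sentiment lexicon can be illustrated with a minimal sketch. It is not the paper's implementation: the class name `ContextualizedLexicon`, its methods, the add-one smoothing and the toy training data are all assumptions made for illustration. The sketch stores, for each ambiguous term, counts of co-occurring context terms per polarity and resolves the polarity of a new occurrence with a simple Naïve Bayes decision.

```python
import math
from collections import Counter


class ContextualizedLexicon:
    """Toy contextualized sentiment lexicon (hypothetical, for illustration only)."""

    def __init__(self):
        # term -> {"pos": Counter of context terms, "neg": Counter of context terms}
        self.context_counts = {}
        # term -> {"pos": number of positive examples, "neg": number of negative examples}
        self.example_counts = {}

    def train(self, term, context_terms, polarity):
        """Record one labeled occurrence of `term`; polarity is "pos" or "neg"."""
        counts = self.context_counts.setdefault(term, {"pos": Counter(), "neg": Counter()})
        examples = self.example_counts.setdefault(term, {"pos": 0, "neg": 0})
        counts[polarity].update(context_terms)
        examples[polarity] += 1

    def polarity(self, term, context_terms):
        """Return +1, -1, or 0 (unknown term or undecidable context)."""
        if term not in self.context_counts:
            return 0
        counts = self.context_counts[term]
        examples = self.example_counts[term]
        vocab = set(counts["pos"]) | set(counts["neg"])
        total_examples = examples["pos"] + examples["neg"]
        scores = {}
        for label in ("pos", "neg"):
            # Log prior with add-one smoothing (an assumption of this sketch).
            score = math.log((examples[label] + 1) / (total_examples + 2))
            label_total = sum(counts[label].values())
            for ctx in context_terms:
                if ctx in vocab:  # context terms never seen in training are ignored
                    score += math.log((counts[label][ctx] + 1) / (label_total + len(vocab)))
            scores[label] = score
        if scores["pos"] > scores["neg"]:
            return 1
        if scores["neg"] > scores["pos"]:
            return -1
        return 0


lexicon = ContextualizedLexicon()
lexicon.train("cheap", ["price", "bargain"], "pos")
lexicon.train("cheap", ["quality", "plastic"], "neg")
print(lexicon.polarity("cheap", ["price"]))    # positive context
print(lexicon.polarity("cheap", ["plastic"]))  # negative context
```

    In this toy setup the same term "cheap" resolves to different polarities depending on whether it co-occurs with "price" or "plastic", which is the kind of ambiguity the abstract describes.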

    Using Games with a Purpose and Bootstrapping to Create Domain-Specific Sentiment Lexicons

    Sentiment detection analyzes the positive or negative polarity of text. The field has received considerable attention in recent years, since it plays an important role in providing means to assess user opinions regarding an organisation's products, services, or actions. Approaches towards sentiment detection include machine learning techniques as well as computationally less expensive methods. The latter rely on the use of language-specific sentiment lexicons, which are lists of sentiment terms with their corresponding sentiment value. The effort involved in creating, customizing, and extending sentiment lexicons is considerable, particularly if less common languages and domains are targeted without access to appropriate language resources. This paper proposes a semi-automatic approach for the creation of sentiment lexicons which assigns sentiment values to sentiment terms via crowdsourcing. Furthermore, it introduces a bootstrapping process operating on unlabeled domain documents to extend the created lexicons, and to customize them according to the particular use case. This process considers sentiment terms as well as sentiment indicators occurring in the discourse surrounding a particular topic. Such indicators are associated with a positive or negative context in a particular domain, but might have a neutral connotation in other domains. A formal evaluation shows that bootstrapping considerably improves the method's recall. Automatically created lexicons yield a performance comparable to professionally created language resources such as the General Inquirer
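
    A bootstrapping step of this kind can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the procedure evaluated in the paper: the function `bootstrap_lexicon`, its thresholds `min_count` and `min_skew`, whitespace tokenization, and the example documents are all hypothetical. Unlabeled documents are first scored with the seed lexicon; terms that occur almost exclusively in positively or negatively scored documents are then added as domain-specific sentiment indicators.

```python
from collections import Counter


def bootstrap_lexicon(seed, documents, min_count=2, min_skew=0.8):
    """Extend a seed lexicon (term -> +1/-1) using unlabeled documents.

    min_count and min_skew are illustrative thresholds, not values from the paper.
    """
    pos_counts, neg_counts = Counter(), Counter()
    for doc in documents:
        tokens = doc.lower().split()
        # Score the document with the seed lexicon only.
        score = sum(seed.get(tok, 0) for tok in tokens)
        if score == 0:
            continue  # no usable sentiment signal in this document
        target = pos_counts if score > 0 else neg_counts
        # Count each candidate term once per document.
        target.update({tok for tok in tokens if tok not in seed})

    extended = dict(seed)
    for term in set(pos_counts) | set(neg_counts):
        pos, neg = pos_counts[term], neg_counts[term]
        total = pos + neg
        if total < min_count:
            continue  # too rare to trust
        if pos / total >= min_skew:
            extended[term] = 1
        elif neg / total >= min_skew:
            extended[term] = -1
    return extended


seed = {"good": 1, "bad": -1}
docs = [
    "good efficient engine",
    "efficient and good",
    "bad leaking engine",
    "leaking bad",
]
print(bootstrap_lexicon(seed, docs))
```

    Here "efficient" and "leaking" are picked up as indicators, while "engine" is left out because it appears equally often in positive and negative documents.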

    Cross-Domain Contextualisation of Sentiment Lexicons

    The simplicity of using Web 2.0 platforms and services has resulted in an abundance of user-generated content. A significant part of this content contains user opinions with clear economic relevance - customer and travel reviews, for example, or the articles of well-known and respected bloggers who influence purchase decisions. Analyzing and acting upon user-generated content is becoming imperative for marketers and social scientists who aim to gather feedback from very large user communities. Sentiment detection, as part of opinion mining, supports these efforts by identifying and aggregating polar opinions - i.e., positive or negative statements about facts. For achieving accurate results, sentiment detection requires a correct interpretation of language, which remains a challenging task due to the inherent ambiguities of human languages. Particular attention has to be directed to the context of opinionated terms when trying to resolve these ambiguities. Contextualized sentiment lexicons address this need by considering the sentiment term's context in their evaluation but are usually limited to one domain, as many contextualizations are not stable across domains. This paper introduces a method which identifies unstable contextualizations and refines the contextualized sentiment dictionaries accordingly, eliminating the need for specific training data for each individual domain. An extensive evaluation compares the accuracy of this approach with results obtained from domain-specific corpora

    Extracting Opinion Targets from Environmental Web Coverage and Social Media Streams

    Policy makers and environmental organizations have a keen interest in awareness building and the evolution of stakeholder opinions on environmental issues. Mere polarity detection, as provided by many existing methods, does not suffice to understand the emergence of collective awareness. Methods for extracting affective knowledge should be able to pinpoint opinion targets within a thread. Opinion target extraction provides a more accurate and fine-grained identification of opinions expressed in online media. This paper compares two different approaches for identifying potential opinion targets and applies them to comments from the YouTube video sharing platform. The first approach is based on statistical keyword analysis in conjunction with sentiment classification on the sentence level. The second approach uses dependency parsing to pinpoint the target of an opinionated term. A case study based on YouTube postings applies the developed methods and measures their ability to handle noisy input data from social media streams

    Extracting and Grounding Context-Aware Sentiment Lexicons

    Web intelligence applications track online sources with economic relevance such as customer reviews, news articles and social media postings. Automated sentiment analysis based on lexical methods or machine learning identifies the polarity of opinions expressed in these sources to assess how stakeholders perceive a topic. This paper introduces a hybrid approach that combines the throughput of lexical analysis with the flexibility of machine learning to resolve ambiguity and consider the context of sentiment terms. The context-aware method identifies ambiguous terms that vary in polarity depending on the context and stores them in contextualized sentiment lexicons. In conjunction with semantic knowledge bases, these lexicons help ground ambiguous sentiment terms to concepts that correspond to their polarity. This grounding paves the way for interlinking, extending, or even replacing contextualized sentiment lexicons with semantic knowledge bases. An extensive evaluation applies the method to user reviews across three domains (movies, products and hotels)

    Leveraging the Wisdom of the Crowds for the Acquisition of Multilingual Language Resources

    Games with a purpose are an increasingly popular mechanism for leveraging the wisdom of the crowds to address tasks which are trivial for humans but still not solvable by computer algorithms in a satisfying manner. Because such games are a novel mechanism for structuring human-computer interactions, a key challenge in creating them is motivating users to participate while generating useful and unbiased results. This paper focuses on important design choices and success factors of effective games with a purpose. Our findings are based on lessons learned while developing and deploying Sentiment Quiz, a crowdsourcing application for creating sentiment lexicons (an essential component of most sentiment detection algorithms). We describe the goals and structure of the game, the underlying application framework, the sentiment lexicons gathered through crowdsourcing, as well as a novel approach to automatically extend the lexicons by means of a bootstrapping process. Such an automated extension further increases the efficiency of the acquisition process by limiting the number of terms that need to be gathered from the game participants.

    Incremental and Scalable Computation of Dynamic Topography Information Landscapes

    Dynamic topography information landscapes are capable of visualizing longitudinal changes in large document repositories. Resembling tectonic processes in the natural world, dynamic rendering reflects both long-term trends and short-term fluctuations in such repositories. To visualize the rise and decay of topics, the mapping algorithm elevates and lowers related sets of concentric contour lines. Acknowledging the growing number of documents to be processed by state-of-the-art Web intelligence applications, we present a scalable, incremental approach for generating such landscapes. The processing pipeline includes a number of sequential tasks, from crawling, filtering and pre-processing Web content to projecting, labeling and rendering the aggregated information. Processing steps central to incremental processing are found in the projection stage which consists of document clustering, cluster force-directed placement, and fast document positioning. We introduce two different positioning methods and compare them in an incremental setting using two different quality measures. The evaluation is performed on a set of approximately 5000 documents taken from the environmental blog sample of the Media Watch on Climate Change (www.ecoresearch.net/climate), a Web content aggregator about climate change and related environmental issues that serves static versions of the information landscapes presented in this paper as part of a multiple coordinated view representation

    Visualizing Contextual and Dynamic Features of Micropost Streams

    Visual techniques provide an intuitive way of making sense of the large amounts of microposts available from social media sources, particularly in the case of emerging topics of interest to a global audience, which often raise controversy among key stakeholders. Micropost streams are context-dependent and highly dynamic in nature. We describe a visual analytics platform to handle high-volume micropost streams from multiple social media channels. For each post we extract key contextual features such as location, topic and sentiment, and subsequently render the resulting multi-dimensional information space using a suite of coordinated views that support a variety of complex information seeking behaviors. We also describe three new visualization techniques that extend the original platform to account for the dynamic nature of micropost streams through dynamic topography information landscapes, news flow diagrams and longitudinal cross-media analyses.