
    Sentence-level sentiment tagging across different domains and genres

    The demand for information about sentiment expressed in texts has stimulated growing interest in automatic sentiment analysis in Natural Language Processing (NLP). This dissertation is motivated by an unmet need for high-performance, domain-independent sentiment taggers and by pressing theoretical questions in NLP, where the limitations of specific approaches, as well as the synergies between them, remain practically unexplored. This study focuses on sentiment tagging at the sentence level and covers four genres: news, blogs, movie reviews, and product reviews. It draws comparisons between sentiment annotation at different linguistic levels (words, sentences, and texts) and highlights the key differences between supervised machine learning methods that rely on annotated corpora (corpus-based approaches, CBA) and lexicon-based approaches (LBA) to sentiment tagging. Exploring the performance of the supervised corpus-based approach to sentiment tagging, this study highlights the strong domain dependence of CBA. I present the development of LBA approaches based on general lexicons, such as WordNet, as a potential solution to the domain portability problem. A system for extracting sentiment markers from WordNet's relations and glosses is developed and used to acquire word lists for a lexicon-based system that annotates sentiment at the sentence and text levels. The study demonstrates that LBA's performance across domains is more stable than that of CBA. Finally, the study proposes integrating LBA and CBA in an ensemble of classifiers using a precision-based voting technique that allows the ensemble system to incorporate the best features of both. This combined approach outperforms both base learners and provides a promising solution to the domain-adaptation problem.
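The abstract does not spell out the voting rule. A minimal sketch, assuming each base learner's vote is weighted by its precision on the class it predicts, with precision measured on a small labeled development set from the target domain; the function names, the labels, and the tie-breaking rule are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Sequence

# Per-class precision of one base learner, e.g. {"pos": 0.8, "neg": 0.6}.
Precision = Dict[str, float]


def class_precision(gold: Sequence[str], predicted: Sequence[str]) -> Precision:
    """Precision of one classifier for every label it predicted on a dev set."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for g, p in zip(gold, predicted):
        totals[p] += 1
        if g == p:
            hits[p] += 1
    return {label: hits[label] / totals[label] for label in totals}


def precision_vote(votes: Dict[str, str], precisions: Dict[str, Precision]) -> str:
    """Weight each learner's vote by its precision on the label it voted for."""
    scores: Dict[str, float] = defaultdict(float)
    for learner, label in votes.items():
        scores[label] += precisions[learner].get(label, 0.0)
    # Highest weighted score wins; ties go to the alphabetically first label.
    return max(sorted(scores), key=lambda label: scores[label])


# Hypothetical dev-set predictions for a corpus-based (CBA) and a lexicon-based (LBA) learner.
gold = ["pos", "neg", "pos", "neg", "pos"]
precisions = {
    "CBA": class_precision(gold, ["pos", "pos", "pos", "neg", "neg"]),
    "LBA": class_precision(gold, ["pos", "neg", "neg", "neg", "pos"]),
}
print(precision_vote({"CBA": "neg", "LBA": "pos"}, precisions))  # pos
```

Here the CBA vote for "neg" carries weight 0.5 (its precision on "neg"), while the LBA vote for "pos" carries 1.0, so the ensemble follows the learner that has been more reliable for the class it proposes.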
The study contributes to NLP (1) by developing algorithms for the automatic acquisition of sentiment-laden words from dictionary definitions; (2) by conducting a systematic study of approaches to sentiment classification and of the factors affecting their performance; (3) by refining the lexicon-based approach through valence shifter handling and parse tree information; and (4) by developing a combined CBA/LBA approach that brings together the strengths of the two approaches and allows domain adaptation with limited amounts of labeled training data.
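Contribution (1), acquiring sentiment-laden words from dictionary definitions, can be illustrated with a toy two-step pipeline: expand seed lists along synonym and antonym relations, then tag words whose glosses mention the expanded seeds. This is a simplified sketch, not the published algorithm; the tiny SYNONYMS, ANTONYMS, and GLOSSES tables are hypothetical stand-ins for WordNet.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy stand-ins for WordNet relations and glosses.
SYNONYMS: Dict[str, Set[str]] = {
    "good": {"beneficial", "fine"},
    "bad": {"poor", "harmful"},
}
ANTONYMS: Dict[str, Set[str]] = {"good": {"bad"}, "bad": {"good"}}
GLOSSES: Dict[str, str] = {
    "excellent": "very good of its kind",
    "awful": "exceptionally bad or displeasing",
    "red": "of the color of blood",
}


def expand_seeds(pos: Set[str], neg: Set[str]) -> Tuple[Set[str], Set[str]]:
    """One relation pass: synonyms keep polarity, antonyms flip it.

    A word reached from both polarities would land in both sets;
    this sketch does not resolve such conflicts.
    """
    new_pos, new_neg = set(pos), set(neg)
    for word in pos:
        new_pos |= SYNONYMS.get(word, set())
        new_neg |= ANTONYMS.get(word, set())
    for word in neg:
        new_neg |= SYNONYMS.get(word, set())
        new_pos |= ANTONYMS.get(word, set())
    return new_pos, new_neg


def tag_by_gloss(pos: Set[str], neg: Set[str]) -> Dict[str, str]:
    """Tag a headword by whether its gloss mentions more positive or negative seeds."""
    tags: Dict[str, str] = {}
    for head, gloss in GLOSSES.items():
        tokens = gloss.split()
        score = sum(t in pos for t in tokens) - sum(t in neg for t in tokens)
        if score > 0:
            tags[head] = "positive"
        elif score < 0:
            tags[head] = "negative"
    return tags


pos, neg = expand_seeds({"good"}, {"bad"})
print(tag_by_gloss(pos, neg))
```

Words whose glosses contain no seed words (here "red") stay untagged. For real data, NLTK's WordNet corpus reader exposes `synsets()`, lemma `antonyms()`, and synset `definition()`, which could replace the toy tables.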

    All Blogs Are Not Made Equal: Exploring Genre Differences in Sentiment Tagging of Blogs

    One of the essential characteristics of blogs is their subjectivity, which makes blogs a particularly interesting domain for research on automatic sentiment determination. In this paper, we explore the properties of the two most common subgenres of blogs, personal diaries and “notebooks”, and the effects that these properties have on the performance of an automatic sentiment annotation system, which we developed for binary (positive vs. negative) and ternary (positive vs. negative vs. neutral) classification of sentiment at the sentence level. We also investigate the differential effect of including negations and other valence shifters on the performance of our system on these two subgenres of blogs.
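One simple way to handle negations and intensifiers in a lexicon-based sentence tagger, shown as a sketch and not as the authors' system: flip the polarity of sentiment words within a fixed window after a negator, scale a sentiment word that directly follows an intensifier, and map the summed score to the ternary labels. The lexicon, the shifter lists, and the three-token window are hypothetical.

```python
from typing import Dict, List

# Hypothetical lexicon and valence shifter lists, for illustration only.
LEXICON: Dict[str, int] = {"good": 1, "great": 1, "love": 1, "bad": -1, "boring": -1}
NEGATORS = {"not", "never", "no"}
INTENSIFIERS: Dict[str, float] = {"very": 2.0, "slightly": 0.5}


def sentence_score(tokens: List[str], window: int = 3) -> float:
    """Sum word polarities, flipping those within `window` tokens after a negator."""
    score = 0.0
    negate_left = 0   # tokens remaining in the current negation scope
    multiplier = 1.0  # weight from an intensifier directly before the next token
    for tok in (t.lower() for t in tokens):
        if tok in NEGATORS:
            negate_left = window
            continue
        if tok in INTENSIFIERS:
            multiplier = INTENSIFIERS[tok]
        else:
            if tok in LEXICON:
                polarity = LEXICON[tok] * multiplier
                score += -polarity if negate_left > 0 else polarity
            multiplier = 1.0  # intensifiers only modify the next token
        if negate_left > 0:
            negate_left -= 1
    return score


def classify(tokens: List[str]) -> str:
    """Ternary label: positive, negative, or neutral when the score is zero."""
    score = sentence_score(tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify("this movie is not good".split()))  # negative
```

A fixed token window is the crudest notion of negation scope; the dissertation abstract above mentions using parse tree information, which would let the scope follow syntactic structure instead.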

    Knowledge acquisition for dynamic personalization in e-commerce

    Information technology is playing an increasingly important role in today's world, and commerce over the Internet is no exception. Currently, the focus in retail e-commerce is shifting toward catering to the needs of repeat customers by offering them more personalized services. One of the barriers to such an individualized approach is the difficulty of collecting information about individual users. This thesis addresses this knowledge acquisition problem. Based on a thorough analysis of different kinds of knowledge acquisition tools and techniques, we propose an architecture that allows a combination of different approaches to be used for acquiring knowledge about users in e-commerce. This architecture is designed to support dynamic adaptation of the user profile to changes both in the user's interests and in the store. The architecture is based on two core concepts: dynamic personalization and software agent support. To reduce the time and effort that the user and the knowledge engineer put into knowledge acquisition, software agents in the proposed architecture assist in different aspects of the process, such as profile initialization, processing results discovered by web mining, making changes to the user profile and tracking their effects, handling trust-related issues, and interacting with other agents and systems. A proof-of-concept prototype has been implemented to demonstrate the feasibility of the architecture.
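The abstract describes the architecture only at a high level. As a loose illustration of one piece of it, dynamic adaptation of the user profile, here is a minimal sketch assuming the profile is a map from product category to interest weight with exponential decay. The decay scheme, the `UserProfile` class, and its methods are hypothetical and not taken from the thesis.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class UserProfile:
    """Hypothetical profile: interest weight per category, decayed so recent behaviour dominates."""

    decay: float = 0.9  # assumed forgetting factor applied on every observation
    interests: Dict[str, float] = field(default_factory=dict)

    def observe(self, category: str, strength: float = 1.0) -> None:
        # Age all existing interests, then reinforce the observed category.
        for key in self.interests:
            self.interests[key] *= self.decay
        self.interests[category] = self.interests.get(category, 0.0) + strength

    def drop_category(self, category: str) -> None:
        # Store-side change: a category removed from the catalogue leaves the profile.
        self.interests.pop(category, None)

    def top(self, n: int = 3) -> List[Tuple[str, float]]:
        # Strongest current interests first.
        return sorted(self.interests.items(), key=lambda kv: kv[1], reverse=True)[:n]


profile = UserProfile()
profile.observe("books")
profile.observe("books")
profile.observe("music")
print(profile.top(1)[0][0])  # books
```

Decay lets a shift in interests overtake older ones after a few observations, and `drop_category` covers the case where the store, rather than the user, changes.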

    Mining WordNet for Fuzzy Sentiment: Sentiment Tag Extraction from WordNet Glosses

    Many of the tasks required for semantic tagging of phrases and texts rely on a list of words annotated with some semantic features. We present a method for extracting sentiment-bearing adjectives from WordNet using the Sentiment Tag Extraction Program (STEP). We performed 58 STEP runs on unique, non-intersecting seed lists drawn from a manually annotated list of positive and negative adjectives and evaluated the results against other manually annotated lists. The 58 runs were then collapsed into a single set of 7,813 unique words. For each word, we computed a Net Overlap Score by subtracting the number of runs assigning the word a negative sentiment from the number of runs that consider it positive. We demonstrate that the Net Overlap Score can be used as a measure of a word's degree of membership in the fuzzy category of sentiment: the core adjectives, which had the highest Net Overlap Scores, were identified most accurately both by STEP and by human annotators, while words on the periphery of the category had the lowest scores and were associated with low rates of inter-annotator agreement.
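The Net Overlap Score described above is simple to compute once each run is reduced to its positive and negative word sets. A minimal sketch, assuming that per-run representation; the function names and the toy runs are hypothetical, and ranking by absolute score is one illustrative way to order words from the core of the category to its periphery.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

Run = Tuple[Set[str], Set[str]]  # (words tagged positive, words tagged negative)


def net_overlap_scores(runs: Iterable[Run]) -> Dict[str, int]:
    """Net Overlap Score: (#runs tagging a word positive) - (#runs tagging it negative)."""
    pos_counts = Counter()
    neg_counts = Counter()
    for positive, negative in runs:
        pos_counts.update(positive)
        neg_counts.update(negative)
    words = set(pos_counts) | set(neg_counts)
    return {w: pos_counts[w] - neg_counts[w] for w in words}


def rank_by_centrality(scores: Dict[str, int]) -> List[str]:
    """Order words from core (large |score|) to periphery (score near 0); ties alphabetical."""
    return sorted(scores, key=lambda w: (-abs(scores[w]), w))


# Three hypothetical STEP-style runs.
runs = [
    ({"good", "nice"}, {"bad"}),
    ({"good"}, {"bad", "odd"}),
    ({"good", "odd"}, {"bad"}),
]
scores = net_overlap_scores(runs)
print(rank_by_centrality(scores))  # ['bad', 'good', 'nice', 'odd']
```

A word such as "odd", tagged positive in one run and negative in another, ends up with a score of 0, which is the kind of peripheral word the abstract associates with low inter-annotator agreement.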