680 research outputs found

    Semi-Supervised Learning For Identifying Opinions In Web Content

    Thesis (Ph.D.) - Indiana University, Information Science, 2011
    Opinions published on the World Wide Web (Web) offer opportunities for detecting personal attitudes regarding topics, products, and services. The opinion detection literature indicates that both a large body of opinions and a wide variety of opinion features are essential for capturing subtle opinion information. Although a large amount of opinion-labeled data is preferable for opinion detection systems, opinion-labeled data is often limited, especially at sub-document levels, and manual annotation is tedious, expensive, and error-prone. This shortage of opinion-labeled data is less challenging in some domains (e.g., movie reviews) than in others (e.g., blog posts). While a simple method for improving accuracy in challenging domains is to borrow opinion-labeled data from a non-target data domain, this approach often fails because of the domain transfer problem: opinion detection strategies designed for one data domain generally do not perform well in another domain. However, while it is difficult to obtain opinion-labeled data, unlabeled user-generated opinion data are readily available. Semi-supervised learning (SSL) requires only limited labeled data to automatically label unlabeled data and has achieved promising results in various natural language processing (NLP) tasks, including traditional topic classification, but SSL has been applied in only a few opinion detection studies. This study investigates the application of four different SSL algorithms to three types of Web content: edited news articles, semi-structured movie reviews, and the informal and unstructured content of the blogosphere. SSL algorithms are also evaluated for their effectiveness in sparse data situations and domain adaptation. Research findings suggest that, when there is limited labeled data, SSL is a promising approach for opinion detection in Web content. Although the contributions of SSL varied across data domains, significant improvement was demonstrated for the most challenging data domain, the blogosphere, when a domain transfer-based SSL strategy was implemented.

    Cross-domain sentiment classification using a sentiment sensitive thesaurus

    Automatic classification of sentiment is important for numerous applications such as opinion mining, opinion summarization, contextual advertising, and market analysis. However, sentiment is expressed differently in different domains, and annotating corpora for every possible domain of interest is costly. Applying a sentiment classifier trained on labeled data from one domain to classify the sentiment of user reviews in a different domain often results in poor performance. We propose a method to overcome this problem in cross-domain sentiment classification. First, we create a sentiment sensitive distributional thesaurus using labeled data for the source domains and unlabeled data for both source and target domains. Sentiment sensitivity is achieved in the thesaurus by incorporating document-level sentiment labels in the context vectors used as the basis for measuring the distributional similarity between words. Next, we use the created thesaurus to expand feature vectors at both training and test time in a binary classifier. The proposed method significantly outperforms numerous baselines and returns results that are comparable with previously proposed cross-domain sentiment classification methods. We conduct an extensive empirical analysis of the proposed method on single and multi-source domain adaptation, unsupervised and supervised domain adaptation, and numerous similarity measures for creating the sentiment sensitive thesaurus.
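
The thesaurus-based feature expansion step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `expand_features`, the parameters `k` and `weight`, the toy thesaurus, and the weighting scheme (neighbour score times a fixed discount) are all assumptions made for illustration. The abstract does not say how related words are ranked or weighted.

```python
from collections import Counter

def expand_features(tokens, thesaurus, k=2, weight=0.5):
    """Expand a bag-of-words vector with related words from a thesaurus.

    `thesaurus` maps a word to a list of (related_word, similarity) pairs.
    Both the top-k cut-off and the discount `weight` are illustrative
    assumptions, not values taken from the abstract.
    """
    counts = Counter(tokens)
    # Start from the original term counts, stored as floats.
    expanded = Counter({word: float(c) for word, c in counts.items()})
    for word, count in counts.items():
        # Keep only the k most similar thesaurus entries for this word.
        neighbours = sorted(
            thesaurus.get(word, []), key=lambda pair: pair[1], reverse=True
        )[:k]
        for neighbour, score in neighbours:
            # Related words get a discounted, similarity-scaled weight.
            expanded[neighbour] += weight * score * count
    return dict(expanded)


# Hypothetical thesaurus entry, used for illustration only.
toy_thesaurus = {"excellent": [("great", 0.8), ("superb", 0.6), ("fine", 0.2)]}
vector = expand_features(["excellent", "plot"], toy_thesaurus)
```

Applying the same expansion at training and test time lets a classifier trained on source-domain vocabulary receive non-zero features for target-domain words that the thesaurus links to that vocabulary.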

    Identification of Consumer Adverse Drug Reaction Messages on Social Media

    The prevalence of social media has produced surges of data on the Internet that could assist in many aspects of human life. One prospective use of the data is in the development of an early warning system to monitor consumer Adverse Drug Reactions (ADRs). The direct reporting of ADRs by consumers is playing an increasingly important role in the world of pharmacovigilance. Social media provides patients with a platform to exchange their experiences regarding the use of certain drugs. However, the messages posted on those social media networks contain both ADR-related messages (positive examples) and non-ADR-related messages (negative examples). In this paper, we integrate text mining and partially supervised learning methods to automatically extract and classify messages posted on social media networks into positive and negative examples. Our findings can provide managerial insights into how social media analytics can improve not only postmarketing surveillance, but also other problem domains where large quantities of user-generated content are available.

    ChangeMyView Through Concessions: Do Concessions Increase Persuasion?

    In Discourse Studies, concessions are considered among those argumentative strategies that increase persuasion. We aim to empirically test this hypothesis by calculating the distribution of argumentative concessions in persuasive vs. non-persuasive comments from the ChangeMyView subreddit. This constitutes a challenging task since concessions do not always bear an argumentative role and are expressed through polysemous lexical markers. Drawing from a theoretically informed typology of concessions, we first conduct a crowdsourcing task to label a set of polysemous lexical markers as introducing an argumentative concession relation or not. Second, we present a self-training method to automatically identify argumentative concessions using linguistically motivated features. While we achieve a moderate F1 of 57.4% via the self-training method, our subsequent error analysis highlights that the self-training method is able to generalize and identify other types of concessions that are argumentative, but were not considered in the annotation guidelines. Our findings from the manual labeling and the classification experiments indicate that the type of argumentative concessions we investigated is almost equally likely to be used in winning and losing arguments. While this result seems to contradict theoretical assumptions, we provide some reasons related to the ChangeMyView subreddit.
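
For readers unfamiliar with self-training, the general loop can be sketched as follows. This is a generic illustration under stated assumptions, not the method from the paper above. A toy nearest-centroid classifier over numeric points stands in for a real model with linguistically motivated features. The names `centroids`, `predict`, and `self_train`, the margin-based confidence, and the `threshold` value are all hypothetical.

```python
from math import dist

def centroids(points, labels):
    """Compute the mean vector of each class."""
    sums = {}
    for p, y in zip(points, labels):
        acc, n = sums.get(y, ([0.0] * len(p), 0))
        sums[y] = ([a + b for a, b in zip(acc, p)], n + 1)
    return {y: [a / n for a in acc] for y, (acc, n) in sums.items()}

def predict(cents, p):
    """Return (label, confidence) for point p; needs at least two classes.

    Confidence is the distance margin between the two nearest centroids.
    """
    ranked = sorted((dist(c, p), y) for y, c in cents.items())
    (d0, y0), (d1, _) = ranked[0], ranked[1]
    return y0, d1 - d0

def self_train(labeled, labels, unlabeled, threshold=1.0, max_rounds=10):
    """Grow the training set with confidently self-labeled examples."""
    points, ys, pool = list(labeled), list(labels), list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(points, ys)
        scored = [(p, *predict(cents, p)) for p in pool]
        accepted = [(p, y) for p, y, conf in scored if conf >= threshold]
        if not accepted:
            break  # nothing confident enough: stop early
        for p, y in accepted:
            points.append(p)
            ys.append(y)
        # Low-confidence points stay in the pool for the next round.
        pool = [p for p, _, conf in scored if conf < threshold]
    return centroids(points, ys), pool


# Toy data: two labeled seeds and three unlabeled points.
model, leftover = self_train(
    [(0, 0), (10, 10)], ["neg", "pos"], [(1, 1), (9, 9), (5, 5)]
)
```

In this toy run the two points near the seeds are absorbed into the training set, while the equidistant point `(5, 5)` never reaches the confidence threshold and remains unlabeled.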