6,094 research outputs found

    Thumbs up? Sentiment Classification using Machine Learning Techniques

    Full text link
    We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classification problem more challenging.Comment: To appear in EMNLP-200

    Toward Optimal Feature Selection in Naive Bayes for Text Categorization

    Full text link
    Automated feature selection is important for text categorization to reduce the feature size and to speed up the learning process of classifiers. In this paper, we present a novel and efficient feature selection framework based on the Information Theory, which aims to rank the features with their discriminative capacity for classification. We first revisit two information measures: Kullback-Leibler divergence and Jeffreys divergence for binary hypothesis testing, and analyze their asymptotic properties relating to type I and type II errors of a Bayesian classifier. We then introduce a new divergence measure, called Jeffreys-Multi-Hypothesis (JMH) divergence, to measure multi-distribution divergence for multi-class classification. Based on the JMH-divergence, we develop two efficient feature selection methods, termed maximum discrimination (MDMD) and MDχ2MD-\chi^2 methods, for text categorization. The promising results of extensive experiments demonstrate the effectiveness of the proposed approaches.Comment: This paper has been submitted to the IEEE Trans. Knowledge and Data Engineering. 14 pages, 5 figure

    What attracts vehicle consumers’ buying:A Saaty scale-based VIKOR (SSC-VIKOR) approach from after-sales textual perspective?

    Get PDF
    Purpose: The increasingly booming e-commerce development has stimulated vehicle consumers to express individual reviews through online forum. The purpose of this paper is to probe into the vehicle consumer consumption behavior and make recommendations for potential consumers from textual comments viewpoint. Design/methodology/approach: A big data analytic-based approach is designed to discover vehicle consumer consumption behavior from online perspective. To reduce subjectivity of expert-based approaches, a parallel Naïve Bayes approach is designed to analyze the sentiment analysis, and the Saaty scale-based (SSC) scoring rule is employed to obtain specific sentimental value of attribute class, contributing to the multi-grade sentiment classification. To achieve the intelligent recommendation for potential vehicle customers, a novel SSC-VIKOR approach is developed to prioritize vehicle brand candidates from a big data analytical viewpoint. Findings: The big data analytics argue that “cost-effectiveness” characteristic is the most important factor that vehicle consumers care, and the data mining results enable automakers to better understand consumer consumption behavior. Research limitations/implications: The case study illustrates the effectiveness of the integrated method, contributing to much more precise operations management on marketing strategy, quality improvement and intelligent recommendation. Originality/value: Researches of consumer consumption behavior are usually based on survey-based methods, and mostly previous studies about comments analysis focus on binary analysis. The hybrid SSC-VIKOR approach is developed to fill the gap from the big data perspective
    corecore