Search CORE

23,477 research outputs found

A Case-Based Approach to Cross Domain Sentiment Classification

Author: Delany Sarah Jane
Ohana Bruno
Tierney Brendan
Publication venue: Dublin Institute of Technology
Publication date: 01/01/2012
Field of study

This paper considers the task of sentiment classification of subjective text across many domains, in particular on scenarios where no in-domain data is available. Motivated by the more general applicability of such methods, we propose an extensible approach to sentiment classification that leverages sentiment lexicons and out-of-domain data to build a case-based system where solutions to past cases are reused to predict the sentiment of new documents from an unknown domain. In our approach the case representation uses a set of features based on document statistics, while the case solution stores sentiment lexicons employed on past predictions allowing for later retrieval and reuse on similar documents. The case-based nature of our approach also allows for future improvements since new lexicons and classification methods can be added to the case base as they become available. On a cross domain experiment our method has shown robust results when compared to a baseline single-lexicon classifier where the lexicon has to be pre-selected for the domain in question

Arrow@TUDublin

A Case-Based Approach to Cross Domain Sentiment Classification

Author: A. Aamodt
A. Abbasi
A. Kennedy
G. Salton
G.A. Miller
J. Demšar
P.J. Stone
S. Baccianella
W. Chapman
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012
Field of study

Crossref

Arrow@TUDublin

Cross-Lingual Adaptation using Structural Correspondence Learning

Author: Prettenhofer Peter
Stein Benno
Publication venue
Publication date: 25/08/2010
Field of study

Cross-lingual adaptation, a special case of domain adaptation, refers to the transfer of classification knowledge between two languages. In this article we describe an extension of Structural Correspondence Learning (SCL), a recently proposed algorithm for domain adaptation, for cross-lingual adaptation. The proposed method uses unlabeled documents from both languages, along with a word translation oracle, to induce cross-lingual feature correspondences. From these correspondences a cross-lingual representation is created that enables the transfer of classification knowledge from the source to the target language. The main advantages of this approach over other approaches are its resource efficiency and task specificity. We conduct experiments in the area of cross-language topic and sentiment classification involving English as source language and German, French, and Japanese as target languages. The results show a significant improvement of the proposed method over a machine translation baseline, reducing the relative error due to cross-lingual adaptation by an average of 30% (topic classification) and 59% (sentiment classification). We further report on empirical analyses that reveal insights into the use of unlabeled data, the sensitivity with respect to important hyperparameters, and the nature of the induced cross-lingual correspondences

arXiv.org e-Print Archive

CiteSeerX

Automatic domain ontology extraction for context-sensitive opinion mining

Author: Lai Chapmann C.L.
Lau Raymond Y.K.
Li Yuefeng
Ma Jian
Publication venue
Publication date: 01/01/2009
Field of study

Automated analysis of the sentiments presented in online consumer feedbacks can facilitate both organizations’ business strategy development and individual consumers’ comparison shopping. Nevertheless, existing opinion mining methods either adopt a context-free sentiment classification approach or rely on a large number of manually annotated training examples to perform context sensitive sentiment classification. Guided by the design science research methodology, we illustrate the design, development, and evaluation of a novel fuzzy domain ontology based contextsensitive opinion mining system. Our novel ontology extraction mechanism underpinned by a variant of Kullback-Leibler divergence can automatically acquire contextual sentiment knowledge across various product domains to improve the sentiment analysis processes. Evaluated based on a benchmark dataset and real consumer reviews collected from Amazon.com, our system shows remarkable performance improvement over the context-free baseline

Queensland University of Technology ePrints Archive

AIS Electronic Library (AISeL)

Topic-dependent sentiment analysis of financial blogs

Author: Bermingham Adam
Davy Michael
Ferguson Paul
Gurrin Cathal
O'Hare Neil
Sheridan Páraic
Smeaton Alan F.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2009
Field of study

While most work in sentiment analysis in the financial domain has focused on the use of content from traditional finance news, in this work we concentrate on more subjective sources of information, blogs. We aim to automatically determine the sentiment of financial bloggers towards companies and their stocks. To do this we develop a corpus of financial blogs, annotated with polarity of sentiment with respect to a number of companies. We conduct an analysis of the annotated corpus, from which we show there is a significant level of topic shift within this collection, and also illustrate the difficulty that human annotators have when annotating certain sentiment categories. To deal with the problem of topic shift within blog articles, we propose text extraction techniques to create topic-specific sub-documents, which we use to train a sentiment classifier. We show that such approaches provide a substantial improvement over full documentclassification and that word-based approaches perform better than sentence-based or paragraph-based approaches

CiteSeerX

Crossref

Irish Universities

DCU Online Research Access Service

Recommended from our members

On stopwords, filtering and data sparsity for sentiment analysis of Twitter

Author: Alani Harith
Fernández Miriam
He Yulan
Saif Hassan
Publication venue
Publication date: 01/01/2014
Field of study

Sentiment classification over Twitter is usually affected by the noisy nature (abbreviations, irregular forms) of tweets data. A popular procedure to reduce the noise of textual data is to remove stopwords by using pre-compiled stopword lists or more sophisticated methods for dynamic stopword identification. However, the effectiveness of removing stopwords in the context of Twitter sentiment classification has been debated in the last few years. In this paper we investigate whether removing stopwords helps or hampers the effectiveness of Twitter sentiment classification methods. To this end, we apply six different stopword identification methods to Twitter data from six different datasets and observe how removing stopwords affects two well-known supervised sentiment classification methods. We assess the impact of removing stopwords by observing fluctuations on the level of data sparsity, the size of the classifier’s feature space and its classification performance. Our results show that using pre-compiled lists of stopwords negatively impacts the performance of Twitter sentiment classification approaches. On the other hand, the dynamic generation of stopword lists, by removing those infrequent terms appearing only once in the corpus, appears to be the optimal method to maintaining a high classification performance while reducing the data sparsity and substantially shrinking the feature space

Open Research Online (The Open University)

Aston Publications Explorer

Research Directions, Challenges and Issues in Opinion Mining

Author: Hariharan Shanmugasundaram
Lu Joan
Sudhakaran Periakaruppan
Publication venue: 'Science and Engineering Research Support Society'
Publication date: 01/01/2013
Field of study

Rapid growth of Internet and availability of user reviews on the web for any product has provided a need for an effective system to analyze the web reviews. Such reviews are useful to some extent, promising both the customers and product manufacturers. For any popular product, the number of reviews can be in hundreds or even thousands. This creates difficulty for a customer to analyze them and make important decisions on whether to purchase the product or to not. Mining such product reviews or opinions is termed as opinion mining which is broadly classified into two main categories namely facts and opinions. Though there are several approaches for opinion mining, there remains a challenge to decide on the recommendation provided by the system. In this paper, we analyze the basics of opinion mining, challenges, pros & cons of past opinion mining systems and provide some directions for the future research work, focusing on the challenges and issues

Crossref

University of Huddersfield Repository