4,207 research outputs found
Exploring the use of paragraph-level annotations for sentiment analysis of financial blogs
In this paper we describe our work in the area of topic-based sentiment analysis in the domain of financial blogs. We explore the use of paragraph-level and document-level annotations, examining how additional information from paragraph-level annotations can be used to increase the accuracy of document-level sentiment classification. We acknowledge the additional effort required to provide these paragraph-level annotations, and so we compare these findings against an automatic means of generating topic-specific sub-documents
Noise or music? Investigating the usefulness of normalisation for robust sentiment analysis on social media data
In the past decade, sentiment analysis research has thrived, especially on social media. While this data genre is suitable to extract opinions and sentiment, it is known to be noisy. Complex normalisation methods have been developed to transform noisy text into its standard form, but their effect on tasks like sentiment analysis remains underinvestigated. Sentiment analysis approaches mostly include spell checking or rule-based normalisation as preprocess- ing and rarely investigate its impact on the task performance. We present an optimised sentiment classifier and investigate to what extent its performance can be enhanced by integrating SMT-based normalisation as preprocessing. Experiments on a test set comprising a variety of user-generated content genres revealed that normalisation improves sentiment classification performance on tweets and blog posts, showing the modelās ability to generalise to other data genres
Return of Frustratingly Easy Domain Adaptation
Unlike human learning, machine learning often fails to handle changes between
training (source) and test (target) input distributions. Such domain shifts,
common in practical scenarios, severely damage the performance of conventional
machine learning methods. Supervised domain adaptation methods have been
proposed for the case when the target data have labels, including some that
perform very well despite being "frustratingly easy" to implement. However, in
practice, the target domain is often unlabeled, requiring unsupervised
adaptation. We propose a simple, effective, and efficient method for
unsupervised domain adaptation called CORrelation ALignment (CORAL). CORAL
minimizes domain shift by aligning the second-order statistics of source and
target distributions, without requiring any target labels. Even though it is
extraordinarily simple--it can be implemented in four lines of Matlab
code--CORAL performs remarkably well in extensive evaluations on standard
benchmark datasets.Comment: Fixed typos. Full paper to appear in AAAI-16. Extended Abstract of
the full paper to appear in TASK-CV 2015 worksho
A review of domain adaptation without target labels
Domain adaptation has become a prominent problem setting in machine learning
and related fields. This review asks the question: how can a classifier learn
from a source domain and generalize to a target domain? We present a
categorization of approaches, divided into, what we refer to as, sample-based,
feature-based and inference-based methods. Sample-based methods focus on
weighting individual observations during training based on their importance to
the target domain. Feature-based methods revolve around on mapping, projecting
and representing features such that a source classifier performs well on the
target domain and inference-based methods incorporate adaptation into the
parameter estimation procedure, for instance through constraints on the
optimization procedure. Additionally, we review a number of conditions that
allow for formulating bounds on the cross-domain generalization error. Our
categorization highlights recurring ideas and raises questions important to
further research.Comment: 20 pages, 5 figure
- ā¦