Latent sentiment model for weakly-supervised cross-lingual sentiment classification
In this paper, we present a novel weakly-supervised method for cross-lingual sentiment analysis. Specifically, we propose a latent sentiment model (LSM) based on latent Dirichlet allocation, where sentiment labels are treated as topics. Prior information extracted from English sentiment lexicons through machine translation is incorporated into LSM learning, where preferences on the expected sentiment labels of those lexicon words are expressed using generalized expectation criteria. An efficient parameter estimation procedure using variational Bayes is presented. Experimental results on Chinese product reviews show that the weakly-supervised LSM performs comparably to supervised classifiers such as Support Vector Machines, achieving an average accuracy of 81% over a total of 5484 review documents. Moreover, starting from a generic sentiment lexicon, the LSM is able to extract highly domain-specific polarity words from text.
Language Without Words: A Pointillist Model for Natural Language Processing
This paper explores two separate questions: Can we perform natural language
processing tasks without a lexicon? And should we? Existing natural language
processing techniques are either based on words as units or use units such as
grams only for basic classification tasks. How close can a machine come to
reasoning about the meanings of words and phrases in a corpus without using any
lexicon, based only on grams?
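
To make "based only on grams" concrete, the following is a minimal sketch of counting character n-grams across a set of posts with no word list or segmenter involved. It is an illustration only, not the authors' method: the function names `char_ngrams` and `gram_counts` and the sample posts are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return every contiguous run of n characters in text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def gram_counts(posts, n=2):
    """Count character n-grams over all posts without consulting a lexicon."""
    counts = Counter()
    for post in posts:
        counts.update(char_ngrams(post, n))
    return counts


# Hypothetical sample posts, invented for illustration only.
posts = ["神马都是浮云", "神马情况", "浮云啊浮云"]
counts = gram_counts(posts, n=2)

# Grams that recur across posts rise to the top of the ranking.
print(counts.most_common(3))
```

Recurring grams surface from the counts alone, which is the sense in which a gram-based approach needs no lexicon. Deciding which frequent grams are meaningful units is the harder question the paper raises.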
Our own motivation for posing this question is based on our efforts to find
popular trends in words and phrases from online Chinese social media. This form
of written Chinese uses so many neologisms, creative character placements, and
combinations of writing systems that it has been dubbed the "Martian Language."
Readers must often use visual cues, audible cues from reading out loud, and
their knowledge and understanding of current events to understand a post. For
analysis of popular trends, the specific problem is that it is difficult to
build a lexicon when the invention of new ways to refer to a word or concept is
easy and common. For natural language processing in general, we argue in this
paper that new uses of language in social media will challenge machines'
abilities to operate with words as the basic unit of understanding, not only in
Chinese but potentially in other languages.
Comment: 5 pages, 2 figures