A distributional and syntactic approach to fine-grained opinion mining
This thesis contributes to a larger social science research program of
analyzing the diffusion of IT innovations. We show how to
automatically discriminate portions of text dealing with opinions
about innovations by finding {source, target, opinion} triples in text.
In this context, we can discern a list of innovations as targets from
the domain itself. We can then use this list as an anchor for finding
the other two members of the triple at a "fine-grained"
level: paragraph contexts or smaller.
We first demonstrate a vector space model for finding opinionated
contexts in which the innovation targets are mentioned. We can find
paragraph-level contexts by searching for an
"expresses-an-opinion-about" relation between sources and targets
using a supervised model with an SVM that uses features derived from a
general-purpose subjectivity lexicon and a corpus indexing tool. We
show that our algorithm correctly filters the domain-relevant subset
of subjectivity terms so that they are weighted more highly.
We then turn to identifying the opinion. Typically, opinions in
opinion mining are taken to be positive or negative. We discuss a
crowdsourcing technique developed to create the seed data describing
human perception of opinion-bearing language needed for our supervised
learning algorithm. Our user interface successfully limited the
meta-subjectivity inherent in the task ("What is an opinion?") while
reliably retrieving relevant opinionated words using workers who are not
experts in the domain.
Finally, we developed a new data structure and modeling technique for
connecting targets with the correct within-sentence opinionated
language. Syntactic relatedness tries (SRTs) contain all paths from a
dependency graph of a sentence that connect a target expression to a
candidate opinionated word. We use factor graphs to model how far a
path through the SRT must be followed in order to connect the right
targets to the right words. It turns out that we can correctly label
significant portions of these tries with very rudimentary features
such as part-of-speech tags and dependency labels with minimal
processing. This technique is trained on the data gathered with the
crowdsourcing technique we developed.
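The trie construction described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names (`find_paths`, `build_srt`), the nested-dict node layout, the `>`/`<` direction markers on dependency labels, and the example sentence are all assumptions made for illustration. Tokens are identified by their surface form, which only works when no word repeats in the sentence; a real system would use token indices. The factor-graph labeling step is not shown.

```python
from collections import defaultdict


def find_paths(adj, target, candidates):
    """Return every simple path from `target` to a word in `candidates`.

    Each path is a list of (step, word) pairs, where `step` is a
    dependency label prefixed with its traversal direction.
    """
    paths = []
    stack = [(target, [], {target})]
    while stack:
        node, path, seen = stack.pop()
        if path and node in candidates:
            paths.append(path)
        for step, nxt in adj[node]:
            if nxt not in seen:
                stack.append((nxt, path + [(step, nxt)], seen | {nxt}))
    return paths


def build_srt(edges, target, candidates):
    """Build a trie, rooted at `target`, of paths to candidate words.

    edges: (head, label, dependent) triples for a single sentence.
    """
    # Paths may run up or down the dependency graph, so index both ways:
    # ">" means head-to-dependent, "<" means dependent-to-head.
    adj = defaultdict(list)
    for head, label, dep in edges:
        adj[head].append((">" + label, dep))
        adj[dep].append(("<" + label, head))

    root = {"word": target, "is_candidate": False, "children": {}}
    for path in find_paths(adj, target, set(candidates)):
        node = root
        for step, word in path:
            # Paths that share a prefix share the corresponding trie nodes.
            node = node["children"].setdefault(
                (step, word),
                {"word": word, "is_candidate": False, "children": {}},
            )
        node["is_candidate"] = True
    return root


# Hypothetical parse of "Analysts widely praised the innovative platform".
EDGES = [
    ("praised", "nsubj", "analysts"),
    ("praised", "advmod", "widely"),
    ("praised", "dobj", "platform"),
    ("platform", "amod", "innovative"),
]
srt = build_srt(EDGES, "platform", {"praised", "widely", "innovative"})
```

Here the paths to "praised" and "widely" share the node for "praised", so a model that decides how far to follow a branch can reason about both candidates along one shared path.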
We conclude by placing our work in the context of a larger sentiment
classification pipeline and by describing a model for learning from
the data structures produced by our work. This work contributes to
computational linguistics by proposing and verifying new data
gathering techniques and applying recent developments in machine
learning to inference over grammatical structures for highly
subjective purposes. It applies a suffix tree-based data structure to
model opinion in a specific domain by imposing a restriction on the
order in which the data is stored in the structure.
Method for Aspect-Based Sentiment Annotation Using Rhetorical Analysis
This paper fills a gap in aspect-based sentiment analysis and aims to present
a new method for preparing and analysing texts concerning opinion and
generating user-friendly descriptive reports in natural language. We present a
comprehensive set of techniques derived from Rhetorical Structure Theory and
sentiment analysis to extract aspects from textual opinions and then build an
abstractive summary of a set of opinions. Moreover, we propose aspect-aspect
graphs to evaluate the importance of aspects and to filter out unimportant ones
from the summary. Additionally, the paper presents a prototype data-flow
solution with interesting and valuable results. The proposed method achieved
high accuracy in aspect detection when applied to the gold standard
dataset.
The good, the bad and the implicit: a comprehensive approach to annotating explicit and implicit sentiment
We present a fine-grained scheme for the annotation of polar sentiment in text that accounts for explicit sentiment (so-called private states) as well as implicit expressions of sentiment (polar facts). Polar expressions are annotated below sentence level and classified according to their subjectivity status. Additionally, they are linked to one or more targets with a specific polar orientation and intensity. Other components of the annotation scheme include source attribution and the identification and classification of expressions that modify polarity. In previous research, little attention has been given to implicit sentiment, which represents a substantial share of the polar expressions encountered in our data. English and Dutch corpora of financial newswire, each consisting of over 45,000 words, were annotated using our scheme. A subset of these corpora was used to conduct an inter-annotator agreement study, which demonstrated that the proposed scheme can be used to reliably annotate explicit and implicit sentiment in real-world textual data, making the created corpora a useful resource for sentiment analysis.
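The abstract does not say which agreement statistic the study used. As a minimal sketch of how agreement between two annotators is commonly quantified, the following computes Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The choice of kappa, the function name `cohen_kappa`, and the label values are assumptions for illustration, not details from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)

    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability that both pick the same label if each
    # annotator labels independently according to their own frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical subjectivity labels for four expressions.
annotator_1 = ["explicit", "implicit", "none", "explicit"]
annotator_2 = ["explicit", "implicit", "none", "implicit"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

In this toy case the annotators agree on three of four items, but kappa is lower than 0.75 because part of that agreement would be expected by chance.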