Don't Let Me Be Misunderstood: Comparing Intentions and Perceptions in Online Discussions
Discourse involves two perspectives: a person's intention in making an
utterance and others' perception of that utterance. The misalignment between
these perspectives can lead to undesirable outcomes, such as misunderstandings,
low productivity and even overt strife. In this work, we present a
computational framework for exploring and comparing both perspectives in online
public discussions.
We combine logged data about public comments on Facebook with a survey of
over 16,000 people about their intentions in writing these comments or about
their perceptions of comments that others had written. Unlike previous studies
of online discussions that have largely relied on third-party labels to
quantify properties such as sentiment and subjectivity, our approach also
directly captures what the speakers actually intended when writing their
comments. In particular, our analysis focuses on judgments of whether a comment
is stating a fact or an opinion, since these concepts were shown to be often
confused.
We show that intentions and perceptions diverge in consequential ways. People
are more likely to perceive opinions than to intend them, and linguistic cues
that signal how an utterance is intended can differ from those that signal how
it will be perceived. Further, this misalignment between intentions and
perceptions can be linked to the future health of a conversation: when a
comment whose author intended to share a fact is misperceived as sharing an
opinion, the subsequent conversation is more likely to derail into uncivil
behavior than when the comment is perceived as intended. Altogether, these
findings may inform the design of discussion platforms that better promote
positive interactions.Comment: Proceedings of The Web Conference (WWW) 202
Exploiting Structure For Sentiment Classification
This thesis studies the problem of sentiment classification at both the document and sentence level using statistical learning methods. In particular, we develop computational models that capture useful structure-based intuitions for solving each task, treating the intuitions as latent representations to be discovered and exploited during learning.

For document-level sentiment classification, we exploit structure in the form of informative sentences: those that exhibit the same sentiment as the document and thus explain or support its sentiment label. We first show that incorporating automatically discovered informative sentences as additional constraints for the learner improves performance on the document-level sentiment classification task. Next, we explore joint structured models for this task: our final proposed model does not need sentence-level sentiment labels and directly optimizes document classification accuracy using inferred sentence-level information. Our empirical evaluation on two publicly available datasets shows improved performance over strong baselines.

For phrase-level sentiment classification, we exploit the compositional linguistic structure of phrases, investigating compositional matrix-space models that learn matrix-space word representations and model composition as matrix multiplication. Using a publicly available dataset, we show that the matrix-space model outperforms the standard bag-of-words model for the phrase-level sentiment classification task.
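The matrix-space composition idea can be sketched in a few lines: each word is represented as a d x d matrix, and a phrase's representation is the ordered product of its word matrices. The matrices below are hand-picked toys for illustration, not learned representations as in the thesis.

```python
# Toy compositional matrix-space model: composition is matrix multiplication.
import numpy as np

d = 2
words = {
    "not":  np.array([[-1.0, 0.0], [0.0, 1.0]]),  # a negator flips the sign
    "good": np.array([[1.0, 0.5], [0.0, 1.0]]),
}

def compose(phrase):
    m = np.eye(d)
    for w in phrase.split():
        m = m @ words[w]          # composition as matrix multiplication
    return m

def score(phrase):
    # Read a scalar sentiment score from a fixed entry of the composed matrix.
    return compose(phrase)[0, 1]

print(score("good"))       # 0.5
print(score("not good"))   # -0.5: negation reverses the score
```

Because matrix multiplication is order-sensitive and non-commutative, this representation can capture effects like negation that a bag-of-words model cannot.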
Multi-level Structured Models for Document-level Sentiment Classification
In this paper, we investigate structured models for document-level sentiment classification. When predicting the sentiment of a subjective document (e.g., as positive or negative), it is well known that not all sentences are equally discriminative or informative. But identifying the useful sentences automatically is itself a difficult learning problem. This paper proposes a joint two-level approach for document-level sentiment classification that simultaneously extracts useful (i.e., subjective) sentences and predicts document-level sentiment based on the extracted sentences. Unlike previous joint learning methods for the task, our approach (1) does not rely on gold standard sentence-level subjectivity annotations (which may be expensive to obtain), and (2) optimizes directly for document-level performance. Empirical evaluations on movie reviews and U.S. Congressional floor debates show improved performance over previous approaches.
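The two-level intuition can be sketched as a much-simplified pipeline stand-in: score each sentence, keep only the ones judged informative, and classify the document from those alone. This is not the paper's joint model (which learns both levels simultaneously); the lexicon and scores below are invented.

```python
# Simplified two-level sentiment pipeline: select informative sentences,
# then classify the document from the selected sentences only.
POLARITY = {"great": 1.0, "awful": -1.0, "boring": -0.5, "fun": 0.5}

def sentence_score(sentence):
    return sum(POLARITY.get(w, 0.0) for w in sentence.lower().split())

def classify_document(sentences):
    # Level 1: treat sentences containing any polar words as informative.
    informative = [s for s in sentences if sentence_score(s) != 0.0]
    # Level 2: document sentiment from informative sentences only.
    total = sum(sentence_score(s) for s in informative)
    return "positive" if total >= 0 else "negative"

doc = ["The plot was awful and boring",
       "The cast is large",           # uninformative: no polar words
       "Still fun at times"]
print(classify_document(doc))  # negative
```

The paper's contribution is to make the sentence-selection step latent and train it jointly with the document classifier, rather than fixing it with a lexicon as done here.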
An empirical evaluation of supervised learning in high dimensions
In this paper we perform an empirical evaluation of supervised learning on high-dimensional data. We evaluate performance on three metrics (accuracy, AUC, and squared loss) and study the effect of increasing dimensionality on the performance of the learning algorithms. Our findings are consistent with previous studies for problems of relatively low dimension, but suggest that as dimensionality increases, the relative performance of the learning algorithms changes. To our surprise, the method that performs consistently well across all dimensions is random forests, followed by neural nets, boosted trees, and SVMs.
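The experimental setup, several learners trained while dimensionality increases and accuracies compared, can be sketched with scikit-learn on synthetic data. The dataset sizes, learners, and hyperparameters below are arbitrary choices for illustration and are not the paper's benchmark.

```python
# Sketch: compare learners' accuracy as the number of features grows.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

results = {}
for n_features in (10, 100, 1000):
    X, y = make_classification(n_samples=400, n_features=n_features,
                               n_informative=5, n_redundant=0,
                               random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    for name, model in [("rf", RandomForestClassifier(random_state=0)),
                        ("svm", SVC())]:
        # Held-out accuracy; a fuller study would also report AUC and
        # squared loss, as the paper does.
        results[(name, n_features)] = model.fit(X_tr, y_tr).score(X_te, y_te)

for key, acc in sorted(results.items()):
    print(key, round(acc, 3))
```

A faithful replication would add boosted trees and neural nets, average over many datasets, and report all three metrics rather than accuracy alone.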
Computational approaches to sentence completion
This paper studies the problem of sentence-level semantic coherence by answering SAT-style sentence completion questions. These questions test the ability of algorithms to distinguish sense from nonsense based on a variety of sentence-level phenomena. We tackle the problem with two approaches: methods that use local lexical information, such as the n-grams of a classical language model, and methods that evaluate global coherence, such as latent semantic analysis. We evaluate these methods on a suite of practice SAT questions and on a recently released sentence completion task based on data taken from five Conan Doyle novels. We find that by fusing local and global information, we can exceed 50% accuracy on this task (chance baseline is 20%), and we suggest some avenues for further research.
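The local-plus-global fusion can be sketched as a weighted sum of two scores used to rank completion candidates. Both scorers below are crude stand-ins invented for illustration; a real system would use a trained n-gram language model and LSA-based coherence.

```python
# Toy fusion of a local (bigram) score and a global coherence score
# for ranking sentence-completion candidates.
import math

BIGRAMS = {("the", "detective"): 0.3, ("the", "banana"): 0.01}
TOPIC_WORDS = {"detective", "clue", "crime"}

def local_score(sentence):
    # Log-probability under a tiny bigram table, with backoff to 1e-4.
    words = sentence.lower().split()
    return sum(math.log(BIGRAMS.get((a, b), 1e-4))
               for a, b in zip(words, words[1:]))

def global_score(sentence):
    # Crude stand-in for LSA coherence: topical-word overlap.
    return sum(w in TOPIC_WORDS for w in sentence.lower().split())

def fused(sentence, lam=1.0):
    return local_score(sentence) + lam * global_score(sentence)

cands = ["the detective found a clue", "the banana found a clue"]
best = max(cands, key=fused)
print(best)
```

The interpolation weight (here `lam`) controls the trade-off between local fluency and global topical fit; tuning it is what "fusing local and global information" amounts to in this sketch.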