CASCADE: Contextual Sarcasm Detection in Online Discussion Forums
The literature in automated sarcasm detection has mainly focused on lexical,
syntactic and semantic-level analysis of text. However, a sarcastic sentence
can be expressed with contextual presumptions, background and commonsense
knowledge. In this paper, we propose CASCADE (a ContextuAl SarCasm DEtector)
that adopts a hybrid approach of both content and context-driven modeling for
sarcasm detection in online social media discussions. For the latter, CASCADE
aims at extracting contextual information from the discourse of a discussion
thread. Also, since the sarcastic nature and form of expression can vary from
person to person, CASCADE utilizes user embeddings that encode stylometric and
personality features of the users. When used along with content-based feature
extractors such as Convolutional Neural Networks (CNNs), we see a significant
boost in the classification performance on a large Reddit corpus.
Comment: Accepted in COLING 201
Beneath the Tip of the Iceberg: Current Challenges and New Directions in Sentiment Analysis Research
Sentiment analysis as a field has come a long way since it was first
introduced as a task nearly 20 years ago. It has widespread commercial
applications in various domains like marketing, risk management, market
research, and politics, to name a few. Given its saturation in specific
subtasks -- such as sentiment polarity classification -- and datasets, there is
an underlying perception that this field has reached its maturity. In this
article, we discuss this perception by pointing out the shortcomings and
under-explored, yet key aspects of this field that are necessary to attain true
sentiment understanding. We analyze the significant leaps responsible for its
current relevance. Further, we attempt to chart a possible course for this
field that covers many overlooked and unanswered questions.
Comment: Published in the IEEE Transactions on Affective Computing (TAFFC
Reasoning with Sarcasm by Reading In-between
Sarcasm is a sophisticated speech act which commonly manifests on social
communities such as Twitter and Reddit. The prevalence of sarcasm on the social
web is highly disruptive to opinion mining systems due to not only its tendency
of polarity flipping but also usage of figurative language. Sarcasm commonly
manifests with a contrastive theme either between positive-negative sentiments
or between literal-figurative scenarios. In this paper, we revisit the notion
of modeling contrast in order to reason with sarcasm. More specifically, we
propose an attention-based neural model that looks in-between instead of
across, enabling it to explicitly model contrast and incongruity. We conduct
extensive experiments on six benchmark datasets from Twitter, Reddit and the
Internet Argument Corpus. Our proposed model not only achieves state-of-the-art
performance on all datasets but also enjoys improved interpretability.
Comment: Accepted to ACL201
Catering to Your Concerns: Automatic Generation of Personalised Security-Centric Descriptions for Android Apps
Android users are increasingly concerned with the privacy of their data and
security of their devices. To improve the security awareness of users, recent
automatic techniques produce security-centric descriptions by performing
program analysis. However, the generated text does not always address users'
concerns, as it is generally too technical to be understood by ordinary
users. Moreover, different users have varied linguistic preferences, which do
not match the text. Motivated by this challenge, we develop an innovative
scheme to help users avoid malware and privacy-breaching apps by generating
security descriptions that explain the privacy and security related aspects of
an Android app in clear and understandable terms. We implement a prototype
system, PERSCRIPTION, to generate personalised security-centric descriptions
that automatically learn users' security concerns and linguistic preferences to
produce user-oriented descriptions. We evaluate our scheme through experiments
and user studies. The results demonstrate that PERSCRIPTION's descriptions
improve readability and users' security awareness compared to existing
description generators.
Emotion Recognition in Conversation: Research Challenges, Datasets, and Recent Advances
Emotion is intrinsic to humans and consequently emotion understanding is a
key part of human-like artificial intelligence (AI). Emotion recognition in
conversation (ERC) is becoming increasingly popular as a new research frontier
in natural language processing (NLP) due to its ability to mine opinions from
the plethora of publicly available conversational data in platforms such as
Facebook, Youtube, Reddit, Twitter, and others. Moreover, it has potential
applications in health-care systems (as a tool for psychological analysis),
education (understanding student frustration) and more. Additionally, ERC is
also extremely important for generating emotion-aware dialogues that require an
understanding of the user's emotions. Catering to these needs calls for
effective and scalable conversational emotion-recognition algorithms. However,
it is a strenuous problem to solve because of several research challenges. In
this paper, we discuss these challenges and shed light on the recent research
in this field. We also describe the drawbacks of these approaches and discuss
the reasons why they fail to successfully overcome the research challenges in
ERC.
A Convolutional Neural Network for Search Term Detection
Pathfinding in hospitals is challenging for patients, visitors, and even
employees. Many people have experienced getting lost due to lack of clear
guidance, large footprint of hospitals, and confusing array of hospital wings.
In this paper, we propose Halo, an indoor navigation application based on
voice-user interaction to help provide directions for users without assistance
of a localization system. The main challenge is accurate detection of origin
and destination search terms. A custom convolutional neural network (CNN) is
proposed to detect origin and destination search terms from transcription of a
submitted speech query. The CNN is trained based on a set of queries tailored
specifically for hospital and clinic environments. Performance of the proposed
model is studied and compared with Levenshtein distance-based word matching.
Comment: This paper is accepted for presentation at 2017 IEEE 28th Annual
International Symposium on Personal, Indoor, and Mobile Radio Communication
Emo2Vec: Learning Generalized Emotion Representation by Multi-task Training
In this paper, we propose Emo2Vec which encodes emotional semantics into
vectors. We train Emo2Vec by multi-task learning on six different emotion-related
tasks, including emotion/sentiment analysis, sarcasm classification, stress
detection, abusive language classification, insult detection, and personality
recognition. Our evaluation of Emo2Vec shows that it outperforms existing
affect-related representations, such as Sentiment-Specific Word Embedding and
DeepMoji embeddings with much smaller training corpora. When concatenated with
GloVe, Emo2Vec achieves performance competitive with state-of-the-art results on
several tasks using a simple logistic regression classifier.
Comment: Accepted by 9th Workshop on Computational Approaches to Subjectivity,
Sentiment & Social Media Analysis (WASSA) in EMNLP 201
Social Media-based Substance Use Prediction
In this paper, we demonstrate how state-of-the-art machine learning and
text mining techniques can be used to build effective social media-based
substance use detection systems. Since a substance use ground truth is
difficult to obtain on a large scale, to maximize system performance, we
explore different feature learning methods to take advantage of a large amount
of unsupervised social media data. We also demonstrate the benefit of using
multi-view unsupervised feature learning to combine heterogeneous user
information such as Facebook "likes" and "status updates" to enhance system
performance. Based on our evaluation, our best models achieved 86% AUC for
predicting tobacco use, 81% for alcohol use and 84% for drug use, all of which
significantly outperformed existing methods. Our investigation has also
uncovered interesting relations between a user's social media behavior (e.g.,
word usage) and substance use.
Deep Inference of Personality Traits by Integrating Image and Word Use in Social Networks
Social media, as a major platform for communication and information exchange,
is a rich repository of the opinions and sentiments of 2.3 billion users about
a vast spectrum of topics. To sense the whys of certain social users' demands
and cultural-driven interests, however, the knowledge embedded in the 1.8
billion pictures which are uploaded daily in public profiles has just started
to be exploited, since this process has typically been text-based.
Following this trend on visual-based social analysis, we present a novel
methodology based on Deep Learning to build a combined image-and-text based
personality trait model, trained with images posted together with words found
highly correlated to specific personality traits. So the key contribution here
is to explore whether OCEAN personality trait modeling can be addressed based
on images, here called MindPics, appearing with certain tags with
psychological insights. We found that there is a correlation between those
posted images and their accompanying texts, which can be successfully modeled
using deep neural networks for personality estimation. The experimental results
are consistent with previous cyber-psychology results based on texts or images.
In addition, classification results on some traits show that some patterns
emerge in the set of images corresponding to a specific text, in essence to
those representing an abstract concept. These results open new avenues of
research for further refining the proposed personality model under the
supervision of psychology experts.
Self-adaptive Privacy Concern Detection for User-generated Content
To protect user privacy in data analysis, a state-of-the-art strategy is
differential privacy in which scientific noise is injected into the real
analysis output. The noise masks individuals' sensitive information contained
in the dataset. However, determining the amount of noise is a key challenge,
since too much noise will destroy data utility while too little noise will
increase privacy risk. Though previous research works have designed some
mechanisms to protect data privacy in different scenarios, most of the existing
studies assume uniform privacy concerns for all individuals. Consequently,
applying an equal amount of noise to all individuals leads to insufficient
privacy protection for some users, while over-protecting others. To address
this issue, we propose a self-adaptive approach for privacy concern detection
based on user personality. Our experimental studies demonstrate that the
approach is effective at providing suitable personalized privacy protection for
cold-start users (i.e., users whose privacy-concern information is absent from
the training data).