9 research outputs found
SEntiMoji: An Emoji-Powered Learning Approach for Sentiment Analysis in Software Engineering
Sentiment analysis has various application scenarios in software engineering
(SE), such as detecting developers' emotions in commit messages and identifying
their opinions on Q&A forums. However, commonly used out-of-the-box sentiment
analysis tools cannot obtain reliable results on SE tasks and the
misunderstanding of technical jargon is demonstrated to be the main reason.
Then, researchers have to utilize labeled SE-related texts to customize
sentiment analysis for SE tasks via a variety of algorithms. However, the
scarce labeled data can cover only very limited expressions and thus cannot
guarantee the analysis quality. To address such a problem, we turn to the
easily available emoji usage data for help. More specifically, we employ
emotional emojis as noisy labels of sentiments and propose a representation
learning approach that uses both Tweets and GitHub posts containing emojis to
learn sentiment-aware representations for SE-related texts. These emoji-labeled
posts can not only supply the technical jargon, but also incorporate more
general sentiment patterns shared across domains. They as well as labeled data
are used to learn the final sentiment classifier. Compared to the existing
sentiment analysis methods used in SE, the proposed approach can achieve
significant improvement on representative benchmark datasets. By further
contrast experiments, we find that the Tweets make a key contribution to the
power of our approach. This finding informs future research not to unilaterally
pursue the domain-specific resource, but try to transform knowledge from the
open domain through ubiquitous signals such as emojis.Comment: Accepted by the 2019 ACM Joint European Software Engineering
Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE
2019). Please include ESEC/FSE in any citation
Curriculum CycleGAN for Textual Sentiment Domain Adaptation with Multiple Sources
Sentiment analysis of user-generated reviews or comments on products and
services in social networks can help enterprises to analyze the feedback from
customers and take corresponding actions for improvement. To mitigate
large-scale annotations on the target domain, domain adaptation (DA) provides
an alternate solution by learning a transferable model from other labeled
source domains. Existing multi-source domain adaptation (MDA) methods either
fail to extract some discriminative features in the target domain that are
related to sentiment, neglect the correlations of different sources and the
distribution difference among different sub-domains even in the same source, or
cannot reflect the varying optimal weighting during different training stages.
In this paper, we propose a novel instance-level MDA framework, named
curriculum cycle-consistent generative adversarial network (C-CycleGAN), to
address the above issues. Specifically, C-CycleGAN consists of three
components: (1) pre-trained text encoder which encodes textual input from
different domains into a continuous representation space, (2) intermediate
domain generator with curriculum instance-level adaptation which bridges the
gap across source and target domains, and (3) task classifier trained on the
intermediate domain for final sentiment classification. C-CycleGAN transfers
source samples at instance-level to an intermediate domain that is closer to
the target domain with sentiment semantics preserved and without losing
discriminative features. Further, our dynamic instance-level weighting
mechanisms can assign the optimal weights to different source samples in each
training stage. We conduct extensive experiments on three benchmark datasets
and achieve substantial gains over state-of-the-art DA approaches. Our source
code is released at: https://github.com/WArushrush/Curriculum-CycleGAN.Comment: Accepted by WWW 202
Inferring attributes with picture metadata embeddings
International audienceUsers in online social networks are vulnerable to attribute inference attacks due to some published data. Thus, the picture owner's gender has a strong influence on individuals' emotional reactions to the photo. In this work, we present a graph-embedding approach for gender inference attacks based on pictures meta-data such as (i) alt-texts generated by Facebook to describe the content of images, and (ii) Emojis/Emoticons posted by friends, friends of friends or regular users as a reaction to the picture. Specifically, we apply a semi-supervised technique, node2vec, for learning a mapping of pictures meta-data to a low-dimensional vector space. Next, we study in this vector space the gender closeness of users who published similar photos and/or received similar reactions. We leverage this image sharing and reaction mode of Facebook users to derive an efficient and accurate technique for user gender inference. Experimental results show that privacy attack often succeeds even when other information than pictures published by their owners is either hidden or unavailable
Sentiment Analysis of Textual Content in Social Networks. From Hand-Crafted to Deep Learning-Based Models
Aquesta tesi proposa diversos mètodes avançats per analitzar automàticament el contingut textual compartit a les xarxes socials i identificar les opinions, emocions i sentiments a diferents nivells d’anàlisi i en diferents idiomes.
Comencem proposant un sistema d’anàlisi de sentiments, anomenat SentiRich, basat en un conjunt ric d’atributs, inclosa la informació extreta de lèxics de sentiments i models de word embedding pre-entrenats. A continuació, proposem un sistema basat en Xarxes Neurals Convolucionals i regressors XGboost per resoldre una sèrie de tasques d’anàlisi de sentiments i emocions a Twitter. Aquestes tasques van des de les tasques típiques d’anàlisi de sentiments fins a determinar automàticament la intensitat d’una emoció (com ara alegria, por, ira, etc.) i la intensitat del sentiment dels autors a partir dels seus tweets. També proposem un nou sistema basat en Deep Learning per solucionar el problema de classificació de les emocions múltiples a Twitter. A més, es va considerar el problema de l’anàlisi del sentiment depenent de l’objectiu. Per a aquest propòsit, proposem un sistema basat en Deep Learning que identifica i extreu l'objectiu dels tweets. Tot i que alguns idiomes, com l’anglès, disposen d’una àmplia gamma de recursos per permetre l’anàlisi del sentiment, a la majoria de llenguatges els hi manca. Per tant, utilitzem la tècnica d'anàlisi de sentiments entre idiomes per desenvolupar un sistema nou, multilingüe i basat en Deep Learning per a llenguatges amb pocs recursos lingüístics. Proposem combinar l’ajuda a la presa de decisions multi-criteri i anàlisis de sentiments per desenvolupar un sistema que permeti als usuaris la possibilitat d’explotar tant les opinions com les seves preferències en el procés de classificació d’alternatives. Finalment, vam aplicar els sistemes desenvolupats al camp de la comunicació de les marques de destinació a través de les xarxes socials. Amb aquesta finalitat, hem recollit tweets de persones locals, visitants i els gabinets oficials de Turisme de diferents destinacions turístiques i es van analitzar les opinions i les emocions compartides en ells. En general, els mètodes proposats en aquesta tesi milloren el rendiment dels enfocaments d’última generació i mostren troballes apassionants.Esta tesis propone varios métodos avanzados para analizar automáticamente el contenido textual compartido en las redes sociales e identificar opiniones, emociones y sentimientos, en diferentes niveles de análisis y en diferentes idiomas. Comenzamos proponiendo un sistema de análisis de sentimientos, llamado SentiRich, que está basado en un conjunto rico de características, que incluyen la información extraída de léxicos de sentimientos y modelos de word embedding previamente entrenados. Luego, proponemos un sistema basado en redes neuronales convolucionales y regresores XGboost para resolver una variedad de tareas de análisis de sentimientos y emociones en Twitter. Estas tareas van desde las típicas tareas de análisis de sentimientos hasta la determinación automática de la intensidad de una emoción (como alegría, miedo, ira, etc.) y la intensidad del sentimiento de los autores de los tweets. También proponemos un novedoso sistema basado en Deep Learning para abordar el problema de clasificación de emociones múltiples en Twitter. Además, consideramos el problema del análisis de sentimientos dependiente del objetivo. Para este propósito, proponemos un sistema basado en Deep Learning que identifica y extrae el objetivo de los tweets.
Si bien algunos idiomas, como el inglés, tienen una amplia gama de recursos para permitir el análisis de sentimientos, la mayoría de los idiomas carecen de ellos. Por lo tanto, utilizamos la técnica de Análisis de Sentimiento Inter-lingual para desarrollar un sistema novedoso, multilingüe y basado en Deep Learning para los lenguajes con pocos recursos lingüísticos. Proponemos combinar la Ayuda a la Toma de Decisiones Multi-criterio y el análisis de sentimientos para desarrollar un sistema que brinde a los usuarios la capacidad de explotar las opiniones junto con sus preferencias en el proceso de clasificación de alternativas. Finalmente, aplicamos los sistemas desarrollados al campo de la comunicación de las marcas de destino a través de las redes sociales. Con este fin, recopilamos tweets de personas locales, visitantes, y gabinetes oficiales de Turismo de diferentes destinos turísticos y analizamos las opiniones y las emociones compartidas en ellos. En general, los métodos propuestos en esta tesis mejoran el rendimiento de los enfoques de vanguardia y muestran hallazgos interesa.This thesis proposes several advanced methods to automatically analyse textual content shared on social networks and identify people’ opinions, emotions and feelings at a different level of analysis and in different languages.
We start by proposing a sentiment analysis system, called SentiRich, based on a set of rich features, including the information extracted from sentiment lexicons and pre-trained word embedding models. Then, we propose an ensemble system based on Convolutional Neural Networks and XGboost regressors to solve an array of sentiment and emotion analysis tasks on Twitter. These tasks range from the typical sentiment analysis tasks, to automatically determining the intensity of an emotion (such as joy, fear, anger, etc.) and the intensity of sentiment (aka valence) of the authors from their tweets. We also propose a novel Deep Learning-based system to address the multiple emotion classification problem on Twitter. Moreover, we considered the problem of target-dependent sentiment analysis. For this purpose, we propose a Deep Learning-based system that identifies and extracts the target of the tweets.
While some languages, such as English, have a vast array of resources to enable sentiment analysis, most low-resource languages lack them. So, we utilise the Cross-lingual Sentiment Analysis technique to develop a novel, multi-lingual and Deep Learning-based system for low resource languages. We propose to combine Multi-Criteria Decision Aid and sentiment analysis to develop a system that gives users the ability to exploit reviews alongside their preferences in the process of alternatives ranking. Finally, we applied the developed systems to the field of communication of destination brands through social networks. To this end, we collected tweets of local people, visitors, and official brand destination offices from different tourist destinations and analysed the opinions and the emotions shared in these tweets
An Empirical Study of Offensive Language in Online Interactions
In the past decade, usage of social media platforms has increased significantly. People use these platforms to connect with friends and family, share information, news and opinions. Platforms such as Facebook, Twitter are often used to propagate offensive and hateful content online. The open nature and anonymity of the internet fuels aggressive and inflamed conversations. The companies and federal institutions are striving to make social media cleaner, welcoming and unbiased. In this study, we first explore the underlying topics in popular offensive language datasets using statistical and neural topic modeling. The current state-of-the-art models for aggression detection only present a toxicity score based on the entire post. Content moderators often have to deal with lengthy texts without any word-level indicators. We propose a neural transformer approach for detecting the tokens that make a particular post aggressive. The pre-trained BERT model has achieved state-of-the-art results in various natural language processing tasks. However, the model is trained on general-purpose corpora and lacks aggressive social media linguistic features. We propose fBERT, a retrained BERT model with over million offensive tweets from the SOLID dataset. We demonstrate the effectiveness and portability of fBERT over BERT in various shared offensive language detection tasks. We further propose a new multi-task aggression detection (MAD) framework for post and token-level aggression detection using neural transformers. The experiments confirm the effectiveness of the multi-task learning model over individual models; particularly when the number of training data is limited
Natural Language Processing: Emerging Neural Approaches and Applications
This Special Issue highlights the most recent research being carried out in the NLP field to discuss relative open issues, with a particular focus on both emerging approaches for language learning, understanding, production, and grounding interactively or autonomously from data in cognitive and neural systems, as well as on their potential or real applications in different domains