3,208 research outputs found
MoralStrength: Exploiting a Moral Lexicon and Embedding Similarity for Moral Foundations Prediction
Moral rhetoric plays a fundamental role in how we perceive and interpret the
information we receive, greatly influencing our decision-making process.
Especially when it comes to controversial social and political issues, our
opinions and attitudes are hardly ever based on evidence alone. The Moral
Foundations Dictionary (MFD) was developed to operationalize moral values in
the text. In this study, we present MoralStrength, a lexicon of approximately
1,000 lemmas, obtained as an extension of the Moral Foundations Dictionary,
based on WordNet synsets. Moreover, for each lemma it provides with a
crowdsourced numeric assessment of Moral Valence, indicating the strength with
which a lemma is expressing the specific value. We evaluated the predictive
potentials of this moral lexicon, defining three utilization approaches of
increased complexity, ranging from lemmas' statistical properties to a deep
learning approach of word embeddings based on semantic similarity. Logistic
regression models trained on the features extracted from MoralStrength,
significantly outperformed the current state-of-the-art, reaching an F1-score
of 87.6% over the previous 62.4% (p-value<0.01), and an average F1-Score of
86.25% over six different datasets. Such findings pave the way for further
research, allowing for an in-depth understanding of moral narratives in text
for a wide range of social issues
Computational Sociolinguistics: A Survey
Language is a social phenomenon and variation is inherent to its social
nature. Recently, there has been a surge of interest within the computational
linguistics (CL) community in the social dimension of language. In this article
we present a survey of the emerging field of "Computational Sociolinguistics"
that reflects this increased interest. We aim to provide a comprehensive
overview of CL research on sociolinguistic themes, featuring topics such as the
relation between language and social identity, language use in social
interaction and multilingual communication. Moreover, we demonstrate the
potential for synergy between the research communities involved, by showing how
the large-scale data-driven methods that are widely used in CL can complement
existing sociolinguistic studies, and how sociolinguistics can inform and
challenge the methods and assumptions employed in CL studies. We hope to convey
the possible benefits of a closer collaboration between the two communities and
conclude with a discussion of open challenges.Comment: To appear in Computational Linguistics. Accepted for publication:
18th February, 201
When to Say What and How: Adapting the Elaborateness and Indirectness of Spoken Dialogue Systems
With the aim of designing a spoken dialogue system which has the ability to adapt to the user's communication idiosyncrasies, we investigate whether it is possible to carry over insights from the usage of communication styles in human-human interaction to human-computer interaction. In an extensive literature review, it is demonstrated that communication styles play an important role in human communication. Using a multi-lingual data set, we show that there is a significant correlation between the communication style of the system and the preceding communication style of the user. This is why two components that extend the standard architecture of spoken dialogue systems are presented: 1) a communication style classifier that automatically identifies the user communication style and 2) a communication style selection module that selects an appropriate system communication style. We consider the communication styles elaborateness and indirectness as it has been shown that they influence the user's satisfaction and the user's perception of a dialogue. We present a neural classification approach based on supervised learning for each task. Neural networks are trained and evaluated with features that can be automatically derived during an ongoing interaction in every spoken dialogue system. It is shown that both components yield solid results and outperform the baseline in form of a majority-class classifier
Detection of Sarcasm and Nastiness: New Resources for Spanish Language
The main goal of this work is to provide the cognitive computing community with valuable resources to analyze and simulate the intentionality and/or emotions embedded in the language employed in social media. Specifically, it is focused on the Spanish language and online dialogues, leading to the creation of SOFOCO (Spanish Online Forums Corpus). It is the first Spanish corpus consisting of dialogic debates extracted from social media and it is annotated by means of crowdsourcing in order to carry out automatic analysis of subjective language forms, like sarcasm or nastiness. Furthermore, the annotators were also asked about the context need when taking a decision. In this way, the usersâ intentions and their behavior inside social networks can be better understood and more accurate text analysis is possible. An analysis of the annotation results is carried out and the reliability of the annotations is also explored. Additionally, sarcasm and nastiness detection results (around 0.76 F-Measure in both cases) are also reported. The obtained results show the presented corpus as a valuable resource that might be used in very diverse future work.This study was partially funded by the Spanish Government (TIN2014-54288-C4-4-R and TIN2017-85854-C4-3-R) by the European Unionsâs H2020 program under grant 769872 and by the National Science Foundation of USA (NSF CISE R1 #1202668
Argumentation Mining in User-Generated Web Discourse
The goal of argumentation mining, an evolving research field in computational
linguistics, is to design methods capable of analyzing people's argumentation.
In this article, we go beyond the state of the art in several ways. (i) We deal
with actual Web data and take up the challenges given by the variety of
registers, multiple domains, and unrestricted noisy user-generated Web
discourse. (ii) We bridge the gap between normative argumentation theories and
argumentation phenomena encountered in actual data by adapting an argumentation
model tested in an extensive annotation study. (iii) We create a new gold
standard corpus (90k tokens in 340 documents) and experiment with several
machine learning methods to identify argument components. We offer the data,
source codes, and annotation guidelines to the community under free licenses.
Our findings show that argumentation mining in user-generated Web discourse is
a feasible but challenging task.Comment: Cite as: Habernal, I. & Gurevych, I. (2017). Argumentation Mining in
User-Generated Web Discourse. Computational Linguistics 43(1), pp. 125-17
A system for automatic English text expansion
We present an automatic text expansion system to generate English sentences, which performs automatic Natural Language Generation (NLG) by combining linguistic rules with statistical approaches. Here, âautomaticâ means that the system can generate coherent and correct sentences from a minimum set of words. From its inception, the design is modular and adaptable to other languages. This adaptability is one of its greatest advantages. For English, we have created the highly precise aLexiE lexicon with wide coverage, which represents a contribution on its own. We have evaluated the resulting NLG library in an Augmentative and Alternative Communication (AAC) proof of concept, both directly (by regenerating corpus sentences) and manually (from annotations) using a popular corpus in the NLG field. We performed a second analysis by comparing the quality of text expansion in English to Spanish, using an ad-hoc Spanish-English parallel corpus. The system might also be applied to other domains such as report and news generation.Ministerio de EconomĂa, Industria y Competitividad | Ref. TEC2016-76465-C2-2-RXunta de Galicia | Ref. GRC-2018/53Xunta de Galicia | Ref. ED341D R2016/012University of Aberdee
A System for Automatic English Text Expansion
This work was supported in part by the Mineco, Spain, under Grant TEC2016-76465-C2-2-R, in part by the Xunta de Galicia, Spain, under Grant GRC-2018/53 and Grant ED341D R2016/012, and in part by the University of Vigo Travel Grant to visit the CLAN Research Group, University of Aberdeen, U.K.Peer reviewedPublisher PD
- âŠ