Computational Sociolinguistics: A Survey
Language is a social phenomenon and variation is inherent to its social
nature. Recently, there has been a surge of interest within the computational
linguistics (CL) community in the social dimension of language. In this article
we present a survey of the emerging field of "Computational Sociolinguistics"
that reflects this increased interest. We aim to provide a comprehensive
overview of CL research on sociolinguistic themes, featuring topics such as the
relation between language and social identity, language use in social
interaction and multilingual communication. Moreover, we demonstrate the
potential for synergy between the research communities involved, by showing how
the large-scale data-driven methods that are widely used in CL can complement
existing sociolinguistic studies, and how sociolinguistics can inform and
challenge the methods and assumptions employed in CL studies. We hope to convey
the possible benefits of a closer collaboration between the two communities and
conclude with a discussion of open challenges.
Comment: To appear in Computational Linguistics. Accepted for publication:
18th February, 201
Analysing Timelines of National Histories across Wikipedia Editions: A Comparative Computational Approach
Portrayals of history are never complete, and each description inherently
exhibits a specific viewpoint and emphasis. In this paper, we aim to
automatically identify such differences by computing timelines and detecting
temporal focal points of written history across languages on Wikipedia. In
particular, we study articles related to the history of all UN member states
and compare them in 30 language editions. We develop a computational approach
that identifies focal points quantitatively, and find that Wikipedia
narratives about national histories (i) are skewed towards more recent events
(recency bias) and (ii) are distributed unevenly across the continents with
significant focus on the history of European countries (Eurocentric bias). We
also establish that national historical timelines vary across language
editions, although average interlingual consensus is rather high. We hope that
this paper provides a starting point for a broader computational analysis of
written history on Wikipedia and elsewhere.
Visual Affect Around the World: A Large-scale Multilingual Visual Sentiment Ontology
Every culture and language is unique. Our work expressly focuses on the
uniqueness of culture and language in relation to human affect, specifically
sentiment and emotion semantics, and how they manifest in social multimedia. We
develop sets of sentiment- and emotion-polarized visual concepts by adapting
semantic structures called adjective-noun pairs, originally introduced by Borth
et al. (2013), but in a multilingual context. We propose a new
language-dependent method for automatic discovery of these adjective-noun
constructs. We show how this pipeline can be applied to a social multimedia
platform for the creation of a large-scale multilingual visual sentiment
concept ontology (MVSO). Unlike the flat structure in Borth et al. (2013), our
unified ontology is organized hierarchically by multilingual clusters of
visually detectable nouns and subclusters of emotionally biased versions of
these nouns. In addition, we present an image-based prediction task to show how
generalizable language-specific models are in a multilingual context. A new,
publicly available dataset of >15.6K sentiment-biased visual concepts across 12
languages with language-specific detector banks, >7.36M images and their
metadata is also released.
Comment: 11 pages, to appear at ACM MM'1
Cross-lingual Offensive Language Detection: A Systematic Review of Datasets, Transfer Approaches and Challenges
The growing prevalence and rapid evolution of offensive language in social
media amplify the complexities of detection, particularly highlighting the
challenges in identifying such content across diverse languages. This survey
presents a systematic and comprehensive exploration of Cross-Lingual Transfer
Learning (CLTL) techniques in offensive language detection in social media. Our
study stands as the first holistic overview to focus exclusively on the
cross-lingual scenario in this domain. We analyse 67 relevant papers and
categorise these studies across various dimensions, including the
characteristics of multilingual datasets used, the cross-lingual resources
employed, and the specific CLTL strategies implemented. According to "what to
transfer", we also summarise three main CLTL transfer approaches: instance,
feature, and parameter transfer. Additionally, we shed light on the current
challenges and future research opportunities in this field. Furthermore, we
have made our survey resources available online, including two comprehensive
tables that provide accessible references to the multilingual datasets and CLTL
methods used in the reviewed literature.
Comment: 35 pages, 7 figures
Cross-Cultural Transfer Learning for Chinese Offensive Language Detection
Detecting offensive language is a challenging task. Generalizing across
different cultures and languages becomes even more challenging: besides
lexical, syntactic and semantic differences, pragmatic aspects such as cultural
norms and sensitivities, which are particularly relevant in this context, vary
greatly. In this paper, we target Chinese offensive language detection and aim
to investigate the impact of transfer learning using offensive language
detection data from different cultural backgrounds, specifically Korean and
English. We find that culture-specific biases in what is considered offensive
negatively impact the transferability of language models (LMs) and that LMs
trained on diverse cultural data are sensitive to different features in Chinese
offensive language detection. In a few-shot learning scenario, however, our
study shows promising prospects for non-English offensive language detection
with limited resources. Our findings highlight the importance of cross-cultural
transfer learning in improving offensive language detection and promoting
inclusive digital spaces.
Comment: C3NLP@EAC
General Purpose Textual Sentiment Analysis and Emotion Detection Tools
Textual sentiment analysis and emotion detection consists in retrieving the
sentiment or emotion carried by a text or document. This task can be useful in
many domains: opinion mining, prediction, feedback, etc. However, building a
general-purpose tool for sentiment analysis and emotion detection raises
a number of issues: theoretical issues such as dependence on the domain or
the language, but also practical issues such as emotion representation for
interoperability. In this paper we present our sentiment/emotion analysis
tools, the way we propose to circumvent the difficulties and the applications
they are used for.
Comment: Workshop on Emotion and Computing (2013)