Neural Based Statement Classification for Biased Language
Biased language commonly occurs around topics of a controversial nature,
stirring disagreement between the parties involved in a discussion. This is
because stances on language, specifically on the understanding and use of
phrases, are cohesive within a particular group. However, such cohesiveness
does not hold across groups.
In collaborative environments or environments where impartial language is
desired (e.g. Wikipedia, news media), statements and the language therein
should represent equally the involved parties and be neutrally phrased. Biased
language is introduced through the presence of inflammatory words or phrases,
or statements that may be incorrect or one-sided, thus violating such
consensus.
In this work, we focus on the specific case of phrasing bias, which may be
introduced through specific inflammatory words or phrases in a statement. For
this purpose, we propose an approach that relies on recurrent neural networks
in order to capture the inter-dependencies between words in a phrase that
introduce bias.
We perform a thorough experimental evaluation, where we show the advantages
of a neural based approach over competitors that rely on word lexicons and
other hand-crafted features in detecting biased language. We are able to
distinguish biased statements with a precision of P=0.92, thus significantly
outperforming baseline models with an improvement of over 30%. Finally, we
release the largest corpus of statements annotated for biased language.
Comment: The Twelfth ACM International Conference on Web Search and Data
Mining, February 11-15, 2019, Melbourne, VIC, Australia
Computational Sociolinguistics: A Survey
Language is a social phenomenon and variation is inherent to its social
nature. Recently, there has been a surge of interest within the computational
linguistics (CL) community in the social dimension of language. In this article
we present a survey of the emerging field of "Computational Sociolinguistics"
that reflects this increased interest. We aim to provide a comprehensive
overview of CL research on sociolinguistic themes, featuring topics such as the
relation between language and social identity, language use in social
interaction and multilingual communication. Moreover, we demonstrate the
potential for synergy between the research communities involved, by showing how
the large-scale data-driven methods that are widely used in CL can complement
existing sociolinguistic studies, and how sociolinguistics can inform and
challenge the methods and assumptions employed in CL studies. We hope to convey
the possible benefits of a closer collaboration between the two communities and
conclude with a discussion of open challenges.
Comment: To appear in Computational Linguistics. Accepted for publication:
18th February, 201
Ethical Challenges in Data-Driven Dialogue Systems
The use of dialogue systems as a medium for human-machine interaction is an
increasingly prevalent paradigm. A growing number of dialogue systems use
conversation strategies that are learned from large datasets. There are
well-documented instances where interactions with these systems have resulted in
biased or even offensive conversations due to the data-driven training process.
Here, we highlight potential ethical issues that arise in dialogue systems
research, including: implicit biases in data-driven systems, the rise of
adversarial examples, potential sources of privacy violations, safety concerns,
special considerations for reinforcement learning systems, and reproducibility
concerns. We also suggest areas stemming from these issues that deserve further
investigation. Through this initial survey, we hope to spur research leading to
robust, safe, and ethically sound dialogue systems.
Comment: In Submission to the AAAI/ACM conference on Artificial Intelligence,
Ethics, and Society