Style Obfuscation by Invariance
The task of obfuscating writing style using sequence models has previously
been investigated under the framework of obfuscation-by-transfer, where the
input text is explicitly rewritten in another style. These approaches also
often lead to major alterations to the semantic content of the input. In this
work, we propose obfuscation-by-invariance, and investigate to what extent
models trained to be explicitly style-invariant preserve semantics. We evaluate
our architectures on parallel and non-parallel corpora, and compare automatic
and human evaluations on the obfuscated sentences. Our experiments show that
style classifier performance can be reduced to chance level, whilst the
automatic evaluation of the output is seemingly equal to models applying
style-transfer. However, based on human evaluation we demonstrate a trade-off
between the level of obfuscation and the observed quality of the output in
terms of meaning preservation and grammaticality.
Comment: Accepted for presentation at COLING
Source-driven Representations for Hate Speech Detection
Sources, in the form of selected Facebook pages, can be used as indicators of hate-rich content. Polarized distributed representations created over such content prove superior to generic embeddings in the task of hate speech detection. The same content seems to carry too weak a signal to serve as a proxy for silver labels in a distantly supervised setting. However, this signal is stronger than that of gold labels drawn from a different distribution, which suggests rethinking the annotation process in the context of highly subjective judgments.
The provenance of what is shared on Facebook provides a first identifying feature of hate-laden content. The polarized distributed representation that we build over such content proves better at detecting hateful topics than more generic alternatives. The predictive power of these polarized embeddings is also stronger than that of gold-standard data characterized by a different distribution and a different annotation.
Proceedings of the Fifth Italian Conference on Computational Linguistics CLiC-it 2018
On behalf of the Program Committee, a very warm welcome to the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018). This edition of the conference is held in Torino. The conference is locally organised by the University of Torino and hosted in its prestigious main lecture hall, “Cavallerizza Reale”. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after five years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges.