102 research outputs found
Gender bias and natural language processing
Demographic biases widely affect artificial intelligence. In particular, gender bias clearly pervades natural language processing applications, from stereotyped translations to poorer speech recognition for women than for men. In this talk, I give an overview of the research and challenges currently emerging towards fairer natural language processing in terms of gender.
Enhancing Word Embeddings with Knowledge Extracted from Lexical Resources
In this work, we present an effective method for semantic specialization of word vector representations. To this end, we use traditional word embeddings and apply specialization methods to better capture semantic relations between words. In our approach, we leverage external knowledge from rich lexical resources such as BabelNet. We also show that our proposed post-specialization method, based on an adversarial neural network with the Wasserstein distance, yields improvements over state-of-the-art methods on two tasks: word similarity and dialog state tracking.
Comment: Accepted to ACL 2020 SRW
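A minimal sketch of the adversarial idea named above, assuming a simple feed-forward generator and a Wasserstein-style critic trained on pairs of original and specialized vectors; the module names, dimensions, and proximity penalty are illustrative assumptions, not the paper's exact setup:

```python
import torch
import torch.nn as nn

# Illustrative dimensions: 300-d embeddings, as in common pre-trained vectors.
dim = 300

# Generator maps original vectors towards specialized ones; critic scores realism.
generator = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
critic = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))

g_opt = torch.optim.RMSprop(generator.parameters(), lr=5e-5)
c_opt = torch.optim.RMSprop(critic.parameters(), lr=5e-5)

def train_step(original, specialized, clip=0.01):
    """One WGAN-style update on a batch of (original, specialized) vectors."""
    # Critic: widen the score gap between real specialized and generated vectors.
    c_opt.zero_grad()
    fake = generator(original).detach()
    c_loss = -(critic(specialized).mean() - critic(fake).mean())
    c_loss.backward()
    c_opt.step()
    for p in critic.parameters():  # weight clipping, as in the original WGAN
        p.data.clamp_(-clip, clip)

    # Generator: fool the critic while staying close to the input embedding.
    g_opt.zero_grad()
    mapped = generator(original)
    g_loss = -critic(mapped).mean() + 0.1 * (mapped - original).pow(2).mean()
    g_loss.backward()
    g_opt.step()
    return c_loss.item(), g_loss.item()
```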
Evaluating the Underlying Gender Bias in Contextualized Word Embeddings
Gender bias heavily impacts natural language processing applications.
Word embeddings have clearly been shown both to retain and to amplify gender biases present in current data sources. Recently, contextualized word embeddings have enhanced previous word embedding techniques by computing word vector representations that depend on the sentence they appear in.
In this paper, we study the impact of this conceptual change in word embedding computation in relation to gender bias. Our analysis includes different measures previously applied in the literature to standard word embeddings. Our findings suggest that contextualized word embeddings are less biased than standard ones, even when the latter are debiased.
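As one concrete example of the kind of bias measure applied to standard embeddings, here is a small sketch of a direct-bias score along a she-he gender direction; the function names and word lists are illustrative and are not necessarily the measures used in the paper:

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def direct_bias(embeddings, target_words, she="she", he="he"):
    """Average |cosine| between target-word vectors and a she-he gender direction."""
    gender_dir = embeddings[she] - embeddings[he]
    scores = [abs(cosine(embeddings[w], gender_dir)) for w in target_words]
    return sum(scores) / len(scores)

# Toy usage with random vectors standing in for trained embeddings.
rng = np.random.default_rng(0)
emb = {w: rng.normal(size=300) for w in ["she", "he", "nurse", "engineer"]}
print(direct_bias(emb, ["nurse", "engineer"]))
```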
Chinese–Spanish neural machine translation enhanced with character and word bitmap fonts
Recently, machine translation systems based on neural networks have reached state-of-the-art results for some language pairs (e.g., German–English). In this paper, we investigate the performance of neural machine translation on Chinese–Spanish, which is a challenging language pair. Given that the meaning of a Chinese word can be related to its graphical representation, this work aims to enhance neural machine translation by using as input a combination of words or characters and their corresponding bitmap fonts. Interpreting every word or character as a bitmap font yields more informative vector representations. The best results are obtained when using words plus their bitmap fonts, with improvements over a competitive neural MT baseline system of almost six BLEU points and five METEOR points, as well as coherently better rankings in the human evaluation.
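A rough sketch of how word embeddings and bitmap-font features could be combined at the encoder input, assuming each word or character is rendered to a flattened pixel vector; the class name, dimensions, and projection layer are assumptions for illustration rather than the paper's architecture:

```python
import torch
import torch.nn as nn

class WordPlusBitmapEmbedding(nn.Module):
    """Concatenate a learned word embedding with a projection of the word's
    rendered bitmap (a flattened glyph image), to be fed to an NMT encoder."""

    def __init__(self, vocab_size, emb_dim=256, bitmap_pixels=32 * 32, bitmap_dim=64):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, emb_dim)
        self.bitmap_proj = nn.Linear(bitmap_pixels, bitmap_dim)

    def forward(self, token_ids, bitmaps):
        # token_ids: (batch, seq_len); bitmaps: (batch, seq_len, bitmap_pixels)
        w = self.word_emb(token_ids)
        b = torch.tanh(self.bitmap_proj(bitmaps))
        return torch.cat([w, b], dim=-1)
```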
Character-level Intra Attention Network for Natural Language Inference
Natural language inference (NLI) is a central problem in language understanding. End-to-end artificial neural networks have recently reached state-of-the-art performance in the NLI field.
In this paper, we propose the Character-level Intra Attention Network (CIAN) for the NLI task. In our model, we use a character-level convolutional network to replace the standard word embedding layer, and we use intra attention to capture intra-sentence semantics. The proposed CIAN model provides improved results on the newly published MNLI corpus.
Comment: EMNLP Workshop RepEval 2017: The Second Workshop on Evaluating Vector Space Representations for NLP
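A compact sketch of the two components named in the abstract, character-level convolutions as the embedding layer and intra-sentence (self) attention; layer sizes, max-pooling, and the scaled dot-product form of the attention are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CharConvIntraAttention(nn.Module):
    """Character-level CNN word encoder followed by intra-sentence attention."""

    def __init__(self, n_chars=100, char_dim=16, word_dim=128, kernel=3):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.conv = nn.Conv1d(char_dim, word_dim, kernel, padding=kernel // 2)

    def forward(self, char_ids):
        # char_ids: (batch, words, chars_per_word)
        b, w, c = char_ids.shape
        x = self.char_emb(char_ids).view(b * w, c, -1).transpose(1, 2)
        # Max-pool over character positions to get one vector per word.
        word_vecs = F.relu(self.conv(x)).max(dim=2).values.view(b, w, -1)

        # Intra attention: each word attends over all words of the same sentence.
        scores = word_vecs @ word_vecs.transpose(1, 2) / word_vecs.size(-1) ** 0.5
        attended = torch.softmax(scores, dim=-1) @ word_vecs
        return torch.cat([word_vecs, attended], dim=-1)
```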
Refinement of Unsupervised Cross-Lingual Word Embeddings
Cross-lingual word embeddings aim to bridge the gap between high-resource and low-resource languages by making it possible to learn multilingual word representations even without any direct bilingual signal. The lion's share of the methods are projection-based approaches that map pre-trained embeddings into a shared latent space. These methods are mostly based on an orthogonal transformation, which assumes the language vector spaces to be isomorphic. However, this assumption does not necessarily hold, especially for morphologically rich languages. In this paper, we propose a self-supervised method to refine the alignment of unsupervised bilingual word embeddings. The proposed model moves the vectors of words and their corresponding translations closer to each other and enforces length- and center-invariance, thus allowing cross-lingual embeddings to be better aligned. The experimental results demonstrate the effectiveness of our approach, as in most cases it outperforms state-of-the-art methods on a bilingual lexicon induction task.
Comment: Accepted at the 24th European Conference on Artificial Intelligence (ECAI 2020)
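For context, here is a small sketch of the projection-based setup the abstract builds on (an orthogonal Procrustes mapping over seed translation pairs), plus a toy refinement step that pulls mapped source vectors towards their translations under length normalization; this stands in for, and is not, the paper's self-supervised refinement method:

```python
import numpy as np

def orthogonal_map(X, Y):
    """Orthogonal Procrustes: the W (with W.T @ W = I) minimizing ||X @ W - Y||_F,
    where X and Y hold (n_pairs, dim) embeddings of seed translation pairs."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def refine(X, Y, W, lr=0.1, steps=10):
    """Toy refinement: nudge source vectors so their mapped images move towards
    the target translations, re-normalizing lengths after every step."""
    X = X.copy()
    for _ in range(steps):
        X += lr * (Y - X @ W) @ W.T        # gradient step on ||X @ W - Y||^2
        X /= np.linalg.norm(X, axis=1, keepdims=True)
    return X
```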