Refinement of Unsupervised Cross-Lingual Word Embeddings
Cross-lingual word embeddings aim to bridge the gap between high-resource and
low-resource languages by enabling the learning of multilingual word
representations even without any direct bilingual signal. Most of these methods
are projection-based approaches that map pre-trained embeddings into a shared
latent space. They typically rely on an orthogonal transformation, which
assumes the language vector spaces to be isomorphic. However, this assumption
does not necessarily hold, especially for morphologically rich languages. In
this paper, we propose a self-supervised method to refine the alignment of
unsupervised bilingual word embeddings. The proposed model moves the vectors of
words and their corresponding translations closer to each other and enforces
length- and center-invariance, thus allowing a better alignment of
cross-lingual embeddings. The experimental results demonstrate the
effectiveness of our approach, as in most cases it outperforms state-of-the-art
methods in a bilingual lexicon induction task.
Comment: Accepted at the 24th European Conference on Artificial Intelligence (ECAI 2020).
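The refinement step described in this abstract lends itself to a compact illustration. Below is a minimal NumPy sketch, not the authors' implementation: it assumes two embedding matrices already mapped into a shared space by an unsupervised method, plus a list of hypothesized translation index pairs, and alternately enforces center- and length-invariance while pulling each translation pair toward its midpoint. The function name, step size, and iteration count are illustrative.

```python
import numpy as np

def refine(X, Y, pairs, step=0.1, iters=10):
    """Illustrative refinement of two aligned embedding spaces.

    X, Y  : (n, d) and (m, d) embedding matrices, already mapped
            into a shared space by an unsupervised method.
    pairs : list of (i, j) index pairs, where X[i] and Y[j] are
            putative translations.
    """
    for _ in range(iters):
        # Center-invariance: remove each space's mean vector.
        X = X - X.mean(axis=0, keepdims=True)
        Y = Y - Y.mean(axis=0, keepdims=True)
        # Length-invariance: normalize all vectors to unit length.
        X = X / np.linalg.norm(X, axis=1, keepdims=True)
        Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)
        # Move each translation pair closer to its midpoint.
        for i, j in pairs:
            mid = (X[i] + Y[j]) / 2
            X[i] += step * (mid - X[i])
            Y[j] += step * (mid - Y[j])
    return X, Y
```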
Adversarial Propagation and Zero-Shot Cross-Lingual Transfer of Word Vector Specialization
Semantic specialization is the process of fine-tuning pre-trained
distributional word vectors using external lexical knowledge (e.g., WordNet) to
accentuate a particular semantic relation in the specialized vector space.
While post-processing specialization methods are applicable to arbitrary
distributional vectors, they are limited to updating only the vectors of words
occurring in external lexicons (i.e., seen words), leaving the vectors of all
other words unchanged. We propose a novel approach to specializing the full
distributional vocabulary. Our adversarial post-specialization method
propagates the external lexical knowledge to the full distributional space. We
exploit words seen in the resources as training examples for learning a global
specialization function. This function is learned by combining a standard
L2-distance loss with an adversarial loss: the adversarial component produces
more realistic output vectors. We show the effectiveness and robustness of the
proposed method across three languages and on three tasks: word similarity,
dialog state tracking, and lexical simplification. We report consistent
improvements over distributional word vectors and vectors specialized by other
state-of-the-art specialization frameworks. Finally, we also propose a
cross-lingual transfer method for zero-shot specialization which successfully
specializes a full target distributional space without any lexical knowledge in
the target language and without any bilingual data.
Comment: Accepted at EMNLP 2018.
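To make the training objective concrete, here is a minimal PyTorch sketch of the idea: a generator network serves as the global specialization function and is trained with an L2 (MSE) loss toward the specialized vectors of seen words, plus an adversarial term from a discriminator that tries to tell generated vectors from genuinely specialized ones. The architectures, dimensionality, and the weight `lam` are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

d = 300  # embedding dimensionality (illustrative)

# Global specialization function (generator): distributional -> specialized.
G = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))
# Discriminator: real specialized vector, or one produced by G?
D = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
mse, bce = nn.MSELoss(), nn.BCEWithLogitsLoss()

def train_step(x_dist, x_spec, lam=1.0):
    """One update on a batch of seen words.

    x_dist : distributional vectors of seen words, shape (b, d)
    x_spec : their specialized counterparts, shape (b, d)
    lam    : weight of the adversarial term (assumed hyperparameter)
    """
    b = x_dist.size(0)
    ones, zeros = torch.ones(b, 1), torch.zeros(b, 1)

    # Discriminator step: real specialized vs. generated vectors.
    opt_d.zero_grad()
    loss_d = bce(D(x_spec), ones) + bce(D(G(x_dist).detach()), zeros)
    loss_d.backward()
    opt_d.step()

    # Generator step: L2 distance to the target plus adversarial loss.
    opt_g.zero_grad()
    fake = G(x_dist)
    loss_g = mse(fake, x_spec) + lam * bce(D(fake), ones)
    loss_g.backward()
    opt_g.step()
```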
Learning Word Subsumption Projections for the Russian Language
The semantic relations of hypernymy and hyponymy are widely used in various natural language processing tasks for modelling subsumption in common-sense reasoning. Since the popularisation of distributional semantics, significant attention has been paid to applying word embeddings to inducing relations between words. In this paper, we present our preliminary results on adopting the projection learning technique for computing hypernyms from hyponyms using word embeddings. We also conduct a series of experiments on the Russian language and release open-source software for learning hyponym-hypernym projections on both CPUs and GPUs, implemented with the TensorFlow machine learning framework.
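The core of projection learning is fitting a matrix that maps hyponym vectors to hypernym vectors. The paper trains this with TensorFlow; as a compact illustration only, the sketch below instead solves a single linear projection in closed form with NumPy least squares. The function name and the assumption of one global projection (rather than several cluster-specific ones) are hypothetical simplifications.

```python
import numpy as np

def learn_projection(hypo, hyper):
    """Learn a linear map Phi such that hypo @ Phi ~= hyper.

    hypo, hyper : (n, d) matrices of hyponym and hypernym
                  embeddings for n training pairs.
    """
    Phi, *_ = np.linalg.lstsq(hypo, hyper, rcond=None)
    return Phi

# Usage: project a new hyponym vector, then look up the nearest
# neighbours of (v_hypo @ Phi) in the embedding vocabulary to
# obtain candidate hypernyms.
```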
Research on Multilingual News Clustering Based on Cross-Language Word Embeddings
Classifying the same event reported by different countries is of significant
importance for public opinion control and intelligence gathering. Because news
comes in diverse types, relying solely on human translators would be costly and
inefficient, while relying solely on translation systems would incur
considerable overhead in invoking translation interfaces and storing translated
texts. To address this issue, we focus on the problem of clustering
cross-lingual news. Specifically, we represent a news article by combining a
sentence-vector representation of its headline in a mixed semantic space with a
topic probability distribution over its content. To train the cross-lingual
model, we employ knowledge distillation to fit two semantic spaces into a
single mixed semantic space. We abandon traditional static clustering methods
such as K-Means and AGNES in favor of the incremental clustering algorithm
Single-Pass, which we further modify to better suit cross-lingual news
clustering scenarios. Our main contributions are as follows: (1) We adopt
standard English BERT as the teacher model and XLM-RoBERTa as the student
model, training through knowledge distillation a cross-lingual model that can
represent sentence-level bilingual texts in both Chinese and English. (2) We
use the LDA topic model to represent news as a combination of cross-lingual
vectors for headlines and topic probability distributions for content,
introducing concepts such as topic similarity to address the cross-lingual
issue in news content representation. (3) We adapt the Single-Pass clustering
algorithm to the news setting to make it more applicable. Our optimizations of
Single-Pass include adjusting the distance computation between samples and
clusters, adding cluster-merging operations, and incorporating a news time
parameter.
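A minimal sketch of such a modified Single-Pass procedure is given below. It illustrates the three optimizations, and is not the paper's code: sample-to-cluster similarity is computed against the centroid and discounted by the time gap (an assumed exponential decay), documents below the similarity threshold open new clusters, and a final pass merges clusters whose centroids have drifted close together. All thresholds and the decay form are hypothetical.

```python
import numpy as np

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def single_pass(docs, sim_thr=0.7, merge_thr=0.85, time_scale=3.0):
    """Incremental Single-Pass clustering over a news stream.

    docs : list of (vector, timestamp) pairs, assumed sorted by time;
           a vector could concatenate a headline embedding and a topic
           distribution, as in the paper.
    """
    clusters = []  # each: {"centroid": vec, "time": t, "members": [...]}
    for vec, t in docs:
        best, best_sim = None, -1.0
        for c in clusters:
            # Sample-to-cluster similarity against the centroid,
            # discounted by the time gap (illustrative decay).
            sim = cos(vec, c["centroid"]) * np.exp(-abs(t - c["time"]) / time_scale)
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= sim_thr:
            best["members"].append(vec)
            best["centroid"] = np.mean(best["members"], axis=0)
            best["time"] = t
        else:
            clusters.append({"centroid": vec, "time": t, "members": [vec]})
    # Cluster-merging pass: fuse clusters with very similar centroids.
    merged = []
    for c in clusters:
        for m in merged:
            if cos(c["centroid"], m["centroid"]) >= merge_thr:
                m["members"] += c["members"]
                m["centroid"] = np.mean(m["members"], axis=0)
                break
        else:
            merged.append(c)
    return merged
```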
A Resource-Free Evaluation Metric for Cross-Lingual Word Embeddings Based on Graph Modularity
Cross-lingual word embeddings encode the meaning of words from different
languages into a shared low-dimensional space. An important requirement for
many downstream tasks is that word similarity should be independent of language
- i.e., word vectors within one language should not be more similar to each
other than to words in another language. We measure this characteristic using
modularity, a network measure that quantifies the strength of clusters in a
graph. Modularity has a moderate to strong correlation with three downstream
tasks, even though modularity is based only on the structure of embeddings and
does not require any external resources. We show through experiments that
modularity can serve as an intrinsic validation metric to improve unsupervised
cross-lingual word embeddings, particularly on distant language pairs in
low-resource settings.
Comment: Accepted to ACL 2019, camera-ready.
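As an illustration of the metric, the sketch below builds a k-nearest-neighbor graph over a joint embedding matrix and computes modularity with the languages as the communities. This follows the general recipe in the abstract, while the graph-construction details (k, cosine distance, unweighted edges) are assumptions. Higher modularity means vectors cluster by language, which the paper treats as a sign of poor cross-lingual alignment.

```python
import networkx as nx
from networkx.algorithms.community import modularity
from sklearn.neighbors import NearestNeighbors

def embedding_modularity(emb, lang_ids, k=3):
    """Modularity of a k-NN graph of cross-lingual embeddings,
    with each language treated as one community.

    emb      : (n, d) matrix of word vectors from all languages
    lang_ids : length-n sequence of language labels (e.g. 0 / 1)
    """
    # Build the k-NN graph (k+1 because each point is its own neighbor).
    nn = NearestNeighbors(n_neighbors=k + 1, metric="cosine").fit(emb)
    _, idx = nn.kneighbors(emb)
    G = nx.Graph()
    G.add_nodes_from(range(len(emb)))
    for i, neighbors in enumerate(idx):
        for j in neighbors[1:]:  # skip the point itself
            G.add_edge(i, int(j))
    # One community per language.
    communities = [
        {i for i, lang in enumerate(lang_ids) if lang == l}
        for l in set(lang_ids)
    ]
    return modularity(G, communities)
```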