A Simple Geometric Method for Cross-Lingual Linguistic Transformations with Pre-trained Autoencoders
Powerful sentence encoders trained for multiple languages are on the rise.
These systems are capable of embedding a wide range of linguistic properties
into vector representations. While explicit probing tasks can be used to verify
the presence of specific linguistic properties, it is unclear whether the
vector representations can be manipulated to indirectly steer such properties.
We investigate the use of a geometric mapping in embedding space to transform
linguistic properties, without any tuning of the pre-trained sentence encoder
or decoder. We validate our approach on three linguistic properties using a
pre-trained multilingual autoencoder and analyze the results in both
monolingual and cross-lingual settings.
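The abstract does not say which geometric mapping is used, so the following is only a minimal sketch under one assumption: the mapping is a translation by the difference between the centroids of two groups of embeddings (for example, sentences without and with a given property). The toy vectors, the `true_shift` value, and all function names are hypothetical stand-ins; in a real setup the embeddings would come from the frozen encoder and the shifted vector would be passed to the frozen decoder.

```python
import random

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def offset_vector(source_embs, target_embs):
    """Difference between the target centroid and the source centroid."""
    src, tgt = centroid(source_embs), centroid(target_embs)
    return [t - s for s, t in zip(src, tgt)]

def apply_offset(embedding, offset):
    """Translate one embedding by the offset; the encoder is never retrained."""
    return [e + o for e, o in zip(embedding, offset)]

# Toy data (assumption): the "target" group is the "source" group moved by a
# fixed shift, so a centroid difference should recover that shift.
rng = random.Random(0)
dim = 4
true_shift = [1.0, 0.0, -0.5, 0.0]
source = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(200)]
target = [[x + s for x, s in zip(v, true_shift)] for v in source]

offset = offset_vector(source, target)
shifted = apply_offset(source[0], offset)  # would be fed to the decoder
```

On real embeddings the two groups would not differ by an exact constant shift, so the centroid difference would only approximate the direction associated with the property.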
How to Probe Sentence Embeddings in Low-Resource Languages: On Structural Design Choices for Probing Task Evaluation
Sentence encoders map sentences to real-valued vectors for use in downstream
applications. To peek into these representations - e.g., to increase
interpretability of their results - probing tasks have been designed which
query them for linguistic knowledge. However, designing probing tasks for
lesser-resourced languages is tricky, because these often lack large-scale
annotated data or (high-quality) dependency parsers as a prerequisite of
probing task design in English. To determine how to probe sentence embeddings
in such cases, we investigate the sensitivity of probing task results to structural
design choices, conducting the first such large-scale study. We show that
design choices like size of the annotated probing dataset and type of
classifier used for evaluation do (sometimes substantially) influence probing
outcomes. We then probe embeddings in a multilingual setup with design choices
that lie in a 'stable region', as we identify for English, and find that
results on English do not transfer to other languages. Fairer and more
comprehensive sentence-level probing evaluation should thus be carried out on
multiple languages in the future.
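To illustrate what a probing setup and one of its structural design choices might look like, here is a small sketch that trains a probe on fixed embeddings at several training-set sizes. Everything in it is an assumption for illustration: the embeddings are synthetic, the probed property is a binary label encoded in one coordinate, and the nearest-centroid probe is a stand-in, since the abstract does not name the classifiers that were compared.

```python
import random

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_probe(embs, labels):
    """Nearest-centroid probe: store one centroid per label."""
    by_label = {}
    for e, y in zip(embs, labels):
        by_label.setdefault(y, []).append(e)
    return {y: centroid(vs) for y, vs in by_label.items()}

def predict(probe, emb):
    """Return the label whose centroid is closest to the embedding."""
    return min(probe, key=lambda y: sq_dist(probe[y], emb))

def accuracy(probe, embs, labels):
    hits = sum(predict(probe, e) == y for e, y in zip(embs, labels))
    return hits / len(labels)

def make_data(rng, n, dim=8):
    """Synthetic embeddings (assumption): label 1 shifts the first coordinate."""
    embs, labels = [], []
    for i in range(n):
        y = i % 2
        v = [rng.gauss(0, 1) for _ in range(dim)]
        v[0] += 2.0 * y
        embs.append(v)
        labels.append(y)
    return embs, labels

rng = random.Random(0)
test_x, test_y = make_data(rng, 500)

# Vary one design choice: the size of the annotated probing dataset.
results = {}
for size in (10, 100, 1000):
    train_x, train_y = make_data(rng, size)
    results[size] = accuracy(train_probe(train_x, train_y), test_x, test_y)
```

Comparing the entries of `results` across sizes, or swapping `train_probe` for a different classifier, gives a rough sense of how probing accuracy can depend on such choices.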