Adversarial Propagation and Zero-Shot Cross-Lingual Transfer of Word Vector Specialization
Semantic specialization is the process of fine-tuning pre-trained
distributional word vectors using external lexical knowledge (e.g., WordNet) to
accentuate a particular semantic relation in the specialized vector space.
While post-processing specialization methods are applicable to arbitrary
distributional vectors, they are limited to updating only the vectors of words
occurring in external lexicons (i.e., seen words), leaving the vectors of all
other words unchanged. We propose a novel approach to specializing the full
distributional vocabulary. Our adversarial post-specialization method
propagates the external lexical knowledge to the full distributional space. We
exploit words seen in the resources as training examples for learning a global
specialization function. This function is learned by combining a standard
L2-distance loss with an adversarial loss: the adversarial component produces
more realistic output vectors. We show the effectiveness and robustness of the
proposed method across three languages and on three tasks: word similarity,
dialog state tracking, and lexical simplification. We report consistent
improvements over distributional word vectors and vectors specialized by other
state-of-the-art specialization frameworks. Finally, we also propose a
cross-lingual transfer method for zero-shot specialization which successfully
specializes a full target distributional space without any lexical knowledge in
the target language and without any bilingual data.
Comment: Accepted at EMNLP 2018.
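As an illustration of the combined objective described above, the following sketch pairs an L2 reconstruction loss on seen words with a GAN-style adversarial loss; it is not the paper's released code, and the network sizes, optimizers, and loss weight lambda_adv are illustrative assumptions.

# Minimal PyTorch sketch of adversarial post-specialization: a mapping network
# ("generator") is trained on seen words with an L2 loss towards their
# specialized vectors, plus an adversarial loss from a discriminator that tries
# to tell mapped vectors apart from genuinely specialized ones.
import torch
import torch.nn as nn

dim = 300          # assumed embedding dimensionality
lambda_adv = 1.0   # assumed weight of the adversarial term

generator = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
discriminator = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(x_distributional, y_specialized):
    """One update on a batch of seen words (distributional -> specialized)."""
    # Discriminator: real = gold specialized vectors, fake = mapped vectors.
    d_opt.zero_grad()
    fake = generator(x_distributional).detach()
    d_loss = bce(discriminator(y_specialized), torch.ones(len(y_specialized), 1)) + \
             bce(discriminator(fake), torch.zeros(len(fake), 1))
    d_loss.backward()
    d_opt.step()

    # Generator: L2 distance to gold vectors + fooling the discriminator.
    g_opt.zero_grad()
    mapped = generator(x_distributional)
    g_loss = ((mapped - y_specialized) ** 2).sum(dim=1).mean() + \
             lambda_adv * bce(discriminator(mapped), torch.ones(len(mapped), 1))
    g_loss.backward()
    g_opt.step()
    return g_loss.item(), d_loss.item()

Once trained on the seen words, the generator can be applied to every vector in the distributional vocabulary, which is what lets the specialization propagate beyond the lexicon.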
Modular Deep Learning
Transfer learning has recently become the dominant paradigm of machine
learning. Pre-trained models fine-tuned for downstream tasks achieve better
performance with fewer labelled examples. Nonetheless, it remains unclear how
to develop models that specialise towards multiple tasks without incurring
negative interference and that generalise systematically to non-identically
distributed tasks. Modular deep learning has emerged as a promising solution to
these challenges. In this framework, units of computation are often implemented
as autonomous parameter-efficient modules. Information is conditionally routed
to a subset of modules and subsequently aggregated. These properties enable
positive transfer and systematic generalisation by separating computation from
routing and updating modules locally. We offer a survey of modular
architectures, providing a unified view over several threads of research that
evolved independently in the scientific literature. Moreover, we explore
various additional purposes of modularity, including scaling language models,
causal inference, programme induction, and planning in reinforcement learning.
Finally, we report various concrete applications where modularity has been
successfully deployed such as cross-lingual and cross-modal knowledge transfer.
Talks and projects related to this survey are available at
https://www.modulardeeplearning.com/
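A minimal sketch of the compute-route-aggregate pattern the survey describes: a learned router conditionally selects a small subset of modules per input, and their outputs are aggregated by the routing weights. The module count, top-k value, and dimensions below are illustrative assumptions rather than any specific architecture from the survey.

# Sketch of conditional routing over a set of parameter-efficient modules.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RoutedModularLayer(nn.Module):
    def __init__(self, dim=512, num_modules=8, top_k=2):
        super().__init__()
        # Each module is a small autonomous feed-forward block.
        self.modules_list = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
             for _ in range(num_modules)]
        )
        self.router = nn.Linear(dim, num_modules)  # learned routing function
        self.top_k = top_k

    def forward(self, x):                          # x: (batch, dim)
        scores = self.router(x)                    # (batch, num_modules)
        topk_vals, topk_idx = scores.topk(self.top_k, dim=-1)
        # Softmax over the selected modules only; zero weight elsewhere.
        weights = torch.zeros_like(scores).scatter(
            -1, topk_idx, F.softmax(topk_vals, dim=-1))
        # Run every module (clear but wasteful) and aggregate by routing weight.
        outs = torch.stack([m(x) for m in self.modules_list], dim=1)  # (b, M, d)
        return (weights.unsqueeze(-1) * outs).sum(dim=1)

# Usage: y = RoutedModularLayer()(torch.randn(4, 512))
# Separating routing from the modules is what allows modules to be added,
# removed, or updated locally without retraining the whole network.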
Internal and external pressures on language emergence: least effort, object constancy and frequency
In previous work, artificial agents were shown to achieve almost perfect
accuracy in referential games where they have to communicate to identify
images. Nevertheless, the resulting communication protocols rarely display
salient features of natural languages, such as compositionality. In this paper,
we propose some realistic sources of pressure on communication that avert this
outcome. More specifically, we formalise the principle of least effort through
an auxiliary objective. Moreover, we explore several game variants, inspired by
the principle of object constancy, in which we alter the frequency, position,
and luminosity of the objects in the images. We perform an extensive analysis
on their effect through compositionality metrics, diagnostic classifiers, and
zero-shot evaluation. Our findings reveal that the proposed sources of pressure
result in emergent languages with less redundancy, more focus on high-level
conceptual information, and better generalisation abilities. Overall, our
contributions reduce the gap between emergent and natural languages.
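One possible way to formalise a least-effort pressure as an auxiliary objective is sketched below, under the assumption that the sender emits a stop (end-of-message) decision at every step; the penalty is a differentiable proxy for expected message length and is not the paper's exact formulation.

# Sketch of a least-effort penalty added to the referential-game loss.
import torch

def least_effort_penalty(stop_logits, alpha=0.1):
    """
    stop_logits: (batch, max_len) sender logits for emitting end-of-message
                 at each step.
    Returns alpha * expected message length, assuming the sender stops at each
    step with probability sigmoid(stop_logits). Both alpha and this
    parameterisation are illustrative assumptions.
    """
    p_continue = 1.0 - torch.sigmoid(stop_logits)
    # Probability the message is still going at step t.
    alive = torch.cumprod(p_continue, dim=1)
    expected_length = alive.sum(dim=1).mean()
    return alpha * expected_length

# total_loss = game_loss + least_effort_penalty(stop_logits)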
On the Relation between Linguistic Typology and (Limitations of) Multilingual Language Modeling
Model Merging by Uncertainty-Based Gradient Matching
Models trained on different datasets can be merged by a weighted-averaging of
their parameters, but why does it work and when can it fail? Here, we connect
the inaccuracy of weighted-averaging to mismatches in the gradients and propose
a new uncertainty-based scheme to improve the performance by reducing the
mismatch. The connection also reveals implicit assumptions in other schemes
such as averaging, task arithmetic, and Fisher-weighted averaging. Our new
method gives consistent improvements for large language models and vision
transformers, both in terms of performance and robustness to hyperparameters.
Comment: Preprint. Under review.
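A minimal sketch of the weighted-averaging family of merging schemes the paper analyses: each parameter is averaged across models with per-parameter importance weights. The estimator supplying those weights (e.g. a diagonal Fisher approximation) is left to the caller and is an assumption here, not the paper's exact method.

# Sketch of per-parameter weighted averaging of several fine-tuned models.
import torch

def merge_models(state_dicts, weights):
    """
    state_dicts: list of model state_dicts with identical keys and shapes
                 (floating-point parameters assumed)
    weights:     list of dicts with one non-negative tensor per parameter,
                 acting as per-parameter precision / importance scores
    Returns a merged state_dict: a precision-weighted average of parameters.
    """
    merged = {}
    for name in state_dicts[0]:
        num = sum(w[name] * sd[name] for sd, w in zip(state_dicts, weights))
        den = sum(w[name] for w in weights) + 1e-12   # avoid division by zero
        merged[name] = num / den
    return merged

# Plain parameter averaging is the special case where every weight tensor is 1;
# Fisher-weighted averaging uses (approximate) diagonal Fisher values instead.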
Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing
Linguistic typology aims to capture structural and semantic variation across
the world's languages. A large-scale typology could provide excellent guidance
for multilingual Natural Language Processing (NLP), particularly for languages
that suffer from the lack of human labeled resources. We present an extensive
literature survey on the use of typological information in the development of
NLP techniques. Our survey demonstrates that to date, the use of information in
existing typological databases has resulted in consistent but modest
improvements in system performance. We show that this is due to both intrinsic
limitations of databases (in terms of coverage and feature granularity) and
under-utilization of the typological features included in them. We advocate for
a new approach that adapts the broad and discrete nature of typological
categories to the contextual and continuous nature of machine learning
algorithms used in contemporary NLP. In particular, we suggest that such an
approach could be facilitated by recent developments in data-driven induction
of typological knowledge.
Towards Modular LLMs by Building and Reusing a Library of LoRAs
The growing number of parameter-efficient adaptations of a base large
language model (LLM) calls for studying whether we can reuse such trained
adapters to improve performance for new tasks. We study how to best build a
library of adapters given multi-task data and devise techniques for both
zero-shot and supervised task generalization through routing in such a library.
We benchmark existing approaches to build this library and introduce
model-based clustering, MBC, a method that groups tasks based on the similarity
of their adapter parameters, indirectly optimizing for transfer across the
multi-task dataset. To re-use the library, we present a novel zero-shot routing
mechanism, Arrow, which enables dynamic selection of the most relevant adapters
for new inputs without the need for retraining. We experiment with several
LLMs, such as Phi-2 and Mistral, on a wide array of held-out tasks, verifying
that MBC-based adapters and Arrow routing lead to superior generalization to
new tasks. We make steps towards creating modular, adaptable LLMs that can
match or outperform traditional joint training.
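A minimal sketch of the model-based clustering (MBC) idea described above: each task's LoRA adapter is flattened into a parameter vector and tasks are grouped by the similarity of those vectors. The flattening, the cosine-style normalisation, the use of scikit-learn's KMeans, and the cluster count are illustrative assumptions, not the paper's exact procedure.

# Sketch of grouping tasks by the similarity of their adapter parameters.
import numpy as np
from sklearn.cluster import KMeans

def adapter_vector(lora_state_dict):
    """Flatten all LoRA matrices of one task-specific adapter into one vector."""
    parts = [lora_state_dict[k].detach().cpu().numpy().ravel()
             for k in sorted(lora_state_dict)]
    return np.concatenate(parts)

def cluster_adapters(task_to_lora, n_clusters=8, seed=0):
    """Group tasks whose adapters are similar; each cluster can then back a
    single entry in the adapter library."""
    names = sorted(task_to_lora)
    X = np.stack([adapter_vector(task_to_lora[t]) for t in names])
    X /= np.linalg.norm(X, axis=1, keepdims=True) + 1e-12   # cosine-like space
    labels = KMeans(n_clusters=n_clusters, random_state=seed,
                    n_init=10).fit_predict(X)
    return {t: int(c) for t, c in zip(names, labels)}

# A zero-shot router such as Arrow would then pick among the library entries
# per input; the selection mechanism itself is not reproduced here.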