Do the Frankenstein, or how to achieve better out-of-distribution performance with manifold mixing model soup
The standard recipe applied in transfer learning is to finetune a pretrained
model on the task-specific dataset with different hyperparameter settings and
pick the model with the highest accuracy on the validation dataset.
Unfortunately, this leads to models which do not perform well under
distribution shifts, e.g. when the model is given graphical sketches of the
object as input instead of photos. In order to address this, we propose the
manifold mixing model soup, an algorithm which mixes together the latent space
manifolds of multiple finetuned models in an optimal way in order to generate a
fused model. We show that the fused model gives significantly better
out-of-distribution performance (+3.5 % compared to best individual model) when
finetuning a CLIP model for image classification. In addition, it also provides
better accuracy on the original dataset used for finetuning.
Comment: Accepted for IMVIP 2023 conference
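The core operation can be sketched as a per-layer convex combination of finetuned model parameters. The sketch below is illustrative only: it uses toy parameter dicts and hand-picked mixing weights, not the paper's optimized per-manifold mixing procedure.

```python
# Hypothetical sketch: fuse several finetuned models by mixing their
# parameters layer by layer with per-layer coefficients.

def mix_models(models, coeffs_per_layer):
    """models: list of {layer_name: [float, ...]} parameter dicts.
    coeffs_per_layer: {layer_name: [w_1, ..., w_k]} mixing weights,
    one weight per model, summing to 1 for each layer."""
    fused = {}
    for layer in models[0]:
        ws = coeffs_per_layer[layer]
        fused[layer] = [
            sum(w * m[layer][i] for w, m in zip(ws, models))
            for i in range(len(models[0][layer]))
        ]
    return fused

# Two toy "models", each with one layer holding two parameters.
m1 = {"fc": [1.0, 0.0]}
m2 = {"fc": [0.0, 1.0]}
fused = mix_models([m1, m2], {"fc": [0.25, 0.75]})
print(fused)  # {'fc': [0.25, 0.75]}
```

In the paper's setting the mixing coefficients are chosen to maximize validation performance; here they are fixed by hand for clarity.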
Role of Bootstrap Averaging in Generalized Approximate Message Passing
Generalized approximate message passing (GAMP) is a computationally efficient
algorithm for estimating an unknown signal $x_0 \in \mathbb{R}^N$ from a random
linear measurement $y = A x_0 + w \in \mathbb{R}^M$, where $A \in \mathbb{R}^{M \times N}$
is a known measurement matrix and $w$ is
the noise vector. The salient feature of GAMP is that it can provide an
unbiased estimator $\hat{x}$, which
can be used for various hypothesis-testing methods. In this study, we consider
the bootstrap average of an unbiased estimator of GAMP for the elastic net. By
numerically analyzing the state evolution of \emph{approximate message passing
with resampling}, which has been proposed for computing bootstrap statistics of
the elastic net estimator, we investigate when the bootstrap averaging reduces
the variance of the unbiased estimator and the effect of optimizing the size of
each bootstrap sample and hyperparameter of the elastic net regularization in
the asymptotic setting $M, N \to \infty$ with the ratio $M/N$ kept fixed. The results
indicate that bootstrap averaging effectively reduces the variance of the
unbiased estimator when the actual data generation process is inconsistent with
the sparsity assumption of the regularization and the sample size is small.
Furthermore, we find that when $x_0$ is less sparse and the data size is
small, the system undergoes a phase transition. The phase transition indicates
the existence of the region where the ensemble average of unbiased estimators
of GAMP for the elastic net norm minimization problem yields the unbiased
estimator with the minimum variance.
Comment: 6 pages, 5 figures
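Bootstrap averaging (bagging) of a regularized estimator can be illustrated with a toy stand-in: a soft-thresholded sample mean plays the role of the elastic net estimator, and the average is taken over resamples of adjustable size. All numbers below (threshold, resample size) are illustrative assumptions, not the paper's setup.

```python
import random

def soft_threshold(x, lam):
    """Shrinkage nonlinearity of the kind induced by l1 regularization."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def bootstrap_average(estimator, data, n_boot=200, m=None, seed=0):
    """Average `estimator` over n_boot bootstrap resamples of size m."""
    rng = random.Random(seed)
    m = m or len(data)
    vals = [estimator([rng.choice(data) for _ in range(m)])
            for _ in range(n_boot)]
    return sum(vals) / len(vals)

data = [0.9, 1.1, 1.0, 0.8, 1.2, 1.0]
est = lambda d: soft_threshold(sum(d) / len(d), lam=0.5)
single = est(data)                      # estimator on the full sample
bagged = bootstrap_average(est, data)   # bootstrap average of the estimator
print(single, bagged)
```

Because the estimator is nonlinear in the data, the bagged value generally differs from the single-sample value; the paper's analysis characterizes when this averaging reduces variance.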
LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition
Low-rank adaptations (LoRA) are often employed to fine-tune large language
models (LLMs) for new tasks. This paper investigates LoRA composability for
cross-task generalization and introduces LoraHub, a strategic framework devised
for the purposive assembly of LoRA modules trained on diverse given tasks, with
the objective of achieving adaptable performance on unseen tasks. With just a
few examples from a novel task, LoraHub enables the fluid combination of
multiple LoRA modules, eliminating the need for human expertise. Notably, the
composition requires neither additional model parameters nor gradients. Our
empirical results, derived from the Big-Bench Hard (BBH) benchmark, suggest
that LoraHub can effectively match the performance of in-context learning in
few-shot scenarios without requiring in-context examples alongside
each inference input. A significant contribution of our research is the
fostering of a community for LoRA, where users can share their trained LoRA
modules, thereby facilitating their application to new tasks. We anticipate
this resource will widen access to and spur advancements in general
intelligence as well as LLMs in production. Code will be available at
https://github.com/sail-sg/lorahub.
Comment: Work in progress. The first three authors contributed equally to this work.
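The composition step can be sketched as a weighted sum of low-rank updates: each LoRA module contributes a delta $B_i A_i$, and the composed update is $\sum_i w_i B_i A_i$. The sketch below uses toy hand-built factors; LoraHub itself learns the weights from a few examples of the new task.

```python
# Hypothetical sketch of weighted LoRA composition: each module i is a
# pair of low-rank factors (B_i, A_i), and the composed weight update is
# sum_i w_i * (B_i @ A_i).

def matmul(B, A):
    return [[sum(B[r][k] * A[k][c] for k in range(len(A)))
             for c in range(len(A[0]))] for r in range(len(B))]

def compose_lora(modules, weights):
    """modules: list of (B, A) low-rank factor pairs; weights: mixing weights."""
    deltas = [matmul(B, A) for B, A in modules]
    rows, cols = len(deltas[0]), len(deltas[0][0])
    return [[sum(w * d[r][c] for w, d in zip(weights, deltas))
             for c in range(cols)] for r in range(rows)]

# Two rank-1 modules acting on a 2x2 weight matrix.
mod1 = ([[1.0], [0.0]], [[1.0, 0.0]])   # delta = [[1, 0], [0, 0]]
mod2 = ([[0.0], [1.0]], [[0.0, 1.0]])   # delta = [[0, 0], [0, 1]]
delta = compose_lora([mod1, mod2], [0.5, 0.5])
print(delta)  # [[0.5, 0.0], [0.0, 0.5]]
```

Since the composition is a weighted sum of existing factors, it adds no new trainable parameters and needs no gradients, consistent with the claim above.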
LM-Cocktail: Resilient Tuning of Language Models via Model Merging
The pre-trained language models are continually fine-tuned to better support
downstream applications. However, this operation may result in significant
performance degeneration on general tasks beyond the targeted domain. To
overcome this problem, we propose LM-Cocktail which enables the fine-tuned
model to remain resilient on general tasks. Our method takes the
form of model merging, where the fine-tuned language model is merged with the
pre-trained base model or the peer models from other domains through weighted
average. Despite its simplicity, LM-Cocktail is surprisingly effective: the
resulting model achieves strong empirical performance across the whole
scope of general tasks while preserving a superior capacity in its targeted
domain. We conduct comprehensive experiments with LLaMA and BGE models on
popular benchmarks, including FLAN, MMLU, and MTEB; the results validate the
efficacy of our proposed method. The code and checkpoints are available at
https://github.com/FlagOpen/FlagEmbedding/tree/master/LM_Cocktail.
Comment: Work is in progress.
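The weighted-average merge described above can be sketched in a few lines, assuming each model is a flat {name: value} parameter dict (toy scalars here stand in for real weight tensors):

```python
# Minimal sketch of merging a fine-tuned model with its base model by a
# weighted average of parameters, in the spirit of LM-Cocktail.

def merge(models, weights):
    assert abs(sum(weights) - 1.0) < 1e-9, "weights should sum to 1"
    return {name: sum(w * m[name] for w, m in zip(weights, models))
            for name in models[0]}

finetuned = {"w": 2.0, "b": -1.0}
base      = {"w": 0.0, "b": 1.0}
cocktail = merge([finetuned, base], [0.5, 0.5])
print(cocktail)  # {'w': 1.0, 'b': 0.0}
```

Interpolating toward the base model trades a little in-domain specialization for retained general capability, which is the resilience effect the abstract describes.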
OPTION: OPTImization Algorithm Benchmarking ONtology
Many optimization algorithm benchmarking platforms allow users to share their
experimental data to promote reproducible and reusable research. However,
different platforms use different data models and formats, which drastically
complicates the identification of relevant datasets, their interpretation, and
their interoperability. Therefore, a semantically rich, ontology-based,
machine-readable data model that can be used by different platforms is highly
desirable. In this paper, we report on the development of such an ontology,
which we call OPTION (OPTImization algorithm benchmarking ONtology). Our
ontology provides the vocabulary needed for semantic annotation of the core
entities involved in the benchmarking process, such as algorithms, problems,
and evaluation measures. It also provides means for automatic data integration,
improved interoperability, and powerful querying capabilities, thereby
increasing the value of the benchmarking data. We demonstrate the utility of
OPTION by annotating and querying a corpus of benchmark performance data from
the BBOB collection of the COCO framework and from the Yet Another Black-Box
Optimization Benchmark (YABBOB) family of the Nevergrad environment. In
addition, we integrate features of the BBOB functional performance landscape
into the OPTION knowledge base using publicly available datasets with
exploratory landscape analysis. Finally, we integrate the OPTION knowledge base
into the IOHprofiler environment and provide users with the ability to perform
meta-analysis of performance data.
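The kind of machine-readable annotation an ontology enables can be illustrated with subject-predicate-object triples. The vocabulary below (the "opt:" terms) is hypothetical, not the actual OPTION vocabulary, and the lookup is a toy stand-in for a real SPARQL query.

```python
# Illustrative sketch: benchmark runs annotated as triples, then queried.

triples = {
    ("run1", "opt:usesAlgorithm", "CMA-ES"),
    ("run1", "opt:solvesProblem", "bbob-f1"),
    ("run1", "opt:hasMeasure", "targetPrecision"),
    ("run2", "opt:usesAlgorithm", "DE"),
    ("run2", "opt:solvesProblem", "bbob-f1"),
}

def query(triples, predicate, obj):
    """All subjects linked to `obj` via `predicate` -- a toy graph lookup."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)

# Which runs were performed on problem bbob-f1, regardless of platform?
print(query(triples, "opt:solvesProblem", "bbob-f1"))  # ['run1', 'run2']
```

A shared vocabulary is what makes such queries work across datasets exported from different platforms, which is the interoperability argument above.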
Generating 2D and 3D Master Faces for Dictionary Attacks with a Network-Assisted Latent Space Evolution
A master face is a face image that passes face-based identity authentication
for a high percentage of the population. These faces can be used to
impersonate, with a high probability of success, any user, without having
access to any user information. We optimize these faces for 2D and 3D face
verification models, by using an evolutionary algorithm in the latent embedding
space of the StyleGAN face generator. For 2D face verification, multiple
evolutionary strategies are compared, and we propose a novel approach that
employs a neural network to direct the search toward promising samples, without
adding fitness evaluations. The results we present demonstrate that it is
possible to obtain a considerable coverage of the identities in the LFW or RFW
datasets with less than 10 master faces, for six leading deep face recognition
systems. In 3D, we generate faces using the 2D StyleGAN2 generator and predict
a 3D structure using a deep 3D face reconstruction network. When employing two
different 3D face recognition systems, we are able to obtain a coverage of
40%-50%. Additionally, we present the generation of paired 2D RGB and 3D master
faces, which simultaneously match 2D and 3D models with high impersonation
rates.
Comment: Accepted for publication in IEEE Transactions on Biometrics,
Behavior, and Identity Science (TBIOM). This paper extends arXiv:2108.01077,
which was accepted to IEEE FG 2021.
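The latent-space evolution can be sketched as a simple (1+lambda) evolutionary strategy. The fitness below is a stand-in (negative squared distance to a hidden target vector), not a face-verification coverage score, and all parameters are illustrative.

```python
import random

# Toy (1+lambda) evolutionary search over a latent vector, standing in for
# the paper's evolution in the StyleGAN latent space.

def evolve(fitness, dim=8, pop=16, gens=50, sigma=0.3, seed=0):
    rng = random.Random(seed)
    parent = [rng.uniform(-1, 1) for _ in range(dim)]
    best = fitness(parent)
    for _ in range(gens):
        for _ in range(pop):
            # Gaussian mutation of the current parent.
            child = [x + rng.gauss(0, sigma) for x in parent]
            f = fitness(child)
            if f > best:  # greedy replacement
                parent, best = child, f
    return parent, best

target = [0.5] * 8                                   # hidden optimum
fit = lambda z: -sum((a - b) ** 2 for a, b in zip(z, target))
z, best = evolve(fit)
print(round(-best, 3))  # remaining squared distance after search
```

The paper's novel twist, a neural network that steers the search toward promising samples without extra fitness evaluations, would replace the blind Gaussian mutation step above.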
Parameter-Efficient Fine-Tuning Methods for Pretrained Language Models: A Critical Review and Assessment
With the continuous growth in the number of parameters of transformer-based
pretrained language models (PLMs), particularly the emergence of large language
models (LLMs) with billions of parameters, many natural language processing
(NLP) tasks have demonstrated remarkable success. However, the enormous size
and computational demands of these models pose significant challenges for
adapting them to specific downstream tasks, especially in environments with
limited computational resources. Parameter Efficient Fine-Tuning (PEFT) offers
an effective solution by reducing the number of fine-tuning parameters and
memory usage while achieving comparable performance to full fine-tuning. The
demands for fine-tuning PLMs, especially LLMs, have led to a surge in the
development of PEFT methods, as depicted in Fig. 1. In this paper, we present a
comprehensive and systematic review of PEFT methods for PLMs. We summarize
these PEFT methods, discuss their applications, and outline future directions.
Furthermore, we conduct experiments using several representative PEFT methods
to better understand their effectiveness in parameter efficiency and memory
efficiency. By offering insights into the latest advancements and practical
applications, this survey serves as an invaluable resource for researchers and
practitioners seeking to navigate the challenges and opportunities presented by
PEFT in the context of PLMs.
Comment: 20 pages, 4 figures
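The parameter savings that motivate PEFT can be made concrete with a back-of-the-envelope count for LoRA, one representative method: a rank-r update to a d_out x d_in weight matrix stores r*(d_in + d_out) values instead of d_in*d_out. The hidden size below is a typical LLM value chosen for illustration.

```python
# Trainable-parameter count: full fine-tuning vs. a rank-r LoRA update
# for a single d_out x d_in weight matrix.

def full_params(d_in, d_out):
    return d_in * d_out

def lora_params(d_in, d_out, r):
    return r * (d_in + d_out)

d_in = d_out = 4096   # a typical LLM hidden size
r = 8                 # a common LoRA rank
print(full_params(d_in, d_out))                                # 16777216
print(lora_params(d_in, d_out, r))                             # 65536
print(lora_params(d_in, d_out, r) / full_params(d_in, d_out))  # 0.00390625
```

At rank 8 the adapter trains under 0.4% of the layer's parameters, which is why memory usage drops so sharply relative to full fine-tuning.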
State-dependent activity dynamics of hypothalamic stress effector neurons
The stress response necessitates an immediate boost in vital physiological
functions from their homeostatic operation to an elevated emergency response.
However, the neural mechanisms underlying this state-dependent change remain
largely unknown. Using a combination of in vivo and ex vivo electrophysiology
with computational modeling, we report that corticotropin releasing hormone
(CRH) neurons in the paraventricular nucleus of the hypothalamus (PVN), the
effector neurons of the hormonal stress response, rapidly transition between
distinct activity states through recurrent inhibition. Specifically, in vivo
optrode recording shows that under non-stress conditions, CRHPVN neurons often
fire with rhythmic brief bursts (RB), which, somewhat counterintuitively,
constrains the firing rate due to long (~2 s) interburst intervals. Stressful
stimuli rapidly switch RB to continuous single spiking (SS), permitting a large
increase in firing rate. A spiking network model shows that recurrent
inhibition can control this activity-state switch, and more broadly the gain
of spiking responses to excitatory inputs. In biological CRHPVN neurons ex
vivo, the injection of whole-cell currents derived from our computational
model recreates the in vivo-like switch between RB and SS, providing direct
evidence that physiologically relevant network inputs enable state-dependent
computation in single neurons. Together, we present a novel mechanism for
state-dependent activity dynamics in CRHPVN neurons.
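How recurrent inhibition controls response gain can be sketched with a minimal rate model: the steady-state rate of a unit obeying r = max(0, I - w_inh * r) is I / (1 + w_inh), so stronger feedback inhibition divisively reduces the gain from input to output. The parameters are illustrative, not fitted to CRH-PVN data, and this rate model abstracts away the burst/spiking dynamics of the paper's spiking network.

```python
# Minimal rate-model sketch: recurrent inhibition as a divisive gain control.

def steady_rate(I, w_inh, dt=0.1, steps=500):
    """Steady state of dr/dt = -r + max(0, I - w_inh * r), via Euler steps."""
    r = 0.0
    for _ in range(steps):
        r += dt * (-r + max(0.0, I - w_inh * r))
    return r

# Without inhibition the input-output gain is 1; with feedback strength
# w_inh it becomes 1 / (1 + w_inh): halved at w_inh = 1, quartered at 3.
for w in (0.0, 1.0, 3.0):
    print(w, round(steady_rate(2.0, w), 3))
```

In the paper's spiking network the same feedback motif additionally gates the switch between the burst and single-spiking states; the rate model captures only the gain-control aspect.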
Versatile black-box optimization
International audience