155 research outputs found
Towards fast hybrid deep kernel learning methods
El treball estudia la millor manera de crear xarxes neuronals hÃbrides amb mètodes kernel mitjançant dues aproximacions de kernel diferents, random Fourier features i el mètode Nystrom, i la millor manera d'entrenar-les, amb RMSprop i stochastic gradient descent
BlackVIP: Black-Box Visual Prompting for Robust Transfer Learning
With the surge of large-scale pre-trained models (PTMs), fine-tuning these
models to numerous downstream tasks becomes a crucial problem. Consequently,
parameter efficient transfer learning (PETL) of large models has grasped huge
attention. While recent PETL methods showcase impressive performance, they rely
on optimistic assumptions: 1) the entire parameter set of a PTM is available,
and 2) a sufficiently large memory capacity for the fine-tuning is equipped.
However, in most real-world applications, PTMs are served as a black-box API or
proprietary software without explicit parameter accessibility. Besides, it is
hard to meet a large memory requirement for modern PTMs. In this work, we
propose black-box visual prompting (BlackVIP), which efficiently adapts the
PTMs without knowledge about model architectures and parameters. BlackVIP has
two components; 1) Coordinator and 2) simultaneous perturbation stochastic
approximation with gradient correction (SPSA-GC). The Coordinator designs
input-dependent image-shaped visual prompts, which improves few-shot adaptation
and robustness on distribution/location shift. SPSA-GC efficiently estimates
the gradient of a target model to update Coordinator. Extensive experiments on
16 datasets demonstrate that BlackVIP enables robust adaptation to diverse
domains without accessing PTMs' parameters, with minimal memory requirements.
Code: \url{https://github.com/changdaeoh/BlackVIP}Comment: Accepted to CVPR 202
FPGA implementation of a LSTM Neural Network
Este trabalho pretende fazer uma implementação customizada, em Hardware, duma Rede Neuronal Long Short-Term Memory. O modelo python, assim como a descrição Verilog, e sÃntese RTL, encontram-se terminadas. Falta apenas fazer o benchmarking e a integração de um sistema de aprendizagem
Fine-Tuning Language Models with Just Forward Passes
Fine-tuning language models (LMs) has yielded success on diverse downstream
tasks, but as LMs grow in size, backpropagation requires a prohibitively large
amount of memory. Zeroth-order (ZO) methods can in principle estimate gradients
using only two forward passes but are theorized to be catastrophically slow for
optimizing large models. In this work, we propose a memory-efficient
zerothorder optimizer (MeZO), adapting the classical ZO-SGD method to operate
in-place, thereby fine-tuning LMs with the same memory footprint as inference.
For example, with a single A100 80GB GPU, MeZO can train a 30-billion parameter
model, whereas fine-tuning with backpropagation can train only a 2.7B LM with
the same budget. We conduct comprehensive experiments across model types
(masked and autoregressive LMs), model scales (up to 66B), and downstream tasks
(classification, multiple-choice, and generation). Our results demonstrate that
(1) MeZO significantly outperforms in-context learning and linear probing; (2)
MeZO achieves comparable performance to fine-tuning with backpropagation across
multiple tasks, with up to 12x memory reduction; (3) MeZO is compatible with
both full-parameter and parameter-efficient tuning techniques such as LoRA and
prefix tuning; (4) MeZO can effectively optimize non-differentiable objectives
(e.g., maximizing accuracy or F1). We support our empirical findings with
theoretical insights, highlighting how adequate pre-training and task prompts
enable MeZO to fine-tune huge models, despite classical ZO analyses suggesting
otherwise.Comment: Code available at https://github.com/princeton-nlp/MeZ
- …