Search CORE

155 research outputs found

Towards fast hybrid deep kernel learning methods

Author: Lara Miquel Miquel
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2019

The work studies the best way to build hybrid neural networks with kernel methods using two different kernel approximations, random Fourier features and the Nyström method, and the best way to train them, with RMSprop and stochastic gradient descent.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC
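The abstract names random Fourier features as one of the two kernel approximations. As a hedged illustration (the thesis's own code is not reproduced here), the following Python sketch builds the standard random Fourier feature map of Rahimi and Recht for an RBF kernel, assuming the kernel is k(x, y) = exp(-gamma * ||x - y||^2). The function names and the values of gamma and the feature count are illustrative choices. In a hybrid network, a map like this could serve as a layer whose outputs feed ordinary neural layers trained with SGD or RMSprop.

```python
import math
import random


def rbf_kernel(x, y, gamma):
    """Exact RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def make_rff_map(dim, num_features, gamma, seed=0):
    """Sample a random Fourier feature map z(.) with z(x) . z(y) ~ k(x, y).

    For the RBF kernel above, frequencies are drawn from N(0, 2*gamma*I)
    and phases from U[0, 2*pi]; z(x) = sqrt(2/D) * cos(W x + b).
    """
    rng = random.Random(seed)
    std = math.sqrt(2.0 * gamma)
    weights = [[rng.gauss(0.0, std) for _ in range(dim)] for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def feature_map(x):
        return [
            scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(weights, phases)
        ]

    return feature_map


def approx_kernel(z, x, y):
    """The inner product of the random features approximates k(x, y)."""
    return sum(a * b for a, b in zip(z(x), z(y)))


if __name__ == "__main__":
    x, y = [0.3, -0.1, 0.5], [0.1, 0.2, 0.4]
    z = make_rff_map(dim=3, num_features=2000, gamma=0.5, seed=42)
    print("exact :", rbf_kernel(x, y, 0.5))
    print("approx:", approx_kernel(z, x, y))
```

With more features the inner product concentrates around the exact kernel value. The Nyström method mentioned in the abstract instead builds features from a subset of training points, which this sketch does not cover.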

BlackVIP: Black-Box Visual Prompting for Robust Transfer Learning

Author: Choi Hosik
Hwang Hyeji
Jung Geunyoung
Jung Jiyoung
Lee Hee-young
Lim YongTaek
Oh Changdae
Song Kyungwoo
Publication date: 26/03/2023

With the surge of large-scale pre-trained models (PTMs), fine-tuning these models for numerous downstream tasks has become a crucial problem. Consequently, parameter-efficient transfer learning (PETL) of large models has attracted considerable attention. While recent PETL methods show impressive performance, they rely on two optimistic assumptions: 1) the entire parameter set of a PTM is available, and 2) sufficient memory capacity for fine-tuning is available. However, in most real-world applications, PTMs are served as a black-box API or proprietary software without explicit access to their parameters. Moreover, the large memory requirements of modern PTMs are hard to meet. In this work, we propose black-box visual prompting (BlackVIP), which efficiently adapts PTMs without knowledge of their architectures or parameters. BlackVIP has two components: 1) the Coordinator and 2) simultaneous perturbation stochastic approximation with gradient correction (SPSA-GC). The Coordinator designs input-dependent, image-shaped visual prompts, which improve few-shot adaptation and robustness to distribution and location shift. SPSA-GC efficiently estimates the gradient of the target model to update the Coordinator. Extensive experiments on 16 datasets demonstrate that BlackVIP enables robust adaptation to diverse domains without accessing the PTMs' parameters, with minimal memory requirements. Code: https://github.com/changdaeoh/BlackVIP

Comment: Accepted to CVPR 202

arXiv.org e-Print Archive
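SPSA-GC combines SPSA with a gradient-correction step. The Python sketch below shows only the plain SPSA estimator underneath it, applied to a toy quadratic loss: the gradient correction is not reproduced, and black_box_loss, the step size lr and the perturbation size c are hypothetical stand-ins for querying a black-box model, not BlackVIP's actual setup. The point it illustrates is that one gradient estimate costs two loss queries, however many parameters there are.

```python
import random


def spsa_gradient(loss_fn, theta, c, rng):
    """Two-query SPSA estimate of the gradient of a black-box loss.

    All coordinates are perturbed at once by a random +/-1 (Rademacher)
    vector delta, so each estimate needs exactly two loss evaluations.
    """
    delta = [rng.choice((-1.0, 1.0)) for _ in theta]
    plus = [t + c * d for t, d in zip(theta, delta)]
    minus = [t - c * d for t, d in zip(theta, delta)]
    diff = (loss_fn(plus) - loss_fn(minus)) / (2.0 * c)
    # For +/-1 entries, 1 / delta_i == delta_i.
    return [diff * d for d in delta]


def spsa_minimize(loss_fn, theta, steps=500, lr=0.05, c=0.01, seed=0):
    """Plain SPSA descent loop (hypothetical settings, no gradient correction)."""
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(steps):
        grad = spsa_gradient(loss_fn, theta, c, rng)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


def black_box_loss(params):
    """Hypothetical stand-in for a black-box query: only a scalar comes back."""
    target = [1.0, -2.0, 0.5]
    return sum((p - t) ** 2 for p, t in zip(params, target))


if __name__ == "__main__":
    print(spsa_minimize(black_box_loss, [0.0, 0.0, 0.0]))
```

Because the optimizer only ever calls loss_fn, the same loop works whether the loss comes from local code or a remote API.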

FPGA implementation of a LSTM Neural Network

Author: José Pedro Castro Fonseca
Publication date: 21/07/2016

This work aims to produce a custom hardware implementation of a Long Short-Term Memory (LSTM) neural network. The Python model, the Verilog description and the RTL synthesis are complete. Only the benchmarking and the integration of a learning system remain.

Repositório Aberto da Universidade do Porto
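A software reference model is a common way to check an RTL design. The sketch below is one such model for a single time step of a standard LSTM cell in plain Python. It assumes the usual gate formulation (input, forget and output gates plus a tanh candidate) and is not taken from the thesis, whose Python model and Verilog description are not shown here. A hardware version would typically use fixed-point arithmetic and approximations of sigmoid and tanh, which this floating-point sketch omits.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lstm_step(x, h, c, params):
    """One time step of a standard LSTM cell.

    params maps each gate name ("i", "f", "o", "g") to a tuple (W, U, b):
    W multiplies the input x, U multiplies the previous hidden state h,
    and b is the bias vector. Returns the new (h, c).
    """
    pre = {}
    for gate, (w, u, b) in params.items():
        pre[gate] = [wx + uh + bb for wx, uh, bb in zip(matvec(w, x), matvec(u, h), b)]
    i = [sigmoid(v) for v in pre["i"]]     # input gate
    f = [sigmoid(v) for v in pre["f"]]     # forget gate
    o = [sigmoid(v) for v in pre["o"]]     # output gate
    g = [math.tanh(v) for v in pre["g"]]   # candidate cell value
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new
```

For example, with all weights and biases set to zero every gate outputs 0.5 and the candidate is 0, so a previous cell state of 1.0 becomes 0.5 and the hidden output becomes 0.5 * tanh(0.5); known cases like this make convenient test vectors when comparing against a simulated RTL design.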

Fine-Tuning Language Models with Just Forward Passes

Author: Arora Sanjeev
Chen Danqi
Damian Alex
Gao Tianyu
Lee Jason D.
Malladi Sadhika
Nichani Eshaan
Publication date: 26/05/2023

Fine-tuning language models (LMs) has yielded success on diverse downstream tasks, but as LMs grow in size, backpropagation requires a prohibitively large amount of memory. Zeroth-order (ZO) methods can in principle estimate gradients using only two forward passes, but they are theorized to be catastrophically slow for optimizing large models. In this work, we propose a memory-efficient zeroth-order optimizer (MeZO), which adapts the classical ZO-SGD method to operate in place, thereby fine-tuning LMs with the same memory footprint as inference. For example, with a single A100 80GB GPU, MeZO can train a 30-billion-parameter model, whereas fine-tuning with backpropagation can train only a 2.7B LM with the same budget. We conduct comprehensive experiments across model types (masked and autoregressive LMs), model scales (up to 66B), and downstream tasks (classification, multiple-choice, and generation). Our results demonstrate that (1) MeZO significantly outperforms in-context learning and linear probing; (2) MeZO achieves performance comparable to fine-tuning with backpropagation across multiple tasks, with up to 12x memory reduction; (3) MeZO is compatible with both full-parameter and parameter-efficient tuning techniques such as LoRA and prefix tuning; (4) MeZO can effectively optimize non-differentiable objectives (e.g., maximizing accuracy or F1). We support our empirical findings with theoretical insights, highlighting how adequate pre-training and task prompts enable MeZO to fine-tune huge models, despite classical ZO analyses suggesting otherwise.

Comment: Code available at https://github.com/princeton-nlp/MeZ

arXiv.org e-Print Archive
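The abstract describes MeZO as ZO-SGD adapted to run in place. One way to do that, sketched below under the assumption that the random direction z is regenerated from a stored seed instead of being kept in memory, needs two loss evaluations per step and no extra parameter-sized buffer. A plain Python list stands in for model weights, and the functions, the quadratic loss, lr and eps are hypothetical illustrations rather than the code in the authors' repository.

```python
import random


def perturb_inplace(theta, scale, seed):
    """Add scale * z to theta in place, with z ~ N(0, I) regenerated from seed.

    Replaying the same seed reproduces the same z, so z is never stored.
    """
    rng = random.Random(seed)
    for k in range(len(theta)):
        theta[k] += scale * rng.gauss(0.0, 1.0)


def mezo_step(theta, loss_fn, lr, eps, seed):
    """One in-place ZO-SGD step using two loss evaluations (forward passes)."""
    perturb_inplace(theta, eps, seed)            # theta + eps * z
    loss_plus = loss_fn(theta)
    perturb_inplace(theta, -2.0 * eps, seed)     # theta - eps * z
    loss_minus = loss_fn(theta)
    perturb_inplace(theta, eps, seed)            # back to theta (up to rounding)
    projected_grad = (loss_plus - loss_minus) / (2.0 * eps)
    perturb_inplace(theta, -lr * projected_grad, seed)  # theta -= lr * g * z
    return loss_plus


def train(theta, loss_fn, steps, lr=0.02, eps=1e-3, seed=0):
    """Run ZO-SGD, drawing a fresh seed for each step (hypothetical settings)."""
    master = random.Random(seed)
    for _ in range(steps):
        mezo_step(theta, loss_fn, lr, eps, master.randrange(2**31))
    return theta


def quadratic_loss(params):
    """Hypothetical toy objective standing in for an LM's training loss."""
    target = [1.0, -2.0, 0.5]
    return sum((p - t) ** 2 for p, t in zip(params, target))


if __name__ == "__main__":
    print(train([0.0, 0.0, 0.0], quadratic_loss, steps=3000))
```

The only state carried between the four perturbations is an integer seed, which is what keeps memory at the level of inference in this sketch.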