293 research outputs found

Multi-Head Adapter Routing for Data-Efficient Fine-Tuning

Authors: Caccia Lucas; Liu Lucas; Pereira Matheus; Ponti Edoardo; Roux Nicolas Le; Sordoni Alessandro
Publication date: 07/11/2022

Parameter-efficient fine-tuning (PEFT) methods can adapt large language models to downstream tasks by training a small number of newly added parameters. In multi-task settings, PEFT adapters typically train on each task independently, inhibiting transfer across tasks, or on the concatenation of all tasks, which can lead to negative interference. To address this, Polytropon (Ponti et al.) jointly learns an inventory of PEFT adapters and a routing function to share variable-size sets of adapters across tasks. Subsequently, adapters can be re-combined and fine-tuned on novel tasks even with limited data. In this paper, we investigate to what extent the ability to control which adapters are active for each task leads to sample-efficient generalization. Thus, we propose less expressive variants where we perform weighted averaging of the adapters before few-shot adaptation (Poly-mu) instead of learning a routing function. Moreover, we introduce more expressive variants where finer-grained task-adapter allocation is learned through a multi-head routing function (Poly-S). We test these variants on three separate benchmarks for multi-task learning. We find that Poly-S achieves gains on all three (up to 5.3 points on average) over strong baselines, while incurring a negligible additional cost in parameter count. In particular, we find that instruction tuning, where models are fully fine-tuned on natural language instructions for each task, is inferior to modular methods such as Polytropon and our proposed variants. Comment: Preprint
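To make the contrast in the abstract above concrete, the following is a minimal sketch, not the authors' implementation. It assumes each adapter is a flat list of parameters, and it assumes that a "head" means an equal-sized block of those parameters mixed with its own weights. The function names `combine_adapters` and `multi_head_combine` are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def combine_adapters(adapters: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Return the weighted average of several adapter parameter vectors."""
    if len(adapters) != len(weights):
        raise ValueError("need exactly one weight per adapter")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(adapters[0])
    return [
        sum(w * a[i] for w, a in zip(weights, adapters)) / total
        for i in range(dim)
    ]


def multi_head_combine(
    adapters: Sequence[Vector], head_weights: Sequence[Sequence[float]]
) -> Vector:
    """Mix each parameter block (assumed 'head') with its own adapter weights."""
    heads = len(head_weights)
    dim = len(adapters[0])
    if dim % heads != 0:
        raise ValueError("parameter count must be divisible by the number of heads")
    block = dim // heads
    combined: Vector = []
    for h, weights in enumerate(head_weights):
        # Slice the same block out of every adapter, then average that block.
        blocks = [a[h * block:(h + 1) * block] for a in adapters]
        combined.extend(combine_adapters(blocks, weights))
    return combined


# Toy inventory of two adapters with four parameters each (made-up values).
inventory = [[1.0, 2.0, 3.0, 4.0], [3.0, 6.0, 5.0, 8.0]]

uniform = combine_adapters(inventory, [1.0, 1.0])               # plain averaging
routed = combine_adapters(inventory, [1.0, 0.0])                # one routing vector per task
per_head = multi_head_combine(inventory, [[1.0, 0.0], [0.0, 1.0]])  # one vector per block
```

In this sketch a single weight vector selects or blends whole adapters, whereas the per-head variant can take the first block from one adapter and the second block from another, which is the finer-grained allocation the abstract refers to.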

arXiv.org e-Print Archive

Soft Merging of Experts with Adaptive Routing

Authors: Liu Haokun; Muqeeth Mohammed; Raffel Colin
Publication date: 06/06/2023

Sparsely activated neural networks with conditional computation learn to route their inputs through different "expert" subnetworks, providing a form of modularity that densely activated models lack. Despite their possible benefits, models with learned routing often underperform their parameter-matched densely activated counterparts as well as models that use non-learned heuristic routing strategies. In this paper, we hypothesize that these shortcomings stem from the gradient estimation techniques used to train sparsely activated models that use non-differentiable discrete routing decisions. To address this issue, we introduce Soft Merging of Experts with Adaptive Routing (SMEAR), which avoids discrete routing by using a single "merged" expert constructed via a weighted average of all of the experts' parameters. By routing activations through a single merged expert, SMEAR does not incur a significant increase in computational costs and enables standard gradient-based training. We empirically validate that models using SMEAR outperform models that route based on metadata or learn sparse routing through gradient estimation. Furthermore, we provide qualitative analysis demonstrating that the experts learned via SMEAR exhibit a significant amount of specialization. All of the code used in our experiments is publicly available.
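The merging step described in this abstract can be sketched as follows. This is an illustrative toy, not the released SMEAR code. It assumes each expert is a single bias-free linear layer stored as a list of rows, and that the router logits are supplied directly rather than computed from the input. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn router logits into a probability distribution over experts."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def merge_experts(expert_weights: Sequence[Matrix], probs: Sequence[float]) -> Matrix:
    """Build one 'merged' expert as the probability-weighted average of parameters."""
    rows, cols = len(expert_weights[0]), len(expert_weights[0][0])
    return [
        [sum(p * w[r][c] for p, w in zip(probs, expert_weights)) for c in range(cols)]
        for r in range(rows)
    ]


def apply_linear(weight: Matrix, x: Sequence[float]) -> List[float]:
    """Compute the matrix-vector product weight @ x."""
    return [sum(w_rc * x_c for w_rc, x_c in zip(row, x)) for row in weight]


def smear_forward(
    expert_weights: Sequence[Matrix], router_logits: Sequence[float], x: Sequence[float]
) -> List[float]:
    """Route the input through a single merged expert instead of a discrete choice."""
    probs = softmax(router_logits)
    merged = merge_experts(expert_weights, probs)
    return apply_linear(merged, x)


# Two toy experts (made-up values) and equal router logits.
experts = [[[1.0, 0.0], [0.0, 1.0]], [[3.0, 0.0], [0.0, 3.0]]]
output = smear_forward(experts, [0.0, 0.0], [1.0, 2.0])
```

Because every operation here is a differentiable average, no discrete routing decision has to be estimated, and only one expert-sized forward pass is performed per input.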

arXiv.org e-Print Archive

Automated Search for Resource-Efficient Branched Multi-Task Networks

Authors: Bruggemann David; Georgoulis Stamatios; Kanakis Menelaos; Van Gool Luc
Publication date: 01/01/2020

The multi-modal nature of many vision problems calls for neural network architectures that can perform multiple tasks concurrently. Typically, such architectures have been handcrafted in the literature. However, given the size and complexity of the problem, this manual architecture exploration likely exceeds human design abilities. In this paper, we propose a principled approach, rooted in differentiable neural architecture search, to automatically define branching (tree-like) structures in the encoding stage of a multi-task neural network. To allow flexibility within resource-constrained environments, we introduce a proxyless, resource-aware loss that dynamically controls the model size. Evaluations across a variety of dense prediction tasks show that our approach consistently finds high-performing branching structures within limited resource budgets. Comment: British Machine Vision Conference (BMVC) 202

arXiv.org e-Print Archive

Repository for Publications and Research Data

Learning and optimization in combinatorial spaces: With a focus on deep learning for vehicle routing

Author: Kool W.
Publication date: 01/01/2022

International Migration, Integration and Social Cohesion online publications

Optimization and Learning in Energy Efficient Cognitive Radio System

Author: Zheng Kun
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/12/2012

Energy efficiency and spectrum efficiency are the two biggest concerns in wireless communication. A constrained power supply is always a bottleneck for modern mobile communication systems. Meanwhile, the spectrum resource is extremely limited but seriously underutilized. Cognitive radio (CR) is a promising approach that could alleviate spectrum underutilization and increase the quality of service. In contrast to traditional wireless communication systems, a distinguishing feature of cognitive radio systems is that the cognitive radios, which are typically equipped with powerful computation machinery, are capable of sensing the spectrum environment and making intelligent decisions. Moreover, cognitive radio systems differ from traditional wireless systems in that they can adapt their operating parameters, such as transmission power, channel, and modulation, according to the surrounding radio environment in order to exploit opportunities. In this dissertation, the study focuses on the optimization and learning of energy efficiency in the cognitive radio system, with the aim of better utilizing both the energy and spectrum resources. Firstly, drowsy transmission is introduced to the cognitive radio transmitter. It produces optimized idle period patterns and selects the best sleep mode for each idle period between two packet transmissions through joint power management and transmission power control/rate selection. Both an optimal solution by dynamic programming and a flexible solution by reinforcement learning are provided. Secondly, for the case where the cognitive radio system benefits from theoretically infinite but unsteady harvested energy, an innovative and flexible control framework based mainly on model predictive control (MPC) is designed. A solution to the problems introduced by MPC, such as the inaccurate model and the myopic control policy, is given.
Last, after studying the optimization problem for point-to-point communication, multi-objective reinforcement learning is applied to the cognitive radio network, and an adaptable routing algorithm is proposed and implemented. Epidemic propagation is studied to further understand the learning process in the cognitive radio network.

University of Tennessee, Knoxville: Trace