Multi-Head Adapter Routing for Data-Efficient Fine-Tuning
Parameter-efficient fine-tuning (PEFT) methods can adapt large language
models to downstream tasks by training a small amount of newly added
parameters. In multi-task settings, PEFT adapters typically train on each task
independently, inhibiting transfer across tasks, or on the concatenation of all
tasks, which can lead to negative interference. To address this, Polytropon
(Ponti et al.) jointly learns an inventory of PEFT adapters and a routing
function to share variable-size sets of adapters across tasks. Subsequently,
adapters can be re-combined and fine-tuned on novel tasks even with limited
data. In this paper, we investigate to what extent the ability to control which
adapters are active for each task leads to sample-efficient generalization.
Thus, we propose less expressive variants where we perform weighted averaging
of the adapters before few-shot adaptation (Poly-mu) instead of learning a
routing function. Moreover, we introduce more expressive variants where
finer-grained task-adapter allocation is learned through a multi-head routing
function (Poly-S). We test these variants on three separate benchmarks for
multi-task learning. We find that Poly-S achieves gains on all three (up to 5.3
points on average) over strong baselines, while incurring a negligible
additional cost in parameter count. In particular, we find that instruction
tuning, where models are fully fine-tuned on natural language instructions for
each task, is inferior to modular methods such as Polytropon and our proposed
variants. Comment: Preprint
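To make the contrast between the two variants concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: it assumes each adapter is a flat parameter vector, that the averaging variant uses a uniform mean, and that "multi-head" means splitting each adapter into equal contiguous blocks with separate mixing weights per block. The function names `combine`, `average_adapters`, and `multi_head_combine` are hypothetical.

```python
def combine(adapters, weights):
    """Weighted sum of adapter parameter vectors (one weight per adapter)."""
    dim = len(adapters[0])
    return [sum(w * a[i] for w, a in zip(weights, adapters)) for i in range(dim)]


def average_adapters(adapters):
    """Less expressive variant (assumed uniform): average all adapters, no routing."""
    n = len(adapters)
    return combine(adapters, [1.0 / n] * n)


def multi_head_combine(adapters, head_weights):
    """More expressive variant (assumed form): one set of mixing weights per head.

    head_weights[h] holds one weight per adapter for head h; each head
    covers an equal, contiguous block of the adapter parameters.
    """
    n_heads = len(head_weights)
    dim = len(adapters[0])
    if dim % n_heads != 0:
        raise ValueError("adapter size must be divisible by the number of heads")
    chunk = dim // n_heads
    merged = []
    for h, weights in enumerate(head_weights):
        lo, hi = h * chunk, (h + 1) * chunk
        merged.extend(combine([a[lo:hi] for a in adapters], weights))
    return merged


# Toy inventory of two adapters with four parameters each.
inventory = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
averaged = average_adapters(inventory)
# Head 0 takes its block from adapter 0, head 1 from adapter 1.
per_head = multi_head_combine(inventory, [[1.0, 0.0], [0.0, 1.0]])
```

With a single head, `multi_head_combine` reduces to one weighted combination over whole adapters, which is why per-block weights add only a small number of routing parameters in this sketch.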
Soft Merging of Experts with Adaptive Routing
Sparsely activated neural networks with conditional computation learn to
route their inputs through different "expert" subnetworks, providing a form of
modularity that densely activated models lack. Despite their possible benefits,
models with learned routing often underperform their parameter-matched densely
activated counterparts as well as models that use non-learned heuristic routing
strategies. In this paper, we hypothesize that these shortcomings stem from the
gradient estimation techniques used to train sparsely activated models that use
non-differentiable discrete routing decisions. To address this issue, we
introduce Soft Merging of Experts with Adaptive Routing (SMEAR), which avoids
discrete routing by using a single "merged" expert constructed via a weighted
average of all of the experts' parameters. By routing activations through a
single merged expert, SMEAR does not incur a significant increase in
computational costs and enables standard gradient-based training. We
empirically validate that models using SMEAR outperform models that route based
on metadata or learn sparse routing through gradient estimation. Furthermore,
we provide qualitative analysis demonstrating that the experts learned via
SMEAR exhibit a significant amount of specialization. All of the code used in
our experiments is publicly available.
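The merging idea described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the released SMEAR code: experts are taken to be bias-free linear layers stored as nested lists, the router is a single linear map followed by a softmax, and the names `softmax`, `matvec`, `merge_experts`, and `smear_forward` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def merge_experts(experts, probs):
    """Element-wise weighted average of the experts' weight matrices."""
    rows, cols = len(experts[0]), len(experts[0][0])
    return [
        [sum(p * E[r][c] for p, E in zip(probs, experts)) for c in range(cols)]
        for r in range(rows)
    ]


def smear_forward(x, experts, router):
    """Route x through one merged expert instead of picking a discrete expert."""
    probs = softmax(matvec(router, x))      # soft routing weights, one per expert
    merged = merge_experts(experts, probs)  # a single "merged" expert
    return matvec(merged, x), probs


# Two toy 2x2 experts; a zero router gives equal routing weights.
experts = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
router = [[0.0, 0.0], [0.0, 0.0]]
output, probs = smear_forward([2.0, 4.0], experts, router)
```

Because every step here is differentiable, routing weights could be trained with ordinary backpropagation, and only one expert-sized matrix multiply is applied to the input regardless of how many experts are merged.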
Automated Search for Resource-Efficient Branched Multi-Task Networks
The multi-modal nature of many vision problems calls for neural network
architectures that can perform multiple tasks concurrently. Typically, such
architectures have been handcrafted in the literature. However, given the size
and complexity of the problem, this manual architecture exploration likely
exceeds human design abilities. In this paper, we propose a principled
approach, rooted in differentiable neural architecture search, to automatically
define branching (tree-like) structures in the encoding stage of a multi-task
neural network. To allow flexibility within resource-constrained environments,
we introduce a proxyless, resource-aware loss that dynamically controls the
model size. Evaluations across a variety of dense prediction tasks show that
our approach consistently finds high-performing branching structures within
limited resource budgets. Comment: British Machine Vision Conference (BMVC) 202
Optimization and Learning in Energy Efficient Cognitive Radio System
Energy efficiency and spectrum efficiency are the two biggest concerns in wireless communication. A constrained power supply is always a bottleneck for modern mobile communication systems. Meanwhile, spectrum resources are extremely limited yet seriously underutilized.
Cognitive radio (CR) is a promising approach that could alleviate spectrum underutilization and increase quality of service. In contrast to traditional wireless communication systems, a distinguishing feature of cognitive radio systems is that the cognitive radios, which are typically equipped with powerful computation machinery, are capable of sensing the spectrum environment and making intelligent decisions. Moreover, cognitive radio systems differ from traditional wireless systems in that they can adapt their operating parameters, i.e., transmission power, channel, and modulation, according to the surrounding radio environment in order to exploit the available opportunities.
This dissertation focuses on the optimization and learning of energy efficiency in cognitive radio systems, with the aim of better utilizing both energy and spectrum resources. First, drowsy transmission is introduced to the cognitive radio transmitter: it produces optimized idle period patterns and selects the best sleep mode for each idle period between two packet transmissions through joint power management and transmission power control/rate selection. Both an optimal solution based on dynamic programming and a flexible solution based on reinforcement learning are provided. Second, for the case where the cognitive radio system draws on theoretically infinite but unsteady harvested energy, an innovative and flexible control framework based mainly on model predictive control (MPC) is designed. Solutions are given to the problems that MPC introduces, such as an inaccurate model and a myopic control policy. Last, after studying the optimization problem for point-to-point communication, multi-objective reinforcement learning is applied to the cognitive radio network, and an adaptable routing algorithm is proposed and implemented. Epidemic propagation is studied to further understand the learning process in the cognitive radio network.