293 research outputs found

    Multi-Head Adapter Routing for Data-Efficient Fine-Tuning

    Parameter-efficient fine-tuning (PEFT) methods can adapt large language models to downstream tasks by training a small number of newly added parameters. In multi-task settings, PEFT adapters typically train on each task independently, inhibiting transfer across tasks, or on the concatenation of all tasks, which can lead to negative interference. To address this, Polytropon (Ponti et al.) jointly learns an inventory of PEFT adapters and a routing function to share variable-size sets of adapters across tasks. Subsequently, adapters can be re-combined and fine-tuned on novel tasks even with limited data. In this paper, we investigate to what extent the ability to control which adapters are active for each task leads to sample-efficient generalization. Thus, we propose less expressive variants where we perform weighted averaging of the adapters before few-shot adaptation (Poly-mu) instead of learning a routing function. Moreover, we introduce more expressive variants where finer-grained task-adapter allocation is learned through a multi-head routing function (Poly-S). We test these variants on three separate benchmarks for multi-task learning. We find that Poly-S achieves gains on all three (up to 5.3 points on average) over strong baselines, while incurring a negligible additional cost in parameter count. In particular, we find that instruction tuning, where models are fully fine-tuned on natural language instructions for each task, is inferior to modular methods such as Polytropon and our proposed variants. Comment: Preprint
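
The contrast between averaging adapters and routing them per head can be illustrated with a toy sketch. This is not the authors' implementation: it assumes each adapter is a flat list of floats, that a "head" is an equal-sized contiguous block of that list, and that mixing weights are given rather than learned. The function names (`mix_adapters`, `uniform_average`, `multi_head_mix`) are hypothetical.

```python
def mix_adapters(adapters, weights):
    """Weighted sum of flat adapter parameter vectors (toy stand-in for adapters)."""
    dim = len(adapters[0])
    return [sum(w * a[i] for w, a in zip(weights, adapters)) for i in range(dim)]


def uniform_average(adapters):
    """Averaging variant (assumption): every adapter gets the same weight."""
    n = len(adapters)
    return mix_adapters(adapters, [1.0 / n] * n)


def multi_head_mix(adapters, head_weights):
    """Multi-head variant (assumption): each block of the parameter vector
    has its own mixing weights over the adapter inventory.

    head_weights[h] holds one weight per adapter for block h.
    """
    num_heads = len(head_weights)
    dim = len(adapters[0])
    if dim % num_heads != 0:
        raise ValueError("parameter dimension must be divisible by the number of heads")
    block = dim // num_heads
    combined = []
    for h, weights in enumerate(head_weights):
        start = h * block
        blocks = [a[start:start + block] for a in adapters]
        combined.extend(mix_adapters(blocks, weights))
    return combined


# Two toy adapters with four parameters each.
inventory = [[1.0, 1.0, 1.0, 1.0], [3.0, 3.0, 3.0, 3.0]]
averaged = uniform_average(inventory)
# Head 0 uses only the first adapter, head 1 only the second.
routed = multi_head_mix(inventory, [[1.0, 0.0], [0.0, 1.0]])
```

With a single set of weights per task, every parameter of the combined adapter shares the same mixture; with per-head weights, different blocks can draw on different adapters, which is the sense of "finer-grained" assumed here.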

    Soft Merging of Experts with Adaptive Routing

    Sparsely activated neural networks with conditional computation learn to route their inputs through different "expert" subnetworks, providing a form of modularity that densely activated models lack. Despite their possible benefits, models with learned routing often underperform their parameter-matched densely activated counterparts as well as models that use non-learned heuristic routing strategies. In this paper, we hypothesize that these shortcomings stem from the gradient estimation techniques used to train sparsely activated models that use non-differentiable discrete routing decisions. To address this issue, we introduce Soft Merging of Experts with Adaptive Routing (SMEAR), which avoids discrete routing by using a single "merged" expert constructed via a weighted average of all of the experts' parameters. By routing activations through a single merged expert, SMEAR does not incur a significant increase in computational costs and enables standard gradient-based training. We empirically validate that models using SMEAR outperform models that route based on metadata or learn sparse routing through gradient estimation. Furthermore, we provide qualitative analysis demonstrating that the experts learned via SMEAR exhibit a significant amount of specialization. All of the code used in our experiments is publicly available.
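
The merging step described in the abstract can be sketched as follows. This is a minimal illustration, not the released code: it assumes each expert is a single bias-free linear layer stored as a list of rows, and that the router is a linear map followed by a softmax. The names `smear_forward`, `merge_experts`, and `router_weights` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def merge_experts(expert_weights, probs):
    """Build one merged weight matrix as the probability-weighted average
    of all expert weight matrices."""
    rows = len(expert_weights[0])
    cols = len(expert_weights[0][0])
    return [
        [sum(p * w[r][c] for p, w in zip(probs, expert_weights)) for c in range(cols)]
        for r in range(rows)
    ]


def smear_forward(x, router_weights, expert_weights):
    """Route softly: compute routing probabilities from the input, merge the
    expert parameters, then run the input through the single merged expert."""
    probs = softmax(matvec(router_weights, x))
    merged = merge_experts(expert_weights, probs)
    return matvec(merged, x), probs


# Two toy experts; a zero router gives uniform routing probabilities.
experts = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[3.0, 0.0], [0.0, 3.0]],
]
router = [[0.0, 0.0], [0.0, 0.0]]
output, probs = smear_forward([1.0, 2.0], router, experts)
```

Every operation here is differentiable with respect to the router and expert parameters, which is why no gradient estimator is needed. For purely linear experts like these toy ones, merging parameters gives the same result as averaging the experts' outputs; the two differ once experts contain nonlinearities.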

    Automated Search for Resource-Efficient Branched Multi-Task Networks

    The multi-modal nature of many vision problems calls for neural network architectures that can perform multiple tasks concurrently. Typically, such architectures have been handcrafted in the literature. However, given the size and complexity of the problem, this manual architecture exploration likely exceeds human design abilities. In this paper, we propose a principled approach, rooted in differentiable neural architecture search, to automatically define branching (tree-like) structures in the encoding stage of a multi-task neural network. To allow flexibility within resource-constrained environments, we introduce a proxyless, resource-aware loss that dynamically controls the model size. Evaluations across a variety of dense prediction tasks show that our approach consistently finds high-performing branching structures within limited resource budgets. Comment: British Machine Vision Conference (BMVC) 202

    Optimization and Learning in Energy Efficient Cognitive Radio System

    Energy efficiency and spectrum efficiency are the two biggest concerns in wireless communication. A constrained power supply is always a bottleneck for modern mobile communication systems. Meanwhile, spectrum resources are extremely limited yet seriously underutilized. Cognitive radio (CR) is a promising approach that could alleviate spectrum underutilization and increase quality of service. In contrast to traditional wireless communication systems, a distinguishing feature of cognitive radio systems is that the cognitive radios, which are typically equipped with powerful computation machinery, are capable of sensing the spectrum environment and making intelligent decisions. Moreover, cognitive radio systems differ from traditional wireless systems in that they can adapt their operating parameters, e.g. transmission power, channel, and modulation, according to the surrounding radio environment to exploit available opportunities. This dissertation focuses on optimization and learning for energy efficiency in cognitive radio systems, with the aim of better utilizing both energy and spectrum resources. First, drowsy transmission, which produces optimized idle period patterns and selects the best sleep mode for each idle period between two packet transmissions through joint power management and transmission power control/rate selection, is introduced for the cognitive radio transmitter. Both an optimal solution by dynamic programming and a flexible solution by reinforcement learning are provided. Second, for cognitive radio systems that benefit from theoretically infinite but unsteady harvested energy, an innovative and flexible control framework based mainly on model predictive control (MPC) is designed. Solutions to the problems introduced by MPC, such as an inaccurate model and a myopic control policy, are given. Last, after studying the optimization problem for point-to-point communication, multi-objective reinforcement learning is applied to the cognitive radio network, and an adaptable routing algorithm is proposed and implemented. Epidemic propagation is studied to further understand the learning process in the cognitive radio network.