
    Less is More -- Towards parsimonious multi-task models using structured sparsity

    Model sparsification in deep learning promotes simpler, more interpretable models with fewer parameters. This not only reduces the model's memory footprint and computational needs but also shortens inference time. This work focuses on creating sparse models optimized for multiple tasks with fewer parameters; these parsimonious models can also match or outperform dense models in terms of performance. We introduce channel-wise l1/l2 group sparsity on the parameters (weights) of the shared convolutional layers of the multi-task learning model. This approach facilitates the removal of extraneous groups, i.e., channels (due to the l1 regularization), and also imposes a penalty on the weights, further enhancing the learning efficiency for all tasks (due to the l2 regularization). We analyze the results of group sparsity in both single-task and multi-task settings on two widely used Multi-Task Learning (MTL) datasets, NYU-v2 and CelebAMask-HQ, each consisting of three different computer vision tasks. On both datasets, multi-task models with approximately 70% sparsity outperform their dense equivalents. We also investigate how changing the degree of sparsification influences the model's performance, the overall sparsity percentage, the patterns of sparsity, and the inference time. Comment: Under review
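    A channel-wise l1/l2 group penalty of the kind this abstract describes can be sketched as follows. This is a minimal NumPy illustration under our own assumptions (function names and coefficient values are not from the paper): the l1 sum over per-channel l2 norms drives whole channels toward zero so they can be pruned, while the plain l2 term shrinks every weight.

```python
import numpy as np

def group_sparsity_penalty(weights, lam_group=1e-3, lam_l2=1e-4):
    """Channel-wise l1/l2 group penalty for a conv kernel of shape
    (out_channels, in_channels, kH, kW).  The l1 sum over per-channel
    l2 norms encourages whole-channel sparsity; the l2 term is a
    standard weight penalty on all parameters."""
    channel_norms = np.sqrt((weights ** 2).sum(axis=(1, 2, 3)))
    return lam_group * channel_norms.sum() + lam_l2 * (weights ** 2).sum()

def prunable_channels(weights, tol=1e-6):
    """Output channels whose norm the penalty has driven to (near) zero
    can be removed from the layer entirely."""
    return np.where(np.sqrt((weights ** 2).sum(axis=(1, 2, 3))) < tol)[0]
```

    Adding this penalty to each task's loss over the shared layers is what couples the sparsity pattern across tasks.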

    Multi-task Learning-based CSI Feedback Design in Multiple Scenarios

    For frequency division duplex systems, the essential downlink channel state information (CSI) feedback comprises the stages of compression, feedback, decompression, and reconstruction to reduce the feedback overhead. One efficient CSI feedback method is the deep-learning-based Auto-Encoder (AE) structure, yet it faces problems in actual deployments, such as selecting the deployment mode when deploying in a cell with multiple complex scenarios. Rather than designing a single AE network of huge complexity to handle the CSI of all scenarios, a more realistic mode is to divide the CSI dataset by region/scenario and use multiple relatively simple AE networks to handle each subregion's CSI. However, both options require high memory capacity at the user equipment (UE) and are not suitable for low-end devices. In this paper, we propose a new user-friendly framework based on the latter multi-tasking mode. Via Multi-Task Learning, our framework, Single-encoder-to-Multiple-decoders (S-to-M), merges the multiple independent AEs into a joint architecture: a shared encoder corresponds to multiple task-specific decoders. We also complete our framework with GateNet, a classifier that enables the base station to autonomously select the task-specific decoder corresponding to the subregion. Experiments on a simulated multi-scenario CSI dataset demonstrate our proposed S-to-M's advantages over the other benchmark modes, i.e., significantly reducing the model complexity and the UE's memory consumption. Comment: 31 pages, 13 figures, 10 tables
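    The S-to-M layout described above can be sketched with plain linear maps standing in for the paper's deep AE networks (all shapes, names, and the random initialization below are illustrative assumptions, not the paper's design): one shared encoder on the UE side, and a GateNet-style classifier on the base-station side that routes each received code to its subregion's decoder.

```python
import numpy as np

rng = np.random.default_rng(0)

class StoM:
    """Sketch of Single-encoder-to-Multiple-decoders with a GateNet-style
    classifier; linear maps stand in for deep AE networks."""
    def __init__(self, csi_dim=64, code_dim=8, n_subregions=3):
        self.encoder = rng.standard_normal((code_dim, csi_dim)) / csi_dim
        self.decoders = [rng.standard_normal((csi_dim, code_dim)) / code_dim
                         for _ in range(n_subregions)]
        self.gate = rng.standard_normal((n_subregions, code_dim))

    def feedback(self, csi):
        """UE side: one shared encoder compresses CSI from any subregion,
        so the UE stores a single model regardless of region count."""
        return self.encoder @ csi

    def reconstruct(self, code):
        """BS side: the gate classifies the code's subregion and selects
        the matching task-specific decoder."""
        region = int(np.argmax(self.gate @ code))
        return self.decoders[region] @ code, region
```

    The memory saving for the UE comes from `feedback` needing only the shared encoder, while all decoders live at the base station.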

    Deep Virtual Networks for Memory Efficient Inference of Multiple Tasks

    Deep networks consume a large amount of memory by their nature. A natural question arises: can we reduce that memory requirement while maintaining performance? In particular, this work addresses the problem of memory-efficient learning for multiple tasks. To this end, we propose a novel network architecture producing multiple networks of different configurations, termed deep virtual networks (DVNs), for different tasks. Each DVN is specialized for a single task and structured hierarchically. The hierarchical structure, which contains multiple levels of hierarchy corresponding to different numbers of parameters, enables inference under different memory budgets. The building block of a deep virtual network is a disjoint collection of a network's parameters, which we call a unit. The lowest level of hierarchy in a deep virtual network is a single unit, and higher levels of hierarchy contain the lower levels' units plus additional units. Given a budget on the number of parameters, a different level of a deep virtual network can be chosen to perform the task. A unit can be shared by different DVNs, allowing multiple DVNs to coexist in a single network. In addition, shared units assist the target task with additional knowledge learned from other tasks. This cooperative configuration of DVNs makes it possible to handle different tasks in a memory-aware manner. Our experiments show that the proposed method outperforms existing approaches for multiple tasks. Notably, ours is more efficient than others, as it allows memory-aware inference for all tasks. Comment: CVPR 2019
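    The unit/level scheme above can be sketched as follows. This is our own toy illustration, assuming a parameter pool partitioned into disjoint units and two tasks whose DVNs define nested levels over those units (unit sizes and the level assignments are invented for the example, not taken from the paper):

```python
import numpy as np

# A pool of disjoint parameter "units"; contents are placeholders.
UNIT_SIZE = 4
units = {u: np.full(UNIT_SIZE, float(u)) for u in range(6)}

# Each task's DVN lists, per hierarchy level, which units it uses.
# Levels are nested (higher levels contain all lower-level units),
# and units 0 and 1 are shared between the two tasks' DVNs.
dvn_levels = {
    "task_a": [[0], [0, 2], [0, 2, 4]],
    "task_b": [[1], [1, 0, 3], [1, 0, 3, 5]],
}

def select_level(task, param_budget):
    """Memory-aware inference: pick the deepest hierarchy level whose
    parameter count fits the budget.  Because levels are nested, the
    last feasible level is the largest one that fits."""
    feasible = [lvl for lvl in dvn_levels[task]
                if len(lvl) * UNIT_SIZE <= param_budget]
    return feasible[-1]
```

    A tighter budget simply selects a shallower level of the same DVN, so no retraining is needed to trade accuracy for memory.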

    Adversarial Multi-task Learning for Text Classification

    Neural network models have shown promise for multi-task learning, which focuses on learning shared layers that extract common, task-invariant features. However, in most existing approaches, the extracted shared features are prone to contamination by task-specific features or by the noise brought in by other tasks. In this paper, we propose an adversarial multi-task learning framework that prevents the shared and private latent feature spaces from interfering with each other. We conduct extensive experiments on 16 different text classification tasks, which demonstrate the benefits of our approach. Besides, we show that the shared knowledge learned by our proposed model can be regarded as off-the-shelf knowledge and easily transferred to new tasks. The datasets of all 16 tasks are publicly available at \url{http://nlp.fudan.edu.cn/data/}. Comment: Accepted by ACL 2017
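    Two ingredients commonly used to keep shared and private spaces apart, in the spirit of the framework above, can be sketched in NumPy (a minimal illustration under our own assumptions; function names and the scaling factor are not from the paper): a gradient-reversal step, which is one standard device for training a shared encoder against a task discriminator, and an orthogonality penalty between shared and private features.

```python
import numpy as np

def grad_reverse(grad, lam=1.0):
    """Backward pass of a gradient-reversal layer: the task discriminator
    learns to identify the source task from shared features, while the
    negated, scaled gradient pushes the shared encoder toward features
    the discriminator cannot exploit, i.e., task-invariant ones."""
    return -lam * grad

def diff_loss(shared, private):
    """Orthogonality penalty discouraging overlap between the shared and
    private feature spaces: squared Frobenius norm of shared^T @ private
    (zero when the two column spaces are orthogonal)."""
    return float(np.sum((shared.T @ private) ** 2))
```

    In training, `diff_loss` is added to the task losses while the adversarial term flows through `grad_reverse`, so the shared space stays clean of task-specific signal.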