TAIL: Task-specific Adapters for Imitation Learning with Large Pretrained Models
The full potential of large pretrained models remains largely untapped in
control domains like robotics. This is mainly because of the scarcity of data
and the computational challenges associated with training or fine-tuning these
large models for such applications. Prior work mainly emphasizes effective
pretraining of large models for decision-making, with little exploration into
how to perform data-efficient continual adaptation of these models for new
tasks. Recognizing these constraints, we introduce TAIL (Task-specific Adapters
for Imitation Learning), a framework for efficient adaptation to new control
tasks. Inspired by recent advancements in parameter-efficient fine-tuning in
language domains, we explore efficient fine-tuning techniques -- e.g.,
Bottleneck Adapters, P-Tuning, and Low-Rank Adaptation (LoRA) -- in TAIL to
adapt large pretrained models for new tasks with limited demonstration data.
Our extensive experiments in large-scale language-conditioned manipulation
tasks comparing prevalent parameter-efficient fine-tuning techniques and
adaptation baselines suggest that TAIL with LoRA can achieve the best
post-adaptation performance with only 1% of the trainable parameters of full
fine-tuning, while avoiding catastrophic forgetting and preserving adaptation
plasticity in continual learning settings.
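The LoRA technique mentioned above can be sketched in a few lines: the pretrained weight stays frozen, and only two small low-rank matrices are trained. This is a generic illustration of LoRA with numpy, not the TAIL implementation; the dimensions and scaling factor are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 64, 4  # hidden size and low rank (hypothetical values)

# Frozen pretrained weight of one linear layer
W = rng.normal(size=(d, d))

# LoRA trains only A and B; B starts at zero, so the adapted
# layer initially behaves exactly like the pretrained one.
A = rng.normal(scale=0.01, size=(r, d))
B = np.zeros((d, r))
alpha = 8.0  # common LoRA scaling hyperparameter

def lora_forward(x):
    # y = x W^T + (alpha / r) * x A^T B^T
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.normal(size=(2, d))
assert np.allclose(lora_forward(x), x @ W.T)  # B = 0, so no change yet

# Fraction of trainable parameters relative to the frozen weight
frac = (A.size + B.size) / W.size
print(f"trainable fraction: {frac:.3f}")
```

With rank 4 against a 64x64 weight, the trainable fraction is 12.5%; at the scale of large pretrained models and low ranks, this fraction drops to the ~1% regime the abstract reports.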
Zero-shot personalization of speech foundation models for depressed mood monitoring
The monitoring of depressed mood plays an important role as a diagnostic tool in psychotherapy. An automated analysis of speech can provide a non-invasive measurement of a patient’s affective state. While speech has been shown to be a useful biomarker for depression, existing approaches mostly build population-level models that aim to predict each individual’s diagnosis as a (mostly) static property. Because of inter-individual differences in symptomatology and mood regulation behaviors, these approaches are ill-suited to detect smaller temporal variations in depressed mood. We address this issue by introducing a zero-shot personalization of large speech foundation models. Compared with other personalization strategies, our work does not require labeled speech samples for enrollment. Instead, the approach makes use of adapters conditioned on subject-specific metadata. On a longitudinal dataset, we show that the method improves performance compared with a set of suitable baselines. Finally, applying our personalization strategy improves individual-level fairness.
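One simple way to realize a metadata-conditioned adapter is a FiLM-style scale-and-shift generated from the subject's metadata; the abstract does not specify the adapter's exact form, so the following is a minimal sketch under that assumption, with hypothetical feature and metadata dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_feat, d_meta = 32, 4  # feature and metadata dims (hypothetical)

# Trained maps from subject metadata to per-dimension scale/shift
W_gamma = rng.normal(scale=0.01, size=(d_feat, d_meta))
W_beta = rng.normal(scale=0.01, size=(d_feat, d_meta))

def personalized_adapter(h, meta):
    """Condition frozen-backbone features h on subject metadata.

    No labeled enrollment speech is needed: the scale and shift
    depend only on metadata, hence 'zero-shot' personalization.
    """
    gamma = 1.0 + W_gamma @ meta  # per-dimension scale, near 1 at init
    beta = W_beta @ meta          # per-dimension shift
    return gamma * h + beta

h = rng.normal(size=d_feat)  # foundation-model features for one utterance
meta = np.array([1.0, 0.0, 0.3, -0.2])  # e.g., coded demographic/clinical fields
out = personalized_adapter(h, meta)
```

With all-zero metadata the adapter reduces to the identity, so the population-level model is recovered as a special case.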
ADAPTERMIX: Exploring the Efficacy of Mixture of Adapters for Low-Resource TTS Adaptation
There are significant challenges for speaker adaptation in text-to-speech for
languages that are not widely spoken or for speakers with accents or dialects
that are not well-represented in the training data. To address this issue, we
propose the use of the "mixture of adapters" method. This approach involves
adding multiple adapters within a backbone-model layer to learn the unique
characteristics of different speakers. Our approach outperforms the baseline,
with a noticeable improvement of 5% observed in speaker preference tests when
using only one minute of data for each new speaker. Moreover, following the
adapter paradigm, we fine-tune only the adapter parameters (11% of the total
model parameters). This is a significant achievement in parameter-efficient
speaker adaptation, and one of the first models of its kind. Overall, our
proposed approach offers a promising solution for speech synthesis,
particularly for adapting to speakers from diverse backgrounds.
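The "mixture of adapters" idea, multiple bottleneck adapters inside one backbone layer combined by a learned gate, can be sketched as follows. This is an illustrative numpy toy, not the ADAPTERMIX architecture; the sizes, the tanh bottleneck, and the softmax router are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n_adapters = 32, 4, 3  # hidden size, bottleneck, adapter count (hypothetical)

# Each adapter is a small down-project / nonlinearity / up-project bottleneck.
down = rng.normal(scale=0.1, size=(n_adapters, r, d))
up = rng.normal(scale=0.1, size=(n_adapters, d, r))
# A router produces one mixing weight per adapter from the hidden state.
W_route = rng.normal(scale=0.1, size=(n_adapters, d))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def mixture_of_adapters(h):
    weights = softmax(W_route @ h)  # gate: how much each adapter contributes
    outs = np.stack([up[i] @ np.tanh(down[i] @ h) for i in range(n_adapters)])
    return h + weights @ outs       # residual connection around the mixture

h = rng.normal(size=d)
out = mixture_of_adapters(h)
```

Only the adapter and router parameters would be trained, which is how such methods keep the tunable fraction small (the abstract reports 11% of total model parameters).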
Introducing deep learning-based methods into the variant calling analysis pipeline
Biological interpretation of the genetic variation enhances our understanding of normal and pathological phenotypes, and may lead to the development of new therapeutics.
However, it is heavily dependent on genomic data analysis, which might be inaccurate due to various sequencing errors and the inconsistencies they cause. Modern analysis pipelines already utilize heuristic and statistical techniques, but the rate of falsely identified mutations remains high and varies with the particular sequencing technology, settings, and variant type.
Recently, several tools based on deep neural networks have been published. These networks are expected to find motifs in the data that earlier methods did not capture.
The performance of these novel tools is assessed in terms of precision and recall, as well as computational efficiency. Following established best practices in both variant detection and benchmarking, the discussed tools demonstrate accuracy and computational efficiency that warrant further discussion.
Towards Universal Object Detection
Object detection is one of the most important and challenging research topics in computer vision. It plays an important role in our everyday life and has many applications, e.g. surveillance, autonomous driving, robotics, drones, medical imaging, etc. The ultimate goal of object detection is a universal object detector that works well in any case under any condition, like the human vision system. However, there are multiple challenges to the universality of object detection, e.g. scale variance, high-quality requirements, domain shift, computational constraints, etc. These prevent the object detector from being widely used across various object scales, critical applications requiring extremely accurate localization, scenarios with changing domain priors, and diverse hardware settings. To address these challenges, multiple solutions are proposed in this thesis: an efficient multi-scale architecture to achieve scale-invariant detection, a robust multi-stage framework effective for high-quality requirements, a cross-domain solution to extend universality over various domains, and a design of complexity-aware cascades together with a novel low-precision network to enhance universality under different computational constraints. All these efforts substantially improve the universality of object detection, and the resulting detector can be applied in broader environments.
TOAST: Transfer Learning via Attention Steering
Transfer learning involves adapting a pre-trained model to novel downstream
tasks. However, we observe that current transfer learning methods often fail to
focus on task-relevant features. In this work, we explore refocusing model
attention for transfer learning. We introduce Top-Down Attention Steering
(TOAST), a novel transfer learning algorithm that keeps the pre-trained
backbone frozen, selects task-relevant features in the output, and feeds those
features back to the model to steer the attention to the task-specific
features. By refocusing the attention only, TOAST achieves state-of-the-art
results on a number of transfer learning benchmarks, while having a small
number of tunable parameters. Compared to fully fine-tuning, LoRA, and prompt
tuning, TOAST substantially improves performance across a range of fine-grained
visual classification datasets (e.g., 81.1% -> 86.2% on FGVC). TOAST also
outperforms the fully fine-tuned Alpaca and Vicuna models on
instruction-following language generation. Code is available at
https://github.com/bfshi/TOAST.
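The top-down loop described above, freeze the backbone, select task-relevant output features, and feed them back to steer a second forward pass, can be caricatured with a single frozen layer. The real TOAST steers attention inside a transformer; this numpy sketch only illustrates the feedback structure, and all names and sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # feature dimension (hypothetical)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Frozen "backbone": one nonlinear layer standing in for the pretrained model
W_backbone = rng.normal(scale=0.3, size=(d, d))

def backbone(x):
    return np.tanh(W_backbone @ x)

# The only tunable parameters in this sketch:
task_gate = rng.normal(scale=0.3, size=d)        # selects task-relevant outputs
W_feedback = rng.normal(scale=0.3, size=(d, d))  # maps selection back to the input

def toast_forward(x):
    h = backbone(x)                  # 1) bottom-up pass, backbone frozen
    select = sigmoid(task_gate) * h  # 2) soft selection of task-relevant features
    steer = W_feedback @ select      # 3) top-down steering signal
    return backbone(x + steer)       # 4) second bottom-up pass, now "steered"

x = rng.normal(size=d)
y = toast_forward(x)
```

Because only the selection gate and feedback map are trained, the tunable parameter count stays small while the backbone remains untouched, which mirrors the trade-off the abstract claims.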
Efficient Pyramid Channel Attention Network for Pathological Myopia Detection
Pathological myopia (PM) is the leading ocular disease for impaired vision
and blindness worldwide. The key to detecting PM as early as possible is to
detect informative features in global and local lesion regions, such as fundus
tessellation, atrophy and maculopathy. However, applying classical
convolutional neural networks (CNNs) to efficiently highlight global and local
lesion context information in feature maps is quite challenging. To tackle this
issue, we aim to fully leverage the potential of global and local lesion
information with attention module design. Based on this, we propose an
efficient pyramid channel attention (EPCA) module, which dynamically explores
the relative importance of global and local lesion context information in
feature maps. Then we combine the EPCA module with the backbone network to
construct EPCA-Net for automatic PM detection based on fundus images. In
addition, we construct a PM dataset termed PM-fundus by collecting fundus
images of PM from publicly available datasets (e.g., the PALM dataset and ODIR
dataset). The comprehensive experiments are conducted on three datasets,
demonstrating that our EPCA-Net outperforms state-of-the-art methods in
detecting PM. Furthermore, motivated by the recent pretraining-and-finetuning
paradigm, we attempt to adapt pre-trained natural image models for PM detection
by freezing them and treating the EPCA module and other attention modules as
the adapters. The results show that our method, within the
pretraining-and-finetuning paradigm, achieves performance competitive with
traditional fine-tuning methods while requiring fewer tunable
parameters.
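A pyramid channel attention of the kind described, channel reweighting driven by pooled descriptors at a global scale and at finer local scales, can be sketched as below. This is a generic illustration of the idea, not the paper's EPCA module; the pooling scales, shared attention weights, and sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 8, 16, 16  # channels, height, width (hypothetical)

x = rng.normal(size=(C, H, W))          # feature map from a CNN backbone
w_att = rng.normal(scale=0.1, size=(C, C))  # channel-attention weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pool(x, k):
    """Average-pool each channel onto a k x k grid (k=1 is global pooling)."""
    C, H, W = x.shape
    blocks = x.reshape(C, k, H // k, k, W // k).mean(axis=(2, 4))  # (C, k, k)
    return blocks.reshape(C, -1)

def epca(x, scales=(1, 2, 4)):
    # Global (k=1) and local (k>1) descriptors form the 'pyramid'
    desc = np.concatenate([pool(x, k) for k in scales], axis=1).mean(axis=1)
    att = sigmoid(w_att @ desc)      # per-channel attention in (0, 1)
    return x * att[:, None, None]    # reweight channels, shape preserved

y = epca(x)
```

Because the module only rescales channels, it preserves the feature-map shape and can be dropped into an existing backbone, or, as the abstract suggests, treated as an adapter on top of a frozen pretrained model.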