One-Shot Pruning for Fast-adapting Pre-trained Models on Devices
Large-scale pre-trained models have been remarkably successful in resolving
downstream tasks. Nonetheless, deploying these models on low-capability devices
still requires an effective approach, such as model pruning. However, pruning
the model from scratch can pose a practical challenge given the limited
resources of each downstream task or device. To tackle this issue, we present a
scalable one-shot pruning method that leverages pruned knowledge of similar
tasks to extract a sub-network from the pre-trained model for a new task.
Specifically, we create a score mask using the pruned models of similar tasks
to identify task-specific filters/nodes in the pre-trained model for the new
task. Based on this mask, we conduct a single round of pruning to extract a
suitably-sized sub-network that can quickly adapt to the new task with only a
few training iterations. Our experimental analysis demonstrates the
effectiveness of the proposed method on convolutional neural networks (CNNs)
and vision transformers (ViTs) across various datasets. The proposed method
consistently outperforms popular pruning baseline methods in terms of accuracy
and efficiency when dealing with diverse downstream tasks under different
memory constraints.
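The abstract describes the mechanism only at a high level, so the following is a minimal, hypothetical sketch of the general idea: per-filter importance scores from models already pruned for similar tasks are averaged into a score mask, and the pre-trained model is pruned in a single shot before brief fine-tuning. The L1-norm importance measure and all function names are illustrative assumptions, not the paper's actual implementation.
```python
# Hypothetical sketch: one-shot filter pruning guided by a score mask
# aggregated from models already pruned for similar tasks (PyTorch).
import torch
import torch.nn as nn


def filter_scores(conv: nn.Conv2d) -> torch.Tensor:
    """Per-filter importance, here approximated by the L1 norm of each filter."""
    return conv.weight.detach().abs().sum(dim=(1, 2, 3))


def aggregate_score_mask(similar_convs, keep_ratio: float) -> torch.Tensor:
    """Average filter scores across similar-task models and keep the top filters."""
    scores = torch.stack([filter_scores(c) for c in similar_convs]).mean(dim=0)
    k = max(1, int(keep_ratio * scores.numel()))
    mask = torch.zeros_like(scores)
    mask[scores.topk(k).indices] = 1.0
    return mask  # 1 = keep filter, 0 = prune filter


def apply_mask(conv: nn.Conv2d, mask: torch.Tensor) -> None:
    """Zero out pruned filters in a single round (structured pruning by masking)."""
    with torch.no_grad():
        conv.weight.mul_(mask.view(-1, 1, 1, 1))
        if conv.bias is not None:
            conv.bias.mul_(mask)


if __name__ == "__main__":
    pretrained = nn.Conv2d(3, 8, 3)
    similar = [nn.Conv2d(3, 8, 3) for _ in range(3)]  # stand-ins for pruned similar-task models
    mask = aggregate_score_mask(similar, keep_ratio=0.5)
    apply_mask(pretrained, mask)
    # ... a few fine-tuning iterations on the new task would follow here.
```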
Personalization Disentanglement for Federated Learning
Personalized federated learning (PFL) jointly trains a variety of local
models through balancing between knowledge sharing across clients and model
personalization per client. This paper addresses PFL by explicitly
disentangling latent representations into two parts that capture shared
knowledge and client-specific personalization, which leads to more reliable and effective
PFL. The disentanglement is achieved by a novel Federated Dual Variational
Autoencoder (FedDVA), which employs two encoders to infer the two types of
representations. FedDVA can produce a better understanding of the trade-off
between global knowledge sharing and local personalization in PFL. Moreover, it
can be integrated with existing FL methods and turn them into personalized
models for heterogeneous downstream tasks. Extensive experiments validate the
advantages brought by disentanglement and show that models trained with
disentangled representations substantially outperform their vanilla counterparts.
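As a rough illustration of the dual-encoder idea described above, here is a hypothetical PyTorch sketch: one variational encoder infers a shared representation (the part aggregated across clients in FL), another infers a client-specific one, and a decoder reconstructs from both. Layer sizes, dimensions, and the exact loss terms are assumptions for illustration, not FedDVA's actual architecture.
```python
# Hypothetical dual-encoder VAE sketch in the spirit of FedDVA: shared z_s,
# client-specific z_p, decoder reconstructs from their concatenation (PyTorch).
import torch
import torch.nn as nn


class Encoder(nn.Module):
    def __init__(self, in_dim, z_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU())
        self.mu = nn.Linear(64, z_dim)
        self.logvar = nn.Linear(64, z_dim)

    def forward(self, x):
        h = self.net(x)
        return self.mu(h), self.logvar(h)


class DualVAE(nn.Module):
    def __init__(self, in_dim=32, z_dim=8):
        super().__init__()
        self.shared_enc = Encoder(in_dim, z_dim)    # aggregated across clients in FL
        self.personal_enc = Encoder(in_dim, z_dim)  # kept and updated locally per client
        self.dec = nn.Sequential(nn.Linear(2 * z_dim, 64), nn.ReLU(), nn.Linear(64, in_dim))

    @staticmethod
    def reparam(mu, logvar):
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

    def forward(self, x):
        mu_s, lv_s = self.shared_enc(x)
        mu_p, lv_p = self.personal_enc(x)
        z = torch.cat([self.reparam(mu_s, lv_s), self.reparam(mu_p, lv_p)], dim=-1)
        recon = self.dec(z)
        kl = lambda mu, lv: -0.5 * torch.mean(1 + lv - mu.pow(2) - lv.exp())
        return nn.functional.mse_loss(recon, x) + kl(mu_s, lv_s) + kl(mu_p, lv_p)


if __name__ == "__main__":
    loss = DualVAE()(torch.randn(4, 32))
    loss.backward()
```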
Tensorized Self-Attention: Efficiently Modeling Pairwise and Global Dependencies Together
Neural networks equipped with self-attention have parallelizable computation,
light-weight structure, and the ability to capture both long-range and local
dependencies. Further, their expressive power and performance can be boosted by
using a vector to measure pairwise dependency, but this requires expanding the
alignment matrix to a tensor, which results in memory and computation
bottlenecks. In this paper, we propose a novel attention mechanism called
"Multi-mask Tensorized Self-Attention" (MTSA), which is as fast and as
memory-efficient as a CNN, but significantly outperforms previous
CNN-/RNN-/attention-based models. MTSA 1) captures both pairwise (token2token)
and global (source2token) dependencies by a novel compatibility function
composed of dot-product and additive attentions, 2) uses a tensor to represent
the feature-wise alignment scores for better expressive power but only requires
parallelizable matrix multiplications, and 3) combines multi-head with
multi-dimensional attentions, and applies a distinct positional mask to each
head (subspace), so the memory and computation can be distributed to multiple
heads, each with sequential information encoded independently. The experiments
show that a CNN/RNN-free model based on MTSA achieves state-of-the-art or
competitive performance on nine NLP benchmarks with compelling memory- and
time-efficiency.
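To make the compatibility function concrete, below is a deliberately naive, hypothetical sketch that mixes a scalar dot-product (token2token) term with a feature-wise additive term and applies a positional mask. For clarity it materializes the full (n, n, d) score tensor, which is exactly the memory bottleneck MTSA avoids through matrix-only computation and multi-head distribution; the code is illustrative, not the paper's implementation.
```python
# Hypothetical, simplified sketch of feature-wise ("tensorized") attention scores
# combining a scalar dot-product term with a per-dimension additive term, plus a
# positional mask. A real MTSA head would avoid building the (n, n, d) tensor.
import math
import torch


def tensorized_attention(x, mask):
    """x: (n, d) token features; mask: (n, n) positional mask (0 = keep, -inf = drop)."""
    n, d = x.shape
    q, k, v = x, x, x  # identity projections to keep the sketch short
    pairwise = (q @ k.T) / math.sqrt(d)                      # token2token scores, (n, n)
    additive = torch.tanh(q.unsqueeze(1) + k.unsqueeze(0))   # feature-wise scores, (n, n, d)
    scores = pairwise.unsqueeze(-1) + additive + mask.unsqueeze(-1)
    weights = torch.softmax(scores, dim=1)                   # normalize over source tokens
    return (weights * v.unsqueeze(0)).sum(dim=1)             # (n, d) output


if __name__ == "__main__":
    n, d = 5, 4
    x = torch.randn(n, d)
    forward_mask = torch.triu(torch.full((n, n), float("-inf")), diagonal=1)
    print(tensorized_attention(x, forward_mask).shape)  # torch.Size([5, 4])
```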
Distributionally Robust Semi-Supervised Learning for People-Centric Sensing
Semi-supervised learning is crucial for alleviating labelling burdens in
people-centric sensing. However, human-generated data inherently suffer from
distribution shift in semi-supervised learning due to the diverse biological
conditions and behavior patterns of humans. To address this problem, we propose
a generic distributionally robust model for semi-supervised learning on
distributionally shifted data. Considering both the discrepancy and the
consistency between the labeled data and the unlabeled data, we learn the
latent features that reduce person-specific discrepancy and preserve
task-specific consistency. We evaluate our model in a variety of people-centric
recognition tasks on real-world datasets, including intention recognition,
activity recognition, muscular movement recognition and gesture recognition.
The experimental results demonstrate that the proposed model outperforms
state-of-the-art methods.
Comment: 8 pages, accepted by AAAI201
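The shape of such an objective can be sketched as follows: a supervised loss on labeled data, a discrepancy penalty pulling labeled and unlabeled latent features together, and a consistency term on unlabeled predictions. The mean-feature distance and the noise-based consistency used here are simple stand-ins chosen for brevity; the paper's actual discrepancy and consistency measures may differ.
```python
# Hypothetical sketch of a discrepancy-plus-consistency semi-supervised objective.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU())
classifier = nn.Linear(32, 4)


def ssl_loss(x_lab, y_lab, x_unlab, lam_disc=0.1, lam_cons=0.1):
    z_lab, z_unlab = encoder(x_lab), encoder(x_unlab)
    sup = F.cross_entropy(classifier(z_lab), y_lab)           # supervised term
    disc = (z_lab.mean(0) - z_unlab.mean(0)).pow(2).sum()     # person-specific discrepancy
    noisy = encoder(x_unlab + 0.05 * torch.randn_like(x_unlab))
    cons = F.mse_loss(classifier(z_unlab).softmax(-1),        # task-specific consistency
                      classifier(noisy).softmax(-1))
    return sup + lam_disc * disc + lam_cons * cons


if __name__ == "__main__":
    loss = ssl_loss(torch.randn(8, 16), torch.randint(0, 4, (8,)), torch.randn(8, 16))
    loss.backward()
```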
Federated Recommendation with Additive Personalization
Building recommendation systems via federated learning (FL) is an emerging
challenge for advancing next-generation Internet services and privacy
protection. Existing approaches train a shared item embedding via FL while
keeping each user embedding private on the client side. However, an item
embedding that is identical for all clients cannot capture users' individual
differences in perceiving the same item, and thus leads to poor
personalization. Moreover, dense item embeddings in FL incur high
communication cost and latency. To address
these challenges, we propose Federated Recommendation with Additive
Personalization (FedRAP), which learns a global view of items via FL and a
personalized view locally on each user. FedRAP enforces sparsity of the global
view to save FL's communication cost and encourages difference between the two
views through regularization. We propose an effective curriculum to learn the
local and global views progressively with increasing regularization weights. To
produce recommendations for a user, FedRAP adds the two views together to
obtain a personalized item embedding. FedRAP achieves the best performance in
the FL setting on multiple benchmarks, outperforming recent federated
recommendation methods and several ablation-study baselines.
Comment: 9 pages, conferenc
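A hypothetical single-client sketch of the additive decomposition reads as follows: the item representation is the sum of a global view (shared via FL and L1-sparsified) and a local personalized view, with a penalty pushing the two views apart. The concrete losses, the negative-MSE difference term, and the fixed weights are illustrative simplifications; the paper raises the regularization weights over rounds as a curriculum.
```python
# Hypothetical sketch of FedRAP-style additive personalization on one client.
import torch
import torch.nn as nn
import torch.nn.functional as F

n_users, n_items, dim = 10, 50, 16
user_emb = nn.Embedding(n_users, dim)       # stays private on the client
global_items = nn.Embedding(n_items, dim)   # aggregated by the server via FL
local_items = nn.Embedding(n_items, dim)    # additive personalization, kept local


def client_loss(users, items, ratings, lam_sparse=1e-3, lam_diff=1e-3):
    item_vec = global_items(items) + local_items(items)          # additive two-view item embedding
    pred = (user_emb(users) * item_vec).sum(-1)
    rec = F.mse_loss(pred, ratings)
    sparse = global_items.weight.abs().mean()                    # sparsify the global view
    diff = -F.mse_loss(global_items.weight, local_items.weight)  # crude term pushing views apart
    return rec + lam_sparse * sparse + lam_diff * diff


if __name__ == "__main__":
    u = torch.randint(0, n_users, (32,))
    i = torch.randint(0, n_items, (32,))
    r = torch.rand(32)
    client_loss(u, i, r).backward()
```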