71 research outputs found
FedET: A Communication-Efficient Federated Class-Incremental Learning Framework Based on Enhanced Transformer
Federated Learning (FL) has attracted wide attention because it enables
decentralized learning while preserving data privacy. However, most existing
methods unrealistically assume that the classes encountered by local clients
are fixed over time. Under this assumption, the model's catastrophic
forgetting of old classes becomes significantly more severe once new classes
are learned. Moreover, communication-cost constraints make it challenging to
deploy large-scale models in FL, which limits prediction accuracy. To address
these challenges, we propose a novel framework, Federated Enhanced Transformer
(FedET), which simultaneously achieves high accuracy and low communication
cost. Specifically, FedET uses the Enhancer, a tiny module, to absorb and
communicate new knowledge, and applies pre-trained Transformers combined with
different Enhancers to ensure high accuracy on various tasks. To address local
forgetting caused by the new classes of new tasks and global forgetting caused
by non-i.i.d. (non-independent and identically distributed) class imbalance
across local clients, we propose an Enhancer distillation method that
rebalances old and new knowledge and mitigates the non-i.i.d. problem.
Experimental results demonstrate that FedET's average accuracy on
representative benchmark datasets is 14.1% higher than that of the
state-of-the-art method, while saving 90% of the communication cost compared
to the previous method.
Comment: Accepted by the 2023 International Joint Conference on Artificial
Intelligence (IJCAI 2023).
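The abstract does not detail the Enhancer's internals. As one illustration
(not the paper's implementation), an Enhancer could be a small residual
bottleneck attached to a frozen backbone, with a KL distillation term on
softened logits preserving old-class knowledge; all shapes and the specific
loss form below are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def enhancer(x, w_down, w_up):
    """Bottleneck 'Enhancer' adapter: down-project, ReLU, up-project, residual.
    Only w_down/w_up would be trained and communicated; the backbone stays frozen."""
    h = np.maximum(x @ w_down, 0.0)
    return x + h @ w_up

def distill_kl(old_logits, new_logits, tau=2.0):
    """KL(old || new) on temperature-softened logits, used to retain old-class knowledge."""
    def softmax(z):
        z = z - z.max(axis=-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)
    p = softmax(old_logits / tau)
    q = softmax(new_logits / tau)
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)))

d, r = 16, 4                        # hidden size and bottleneck rank (assumed)
x = rng.normal(size=(2, d))         # two token embeddings from a frozen backbone
w_down = rng.normal(size=(d, r)) * 0.1
w_up = rng.normal(size=(r, d)) * 0.1
y = enhancer(x, w_down, w_up)
old_logits = rng.normal(size=(2, 5))
new_logits = rng.normal(size=(2, 5))
kl = distill_kl(old_logits, new_logits)
print(y.shape, kl >= 0.0)
```

In a federated round, only the small `w_down`/`w_up` matrices would be
uploaded, which is where an adapter-style design saves communication.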
Pose Guided Human Image Synthesis with Partially Decoupled GAN
Pose Guided Human Image Synthesis (PGHIS) is the challenging task of
transforming a human image from a reference pose to a target pose while
preserving its style. Most existing methods encode the texture of the whole
reference human image into a latent space and then use a decoder to
synthesize the image texture in the target pose. However, it is difficult to
recover the detailed texture of the whole human image this way. To alleviate
this problem, we propose a method that decouples the human body into several
parts (e.g., hair, face, hands, feet) and uses each part to guide the
synthesis of a realistic image of the person, preserving the detailed
information of the generated images. In addition, we design a multi-head
attention-based module for PGHIS. Because the convolution operation makes it
difficult for most convolutional neural network-based methods to model
long-range dependencies, the long-range modeling capability of the attention
mechanism is better suited than convolution to the pose transfer task,
especially for sharp pose deformations. Extensive experiments on the
Market-1501 and DeepFashion datasets show that our method outperforms existing
state-of-the-art methods on almost all qualitative and quantitative metrics.
Comment: 16 pages, 14th Asian Conference on Machine Learning.
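The abstract's core argument is that multi-head attention captures long-range
dependencies that convolutions miss. A minimal NumPy sketch of standard
multi-head scaled dot-product attention (a generic form, not the paper's
specific module) shows the mechanism: every position attends to every other,
regardless of distance:

```python
import numpy as np

def multi_head_attention(q, k, v, n_heads):
    """Scaled dot-product attention split across heads.
    q, k, v: (seq, d_model); returns (seq, d_model)."""
    seq, d_model = q.shape
    d_head = d_model // n_heads
    out = np.empty_like(q)
    for h in range(n_heads):
        s = slice(h * d_head, (h + 1) * d_head)
        scores = q[:, s] @ k[:, s].T / np.sqrt(d_head)   # (seq, seq): all pairs
        scores -= scores.max(axis=-1, keepdims=True)     # stable softmax
        w = np.exp(scores)
        w /= w.sum(axis=-1, keepdims=True)
        out[:, s] = w @ v[:, s]
    return out

rng = np.random.default_rng(1)
x = rng.normal(size=(6, 8))          # e.g. embeddings of 6 body-part tokens
y = multi_head_attention(x, x, x, n_heads=2)
print(y.shape)  # (6, 8)
```

The `(seq, seq)` score matrix is what gives attention its distance-independent
receptive field, in contrast to a convolution's fixed local kernel.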
Feature-Rich Audio Model Inversion for Data-Free Knowledge Distillation Towards General Sound Classification
Data-Free Knowledge Distillation (DFKD) has recently attracted growing
attention in the academic community, especially after major breakthroughs in
computer vision. Despite promising results, the technique has not been well
applied to audio and signal processing: because audio signals have variable
duration, they require their own modeling approach. In this work, we propose
feature-rich audio model inversion (FRAMI), a data-free knowledge distillation
framework for general sound classification tasks. It first generates
high-quality, feature-rich Mel-spectrograms through a feature-invariant
contrastive loss. Then, the hidden states before and after the statistics
pooling layer are reused when knowledge distillation is performed on these
feature-rich samples. Experimental results on the UrbanSound8K, ESC-50, and
AudioMNIST datasets demonstrate that FRAMI can generate feature-rich samples.
Meanwhile, reusing the hidden states further improves the accuracy of the
student model, which significantly outperforms the baseline method.
Comment: Accepted by the International Conference on Acoustics, Speech and
Signal Processing (ICASSP 2023).
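The abstract does not give the form of the feature-invariant contrastive loss.
One common shape for such a loss is InfoNCE, sketched below purely as an
assumption: each anchor embedding should match its own feature-preserving
counterpart against all others in the batch:

```python
import numpy as np

def info_nce(anchors, positives, tau=0.1):
    """InfoNCE-style contrastive loss: each anchor should match its own
    positive (same underlying features) against all others in the batch."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / tau                      # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))   # diagonal = matching pairs

rng = np.random.default_rng(2)
feats = rng.normal(size=(4, 16))                # e.g. pooled spectrogram embeddings
loss_matched = info_nce(feats, feats + 0.01 * rng.normal(size=(4, 16)))
loss_random = info_nce(feats, rng.normal(size=(4, 16)))
print(loss_matched < loss_random)               # aligned pairs give lower loss
```

Minimizing such a loss during model inversion pushes generated
Mel-spectrograms toward embeddings whose discriminative features survive
perturbation, which is one plausible reading of "feature-invariant".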
Adaptive Sparse and Monotonic Attention for Transformer-based Automatic Speech Recognition
The Transformer architecture, based on self-attention and multi-head
attention, has achieved remarkable success in offline end-to-end Automatic
Speech Recognition (ASR). However, self-attention and multi-head attention
cannot easily be applied to streaming or online ASR. For self-attention in
Transformer ASR, the softmax-normalized attention mechanism makes it
impossible to highlight important speech information. For multi-head
attention in Transformer ASR, it is not easy to model monotonic alignments
across different heads. To overcome these two limitations, we integrate sparse
attention and monotonic attention into Transformer-based ASR. The sparse
mechanism introduces a learned sparsity scheme to enable each self-attention
structure to fit the corresponding head better. The monotonic attention
deploys regularization to prune redundant heads in the multi-head attention
structure. Experiments show that our method effectively improves the attention
mechanism on widely used speech recognition benchmarks.
Comment: Accepted to DSAA 202
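The abstract's complaint is that dense softmax attention spreads probability
mass over all frames and so cannot sharply highlight the important ones. A
simple way to see what sparsification buys is a top-k variant (shown here as a
generic illustration; the paper's learned sparsity scheme is not specified in
the abstract), where each query keeps only its strongest keys:

```python
import numpy as np

def topk_sparse_attention(q, k, v, keep=2):
    """One self-attention head where each query attends only to its `keep`
    highest-scoring keys; all other weights are zeroed before normalization."""
    scores = q @ k.T / np.sqrt(q.shape[-1])            # (T, T)
    # mask everything below each row's keep-th largest score
    thresh = np.sort(scores, axis=-1)[:, -keep][:, None]
    masked = np.where(scores >= thresh, scores, -np.inf)
    masked -= masked.max(axis=-1, keepdims=True)
    w = np.exp(masked)                                  # exp(-inf) -> 0
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v, w

rng = np.random.default_rng(3)
x = rng.normal(size=(5, 8))                             # 5 acoustic frames
out, w = topk_sparse_attention(x, x, x, keep=2)
print(out.shape)                                        # (5, 8)
```

Each row of `w` now has at most `keep` nonzero entries, so the head commits
its full probability mass to a few frames instead of diluting it everywhere.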
Blur the Linguistic Boundary: Interpreting Chinese Buddhist Sutra in English via Neural Machine Translation
Buddhism is an influential religion with a long-standing history and a
profound philosophy. Nowadays, more and more people worldwide aspire to learn
the essence of Buddhism, which lends importance to its dissemination. However,
Buddhist scriptures written in classical Chinese are obscure both to most
people and to machine translation systems; general-purpose Chinese-English
neural machine translation (NMT), for instance, fails in this domain. In this
paper, we propose a novel approach to building a practical NMT model for
Buddhist scriptures. Our translation pipeline achieves highly promising
results in ablation experiments under three criteria.
Comment: Accepted by the 34th IEEE International Conference on Tools with
Artificial Intelligence (ICTAI 2022).
Shoggoth: Towards Efficient Edge-Cloud Collaborative Real-Time Video Inference via Adaptive Online Learning
This paper proposes Shoggoth, an efficient edge-cloud collaborative
architecture for boosting inference performance on real-time video of changing
scenes. Shoggoth uses online knowledge distillation to improve the accuracy of
models suffering from data drift and offloads the labeling process to the
cloud, alleviating the constrained resources of edge devices. At the edge, we
design adaptive training with small batches to adapt models under limited
computing power, and adaptive sampling of training frames for robustness and
reduced bandwidth. Evaluations on a realistic dataset show a 15%-20% model
accuracy improvement over the edge-only strategy and lower network costs than
the cloud-only strategy.
Comment: Accepted by the 60th ACM/IEEE Design Automation Conference (DAC 2023).
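The abstract does not say how training frames are adaptively sampled. One
minimal policy consistent with the bandwidth goal (an assumption for
illustration, not Shoggoth's actual sampler) is to forward a frame to the
cloud for labeling only when it differs enough from the last frame sent:

```python
import numpy as np

def select_training_frames(frames, thresh=0.1):
    """Keep a frame for cloud labeling/retraining only when it differs enough
    from the last kept frame; saves bandwidth while the scene is static."""
    kept = [0]                                   # always keep the first frame
    for i in range(1, len(frames)):
        diff = np.mean(np.abs(frames[i] - frames[kept[-1]]))
        if diff > thresh:
            kept.append(i)
    return kept

# a toy "video": 4 identical frames, then an abrupt scene change
static = np.zeros((4, 8, 8))
changed = np.ones((2, 8, 8))
video = np.concatenate([static, changed])
print(select_training_frames(video, thresh=0.1))  # [0, 4]
```

Only two of six frames cross the wire here; a real system would likely use a
feature-space distance rather than raw pixel difference.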
EdgeMA: Model Adaptation System for Real-Time Video Analytics on Edge Devices
Real-time video analytics on edge devices for changing scenes remains a
difficult task. Because edge devices are usually resource-constrained, edge
deep neural networks (DNNs) have fewer weights and shallower architectures
than general DNNs. As a result, they perform well only in limited scenarios
and are sensitive to data drift. In this paper, we introduce EdgeMA, a
practical and efficient video analytics system designed to adapt models to
shifts in real-world video streams over time, addressing the data drift
problem. EdgeMA extracts statistical texture features based on the gray-level
co-occurrence matrix and uses a Random Forest classifier to detect domain
shift. Moreover, we incorporate a model adaptation method based on importance
weighting, specifically designed to update models to cope with label
distribution shift. Through rigorous evaluation of EdgeMA on a real-world
dataset, we show that EdgeMA significantly improves inference accuracy.
Comment: Accepted by the 30th International Conference on Neural Information
Processing (ICONIP 2023).
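The gray-level co-occurrence matrix (GLCM) features that EdgeMA feeds to its
Random Forest are standard and easy to sketch. The NumPy toy below computes a
horizontal-neighbor GLCM and two common statistics, contrast and homogeneity
(the specific offsets, level count, and statistics EdgeMA uses are not stated
in the abstract, so these are assumptions); the classifier itself is omitted:

```python
import numpy as np

def glcm_features(img, levels=8):
    """Gray-level co-occurrence matrix over horizontal neighbors, plus two
    standard texture statistics: contrast and homogeneity."""
    q = (img * (levels - 1)).astype(int)          # quantize [0, 1] -> levels bins
    glcm = np.zeros((levels, levels))
    for a, b in zip(q[:, :-1].ravel(), q[:, 1:].ravel()):
        glcm[a, b] += 1                           # count neighboring gray pairs
    glcm /= glcm.sum()                            # normalize to joint probabilities
    i, j = np.indices(glcm.shape)
    contrast = float(((i - j) ** 2 * glcm).sum())
    homogeneity = float((glcm / (1.0 + np.abs(i - j))).sum())
    return contrast, homogeneity

rng = np.random.default_rng(4)
smooth = np.tile(np.linspace(0, 1, 16), (16, 1))  # slowly varying scene
noisy = rng.random((16, 16))                      # high-frequency texture
c_smooth, h_smooth = glcm_features(smooth)
c_noisy, h_noisy = glcm_features(noisy)
print(c_noisy > c_smooth, h_smooth > h_noisy)     # texture separates the two scenes
```

A shift in these low-cost statistics between recent frames and the training
distribution is the kind of signal a Random Forest domain-shift detector could
consume, far more cheaply than running the DNN itself.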
Statistical approach for mining breast cancer patterns
74 p. The main objective of this dissertation is to help physicians predict and diagnose the recurrence of breast cancer by identifying which factors of the tumor are more important and more directly related to recurrence. According to the conventional view, the size of the tumor is closely associated with the recurrence of breast cancer, but this alone does not provide enough information. In this dissertation, other probable factors of the tumor, such as its texture, are also considered as possible contributors to recurrence. Master of Science (Biomedical Engineering).