Collaborative Chinese Text Recognition with Personalized Federated Learning
In Chinese text recognition, to compensate for the insufficient local data
and improve the performance of local few-shot character recognition, it is
often necessary for one organization to collect a large amount of data from
similar organizations. However, due to the natural presence of private
information in text data, such as addresses and phone numbers, different
organizations are unwilling to share private data. Therefore, it becomes
increasingly important to design a privacy-preserving collaborative training
framework for the Chinese text recognition task. In this paper, we introduce
personalized federated learning (pFL) into the Chinese text recognition task
and propose the pFedCR algorithm, which significantly improves the model
performance of each client (organization) without sharing private data.
Specifically, pFedCR comprises two stages: a global model training stage with
multiple communication rounds, and a local personalization stage. During stage 1, an
attention mechanism is incorporated into the CRNN model to adapt to various
client data distributions. Leveraging inherent character data characteristics,
a balanced dataset is created on the server to mitigate character imbalance. In
the personalization phase, the global model is fine-tuned for one epoch to
create a local model. Parameter averaging between local and global models
combines personalized and global feature extraction capabilities. Finally, we
fine-tune only the attention layers to enhance the model's focus on local personalized
features. The experimental results on three real-world industrial scenario
datasets show that the pFedCR algorithm can improve the performance of local
personalized models by about 20% while also improving their generalization
performance on other client data domains. Compared to other state-of-the-art
personalized federated learning methods, pFedCR improves performance by 6% to
8%.
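The parameter-averaging step of the personalization phase described above can be sketched as follows. Representing each model as a dictionary of NumPy arrays and using an equal 0.5 weighting are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def average_parameters(local_params, global_params, alpha=0.5):
    """Blend locally fine-tuned parameters with the global model's
    parameters (pFedCR personalization, step 2). alpha=0.5 gives the
    plain average between the two models; both inputs are dicts that
    map parameter names to NumPy arrays of identical shapes."""
    return {name: alpha * local_params[name] + (1 - alpha) * global_params[name]
            for name in local_params}

# Toy usage: average a single weight tensor from each model.
local_model = {"w": np.array([1.0, 2.0])}
global_model = {"w": np.array([3.0, 4.0])}
merged = average_parameters(local_model, global_model)
```

In a full pipeline this blend would sit between the one-epoch local fine-tuning and the final attention-only fine-tuning described in the abstract.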
Exploring One-shot Semi-supervised Federated Learning with A Pre-trained Diffusion Model
Recently, semi-supervised federated learning (semi-FL) has been proposed to
handle the commonly seen real-world scenarios with labeled data on the server
and unlabeled data on the clients. However, existing methods face several
challenges such as communication costs, data heterogeneity, and training
pressure on client devices. To address these challenges, we introduce the
powerful diffusion models (DM) into semi-FL and propose FedDISC, a Federated
Diffusion-Inspired Semi-supervised Co-training method. Specifically, we first
extract prototypes of the labeled server data and use these prototypes to
predict pseudo-labels of the client data. For each category, we compute the
cluster centroids and domain-specific representations to signify the semantic
and stylistic information of their distributions. After adding noise, these
representations are sent back to the server, which uses the pre-trained DM to
generate synthetic datasets complying with the client distributions and to train
a global model on them. With the assistance of the vast knowledge within the DM, the
synthetic datasets have comparable quality and diversity to the client images,
subsequently enabling the training of global models that achieve performance
equivalent to or even surpassing the ceiling of supervised centralized
training. FedDISC works within one communication round, does not require any
local training, and uploads only minimal information, greatly
enhancing its practicality. Extensive experiments on three large-scale datasets
demonstrate that FedDISC effectively addresses the semi-FL problem on non-IID
clients and outperforms the compared SOTA methods. Sufficient visualization
experiments also illustrate that the synthetic dataset generated by FedDISC
exhibits comparable diversity and quality to the original client dataset, with
a negligible risk of leaking privacy-sensitive information of the
clients.
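The prototype-based pseudo-labeling step described above can be sketched as follows. The use of per-class feature means as prototypes and cosine similarity for nearest-prototype assignment is an illustrative assumption about the matching rule:

```python
import numpy as np

def class_prototypes(features, labels):
    """Compute one prototype per class as the mean feature vector
    of the labeled server data belonging to that class."""
    return {c: features[labels == c].mean(axis=0) for c in np.unique(labels)}

def pseudo_label(client_features, prototypes):
    """Assign each unlabeled client feature the class of its most
    similar prototype under cosine similarity."""
    classes = sorted(prototypes)
    proto = np.stack([prototypes[c] for c in classes])
    proto = proto / np.linalg.norm(proto, axis=1, keepdims=True)
    feats = client_features / np.linalg.norm(client_features, axis=1, keepdims=True)
    similarities = feats @ proto.T          # (n_client, n_classes)
    return np.array([classes[i] for i in similarities.argmax(axis=1)])

# Toy usage: two well-separated classes in a 2-D feature space.
server_feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
server_labels = np.array([0, 0, 1, 1])
protos = class_prototypes(server_feats, server_labels)
client_feats = np.array([[0.8, 0.2], [0.2, 0.8]])
labels = pseudo_label(client_feats, protos)
```

In FedDISC these pseudo-labels would then be used to compute per-class cluster centroids and domain-specific representations on each client before uploading.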
A Case-Control Study of the Association between Polymorphisms in the Fibrinogen Alpha Chain Gene and Schizophrenia
Our previous studies using mass spectrometry analysis provided evidence that fibrinopeptide A (FPA) could be a potential biomarker for schizophrenia diagnosis. We sought further to demonstrate that variants in the fibrinogen alpha chain gene (FGA), which encodes FPA, might confer vulnerability to schizophrenia. 1,145 patients with schizophrenia and 1,016 healthy volunteers from the Han population in Northeast China were recruited. The association between three tag single nucleotide polymorphisms (SNPs) (rs2070011 in the 5′UTR, rs2070016 in intron 4, and rs2070022 in the 3′UTR) in FGA and schizophrenia was examined using a case-control study design. Genotypic distributions of these three SNPs did not differ significantly between cases and controls (rs2070011: χ²=1.28, P=0.528; rs2070016: χ²=4.11, P=0.128; rs2070022: χ²=1.23, P=0.541). There were also no significant differences in SNP allelic frequencies between cases and controls (all P>0.05). Additionally, the frequency of haplotypes consisting of alleles of these three SNPs was not significantly different between cases and healthy control subjects (global χ²=9.27, P=0.159). Our study did not show a significant association of FGA SNPs with schizophrenia. Future studies may need to test more FGA SNPs in a larger sample to identify SNPs with a minor or moderate effect on schizophrenia.
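The case-control comparison above relies on Pearson's chi-square test applied to genotype count tables. A minimal sketch of that statistic, assuming a 2×3 table of genotype counts (rows: cases/controls; columns: the three genotypes of a SNP), with the specific counts below being invented for illustration:

```python
import numpy as np

def chi_square_test(table):
    """Pearson chi-square statistic and degrees of freedom for a
    contingency table. Expected counts come from the outer product
    of row and column totals divided by the grand total."""
    table = np.asarray(table, dtype=float)
    row_totals = table.sum(axis=1, keepdims=True)
    col_totals = table.sum(axis=0, keepdims=True)
    expected = row_totals @ col_totals / table.sum()
    chi2 = ((table - expected) ** 2 / expected).sum()
    df = (table.shape[0] - 1) * (table.shape[1] - 1)
    return chi2, df

# Hypothetical genotype counts (AA, AG, GG) for cases vs. controls.
counts = [[300, 560, 285],
          [270, 500, 246]]
chi2, df = chi_square_test(counts)
```

The resulting statistic would be compared against a chi-square distribution with df = 2 to obtain the P values reported in the abstract.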