7 research outputs found

    Collaborative Chinese Text Recognition with Personalized Federated Learning

    Full text link
In Chinese text recognition, to compensate for insufficient local data and improve the performance of local few-shot character recognition, it is often necessary for one organization to collect a large amount of data from similar organizations. However, due to the natural presence of private information in text data, such as addresses and phone numbers, different organizations are unwilling to share private data. Therefore, it becomes increasingly important to design a privacy-preserving collaborative training framework for the Chinese text recognition task. In this paper, we introduce personalized federated learning (pFL) into the Chinese text recognition task and propose the pFedCR algorithm, which significantly improves the model performance of each client (organization) without sharing private data. Specifically, pFedCR comprises two stages: a multi-round global model training stage and a local personalization stage. During stage 1, an attention mechanism is incorporated into the CRNN model to adapt to various client data distributions. Leveraging inherent character data characteristics, a balanced dataset is created on the server to mitigate character imbalance. In the personalization phase, the global model is fine-tuned for one epoch to create a local model. Parameter averaging between local and global models combines personalized and global feature extraction capabilities. Finally, we fine-tune only the attention layers to enhance the model's focus on local personalized features. The experimental results on three real-world industrial scenario datasets show that the pFedCR algorithm can improve the performance of local personalized models by about 20% while also improving their generalization performance on other client data domains. Compared to other state-of-the-art personalized federated learning methods, pFedCR improves performance by 6% to 8%.
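The stage-2 step the abstract describes, averaging each local fine-tuned parameter with its global counterpart before refining the attention layers, can be sketched as below. This is an illustrative reconstruction, not the paper's code: the function name and the flat-list representation of parameter tensors are assumptions.

```python
def average_parameters(global_params, local_params):
    """Element-wise average of matching parameter tensors.

    Both arguments map parameter names to flat lists of floats
    (a stand-in for real weight tensors); the result blends the
    personalized local weights with the shared global ones.
    """
    return {
        name: [(g + l) / 2 for g, l in zip(global_params[name], local_params[name])]
        for name in global_params
    }


# Example: the blended model sits halfway between global and local weights.
blended = average_parameters({"attn.w": [1.0, 2.0]}, {"attn.w": [3.0, 4.0]})
```

In the actual algorithm, only the attention layers would then be fine-tuned further on local data; the averaging itself is a single pass over the state dictionary.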

    Exploring One-shot Semi-supervised Federated Learning with A Pre-trained Diffusion Model

    Full text link
Recently, semi-supervised federated learning (semi-FL) has been proposed to handle the commonly seen real-world scenario with labeled data on the server and unlabeled data on the clients. However, existing methods face several challenges such as communication costs, data heterogeneity, and training pressure on client devices. To address these challenges, we introduce powerful diffusion models (DM) into semi-FL and propose FedDISC, a Federated Diffusion-Inspired Semi-supervised Co-training method. Specifically, we first extract prototypes of the labeled server data and use these prototypes to predict pseudo-labels of the client data. For each category, we compute the cluster centroids and domain-specific representations to signify the semantic and stylistic information of their distributions. After adding noise, these representations are sent back to the server, which uses the pre-trained DM to generate synthetic datasets complying with the client distributions and trains a global model on them. With the assistance of the vast knowledge within the DM, the synthetic datasets have comparable quality and diversity to the client images, subsequently enabling the training of global models that achieve performance equivalent to or even surpassing the ceiling of supervised centralized training. FedDISC works within one communication round, does not require any local training, and involves very minimal information uploading, greatly enhancing its practicality. Extensive experiments on three large-scale datasets demonstrate that FedDISC effectively addresses the semi-FL problem on non-IID clients and outperforms the compared SOTA methods. Sufficient visualization experiments also illustrate that the synthetic dataset generated by FedDISC exhibits comparable diversity and quality to the original client dataset, with a negligible possibility of leaking privacy-sensitive information of the clients.
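The first step of FedDISC, building per-class prototypes from labeled server features and pseudo-labeling client features by nearest prototype, can be sketched as follows. The function names, the flat-list feature representation, and the choice of cosine similarity as the matching criterion are assumptions for illustration; the paper's actual feature extractor and similarity measure may differ.

```python
import math


def make_prototypes(features, labels):
    """Mean feature vector per class from the labeled server data."""
    sums, counts = {}, {}
    for feat, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(feat))
        for i, value in enumerate(feat):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}


def pseudo_label(feature, prototypes):
    """Assign the class whose prototype is most cosine-similar."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
        return dot / norm
    return max(prototypes, key=lambda y: cosine(feature, prototypes[y]))


protos = make_prototypes([[1.0, 0.0], [0.0, 1.0]], ["cat", "dog"])
```

The pseudo-labels then group client features by class so centroids and domain-specific representations can be computed per category.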

    A Case-Control Study of the Association between Polymorphisms in the Fibrinogen Alpha Chain Gene and Schizophrenia

    No full text
Our previous studies using mass spectrometry analysis provided evidence that fibrinopeptide A (FPA) could be a potential biomarker for schizophrenia diagnosis. We sought further to demonstrate that variants in the fibrinogen alpha chain gene (FGA), which encodes FPA, might confer vulnerability to schizophrenia. 1,145 patients with schizophrenia and 1,016 healthy volunteers from the Han population in Northeast China were recruited. The association of three tag single nucleotide polymorphisms (SNPs) (rs2070011 in the 5′UTR, rs2070016 in intron 4, and rs2070022 in the 3′UTR) in FGA with schizophrenia was examined using a case-control study design. Genotypic distributions of these three SNPs were not found to be significantly different between cases and controls (rs2070011: χ²=1.28, P=0.528; rs2070016: χ²=4.11, P=0.128; rs2070022: χ²=1.23, P=0.541). There were also no significant differences in SNP allelic frequencies between cases and controls (all P>0.05). Additionally, the frequency of haplotypes consisting of alleles of these three SNPs was not significantly different between cases and healthy control subjects (global χ²=9.27, P=0.159). Our study did not show a significant association of FGA SNPs with schizophrenia. Future studies may need to test more FGA SNPs in a larger sample to identify those SNPs with a minor or moderate effect on schizophrenia.
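The genotype comparisons above are Pearson chi-square tests on case-vs-control contingency tables (for a three-genotype SNP, a 2x3 table with 2 degrees of freedom). A minimal sketch of the statistic, with made-up counts rather than the study's actual data:

```python
def chi_square_stat(table):
    """Pearson chi-square statistic for an r x c contingency table.

    table is a list of rows of observed counts; the expected count
    for each cell is (row total * column total) / grand total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical genotype counts (AA, Aa, aa) for cases and controls.
stat = chi_square_stat([[300, 500, 345], [280, 450, 286]])
```

For a 2x3 table the statistic would be compared against the χ² distribution with 2 degrees of freedom (critical value 5.991 at P=0.05); in practice a library routine such as `scipy.stats.chi2_contingency` also returns the P-value directly.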