Combating Data Imbalances in Federated Semi-supervised Learning with Dual Regulators
Federated learning has become a popular method to learn from decentralized
heterogeneous data. Federated semi-supervised learning (FSSL) has emerged to train
models from a small fraction of labeled data due to label scarcity on
decentralized clients. Existing FSSL methods assume independent and identically
distributed (IID) labeled data across clients and consistent class distribution
between labeled and unlabeled data within a client. This work studies a more
practical and challenging scenario of FSSL, where data distribution is
different not only across clients but also within a client between labeled and
unlabeled data. To address this challenge, we propose a novel FSSL framework
with dual regulators, FedDure. FedDure lifts the previous assumption with a
coarse-grained regulator (C-reg) and a fine-grained regulator (F-reg): C-reg
regularizes the updating of the local model by tracking the learning effect on
labeled data distribution; F-reg learns an adaptive weighting scheme tailored
for unlabeled instances in each client. We further formulate the client model
training as bi-level optimization that adaptively optimizes the model in the
client with two regulators. Theoretically, we show the convergence guarantee of
the dual regulators. Empirically, we demonstrate that FedDure is superior to
the existing methods across a wide range of settings, notably by more than 11%
on the CIFAR-10 and CINIC-10 datasets.
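The fine-grained regulator (F-reg) described above amounts to per-instance weighting of the unlabeled loss. The following is only a toy sketch of that idea with fixed weights; FedDure actually learns the weights through bi-level optimization, which is omitted here.

```python
import numpy as np

def weighted_pseudo_label_loss(probs, pseudo_labels, instance_weights):
    """Cross-entropy on pseudo-labels, weighted per unlabeled instance.

    A toy stand-in for FedDure's fine-grained regulator (F-reg): here the
    per-instance weights are simply given, whereas the paper learns them
    adaptively via bi-level optimization together with C-reg.
    """
    eps = 1e-12
    n = probs.shape[0]
    ce = -np.log(probs[np.arange(n), pseudo_labels] + eps)  # per-instance CE
    w = instance_weights / instance_weights.sum()           # normalized weights
    return float(np.sum(w * ce))

probs = np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]])
labels = np.array([0, 0, 1])
# down-weight the uncertain second instance
loss = weighted_pseudo_label_loss(probs, labels, np.array([1.0, 0.2, 1.0]))
```

Down-weighting low-confidence pseudo-labels is the standard motivation for such schemes; the regulator's job is to discover those weights rather than hand-set them.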
Rethinking Semi-Supervised Federated Learning: How to co-train fully-labeled and fully-unlabeled client imaging data
The most challenging, yet practical, setting of semi-supervised federated
learning (SSFL) is where a few clients have fully labeled data whereas the
other clients have fully unlabeled data. This is particularly common in
healthcare settings where collaborating partners (typically hospitals) may have
images but not annotations. The bottleneck in this setting is the joint
training of labeled and unlabeled clients as the objective function for each
client varies based on the availability of labels. This paper investigates an
alternative way for effective training with labeled and unlabeled clients in a
federated setting. We propose a novel learning scheme specifically designed for
SSFL which we call Isolated Federated Learning (IsoFed) that circumvents the
problem by avoiding simple averaging of supervised and semi-supervised models
together. In particular, our training approach consists of two parts - (a)
isolated aggregation of labeled and unlabeled client models, and (b) local
self-supervised pretraining of isolated global models in all clients. We
evaluate our model performance on medical image datasets of four different
modalities publicly available within the biomedical image classification
benchmark MedMNIST. We further vary the proportion of labeled clients and the
degree of heterogeneity to demonstrate the effectiveness of the proposed method
under varied experimental settings.
Comment: Published in MICCAI 2023 with early acceptance and selected as 1 of
the top 20 poster highlights under the category: Which work has the potential
to impact other applications of AI and C
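Part (a) of IsoFed's training scheme, isolated aggregation, can be sketched in a few lines. This is a minimal illustration assuming flat weight vectors per client; the real method also performs local self-supervised pretraining of the isolated global models (part b), which is omitted.

```python
import numpy as np

def isolated_aggregate(client_weights, is_labeled):
    """Aggregate labeled-client and unlabeled-client models separately.

    A minimal sketch of IsoFed's isolated aggregation: instead of one
    FedAvg over all clients, supervised and semi-supervised models are
    averaged into two separate global models, avoiding a naive mix of
    the two objectives.
    """
    labeled = [w for w, l in zip(client_weights, is_labeled) if l]
    unlabeled = [w for w, l in zip(client_weights, is_labeled) if not l]
    global_labeled = np.mean(labeled, axis=0)
    global_unlabeled = np.mean(unlabeled, axis=0)
    return global_labeled, global_unlabeled

ws = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
g_lab, g_unlab = isolated_aggregate(ws, [True, True, False])
```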
Federated Semi-Supervised Learning Using Prototypical Networks
Master's thesis (M.S.) -- Department of Data Science, Graduate School of Data Science, Seoul National University, 2022. Federated Learning (FL) is being actively studied as the computing power of edge devices increases. Most existing studies assume that the data are fully labeled. However, since labeling data on edge devices is costly, this assumption does not hold in the real world; in general, most of the data each client holds is unlabeled. In this study, we propose a novel federated semi-supervised learning (FSSL) method. It uses prototypes to exploit other clients' knowledge and pseudo-labeling to compute a loss on unlabeled data, and it is more communication- and computation-efficient than recent FSSL algorithms. In experiments, our method performed 3.8% better than training without unlabeled data on the CIFAR-10 dataset, 4.6% better on SVHN, and 3.1% better on STL-10.
Chapter 1. Introduction
Chapter 2. Backgrounds
Chapter 3. Algorithm
Chapter 4. Experiments
Chapter 5. Conclusion
Bibliography
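The prototype-plus-pseudo-labeling recipe this thesis describes can be sketched as nearest-prototype assignment. Everything below (Euclidean distance, no confidence threshold, prototypes given as plain arrays) is an assumption for illustration; how prototypes are shared across clients is part of the thesis's actual contribution.

```python
import numpy as np

def prototype_pseudo_labels(features, prototypes):
    """Assign pseudo-labels to unlabeled features by the nearest class
    prototype (mean embedding, possibly aggregated from other clients).

    Toy sketch only: the distance metric, thresholding, and prototype
    aggregation details are assumptions, not the thesis's exact method.
    """
    # pairwise Euclidean distance: (n_samples, n_classes)
    d = np.linalg.norm(features[:, None, :] - prototypes[None, :, :], axis=2)
    return d.argmin(axis=1)

protos = np.array([[0.0, 0.0], [10.0, 10.0]])   # one prototype per class
feats = np.array([[0.5, -0.2], [9.0, 11.0]])    # unlabeled embeddings
labels = prototype_pseudo_labels(feats, protos)
```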
Uncertainty Minimization for Personalized Federated Semi-Supervised Learning
Since federated learning (FL) was introduced as a decentralized learning
technique with privacy preservation, statistical heterogeneity of distributed
data has remained the main obstacle to robust performance and stable
convergence in FL applications. Model personalization methods have been studied
to overcome this problem. However, existing approaches mostly presuppose fully
labeled data, which is unrealistic in practice due to the expertise labeling
requires. The primary issue caused by the partially labeled condition is that
clients with deficient labeled data can suffer from unfair performance gains
because they lack adequate insight into the local distribution needed to
customize the global model. To tackle this problem, 1) we propose a novel
personalized semi-supervised learning paradigm which allows partially labeled
or unlabeled clients to seek labeling assistance from data-related clients
(helper agents), thus enhancing their perception of local data; 2) based on
this paradigm, we design an uncertainty-based data-relation metric to ensure
that selected helpers provide trustworthy pseudo-labels instead of misleading
the local training; 3) to mitigate the network overload introduced by helper
searching, we further develop a helper selection protocol that achieves
efficient communication with negligible performance sacrifice. Experiments show
that our proposed method obtains superior performance and more stable
convergence than other related works with partially labeled data, especially in
highly heterogeneous settings.
Comment: 11 pages
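A common way to realize an uncertainty-based relation metric like the one described above is predictive entropy on a probe batch. The sketch below is a hypothetical stand-in, not the paper's actual metric: it simply ranks candidate helpers by how uncertain their models are on the requester's data.

```python
import numpy as np

def predictive_entropy(probs):
    """Mean predictive entropy of a model's softmax outputs on a batch."""
    eps = 1e-12
    return float(-(probs * np.log(probs + eps)).sum(axis=1).mean())

def select_helpers(candidate_probs, k=1):
    """Pick the k candidate clients whose models are least uncertain on
    the requester's probe data -- a toy stand-in for the paper's
    uncertainty-based data-relation metric and selection protocol."""
    scores = [predictive_entropy(p) for p in candidate_probs]
    return [int(i) for i in np.argsort(scores)[:k]]

# candidate 0 is uncertain on our data, candidate 1 is confident
uncertain = np.array([[0.55, 0.45], [0.5, 0.5]])
confident = np.array([[0.95, 0.05], [0.9, 0.1]])
helpers = select_helpers([uncertain, confident], k=1)
```

Lower entropy on local data suggests the helper's model "knows" a related distribution, so its pseudo-labels are more likely trustworthy.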
FedSup: Teacher-Student Semi-Supervised Federated Learning
Master's thesis (M.S.) -- Department of Data Science, Graduate School of Data Science, Seoul National University, 2023. Federated Learning (FL) is a machine learning paradigm in which multiple heterogeneous clients train local models on their own data and share only the parameters with the server to create a centralized model. This paradigm, however, rests on the unrealistic assumption that every client has fully labeled data readily available for training. Since labeling data generally requires domain expertise and consistency, which are difficult to attain in a federated setup, it is more pragmatic to consider a scenario where clients own completely unlabeled data while the server holds a small fraction of labeled data (Labels-at-Server). Methods for exploiting unlabeled data at clients, which take advantage of stochastic augmentations to improve the quality of pseudo-labels, are being actively researched. Inspired by recent SSL methods and knowledge distillation, we propose FedSup, a semi-supervised FL teacher-student architecture, to tackle this problem. To demonstrate its validity, we conduct experiments on CIFAR-10/CIFAR-100/STL-10 against naive applications of four popular SSL methods to FL and against the state-of-the-art semi-supervised FL methods FedMatch and FedRGD. On both independent and identically distributed (IID) and non-IID data, FedSup achieves higher accuracy on all three datasets under fine-tuning. We also conduct ablation studies on CIFAR-10 to explore why FedSup works better.
1. Introduction
2. Related Works
2.1 Federated Learning
2.2 Unsupervised Representation Learning
2.3 Semi-Supervised Federated Learning
2.4 Bias in Classifier
3. Background
3.1 Supervised Federated Learning
3.2 Semi-Supervised Learning
3.2.1 FixMatch
3.2.2 SimCLR
3.2.3 SimSiam
3.2.4 BYOL
3.3 Gradient Diversity
4. Methods
4.1 Algorithm
4.1.1 FedSup
4.1.2 Semi-Supervised Federated Learning
5. Experimental Details
5.1 Experiments
5.1.1 Setup
5.1.2 Evaluation
6. Results and Discussions
6.1 Experimental Results
6.1.1 Main Observations
6.1.2 Statistical Heterogeneity
6.1.3 Label Ratio
6.1.4 Ablation for Loss
6.1.5 Hyperparameter Search
6.2 Discussions
6.2.1 Semi-Supervised Learning for Federated Learning
6.2.2 Lack of Labels
7. Conclusion
8. Appendix
8.1 Detailed Experimental Results
8.2 Algorithms
Acknowledgement
Abstract (In Korean)
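The pseudo-labeling-with-augmentations recipe FedSup builds on is, per the outline, FixMatch-style: the teacher's prediction on a weakly augmented view becomes the target for a strongly augmented view, but only above a confidence threshold. The sketch below shows just that masking step; FedSup's teacher-student distillation and federated aggregation are omitted, and tau=0.95 follows FixMatch's default rather than anything stated here.

```python
import numpy as np

def fixmatch_mask_and_labels(teacher_probs_weak, tau=0.95):
    """FixMatch-style pseudo-labeling: keep only pseudo-labels whose
    teacher confidence (on the weakly augmented view) exceeds tau.

    Returns the hard pseudo-labels and a boolean mask selecting which
    unlabeled samples contribute to the student's loss.
    """
    conf = teacher_probs_weak.max(axis=1)     # teacher confidence per sample
    pseudo = teacher_probs_weak.argmax(axis=1)
    mask = conf >= tau                        # only confident samples count
    return pseudo, mask

probs = np.array([[0.97, 0.03], [0.6, 0.4]])
pseudo, mask = fixmatch_mask_and_labels(probs)
# only the first (confident) sample would enter the unlabeled loss
```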
Local or Global: Selective Knowledge Assimilation for Federated Learning with Limited Labels
Many existing FL methods assume clients with fully-labeled data, while in
realistic settings, clients have limited labels due to the expensive and
laborious process of labeling. Limited labeled local data of the clients often
leads to their local model having poor generalization abilities to their larger
unlabeled local data, such as having class-distribution mismatch with the
unlabeled data. As a result, clients may instead look to benefit from the
global model trained across clients to leverage their unlabeled data, but this
also becomes difficult due to data heterogeneity across clients. In our work,
we propose FedLabel where clients selectively choose the local or global model
to pseudo-label their unlabeled data depending on which is more of an expert on
the data. We further utilize both the local and global models' knowledge via
global-local consistency regularization which minimizes the divergence between
the two models' outputs when they have identical pseudo-labels for the
unlabeled data. Unlike other semi-supervised FL baselines, our method does not
require additional experts other than the local or global model, nor require
additional parameters to be communicated. We also do not assume any
server-labeled data or fully labeled clients. For both cross-device and
cross-silo settings, we show that FedLabel outperforms other semi-supervised FL
baselines by -, and even outperforms standard fully supervised FL
baselines ( labeled data) with only - of labeled data.
Comment: To appear in the proceedings of ICCV 202
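FedLabel's two ingredients, per-instance model selection and global-local consistency, can be caricatured in a few lines. This is a toy reading under simplifying assumptions: "expertise" is reduced to softmax confidence, and the consistency term to an L2 divergence applied only where the two models' pseudo-labels agree; the paper's actual measures are more involved.

```python
import numpy as np

def fedlabel_targets(local_probs, global_probs):
    """Toy sketch of FedLabel: whichever of the local/global model is
    more confident on an unlabeled instance supplies its pseudo-label,
    and a consistency penalty applies where the two models agree."""
    use_global = global_probs.max(axis=1) > local_probs.max(axis=1)
    pseudo = np.where(use_global,
                      global_probs.argmax(axis=1),
                      local_probs.argmax(axis=1))
    agree = local_probs.argmax(axis=1) == global_probs.argmax(axis=1)
    # L2 divergence between outputs, counted only on agreeing samples
    consistency = ((local_probs - global_probs) ** 2).sum(axis=1) * agree
    return pseudo, float(consistency.mean())

lp = np.array([[0.8, 0.2], [0.4, 0.6]])   # local model outputs
gp = np.array([[0.6, 0.4], [0.9, 0.1]])   # global model outputs
pseudo, reg = fedlabel_targets(lp, gp)
```

Note that, as the abstract stresses, nothing beyond the local and global models themselves is needed: no extra experts and no extra communicated parameters.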
Efficient Semi-Supervised Federated Learning for Heterogeneous Participants
Federated Learning (FL) has emerged to allow multiple clients to
collaboratively train machine learning models on their private data. However,
training and deploying large-scale models on resource-constrained clients is
challenging. Fortunately, Split Federated Learning (SFL) offers a feasible
solution by alleviating the computation and/or communication burden on clients.
However, existing SFL works often assume sufficient labeled data on clients,
which is usually impractical. Besides, data non-IIDness across clients poses
another challenge to ensure efficient model training. To the best of our knowledge,
the above two issues have not been simultaneously addressed in SFL. Herein, we
propose a novel Semi-SFL system, which incorporates clustering regularization
to perform SFL under the more practical scenario with unlabeled and non-IID
client data. Moreover, our theoretical and experimental investigations into
model convergence reveal that the inconsistent training processes on labeled
and unlabeled data have an influence on the effectiveness of clustering
regularization. To this end, we develop a control algorithm for dynamically
adjusting the global updating frequency, so as to mitigate the training
inconsistency and improve training performance. Extensive experiments on
benchmark models and datasets show that our system provides a 3.0x speed-up in
training time and reduces the communication cost by about 70.3% while reaching
the target accuracy, and achieves up to 5.1% improvement in accuracy under
non-IID scenarios compared to the state-of-the-art baselines.
Comment: 16 pages, 12 figures, conference
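The control algorithm described above adjusts how often the global model is synchronized based on measured labeled/unlabeled training inconsistency. The halving/doubling rule below is purely an assumption for illustration; the paper derives its actual control law from the convergence analysis.

```python
def adjust_sync_interval(interval, inconsistency, target,
                         min_interval=1, max_interval=64):
    """Toy controller in the spirit of Semi-SFL's dynamic global
    updating frequency: when measured training inconsistency exceeds a
    target, shrink the number of local steps between global syncs
    (i.e., update the global model more often); otherwise relax the
    interval to save communication.  The halve/double rule here is a
    hypothetical choice, not the paper's derived control law."""
    if inconsistency > target:
        return max(min_interval, interval // 2)   # sync more often
    return min(max_interval, interval * 2)        # sync less often

new_interval = adjust_sync_interval(16, inconsistency=0.3, target=0.1)
```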