Federated Learning (FL) enables collaborative model training while preserving
the privacy of raw data. A challenge in this framework is the fair and
efficient valuation of data, which is crucial for incentivizing clients to
contribute high-quality data in the FL task. In scenarios involving numerous
data clients within FL, it is often the case that only a subset of clients and
datasets are pertinent to a specific learning task, while others might have
either a negative or negligible impact on the model training process. This
paper introduces a novel privacy-preserving method for evaluating client
contributions and selecting relevant datasets without a pre-specified training
algorithm in an FL task. Our proposed approach FedBary, utilizes Wasserstein
distance within the federated context, offering a new solution for data
valuation in the FL framework. This method ensures transparent data valuation
and efficient computation of the Wasserstein barycenter and reduces the
dependence on validation datasets. Through extensive empirical experiments and
theoretical analyses, we demonstrate the potential of this data valuation
method as a promising avenue for FL research.Comment: Fixed some experimental errors and typo