Federated learning (FL) is emerging as a privacy-aware alternative to
classical cloud-based machine learning. In FL, the sensitive data remains in
data silos and only aggregated parameters are exchanged. Hospitals and research
institutions that are unwilling to share their data can join a federated
study without breaching confidentiality. In addition to the extreme sensitivity
of biomedical data, the high dimensionality poses a challenge in the context of
federated genome-wide association studies (GWAS). In this article, we present a
federated singular value decomposition (SVD) algorithm, suitable for the
privacy-related and computational requirements of GWAS. Notably, the algorithm
has a transmission cost independent of the number of samples and is only weakly
dependent on the number of features, because the singular vectors associated
with the samples are never exchanged and the vectors associated with the
features are exchanged only for a fixed number of iterations. Although motivated by GWAS, the
algorithm is generically applicable to both horizontally and vertically
partitioned data.

Comment: 36 pages, 7 figures, 5 tables, submitted to Data Mining and Knowledge
Discovery
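
To illustrate why the transmission cost of such a scheme can be independent of the number of samples, here is a minimal sketch of a generic federated subspace (power) iteration. It is not the algorithm presented in the article. It assumes that each site holds a block of samples (rows) over a shared set of features (columns), and the names `local_update` and `federated_subspace_iteration` are hypothetical. Each site sends only a features-by-k matrix per round, and the sample-side vectors never leave the site. Unlike the scheme described in the abstract, this sketch exchanges the feature-side matrix in every round.

```python
import numpy as np

def local_update(A_k, G):
    """Runs at one site. A_k (n_k samples x d features) stays local."""
    U_k = A_k @ G        # sample-side vectors (n_k x k), never transmitted
    return A_k.T @ U_k   # d x k matrix, the only quantity sent out

def federated_subspace_iteration(sites, k, n_iter=50, seed=0):
    """Approximate the top-k right singular vectors of the stacked data."""
    d = sites[0].shape[1]
    rng = np.random.default_rng(seed)
    # Random orthonormal starting guess for the feature-side vectors.
    G, _ = np.linalg.qr(rng.standard_normal((d, k)))
    for _ in range(n_iter):
        # Aggregator sums the d x k contributions from all sites ...
        H = sum(local_update(A_k, G) for A_k in sites)
        # ... and re-orthonormalizes before broadcasting G again.
        G, _ = np.linalg.qr(H)
    # Singular value estimates from the summed squared local projections.
    sq = sum(np.sum((A_k @ G) ** 2, axis=0) for A_k in sites)
    return G, np.sqrt(sq)

# Toy usage: three sites, 300 samples in total, 6 features.
rng = np.random.default_rng(1)
A = rng.standard_normal((300, 6)) * np.array([10, 5, 1, 1, 1, 1])
sites = np.array_split(A, 3, axis=0)
G, s = federated_subspace_iteration(sites, k=2)
```

In this sketch the per-round message size is d x k regardless of how many samples each site holds, which is the property the abstract highlights.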