Exploiting Record Similarity for Practical Vertical Federated Learning

He, Bingsheng; Li, Qinbin; Wu, Zhaomin

Authors: Bingsheng He, Qinbin Li, Zhaomin Wu
Publication date: 11 June 2021

Abstract

As the privacy of machine learning has drawn increasing attention, federated learning has been introduced to enable collaborative learning without revealing raw data. Notably, vertical federated learning (VFL), where parties share the same set of samples but each holds only a subset of the features, has a wide range of real-world applications. However, existing studies in VFL rarely examine the "record linkage" process. They either design algorithms on the assumption that the data from different parties have already been linked, or they use simple linkage methods such as exact linkage or top-1 linkage. These approaches are unsuitable for many applications, such as those keyed on GPS locations or noisy titles, which require fuzzy matching. In this paper, we design a novel similarity-based VFL framework, FedSim, which is suitable for more real-world applications and achieves higher performance on traditional VFL tasks. Moreover, we theoretically analyze the privacy risk caused by sharing similarities. Our experiments on three synthetic datasets and five real-world datasets with various similarity metrics show that FedSim consistently outperforms other state-of-the-art baselines.
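
The linkage methods named in the abstract can be illustrated with a small sketch. This is not FedSim or any code from the paper; it only contrasts exact linkage, top-1 linkage, and a top-k linkage that keeps distances, under the assumption that records are keyed by 2-D coordinates and compared by Euclidean distance. The function names (`exact_link`, `top_k_link`) and the coordinate values are hypothetical.

```python
import math


def exact_link(keys_a, keys_b):
    """Link record i in A to record j in B only when the keys are identical."""
    index_b = {key: j for j, key in enumerate(keys_b)}
    return {i: index_b[key] for i, key in enumerate(keys_a) if key in index_b}


def top_k_link(keys_a, keys_b, k=1):
    """For each record in A, return the k closest records in B as (j, distance)."""
    links = {}
    for i, a in enumerate(keys_a):
        ranked = sorted(
            ((j, math.dist(a, b)) for j, b in enumerate(keys_b)),
            key=lambda pair: pair[1],
        )
        links[i] = ranked[:k]
    return links


# Made-up GPS-like keys held by two parties; no key matches exactly.
party_a = [(1.3000, 103.8000), (1.3521, 103.8198)]
party_b = [(1.3001, 103.8002), (1.3520, 103.8200), (1.2900, 103.8500)]

exact = exact_link(party_a, party_b)      # empty: exact matching finds nothing
top1 = top_k_link(party_a, party_b, k=1)  # one nearest neighbour per record
top2 = top_k_link(party_a, party_b, k=2)  # two neighbours, distances retained
```

With noisy keys like these, exact linkage returns no pairs at all, and top-1 linkage keeps a single candidate while discarding how close it was. Keeping several candidates together with their distances is one way to retain the similarity information that the abstract says a similarity-based framework makes use of.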

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2106.06312

Last time updated on 16/06/2021