3 research outputs found
Exploiting Record Similarity for Practical Vertical Federated Learning
As the privacy of machine learning has drawn increasing attention, federated
learning is introduced to enable collaborative learning without revealing raw
data. Notably, \textit{vertical federated learning} (VFL), where parties share
the same set of samples but only hold partial features, has a wide range of
real-world applications. However, existing studies in VFL rarely study the
``record linkage'' process. They either design algorithms assuming the data
from different parties have been linked or use simple linkage methods like
exact-linkage or top1-linkage. These approaches are unsuitable for many
applications, such as the GPS location and noisy titles requiring fuzzy
matching. In this paper, we design a novel similarity-based VFL framework,
FedSim, which is suitable for more real-world applications and achieves higher
performance on traditional VFL tasks. Moreover, we theoretically analyze the
privacy risk caused by sharing similarities. Our experiments on three synthetic
datasets and five real-world datasets with various similarity metrics show that
FedSim consistently outperforms other state-of-the-art baselines
SFour: A Protocol for Cryptographically Secure Record Linkage at Scale
The prevalence of various (and increasingly large) datasets presents the challenging problem of discovering common entities dispersed across disparate datasets. Solutions to the private record linkage problem (PRL) aim to enable such explorations of datasets in a secure manner.
A two-party PRL protocol allows two parties to determine for which entities they each possess a record (either an exact matching record or a fuzzy matching record) in their respective datasets — without revealing to one another information about any entities for which they do not both possess records. Although several solutions have been proposed to solve the PRL problem, no current solution offers a fully cryptographic security guarantee while maintaining both high accuracy of output and subquadratic runtime efficiency.
To this end, we propose the first known efficient PRL protocol that runs in subquadratic time, provides high accuracy, and guarantees cryptographic security