Deep learning-based linkage of records across different databases is becoming
increasingly useful in data integration and mining applications to discover new
insights from multiple sources of data. However, due to privacy and
confidentiality concerns, organisations are often unwilling or not permitted to
share their sensitive data with external parties, which makes it
challenging to train deep learning models for record linkage across
different organisations' databases. To overcome this limitation, we propose the
first deep learning-based multi-party privacy-preserving record linkage (PPRL)
protocol that can be used to link sensitive databases held by multiple
organisations. In our approach, each database owner first trains a
local deep learning model, which is then uploaded to a secure environment and
securely aggregated to create a global model. The global model is then used by
a linkage unit to classify unlabelled record pairs as matches or
non-matches. We utilise differential privacy to achieve provable privacy
protection against re-identification attacks. We evaluate the linkage quality
and scalability of our approach using several large real-world databases,
showing that it can achieve high linkage quality while providing sufficient
privacy protection against existing attacks.

Comment: 11 pages
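
The workflow summarised in the abstract (local training, aggregation into a global model, classification by a linkage unit) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not specify the model architecture, the secure aggregation scheme, or how differential privacy is applied. Here each local model is reduced to a flat weight vector, aggregation is a plain average of norm-clipped vectors with Gaussian noise added, and the classifier is a simple logistic score. The names `clip_to_norm`, `aggregate_local_models`, and `classify_pair`, and the parameters `max_norm`, `noise_std`, and `threshold`, are hypothetical.

```python
import math
import random


def clip_to_norm(weights, max_norm):
    """Scale a weight vector so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(w * w for w in weights))
    if norm <= max_norm or norm == 0.0:
        return list(weights)
    scale = max_norm / norm
    return [w * scale for w in weights]


def aggregate_local_models(local_models, max_norm=1.0, noise_std=0.0, rng=None):
    """Average clipped local weight vectors and add Gaussian noise.

    Hypothetical helper: the clipping bound, the noise scale, and the use of
    a plain average are illustrative assumptions, not the paper's protocol.
    """
    if not local_models:
        raise ValueError("at least one local model is required")
    dim = len(local_models[0])
    if any(len(m) != dim for m in local_models):
        raise ValueError("all local models must have the same number of weights")
    rng = rng or random.Random()
    # Bound each owner's contribution before averaging.
    clipped = [clip_to_norm(m, max_norm) for m in local_models]
    n = len(clipped)
    averaged = [sum(m[i] for m in clipped) / n for i in range(dim)]
    # Perturb the averaged weights with zero-mean Gaussian noise.
    return [w + rng.gauss(0.0, noise_std) for w in averaged]


def classify_pair(global_weights, pair_features, threshold=0.5):
    """Label a record pair as a match or non-match using a logistic score."""
    z = sum(w * x for w, x in zip(global_weights, pair_features))
    score = 1.0 / (1.0 + math.exp(-z))
    return "match" if score >= threshold else "non-match"
```

In this sketch, a linkage unit would call `aggregate_local_models` once on the weight vectors received from the database owners and then apply `classify_pair` to the feature vector of each unlabelled record pair. Choosing `noise_std` to meet a formal differential privacy guarantee requires a calibration step that is not shown here.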