In recent years, data have often been distributed across multiple
organizations, while data security has become increasingly important.
Federated Learning (FL), which enables multiple parties to collaboratively
train a model without exchanging their raw data, has attracted increasing
attention. Based on how the data are distributed, FL can be realized in
three scenarios, i.e., horizontal,
vertical, and hybrid. In this paper, we combine distributed machine
learning techniques with vertical FL and propose a Distributed Vertical
Federated Learning (DVFL) approach. DVFL exploits a fully distributed
architecture within each party to accelerate the training
process. In addition, we exploit Homomorphic Encryption (HE) to protect the
data against honest-but-curious participants. We conduct extensive
experiments in a large-scale cluster environment and in a cloud environment
to show the efficiency and scalability of our proposed approach. The
experiments demonstrate the good scalability of our approach and its
significant efficiency advantage (up to 6.8 times faster with a single
server and 15.1 times faster with multiple servers in terms of training
time) compared with
baseline frameworks.

Comment: To appear in CCPE (Concurrency and Computation: Practice and Experience).