Efficient Encrypted Inference on Ensembles of Decision Trees
Data privacy concerns often prevent the use of cloud-based machine learning
services for sensitive personal data. While homomorphic encryption (HE) offers
a potential solution by enabling computations on encrypted data, the challenge
is to obtain accurate machine learning models that work within the
multiplicative depth constraints of a leveled HE scheme. Existing approaches
for encrypted inference either make ad-hoc simplifications to a pre-trained
model (e.g., replace hard comparisons in a decision tree with soft comparators)
at the cost of accuracy or directly train a new depth-constrained model using
the original training set. In this work, we propose a framework to transfer
knowledge extracted by complex decision tree ensembles to shallow neural
networks (referred to as DTNets) that are highly conducive to encrypted
inference. Our approach minimizes the accuracy loss by searching for the best
DTNet architecture that operates within the given depth constraints and
training this DTNet using only synthetic data sampled from the training data
distribution. Extensive experiments on real-world datasets demonstrate that
these characteristics are critical in ensuring that DTNet accuracy approaches
that of the original tree ensemble. Our system is highly scalable and can
perform efficient inference on batched encrypted data (134 bits of security)
with amortized time in milliseconds. This is approximately three orders of
magnitude faster than the standard approach of applying soft comparisons at the
internal nodes of the ensemble trees.
Comment: 9 pages, 6 figures
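The distillation idea described above (sample synthetic inputs from the training data distribution, label them with the tree ensemble, then fit a shallow network to those labels) can be sketched as follows. The toy task, the Gaussian sampling distribution, and the one-hidden-layer student are illustrative assumptions for this sketch, not the paper's actual DTNet architecture search or its HE inference pipeline:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Toy teacher: a tree ensemble trained on a simple 2-D task
# (stands in for the complex pre-trained ensemble).
X = rng.normal(size=(500, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
teacher = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Synthetic transfer set: sampled from the (assumed known) input
# distribution and labeled by the teacher, not by the original labels.
X_syn = rng.normal(size=(2000, 2))
y_syn = teacher.predict(X_syn)

# Shallow student: one small hidden layer, so the network keeps a low
# multiplicative depth when evaluated under a leveled HE scheme.
student = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                        random_state=0).fit(X_syn, y_syn)

# Fraction of synthetic samples on which the student matches the teacher.
agreement = (student.predict(X_syn) == y_syn).mean()
```

In practice the student's activations would also need to be polynomial-friendly for HE evaluation; the sketch only illustrates the knowledge-transfer step.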
SoK: Privacy-Preserving Collaborative Tree-based Model Learning
Tree-based models are among the most effective machine learning techniques
for data mining today due to their accuracy, interpretability, and
simplicity. The recent, orthogonal needs for more data and for privacy
protection call for collaborative privacy-preserving solutions. In this work, we survey
the literature on distributed and privacy-preserving training of tree-based
models and we systematize its knowledge based on four axes: the learning
algorithm, the collaborative model, the protection mechanism, and the threat
model. We use this systematization to identify the strengths and limitations of these works and
provide, for the first time, a framework for analyzing the information leakage
that occurs in distributed tree-based model learning.