2 research outputs found
S-GBDT: Frugal Differentially Private Gradient Boosting Decision Trees
Privacy-preserving learning of gradient boosting decision trees (GBDT) has
the potential for strong utility-privacy tradeoffs for tabular data, such as
census data or medical meta data: classical GBDT learners can extract
non-linear patterns from small sized datasets. The state-of-the-art notion for
provable privacy-properties is differential privacy, which requires that the
impact of single data points is limited and deniable. We introduce a novel
differentially private GBDT learner and utilize four main techniques to improve
the utility-privacy tradeoff. (1) We use an improved noise scaling approach
with tighter accounting of privacy leakage of a decision tree leaf compared to
prior work, resulting in noise that in expectation scales with , for
data points. (2) We integrate individual R\'enyi filters to our method to
learn from data points that have been underutilized during an iterative
training process, which -- potentially of independent interest -- results in a
natural yet effective insight to learning streams of non-i.i.d. data. (3) We
incorporate the concept of random decision tree splits to concentrate privacy
budget on learning leaves. (4) We deploy subsampling for privacy amplification.
Our evaluation shows for the Abalone dataset ( training data points) a
-score of for , which the closest prior work only
achieved for . On the Adult dataset ( training data
points) we achieve test error of for which the
closest prior work only achieved for . For the Abalone dataset
for we achieve -score of which is very close to
the -score of for the nonprivate version of GBDT. For the Adult
dataset for we achieve test error which is very
close to the test error of the nonprivate version of GBDT.Comment: The first two authors equally contributed to this wor