S-GBDT: Frugal Differentially Private Gradient Boosting Decision Trees

Abstract

Privacy-preserving learning of gradient boosting decision trees (GBDT) has the potential for strong utility-privacy tradeoffs on tabular data, such as census data or medical metadata: classical GBDT learners can extract non-linear patterns from small datasets. The state-of-the-art notion for provable privacy properties is differential privacy, which requires that the impact of any single data point is limited and deniable. We introduce a novel differentially private GBDT learner and use four main techniques to improve the utility-privacy tradeoff. (1) We use an improved noise scaling approach with tighter accounting of the privacy leakage of a decision tree leaf than prior work, resulting in noise that in expectation scales with O(1/n) for n data points. (2) We integrate individual Rényi filters into our method to learn from data points that have been underutilized during the iterative training process, which (potentially of independent interest) results in a natural yet effective insight into learning from streams of non-i.i.d. data. (3) We incorporate random decision tree splits to concentrate the privacy budget on learning leaves. (4) We deploy subsampling for privacy amplification. On the Abalone dataset (<4k training data points), our evaluation shows an R^2-score of 0.39 for ε = 0.15, which the closest prior work only achieved for ε = 10.0. On the Adult dataset (50k training data points), we achieve a test error of 18.7% for ε = 0.07, which the closest prior work only achieved for ε = 1.0. On the Abalone dataset for ε = 0.54, we achieve an R^2-score of 0.47, which is very close to the R^2-score of 0.54 of the nonprivate version of GBDT. On the Adult dataset for ε = 0.54, we achieve a test error of 17.1%, which is very close to the test error of 13.7% of the nonprivate version of GBDT.

Comment: The first two authors contributed equally to this work.
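
To make techniques (1) and (4) more concrete, the following is a minimal textbook-style sketch and not the S-GBDT algorithm or its privacy accounting. It assumes a plain Laplace mechanism on a clipped gradient sum, replace-one neighbouring datasets, and a leaf size n that is treated as public. It also uses the standard amplification-by-subsampling bound for a pure ε-DP mechanism under Poisson subsampling. All function names and parameters (`amplified_epsilon`, `leaf_noise_scale`, `noisy_leaf_mean`, `clip`, `q`) are hypothetical.

```python
import math
import random


def amplified_epsilon(eps: float, q: float) -> float:
    """Standard bound: an eps-DP mechanism run on a Poisson subsample with
    inclusion probability q satisfies log(1 + q * (exp(eps) - 1))-DP."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    return math.log1p(q * math.expm1(eps))


def leaf_noise_scale(clip: float, eps: float, n: int) -> float:
    """Laplace scale of the noise on a leaf *mean* of n clipped gradients.

    Assumption: replace-one adjacency, so the clipped sum has sensitivity
    2 * clip; dividing by a public n gives noise that shrinks like 1/n.
    """
    return 2.0 * clip / (eps * n)


def laplace(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_leaf_mean(gradients, clip: float, eps: float, rng: random.Random) -> float:
    """Hypothetical leaf value: mean of clipped gradients plus Laplace noise."""
    n = len(gradients)
    clipped_sum = sum(max(-clip, min(clip, g)) for g in gradients)
    return clipped_sum / n + laplace(leaf_noise_scale(clip, eps, n), rng)


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
    print("noisy leaf value:", noisy_leaf_mean(grads, clip=1.0, eps=0.5, rng=rng))
    print("effective epsilon at q=0.1:", amplified_epsilon(0.5, 0.1))
```

In this simplified setting, doubling the number of points in a leaf halves the noise scale on its value, and subsampling with a small q lowers the effective ε of each round roughly in proportion to q.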
