Privacy-preserving learning of gradient boosting decision trees (GBDT) has
the potential for strong utility-privacy tradeoffs on tabular data, such as
census data or medical metadata: classical GBDT learners can extract
non-linear patterns from small datasets. The state-of-the-art notion of
provable privacy guarantees is differential privacy, which requires that the
impact of any single data point is limited and deniable. We introduce a novel
differentially private GBDT learner and utilize four main techniques to improve
the utility-privacy tradeoff. (1) We use an improved noise scaling approach
with tighter accounting of the privacy leakage of a decision tree leaf than
prior work, resulting in noise that in expectation scales as O(1/n), for
n data points. (2) We integrate individual Rényi filters into our method to
learn from data points that have been underutilized during iterative
training; potentially of independent interest, this yields a natural yet
effective approach to learning on streams of non-i.i.d. data. (3) We
incorporate random decision tree splits to concentrate the privacy budget on
learning the leaves. (4) We deploy subsampling for privacy amplification (a
sketch of how these techniques combine is given below).
Our evaluation shows for the Abalone dataset (<4k training data points) an
R²-score of 0.39 for ε=0.15, which the closest prior work only
achieved for ε=10.0. On the Adult dataset (50k training data
points) we achieve a test error of 18.7% for ε=0.07, which the
closest prior work only achieved for ε=1.0. For the Abalone dataset
at ε=0.54 we achieve an R²-score of 0.47, which is very close to
the R²-score of 0.54 of the non-private version of GBDT. For the Adult
dataset at ε=0.54 we achieve a test error of 17.1%, which is very
close to the test error of 13.7% of the non-private version of GBDT.

Comment: The first two authors contributed equally to this work.