Differentially-Private Decision Trees with Probabilistic Robustness to Data Poisoning
Decision trees are interpretable models that are well-suited to non-linear
learning problems. Much work has been done on extending decision tree learning
algorithms with differential privacy, a framework that protects the privacy of
individual samples in the training data. However, current state-of-the-art algorithms
for this purpose sacrifice much utility for a small privacy benefit. These
solutions create random decision nodes that reduce decision tree accuracy or
spend an excessive share of the privacy budget on labeling leaves. Moreover,
many works either do not support continuous features or leak information about
their values. We propose a new method called PrivaTree based on private
histograms that chooses good splits while consuming a small privacy budget. The
resulting trees provide a significantly better privacy-utility trade-off and
accept mixed numerical and categorical data without leaking additional
information. Finally, while it is notoriously hard to give robustness
guarantees against data poisoning attacks, we prove bounds for the expected
success rates of backdoor attacks against differentially-private learners. Our
experimental results show that PrivaTree consistently outperforms previous
works on predictive accuracy and significantly improves robustness against
backdoor attacks compared to regular decision trees.
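The abstract does not specify PrivaTree's histogram mechanism, but the standard way to make a histogram differentially private is the Laplace mechanism: since each sample contributes to exactly one bin, adding Laplace noise with scale 1/epsilon to every bin count satisfies epsilon-differential privacy. A minimal sketch (the function name, bin scheme, and parameters are illustrative, not taken from the paper):

```python
import random

def dp_histogram(values, bin_edges, epsilon):
    """Epsilon-differentially-private histogram via the Laplace mechanism.

    Each sample falls into exactly one bin (sensitivity 1), so adding
    Laplace(0, 1/epsilon) noise to every bin count gives epsilon-DP.
    """
    # Count values per bin; bin i covers [edge_{i-1}, edge_i).
    counts = {i: 0 for i in range(len(bin_edges) + 1)}
    for v in values:
        idx = sum(v >= edge for edge in bin_edges)
        counts[idx] += 1

    noisy = {}
    for idx, c in counts.items():
        # A Laplace(0, 1/epsilon) draw is the difference of two
        # independent Exponential(epsilon) draws.
        noise = random.expovariate(epsilon) - random.expovariate(epsilon)
        noisy[idx] = c + noise
    return noisy
```

A split-selection routine in the spirit of the abstract would then score candidate thresholds using these noisy counts instead of the true ones, so the chosen split reveals only privatised information.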
Automatically Selecting Parameters for Graph-Based Clustering
Data streams present a number of challenges caused by changes in stream concepts over time. In this thesis we present a novel method for detecting concept drift within data streams by analysing geometric features of the RepStream clustering algorithm. Further, we present novel methods for automatically adjusting critical input parameters over time and for generating self-organising nearest-neighbour graphs, improving robustness and decreasing the need for domain-specific knowledge in the face of stream evolution.
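The abstract does not detail how RepStream builds or adapts its graphs, but the underlying structure is a k-nearest-neighbour graph. A minimal static sketch of that structure (a streaming version would insert and retire vertices incrementally; names and parameters here are illustrative):

```python
import math

def knn_graph(points, k):
    """Build a directed k-nearest-neighbour graph.

    Returns an adjacency list mapping each point's index to the indices
    of its k nearest neighbours by Euclidean distance.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    graph = {}
    for i, p in enumerate(points):
        # Sort the other points by distance and keep the k closest.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )
        graph[i] = others[:k]
    return graph
```

Self-organising variants would additionally tune k (one of the "critical input parameters" the thesis targets) as the stream's density and concepts evolve.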