2,131 research outputs found
Privacy-Preserving Gradient Boosting Decision Trees
The Gradient Boosting Decision Tree (GBDT) has become a popular machine
learning model for a wide range of tasks. In this paper, we study how to
improve the model accuracy of GBDT while preserving the strong guarantee of
differential privacy. Sensitivity and privacy budget are two key design
aspects for the effectiveness of differentially private models. Existing
solutions for GBDT with differential privacy suffer from significant accuracy
loss due to overly loose sensitivity bounds and ineffective privacy budget
allocations (especially across the different trees in the GBDT model). Loose
sensitivity bounds require more noise to achieve a fixed privacy level, and
ineffective privacy budget allocations worsen the accuracy loss, especially
when the number of trees is large. We therefore propose a new GBDT training
algorithm that achieves tighter sensitivity bounds and more effective noise
allocations. Specifically, by investigating the properties of gradients and
the contribution of each tree in a GBDT, we propose adaptive per-iteration
control of the training data's gradients, together with leaf node clipping,
in order to tighten the sensitivity bounds. Furthermore, we design a novel
boosting framework that allocates the privacy budget between trees so that
the accuracy loss is further reduced. Our experiments show that our approach
achieves significantly better model accuracy than other baselines.
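The clip-then-noise recipe the abstract alludes to can be illustrated with a
minimal sketch: per-example gradients are clipped to a bound, which in turn
bounds the sensitivity of a leaf value, and Laplace noise calibrated to that
bound is added. The names `g_clip`, `eps`, and `lam`, the leaf-value formula,
and the noise calibration below are illustrative assumptions, not the paper's
exact algorithm.

```python
import math
import random

def dp_leaf_value(gradients, g_clip, eps, lam=1.0):
    """Sketch of the generic clip-then-noise recipe for one GBDT leaf
    (illustrative only, not the paper's exact mechanism)."""
    # Clip each per-example gradient to [-g_clip, g_clip] so the
    # influence of any single example is bounded.
    clipped = [max(-g_clip, min(g_clip, g)) for g in gradients]
    # A standard regularized leaf value: -sum(g) / (n + lambda).
    leaf = -sum(clipped) / (len(clipped) + lam)
    # After clipping, removing one example changes the leaf value by at
    # most g_clip / (1 + lam); Laplace noise scale = sensitivity / eps.
    scale = (g_clip / (1.0 + lam)) / eps
    # Sample Laplace noise via the inverse CDF.
    u = random.random() - 0.5
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return leaf + noise
```

With a very large `eps` the noise is negligible and the clipped leaf value
dominates; shrinking `eps` trades accuracy for stronger privacy.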
Privet: A Privacy-Preserving Vertical Federated Learning Service for Gradient Boosted Decision Tables
Vertical federated learning (VFL) has recently emerged as an appealing
distributed paradigm that empowers multi-party collaboration in training
high-quality models over vertically partitioned datasets. Gradient boosting,
which builds an ensemble of weak learners (typically decision trees) to
achieve strong prediction performance, has been widely adopted in VFL.
Recently there has been growing interest in using decision tables as an
alternative weak learner in gradient boosting, owing to their simpler
structure, good interpretability, and promising performance. The literature
includes work on privacy-preserving VFL for gradient boosted decision trees,
but no prior work has addressed the emerging case of decision tables.
Training and inference on decision tables differ from those on generic
decision trees, let alone gradient boosting with decision tables in VFL. In
light of this, we design, implement, and evaluate Privet, the first system
framework enabling a privacy-preserving VFL service for gradient boosted
decision tables. Privet builds on lightweight cryptography and allows an
arbitrary number of participants holding vertically partitioned datasets to
securely train gradient boosted decision tables. Extensive experiments over
several real-world and synthetic datasets demonstrate that Privet achieves
promising performance, with utility comparable to plaintext centralized
learning.

Comment: Accepted in IEEE Transactions on Services Computing (TSC)
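A decision table (often called an oblivious tree) applies the same feature
test at every level of a depth-d tree, so prediction reduces to building a
d-bit index into 2^d leaf values. A minimal plaintext sketch of that lookup
follows; the cuts and leaf values are made up for illustration, and Privet's
actual protocol, which operates securely over partitioned data, is not shown
here.

```python
def predict_table(x, cuts, leaf_values):
    """Evaluate one decision table on feature vector x.

    cuts: list of (feature_index, threshold) pairs, one per level;
    leaf_values: 2**len(cuts) floats, indexed by the bit pattern of
    the test outcomes. Illustrative sketch only.
    """
    idx = 0
    for feat, thr in cuts:
        # Every level applies its single test; shift in one bit.
        idx = (idx << 1) | (1 if x[feat] > thr else 0)
    return leaf_values[idx]
```

Because each level's test is independent of the path taken, the whole table
evaluates with d comparisons and one array lookup, which is part of why
decision tables are attractive as weak learners.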
Scalable Privacy-Compliant Virality Prediction on Twitter
The digital town hall of Twitter has become a preferred medium of
communication for individuals and organizations across the globe. Some of
them reach audiences of millions, while others struggle to get noticed. Given
the impact of social media, one question remains more relevant than ever: how
to model the dynamics of attention on Twitter. Researchers around the world
turn to machine learning to predict the most influential tweets and authors,
navigating the volume, velocity, and variety of social big data with many
compromises. In this paper, we revisit content popularity prediction on
Twitter. We argue that strict alignment of the data acquisition, storage, and
analysis algorithms is necessary to avoid the common trade-offs between
scalability, accuracy, and privacy compliance. We propose a new framework for
the rapid acquisition of large-scale datasets, high-accuracy supervisory
signals, and multilanguage sentiment prediction, while respecting every
applicable privacy request. We then apply a novel gradient boosting framework
to achieve state-of-the-art results in virality ranking, even before
including tweets' visual or propagation features. Our Gradient Boosted
Regression Tree is the first to offer explainable, strong ranking performance
on benchmark datasets. Since the analysis focuses on features available
early, the model is immediately applicable to incoming tweets in 18
languages.

Comment: AffCon@AAAI-19 Best Paper Award; Presented at AAAI-19 W1: Affective
Content Analysis