2,131 research outputs found

    Privacy-Preserving Gradient Boosting Decision Trees

    The Gradient Boosting Decision Tree (GBDT) has been a popular machine learning model for various tasks in recent years. In this paper, we study how to improve the model accuracy of GBDT while preserving the strong guarantee of differential privacy. Sensitivity and privacy budget are two key design aspects for the effectiveness of differentially private models. Existing solutions for GBDT with differential privacy suffer from significant accuracy loss due to overly loose sensitivity bounds and ineffective privacy budget allocations (especially across different trees in the GBDT model). Loose sensitivity bounds require more noise to reach a fixed privacy level. Ineffective privacy budget allocations worsen the accuracy loss, especially when the number of trees is large. Therefore, we propose a new GBDT training algorithm that achieves tighter sensitivity bounds and more effective noise allocations. Specifically, by investigating the property of gradient and the contribution of each tree in GBDTs, we propose to adaptively control the gradients of training data for each iteration and to apply leaf node clipping in order to tighten the sensitivity bounds. Furthermore, we design a novel boosting framework to allocate the privacy budget between trees so that the accuracy loss can be further reduced. Our experiments show that our approach can achieve much better model accuracy than other baselines
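The link between sensitivity, noise, and per-tree budget described in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it uses the generic Laplace mechanism on a sum of clipped gradients and a plain even split of the budget across trees (sequential composition). The function names and the clipping bound `g_max` are hypothetical and chosen only for illustration.

```python
import numpy as np


def clip_gradients(gradients, g_max):
    """Clip each per-example gradient into [-g_max, g_max]."""
    return np.clip(gradients, -g_max, g_max)


def noisy_gradient_sum(gradients, g_max, epsilon, rng):
    """Laplace mechanism applied to the sum of clipped gradients.

    Adding or removing one record changes the clipped sum by at most
    g_max, so Laplace noise with scale g_max / epsilon suffices for this
    single query. A tighter bound (smaller g_max) means less noise.
    """
    clipped = clip_gradients(gradients, g_max)
    scale = g_max / epsilon
    return clipped.sum() + rng.laplace(loc=0.0, scale=scale)


def split_budget_evenly(total_epsilon, n_trees):
    """Baseline allocation (an assumption, not the paper's scheme):
    each tree receives total_epsilon / n_trees."""
    return [total_epsilon / n_trees] * n_trees
```

Under the even split, each tree receives `total_epsilon / n_trees`, so the noise scale per tree grows linearly with the number of trees. This is one way to see why budget allocation matters more for large ensembles.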

    Privet: A Privacy-Preserving Vertical Federated Learning Service for Gradient Boosted Decision Tables

    Vertical federated learning (VFL) has recently emerged as an appealing distributed paradigm empowering multi-party collaboration for training high-quality models over vertically partitioned datasets. Gradient boosting has been popularly adopted in VFL, which builds an ensemble of weak learners (typically decision trees) to achieve promising prediction performance. Recently there has been growing interest in using decision tables as an intriguing alternative weak learner in gradient boosting, due to their simpler structure, good interpretability, and promising performance. In the literature, there have been works on privacy-preserving VFL for gradient boosted decision trees, but no prior work has been devoted to the emerging case of decision tables. Training and inference on decision tables differ from the case of generic decision trees, not to mention gradient boosting with decision tables in VFL. In light of this, we design, implement, and evaluate Privet, the first system framework enabling privacy-preserving VFL service for gradient boosted decision tables. Privet delicately builds on lightweight cryptography and allows an arbitrary number of participants holding vertically partitioned datasets to securely train gradient boosted decision tables. Extensive experiments over several real-world datasets and synthetic datasets demonstrate that Privet achieves promising performance, with utility comparable to plaintext centralized learning. Comment: Accepted in IEEE Transactions on Services Computing (TSC
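To make the "simpler structure" of a decision table concrete, here is a plaintext sketch of inference. It assumes the common formulation in which a table applies the same d threshold tests to every input and the d outcome bits index one of 2**d leaf values. It does not reflect Privet's cryptographic protocol or its vertical partitioning, and all names and the `learning_rate` default are hypothetical.

```python
def decision_table_predict(x, features, thresholds, leaf_values):
    """Evaluate one decision table on a single input vector x.

    Assumed structure: d (feature, threshold) tests shared by all inputs;
    the d test outcomes form a bit string that indexes 2**d leaf values.
    """
    index = 0
    for feature, threshold in zip(features, thresholds):
        bit = 1 if x[feature] > threshold else 0
        index = (index << 1) | bit  # append this test's outcome bit
    return leaf_values[index]


def boosted_predict(x, tables, learning_rate=0.1):
    """Sum the outputs of all tables, scaled by a shared learning rate."""
    total = sum(decision_table_predict(x, *table) for table in tables)
    return learning_rate * total
```

Unlike a generic decision tree, every input evaluates the same tests regardless of earlier outcomes, so inference reduces to d comparisons and one table lookup.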

    Scalable Privacy-Compliant Virality Prediction on Twitter

    The digital town hall of Twitter has become a preferred medium of communication for individuals and organizations across the globe. Some of them reach audiences of millions, while others struggle to get noticed. Given the impact of social media, the question remains more relevant than ever: how to model the dynamics of attention on Twitter. Researchers around the world turn to machine learning to predict the most influential tweets and authors, navigating the volume, velocity, and variety of social big data, with many compromises. In this paper, we revisit content popularity prediction on Twitter. We argue that strict alignment of data acquisition, storage and analysis algorithms is necessary to avoid the common trade-offs between scalability, accuracy and privacy compliance. We propose a new framework for the rapid acquisition of large-scale datasets, a high-accuracy supervisory signal and multilanguage sentiment prediction while respecting every applicable privacy request. We then apply a novel gradient boosting framework to achieve state-of-the-art results in virality ranking, even before including a tweet's visual or propagation features. Our Gradient Boosted Regression Tree is the first to offer explainable, strong ranking performance on benchmark datasets. Since the analysis focused on features available early, the model is immediately applicable to incoming tweets in 18 languages. Comment: AffCon@AAAI-19 Best Paper Award; Presented at AAAI-19 W1: Affective Content Analysi