
110 research outputs found


Data Summarizations for Scalable, Robust and Privacy-Aware Learning in High Dimensions

Author: Manousakas Dionysios
Publication venue: University of Cambridge
Publication date: 30/10/2021

The advent of large-scale datasets has offered unprecedented amounts of information for building statistically powerful machines but, at the same time, has also introduced a remarkable computational challenge: how can we efficiently process massive data? This thesis presents a suite of data reduction methods that make learning algorithms scale to large datasets by extracting a succinct model-specific representation that summarizes the full data collection: a coreset. Our frameworks support, by design, datasets of arbitrary dimensionality and can be used for general-purpose Bayesian inference under real-world constraints, including privacy preservation and robustness to outliers, encompassing diverse uncertainty-aware data analysis tasks such as density estimation, classification and regression. We first motivate the need for novel data reduction techniques by developing a reidentification attack on coarsened representations of private behavioural data. Analysing longitudinal records of human mobility, we detect privacy-revealing structural patterns that remain preserved in reduced graph representations of individuals’ information of manageable size. These unique patterns enable mounting linkage attacks via structural similarity computations on longitudinal mobility traces, revealing an overlooked, yet existing, privacy threat. We then propose a scalable variational inference scheme for approximating posteriors on large datasets via learnable weighted pseudodata, termed pseudocoresets. We show that the use of pseudodata overcomes the constraints on minimum summary size for a given approximation quality that are imposed on all existing Bayesian coreset constructions due to data dimensionality.
Moreover, it allows us to develop a scheme for pseudocoreset-based summarization that satisfies the standard framework of differential privacy by construction; in this way, we can release reduced-size privacy-preserving representations for sensitive datasets that are amenable to arbitrary post-processing. Subsequently, we consider summarizations for large-scale Bayesian inference in scenarios where observed datapoints depart from the statistical assumptions of our model. Using robust divergences, we develop a method for constructing coresets resilient to model misspecification. Crucially, this method is able to automatically discard outliers from the generated data summaries. Thus we deliver robustified scalable representations for inference that are suitable for applications involving contaminated and unreliable data sources. We demonstrate the performance of the proposed summarization techniques on multiple parametric statistical models, and diverse simulated and real-world datasets, from music genre features to hospital readmission records, considering a wide range of data dimensionalities.

Nokia Bell Labs, Lundgren Fund, Darwin College, University of Cambridge Department of Computer Science & Technology, University of Cambridge

Apollo (Cambridge)

Parameter-free locally differentially private stochastic subgradient descent

Authors: Jun Kwang-Sung; Orabona Francesco
Publication date: 21/11/2019

https://arxiv.org/pdf/1911.09564.pdf
Published version

arXiv.org e-Print Archive

Boston University Institutional Repository (OpenBU)

A Privacy-Preserving Hybrid Federated Learning Framework for Financial Crime Detection

Authors: Dong Fan; Drew Steve; Hong Junyuan; Xue Liangjie; Zhang Haobo; Zhou Jiayu
Publication date: 18/04/2023

The past decade has witnessed a surge in financial crimes across the public and private sectors, with scams costing financial institutions an average of $102m in 2022. Developing a mechanism for battling financial crimes is an impending task that requires in-depth collaboration among multiple institutions, yet such collaboration imposes significant technical challenges due to the privacy and security requirements of distributed financial data. For example, consider modern payment network systems, which can generate millions of transactions per day across a large number of global institutions. Training a detection model for fraudulent transactions requires not only secured transactions but also the private account activities of those involved in each transaction from the corresponding bank systems. The distributed nature of both samples and features prevents most existing learning systems from being directly adopted to handle the data mining task. In this paper, we collectively address these challenges by proposing a hybrid federated learning system that offers secure and privacy-aware learning and inference for financial crime detection. We conduct extensive empirical studies to evaluate the proposed framework's detection performance and privacy-protection capability, as well as its robustness against common malicious attacks on collaborative learning. We release our source code at https://github.com/illidanlab/HyFL.

Comment: PETs prize challenge version

arXiv.org e-Print Archive

Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite-Sum Structure

Authors: Bietti Alberto; Mairal Julien
Publication date: 23/01/2017

Stochastic optimization algorithms with variance reduction have proven successful for minimizing large finite sums of functions. Unfortunately, these techniques are unable to deal with stochastic perturbations of input data, induced for example by data augmentation. In such cases, the objective is no longer a finite sum, and the main candidate for optimization is the stochastic gradient descent method (SGD). In this paper, we introduce a variance reduction approach for these settings when the objective is composite and strongly convex. The convergence rate outperforms SGD with a typically much smaller constant factor, which depends on the variance of gradient estimates only due to perturbations on a single example.

Comment: Advances in Neural Information Processing Systems (NIPS), Dec 2017, Long Beach, CA, United States

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

HAL-Rennes 1

Learning from Data with Heterogeneous Noise using SGD

Authors: Chaudhuri Kamalika; Sarwate Anand D.; Song Shuang
Publication date: 17/12/2014

We consider learning from data of variable quality that may be obtained from different heterogeneous sources. Addressing learning from heterogeneous data in its full generality is a challenging problem. In this paper, we instead adopt a model in which data is observed through heterogeneous noise, where the noise level reflects the quality of the data source. We study how to use stochastic gradient algorithms to learn in this model. Our study is motivated by two concrete examples where this problem arises naturally: learning with local differential privacy based on data from multiple sources with different privacy requirements, and learning from data with labels of variable quality. The main contribution of this paper is to identify how heterogeneous noise impacts performance. We show that, given two datasets with heterogeneous noise, the order in which to use them in standard SGD depends on the learning rate. We propose a method for changing the learning rate as a function of the heterogeneity, and prove new regret bounds for our method in two cases of interest. Experiments on real data show that, when the noise level is low to moderate, our method performs better than both using a single learning rate and using only the less noisy of the two datasets.

arXiv.org e-Print Archive

PubMed Central

eScholarship - University of California

Adversarial-Playground: A Visualization Suite Showing How Adversarial Examples Fool Deep Learning

Authors: Norton Andrew P.; Qi Yanjun
Publication date: 01/08/2017

Recent studies have shown that attackers can force deep learning models to misclassify so-called "adversarial examples": maliciously generated images formed by making imperceptible modifications to pixel values. With growing interest in deep learning for security applications, it is important for security experts and users of machine learning to recognize how learning systems may be attacked. Due to the complex nature of deep learning, it is challenging to understand how deep models can be fooled by adversarial examples. Thus, we present a web-based visualization tool, Adversarial-Playground, to demonstrate the efficacy of common adversarial methods against a convolutional neural network (CNN) system. Adversarial-Playground is educational, modular and interactive. (1) It enables non-experts to compare examples visually and to understand why an adversarial example can fool a CNN-based image classifier. (2) It can help security experts explore more vulnerabilities of deep learning as a software module. (3) Building an interactive visualization is challenging in this domain due to the large feature space of image classification (generating adversarial examples is slow in general and visualizing images is costly). Through multiple novel design choices, our tool can provide fast and accurate responses to user requests. Empirically, we find that our client-server division strategy reduced the response time by an average of 1.5 seconds per sample. Our other innovation, a faster variant of the JSMA evasion algorithm, empirically performed twice as fast as JSMA while maintaining a comparable evasion rate. Project source code and data from our experiments are available at: https://github.com/QData/AdversarialDNN-Playground

Comment: 5 pages. {I.2.6}{Artificial Intelligence} ; {K.6.5}{Management of Computing and Information Systems}{Security and Protection}. arXiv admin note: substantial text overlap with arXiv:1706.0176

arXiv.org e-Print Archive

Crossref