10 research outputs found

    Differential Privacy Guarantees for Stochastic Gradient Langevin Dynamics

    Get PDF
    We analyse the privacy leakage of noisy stochastic gradient descent by modeling Rényi divergence dynamics with Langevin diffusions. Inspired by recent work on non-stochastic algorithms, we derive similar desirable properties in the stochastic setting. In particular, we prove that the privacy loss converges exponentially fast for smooth and strongly convex objectives under constant step size, a significant improvement over previous DP-SGD analyses. We also extend our analysis to arbitrary sequences of varying step sizes and derive new utility bounds. Last, we propose an implementation, and our experiments show the practical utility of our approach compared to classical DP-SGD libraries.
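
    A minimal sketch of the kind of noisy gradient method such an analysis covers, run on a smooth and strongly convex objective (ridge-regularized least squares) with a constant step size. The clipping bound, noise scale and all names below are illustrative choices, not the paper's algorithm or notation.

        import numpy as np

        def noisy_gd(X, y, lam=0.1, step=0.05, sigma=1.0, clip=1.0, n_steps=200, seed=0):
            """Full-batch noisy gradient descent: clip the gradient, add Gaussian noise, step."""
            rng = np.random.default_rng(seed)
            n, d = X.shape
            theta = np.zeros(d)
            for _ in range(n_steps):
                grad = X.T @ (X @ theta - y) / n + lam * theta       # smooth, strongly convex objective
                grad *= min(1.0, clip / (np.linalg.norm(grad) + 1e-12))  # bound per-step sensitivity
                theta -= step * (grad + sigma * rng.normal(size=d))  # Langevin-style Gaussian perturbation
            return theta

        # The abstract's claim concerns how the Rényi-divergence privacy loss of the
        # distribution of theta evolves with n_steps: it converges rather than grows.
        rng = np.random.default_rng(1)
        X, y = rng.normal(size=(100, 5)), rng.normal(size=100)
        theta_private = noisy_gd(X, y)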

    ARIANN: Low-Interaction Privacy-Preserving Deep Learning via Function Secret Sharing

    Get PDF
    We propose ARIANN, a low-interaction framework to perform private training and inference of standard deep neural networks on sensitive data. This framework implements semi-honest 2-party computation and leverages function secret sharing, a recent cryptographic protocol that uses only lightweight primitives to achieve an efficient online phase with a single message of the size of the inputs, for operations such as comparison and multiplication, which are building blocks of neural networks. Built on top of PyTorch, it offers a wide range of functions including ReLU, MaxPool and BatchNorm, and supports models such as AlexNet or ResNet18. We report experimental results for inference and training over distant servers. Last, we propose an extension to support n-party private federated learning.
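
    As a rough illustration of the 2-party building blocks such a framework rests on, the sketch below shows plain additive secret sharing of a PyTorch tensor over a finite ring: linear operations can be carried out share-wise, while non-linear operations such as comparison are the ones that require function secret sharing keys (not shown here). All names are illustrative and this is not the ARIANN or PySyft API.

        import torch

        RING = 2 ** 32  # share arithmetic modulo a power of two (illustrative choice)

        def share(x_int):
            """Split an integer tensor into two additive shares; each share alone reveals nothing."""
            s0 = torch.randint(0, RING, x_int.shape, dtype=torch.int64)
            s1 = (x_int - s0) % RING
            return s0, s1

        def reconstruct(s0, s1):
            return (s0 + s1) % RING

        # Linear layers work share-wise: each party multiplies its own share by the public weights.
        x = torch.randint(0, 100, (4,), dtype=torch.int64)
        x0, x1 = share(x)
        w = 3  # public scalar weight for illustration
        y = reconstruct((w * x0) % RING, (w * x1) % RING)
        assert torch.equal(y, (w * x) % RING)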

    Cryptography for Privacy-Preserving Machine Learning

    No full text
    The ever growing use of machine learning (ML), motivated by the possibilities it brings to a large number of sectors, is increasingly raising questions because of the sensitive nature of the data that must be used and the lack of transparency on the way these data are collected, combined or shared. Therefore, a number of methods are being developed to reduce its impact on our privacy and make its use more acceptable, especially in areas such as healthcare where its potential is still largely under-exploited. This thesis explores different methods from the fields of cryptography and security, and applies them to machine learning in order to establish new confidentiality guarantees for the data used and the ML models. Our first contribution is the development of a technical foundation to facilitate experimentation with new approaches, through an open-source library named PySyft. We propose a modular architecture that allows one to pick the confidentiality blocks necessary for one's study, or to develop and easily integrate new blocks. This library is reused in all the implementations proposed in this thesis. Our second contribution consists in highlighting the vulnerability of ML models by proposing an attack that exploits a trained model to reveal confidential attributes of an individual. This attack could, for example, subvert a model that recognizes a person's sport from an image to detect the person's racial origins. We propose solutions to limit the impact of this attack. In a third step, we focus on cryptographic protocols that allow us to perform computations on encrypted data. A first study proposes a functional encryption protocol that allows one to make predictions with a small ML model over encrypted data and to make only the predictions public. A second study focuses on optimizing a function secret sharing protocol, which allows an ML model to be trained or evaluated on data privately, i.e. without revealing either the model or the data to anyone. This protocol provides sufficient performance to use models that have practical utility in non-trivial tasks such as pathology detection in lung X-rays. Our final contribution is in differential privacy, a technique that limits the vulnerability of ML models, and thus the exposure of the data used in training, by introducing a controlled perturbation. We propose a new protocol and show that it makes it possible to train a smooth and strongly convex model with a bounded privacy loss regardless of the number of calls to sensitive data during training.
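
    To make the second contribution concrete, here is a minimal, hypothetical sketch of the kind of attribute-inference attack described above: an adversary queries a trained model (here, its output probabilities) and fits a secondary classifier that predicts a sensitive attribute from those outputs. The data, models and attribute are synthetic placeholders, not the thesis's actual setup.

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.neural_network import MLPClassifier

        rng = np.random.default_rng(0)

        # Stand-in data: X are input features, y is the public task label (e.g. the sport),
        # s is a sensitive attribute the target model was never trained to predict.
        X = rng.normal(size=(2000, 20))
        s = (X[:, 0] + rng.normal(scale=0.5, size=2000) > 0).astype(int)   # sensitive attribute
        y = ((X[:, 1] + 0.5 * X[:, 0]) > 0).astype(int)                    # public label, correlated with s

        target = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(X, y)

        # The attacker only sees the target model's output probabilities on auxiliary data
        # for which it knows the sensitive attribute, and trains an inference model on them.
        probs = target.predict_proba(X)
        attack = LogisticRegression().fit(probs[:1500], s[:1500])
        print("attribute inference accuracy:", attack.score(probs[1500:], s[1500:]))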

    Annotation of Clinical Entities Using Large Language Models

    No full text
    In the clinical domain and other specialized domains, data are scarce because of their confidential nature. This lack of data is a major problem when fine-tuning language models. Meanwhile, very large language models (LLMs) show promising performance in the medical domain. However, they cannot be used directly within the infrastructure of healthcare institutions for data confidentiality reasons. We explore an approach that annotates training data with LLMs in order to train smaller models better suited to our setting. This method yields promising results for information extraction tasks.
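
    A minimal sketch of the annotation pipeline described above, under the assumption that some LLM endpoint can be prompted to return entity spans; llm_annotate is a hypothetical stand-in for that call (stubbed so the sketch runs offline), and the smaller downstream model is only indicated, not trained here.

        from dataclasses import dataclass

        @dataclass
        class Span:
            start: int
            end: int
            label: str

        def llm_annotate(text: str) -> list[Span]:
            """Hypothetical call to an LLM prompted to return clinical entity spans (canned answer here)."""
            i = text.find("aspirin")
            return [Span(i, i + len("aspirin"), "DRUG")]

        def to_bio(text: str, spans: list[Span]) -> list[tuple[str, str]]:
            """Convert character spans into (token, BIO-tag) pairs for training a smaller tagger."""
            pairs, pos = [], 0
            for tok in text.split():
                start = text.index(tok, pos); end = start + len(tok); pos = end
                tag = "O"
                for sp in spans:
                    if start >= sp.start and end <= sp.end:
                        tag = ("B-" if start == sp.start else "I-") + sp.label
                pairs.append((tok, tag))
            return pairs

        text = "Patient was given aspirin after surgery"
        print(to_bio(text, llm_annotate(text)))
        # The resulting silver-standard corpus can then fine-tune a compact token-classification
        # model that stays inside the hospital infrastructure.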

    Partially Encrypted Machine Learning using Functional Encryption

    No full text
    Machine learning on encrypted data has received a lot of attention thanks to recent breakthroughs in homomorphic encryption and secure multi-party computation. It allows outsourcing computation to untrusted servers without sacrificing the privacy of sensitive data. We propose a practical framework to perform partially encrypted and privacy-preserving predictions which combines adversarial training and functional encryption. We first present a new functional encryption scheme to efficiently compute quadratic functions so that the data owner controls what can be computed but is not involved in the calculation: it provides a decryption key which allows one to learn a specific function evaluation of some encrypted data. We then show how to use it in machine learning to partially encrypt neural networks with quadratic activation functions at evaluation time, and we provide a thorough analysis of the information leaks based on indistinguishability of data items of the same label. Last, since most encryption schemes cannot deal with the final thresholding operation used for classification, we propose a training method to prevent selected sensitive features from leaking, which adversarially optimizes the network against an adversary trying to identify these features. This is interesting for several existing works using partially encrypted machine learning, as it comes with little reduction in the model's accuracy and significantly improves data privacy.
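
    As a plaintext illustration of what a quadratic functional decryption key reveals, the sketch below evaluates the class of functions f(x) = x^T Q x; a first layer with quadratic activations fits exactly this form. No cryptography is performed here, and the shapes and names are illustrative assumptions rather than the paper's scheme.

        import numpy as np

        rng = np.random.default_rng(0)

        def quadratic_eval(Q, x):
            """The only value a functional decryption key for Q would reveal about x: x^T Q x."""
            return x @ Q @ x

        # A first layer with quadratic activation, (w_i . x)^2 per hidden unit, is a quadratic
        # form: (w_i . x)^2 = x^T (w_i w_i^T) x.
        d, h = 8, 4
        W = rng.normal(size=(h, d))
        x = rng.normal(size=d)

        per_unit = (W @ x) ** 2                                            # clear-text network layer
        via_forms = np.array([quadratic_eval(np.outer(w, w), x) for w in W])
        assert np.allclose(per_unit, via_forms)
        # The rest of the network (including the final thresholding) runs on these plaintext
        # activations, which is why the adversarial training step in the abstract is needed
        # to limit what they leak about sensitive features.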

    Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims

    Get PDF
    With the recent wave of progress in artificial intelligence (AI) has come a growing awareness of the large-scale impacts of AI systems, and recognition that existing regulations and norms in industry and academia are insufficient to ensure responsible AI development. In order for AI developers to earn trust from system users, customers, civil society, governments, and other stakeholders that they are building AI responsibly, they will need to make verifiable claims to which they can be held accountable. Those outside of a given organization also need effective means of scrutinizing such claims. This report suggests various steps that different stakeholders can take to improve the verifiability of claims made about AI systems and their associated development processes, with a focus on providing evidence about the safety, security, fairness, and privacy protection of AI systems. We analyze ten mechanisms for this purpose, spanning institutions, software, and hardware, and make recommendations aimed at implementing, exploring, or improving those mechanisms.