794 research outputs found
Risk mitigation in algorithmic accountability: The role of machine learning copies
Machine learning plays an increasingly important role in our society and economy and is already having an impact on our daily life in many different ways. From several perspectives, machine learning is seen as the new engine of productivity and economic growth. It can increase the business efficiency and improve any decision-making process, and of course, spawn the creation of new products and services by using complex machine learning algorithms. In this scenario, the lack of actionable accountability-related guidance is potentially the single most important challenge facing the machine learning community. Machine learning systems are often composed of many parts and ingredients, mixing third party components or software-as-a-service APIs, among others. In this paper we study the role of copies for risk mitigation in such machine learning systems. Formally, a copy can be regarded as an approximated projection operator of a model into a target model hypothesis set. Under the conceptual framework of actionable accountability, we explore the use of copies as a viable alternative in circumstances where models cannot be re-trained, nor enhanced by means of a wrapper. We use a real residential mortgage default dataset as a use case to illustrate the feasibility of this approach
Towards Improved Homomorphic Encryption for Privacy-Preserving Deep Learning
Mención Internacional en el título de doctorDeep Learning (DL) has supposed a remarkable transformation for many fields, heralded
by some as a new technological revolution. The advent of large scale models has increased
the demands for data and computing platforms, for which cloud computing has become
the go-to solution. However, the permeability of DL and cloud computing are reduced
in privacy-enforcing areas that deal with sensitive data. These areas imperatively call for
privacy-enhancing technologies that enable responsible, ethical, and privacy-compliant
use of data in potentially hostile environments.
To this end, the cryptography community has addressed these concerns with what
is known as Privacy-Preserving Computation Techniques (PPCTs), a set of tools that
enable privacy-enhancing protocols where cleartext access to information is no longer
tenable. Of these techniques, Homomorphic Encryption (HE) stands out for its ability
to perform operations over encrypted data without compromising data confidentiality or
privacy. However, despite its promise, HE is still a relatively nascent solution with efficiency
and usability limitations. Improving the efficiency of HE has been a longstanding
challenge in the field of cryptography, and with improvements, the complexity of the
techniques has increased, especially for non-experts.
In this thesis, we address the problem of the complexity of HE when applied to DL.
We begin by systematizing existing knowledge in the field through an in-depth analysis
of state-of-the-art for privacy-preserving deep learning, identifying key trends, research
gaps, and issues associated with current approaches. One such identified gap lies in the
necessity for using vectorized algorithms with Packed Homomorphic Encryption (PaHE),
a state-of-the-art technique to reduce the overhead of HE in complex areas. This thesis
comprehensively analyzes existing algorithms and proposes new ones for using DL with
PaHE, presenting a formal analysis and usage guidelines for their implementation.
Parameter selection of HE schemes is another recurring challenge in the literature,
given that it plays a critical role in determining not only the security of the instantiation
but also the precision, performance, and degree of security of the scheme. To address
this challenge, this thesis proposes a novel system combining fuzzy logic with linear
programming tasks to produce secure parametrizations based on high-level user input
arguments without requiring low-level knowledge of the underlying primitives.
Finally, this thesis describes HEFactory, a symbolic execution compiler designed to
streamline the process of producing HE code and integrating it with Python. HEFactory
implements the previous proposals presented in this thesis in an easy-to-use tool. It provides
a unique architecture that layers the challenges associated with HE and produces
simplified operations interpretable by low-level HE libraries. HEFactory significantly reduces
the overall complexity to code DL applications using HE, resulting in an 80% length
reduction from expert-written code while maintaining equivalent accuracy and efficiency.El aprendizaje profundo ha supuesto una notable transformación para muchos campos
que algunos han calificado como una nueva revolución tecnológica. La aparición de modelos
masivos ha aumentado la demanda de datos y plataformas informáticas, para lo cual,
la computación en la nube se ha convertido en la solución a la que recurrir. Sin embargo,
la permeabilidad del aprendizaje profundo y la computación en la nube se reduce en los
ámbitos de la privacidad que manejan con datos sensibles. Estas áreas exigen imperativamente
el uso de tecnologías de mejora de la privacidad que permitan un uso responsable,
ético y respetuoso con la privacidad de los datos en entornos potencialmente hostiles.
Con este fin, la comunidad criptográfica ha abordado estas preocupaciones con las
denominadas técnicas de la preservación de la privacidad en el cómputo, un conjunto de
herramientas que permiten protocolos de mejora de la privacidad donde el acceso a la información
en texto claro ya no es sostenible. Entre estas técnicas, el cifrado homomórfico
destaca por su capacidad para realizar operaciones sobre datos cifrados sin comprometer
la confidencialidad o privacidad de la información. Sin embargo, a pesar de lo prometedor
de esta técnica, sigue siendo una solución relativamente incipiente con limitaciones
de eficiencia y usabilidad. La mejora de la eficiencia del cifrado homomórfico en la
criptografía ha sido todo un reto, y, con las mejoras, la complejidad de las técnicas ha
aumentado, especialmente para los usuarios no expertos.
En esta tesis, abordamos el problema de la complejidad del cifrado homomórfico
cuando se aplica al aprendizaje profundo. Comenzamos sistematizando el conocimiento
existente en el campo a través de un análisis exhaustivo del estado del arte para el aprendizaje
profundo que preserva la privacidad, identificando las tendencias clave, las lagunas
de investigación y los problemas asociados con los enfoques actuales. Una de las
lagunas identificadas radica en el uso de algoritmos vectorizados con cifrado homomórfico
empaquetado, que es una técnica del estado del arte que reduce el coste del cifrado
homomórfico en áreas complejas. Esta tesis analiza exhaustivamente los algoritmos existentes
y propone nuevos algoritmos para el uso de aprendizaje profundo utilizando cifrado
homomórfico empaquetado, presentando un análisis formal y unas pautas de uso para su
implementación.
La selección de parámetros de los esquemas del cifrado homomórfico es otro reto recurrente
en la literatura, dado que juega un papel crítico a la hora de determinar no sólo la
seguridad de la instanciación, sino también la precisión, el rendimiento y el grado de seguridad del esquema. Para abordar este reto, esta tesis propone un sistema innovador que
combina la lógica difusa con tareas de programación lineal para producir parametrizaciones
seguras basadas en argumentos de entrada de alto nivel sin requerir conocimientos
de bajo nivel de las primitivas subyacentes.
Por último, esta tesis propone HEFactory, un compilador de ejecución simbólica diseñado
para agilizar el proceso de producción de código de cifrado homomórfico e integrarlo
con Python. HEFactory es la culminación de las propuestas presentadas en esta
tesis, proporcionando una arquitectura única que estratifica los retos asociados con el
cifrado homomórfico, produciendo operaciones simplificadas que pueden ser interpretadas
por bibliotecas de bajo nivel. Este enfoque permite a HEFactory reducir significativamente
la longitud total del código, lo que supone una reducción del 80% en la
complejidad de programación de aplicaciones de aprendizaje profundo que usan cifrado
homomórfico en comparación con el código escrito por expertos, manteniendo una precisión
equivalente.Programa de Doctorado en Ciencia y Tecnología Informática por la Universidad Carlos III de MadridPresidenta: María Isabel González Vasco.- Secretario: David Arroyo Guardeño.- Vocal: Antonis Michala
Recommended from our members
Towards Interpretability and Robustness of Machine Learning Models
Modern machine learning models can be difficult to probe and understand after they have been trained. This is a major problem for the field, with consequences for trustworthiness, diagnostics, debugging, robustness, and a range of other engineering and human interaction issues surrounding the deployment of a model. Another problem of modern machine learning models is their vulnerability to small adversarial perturbations to the input, which incurs a security risk when they are applied to critical areas.In this thesis, we develop systematic and efficient tools for interpreting machine learning models and evaluating their adversarial robustness. Part I focuses on model interpretation. We derive an efficient feature scoring method by exploiting the graph structure in data. We also develop a learning-based method under an information-based framework. As an attempt to leverage prior knowledge about what constitutes a satisfying interpretation in a given domain, we propose a systematic approach to exploiting syntactic constituency structure by leveraging a parse tree for interpretation of models in the setting of linguistic data. Part II focuses on the evaluation of adversarial robustness. We first propose a probabilistic framework for generating adversarial examples on discrete data, and develop two algorithms to implement it. We also introduce a novel attack method in the setting where the attacker has access to model decisions alone. We investigate the robustness of various machine learning models and existing defense mechanisms under the proposed attack method. In Part III, we build a connection between the two fields by developing a method for detecting adversarial examples via tools in model interpretation
Novelty, distillation, and federation in machine learning for medical imaging
The practical application of deep learning methods in the medical domain
has many challenges. Pathologies are diverse and very few examples may
be available for rare cases. Where data is collected it may lie in multiple
institutions and cannot be pooled for practical and ethical reasons. Deep
learning is powerful for image segmentation problems but ultimately its output
must be interpretable at the patient level. Although clearly not an exhaustive
list, these are the three problems tackled in this thesis.
To address the rarity of pathology I investigate novelty detection algorithms
to find outliers from normal anatomy. The problem is structured as first finding
a low-dimension embedding and then detecting outliers in that embedding
space. I evaluate for speed and accuracy several unsupervised embedding and
outlier detection methods. Data consist of Magnetic Resonance Imaging (MRI)
for interstitial lung disease for which healthy and pathological patches are
available; only the healthy patches are used in model training.
I then explore the clinical interpretability of a model output. I take related
work by the Canon team — a model providing voxel-level detection of acute
ischemic stroke signs — and deliver the Alberta Stroke Programme Early CT
Score (ASPECTS, a measure of stroke severity). The data are acute head
computed tomography volumes of suspected stroke patients. I convert from
the voxel level to the brain region level and then to the patient level through a
series of rules. Due to the real world clinical complexity of the problem, there
are at each level — voxel, region and patient — multiple sources of “truth”; I
evaluate my results appropriately against these truths.
Finally, federated learning is used to train a model on data that are divided
between multiple institutions. I introduce a novel evolution of this algorithm
— dubbed “soft federated learning” — that avoids the central coordinating
authority, and takes into account domain shift (covariate shift) and dataset
size. I first demonstrate the key properties of these two algorithms on a series
of MNIST (handwritten digits) toy problems. Then I apply the methods to the
BraTS medical dataset, which contains MRI brain glioma scans from multiple
institutions, to compare these algorithms in a realistic setting
Property Inference Attacks on Convolutional Neural Networks:Influence and Implications of Target Model's Complexity
Machine learning models' goal is to make correct predictions for specific tasks by learning important properties and patterns from data. By doing so, there is a chance that the model learns properties that are unrelated to its primary task. Property Inference Attacks exploit this and aim to infer from a given model (i.e., the target model) properties about the training dataset seemingly unrelated to the model's primary goal. If the training data is sensitive, such an attack could lead to privacy leakage. This paper investigates the influence of the target model's complexity on the accuracy of this type of attack, focusing on convolutional neural network classifiers. We perform attacks on models that are trained on facial images to predict whether someone's mouth is open. Our attacks' goal is to infer whether the training dataset is balanced gender-wise. Our findings reveal that the risk of a privacy breach is present independently of the target model's complexity: for all studied architectures, the attack's accuracy is clearly over the baseline. We discuss the implication of the property inference on personal data in the light of Data Protection Regulations and Guidelines
LIPIcs, Volume 251, ITCS 2023, Complete Volume
LIPIcs, Volume 251, ITCS 2023, Complete Volum
Vers une arithmétique efficace pour le chiffrement homomorphe basé sur le Ring-LWE
Fully homomorphic encryption is a kind of encryption offering the ability to manipulate encrypted data directly through their ciphertexts. In this way it is possible to process sensitive data without having to decrypt them beforehand, ensuring therefore the datas' confidentiality. At the numeric and cloud computing era this kind of encryption has the potential to considerably enhance privacy protection. However, because of its recent discovery by Gentry in 2009, we do not have enough hindsight about it yet. Therefore several uncertainties remain, in particular concerning its security and efficiency in practice, and should be clarified before an eventual widespread use. This thesis deals with this issue and focus on performance enhancement of this kind of encryption in practice. In this perspective we have been interested in the optimization of the arithmetic used by these schemes, either the arithmetic underlying the Ring Learning With Errors problem on which the security of these schemes is based on, or the arithmetic specific to the computations required by the procedures of some of these schemes. We have also considered the optimization of the computations required by some specific applications of homomorphic encryption, and in particular for the classification of private data, and we propose methods and innovative technics in order to perform these computations efficiently. We illustrate the efficiency of our different methods through different software implementations and comparisons to the related art.Le chiffrement totalement homomorphe est un type de chiffrement qui permet de manipuler directement des données chiffrées. De cette manière, il est possible de traiter des données sensibles sans avoir à les déchiffrer au préalable, permettant ainsi de préserver la confidentialité des données traitées. À l'époque du numérique à outrance et du "cloud computing" ce genre de chiffrement a le potentiel pour impacter considérablement la protection de la vie privée. Cependant, du fait de sa découverte récente par Gentry en 2009, nous manquons encore de recul à son propos. C'est pourquoi de nombreuses incertitudes demeurent, notamment concernant sa sécurité et son efficacité en pratique, et devront être éclaircies avant une éventuelle utilisation à large échelle.Cette thèse s'inscrit dans cette problématique et se concentre sur l'amélioration des performances de ce genre de chiffrement en pratique. Pour cela nous nous sommes intéressés à l'optimisation de l'arithmétique utilisée par ces schémas, qu'elle soit sous-jacente au problème du "Ring-Learning With Errors" sur lequel la sécurité des schémas considérés est basée, ou bien spécifique aux procédures de calculs requises par certains de ces schémas. Nous considérons également l'optimisation des calculs nécessaires à certaines applications possibles du chiffrement homomorphe, et en particulier la classification de données privées, de sorte à proposer des techniques de calculs innovantes ainsi que des méthodes pour effectuer ces calculs de manière efficace. L'efficacité de nos différentes méthodes est illustrée à travers des implémentations logicielles et des comparaisons aux techniques de l'état de l'art
- …