841 research outputs found

    Secure Computation Protocols for Privacy-Preserving Machine Learning

    Get PDF
    Machine Learning (ML) profitiert erheblich von der Verfügbarkeit großer Mengen an Trainingsdaten, sowohl im Bezug auf die Anzahl an Datenpunkten, als auch auf die Anzahl an Features pro Datenpunkt. Es ist allerdings oft weder möglich, noch gewollt, mehr Daten unter zentraler Kontrolle zu aggregieren. Multi-Party-Computation (MPC)-Protokolle stellen eine Lösung dieses Dilemmas in Aussicht, indem sie es mehreren Parteien erlauben, ML-Modelle auf der Gesamtheit ihrer Daten zu trainieren, ohne die Eingabedaten preiszugeben. Generische MPC-Ansätze bringen allerdings erheblichen Mehraufwand in der Kommunikations- und Laufzeitkomplexität mit sich, wodurch sie sich nur beschränkt für den Einsatz in der Praxis eignen. Das Ziel dieser Arbeit ist es, Privatsphäreerhaltendes Machine Learning mittels MPC praxistauglich zu machen. Zuerst fokussieren wir uns auf zwei Anwendungen, lineare Regression und Klassifikation von Dokumenten. Hier zeigen wir, dass sich der Kommunikations- und Rechenaufwand erheblich reduzieren lässt, indem die aufwändigsten Teile der Berechnung durch Sub-Protokolle ersetzt werden, welche auf die Zusammensetzung der Parteien, die Verteilung der Daten, und die Zahlendarstellung zugeschnitten sind. Insbesondere das Ausnutzen dünnbesetzter Datenrepräsentationen kann die Effizienz der Protokolle deutlich verbessern. Diese Beobachtung verallgemeinern wir anschließend durch die Entwicklung einer Datenstruktur für solch dünnbesetzte Daten, sowie dazugehöriger Zugriffsprotokolle. Aufbauend auf dieser Datenstruktur implementieren wir verschiedene Operationen der Linearen Algebra, welche in einer Vielzahl von Anwendungen genutzt werden. Insgesamt zeigt die vorliegende Arbeit, dass MPC ein vielversprechendes Werkzeug auf dem Weg zu Privatsphäre-erhaltendem Machine Learning ist, und die von uns entwickelten Protokolle stellen einen wesentlichen Schritt in diese Richtung dar.Machine learning (ML) greatly benefits from the availability of large amounts of training data, both in terms of the number of samples, and the number of features per sample. However, aggregating more data under centralized control is not always possible, nor desirable, due to security and privacy concerns, regulation, or competition. Secure multi-party computation (MPC) protocols promise a solution to this dilemma, allowing multiple parties to train ML models on their joint datasets while provably preserving the confidentiality of the inputs. However, generic approaches to MPC result in large computation and communication overheads, which limits the applicability in practice. The goal of this thesis is to make privacy-preserving machine learning with secure computation practical. First, we focus on two high-level applications, linear regression and document classification. We show that communication and computation overhead can be greatly reduced by identifying the costliest parts of the computation, and replacing them with sub-protocols that are tailored to the number and arrangement of parties, the data distribution, and the number representation used. One of our main findings is that exploiting sparsity in the data representation enables considerable efficiency improvements. We go on to generalize this observation, and implement a low-level data structure for sparse data, with corresponding secure access protocols. On top of this data structure, we develop several linear algebra algorithms that can be used in a wide range of applications. Finally, we turn to improving a cryptographic primitive named vector-OLE, for which we propose a novel protocol that helps speed up a wide range of secure computation tasks, within private machine learning and beyond. Overall, our work shows that MPC indeed offers a promising avenue towards practical privacy-preserving machine learning, and the protocols we developed constitute a substantial step in that direction

    Secure Outsourced Computation on Encrypted Data

    Get PDF
    Homomorphic encryption (HE) is a promising cryptographic technique that supports computations on encrypted data without requiring decryption first. This ability allows sensitive data, such as genomic, financial, or location data, to be outsourced for evaluation in a resourceful third-party such as the cloud without compromising data privacy. Basic homomorphic primitives support addition and multiplication on ciphertexts. These primitives can be utilized to represent essential computations, such as logic gates, which subsequently can support more complex functions. We propose the construction of efficient cryptographic protocols as building blocks (e.g., equality, comparison, and counting) that are commonly used in data analytics and machine learning. We explore the use of these building blocks in two privacy-preserving applications. One application leverages our secure prefix matching algorithm, which builds on top of the equality operation, to process geospatial queries on encrypted locations. The other applies our secure comparison protocol to perform conditional branching in private evaluation of decision trees. There are many outsourced computations that require joint evaluation on private data owned by multiple parties. For example, Genome-Wide Association Study (GWAS) is becoming feasible because of the recent advances of genome sequencing technology. Due to the sensitivity of genomic data, this data is encrypted using different keys possessed by different data owners. Computing on ciphertexts encrypted with multiple keys is a non-trivial task. Current solutions often require a joint key setup before any computation such as in threshold HE or incur large ciphertext size (at best, grows linearly in the number of involved keys) such as in multi-key HE. We propose a hybrid approach that combines the advantages of threshold and multi-key HE to support computations on ciphertexts encrypted with different keys while vastly reducing ciphertext size. Moreover, we propose the SparkFHE framework to support large-scale secure data analytics in the Cloud. SparkFHE integrates Apache Spark with Fully HE to support secure distributed data analytics and machine learning and make two novel contributions: (1) enabling Spark to perform efficient computation on large datasets while preserving user privacy, and (2) accelerating intensive homomorphic computation through parallelization of tasks across clusters of computing nodes. To our best knowledge, SparkFHE is the first addressing these two needs simultaneously

    VASA: Vector AES Instructions for Security Applications

    Get PDF
    Due to standardization, AES is today’s most widely used block cipher. Its security is well-studied and hardware acceleration is available on a variety of platforms. Following the success of the Intel AES New Instructions (AES-NI), support for Vectorized AES (VAES) has been added in 2018 and already shown to be useful to accelerate many implementations of AES-based algorithms where the order of AES evaluations is fixed a priori. In our work, we focus on using VAES to accelerate the computation in secure multi-party computation protocols and applications. For some MPC building blocks, such as OT extension, the AES operations are independent and known a priori and hence can be easily parallelized, similar to the original paper on VAES by Drucker et al. (ITNG’19). We evaluate the performance impact of using VAES in the AES-CTR implementations used in Microsoft CrypTFlow2, and the EMP-OT library which we accelerate by up to 24%. The more complex case that we study for the first time in our paper are dependent AES calls that are not fixed yet in advance and hence cannot be parallelized manually. This is the case for garbling schemes. To get optimal efficiency from the hardware, enough independent calls need to be combined for each batch of AES executions. We identify such batches using a deferred execution technique paired with early execution to reduce non-locality issues and more static techniques using circuit depth and explicit gate independence. We present a performance and a modularity focused technique to compute the AES operations efficiently while also immediately using the results and preparing the inputs. Using these techniques, we achieve a performance improvement via VAES of up to 244% for the ABY framework and of up to 28% for the EMP-AGMPC framework. By implementing several garbling schemes from the literature using VAES acceleration, we obtain a 171% better performance for ABY

    Batched differentially private information retrieval

    Full text link
    Private Information Retrieval (PIR) allows several clients to query a database held by one or more servers, such that the contents of their queries remain private. Prior PIR schemes have achieved sublinear communication and computation by leveraging computational assumptions, federating trust among many servers, relaxing security to permit differentially private leakage, refactoring effort into an offline stage to reduce online costs, or amortizing costs over a large batch of queries. In this work, we present an efficient PIR protocol that combines all of the above techniques to achieve constant amortized communication and computation complexity in the size of the database and constant client work. We leverage differentially private leakage in order to provide better trade-offs between privacy and efficiency. Our protocol achieves speed-ups up to and exceeding 10x in practical settings compared to state of the art PIR protocols, and can scale to batches with hundreds of millions of queries on cheap commodity AWS machines. Our protocol builds upon a new secret sharing scheme that is both incremental and non-malleable, which may be of interest to a wider audience. Our protocol provides security up to abort against malicious adversaries that can corrupt all but one party.1414119 - National Science Foundation; CNS-1718135 - National Science Foundation; CNS-1931714 - National Science Foundation; HR00112020021 - Department of Defense/DARPA; 000000000000000000000000000000000000000000000000000000037211 - SRI Internationalhttps://www.usenix.org/system/files/sec22-albab.pdfPublished versio

    Towards Improved Homomorphic Encryption for Privacy-Preserving Deep Learning

    Get PDF
    Mención Internacional en el título de doctorDeep Learning (DL) has supposed a remarkable transformation for many fields, heralded by some as a new technological revolution. The advent of large scale models has increased the demands for data and computing platforms, for which cloud computing has become the go-to solution. However, the permeability of DL and cloud computing are reduced in privacy-enforcing areas that deal with sensitive data. These areas imperatively call for privacy-enhancing technologies that enable responsible, ethical, and privacy-compliant use of data in potentially hostile environments. To this end, the cryptography community has addressed these concerns with what is known as Privacy-Preserving Computation Techniques (PPCTs), a set of tools that enable privacy-enhancing protocols where cleartext access to information is no longer tenable. Of these techniques, Homomorphic Encryption (HE) stands out for its ability to perform operations over encrypted data without compromising data confidentiality or privacy. However, despite its promise, HE is still a relatively nascent solution with efficiency and usability limitations. Improving the efficiency of HE has been a longstanding challenge in the field of cryptography, and with improvements, the complexity of the techniques has increased, especially for non-experts. In this thesis, we address the problem of the complexity of HE when applied to DL. We begin by systematizing existing knowledge in the field through an in-depth analysis of state-of-the-art for privacy-preserving deep learning, identifying key trends, research gaps, and issues associated with current approaches. One such identified gap lies in the necessity for using vectorized algorithms with Packed Homomorphic Encryption (PaHE), a state-of-the-art technique to reduce the overhead of HE in complex areas. This thesis comprehensively analyzes existing algorithms and proposes new ones for using DL with PaHE, presenting a formal analysis and usage guidelines for their implementation. Parameter selection of HE schemes is another recurring challenge in the literature, given that it plays a critical role in determining not only the security of the instantiation but also the precision, performance, and degree of security of the scheme. To address this challenge, this thesis proposes a novel system combining fuzzy logic with linear programming tasks to produce secure parametrizations based on high-level user input arguments without requiring low-level knowledge of the underlying primitives. Finally, this thesis describes HEFactory, a symbolic execution compiler designed to streamline the process of producing HE code and integrating it with Python. HEFactory implements the previous proposals presented in this thesis in an easy-to-use tool. It provides a unique architecture that layers the challenges associated with HE and produces simplified operations interpretable by low-level HE libraries. HEFactory significantly reduces the overall complexity to code DL applications using HE, resulting in an 80% length reduction from expert-written code while maintaining equivalent accuracy and efficiency.El aprendizaje profundo ha supuesto una notable transformación para muchos campos que algunos han calificado como una nueva revolución tecnológica. La aparición de modelos masivos ha aumentado la demanda de datos y plataformas informáticas, para lo cual, la computación en la nube se ha convertido en la solución a la que recurrir. Sin embargo, la permeabilidad del aprendizaje profundo y la computación en la nube se reduce en los ámbitos de la privacidad que manejan con datos sensibles. Estas áreas exigen imperativamente el uso de tecnologías de mejora de la privacidad que permitan un uso responsable, ético y respetuoso con la privacidad de los datos en entornos potencialmente hostiles. Con este fin, la comunidad criptográfica ha abordado estas preocupaciones con las denominadas técnicas de la preservación de la privacidad en el cómputo, un conjunto de herramientas que permiten protocolos de mejora de la privacidad donde el acceso a la información en texto claro ya no es sostenible. Entre estas técnicas, el cifrado homomórfico destaca por su capacidad para realizar operaciones sobre datos cifrados sin comprometer la confidencialidad o privacidad de la información. Sin embargo, a pesar de lo prometedor de esta técnica, sigue siendo una solución relativamente incipiente con limitaciones de eficiencia y usabilidad. La mejora de la eficiencia del cifrado homomórfico en la criptografía ha sido todo un reto, y, con las mejoras, la complejidad de las técnicas ha aumentado, especialmente para los usuarios no expertos. En esta tesis, abordamos el problema de la complejidad del cifrado homomórfico cuando se aplica al aprendizaje profundo. Comenzamos sistematizando el conocimiento existente en el campo a través de un análisis exhaustivo del estado del arte para el aprendizaje profundo que preserva la privacidad, identificando las tendencias clave, las lagunas de investigación y los problemas asociados con los enfoques actuales. Una de las lagunas identificadas radica en el uso de algoritmos vectorizados con cifrado homomórfico empaquetado, que es una técnica del estado del arte que reduce el coste del cifrado homomórfico en áreas complejas. Esta tesis analiza exhaustivamente los algoritmos existentes y propone nuevos algoritmos para el uso de aprendizaje profundo utilizando cifrado homomórfico empaquetado, presentando un análisis formal y unas pautas de uso para su implementación. La selección de parámetros de los esquemas del cifrado homomórfico es otro reto recurrente en la literatura, dado que juega un papel crítico a la hora de determinar no sólo la seguridad de la instanciación, sino también la precisión, el rendimiento y el grado de seguridad del esquema. Para abordar este reto, esta tesis propone un sistema innovador que combina la lógica difusa con tareas de programación lineal para producir parametrizaciones seguras basadas en argumentos de entrada de alto nivel sin requerir conocimientos de bajo nivel de las primitivas subyacentes. Por último, esta tesis propone HEFactory, un compilador de ejecución simbólica diseñado para agilizar el proceso de producción de código de cifrado homomórfico e integrarlo con Python. HEFactory es la culminación de las propuestas presentadas en esta tesis, proporcionando una arquitectura única que estratifica los retos asociados con el cifrado homomórfico, produciendo operaciones simplificadas que pueden ser interpretadas por bibliotecas de bajo nivel. Este enfoque permite a HEFactory reducir significativamente la longitud total del código, lo que supone una reducción del 80% en la complejidad de programación de aplicaciones de aprendizaje profundo que usan cifrado homomórfico en comparación con el código escrito por expertos, manteniendo una precisión equivalente.Programa de Doctorado en Ciencia y Tecnología Informática por la Universidad Carlos III de MadridPresidenta: María Isabel González Vasco.- Secretario: David Arroyo Guardeño.- Vocal: Antonis Michala
    corecore