154 research outputs found

    Hidden in the Cloud : Advanced Cryptographic Techniques for Untrusted Cloud Environments

    In the contemporary digital age, the ability to search and perform operations on encrypted data has become increasingly important. This significance is primarily due to the exponential growth of data, often referred to as the "new oil," and the corresponding rise in data privacy concerns. As more and more data is stored in the cloud, the need for robust security measures to protect this data from unauthorized access and misuse has become paramount. One of the key challenges in this context is the ability to perform meaningful operations on the data while it remains encrypted. Traditional encryption techniques, while providing a high level of security, render the data unusable for any practical purpose other than storage. This is where advanced cryptographic protocols like Symmetric Searchable Encryption (SSE), Functional Encryption (FE), Homomorphic Encryption (HE), and Hybrid Homomorphic Encryption (HHE) come into play. These protocols not only ensure the confidentiality of data but also allow computations on encrypted data, thereby offering a higher level of security and privacy. The ability to search and perform operations on encrypted data has several practical implications. For instance, it enables efficient Boolean queries on encrypted databases, which is crucial for many "big data" applications. It also allows for the execution of phrase searches, which are important for many machine learning applications, such as intelligent medical data analytics. Moreover, these capabilities are particularly relevant in the context of sensitive data, such as health records or financial information, where the privacy and security of user data are of utmost importance. Furthermore, these capabilities can help build trust in digital systems. Trust is a critical factor in the adoption and use of digital services. By ensuring the confidentiality, integrity, and availability of data, these protocols can help build user trust in cloud services. 
This trust, in turn, can drive the wider adoption of digital services, leading to a more inclusive digital society. However, while these capabilities offer significant advantages, they also present challenges. For instance, the computational overhead of these protocols can be substantial, making them less suitable for scenarios where efficiency is a critical requirement. Moreover, these protocols often require sophisticated key management mechanisms, which can be challenging to implement in practice. Therefore, there is a need for ongoing research to address these challenges and make these protocols more efficient and practical for real-world applications. The research publications included in this thesis examine the intricacies of, and advancements in, cryptographic protocols in the context of the challenges and needs highlighted above. Publication I presents a novel approach to hybrid encryption, combining the strengths of Attribute-Based Encryption (ABE) and SSE. This fusion aims to overcome the inherent limitations of both techniques, offering a more secure and efficient solution for key sharing and access control in cloud-based systems. Publication II further expands on SSE, showcasing a dynamic scheme that emphasizes forward and backward privacy, crucial for ensuring data integrity and confidentiality. Publication III and Publication IV delve into the potential of Multi-Input Functional Encryption (MIFE), demonstrating its applicability in real-world scenarios, such as designing encrypted private databases and additive reputation systems. These publications highlight the transformative potential of MIFE in bridging the gap between theoretical cryptographic concepts and practical applications. Lastly, Publication V underscores the significance of HE and HHE as foundational elements for secure protocols, emphasizing their potential in devices with limited computational capabilities.
In essence, these publications not only validate the importance of searching and performing operations on encrypted data but also provide innovative solutions to the challenges mentioned. They collectively underscore the transformative potential of advanced cryptographic protocols in enhancing data security and privacy, paving the way for a more secure digital future.
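
To make the idea of searching over encrypted data more concrete, the following is a minimal sketch of a keyword index in the spirit of searchable symmetric encryption. It is not the scheme from any of the publications listed above: all function names are hypothetical, the index is built with HMAC-SHA256 from the Python standard library, and document identifiers are kept in the clear for brevity (a real scheme would encrypt them and would also limit what the server learns from repeated queries and updates).

```python
import hashlib
import hmac
from collections import defaultdict


def search_token(key: bytes, keyword: str) -> bytes:
    """Derive a deterministic search token for a keyword with HMAC-SHA256."""
    return hmac.new(key, keyword.lower().encode("utf-8"), hashlib.sha256).digest()


def build_index(key: bytes, documents: dict[str, str]) -> dict[bytes, set[str]]:
    """Client side: map keyword tokens to ids of the documents containing them."""
    index: dict[bytes, set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for word in set(text.lower().split()):
            index[search_token(key, word)].add(doc_id)
    return dict(index)


def server_search(index: dict[bytes, set[str]], token: bytes) -> set[str]:
    """Server side: look up a token without seeing the underlying keyword."""
    return set(index.get(token, set()))


def boolean_and(index: dict[bytes, set[str]], tokens: list[bytes]) -> set[str]:
    """Conjunctive query: intersect the result sets of several tokens."""
    results = [server_search(index, t) for t in tokens]
    return set.intersection(*results) if results else set()
```

In this sketch the server only ever handles opaque tokens, but it still learns which tokens repeat and which documents match. Properties such as forward and backward privacy exist to limit that kind of leakage, and this toy index does not provide them.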

    Privacy-preserving artificial intelligence in healthcare: Techniques and applications

    There has been increasing interest in translating artificial intelligence (AI) research into clinically validated applications to improve the performance, capacity, and efficacy of healthcare services. Despite substantial research worldwide, very few AI-based applications have successfully made it to clinics. Key barriers to the widespread adoption of clinically validated AI applications include non-standardized medical records, limited availability of curated datasets, and stringent legal/ethical requirements to preserve patients' privacy. Therefore, there is a pressing need to devise new data-sharing methods in the age of AI that preserve patient privacy while developing AI-based healthcare applications. In the literature, significant attention has been devoted to developing privacy-preserving techniques and overcoming the issues hampering AI adoption in an actual clinical environment. To this end, this study summarizes the state-of-the-art approaches for preserving privacy in AI-based healthcare applications. Prominent privacy-preserving techniques such as Federated Learning and Hybrid Techniques are elaborated along with potential privacy attacks, security challenges, and future directions. [Abstract copyright: Copyright © 2023 The Author(s). Published by Elsevier Ltd. All rights reserved.]

    Delegated Private Matching for Compute

    Private matching for compute (PMC) establishes a match between two datasets owned by mutually distrusted parties (C and P) and allows the parties to input more data for the matched records for arbitrary downstream secure computation without rerunning the private matching component. The state-of-the-art PMC protocols only support two parties and assume that both parties can participate in computationally intensive secure computation. We observe that such operational overhead limits the adoption of these protocols to solely powerful entities as small data owners or devices with minimal computing power will not be able to participate. We introduce two protocols to delegate PMC from party P to untrusted cloud servers, called delegates, allowing multiple smaller P parties to provide inputs containing identifiers and associated values. Our Delegated Private Matching for Compute protocols, called DPMC and D^sPMC, establish a join between the datasets of party C and multiple delegators P based on multiple identifiers and compute secret shares of associated values for the identifiers that the parties have in common. We introduce a rerandomizable encrypted oblivious pseudorandom function (OPRF) primitive, called EO, which allows two parties to encrypt, mask, and shuffle their data. Note that EO may be of independent interest. Our D^sPMC protocol limits the leakages of DPMC by combining our EO scheme and secure three-party shuffling. Finally, our implementation demonstrates the efficiency of our constructions by outperforming related works by approximately 10× for the total protocol execution and by at least 20× for the computation on the delegators.
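
The abstract above mentions computing secret shares of the associated values. As a rough illustration of what that means, here is a minimal sketch of two-party additive secret sharing. The 64-bit modulus and all function names are assumptions made for this example; it does not reproduce the matching, OPRF, or shuffling steps of the protocols described.

```python
import secrets

# Assumed share ring for this sketch: integers modulo 2**64.
MODULUS = 2**64


def share(value: int) -> tuple[int, int]:
    """Split a value into two additive shares; each share alone looks random."""
    first = secrets.randbelow(MODULUS)
    second = (value - first) % MODULUS
    return first, second


def reconstruct(first: int, second: int) -> int:
    """Recombine two shares into the original value."""
    return (first + second) % MODULUS


def add_shares(share_a: int, share_b: int) -> int:
    """Each party adds its own shares locally, with no communication needed."""
    return (share_a + share_b) % MODULUS
```

Because addition works share by share, two parties holding shares of matched values can compute shares of a sum without either of them seeing an individual value, which is what makes such shares usable as input to a downstream secure computation.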

    Ibex: Privacy-preserving ad conversion tracking and bidding (full version)

    This paper introduces Ibex, an advertising system that reduces the amount of data that is collected on users while still allowing advertisers to bid on real-time ad auctions and measure the effectiveness of their ad campaigns. Specifically, Ibex addresses an issue in recent proposals such as Google’s Privacy Sandbox Topics API in which browsers send information about topics that are of interest to a user to advertisers and demand-side platforms (DSPs). DSPs use this information to (1) determine how much to bid on the auction for a user who is interested in particular topics, and (2) measure how well their ad campaign does for a given audience (i.e., measure conversions). While Topics and related proposals reduce the amount of user information that is exposed, they still reveal user preferences. In Ibex, browsers send user information in an encrypted form that still allows DSPs and advertisers to measure conversions, compute aggregate statistics such as histograms about users and their interests, and obliviously bid on auctions without learning for whom they are bidding. Our implementation of Ibex shows that creating histograms is 1.7–2.5× more expensive for browsers than disclosing user information, and Ibex’s oblivious bidding protocol can finish auctions within 550 ms. We think this makes Ibex capable of preserving a good experience while improving user privacy.

    General-Purpose Secure Conflict-free Replicated Data Types

    Conflict-free Replicated Data Types (CRDTs) are a very popular class of distributed data structures that strike a compromise between strong and eventual consistency. Ensuring the protection of data stored within a CRDT, however, cannot be done trivially using standard encryption techniques, as secure CRDT protocols would require replica-side computation. This paper proposes an approach to lift general-purpose implementations of CRDTs to secure variants using secure multiparty computation (MPC). Each replica within the system is realized by a group of MPC parties that compute its functionality. Our results include: i) an extension of current formal models used for reasoning over the security of CRDT solutions to the MPC setting; ii) an MPC language and type system to enable the construction of secure versions of CRDTs; and iii) a proof of security that relates the security of CRDT constructions designed under said semantics to the underlying MPC library. We provide an open-source system implementation with an extensive evaluation, which compares different designs with their baseline throughput and latency.
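
For readers unfamiliar with CRDTs, the replica-side computation mentioned above can be illustrated with a grow-only counter, one of the simplest state-based CRDTs. This is a plaintext sketch with hypothetical names, not the paper's system: in the approach described, logic of this kind would be evaluated by a group of MPC parties rather than on cleartext state.

```python
class GCounter:
    """State-based grow-only counter: one non-decreasing count per replica."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        """Local update: a replica only ever raises its own entry."""
        if amount < 0:
            raise ValueError("a grow-only counter cannot be decremented")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        """The counter value is the sum over all replicas' entries."""
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        """Join two states by taking the element-wise maximum."""
        for replica_id, count in other.counts.items():
            self.counts[replica_id] = max(self.counts.get(replica_id, 0), count)
```

The merge is commutative, associative, and idempotent, so replicas converge regardless of the order or number of times states are exchanged. It also shows why plain encryption is not enough: computing the maximum requires the replica to operate on the values themselves.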

    Harnessing the Power of Distributed Computing: Advancements in Scientific Applications, Homomorphic Encryption, and Federated Learning Security

    The data explosion poses many challenges to state-of-the-art systems, applications, and methodologies. It has been reported that 181 zettabytes of data are expected to be generated in 2025, an increase of over 150% compared with the data expected to be generated in 2023. However, while system manufacturers are consistently developing devices with larger storage spaces and providing alternative storage capacities in the cloud at affordable rates, another key challenge is how to effectively process even a fraction of this large-scale stored data in time-critical conventional systems. One transformative paradigm revolutionizing the processing and management of such large data is distributed computing, whose application requires deep understanding. This dissertation explores the potential impact of applying efficient distributed computing concepts to long-standing challenges in (i) a data-intensive scientific application, (ii) applying homomorphic encryption to data-intensive workloads found in outsourced databases, and (iii) the security of tokenized incentive mechanisms for federated learning (FL) systems. The first part of the dissertation tackles the microelectrode array (MEA) parameterization problem from an orthogonal viewpoint enlightened by algebraic topology, which allows us to algebraically parametrize MEAs whose structure and intrinsic parallelism are hard to identify otherwise. We implement a new paradigm, namely Parma, to demonstrate the effectiveness of the proposed approach and report how it outperforms the state of the practice in time, scalability, and memory usage. The second part discusses our work on introducing the concept of parallel caching of secure aggregation to mitigate the performance overhead incurred by the HE module in outsourced databases.
The key idea of this optimization approach is caching selected radix-ciphertexts in parallel without violating existing security guarantees of the primitive/base HE scheme. A new radix HE algorithm was designed and applied to both batch and incremental HE schemes, and experiments carried out on six workloads show that the proposed caching boosts state-of-the-art HE schemes by orders of magnitude. In the third part, I discuss our work on leveraging the security benefits of blockchains to enhance or protect the fairness and reliability of tokenized incentive mechanisms for FL systems. We designed a blockchain-based auditing protocol to mitigate Gaussian attacks and carried out experiments with multiple FL aggregation algorithms, popular data sets, and a variety of scales to validate its effectiveness.

    Towards Improved Homomorphic Encryption for Privacy-Preserving Deep Learning

    International Mention in the doctoral degree. Deep Learning (DL) has brought about a remarkable transformation in many fields, heralded by some as a new technological revolution. The advent of large-scale models has increased the demand for data and computing platforms, for which cloud computing has become the go-to solution. However, the penetration of DL and cloud computing is reduced in privacy-enforcing areas that deal with sensitive data. These areas imperatively call for privacy-enhancing technologies that enable responsible, ethical, and privacy-compliant use of data in potentially hostile environments. To this end, the cryptography community has addressed these concerns with what is known as Privacy-Preserving Computation Techniques (PPCTs), a set of tools that enable privacy-enhancing protocols where cleartext access to information is no longer tenable. Of these techniques, Homomorphic Encryption (HE) stands out for its ability to perform operations over encrypted data without compromising data confidentiality or privacy. However, despite its promise, HE is still a relatively nascent solution with efficiency and usability limitations. Improving the efficiency of HE has been a longstanding challenge in the field of cryptography, and as efficiency has improved, the techniques have grown more complex, especially for non-experts. In this thesis, we address the problem of the complexity of HE when applied to DL. We begin by systematizing existing knowledge in the field through an in-depth analysis of the state of the art in privacy-preserving deep learning, identifying key trends, research gaps, and issues associated with current approaches. One such identified gap lies in the necessity for using vectorized algorithms with Packed Homomorphic Encryption (PaHE), a state-of-the-art technique to reduce the overhead of HE in complex areas.
This thesis comprehensively analyzes existing algorithms and proposes new ones for using DL with PaHE, presenting a formal analysis and usage guidelines for their implementation. Parameter selection of HE schemes is another recurring challenge in the literature, given that it plays a critical role in determining not only the security of the instantiation but also the precision and performance of the scheme. To address this challenge, this thesis proposes a novel system combining fuzzy logic with linear programming tasks to produce secure parametrizations based on high-level user input arguments without requiring low-level knowledge of the underlying primitives. Finally, this thesis describes HEFactory, a symbolic execution compiler designed to streamline the process of producing HE code and integrating it with Python. HEFactory implements the previous proposals presented in this thesis in an easy-to-use tool. It provides a unique architecture that layers the challenges associated with HE and produces simplified operations interpretable by low-level HE libraries. HEFactory significantly reduces the overall complexity of coding DL applications with HE, yielding code 80% shorter than expert-written code while maintaining equivalent accuracy and efficiency.
Doctoral Programme in Computer Science and Technology, Universidad Carlos III de Madrid. Committee: Chair: María Isabel González Vasco. Secretary: David Arroyo Guardeño. Member: Antonis Michala
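
As a small illustration of what "operating on encrypted data" means, the sketch below implements a toy version of the Paillier cryptosystem, which is additively homomorphic. This is a deliberately swapped-in technique: it is not the packed, lattice-based HE the thesis discusses, and it has nothing to do with the HEFactory API. The hard-coded primes 1009 and 1013 and all function names are assumptions for the example, and the parameters are far too small to be secure.

```python
import math
import secrets


def keygen(p: int = 1009, q: int = 1013) -> tuple[int, tuple[int, int]]:
    """Return (public modulus n, private key (lam, mu)). Toy primes, insecure."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)


def encrypt(n: int, message: int) -> int:
    """Encrypt 0 <= message < n as (1 + n)^m * r^n mod n^2 with random r."""
    n_squared = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return ((1 + message * n) * pow(r, n, n_squared)) % n_squared


def decrypt(n: int, private_key: tuple[int, int], ciphertext: int) -> int:
    """Recover the message using L(x) = (x - 1) // n."""
    lam, mu = private_key
    l_value = (pow(ciphertext, lam, n * n) - 1) // n
    return (l_value * mu) % n


def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts modulo n."""
    return (c1 * c2) % (n * n)


def mul_plain(n: int, ciphertext: int, scalar: int) -> int:
    """Raising a ciphertext to a power multiplies the plaintext by a scalar."""
    return pow(ciphertext, scalar, n * n)
```

A party holding only the public modulus can call `add_encrypted` and `mul_plain` on ciphertexts it cannot read, and only the key holder can decrypt the result. Practical HE for deep learning needs far richer operations and careful parameter selection, which is the complexity the thesis sets out to manage.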

    Jornadas Nacionales de Investigación en Ciberseguridad: actas de las VIII Jornadas Nacionales de Investigación en ciberseguridad: Vigo, 21 a 23 de junio de 2023

    Jornadas Nacionales de Investigación en Ciberseguridad (8th, 2023, Vigo). atlanTTic. AMTEGA: Axencia para a modernización tecnolóxica de Galicia. INCIBE: Instituto Nacional de Cibersegurida

    Distributed and Deep Vertical Federated Learning with Big Data

    In recent years, data have typically been distributed across multiple organizations, while data security is becoming increasingly important. Federated Learning (FL), which enables multiple parties to collaboratively train a model without exchanging the raw data, has attracted more and more attention. Based on the distribution of data, FL can be realized in three scenarios, i.e., horizontal, vertical, and hybrid. In this paper, we combine distributed machine learning techniques with Vertical FL and propose a Distributed Vertical Federated Learning (DVFL) approach. The DVFL approach exploits a fully distributed architecture within each party in order to accelerate the training process. In addition, we exploit Homomorphic Encryption (HE) to protect the data against honest-but-curious participants. We conduct extensive experimentation in a large-scale cluster environment and a cloud environment in order to show the efficiency and scalability of our proposed approach. The experiments demonstrate the good scalability of our approach and the significant efficiency advantage (up to 6.8 times with a single server and 15.1 times with multiple servers in terms of the training time) compared with baseline frameworks. Comment: To appear in CCPE (Concurrency and Computation: Practice and Experience)