90 research outputs found

    Towards Improved Homomorphic Encryption for Privacy-Preserving Deep Learning

    International Mention in the doctoral degree. Deep Learning (DL) has brought about a remarkable transformation in many fields, heralded by some as a new technological revolution. The advent of large-scale models has increased the demand for data and computing platforms, for which cloud computing has become the go-to solution. However, the adoption of DL and cloud computing is limited in areas with strict privacy requirements that deal with sensitive data. These areas call for privacy-enhancing technologies that enable responsible, ethical, and privacy-compliant use of data in potentially hostile environments. To this end, the cryptography community has addressed these concerns with what is known as Privacy-Preserving Computation Techniques (PPCTs), a set of tools that enable privacy-enhancing protocols where cleartext access to information is no longer tenable. Of these techniques, Homomorphic Encryption (HE) stands out for its ability to perform operations over encrypted data without compromising data confidentiality or privacy. However, despite its promise, HE is still a relatively nascent solution with efficiency and usability limitations. Improving the efficiency of HE has been a longstanding challenge in the field of cryptography, and with each improvement the techniques have become more complex, especially for non-experts. In this thesis, we address the problem of the complexity of HE when applied to DL. We begin by systematizing existing knowledge in the field through an in-depth analysis of the state of the art in privacy-preserving deep learning, identifying key trends, research gaps, and issues associated with current approaches. One such identified gap lies in the need for vectorized algorithms with Packed Homomorphic Encryption (PaHE), a state-of-the-art technique to reduce the overhead of HE in complex areas.
This thesis comprehensively analyzes existing algorithms and proposes new ones for using DL with PaHE, presenting a formal analysis and usage guidelines for their implementation. Parameter selection of HE schemes is another recurring challenge in the literature, given that it plays a critical role in determining not only the security of the instantiation but also the precision and performance of the scheme. To address this challenge, this thesis proposes a novel system combining fuzzy logic with linear programming tasks to produce secure parametrizations based on high-level user input arguments without requiring low-level knowledge of the underlying primitives. Finally, this thesis describes HEFactory, a symbolic execution compiler designed to streamline the process of producing HE code and integrating it with Python. HEFactory implements the previous proposals presented in this thesis in an easy-to-use tool. It provides a unique architecture that layers the challenges associated with HE and produces simplified operations interpretable by low-level HE libraries. HEFactory significantly reduces the overall complexity of coding DL applications using HE, resulting in an 80% length reduction from expert-written code while maintaining equivalent accuracy and efficiency.
Doctoral Programme in Computer Science and Technology, Universidad Carlos III de Madrid. Committee chair: María Isabel González Vasco. Secretary: David Arroyo Guardeño. Member: Antonis Michala
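
The vectorized style that packed HE calls for can be illustrated with the widely used rotate-and-sum pattern for a dot product. The sketch below is a plaintext simulation only: Python lists stand in for ciphertext slots, and the helper names `rotate` and `packed_dot` are hypothetical, not part of HEFactory or any HE library. It assumes the slot count is a power of two.

```python
def rotate(slots, k):
    """Cyclically rotate a slot vector left by k positions (stand-in for an HE rotation)."""
    k %= len(slots)
    return slots[k:] + slots[:k]


def packed_dot(x, w):
    """Dot product using only slot-wise multiply, slot-wise add, and rotations."""
    n = len(x)
    assert n == len(w) and n & (n - 1) == 0, "slot count must be a power of two"
    # One slot-wise (SIMD) multiplication of the two packed vectors.
    acc = [a * b for a, b in zip(x, w)]
    # log2(n) rotate-and-add steps fold all slots together.
    shift = n // 2
    while shift >= 1:
        rotated = rotate(acc, shift)
        acc = [a + b for a, b in zip(acc, rotated)]
        shift //= 2
    return acc  # every slot now holds the full dot product


result = packed_dot([1, 2, 3, 4], [5, 6, 7, 8])
```

The point of the pattern is that the number of homomorphic operations grows with log2 of the slot count rather than with the vector length.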

    Memory Usage Inference for Object-Oriented Programs

    We present a type-based approach to statically derive symbolic closed-form formulae that characterize bounds on the heap memory usage of programs written in object-oriented languages. Given a program with size and alias annotations, our inference system computes the amount of memory the methods require to execute successfully as well as the amount of memory released when methods return. The analysis results are useful for networked devices with limited computational resources as well as for embedded software. Singapore-MIT Alliance (SMA

    Towards optimization-safe systems: analyzing the impact of undefined behavior

    This paper studies an emerging class of software bugs called optimization-unstable code: code that is unexpectedly discarded by compiler optimizations due to undefined behavior in the program. Unstable code is present in many systems, including the Linux kernel and the Postgres database. The consequences of unstable code range from incorrect functionality to missing security checks. To reason about unstable code, this paper proposes a novel model, which views unstable code in terms of optimizations that leverage undefined behavior. Using this model, we introduce a new static checker called Stack that precisely identifies unstable code. Applying Stack to widely used systems has uncovered 160 new bugs that have been confirmed and fixed by developers. United States. Defense Advanced Research Projects Agency (DARPA Clean-slate design of Resilient, Adaptive, Secure Hosts (CRASH) program under contract #N66001-10-2-4089); National Science Foundation (U.S.) (NSF award CNS-1053143

    Privacy-preserving federated deep reinforcement learning for mobility-as-a-service

    Mobility-as-a-service (MaaS) is a new transport model that combines multiple transport modes in a single platform. Dynamic passenger behavior based on past experiences requires reinforcement-based optimization of MaaS services. Deep reinforcement learning (DRL) may improve passenger satisfaction by offering the most appropriate transport services based on individual passenger experiences and preferences. However, a MaaS platform that uses the centralized DRL method faces a new privacy risk: information leakage will occur if the platform is not carefully designed with privacy-preserving mechanisms. In this paper, we propose a federated deep deterministic policy gradient (FDDPG) that maximizes passenger satisfaction and MaaS long-term profit while preserving privacy. We enforce an equally weighted experience sampling mechanism to prevent sampling bias such that the solution quality of FDDPG is statistically equivalent to the centralized algorithm. During model training and inference, information is processed locally and only the gradients are shared, which prevents information leakage to any semi-honest participants and eavesdroppers. A secure aggregation protocol, in line with the dynamic property of the mobile agent, is also used in the gradient sharing step to protect the algorithm against inference attacks. We perform experiments on New York City-based real-world and synthetic scenarios. The results show that the proposed FDDPG can improve the MaaS profit and passenger satisfaction by about 90% and 15%, respectively, and maintain stable training against agent dropout. Our approach and findings could enhance MaaS utility as well as facilitate passenger trust and participation in MaaS and other data-driven transportation systems. Engineering and Physical Sciences Research Council (EPSRC): EP/V039164/
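
To illustrate how a gradient-sharing step can hide individual contributions, here is a minimal sketch of pairwise additive masking, a common idea behind secure aggregation. It is not the protocol used in the paper: the function names are hypothetical, gradients are assumed to be already quantized to non-negative integers, a single seeded generator replaces the pairwise key agreement a real protocol would use, and agent dropout is not handled.

```python
import random

MODULUS = 2**32  # masks and sums live in the integers modulo 2^32


def masked_updates(updates, seed=0):
    """Return one masked copy of each agent's update; the pairwise masks cancel in the sum."""
    rng = random.Random(seed)  # assumption: stands in for pairwise shared secrets
    n = len(updates)
    dim = len(updates[0])
    masked = [list(u) for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(dim):
                m = rng.randrange(MODULUS)
                masked[i][k] = (masked[i][k] + m) % MODULUS  # agent i adds the mask
                masked[j][k] = (masked[j][k] - m) % MODULUS  # agent j subtracts it
    return masked


def aggregate(masked):
    """Server-side sum of the masked updates, coordinate by coordinate."""
    dim = len(masked[0])
    return [sum(u[k] for u in masked) % MODULUS for k in range(dim)]


updates = [[3, 1], [4, 1], [5, 9]]  # hypothetical quantized gradients of three agents
total = aggregate(masked_updates(updates))
```

The server only ever sees the masked vectors, yet `total` equals the coordinate-wise sum of the true updates because each mask is added once and subtracted once.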

    model-based script synthesis for fuzzing

    Kernel fuzzing is important for finding critical kernel vulnerabilities. Closed-source (e.g., Windows) operating system kernel fuzzing is even more challenging due to the lack of source code. Existing approaches fuzz the kernel by modeling syscall sequences from traces or static analysis of system code. However, a common limitation is that they do not learn and mutate the syscall sequences to reach different kernel states, which can potentially result in more bugs or crashes. In this paper, we propose WinkFuzz, an approach to learn and mutate traced syscall sequences in order to reach different kernel states. WinkFuzz learns syscall dependencies from the trace, identifies potential syscalls in the trace that can have dependent subsequent syscalls, and applies the dependencies to insert more syscalls into the trace while preserving the dependencies. Then WinkFuzz fuzzes the synthesized new syscall sequence to find system crashes. We applied WinkFuzz to four seed applications and found a total increase in syscall number of 70.8%, with a success rate of 61%, within three insert levels. The average time for tracing, dependency analysis, recovering model script, and synthesizing script was 600, 39, 34, and 129 seconds respectively. The instant fuzzing rate is 3742 syscall executions per second. However, the average fuzz efficiency dropped to 155 syscall executions per second when the initializing time, waiting time, and other factors were taken into account. We fuzzed each seed application for 24 seconds and, on average, obtained 12.25 crashes within that time frame. Comment: 12 pages, conference pape

    Desarrollo de algoritmos para el tratamiento de datos GNSS : su aplicación a los escenarios GPS modernizado y Galileo

    Unpublished thesis of the Universidad Complutense de Madrid, Facultad de Ciencias Matemáticas, Sección Departamental de Física de la Tierra, Astronomía y Astrofísica I (Geofísica y Meteorología) (Astronomía y Geodesia), defended on 24-07-2012. Nowadays, the major GNSS systems are the American GPS and the Russian GLONASS; however, in the near future the European project Galileo and the Chinese system COMPASS will become part of the GNSS scenario. These systems will transmit on three different frequencies for the first time, giving rise to a multi-system and multi-frequency scenario which will dramatically push the boundaries of positioning techniques. Currently, one of the most studied positioning techniques is Precise Point Positioning (PPP), which aims to estimate a precise receiver position from undifferenced GNSS code and carrier phase observations and precise satellite products. In this thesis, new and original algorithms for static PPP have been developed that are able to deal with future multi-system and multi-frequency GNSS observations. The new algorithms have been named MAP3. In the new approach, least squares theory is applied twice to estimate the ionospheric delay, initial ambiguities and smoothed pseudodistances from undifferenced observations, which in turn are used to recover the receiver position and its clock offset. MAP3 provides position estimates with an accuracy of 2.5 cm after 2 hours of observation and 7 mm after 1 day, which is at the same level as other PPP programs, and MAP3 obtains even better results in short observation periods. Moreover, MAP3 has provided some of the first positioning results from GIOVE observations and GPC products.
In addition, these algorithms have been applied to the analysis of the influence of ionospheric disturbances on point positioning, concluding that the presence of a high ROT (Rate of TEC), observed at equatorial latitudes, reflects a significant degradation of point positioning from dual-frequency observations.
Unidad Deptal. de Astronomía y Geodesia, Fac. de Ciencias Matemáticas
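
As background for why several frequencies help with the ionosphere, the sketch below applies the standard first-order dual-frequency relations to two code pseudoranges. It is not the MAP3 least-squares formulation: it ignores noise, hardware biases and carrier phase, and the function names are hypothetical. The frequencies are the GPS L1 and L2 carriers.

```python
F1 = 1575.42e6  # GPS L1 carrier frequency in Hz
F2 = 1227.60e6  # GPS L2 carrier frequency in Hz
GAMMA = (F1 / F2) ** 2  # first-order ionospheric delay scales with 1 / f^2


def iono_delay_l1(p1, p2):
    """First-order ionospheric delay on L1 (metres) from the geometry-free combination."""
    return (p2 - p1) / (GAMMA - 1.0)


def iono_free_range(p1, p2):
    """Ionosphere-free combination of the two pseudoranges (metres)."""
    return (GAMMA * p1 - p2) / (GAMMA - 1.0)


# Synthetic example: a 20 000 km range with a 5 m delay on L1 (values are made up).
true_range = 20_000_000.0
delay_l1 = 5.0
p1 = true_range + delay_l1
p2 = true_range + GAMMA * delay_l1
estimated_delay = iono_delay_l1(p1, p2)
corrected_range = iono_free_range(p1, p2)
```

With noise-free synthetic inputs, `estimated_delay` recovers the injected delay and `corrected_range` recovers the geometric range up to floating-point rounding.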

    Monitoring of Aging Software Systems Affected by Integer Overflows

    Numerical aging-related bugs, which can manifest themselves as the accumulation of floating-point errors and the overflow of integers, represent a known but relatively neglected issue in the field of software aging and rejuvenation. Unfortunately, it is very difficult to avoid and to fix these bugs, since the rules of computer arithmetic and programming languages are often misunderstood or disregarded by programmers. Even though software rejuvenation can potentially mitigate these problems, its adoption is prevented by the lack of approaches for forecasting numerical software aging failures: in order to efficiently plan rejuvenation, the rate of numerical errors has to be known, or at least estimated. In this paper, we focus on software aging phenomena related to integer overflows. We present some examples of integer overflow issues of the MySQL open-source DBMS, and an approach for identifying symptoms of potential integer overflows by on-line monitoring
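
One way to picture on-line monitoring for integer overflow is to sample a growing counter and extrapolate when it would reach its type limit. The sketch below is a simple linear extrapolation under stated assumptions, not the approach of the paper: the function name is hypothetical, the counter is assumed to be a signed 32-bit value, and growth is assumed to be roughly constant between the first and last sample.

```python
INT32_MAX = 2**31 - 1  # largest value of a signed 32-bit integer


def seconds_until_overflow(samples, limit=INT32_MAX):
    """Estimate the time left before a monitored counter exceeds `limit`.

    `samples` is a list of (timestamp_seconds, counter_value) pairs, oldest first.
    Returns None when no growth can be measured.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0 or v1 <= v0:
        return None  # flat or shrinking counter: no overflow forecast
    rate = (v1 - v0) / (t1 - t0)  # average increments per second
    return (limit - v1) / rate


# Hypothetical readings: the counter grows by 1000 in 100 seconds.
remaining = seconds_until_overflow([(0, 1_000_000), (100, 1_001_000)])
```

An estimate like `remaining` is what a rejuvenation plan would need as input, since the rate of approach to the limit has to be known or at least estimated.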