
    Workflow models for heterogeneous distributed systems

    The role of data in modern scientific workflows is becoming increasingly central. The unprecedented amount of data available in the digital era, combined with recent advances in Machine Learning and High-Performance Computing (HPC), has let computers surpass human performance in a wide range of fields, such as Computer Vision, Natural Language Processing and Bioinformatics. However, a solid data management strategy is essential for key aspects like performance optimisation, privacy preservation and security. Most modern programming paradigms for Big Data analysis adhere to the principle of data locality: moving computation closer to the data to remove transfer-related overheads and risks. Still, there are scenarios in which it is worthwhile, or even unavoidable, to transfer data between different steps of a complex workflow. The contribution of this dissertation is twofold. First, it defines a novel methodology for distributed modular applications that enables topology-aware scheduling and data management while keeping business logic, data dependencies, parallel patterns and execution environments separate. Second, it introduces computational notebooks as a high-level, user-friendly interface to this new kind of workflow, aiming to flatten the learning curve and improve the adoption of the methodology. Each contribution is accompanied by a full-fledged, Open Source implementation, which has been used for evaluation purposes and lets interested readers experience the related methodology first-hand. The validity of the proposed approaches has been demonstrated on five real scientific applications in the domains of Deep Learning, Bioinformatics and Molecular Dynamics simulation, executed on large-scale hybrid cloud-HPC infrastructures.
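    As a rough illustration of the separation the abstract describes, the Python sketch below (with entirely hypothetical names; it is not the dissertation's actual implementation) keeps each step's business logic, data dependencies and target execution environment as independent declarations, so a topology-aware scheduler could decide where data must be staged:

        from dataclasses import dataclass, field
        from typing import Callable, Dict, List


        @dataclass
        class Step:
            name: str
            logic: Callable[..., object]                      # business logic only
            inputs: List[str] = field(default_factory=list)   # data dependencies
            environment: str = "local"                        # e.g. "cloud", "hpc"


        def run_workflow(steps: List[Step]) -> Dict[str, object]:
            """Toy topological executor: run each step once its inputs are
            available, noting where a real scheduler would place it."""
            results: Dict[str, object] = {}
            pending = list(steps)
            while pending:
                ready = [s for s in pending if all(d in results for d in s.inputs)]
                if not ready:
                    raise ValueError("unsatisfiable data dependencies")
                for step in ready:
                    # A real scheduler would stage step.inputs into step.environment.
                    print(f"running {step.name!r} on {step.environment}")
                    results[step.name] = step.logic(*[results[d] for d in step.inputs])
                    pending.remove(step)
            return results


        # Example: cloud-side preprocessing feeding a step placed on an HPC site.
        workflow = [
            Step("load", lambda: [1.0, 2.0, 4.0], environment="cloud"),
            Step("scale", lambda xs: [x / max(xs) for x in xs], ["load"], "cloud"),
            Step("train", lambda xs: sum(xs) / len(xs), ["scale"], "hpc"),
        ]
        print(run_workflow(workflow))

    In a real system the executor would dispatch each step to its declared environment and move only the data the dependency declarations require; here everything runs locally and the placement decision is merely printed.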

    Measuring the Semantic Integrity of a Process Self

    The focus of this thesis is the definition of a framework that protects a process from attacks against the process self, i.e. attacks that alter the expected behavior of the process, by integrating static analysis and run-time monitoring. The static analysis of the program returns a description of the process self consisting of a context-free grammar, which defines the legal system-call traces, and a set of invariants on process variables that must hold whenever a system call is issued. Run-time monitoring assures the semantic integrity of the process by checking that its behavior is coherent with the process self returned by the static analysis. The proposed framework also covers kernel integrity, protecting the process from kernel-level attacks. The implementation of the run-time monitoring is based upon introspection, a technique that analyzes the state of a computer to rebuild and check the consistency of kernel- or user-level data structures. The ability to observe the run-time values of variables reduces the complexity of the static analysis and increases the amount of information that can be extracted about the run-time behavior of the process. To make the controls transparent to the process while avoiding special-purpose hardware units that access the memory, the run-time monitoring architecture adopts virtualization technology and introduces two virtual machines: the monitored virtual machine and the introspection virtual machine. This approach increases overall robustness because a distinct virtual machine, the introspection virtual machine, applies introspection transparently both to verify kernel integrity and to retrieve the status of the process in order to check the process self. After presenting the framework and its implementation, the thesis discusses some of its applications to increase the security of a computer network. The first application of the proposed framework is the remote attestation of the semantic integrity of a process. The thesis then describes a set of extensions to the framework that protect a process from physical attacks by running an obfuscated version of the process code. Finally, the thesis generalizes the framework to support the efficient sharing of an information infrastructure among users and applications with distinct security and reliability requirements by introducing highly parallel overlays.
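    To make the grammar-plus-invariants idea concrete, here is a minimal Python sketch (hypothetical, not the thesis's introspection-based implementation): legal system-call traces are derived from a tiny context-free grammar, and per-call invariants on process state are checked as each call is observed:

        # Legal system-call traces as a context-free grammar: nonterminals map
        # to productions (tuples of symbols); terminals are system-call names.
        # Naive top-down matching -- adequate for a sketch, but it would loop
        # forever on left-recursive grammars.
        GRAMMAR = {
            "S": [("open", "Body", "close")],
            "Body": [("read", "Body"), ("write", "Body"), ()],
        }

        # Hypothetical invariants on process state, checked at each system call.
        INVARIANTS = {
            "write": lambda state: state["fd"] >= 0,
        }


        def derives(symbols, trace):
            """True iff the symbol sequence can derive exactly this trace."""
            if not symbols:
                return not trace
            head, rest = symbols[0], symbols[1:]
            if head in GRAMMAR:  # nonterminal: try every production
                return any(derives(tuple(p) + tuple(rest), trace)
                           for p in GRAMMAR[head])
            # terminal: must match the next observed system call
            return bool(trace) and trace[0] == head and derives(rest, trace[1:])


        def monitor(trace, state):
            """Reject the trace if an invariant fails or it is not derivable."""
            for call in trace:
                check = INVARIANTS.get(call)
                if check and not check(state):
                    raise RuntimeError(f"invariant violated at {call!r}")
            return derives(("S",), tuple(trace))


        print(monitor(["open", "read", "write", "close"], {"fd": 3}))  # True
        print(monitor(["read", "close"], {"fd": 3}))                   # False

    The actual framework checks these properties from a separate introspection virtual machine against the live memory of the monitored one; the sketch only shows the shape of the checks themselves.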