237 research outputs found

    Dynamic re-optimization techniques for stream processing engines and object stores

    Get PDF
    Large scale data storage and processing systems are strongly motivated by the need to store and analyze massive datasets. The complexity of a large class of these systems is rooted in their distributed nature, extreme scale, need for real-time response, and streaming nature. The use of these systems on multi-tenant, cloud environments with potential resource interference necessitates fine-grained monitoring and control. In this dissertation, we present efficient, dynamic techniques for re-optimizing stream-processing systems and transactional object-storage systems.^ In the context of stream-processing systems, we present VAYU, a per-topology controller. VAYU uses novel methods and protocols for dynamic, network-aware tuple-routing in the dataflow. We show that the feedback-driven controller in VAYU helps achieve high pipeline throughput over long execution periods, as it dynamically detects and diagnoses any pipeline-bottlenecks. We present novel heuristics to optimize overlays for group communication operations in the streaming model.^ In the context of object-storage systems, we present M-Lock, a novel lock-localization service for distributed transaction protocols on scale-out object stores to increase transaction throughput. Lock localization refers to dynamic migration and partitioning of locks across nodes in the scale-out store to reduce cross-partition acquisition of locks. The service leverages the observed object-access patterns to achieve lock-clustering and deliver high performance. We also present TransMR, a framework that uses distributed, transactional object stores to orchestrate and execute asynchronous components in amorphous data-parallel applications on scale-out architectures

    Methods to Improve Applicability and Efficiency of Distributed Data-Centric Compute Frameworks

    Get PDF
    The success of modern applications depends on the insights they collect from their data repositories. Data repositories for such applications currently exceed exabytes and are rapidly increasing in size, as they collect data from varied sources - web applications, mobile phones, sensors and other connected devices. Distributed storage and data-centric compute frameworks have been invented to store and analyze these large datasets. This dissertation focuses on extending the applicability and improving the efficiency of distributed data-centric compute frameworks

    Application-level Fault Tolerance and Resilience in HPC Applications

    Get PDF
    Programa Oficial de Doutoramento en Investigación en Tecnoloxías da Información. 524V01[Resumo] As necesidades computacionais das distintas ramas da ciencia medraron enormemente nos últimos anos, o que provocou un gran crecemento no rendemento proporcionado polos supercomputadores. Cada vez constrúense sistemas de computación de altas prestacións de maior tamaño, con máis recursos hardware de distintos tipos, o que fai que as taxas de fallo destes sistemas tamén medren. Polo tanto, o estudo de técnicas de tolerancia a fallos eficientes é indispensábel para garantires que os programas científicos poidan completar a súa execución, evitando ademais que se dispare o consumo de enerxía. O checkpoint/restart é unha das técnicas máis populares. Sen embargo, a maioría da investigación levada a cabo nas últimas décadas céntrase en estratexias stop-and-restart para aplicacións de memoria distribuída tralo acontecemento dun fallo-parada. Esta tese propón técnicas checkpoint/restart a nivel de aplicación para os modelos de programación paralela roáis populares en supercomputación. Implementáronse protocolos de checkpointing para aplicacións híbridas MPI-OpenMP e aplicacións heteroxéneas baseadas en OpenCL, en ámbolos dous casos prestando especial coidado á portabilidade e maleabilidade da solución. En canto a aplicacións de memoria distribuída, proponse unha solución de resiliencia que pode ser empregada de forma xenérica en aplicacións MPI SPMD, permitindo detectar e reaccionar a fallos-parada sen abortar a execución. Neste caso, os procesos fallidos vólvense a lanzar e o estado da aplicación recupérase cunha volta atrás global. A maiores, esta solución de resiliencia optimizouse implementando unha volta atrás local, na que só os procesos fallidos volven atrás, empregando un protocolo de almacenaxe de mensaxes para garantires a consistencia e o progreso da execución. Por último, propónse a extensión dunha librería de checkpointing para facilitares a implementación de estratexias de recuperación ad hoc ante conupcións de memoria. En moitas ocasións, estos erros poden ser xestionados a nivel de aplicación, evitando desencadear un fallo-parada e permitindo unha recuperación máis eficiente.[Resumen] El rápido aumento de las necesidades de cómputo de distintas ramas de la ciencia ha provocado un gran crecimiento en el rendimiento ofrecido por los supercomputadores. Cada vez se construyen sistemas de computación de altas prestaciones mayores, con más recursos hardware de distintos tipos, lo que hace que las tasas de fallo del sistema aumenten. Por tanto, el estudio de técnicas de tolerancia a fallos eficientes resulta indispensable para garantizar que los programas científicos puedan completar su ejecución, evitando además que se dispare el consumo de energía. La técnica checkpoint/restart es una de las más populares. Sin embargo, la mayor parte de la investigación en este campo se ha centrado en estrategias stop-and-restart para aplicaciones de memoria distribuida tras la ocurrencia de fallos-parada. Esta tesis propone técnicas checkpoint/restart a nivel de aplicación para los modelos de programación paralela más populares en supercomputación. Se han implementado protocolos de checkpointing para aplicaciones híbridas MPI-OpenMP y aplicaciones heterogéneas basadas en OpenCL, prestando en ambos casos especial atención a la portabilidad y la maleabilidad de la solución. Con respecto a aplicaciones de memoria distribuida, se propone una solución de resiliencia que puede ser usada de forma genérica en aplicaciones MPI SPMD, permitiendo detectar y reaccionar a fallosparada sin abortar la ejecución. En su lugar, se vuelven a lanzar los procesos fallidos y se recupera el estado de la aplicación con una vuelta atrás global. A mayores, esta solución de resiliencia ha sido optimizada implementando una vuelta atrás local, en la que solo los procesos fallidos vuelven atrás, empleando un protocolo de almacenaje de mensajes para garantizar la consistencia y el progreso de la ejecución. Por último, se propone una extensión de una librería de checkpointing para facilitar la implementación de estrategias de recuperación ad hoc ante corrupciones de memoria. Muchas veces, este tipo de errores puede gestionarse a nivel de aplicación, evitando desencadenar un fallo-parada y permitiendo una recuperación más eficiente.[Abstract] The rapid increase in the computational demands of science has lead to a pronounced growth in the performance offered by supercomputers. As High Performance Computing (HPC) systems grow larger, including more hardware components of different types, the system's failure rate becomes higher. Efficient fault tolerance techniques are essential not only to ensure the execution completion but also to save energy. Checkpoint/restart is one of the most popular fault tolerance techniques. However, most of the research in this field is focused on stop-and-restart strategies for distributed-memory applications in the event of fail-stop failures. Thís thesis focuses on the implementation of application-level checkpoint/restart solutions for the most popular parallel programming models used in HPC. Hence, we have implemented checkpointing solutions to cope with fail-stop failures in hybrid MPI-OpenMP applications and OpenCL-based programs. Both strategies maximize the restart portability and malleability, ie., the recovery can take place on machines with different CPU / accelerator architectures, and/ or operating systems, and can be adapted to the available resources (number of cores/accelerators). Regarding distributed-memory applications, we propose a resilience solution that can be generally applied to SPMD MPI programs. Resilient applications can detect and react to failures without aborting their execution upon fail-stop failures. Instead, failed processes are re-spawned, and the application state is recovered through a global rollback. Moreover, we have optimized this resilience proposal by implementing a local rollback protocol, in which only failed processes rollback to a previous state, while message logging enables global consistency and further progress of the computation. Finally, we have extended a checkpointing library to facilitate the implementation of ad hoc recovery strategies in the event of soft errors) caused by memory corruptions. Many times, these errors can be handled at the software-Ievel, tIms, avoiding fail-stop failures and enabling a more efficient recovery

    Optimization of MPI Collective Communication Operations

    Get PDF
    High-performance computing (HPC) systems keep growing in scale and heterogeneity to satisfy the increasing need for computation, and this brings new challenges to the design of Message Passing Interface (MPI) libraries, especially with regard to collective operations.The implementations of state-of-the-art MPI collective operations heavily rely on synchronizations, and these implementations magnify noise across the participating processes, resulting in significant performance slowdowns. Therefore, I create a new collective communication framework in Open MPI, using an event-driven design to relax synchronizations and maintain the minimal data dependencies of MPI collective operations.The recent growth in hardware heterogeneity results in increasingly complex hardware hierarchies and larger communication performance differences.Hence, in this dissertation, I present two approaches to perform hierarchical collective operations, and both can exploit the different bandwidths of hardware in heterogeneous systems and maximizing concurrent communications.Finally, to provide a fast and accurate autotuning mechanism for my framework, I design a new autotuning approach by combining two existing methods. This new approach significantly reduces the search space to save the autotuning time and is still able to provide accurate estimations.I evaluate my work with microbenchmarks and applications at different scales. Microbenchmark results show my work speedups MPI_Bcast and MPI_Allreduce up to 7.34X and 4.86X, respectively, on 4096 processes.In terms of applications, I achieve a 24.3% improvement for Hovorod and a 143% improvement for ASP on 1536 processes as compared to the current Open MPI

    Enabling Program Analysis Through Deterministic Replay and Optimistic Hybrid Analysis

    Full text link
    As software continues to evolve, software systems increase in complexity. With software systems composed of many distinct but interacting components, today’s system programmers, users, and administrators find themselves requiring automated ways to find, understand, and handle system mis-behavior. Recent information breaches such as the Equifax breach of 2017, and the Heartbleed vulnerability of 2014 show the need to understand and debug prior states of computer systems. In this thesis I focus on enabling practical entire-system retroactive analysis, allowing programmers, users, and system administrators to diagnose and understand the impact of these devastating mishaps. I focus primarly on two techniques. First, I discuss a novel deterministic record and replay system which enables fast, practical recollection of entire systems of computer state. Second, I discuss optimistic hybrid analysis, a novel optimization method capable of dramatically accelerating retroactive program analysis. Record and replay systems greatly aid in solving a variety of problems, such as fault tolerance, forensic analysis, and information providence. These solutions, however, assume ubiquitous recording of any application which may have a problem. Current record and replay systems are forced to trade-off between disk space and replay speed. This trade-off has historically made it impractical to both record and replay large histories of system level computation. I present Arnold, a novel record and replay system which efficiently records years of computation on a commodity hard-drive, and can efficiently replay any recorded information. Arnold combines caching with a unique process-group granularity of recording to produce both small, and quickly recalled recordings. My experiments show that under a desktop workload, Arnold could store 4 years of computation on a commodity 4TB hard drive. Dynamic analysis is used to retroactively identify and address many forms of system mis-behaviors including: programming errors, data-races, private information leakage, and memory errors. Unfortunately, the runtime overhead of dynamic analysis has precluded its adoption in many instances. I present a new dynamic analysis methodology called optimistic hybrid analysis (OHA). OHA uses knowledge of the past to predict program behaviors in the future. These predictions, or likely invariants are speculatively assumed true by a static analysis. This creates a static analysis which can be far more accurate than its traditional counterpart. Once this predicated static analysis is created, it is speculatively used to optimize a final dynamic analysis, creating a far more efficient dynamic analysis than otherwise possible. I demonstrate the effectiveness of OHA by creating an optimistic hybrid backward slicer, OptSlice, and optimistic data-race detector OptFT. OptSlice and OptFT are just as accurate as their traditional hybrid counterparts, but run on average 8.3x and 1.6x faster respectively. In this thesis I demonstrate that Arnold’s ability to record and replay entire computer systems, combined with optimistic hybrid analysis’s ability to quickly analyze prior computation, enable a practical and useful entire system retroactive analysis that has been previously unrealized.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/144052/1/ddevec_1.pd

    Efficient Home-Based protocols for reducing asynchronous communication in shared virtual memory systems

    Full text link
    En la presente tesis se realiza una evaluación exhaustiva de ls Sistemas de Memoria Distribuida conocidos como Sistemas de Memoria Virtual Compartida. Este tipo de sistemas posee características que los hacen especialmente atractivos, como son su relativo bajo costo, alta portabilidad y paradigma de progración de memoria compartida. La evaluación consta de dos partes. En la primera se detallan las bases de diseño y el estado del arte de la investigación sobre este tipo de sistemas. En la segunda, se estudia el comportamiento de un conjunto representativo de cargas paralelas respecto a tres ejes de caracterización estrechamente relacionados con las prestaciones en estos sistemas. Mientras que la primera parte apunta la hipótesis de que la comunicación asíncrona es una de las principales causas de pérdida de prestaciones en los Sistemas de Memoria Virtual Compartida, la segunda no sólo la confirma, sino que ofrece un detallado análisis de las cargas del que se obteiene información sobre la potencial comunicación asíncrona atendiendo a diferentes parámetros del sistema. El resultado de la evaluación se utiliza para proponer dos nuevos protocolos para el funcionamiento de estos sistemas que utiliza un mínimo de recursos de hardware, alcanzando prestaciones similares e incluso superiores en algunos casos a sistemas que utilizan circuitos hardware de propósito específico para reducir la comunicación asíncrona. En particular, uno de los protocolos propuestos es comparado con una reconocida técnica hardware para reducir la comunicación asíncrona, obteniendo resultados satisfactorios y complementarios a la técnica comparada. Todos los modelos y técnicas usados en este trabajo han sido implementados y evalados utilizando un nuevo entorno de simulación desarollado en el contexto de este trabajo.Petit Martí, SV. (2003). Efficient Home-Based protocols for reducing asynchronous communication in shared virtual memory systems [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/2908Palanci
    corecore