
    Resilience for Asynchronous Iterative Methods for Sparse Linear Systems

    Large-scale simulations are used in a variety of application areas in science and engineering to drive innovation. Many spend the vast majority of their computational time solving large systems of linear equations, typically arising from discretizations of partial differential equations used to mathematically model various phenomena. The algorithms used to solve these problems are typically iterative, and making efficient use of computational time on High Performance Computing (HPC) clusters requires continual improvement of these iterative algorithms. Future HPC platforms are expected to face three main problem areas: scalability of code, reliability of hardware, and energy efficiency of the platform. The HPC resources expected to run these large programs are planned to consist of billions of processing units, drawn from traditional multicore processors as well as a variety of hardware accelerators. This growth in parallelism exacerbates all three problems. Previous work on algorithm development has focused primarily on creating fault-tolerance mechanisms for traditional iterative solvers. Recent work has begun to revisit asynchronous methods for solving large-scale applications, and this dissertation presents research into fault tolerance for fine-grained methods that are asynchronous in nature. Classical convergence results for asynchronous methods are revisited and modified to account for the possible occurrence of a fault, and a variety of techniques for recovering from the effects of a fault are proposed. Examples of how these techniques can be used are shown for various algorithms, including an analysis of a fine-grained algorithm for computing incomplete factorizations. Lastly, numerous modeling and simulation tools for the further construction of iterative algorithms for HPC applications are developed, including numerical models for simulating faults and a simulation framework that can be used to extrapolate the performance of algorithms towards future HPC systems.
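
    To make the flavor of such methods concrete, the following is a minimal sketch, assuming a small diagonally dominant system, of a fine-grained Jacobi-style iteration in which single components are updated in arbitrary order and a simulated transient fault is detected and rolled back. It illustrates the recovery idea only and is not the dissertation's implementation; the matrix, fault step, and detection threshold are assumptions of this sketch.

```python
# Sketch: fine-grained asynchronous-style iteration with detect-and-recover.
# The detection rule (rejecting non-finite or implausibly large entries)
# stands in for the more careful fault models analyzed in the dissertation.
import numpy as np

rng = np.random.default_rng(0)
n = 50
A = (np.diag(np.full(n, 4.0))                 # diagonally dominant tridiagonal
     + np.diag(np.full(n - 1, -1.0), 1)
     + np.diag(np.full(n - 1, -1.0), -1))
b = np.ones(n)
x = np.zeros(n)
D = np.diag(A)

for step in range(20000):
    i = rng.integers(n)                        # update one component at a time
    x[i] = (b[i] - A[i] @ x + D[i] * x[i]) / D[i]
    if step == 5000:                           # inject a transient fault
        x[i] = 1e30
    if not np.isfinite(x[i]) or abs(x[i]) > 1e6:
        x[i] = 0.0                             # recovery: reset corrupted entry
    if step % 1000 == 0 and np.linalg.norm(b - A @ x) < 1e-8:
        break

print("residual:", np.linalg.norm(b - A @ x))
```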

    Towards Distributed Task-based Visualization and Data Analysis

    To support scientific work with large and complex data, the field of scientific visualization emerged in computer science; it produces images through computational analysis of the data. Frameworks for combining different analysis and visualization modules allow the user to create flexible pipelines for this purpose and set the standard for interactive scientific visualization used by domain scientists. Existing frameworks employ a thread-parallel, message-passing approach to parallel and distributed scalability, leaving the field of scientific visualization in high performance computing to specialized ad-hoc implementations. The task-parallel programming paradigm promises to improve scalability and portability in high performance computing implementations, and this thesis therefore aims towards the creation of a framework for distributed, task-based visualization modules and pipelines. The major contribution of the thesis is the establishment of modules for Merge Tree construction and, based on the former, topological simplification. Such modules already form a necessary first step for most visualization pipelines and can be expected to grow in importance for the larger and more complex data produced and/or analysed by high performance computing. To create a task-parallel, distributed Merge Tree construction module, the construction process has to be completely revised. We derive a novel property of Merge Tree saddles and introduce a novel task-parallel, distributed Merge Tree construction method that offers both good performance and scalability. This forms the basis for a module for topological simplification, which we extend by introducing novel alternative simplification parameters that aim to reduce the importance of prior domain knowledge and so increase flexibility in typical high performance computing scenarios. Both modules lay the groundwork for subsequent analysis and visualization steps and form a fundamental step towards an extensive task-parallel visualization pipeline framework for high performance computing.
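
    As a point of reference for the construction problem the thesis revises, the following is a minimal sequential sketch of Merge Tree construction over a scalar field on a graph using union-find. The graph, values, and sweep direction are illustrative assumptions, and the thesis's task-parallel, distributed algorithm differs substantially.

```python
# Sketch: sequential merge tree (join tree of superlevel sets) via union-find.
def merge_tree(values, neighbors):
    """values: dict vertex -> scalar; neighbors: dict vertex -> adjacent vertices."""
    parent = {}                    # union-find forest over processed vertices
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    edges = []                     # (child_root, new_root) merge events
    # Sweep from high to low: maxima appear first, saddles merge components.
    for v in sorted(values, key=values.get, reverse=True):
        parent[v] = v
        for u in neighbors[v]:
            if u in parent and (r := find(u)) != v:
                parent[r] = v               # component of u merges into v's
                edges.append((r, v))
    return edges

# Tiny 1D example: two peaks (values 5 and 4) merging at a saddle (value 2).
vals = {0: 1, 1: 5, 2: 2, 3: 4, 4: 0}
nbrs = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(merge_tree(vals, nbrs))      # [(1, 2), (3, 2), (2, 0), (0, 4)]
```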

    Intrusion tolerance in large scale networks

    Doctoral thesis, Informatics (Informatics Engineering), Universidade de Lisboa, Faculdade de Ciências, 2010. The growing reliance on wide-area services demands highly available systems that provide a correct and uninterrupted service. Byzantine fault-tolerant (BFT) algorithms have therefore received considerable attention in recent years. A service is replicated over several servers and can survive even in the presence of a bounded number of Byzantine failures. The main motivation for this thesis is that for a replicated service to be fault-tolerant, common-mode failures have to be avoided. More specifically, the thesis is concerned with common-mode failures caused by natural disasters, power outages, and physical attacks, which have to be prevented by scattering replicas geographically. This requires the sites where the replicas reside to be connected by a wide-area network (WAN) such as the Internet. Unfortunately, when the replicas are distributed geographically, the performance of current BFT algorithms suffers from the lower bandwidths and the higher and more heterogeneous network latencies. To deal with these limitations, this thesis introduces novel BFT algorithms that are simultaneously efficient and secure. Some algorithms of this thesis are based on a hybrid fault model, i.e., considering that a part of the system is secure by construction. A notable contribution of this thesis is the definition and implementation of a minimal trusted service: the Unique Sequential Identifier Generator (USIG). The thesis describes how to implement a 2f+1 Byzantine consensus algorithm using a 2f+1 reliable multicast algorithm that requires a trusted service, an abstract version of the USIG. The USIG service and the reliable multicast primitive are then applied as core components to implement two novel BFT algorithms introduced in this thesis: MinBFT and MinZyzzyva. These BFT algorithms are minimal in terms of the number of replicas, the complexity of the trusted service used, and the number of communication steps. To mitigate performance-degradation attacks, this thesis proposes the use of a rotating primary, defining a novel BFT algorithm, Spinning, that is less vulnerable to attacks caused by a faulty primary and attains a throughput similar to the baseline algorithm in the area. Finally, the mechanisms and techniques developed in this thesis are combined to define EBAWA, a novel BFT algorithm that is suitable for supporting the execution of wide-area replicated services. Programme ALBAN; Fundação para a Ciência e a Tecnologia - Portugal.
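
    The USIG idea can be pictured with a minimal sketch: a monotonic counter bound to each message by a certificate, here an HMAC computed in plain Python purely for illustration. In MinBFT the service runs inside a trusted component; the class shape and key handling below are assumptions of this sketch, not the thesis's interface.

```python
# Sketch of a USIG-like trusted service: counter + HMAC over (counter, message).
import hmac, hashlib

class USIG:
    def __init__(self, key: bytes):
        self._key = key
        self._counter = 0                      # monotonic, never reset

    def create_ui(self, message: bytes):
        """Assign the next unique sequential identifier to `message`."""
        self._counter += 1
        tag = hmac.new(self._key, self._counter.to_bytes(8, "big") + message,
                       hashlib.sha256).digest()
        return self._counter, tag

    def verify_ui(self, counter: int, tag: bytes, message: bytes) -> bool:
        """Check that (counter, tag) certifies `message`."""
        expected = hmac.new(self._key, counter.to_bytes(8, "big") + message,
                            hashlib.sha256).digest()
        return hmac.compare_digest(tag, expected)

usig = USIG(b"shared-secret")                  # illustrative key only
c, t = usig.create_ui(b"PREPARE view=0 op=put")
assert usig.verify_ui(c, t, b"PREPARE view=0 op=put")
```

    Because even a faulty replica cannot obtain two different certified messages with the same counter value, algorithms built on such a service can operate with 2f+1 replicas rather than the usual 3f+1.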

    Engineering Automation for Reliable Software Interim Progress Report (10/01/2000 - 09/30/2001)

    Prepared for: U.S. Army Research Office, P.O. Box 12211, Research Triangle Park, NC 27709-2211. The objective of our effort is to develop a scientific basis for producing reliable software that is also flexible and cost-effective for the DoD distributed software domain. This objective addresses the long-term goals of increasing the quality of service provided by complex systems while reducing development risks, costs, and time. Our work focuses on "wrap and glue" technology based on a domain-specific distributed prototype model. The key to making the proposed approach reliable, flexible, and cost-effective is the automatic generation of glue and wrappers based on a designer's specification. The "wrap and glue" approach allows system designers to concentrate on the difficult interoperability problems, defining solutions in terms of the deeper and more difficult interoperability issues while freeing designers from implementation details. Specific research areas for the proposed effort include technology enabling rapid prototyping, inference for design checking, automatic program generation, distributed real-time scheduling, wrapper and glue technology, and reliability assessment and improvement. The proposed technology will be integrated with past research results to enable a quantum leap forward in the state of the art for rapid prototyping. 0473-MA-SP. Approved for public release; distribution is unlimited.
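
    As an illustration of the wrapper-and-glue idea (not the project's actual generator, whose specification language is not shown here), the following hypothetical sketch derives a wrapper from a small interface specification so that a legacy component can be driven through a uniform calling convention.

```python
# Sketch: generate a wrapper from a spec mapping exposed operation names to
# (legacy_method_name, argument_order); the glue reorders and renames args.
def make_wrapper(component, spec):
    class Wrapper:
        pass
    for op, (method, arg_order) in spec.items():
        def call(*args, _m=method, _order=arg_order, **kwargs):
            bound = dict(zip(_order, args), **kwargs)
            return getattr(component, _m)(**bound)
        setattr(Wrapper, op, staticmethod(call))
    return Wrapper()

class LegacySensor:                            # illustrative legacy component
    def read_value(self, channel, scale):
        return channel * scale

# Expose a uniform read(scale, channel) operation on top of the legacy API.
sensor = make_wrapper(LegacySensor(), {"read": ("read_value", ("scale", "channel"))})
print(sensor.read(2.0, 3))                     # calls read_value(channel=3, scale=2.0)
```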

    Report on the Second European SIGOPS Workshop “making distributed systems work”

    A distributed control microprocessor system

    Driving the Network-on-Chip Revolution to Remove the Interconnect Bottleneck in Nanoscale Multi-Processor Systems-on-Chip

    The sustained demand for faster, more powerful chips has been met by the availability of chip manufacturing processes allowing for the integration of increasing numbers of computation units onto a single die. The resulting outcome, especially in the embedded domain, has often been called System-on-Chip (SoC) or Multi-Processor System-on-Chip (MPSoC). MPSoC design brings to the foreground a large number of challenges, one of the most prominent of which is the design of the chip interconnection. With the number of on-chip blocks presently ranging in the tens and quickly approaching the hundreds, the novel issue of how best to provide on-chip communication resources is clearly felt. Networks-on-Chip (NoCs) are the most comprehensive and scalable answer to this design concern. By bringing large-scale networking concepts to the on-chip domain, they guarantee a structured answer to present and future communication requirements. The point-to-point connection and packet-switching paradigms they involve are also of great help in minimizing wiring overhead and physical routing issues. However, as with any technology of recent inception, NoC design is still an evolving discipline. Several main areas of interest require deep investigation for NoCs to become viable solutions:
    • The design of the NoC architecture needs to strike the best tradeoff among performance, features, and the tight area and power constraints of the on-chip domain.
    • Simulation and verification infrastructure must be put in place to explore, validate, and optimize NoC performance.
    • NoCs offer a huge design space, thanks to their extreme customizability in terms of topology and architectural parameters. Design tools are needed to prune this space and pick the best solutions.
    • Even more so given their global, distributed nature, it is essential to evaluate the physical implementation of NoCs to assess their suitability for next-generation designs and their area and power costs.
    This dissertation performs a design space exploration of network-on-chip architectures in order to point out the trade-offs associated with the design of each individual network building block and with the design of the network topology overall. The design space exploration is preceded by a comparative analysis of state-of-the-art interconnect fabrics with one another and with early network-on-chip prototypes. The ultimate objective is to point out the key advantages that NoC realizations provide with respect to state-of-the-art communication infrastructures, and to point out the challenges that lie ahead in order to make this new interconnect technology a reality. Among the latter, technology-related challenges are emerging that call for dedicated design techniques at all levels of the design hierarchy, in particular leakage power dissipation and the containment of process variations and their effects. The achievement of the above objectives was enabled by a NoC simulation environment for cycle-accurate modelling and simulation, and by a back-end facility for the study of NoC physical implementation effects. Overall, all the results provided by this work have been validated on actual silicon layout.
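
    As a small taste of the design points such an exploration covers, the following sketch implements deterministic XY routing on a 2D-mesh NoC, a classic baseline routing function. The coordinates and example are assumptions of this sketch, not the dissertation's cycle-accurate simulator.

```python
# Sketch: deterministic XY routing on a 2D mesh. Routing fully along X first
# and then along Y makes the route deterministic and deadlock-free on a mesh.
def xy_route(src, dst):
    """Return the list of (x, y) router hops from src to dst on a 2D mesh."""
    (x, y), (dx, dy) = src, dst
    hops = [(x, y)]
    while x != dx:                 # X dimension first
        x += 1 if dx > x else -1
        hops.append((x, y))
    while y != dy:                 # then Y dimension
        y += 1 if dy > y else -1
        hops.append((x, y))
    return hops

# A packet from router (0, 0) to (2, 1) crosses 3 links:
print(xy_route((0, 0), (2, 1)))    # [(0, 0), (1, 0), (2, 0), (2, 1)]
```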

    Fundamentals

    Volume 1 establishes the foundations of this new field. It goes through all the steps from data collection, through data summarization and clustering, to different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to their resource requirements and to how scalability can be enhanced on diverse computing architectures, ranging from embedded systems to large computing clusters.
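
    One resource-aware idea the volume covers can be sketched in a few lines: selecting, under a device's memory budget, the most accurate of several candidate models. The candidate names, sizes, and accuracies below are made-up illustration values, not results from the book.

```python
# Sketch: memory-aware model selection under a per-device budget.
candidates = [
    {"name": "linear",        "kib": 4,    "accuracy": 0.81},
    {"name": "small-tree",    "kib": 64,   "accuracy": 0.86},
    {"name": "deep-ensemble", "kib": 4096, "accuracy": 0.91},
]

def pick_model(budget_kib):
    """Best-accuracy model whose footprint fits the memory budget."""
    feasible = [m for m in candidates if m["kib"] <= budget_kib]
    return max(feasible, key=lambda m: m["accuracy"]) if feasible else None

print(pick_model(128))     # embedded device -> picks "small-tree"
print(pick_model(8192))    # cluster node    -> picks "deep-ensemble"
```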