Galois: a system for parallel execution of irregular algorithms
A programming model which allows users to program with high productivity and which produces high-performance executions has been a goal for decades. This dissertation makes progress towards this elusive goal by describing the design and implementation of the Galois system, a parallel programming model for shared-memory, multicore machines. Central to the design is the idea that the scheduling of a program can be decoupled from the core computational operator and data structures. However, efficient programs often require application-specific scheduling to achieve the best performance. To bridge this gap, an extensible and abstract scheduling policy language is proposed, which allows programmers to focus on selecting high-level scheduling policies while delegating the tedious task of implementing the policy to a scheduler synthesizer and runtime system. Implementations of deterministic and prioritized scheduling are also described. An evaluation on a well-studied benchmark suite reveals that factoring programs into operators, schedulers and data structures can produce significant performance improvements over unfactored approaches. Comparison of the Galois system with existing programming models for graph analytics shows significant performance improvements, often of orders of magnitude, due to (1) better support for the restrictive programming models of existing systems and (2) better support for more sophisticated algorithms and scheduling, which cannot be expressed in other systems.
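To make the factoring into operator, scheduler and data structure concrete, here is a minimal, hypothetical sketch (our illustration, not Galois code, which is C++): a worklist executor in which the scheduling policy is a pluggable object separate from the operator and the graph. The toy relaxation operator and all names are invented for this example.

```python
import heapq
from collections import deque

class FifoPolicy:
    """Schedule active elements in arrival order."""
    def __init__(self): self.q = deque()
    def push(self, item): self.q.append(item)
    def pop(self): return self.q.popleft()
    def empty(self): return not self.q

class PriorityPolicy:
    """Schedule active elements by a user-supplied priority key."""
    def __init__(self, key): self.h, self.key = [], key
    def push(self, item): heapq.heappush(self.h, (self.key(item), item))
    def pop(self): return heapq.heappop(self.h)[1]
    def empty(self): return not self.h

def run(worklist, operator):
    """Drain the worklist; the operator may push new active elements."""
    while not worklist.empty():
        operator(worklist.pop(), worklist.push)

# Toy operator: Bellman-Ford-style relaxation over an adjacency map.
graph = {0: [(1, 4), (2, 1)], 1: [(3, 1)], 2: [(1, 1), (3, 5)], 3: []}
dist = {0: 0, 1: float("inf"), 2: float("inf"), 3: float("inf")}

def relax(node, push):
    for nbr, w in graph[node]:
        if dist[node] + w < dist[nbr]:
            dist[nbr] = dist[node] + w
            push(nbr)

wl = PriorityPolicy(key=lambda n: dist[n])  # swap in FifoPolicy() freely
wl.push(0)
run(wl, relax)
print(dist)  # shortest-path distances from node 0
```

The operator and graph are untouched when the scheduling policy changes, which is the separation the abstract policy language exploits.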
Resilience for Asynchronous Iterative Methods for Sparse Linear Systems
Large scale simulations are used in a variety of application areas in science and engineering to help forward the progress of innovation. Many such simulations spend the vast majority of their computational time attempting to solve large systems of linear equations, typically arising from discretizations of partial differential equations that are used to mathematically model various phenomena. The algorithms used to solve these problems are typically iterative in nature, and making efficient use of computational time on High Performance Computing (HPC) clusters involves constantly improving these iterative algorithms. Future HPC platforms are expected to encounter three main problem areas: scalability of code, reliability of hardware, and energy efficiency of the platform. The HPC resources that are expected to run the large programs are planned to consist of billions of processing units drawn from traditional multicore processors as well as a variety of hardware accelerators. This growth in parallelism leads to the presence of all three problems.
Previous work on algorithm development has focused primarily on creating fault tolerance mechanisms for traditional iterative solvers. Recent work has begun to revisit asynchronous methods for solving large scale applications, and this dissertation presents research into fault tolerance for fine-grained methods that are asynchronous in nature. Classical convergence results for asynchronous methods are revisited and modified to account for the possible occurrence of a fault, and a variety of techniques for recovery from the effects of a fault are proposed. Examples of how these techniques can be used are shown for various algorithms, including an analysis of a fine-grained algorithm for computing incomplete factorizations. Lastly, numerous modeling and simulation tools for the further construction of iterative algorithms for HPC applications are developed, including numerical models for simulating faults and a simulation framework that can be used to extrapolate the performance of algorithms towards future HPC systems.
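To give the flavor of such fine-grained methods, the following toy sketch (an illustration only, not one of the dissertation's algorithms) runs a component-wise, asynchronous-style Jacobi iteration on a small diagonally dominant system, injects a single soft fault, and recovers with a simple detect-and-reset rule; the bound and reset value are invented for this example.

```python
import random

A = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
b = [5.0, 6.0, 5.0]          # exact solution is x = [1, 1, 1]
x = [0.0, 0.0, 0.0]
rng = random.Random(0)

for sweep in range(200):
    i = rng.randrange(3)      # components update in arbitrary order
    x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(3) if j != i)) / A[i][i]
    if sweep == 50:
        x[0] = 1e12           # injected fault: a corrupted component
    if abs(x[0]) > 1e6:       # detection by a sanity bound; recovery resets
        x[0] = 0.0            # the component and lets the iteration self-heal

print(x)  # close to [1, 1, 1]
```

Because asynchronous iterations for diagonally dominant systems converge from any starting point, resetting a corrupted component is enough for the method to self-heal, which is the intuition behind the recovery techniques studied.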
Towards Distributed Task-based Visualization and Data Analysis
To support scientific work with large and complex data, the field of scientific visualization emerged within computer science; it produces images through computational analysis of the data. Frameworks that combine different analysis and visualization modules allow the user to create flexible pipelines for this purpose and set the standard for the interactive scientific visualization used by domain scientists.
Existing frameworks employ a thread-parallel, message-passing approach to parallel and distributed scalability, leaving scientific visualization in high performance computing to specialized ad-hoc implementations. The task-parallel programming paradigm promises better scalability and portability in high performance computing implementations, and this thesis therefore aims towards the creation of a framework for distributed, task-based visualization modules and pipelines.
The major contribution of the thesis is the establishment of modules for Merge Tree construction and, building on the former, topological simplification. Such modules already form a necessary first step for most visualization pipelines and can be expected to increase in importance for the larger and more complex data produced and/or analysed by high performance computing.
To create a task-parallel, distributed Merge Tree construction module, the construction process has to be completely revised. We derive a novel property of Merge Tree saddles and introduce a novel task-parallel, distributed Merge Tree construction method that offers both good performance and scalability. This forms the basis for a module for topological simplification, which we extend by introducing novel alternative simplification parameters that aim to reduce the importance of prior domain knowledge and so increase flexibility in typical high performance computing scenarios.
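For readers unfamiliar with the data structure, the classic sequential construction that the thesis revises can be sketched as a union-find sweep over vertices in decreasing scalar value. This minimal Python illustration is ours, not the thesis's task-parallel algorithm, and for simplicity it records every swept vertex as a tree node.

```python
def join_tree(values, edges):
    """Sequential join tree (one flavor of merge tree) of a scalar field
    given per-vertex values and an undirected edge list."""
    order = sorted(range(len(values)), key=lambda v: -values[v])
    adj = {v: [] for v in range(len(values))}
    for u, v in edges:
        adj[u].append(v); adj[v].append(u)
    parent_uf = {}                 # union-find over already-swept vertices
    highest = {}                   # component root -> node where next arc starts
    tree = {}                      # child vertex -> parent vertex in the tree
    def find(v):
        while parent_uf[v] != v:
            parent_uf[v] = parent_uf[parent_uf[v]]  # path halving
            v = parent_uf[v]
        return v
    for v in order:                # sweep from high to low value
        parent_uf[v] = v; highest[v] = v
        for n in adj[v]:
            if n in parent_uf and find(n) != find(v):
                r = find(n)
                tree[highest[r]] = v   # neighboring component joins at v
                parent_uf[r] = v       # merge the two components
    return tree

# Scalar field on a path graph: two maxima (values 5 and 4) merge at the
# saddle with value 1.
print(join_tree([5, 1, 4], [(0, 1), (1, 2)]))  # {0: 1, 2: 1}
```

The sweep is inherently sequential in this form, which is why a distributed, task-parallel construction requires the fundamentally different approach the thesis develops.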
Both modules lay the groundwork for subsequent analysis and visualization steps and form a fundamental step towards an extensive task-parallel visualization pipeline framework for high performance computing.
Intrusion tolerance in large scale networks
Doctoral thesis, Informatics (Informatics Engineering), Universidade de Lisboa, Faculdade de Ciências, 2010

The growing reliance on wide-area services demands highly available systems that provide a correct and uninterrupted service. Byzantine Fault-Tolerant (BFT) algorithms have therefore received considerable attention in recent years: a service is replicated over several servers and can survive even in the presence of a bounded number of Byzantine failures.
The main motivation for this thesis is that, for a replicated service to be fault-tolerant, common-mode failures have to be avoided. More specifically, the thesis is concerned with common-mode failures caused by natural disasters, power outages and physical attacks, which have to be prevented by scattering replicas geographically. This requires the sites where the replicas reside to be connected by a wide-area network (WAN) such as the Internet.
Unfortunately, when the replicas are distributed geographically, the performance of current BFT algorithms suffers from the lower bandwidths and the higher, more heterogeneous network latencies. To deal with these limitations, this thesis introduces novel BFT algorithms that are simultaneously efficient and secure. Some of these algorithms are based on a hybrid fault model, i.e., they consider that a part of the system is secure by construction. A notable contribution of this thesis is the definition and implementation of a minimal trusted service: the Unique Sequential Identifier Generator (USIG).
The thesis describes how to implement a 2f+1 Byzantine consensus algorithm using a 2f+1 reliable multicast algorithm that requires a trusted service, an abstract version of the USIG. The USIG service and the reliable multicast primitive are then applied as core components to implement two novel BFT algorithms introduced in this thesis: MinBFT and MinZyzzyva. These BFT algorithms are minimal in terms of the number of replicas, the complexity of the trusted service used, and the number of communication steps. To mitigate performance degradation attacks, this thesis proposes the use of a rotating primary, defining a novel BFT algorithm, Spinning, that is less vulnerable to attacks caused by a faulty primary and attains a throughput similar to the baseline algorithm in the area. Finally, the mechanisms and techniques developed in this thesis are combined to define EBAWA, a novel BFT algorithm suitable for supporting the execution of wide-area replicated services.

Programme ALBAN; Fundação para a Ciência e a Tecnologia - Portugal
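The USIG's interface is small enough to sketch. The toy implementation below is our illustration: the method names and the MAC-based certificate are assumptions, and a real USIG runs inside a trusted, tamper-proof component. It binds each message to a unique, monotonically increasing, sequential counter value and certifies the pair, so a faulty replica cannot assign the same identifier to two different messages.

```python
import hmac
import hashlib

class USIG:
    """Sketch of a Unique Sequential Identifier Generator (illustrative)."""

    def __init__(self, key: bytes):
        self._key = key          # secret shared with (or verifiable by) peers
        self._counter = 0        # monotonic: never repeats, never decreases

    def create_ui(self, message: bytes):
        """Assign the next counter value to `message` and certify the pair."""
        self._counter += 1
        tag = hmac.new(self._key,
                       message + self._counter.to_bytes(8, "big"),
                       hashlib.sha256).digest()
        return self._counter, tag

    def verify_ui(self, message: bytes, counter: int, tag: bytes) -> bool:
        """Check that (message, counter) was certified by this USIG."""
        expected = hmac.new(self._key,
                            message + counter.to_bytes(8, "big"),
                            hashlib.sha256).digest()
        return hmac.compare_digest(expected, tag)

usig = USIG(b"shared-secret")
c1, t1 = usig.create_ui(b"PREPARE m1")
c2, t2 = usig.create_ui(b"PREPARE m2")
assert c2 == c1 + 1                               # identifiers are sequential
assert usig.verify_ui(b"PREPARE m1", c1, t1)
assert not usig.verify_ui(b"PREPARE m2", c1, t1)  # no identifier reuse
```

Because equivocation (sending different messages under the same identifier) becomes detectable, consensus can tolerate f Byzantine replicas with only 2f+1 replicas instead of the usual 3f+1.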
Engineering Automation for Reliable Software Interim Progress Report (10/01/2000 - 09/30/2001)
Prepared for: U.S. Army Research Office
P.O. Box 12211
Research Triangle Park, NC 27709-2211

The objective of our effort is to develop a scientific basis for producing reliable software that is also flexible and cost-effective for the DoD distributed software domain. This objective addresses the long-term goals of increasing the quality of service provided by complex systems while reducing development risks, costs, and time. Our work focuses on "wrap and glue" technology based on a domain-specific distributed prototype model. The key to making the proposed approach reliable, flexible, and cost-effective is the automatic generation of glue and wrappers from a designer's specification. The "wrap and glue" approach allows system designers to concentrate on the deeper and more difficult interoperability problems and to define solutions to them, while freeing designers from implementation details. Specific research areas for the proposed effort include technology enabling rapid prototyping, inference for design checking, automatic program generation, distributed real-time scheduling, wrapper and glue technology, and reliability assessment and improvement. The proposed technology will be integrated with past research results to enable a quantum leap forward in the state of the art for rapid prototyping.

0473-MA-SP. Approved for public release; distribution is unlimited.
A distributed control microprocessor system
Driving the Network-on-Chip Revolution to Remove the Interconnect Bottleneck in Nanoscale Multi-Processor Systems-on-Chip
The sustained demand for faster, more powerful chips has been met by the availability of chip manufacturing processes allowing for the integration of increasing numbers of computation units onto a single die. The resulting outcome, especially in the embedded domain, has often been called SYSTEM-ON-CHIP (SoC) or MULTI-PROCESSOR SYSTEM-ON-CHIP (MPSoC).
MPSoC design brings to the foreground a large number of challenges, one of the most prominent of which is the design of the chip interconnection. With the number of on-chip blocks presently ranging in the tens, and quickly approaching the hundreds, the novel issue of how best to provide on-chip communication resources is clearly felt.
NETWORKS-ON-CHIP (NoCs) are the most comprehensive and scalable answer to this design concern. By bringing large-scale networking concepts to the on-chip domain, they guarantee a structured answer to present and future communication requirements. The point-to-point connection and packet-switching paradigms they involve are also of great help in minimizing wiring overhead and physical routing issues. However, as with any technology of recent inception, NoC design is still an evolving discipline. Several main areas of interest require deep investigation for NoCs to become viable solutions:
• The design of the NoC architecture needs to strike the best tradeoff among performance, features and the tight area and power constraints of the on-chip domain.
• Simulation and verification infrastructure must be put in place to explore, validate and optimize NoC performance.
• NoCs offer a huge design space, thanks to their extreme customizability in terms of topology and architectural parameters. Design tools are needed to prune this space and pick the best solutions.
• Even more so given their global, distributed nature, it is essential to evaluate the physical implementation of NoCs to assess their suitability for next-generation designs and their area and power costs.
This dissertation performs a design space exploration of network-on-chip architectures, in order to point out the trade-offs associated with the design of each individual network building block and with the design of the network topology overall. The design space exploration is preceded by a comparative analysis of state-of-the-art interconnect fabrics, both against each other and against early network-on-chip prototypes. The ultimate objective is to point out the key advantages that NoC realizations provide with respect to state-of-the-art communication infrastructures, and the challenges that lie ahead in making this new interconnect technology come true. Among the latter, technology-related challenges are emerging that call for dedicated design techniques at all levels of the design hierarchy, in particular leakage power dissipation and the containment of process variations and their effects. The achievement of the above objectives was enabled by a NoC simulation environment for cycle-accurate modelling and simulation and by a back-end facility for the study of NoC physical implementation effects. Overall, all the results provided by this work have been validated on actual silicon layout.
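As a taste of what such a design space exploration quantifies, this toy probe (ours, not the dissertation's tooling) compares the average hop count of a 2D mesh against a 2D torus under dimension-ordered routing, one axis of the topology trade-off that must be weighed against the torus's longer wraparound wires.

```python
from itertools import product

def avg_hops(n, torus=False):
    """Average hop count between distinct nodes of an n x n mesh/torus."""
    nodes = list(product(range(n), repeat=2))
    total = pairs = 0
    for (x1, y1), (x2, y2) in product(nodes, nodes):
        if (x1, y1) == (x2, y2):
            continue
        dx, dy = abs(x1 - x2), abs(y1 - y2)
        if torus:                      # wraparound links shorten each axis
            dx, dy = min(dx, n - dx), min(dy, n - dy)
        total += dx + dy
        pairs += 1
    return total / pairs

mesh, ring = avg_hops(4), avg_hops(4, torus=True)
print(f"4x4 mesh: {mesh:.2f} hops, 4x4 torus: {ring:.2f} hops")
assert ring < mesh      # the torus trades longer wires for fewer hops
```

A real exploration folds such latency figures together with router area, link power and achievable clock frequency, which is why dedicated tools are needed to prune the space.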
Fundamentals
Volume 1 establishes the foundations of this new field. It goes through all the steps from data collection, through summarization and clustering, to the different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to their resource requirements and to how scalability can be enhanced on diverse computing architectures, ranging from embedded systems to large computing clusters.