Performance comparison of hierarchical checkpoint protocols grid computing
A grid infrastructure is a large set of nodes that are
geographically distributed and connected by a communication network. In
this context, fault tolerance is a necessity imposed by the
distribution, which poses a number of problems related to the
heterogeneity of hardware, operating systems, networks,
middleware, and applications, the dynamic nature of resources, scalability,
the lack of common memory, the lack of a common clock, and
asynchronous communication between processes. To improve the
robustness of supercomputing applications in the presence of
failures, many techniques have been developed to make the
system resistant to these faults. Fault tolerance is intended
to allow the system to provide service as specified in spite of
the occurrence of faults. It is an indispensable element of
distributed systems. To meet this need, several techniques have
been proposed in the literature. We study the protocols based
on rollback recovery. These protocols are classified into two
categories: coordinated checkpointing and rollback protocols, and
log-based independent checkpointing protocols or message
logging protocols. However, the performance of a protocol
depends on the characteristics of the system, the network, and
the applications running. Faced with the constraints of large-scale
environments, many of the algorithms in the literature have proved
inadequate. Given an application environment and a system, it is
not easy to identify the recovery protocol that is most appropriate
for a cluster or a hierarchical environment such as grid computing.
While some protocols have been used successfully at small scale,
they are not suitable for use at large scale. Hence there is a need
to implement these protocols in a hierarchical fashion to compare
their performance in grid computing. In this paper, we propose
hierarchical versions of four well-known protocols. We have
implemented these protocols and compared their performance in
clusters and grid computing using the OMNeT++ simulator.
Magic-State Functional Units: Mapping and Scheduling Multi-Level Distillation Circuits for Fault-Tolerant Quantum Architectures
Quantum computers have recently made great strides and are on a long-term
path towards useful fault-tolerant computation. A dominant overhead in
fault-tolerant quantum computation is the production of high-fidelity encoded
qubits, called magic states, which enable reliable error-corrected computation.
We present the first detailed designs of hardware functional units that
implement space-time optimized magic-state factories for surface code
error-corrected machines. Interactions among distant qubits require surface
code braids (physical pathways on chip) which must be routed. Magic-state
factories are circuits comprised of a complex set of braids that is more
difficult to route than quantum circuits considered in previous work [1]. This
paper explores the impact of scheduling techniques, such as gate reordering and
qubit renaming, and we propose two novel mapping techniques: braid repulsion
and dipole moment braid rotation. We combine these techniques with graph
partitioning and community detection algorithms, and further introduce a
stitching algorithm for mapping subgraphs onto a physical machine. Our results
show a factor of 5.64 reduction in space-time volume compared to the best-known
previous designs for magic-state factories.
Comment: 13 pages, 10 figures
Rollback recovery with low overhead for fault tolerance in mobile ad hoc networks
Mobile ad hoc networks (MANETs) have significantly enhanced wireless networks by eliminating the need for any fixed infrastructure. Hence, they are increasingly being used to expand the computing capacity of existing networks or to implement autonomous mobile computing Grids. However, the fragile nature of MANETs makes the constituent nodes susceptible to failures, and the computing potential of these networks can be utilized only if they are fault tolerant. Checkpointing-based rollback recovery has been used effectively for fault tolerance in static and cellular mobile systems; yet, implementing existing protocols for MANETs is not straightforward. The paper presents a novel rollback recovery protocol for handling the failures of mobile nodes in a MANET using checkpointing and sender-based message logging. The proposed protocol utilizes the routing protocol existing in the network to implement a low-overhead recovery mechanism. The presented recovery procedure at a node is completely domino-free and asynchronous. The protocol is resilient to the dynamic characteristics of the MANET, allowing a distributed application to be executed independently without access to any wired Grid or cellular network access points. We also present an algorithm to record a consistent global snapshot of the MANET.
Effective Node Clustering and Data Dissemination In Large-Scale Wireless Sensor Networks
The density and random distribution of nodes in large-scale WSNs make it quite difficult to replace or recharge nodes. Energy efficiency and management are therefore major design goals in these networks. In addition, reliability and scalability are two other major goals that researchers have identified as necessary to further expand the deployment of such networks in various applications. This thesis aims to provide an energy-efficient and effective node clustering and data dissemination algorithm for large-scale wireless sensor networks. In the area of clustering, the proposed research prolongs the lifetime of the network by saving energy through the use of node ranking to elect cluster heads, in contrast to existing cluster-based work that selects a random node or the node with the highest energy at a particular time instance as the new cluster head. Moreover, a global knowledge strategy is used to maintain a level of universal awareness of existing nodes in the subject area and to avoid the problem of disconnected or forgotten nodes. In the area of data dissemination, the aim of this research is to manage data collection effectively by developing an efficient data collection scheme that uses a ferry node and applies a selective duty cycle strategy to the sensor nodes. Depending on the application, mobile ferries can be used for collecting data in a WSN, especially in large-scale networks with delay-tolerant applications. Unlike data collection via multi-hop forwarding among the sensing nodes, ferries travel across the sensing field to collect data. A ferry-based approach thus eliminates, or minimizes, the need for multi-hop forwarding of data, and as a result, energy consumption at the nodes is significantly reduced. This is especially true for nodes near the base station, as other nodes use them to forward data to the base station.
MATLAB is used to design, simulate, and evaluate the proposed work against existing work using various performance criteria.
Towards a Mini-App for Smoothed Particle Hydrodynamics at Exascale
The smoothed particle hydrodynamics (SPH) technique is a purely Lagrangian
method, used in numerical simulations of fluids in astrophysics and
computational fluid dynamics, among many other fields. SPH simulations with
detailed physics represent computationally-demanding calculations. The
parallelization of SPH codes is not trivial due to the absence of a structured
grid. Additionally, the performance of the SPH codes can be, in general,
adversely impacted by several factors, such as multiple time-stepping,
long-range interactions, and/or boundary conditions. This work presents
insights into the current performance and functionalities of three SPH codes:
SPHYNX, ChaNGa, and SPH-flow. These codes are the starting point of an
interdisciplinary co-design project, SPH-EXA, for the development of an
Exascale-ready SPH mini-app. To gain such insights, a rotating square patch
test was implemented as a common test simulation for the three SPH codes and
analyzed on two modern HPC systems. Furthermore, to stress the differences with
the codes stemming from the astrophysics community (SPHYNX and ChaNGa), an
additional test case, the Evrard collapse, has also been carried out. This work
extrapolates the common basic SPH features in the three codes for the purpose
of consolidating them into a pure-SPH, Exascale-ready, optimized, mini-app.
Moreover, the outcome of this serves as direct feedback to the parent codes, to
improve their performance and overall scalability.
Comment: 18 pages, 4 figures, 5 tables, 2018 IEEE International Conference on
Cluster Computing proceedings for WRAp1
CPPC: a compiler‐assisted tool for portable checkpointing of message‐passing applications
This is the peer reviewed version of the following article: Rodríguez, G., Martín, M. J., González, P., Touriño, J. and Doallo, R. (2010), CPPC: a compiler‐assisted tool for portable checkpointing of message‐passing applications. Concurrency Computat.: Pract. Exper., 22: 749-766. doi:10.1002/cpe.1541, which has been published in final form at https://doi.org/10.1002/cpe.1541. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Use of Self-Archived Versions.
[Abstract] With the evolution of high‐performance computing toward heterogeneous, massively parallel systems, parallel applications have developed new checkpoint and restart necessities. Whether due to a failure in the execution or to a migration of the application processes to different machines, checkpointing tools must be able to operate in heterogeneous environments. However, some of the data manipulated by a parallel application are not truly portable. Examples of these include opaque state (e.g. data structures for communications support) or diversity of interfaces for a single feature (e.g. communications, I/O). Directly manipulating the underlying ad hoc representations renders checkpointing tools unable to work on different environments. Portable checkpointers usually work around portability issues at the cost of transparency: the user must provide information such as what data need to be stored, where to store them, or where to checkpoint. CPPC (ComPiler for Portable Checkpointing) is a checkpointing tool designed to feature both portability and transparency. It is made up of a library and a compiler. The CPPC library contains routines for variable level checkpointing, using portable code and protocols. The CPPC compiler helps to achieve transparency by relieving the user from time‐consuming tasks, such as data flow and communications analyses and adding instrumentation code.
This paper covers both the operation of the CPPC library and its compiler support. Experimental results using benchmarks and large‐scale real applications are included, demonstrating usability, efficiency, and portability.
Ministerio de Educación y Ciencia; TIN2007‐67537‐C03
Xunta de Galicia; 2006/