Fault-Tolerant Adaptive Parallel and Distributed Simulation
Discrete Event Simulation is a widely used technique for modeling and
analyzing complex systems in many fields of science and engineering. The
increasingly large size of simulation models poses a serious computational
challenge, since the time needed to run a simulation can be prohibitively
large. For this reason, Parallel and Distributed Simulation techniques have
been proposed to take advantage of the multiple execution units found in
multicore processors, clusters of workstations, or HPC systems. The current
generation of HPC systems includes hundreds of thousands of computing nodes and
a vast amount of ancillary components. Despite improvements in manufacturing
processes, failures of some components are frequent, and the situation will get
worse as larger systems are built. In this paper we describe FT-GAIA, a
software-based fault-tolerant extension of the GAIA/ARTÌS parallel simulation
middleware. FT-GAIA transparently replicates simulation entities and
distributes them on multiple execution nodes. This allows the simulation to
tolerate crash-failures of computing nodes; furthermore, FT-GAIA offers some
protection against Byzantine failures since synchronization messages are
replicated as well, so that the receiving entity can identify and discard
corrupted messages. We provide an experimental evaluation of FT-GAIA on a
running prototype. Results show that a high degree of fault tolerance can be
achieved, at the cost of a moderate increase in the computational load of the
execution units.
Comment: Proceedings of the IEEE/ACM International Symposium on Distributed
Simulation and Real Time Applications (DS-RT 2016)
Fault Tolerant Adaptive Parallel and Distributed Simulation through Functional Replication
This paper presents FT-GAIA, a software-based fault-tolerant parallel and
distributed simulation middleware. FT-GAIA has been designed to reliably
handle Parallel And Distributed Simulation (PADS) models, which are needed to
properly simulate and analyze complex systems arising in any kind of scientific
or engineering field. PADS takes advantage of multiple execution units running on
multicore processors, clusters of workstations, or HPC systems. However, large
computing systems, such as HPC systems that include hundreds of thousands of
computing nodes, have to handle frequent failures of some components. To cope
with this issue, FT-GAIA transparently replicates simulation entities and
distributes them on multiple execution nodes. This allows the simulation to
tolerate crash-failures of computing nodes. Moreover, FT-GAIA offers some
protection against Byzantine failures, since interaction messages among the
simulated entities are replicated as well, so that the receiving entity can
identify and discard corrupted messages. Results from an analytical model and
from an experimental evaluation show that FT-GAIA provides a high degree of
fault tolerance, at the cost of a moderate increase in the computational load
of the execution units.
Comment: arXiv admin note: substantial text overlap with arXiv:1606.0731
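The idea of replicating messages so that a receiver can discard corrupted copies can be illustrated with a small voting routine. This is only a sketch and not FT-GAIA's actual code: the function name `vote`, the string payloads, and the strict-majority threshold are all assumptions made for illustration.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def vote(copies: Sequence[Hashable], replicas: int) -> Optional[Hashable]:
    """Return the payload carried by a strict majority of the expected
    replicas, or None if no payload reaches that threshold.

    Assumption: every replica of the sending entity emits one copy of the
    same logical message, so honest copies are identical.
    """
    if not copies:
        return None
    payload, count = Counter(copies).most_common(1)[0]
    return payload if count > replicas // 2 else None


# Three replicas of the sender emit the same message; one copy is corrupted.
received = ["move(3,4)", "move(3,4)", "m0ve(#,4)"]
accepted = vote(received, replicas=3)
print(accepted)
```

With three replicas, two matching copies form a majority, so the corrupted copy is ignored. If all three copies disagreed, the routine would return None rather than guess.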
On Constructing Persistent Identifiers with Persistent Resolution Targets
Persistent Identifiers (PIDs) are the foundation for referencing digital assets in
scientific publications, books, and digital repositories. In their realization,
PIDs contain metadata and resolution targets in the form of URLs that point to data
sets located on the network. In contrast to PIDs, the target URLs typically
change over time; thus, PIDs need continuous maintenance -- an effort that is
increasing tremendously with the advancement of e-Science and the advent of the
Internet-of-Things (IoT). Nowadays, billions of sensors and data sets are
subject to PID assignment. This paper presents a new approach to embedding
location-independent targets into PIDs that allows the creation of
maintenance-free PIDs using content-centric network technology and overlay
networks. To demonstrate the validity of the presented approach, the Handle PID
System is used in conjunction with Magnet Link access information encoding,
state-of-the-art decentralized data distribution with BitTorrent, and Named
Data Networking (NDN) as location-independent data access technology for
networks. In contrast to existing approaches, no green-field PID implementation
or major modification of the Handle System is required to enable
location-independent data dissemination with maintenance-free PIDs.
Comment: Published IEEE paper of the FedCSIS 2016 (SoFAST-WS'16) conference,
11-14 September 2016, Gdansk, Poland. Also available online:
http://ieeexplore.ieee.org/document/7733372
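As a rough illustration of a location-independent target, the sketch below formats a BitTorrent info-hash as a Magnet Link and places it in a PID record where a location-bound URL would normally go. It is not the paper's implementation: the function `magnet_target`, the record layout, and the identifier `PREFIX/dataset-1` are hypothetical, and the info-hash is assumed to have been computed elsewhere from the torrent's metadata.

```python
from urllib.parse import quote


def magnet_target(info_hash_hex: str, display_name: str) -> str:
    """Build a Magnet Link from a 40-character hexadecimal BitTorrent
    info-hash (assumed to be computed elsewhere) and a display name."""
    hex_digits = "0123456789abcdefABCDEF"
    if len(info_hash_hex) != 40 or any(c not in hex_digits for c in info_hash_hex):
        raise ValueError("expected a 40-character hexadecimal info-hash")
    return f"magnet:?xt=urn:btih:{info_hash_hex.lower()}&dn={quote(display_name)}"


# Hypothetical PID record: the target names the content, not a host, so it
# does not need to be updated when the data moves to another server.
record = {
    "pid": "PREFIX/dataset-1",  # placeholder identifier, not a real handle
    "target": magnet_target("A" * 40, "my data.csv"),
}
print(record["target"])
```

Because the target identifies the data by hash rather than by network location, the record stays valid as long as some peer still serves the content.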