Integrated platform to assess seismic resilience at the community level
Due to the increasing frequency of disastrous events, building large-scale simulation models has become a challenge of major significance. Indeed, several simulation strategies and methodologies have recently been developed to explore the response of communities to natural disasters. Such models can support decision-makers during emergency operations by providing a global view of the emergency and identifying its consequences. This paper presents an integrated platform that implements a community hybrid model with real-time simulation capabilities. The platform's goal is to assess the seismic resilience and vulnerability of critical infrastructures (e.g., the built environment, power grid, and socio-technical network) at the urban level, taking their interdependencies into account. Finally, different seismic scenarios were applied to a large-scale virtual city model. The platform proved effective for analyzing the emergency and could be used to implement countermeasures that improve community response and overall resilience.
Computer architecture for efficient algorithmic executions in real-time systems: New technology for avionics systems and advanced space vehicles
Improvements and advances in computer architecture now provide innovative technology for recasting traditional sequential solutions into high-performance, low-cost parallel systems that increase overall performance. This paper describes research on a specialized computer architecture for the algorithmic execution of an avionics guidance-and-control problem in real time. A comprehensive treatment of both the hardware and software structures of a customized computer is presented; the computer performs real-time computation of guidance commands with updated estimates of target motion and time-to-go. An optimal real-time allocation algorithm, based on critical path analysis, was developed to map the algorithmic tasks onto the processing elements. The final stage is the design and development of hardware structures suited to efficient execution of the allocated task graph. The processing element is designed for rapid execution of the allocated tasks, and fault tolerance is a key feature of the overall architecture. Parallel numerical integration techniques, task definitions, and allocation algorithms are discussed. The parallel implementation is verified analytically and experimental results are presented. The design of the data-driven computer architecture, customized for the execution of this particular algorithm, is discussed.
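Critical-path-based allocation of a task graph can be sketched as follows. This is an illustrative example, not the paper's implementation: the task names, durations, and graph are invented, and the "level" of a task (its longest remaining execution time to a sink) is the standard priority used in critical-path list scheduling.

```python
# Illustrative sketch: prioritize tasks in a DAG by critical-path "level",
# the longest execution time from that task to any sink. Tasks with the
# largest level lie on the critical path and are scheduled first.

def critical_path_lengths(tasks, deps):
    """Map each task to its longest remaining execution time (its level)."""
    memo = {}
    def level(t):
        if t not in memo:
            succs = [b for (a, b) in deps if a == t]
            memo[t] = tasks[t] + max((level(s) for s in succs), default=0)
        return memo[t]
    return {t: level(t) for t in tasks}

# toy guidance-loop task graph; durations in arbitrary time units
tasks = {"sense": 2, "estimate": 4, "guide": 3, "actuate": 1}
deps = [("sense", "estimate"), ("estimate", "guide"), ("guide", "actuate")]

levels = critical_path_lengths(tasks, deps)
# dispatch order for the processing elements: most critical task first
order = sorted(tasks, key=lambda t: -levels[t])
print(levels)   # {'sense': 10, 'estimate': 8, 'guide': 4, 'actuate': 1}
print(order)    # ['sense', 'estimate', 'guide', 'actuate']
```

In a multi-processor setting the same levels decide which ready task each idle processing element picks up next.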
Efficient Parallel Reinforcement Learning Framework using the Reactor Model
Parallel Reinforcement Learning (RL) frameworks are essential for mapping RL
workloads to multiple computational resources, allowing for faster generation
of samples, estimation of values, and policy improvement. These computational
paradigms require a seamless integration of training, serving, and simulation
workloads. Existing frameworks, such as Ray, do not manage this
orchestration efficiently, especially in RL tasks that demand intensive
input/output and synchronization between actors on a single node. In this
study, we propose a solution based on the reactor model, which enforces
a fixed communication pattern among a set of actors. This allows the
scheduler to eliminate work needed for synchronization, such as acquiring and
releasing locks for each actor or sending and processing coordination-related
messages. Our framework, Lingua Franca (LF), a coordination language based on
the reactor model, also supports true parallelism in Python and provides a
unified interface that allows users to automatically generate dataflow graphs
for RL tasks. In comparison to Ray on a single-node multi-core compute
platform, LF achieves 1.21x and 11.62x higher simulation throughput in OpenAI
Gym and Atari environments, reduces the average training time of synchronized
parallel Q-learning by 31.2%, and accelerates multi-agent RL inference by
5.12x.

Comment: 10 pages, 11 figures
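The key idea of the reactor model can be sketched in plain Python. This is a conceptual toy, not Lingua Franca's actual API: because each reactor's ports are wired into a fixed topology before execution starts, the scheduler knows the complete dataflow graph ahead of time and can dispatch reactions deterministically without per-message locks.

```python
# Conceptual sketch of the reactor model (invented API, not Lingua Franca's):
# reactors declare fixed output ports and are wired once, up front, so the
# scheduler sees the whole dataflow graph and needs no locking per message.

class Reactor:
    def __init__(self):
        self.outputs = {}  # output port name -> list of (reactor, input port)

    def connect(self, out_port, target, in_port):
        self.outputs.setdefault(out_port, []).append((target, in_port))

    def emit(self, out_port, value, schedule):
        for target, in_port in self.outputs.get(out_port, []):
            schedule.append((target, in_port, value))  # deterministic order

class Doubler(Reactor):
    def react(self, port, value, schedule):
        self.emit("out", value * 2, schedule)

class Collector(Reactor):
    def __init__(self):
        super().__init__()
        self.seen = []
    def react(self, port, value, schedule):
        self.seen.append(value)

# wire the fixed topology once, before execution begins
doubler, sink = Doubler(), Collector()
doubler.connect("out", sink, "in")

schedule = [(doubler, "in", 3), (doubler, "in", 5)]
while schedule:                                # single-threaded event loop
    reactor, port, value = schedule.pop(0)     # stands in for the scheduler
    reactor.react(port, value, schedule)

print(sink.seen)   # [6, 10]
```

A dynamic actor system would instead let any actor message any other at runtime, which is precisely what forces a general framework like Ray to pay for locks and coordination messages.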
Massive parallelization of branching algorithms
Optimization and search problems are often NP-complete, and brute-force techniques
must typically be implemented to find exact solutions. Problems such as clustering
genes in bioinformatics or finding optimal routes in delivery networks can be
solved in exponential time using recursive branching strategies. Nevertheless, these
algorithms become impractical above certain instance sizes due to the large number of
scenarios that need to be explored, for which parallelization techniques are necessary
to improve the performance.
In previous works, centralized and decentralized techniques have been implemented
to scale up parallelism in branching algorithms while attempting to
reduce communication overhead, which plays a significant role in massively parallel
implementations due to the message passing across processes.
Thus, our work consists of the development of a fully generic C++ library,
named GemPBA, to speed up almost any branching algorithm with massive parallelization,
along with the development of a novel and simple Dynamic Load Balancing
tool that reduces the number of passed messages by sending high-priority tasks
first. Our approach uses a hybrid centralized-decentralized strategy, in which
a central process assigns worker roles via messages of only a few bits, so that
tasks do not need to pass through a central processor.
Also, a working processor spawns new tasks if and only if there are available
processors to receive them, thus guaranteeing their transfer and thereby notably
decreasing the communication overhead.
We performed our experiments on the Minimum Vertex Cover problem, with
remarkable results: even the toughest DIMACS graphs could be solved
with a simple MVC algorithm.
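The kind of recursion GemPBA parallelizes can be sketched with a minimal sequential branching algorithm for Minimum Vertex Cover. This is a textbook sketch under the standard branching rule, not GemPBA's code: for any uncovered edge (u, v), at least one endpoint must be in the cover, so the algorithm branches on both choices.

```python
# Minimal recursive-branching sketch for Minimum Vertex Cover (sequential;
# each recursive call is the unit of work a parallel branching framework
# would hand to an idle processor).

def mvc(edges, cover=frozenset()):
    """Return a minimum vertex cover of the graph, extending `cover`."""
    remaining = [(u, v) for (u, v) in edges
                 if u not in cover and v not in cover]
    if not remaining:
        return cover
    u, v = remaining[0]          # branch on an arbitrary uncovered edge
    with_u = mvc(edges, cover | {u})
    with_v = mvc(edges, cover | {v})
    return with_u if len(with_u) <= len(with_v) else with_v

# toy graph: the path 1-2-3-4 has a minimum cover of size 2
edges = [(1, 2), (2, 3), (3, 4)]
print(sorted(mvc(edges)))   # a minimum cover of size 2
```

Each branch is independent of its sibling, which is what makes the recursion tree a natural target for the centralized-decentralized task distribution described above.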
Representation of distribution networks of ships using graph-theory
CETENA S.p.A., SISSA (International School for Advanced Studies) and Lloyd's Register
(Class Society) have recently been involved in a challenge aimed at developing smart algorithms capable of evaluating the effect of different failure modes (caused by a fire or a
flooding) on the systems of passenger ships, in order to improve the design of new passenger
ships [1]. Considering that a failure may cause serious accidents both to the vessel and
human lives, the goal of this project is to evaluate the best reconfiguration of current ship
plants after each casualty scenario so as to guarantee the minimal functioning requirements.
This implies a continuous cross check activity (design against installation) that follows the
whole ship construction process. The urgency of this work is motivated by the necessity to
meet the International Maritime Organization's (IMO) Safety Of Life At Sea (SOLAS) design
prescriptions defined in the Safe Return to Port (SRtP) regulations [2]. According to these
criteria, a vessel should be able to safely return to port under its own propulsion after an adverse event not exceeding any of the defined casualty thresholds and criteria imposed by the regulations. Thus, the identification of all the possible failure modes and their propagation
through the on-board systems has become a task of paramount importance for the proper
design of the ship's systems against failure events.
Currently, in accordance with IMO MSC.1/Circ.1369 [3], CETENA produces the Operating Manuals that allow the crew to reconfigure the essential systems after an SRtP casualty so as to be able to bring the ship to a port with adequate comfort and safety standards. However, the ship can be operated in a way different from what is planned at the design stage, and in these scenarios the present static Operational Manuals can be a limitation. To be effective during emergency operations, Operational Manuals must be dynamic, providing crew members with interactive information and guidance about the reconfiguration of the ship and the recovery of her functions based on the systems' configuration at the moment of the casualty.
The focus of this work is the study of domino effects triggered by fire or flooding casualties
in passenger ships in order to provide crew with a tool which speeds up and facilitates
the decision-making process when choices have to be made to optimize the ship residual
capability after a casualty. The framework of this study may be extended to other types of
domino escalation.
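The graph-theoretic core of such a tool can be sketched with a reachability check: model the distribution system as a graph, delete the components lost in a casualty, and see which essential users can still be fed from a source. The component names and topology below are invented for illustration and do not come from the project.

```python
# Hedged sketch of the graph-theoretic idea: a ship's distribution network
# as an adjacency map, with a BFS that reports which nodes remain supplied
# after a set of components fails (e.g. due to fire or flooding).

from collections import deque

def reachable(adjacency, source, failed):
    """BFS over the surviving network; returns every node still supplied."""
    seen = {source} if source not in failed else set()
    queue = deque(seen)
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in failed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# toy electrical network: two generators, a switchboard ring, two consumers
grid = {
    "gen_fwd": ["swb_fwd"], "gen_aft": ["swb_aft"],
    "swb_fwd": ["swb_aft", "propulsion"], "swb_aft": ["swb_fwd", "steering"],
}

intact = reachable(grid, "gen_fwd", failed=set())
after_fire = reachable(grid, "gen_fwd", failed={"swb_fwd"})
print("propulsion" in intact, "propulsion" in after_fire)   # True False
```

Running the same check for every candidate reconfiguration is, in essence, how the residual capability of the ship after a casualty can be ranked.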
EVALUATION OF CLASSICAL INTER-PROCESS COMMUNICATION PROBLEMS IN PARALLEL PROGRAMMING LANGUAGES
It has been generally believed for the past several years that parallel programming is the future of computing technology, due to its speed and vastly superior performance compared to classic sequential programming. However, how sure are we that this is the case? Despite its aforesaid average superiority, parallel-program implementations usually run on single-processor machines, making the parallelism almost virtual. In this case, does parallel programming still remain superior? The purpose of this document is to research and analyze the performance, in both storage and speed, of three parallel-programming libraries: OpenMP, OpenMPI and PThreads, along with a few hybrids obtained by combining two of these three libraries. These analyses are applied to three classical multi-process synchronization problems: Dining Philosophers, Producers-Consumers and Sleeping Barbers.
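One of the three benchmarked problems, Producers-Consumers, can be sketched with Python's threading primitives rather than OpenMP, Open MPI or PThreads, purely to show the synchronization pattern being measured: a bounded buffer that blocks producers when full and consumers when empty, with a sentinel value for shutdown.

```python
# Producers-Consumers sketch: a bounded queue provides the blocking
# handshake; a None sentinel tells the consumer there is no more work.

import queue
import threading

buffer = queue.Queue(maxsize=4)      # bounded buffer enforces the handshake
results = []

def producer(items):
    for item in items:
        buffer.put(item)             # blocks while the buffer is full
    buffer.put(None)                 # sentinel: no more work

def consumer():
    while True:
        item = buffer.get()          # blocks while the buffer is empty
        if item is None:
            break
        results.append(item * item)

p = threading.Thread(target=producer, args=(range(5),))
c = threading.Thread(target=consumer)
p.start(); c.start(); p.join(); c.join()
print(sorted(results))   # [0, 1, 4, 9, 16]
```

The PThreads and OpenMP versions express the same pattern with condition variables or critical sections; what the benchmark compares is the cost of that synchronization machinery in each library.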
Exascale Deep Learning for Climate Analytics
We extract pixel-level masks of extreme weather patterns using variants of
Tiramisu and DeepLabv3+ neural networks. We describe improvements to the
software frameworks, input pipeline, and the network training algorithms
necessary to efficiently scale deep learning on the Piz Daint and Summit
systems. The Tiramisu network scales to 5300 P100 GPUs with a sustained
throughput of 21.0 PF/s and parallel efficiency of 79.0%. DeepLabv3+ scales up
to 27360 V100 GPUs with a sustained throughput of 325.8 PF/s and a parallel
efficiency of 90.7% in single precision. By taking advantage of the FP16 Tensor
Cores, a half-precision version of the DeepLabv3+ network achieves a peak and
sustained throughput of 1.13 EF/s and 999.0 PF/s, respectively.

Comment: 12 pages, 5 tables, 4 figures, Supercomputing Conference, November
11-16, 2018, Dallas, TX, US
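The scaling figures above are tied together by the definition of parallel efficiency: sustained aggregate throughput divided by the GPU count times the single-GPU throughput. The per-GPU baseline below is back-derived from the reported numbers for illustration, not taken from the paper.

```python
# Quick arithmetic check of the reported scaling numbers.
# parallel efficiency = aggregate throughput / (n_gpus * per-GPU throughput)

def parallel_efficiency(aggregate_pflops, n_gpus, per_gpu_tflops):
    return aggregate_pflops * 1000 / (n_gpus * per_gpu_tflops)

# Tiramisu: 21.0 PF/s on 5300 P100s at 79.0% efficiency implies a
# single-GPU sustained rate of roughly 5.0 TF/s (assumed, back-derived)
per_gpu = 21.0 * 1000 / (5300 * 0.790)
print(round(per_gpu, 2))                                   # ~5.02 TF/s
print(round(parallel_efficiency(21.0, 5300, per_gpu), 3))  # 0.79
```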