5,878 research outputs found
Construction and Application of an AMR Algorithm for Distributed Memory Computers
While the parallelization of blockstructured adaptive mesh refinement techniques is relatively straight-forward on shared memory architectures, appropriate distribution strategies for the emerging generation of distributed
memory machines are a topic of on-going research. In this paper, a locality-preserving domain decomposition is proposed that partitions the entire AMR hierarchy from the base level on. It is shown that the approach reduces the
communication costs and simplifies the implementation. Emphasis is put on the effective parallelization of the flux correction procedure at coarse-fine boundaries, which is indispensable for conservative finite volume schemes. An
easily reproducible standard benchmark and a highly resolved parallel AMR
simulation of a diffracting hydrogen-oxygen detonation demonstrate the proposed
strategy in practice
Tolerating Correlated Failures in Massively Parallel Stream Processing Engines
Fault-tolerance techniques for stream processing engines can be categorized
into passive and active approaches. A typical passive approach periodically
checkpoints a processing task's runtime states and can recover a failed task by
restoring its runtime state using its latest checkpoint. On the other hand, an
active approach usually employs backup nodes to run replicated tasks. Upon
failure, the active replica can take over the processing of the failed task
with minimal latency. However, both approaches have their own inadequacies in
Massively Parallel Stream Processing Engines (MPSPE). The passive approach
incurs a long recovery latency especially when a number of correlated nodes
fail simultaneously, while the active approach requires extra replication
resources. In this paper, we propose a new fault-tolerance framework, which is
Passive and Partially Active (PPA). In a PPA scheme, the passive approach is
applied to all tasks while only a selected set of tasks will be actively
replicated. The number of actively replicated tasks depends on the available
resources. If tasks without active replicas fail, tentative outputs will be
generated before the completion of the recovery process. We also propose
effective and efficient algorithms to optimize a partially active replication
plan to maximize the quality of tentative outputs. We implemented PPA on top of
Storm, an open-source MPSPE and conducted extensive experiments using both real
and synthetic datasets to verify the effectiveness of our approach
Achieving High Speed CFD simulations: Optimization, Parallelization, and FPGA Acceleration for the unstructured DLR TAU Code
Today, large scale parallel simulations are fundamental tools to handle complex problems. The number of processors in current computation platforms has been recently increased and therefore it is necessary to optimize the application performance and to enhance the scalability of massively-parallel systems. In addition, new heterogeneous architectures, combining conventional processors with specific hardware, like FPGAs, to accelerate the most time consuming functions are considered as a strong alternative to boost the performance.
In this paper, the performance of the DLR TAU code is analyzed and optimized. The improvement of the code efficiency is addressed through three key activities: Optimization, parallelization and hardware acceleration. At first, a profiling analysis of the most time-consuming processes of the Reynolds Averaged Navier Stokes flow solver on a three-dimensional unstructured mesh is performed. Then, a study of the code scalability with new partitioning algorithms are tested to show the most suitable partitioning algorithms for the selected applications. Finally, a feasibility study on the application of FPGAs and GPUs for the hardware acceleration of CFD simulations is presented
- …