186,055 research outputs found

    Fault Tolerant Adaptive Parallel and Distributed Simulation through Functional Replication

    Full text link
    This paper presents FT-GAIA, a software-based fault-tolerant parallel and distributed simulation middleware. FT-GAIA has being designed to reliably handle Parallel And Distributed Simulation (PADS) models, which are needed to properly simulate and analyze complex systems arising in any kind of scientific or engineering field. PADS takes advantage of multiple execution units run in multicore processors, cluster of workstations or HPC systems. However, large computing systems, such as HPC systems that include hundreds of thousands of computing nodes, have to handle frequent failures of some components. To cope with this issue, FT-GAIA transparently replicates simulation entities and distributes them on multiple execution nodes. This allows the simulation to tolerate crash-failures of computing nodes. Moreover, FT-GAIA offers some protection against Byzantine failures, since interaction messages among the simulated entities are replicated as well, so that the receiving entity can identify and discard corrupted messages. Results from an analytical model and from an experimental evaluation show that FT-GAIA provides a high degree of fault tolerance, at the cost of a moderate increase in the computational load of the execution units.Comment: arXiv admin note: substantial text overlap with arXiv:1606.0731

    On the Easy Use of Scientific Computing Services for Large Scale Linear Algebra and Parallel Decision Making with the P-Grade Portal

    Get PDF
    International audienceScientific research is becoming increasingly dependent on the large-scale analysis of data using distributed computing infrastructures (Grid, cloud, GPU, etc.). Scientific computing (Petitet et al. 1999) aims at constructing mathematical models and numerical solution techniques for solving problems arising in science and engineering. In this paper, we describe the services of an integrated portal based on the P-Grade (Parallel Grid Run-time and Application Development Environment) portal (http://www.p-grade.hu) that enables the solution of large-scale linear systems of equations using direct solvers, makes easier the use of parallel block iterative algorithm and provides an interface for parallel decision making algorithms. The ultimate goal is to develop a single sign on integrated multi-service environment providing an easy access to different kind of mathematical calculations and algorithms to be performed on hybrid distributed computing infrastructures combining the benefits of large clusters, Grid or cloud, when needed

    Parallel and Distributed Computing for High-Performance Applications

    Get PDF
    The study of parallel and distributed computing has become an important area in computer science because it makes it possible to create high-performance software that can effectively handle challenging computational tasks. In terms of their use in the world of high-performance applications, parallel and distributed computing techniques are given a thorough introduction in this study. The partitioning of computational processes into smaller subtasks that may be completed concurrently on numerous processors or computers is the core idea underpinning parallel and distributed computing. This strategy enables quicker execution times and enhanced performance in general. Parallel and distributed computing are essential for high-performance applications like scientific simulations, data analysis, and artificial intelligence since they frequently call for significant computational resources. High-performance apps are able to effectively handle computationally demanding tasks thanks in large part to parallel and distributed computing. This article offers a thorough review of the theories, methods, difficulties, and developments in parallel and distributed computing for high-performance applications. Researchers and practitioners may fully utilize the potential of parallel and distributed computing to open up new vistas in computational science and engineering by comprehending the underlying concepts and utilizing the most recent breakthroughs

    On the acceleration of wavefront applications using distributed many-core architectures

    Get PDF
    In this paper we investigate the use of distributed graphics processing unit (GPU)-based architectures to accelerate pipelined wavefront applications—a ubiquitous class of parallel algorithms used for the solution of a number of scientific and engineering applications. Specifically, we employ a recently developed port of the LU solver (from the NAS Parallel Benchmark suite) to investigate the performance of these algorithms on high-performance computing solutions from NVIDIA (Tesla C1060 and C2050) as well as on traditional clusters (AMD/InfiniBand and IBM BlueGene/P). Benchmark results are presented for problem classes A to C and a recently developed performance model is used to provide projections for problem classes D and E, the latter of which represents a billion-cell problem. Our results demonstrate that while the theoretical performance of GPU solutions will far exceed those of many traditional technologies, the sustained application performance is currently comparable for scientific wavefront applications. Finally, a breakdown of the GPU solution is conducted, exposing PCIe overheads and decomposition constraints. A new k-blocking strategy is proposed to improve the future performance of this class of algorithm on GPU-based architectures

    Redesigning OP2 Compiler to Use HPX Runtime Asynchronous Techniques

    Full text link
    Maximizing parallelism level in applications can be achieved by minimizing overheads due to load imbalances and waiting time due to memory latencies. Compiler optimization is one of the most effective solutions to tackle this problem. The compiler is able to detect the data dependencies in an application and is able to analyze the specific sections of code for parallelization potential. However, all of these techniques provided with a compiler are usually applied at compile time, so they rely on static analysis, which is insufficient for achieving maximum parallelism and producing desired application scalability. One solution to address this challenge is the use of runtime methods. This strategy can be implemented by delaying certain amount of code analysis to be done at runtime. In this research, we improve the parallel application performance generated by the OP2 compiler by leveraging HPX, a C++ runtime system, to provide runtime optimizations. These optimizations include asynchronous tasking, loop interleaving, dynamic chunk sizing, and data prefetching. The results of the research were evaluated using an Airfoil application which showed a 40-50% improvement in parallel performance.Comment: 18th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2017

    Per Aspera ad Astra: On the Way to Parallel Processing

    Get PDF
    Computational Science and Engineering is being established as a third category of scientific methodology; this innovative discipline supports and supplements the traditional categories: theory and experiment, in order to solve the problems arising from complex systems challenging science and technology. While the successes of the past two decades in scientific computing have been achieved essentially by the technical breakthrough of the vector-supercomputers, today the discussion about the future of supercomputing is focussed on massively parallel computers. The discrepancy, however, between peak performance and sustained performance achievable with algorithmic kernels, software packages, and real applications is still disappointingly high. An important issue are programming models. While Message Passing on parallel computers with distributed memory is the only efficient programming paradigm available today, from a user's point of view it is hard to imagine that this programming model, rather than Shared Virtual Memory, will be capable to serve as the central basis in order to bring computing on massively parallel systems from a sheer computer science trend to the technological breakthrough needed to deal with the large applications of the future; this is especially true for commercial applications where explicit programming the data communication via Message Passing may turn out to be a huge software-technological barrier which nobody might be willing to surmount.KFA JĂŒlich is one of the largest big-science research centres in Europe; its scientific and engineering activities are ranging from fundamental research to applied science and technology. KFA's Central Institute for Applied Mathematics (ZAM) is running the large-scale computing facilities and network systems at KFA and is providing communication services, general-purpose and supercomputer capacity also to the HLRZ ("Höchstleistungsrechenzentrum") established in 1987 in order to further enhance and promote computational science in Germany. Thus, at KFA - and in particular enforced by ZAM - supercomputing has received high priority since more than ten years. What particle accelerators mean to experimental physics, supercomputers mean to Computational Science and Engineering: Supercomputers are the accelerators of theory

    Parallel volume ray-casting for unstructured-grid data on distributed-memory architectures

    Get PDF
    As computing technology continues to advance, computational modeling of scientific and engineering problems produces data of increasing complexity: large in size and unstructured in shape. Volume visualization of such data is a challenging problem. This paper proposes a distributed parallel solution that makes ray-casting volume rendering of unstructured-grid data practical. Both the data and the rendering process are distributed among processors. At each processor, ray-casting of local data is performed independent of the other processors. The global image composing processes, which require inter-processor communication, are overlapped with the local ray-casting processes to achieve maximum parallel efficiency. This algorithm differs from previous ones in four ways: it is completely distributed, less view-dependent, reasonably scalable, and flexible. Without using dynamic load balancing, test results on the Intel Paragon using from two to 128 processors show, on average, about 60% parallel efficiency
    • 

    corecore