789,827 research outputs found

    Parallel and Distributed Computing for High-Performance Applications

    Get PDF
    The study of parallel and distributed computing has become an important area in computer science because it makes it possible to create high-performance software that can effectively handle challenging computational tasks. In terms of their use in the world of high-performance applications, parallel and distributed computing techniques are given a thorough introduction in this study. The partitioning of computational processes into smaller subtasks that may be completed concurrently on numerous processors or computers is the core idea underpinning parallel and distributed computing. This strategy enables quicker execution times and enhanced performance in general. Parallel and distributed computing are essential for high-performance applications like scientific simulations, data analysis, and artificial intelligence since they frequently call for significant computational resources. High-performance apps are able to effectively handle computationally demanding tasks thanks in large part to parallel and distributed computing. This article offers a thorough review of the theories, methods, difficulties, and developments in parallel and distributed computing for high-performance applications. Researchers and practitioners may fully utilize the potential of parallel and distributed computing to open up new vistas in computational science and engineering by comprehending the underlying concepts and utilizing the most recent breakthroughs

    Accelerating the Rate of Astronomical Discovery with GPU-Powered Clusters

    Full text link
    In recent years, the Graphics Processing Unit (GPU) has emerged as a low-cost alternative for high performance computing, enabling impressive speed-ups for a range of scientific computing applications. Early adopters in astronomy are already benefiting in adapting their codes to take advantage of the GPU's massively parallel processing paradigm. I give an introduction to, and overview of, the use of GPUs in astronomy to date, highlighting the adoption and application trends from the first ~100 GPU-related publications in astronomy. I discuss the opportunities and challenges of utilising GPU computing clusters, such as the new Australian GPU supercomputer, gSTAR, for accelerating the rate of astronomical discovery.Comment: To appear in the proceedings of ADASS XXI, ed. P.Ballester and D.Egret, ASP Conf. Se

    Ab initio computations of molecular systems by the auxiliary-field quantum Monte Carlo method

    Full text link
    The auxiliary-field quantum Monte Carlo (AFQMC) method provides a computational framework for solving the time-independent Schroedinger equation in atoms, molecules, solids, and a variety of model systems. AFQMC has recently witnessed remarkable growth, especially as a tool for electronic structure computations in real materials. The method has demonstrated excellent accuracy across a variety of correlated electron systems. Taking the form of stochastic evolution in a manifold of non-orthogonal Slater determinants, the method resembles an ensemble of density-functional theory (DFT) calculations in the presence of fluctuating external potentials. Its computational cost scales as a low-power of system size, similar to the corresponding independent-electron calculations. Highly efficient and intrinsically parallel, AFQMC is able to take full advantage of contemporary high-performance computing platforms and numerical libraries. In this review, we provide a self-contained introduction to the exact and constrained variants of AFQMC, with emphasis on its applications to the electronic structure in molecular systems. Representative results are presented, and theoretical foundations and implementation details of the method are discussed.Comment: 22 pages, 11 figure

    Challenges and Opportunities for RISC-V Architectures towards Genomics-based Workloads

    Full text link
    The use of large-scale supercomputing architectures is a hard requirement for scientific computing Big-Data applications. An example is genomics analytics, where millions of data transformations and tests per patient need to be done to find relevant clinical indicators. Therefore, to ensure open and broad access to high-performance technologies, governments, and academia are pushing toward the introduction of novel computing architectures in large-scale scientific environments. This is the case of RISC-V, an open-source and royalty-free instruction-set architecture. To evaluate such technologies, here we present the Variant-Interaction Analytics use case benchmarking suite and datasets. Through this use case, we search for possible genetic interactions using computational and statistical methods, providing a representative case for heavy ETL (Extract, Transform, Load) data processing. Current implementations are implemented in x86-based supercomputers (e.g. MareNostrum-IV at the Barcelona Supercomputing Center (BSC)), and future steps propose RISC-V as part of the next MareNostrum generations. Here we describe the Variant Interaction Use Case, highlighting the characteristics leveraging high-performance computing, indicating the caveats and challenges towards the next RISC-V developments and designs to come from a first comparison between x86 and RISC-V architectures on real Variant Interaction executions over real hardware implementations

    Sparse matrix-vector multiplication on GPGPUs

    Get PDF
    The multiplication of a sparse matrix by a dense vector (SpMV) is a centerpiece of scientific computing applications: it is the essential kernel for the solution of sparse linear systems and sparse eigenvalue problems by iterative methods. The efficient implementation of the sparse matrix-vector multiplication is therefore crucial and has been the subject of an immense amount of research, with interest renewed with every major new trend in high performance computing architectures. The introduction of General Purpose Graphics Processing Units (GPGPUs) is no exception, and many articles have been devoted to this problem. With this paper we provide a review of the techniques for implementing the SpMV kernel on GPGPUs that have appeared in the literature of the last few years. We discuss the issues and trade-offs that have been encountered by the various researchers, and a list of solutions, organized in categories according to common features. We also provide a performance comparison across different GPGPU models and on a set of test matrices coming from various application domains

    Study and development of innovative strategies for energy-efficient cross-layer design of digital VLSI systems based on Approximate Computing

    Get PDF
    The increasing demand on requirements for high performance and energy efficiency in modern digital systems has led to the research of new design approaches that are able to go beyond the established energy-performance tradeoff. Looking at scientific literature, the Approximate Computing paradigm has been particularly prolific. Many applications in the domain of signal processing, multimedia, computer vision, machine learning are known to be particularly resilient to errors occurring on their input data and during computation, producing outputs that, although degraded, are still largely acceptable from the point of view of quality. The Approximate Computing design paradigm leverages the characteristics of this group of applications to develop circuits, architectures, algorithms that, by relaxing design constraints, perform their computations in an approximate or inexact manner reducing energy consumption. This PhD research aims to explore the design of hardware/software architectures based on Approximate Computing techniques, filling the gap in literature regarding effective applicability and deriving a systematic methodology to characterize its benefits and tradeoffs. The main contributions of this work are: -the introduction of approximate memory management inside the Linux OS, allowing dynamic allocation and de-allocation of approximate memory at user level, as for normal exact memory; - the development of an emulation environment for platforms with approximate memory units, where faults are injected during the simulation based on models that reproduce the effects on memory cells of circuital and architectural techniques for approximate memories; -the implementation and analysis of the impact of approximate memory hardware on real applications: the H.264 video encoder, internally modified to allocate selected data buffers in approximate memory, and signal processing applications (digital filter) using approximate memory for input/output buffers and tap registers; -the development of a fully reconfigurable and combinatorial floating point unit, which can work with reduced precision formats

    Scalable communication for high-order stencil computations using CUDA-aware MPI

    Full text link
    Modern compute nodes in high-performance computing provide a tremendous level of parallelism and processing power. However, as arithmetic performance has been observed to increase at a faster rate relative to memory and network bandwidths, optimizing data movement has become critical for achieving strong scaling in many communication-heavy applications. This performance gap has been further accentuated with the introduction of graphics processing units, which can provide by multiple factors higher throughput in data-parallel tasks than central processing units. In this work, we explore the computational aspects of iterative stencil loops and implement a generic communication scheme using CUDA-aware MPI, which we use to accelerate magnetohydrodynamics simulations based on high-order finite differences and third-order Runge-Kutta integration. We put particular focus on improving intra-node locality of workloads. In comparison to a theoretical performance model, our implementation exhibits strong scaling from one to 6464 devices at 50%50\%--87%87\% efficiency in sixth-order stencil computations when the problem domain consists of 2563256^3--102431024^3 cells.Comment: 17 pages, 15 figure
    • …
    corecore