Molecular simulations and visualization: introduction and overview
Here we provide an introduction and overview of current progress in the field of molecular simulation and visualization, touching on the following topics: (1) virtual and augmented reality for immersive molecular simulations; (2) advanced visualization and visual analytic techniques; (3) new developments in high performance computing; and (4) applications and model building
Parallel and Distributed Computing for High-Performance Applications
The study of parallel and distributed computing has become an important area of computer science because it enables high-performance software that can effectively handle challenging computational tasks. This study gives a thorough introduction to parallel and distributed computing techniques as they are used in high-performance applications. The core idea underpinning parallel and distributed computing is the partitioning of a computation into smaller subtasks that can be executed concurrently on multiple processors or computers. This strategy yields shorter execution times and better overall performance. Parallel and distributed computing are essential for high-performance applications such as scientific simulations, data analysis, and artificial intelligence, which frequently call for significant computational resources. This article offers a thorough review of the theories, methods, difficulties, and developments in parallel and distributed computing for high-performance applications. By understanding the underlying concepts and exploiting the most recent breakthroughs, researchers and practitioners can fully utilize the potential of parallel and distributed computing to open up new vistas in computational science and engineering
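The partition-and-combine strategy described in this abstract can be sketched in a few lines. The example below is illustrative only (it is not from the paper): it splits one computation into subtasks, runs them concurrently with Python's standard-library executor, and combines the partial results.

```python
# Illustrative sketch of task partitioning: split a computation into
# subtasks, run them concurrently, and combine the partial results.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    """One subtask: sum of squares over its chunk of the input."""
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(data, workers=4):
    # Partition the input into one chunk per worker (round-robin).
    chunks = [data[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Subtasks execute concurrently; results are combined at the end.
        return sum(pool.map(partial_sum, chunks))

print(parallel_sum_of_squares(list(range(1000))))  # same result as a serial sum
```

For CPU-bound work a process pool or a distributed framework such as MPI would replace the thread pool, since CPython threads do not execute bytecode in parallel; the partitioning idea is identical.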
Accelerating the Rate of Astronomical Discovery with GPU-Powered Clusters
In recent years, the Graphics Processing Unit (GPU) has emerged as a low-cost
alternative for high performance computing, enabling impressive speed-ups for a
range of scientific computing applications. Early adopters in astronomy are
already benefiting from adapting their codes to take advantage of the GPU's
massively parallel processing paradigm. I give an introduction to, and overview
of, the use of GPUs in astronomy to date, highlighting the adoption and
application trends from the first ~100 GPU-related publications in astronomy. I
discuss the opportunities and challenges of utilising GPU computing clusters,
such as the new Australian GPU supercomputer, gSTAR, for accelerating the rate
of astronomical discovery.
Comment: To appear in the proceedings of ADASS XXI, ed. P. Ballester and D. Egret, ASP Conf. Ser.
Ab initio computations of molecular systems by the auxiliary-field quantum Monte Carlo method
The auxiliary-field quantum Monte Carlo (AFQMC) method provides a
computational framework for solving the time-independent Schroedinger equation
in atoms, molecules, solids, and a variety of model systems. AFQMC has recently
witnessed remarkable growth, especially as a tool for electronic structure
computations in real materials. The method has demonstrated excellent accuracy
across a variety of correlated electron systems. Taking the form of stochastic
evolution in a manifold of non-orthogonal Slater determinants, the method
resembles an ensemble of density-functional theory (DFT) calculations in the
presence of fluctuating external potentials. Its computational cost scales as a
low power of system size, similar to the corresponding independent-electron
calculations. Highly efficient and intrinsically parallel, AFQMC is able to
take full advantage of contemporary high-performance computing platforms and
numerical libraries. In this review, we provide a self-contained introduction
to the exact and constrained variants of AFQMC, with emphasis on its
applications to the electronic structure in molecular systems. Representative
results are presented, and theoretical foundations and implementation details
of the method are discussed.
Comment: 22 pages, 11 figures
Challenges and Opportunities for RISC-V Architectures towards Genomics-based Workloads
The use of large-scale supercomputing architectures is a hard requirement for
scientific computing Big-Data applications. An example is genomics analytics,
where millions of data transformations and tests per patient need to be done to
find relevant clinical indicators. Therefore, to ensure open and broad access
to high-performance technologies, governments and academia are pushing toward
the introduction of novel computing architectures in large-scale scientific
environments. This is the case of RISC-V, an open-source and royalty-free
instruction-set architecture. To evaluate such technologies, here we present
the Variant-Interaction Analytics use case benchmarking suite and datasets.
Through this use case, we search for possible genetic interactions using
computational and statistical methods, providing a representative case for
heavy ETL (Extract, Transform, Load) data processing. Current implementations
run on x86-based supercomputers (e.g. MareNostrum-IV at the
Barcelona Supercomputing Center (BSC)), and future steps propose RISC-V as part
of the next MareNostrum generations. Here we describe the Variant Interaction
Use Case, highlighting the characteristics that leverage high-performance
computing, and indicating the caveats and challenges for the next RISC-V
developments and designs, drawn from a first comparison between x86 and RISC-V
architectures on real Variant Interaction executions over real hardware
implementations
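As a rough illustration of the statistical side of such a pipeline (the function and table values below are invented for this sketch, not taken from the Variant-Interaction benchmarking suite), a per-variant-pair association test can be as simple as a chi-square statistic over a genotype-by-phenotype contingency table, repeated millions of times across pairs:

```python
# Hypothetical sketch of the kind of per-pair statistical test an ETL-heavy
# variant-interaction pipeline runs: a chi-square statistic over a
# contingency table of genotype combinations in cases vs. controls.
def chi_square(observed):
    """Chi-square statistic for a 2-D contingency table (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected
    return stat

# Invented counts: cases vs. controls for three genotype combinations
# of one variant pair.
table = [[30, 14, 6], [10, 22, 18]]
print(round(chi_square(table), 3))
```

In a real pipeline this inner kernel is dominated by the surrounding data transformations (the Extract and Transform stages), which is why the workload stresses memory and I/O as much as arithmetic.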
Sparse matrix-vector multiplication on GPGPUs
The multiplication of a sparse matrix by a dense vector (SpMV) is a centerpiece of scientific computing applications: it is the essential kernel for the solution of sparse linear systems and sparse eigenvalue problems by iterative methods. The efficient implementation of sparse matrix-vector multiplication is therefore crucial and has been the subject of an immense amount of research, with interest renewed with every major new trend in high performance computing architectures. The introduction of General Purpose Graphics Processing Units (GPGPUs) is no exception, and many articles have been devoted to this problem. With this paper we provide a review of the techniques for implementing the SpMV kernel on GPGPUs that have appeared in the literature of the last few years. We discuss the issues and trade-offs encountered by the various researchers and present a list of solutions, organized into categories according to common features. We also provide a performance comparison across different GPGPU models on a set of test matrices coming from various application domains
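For reference, the kernel under discussion is small. A minimal SpMV in the common CSR (compressed sparse row) format looks like this in plain Python; GPGPU implementations essentially parallelize the outer row loop and vary the storage format to suit the hardware:

```python
# Minimal CSR sparse matrix-vector product: y = A @ x.
# CSR stores only nonzeros (values), their column indices (col_idx),
# and where each row starts in those arrays (row_ptr).
def spmv_csr(values, col_idx, row_ptr, x):
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):  # one independent dot product per row
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

# 3x3 example matrix: [[4, 0, 1], [0, 2, 0], [3, 0, 5]]
values  = [4.0, 1.0, 2.0, 3.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
print(spmv_csr(values, col_idx, row_ptr, [1.0, 1.0, 1.0]))
```

The irregular, input-dependent access pattern `x[col_idx[k]]` is the crux of the GPGPU implementation problem the review surveys: it defeats the coalesced memory access that GPUs need for peak bandwidth.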
Study and development of innovative strategies for energy-efficient cross-layer design of digital VLSI systems based on Approximate Computing
The increasing demand for high performance and energy efficiency in modern digital systems has driven research into new design approaches that can go beyond the established energy-performance tradeoff. In the scientific literature, the Approximate Computing paradigm has been particularly prolific. Many applications in the domains of signal processing, multimedia, computer vision, and machine learning are known to be particularly resilient to errors occurring in their input data and during computation, producing outputs that, although degraded, are still largely acceptable from the point of view of quality. The Approximate Computing design paradigm leverages the characteristics of this group of applications to develop circuits, architectures, and algorithms that, by relaxing design constraints, perform their computations in an approximate or inexact manner, reducing energy consumption. This PhD research aims to explore the design of hardware/software architectures based on Approximate Computing techniques, filling the gap in the literature regarding effective applicability and deriving a systematic methodology to characterize its benefits and tradeoffs.
The main contributions of this work are: (1) the introduction of approximate memory management inside the Linux OS, allowing dynamic allocation and de-allocation of approximate memory at user level, as for normal exact memory; (2) the development of an emulation environment for platforms with approximate memory units, where faults are injected during simulation based on models that reproduce the effects on memory cells of circuit-level and architectural techniques for approximate memories; (3) the implementation and analysis of the impact of approximate memory hardware on real applications: the H.264 video encoder, internally modified to allocate selected data buffers in approximate memory, and signal processing applications (a digital filter) using approximate memory for input/output buffers and tap registers; (4) the development of a fully reconfigurable and combinational floating-point unit, which can work with reduced-precision formats
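The fault-injection idea behind such an emulation environment can be sketched as follows. This is a toy model with assumed parameters (a bit error rate and a count of approximate low-order bits), not the thesis's actual fault models:

```python
# Toy sketch of approximate-memory fault injection: flip the low-order bits
# of stored bytes with some probability, mimicking memory cells whose
# retention or supply voltage has been relaxed to save energy.
import random

def inject_faults(buffer, bit_error_rate, n_approx_bits=2, seed=0):
    """Return a copy of `buffer` with random flips in the low-order bits."""
    rng = random.Random(seed)  # fixed seed: reproducible fault patterns
    out = bytearray(buffer)
    for i in range(len(out)):
        for bit in range(n_approx_bits):  # only low bits are approximate
            if rng.random() < bit_error_rate:
                out[i] ^= 1 << bit
    return bytes(out)

data = bytes([0b10101010] * 8)
faulty = inject_faults(data, bit_error_rate=0.5)
# The high-order (exact) bits of every byte are untouched by construction.
print(all((a & 0xFC) == (b & 0xFC) for a, b in zip(data, faulty)))
```

An emulator built this way lets an application such as a video encoder or digital filter run unmodified while its "approximate" buffers degrade, so output quality can be measured against the modeled energy savings.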
Scalable communication for high-order stencil computations using CUDA-aware MPI
Modern compute nodes in high-performance computing provide a tremendous level
of parallelism and processing power. However, as arithmetic performance has
been observed to increase at a faster rate relative to memory and network
bandwidths, optimizing data movement has become critical for achieving strong
scaling in many communication-heavy applications. This performance gap has been
further accentuated with the introduction of graphics processing units, which
can provide several times higher throughput in data-parallel tasks than
central processing units. In this work, we explore the computational aspects of
iterative stencil loops and implement a generic communication scheme using
CUDA-aware MPI, which we use to accelerate magnetohydrodynamics simulations
based on high-order finite differences and third-order Runge-Kutta integration.
We put particular focus on improving intra-node locality of workloads. In
comparison to a theoretical performance model, our implementation exhibits
strong scaling from one to -- devices at -- efficiency in
sixth-order stencil computations when the problem domain consists of
-- cells.
Comment: 17 pages, 15 figures