561 research outputs found
Аналіз обчислювальних архітектур для реалізації розподіленої САПР
This paper describes a comparative analysis of high-performance computing architectures for building the core infrastructure of a distributed computer-aided design systems, such as a cluster, Grid and Cloud Computing.В данной работе приведены сравнительный анализ высокопроизводительных вычислительных архитектур, таких как вычислительный кластер, Grid, Cloud Computing, для построения базовой информационной инфраструктуры распределенных систем автоматизированного проектирования
Soft Computing Techiniques for the Protein Folding Problem on High Performance Computing Architectures
The protein-folding problem has been extensively studied during the last
fifty years. The understanding of the dynamics of global shape of a protein and the influence
on its biological function can help us to discover new and more effective
drugs to deal with diseases of pharmacological relevance. Different computational approaches
have been developed by different researchers in order to foresee the threedimensional
arrangement of atoms of proteins from their sequences. However, the
computational complexity of this problem makes mandatory the search for new models,
novel algorithmic strategies and hardware platforms that provide solutions in a
reasonable time frame. We present in this revision work the past and last tendencies
regarding protein folding simulations from both perspectives; hardware and software.
Of particular interest to us are both the use of inexact solutions to this computationally hard problem as
well as which hardware platforms have been used for running this kind of Soft Computing techniques.This work is jointly supported by the FundaciónSéneca (Agencia Regional de Ciencia y Tecnología, Región de Murcia) under grants 15290/PI/2010 and 18946/JLI/13, by the Spanish MEC and European Commission FEDER under grant with reference TEC2012-37945-C02-02 and TIN2012-31345, by the Nils Coordinated Mobility under grant 012-ABEL-CM-2014A, in part financed by the European Regional Development Fund (ERDF). We also thank NVIDIA for hardware donation within UCAM GPU educational and research centers.Ingeniería, Industria y Construcció
Distributed Computing Architecture for Image-Based Wavefront Sensing and 2 D FFTs
Image-based wavefront sensing (WFS) provides significant advantages over interferometric-based wavefi-ont sensors such as optical design simplicity and stability. However, the image-based approach is computational intensive, and therefore, specialized high-performance computing architectures are required in applications utilizing the image-based approach. The development and testing of these high-performance computing architectures are essential to such missions as James Webb Space Telescope (JWST), Terrestial Planet Finder-Coronagraph (TPF-C and CorSpec), and Spherical Primary Optical Telescope (SPOT). The development of these specialized computing architectures require numerous two-dimensional Fourier Transforms, which necessitate an all-to-all communication when applied on a distributed computational architecture. Several solutions for distributed computing are presented with an emphasis on a 64 Node cluster of DSPs, multiple DSP FPGAs, and an application of low-diameter graph theory. Timing results and performance analysis will be presented. The solutions offered could be applied to other all-to-all communication and scientifically computationally complex problems
An efficient three-dimensional Poisson solver for SIMD high-performance-computing architectures
We present an algorithm that solves the three-dimensional Poisson equation on a cylindrical grid. The technique uses a finite-difference scheme with operator splitting. This splitting maps the banded structure of the operator matrix into a two-dimensional set of tridiagonal matrices, which are then solved in parallel. Our algorithm couples FFT techniques with the well-known ADI (Alternating Direction Implicit) method for solving Elliptic PDE's, and the implementation is extremely well suited for a massively parallel environment like the SIMD architecture of the MasPar MP-1. Due to the highly recursive nature of our problem, we believe that our method is highly efficient, as it avoids excessive interprocessor communication
Arbor -- a morphologically-detailed neural network simulation library for contemporary high-performance computing architectures
We introduce Arbor, a performance portable library for simulation of large
networks of multi-compartment neurons on HPC systems. Arbor is open source
software, developed under the auspices of the HBP. The performance portability
is by virtue of back-end specific optimizations for x86 multicore, Intel KNL,
and NVIDIA GPUs. When coupled with low memory overheads, these optimizations
make Arbor an order of magnitude faster than the most widely-used comparable
simulation software. The single-node performance can be scaled out to run very
large models at extreme scale with efficient weak scaling.
HPC, GPU, neuroscience, neuron, softwareComment: PDP 2019 27th Euromicro International Conference on Parallel,
Distributed and Network-based Processin
Local time stepping on high performance computing architectures: mitigating CFL bottlenecks for large-scale wave propagation
Modeling problems that require the simulation of hyperbolic PDEs (wave equations) on large heterogeneous domains have potentially many bottlenecks. We attack this problem through two techniques: the massively parallel capabilities of graphics processors (GPUs) and local time stepping (LTS) to mitigate any CFL bottlenecks on a multiscale mesh. Many modern supercomputing centers are installing GPUs due to their high performance, and extending existing seismic wave-propagation software to use GPUs is vitally important to give application scientists the highest possible performance. In addition to this architectural optimization, LTS schemes avoid performance losses in meshes with localized areas of refinement. Coupled with the GPU performance optimizations, the derivation and implementation of an Newmark LTS scheme enables next-generation performance for real-world applications. Included in this implementation is work addressing the load-balancing problem inherent to multi-level LTS schemes, enabling scalability to hundreds and thousands of CPUs and GPUs. These GPU, LTS, and scaling optimizations accelerate the performance of existing applications by a factor of 30 or more, and enable future modeling scenarios previously made unfeasible by the cost of standard explicit time-stepping schemes
Using graph partitioning to accelerate task-based parallel applications
Current high performance computing architectures are composed of large shared memory NUMA nodes, among other components. Such nodes are becoming increasingly complex as they have several NUMA domains with different access latencies depending on the core where the access is issued.
In this work, we propose techniques based on graph partitioning to efficiently mitigate the negative impact of NUMA effects on parallel applications performance, which are able to improve the execution time of OpenMP parallel codes 2.02× times on average when run on architectures with strong NUMA effects
- …