83,133 research outputs found

    Performance Evaluation of Parallel Haemodynamic Computations on Heterogeneous Clouds

    Get PDF
    The article presents performance evaluation of parallel haemodynamic flow computations on heterogeneous resources of the OpenStack cloud infrastructure. The main focus is on the parallel performance analysis, energy consumption and virtualization overhead of the developed software service based on ANSYS Fluent platform which runs on Docker containers of the private university cloud. The haemodynamic aortic valve flow described by incompressible Navier-Stokes equations is considered as a target application of the hosted cloud infrastructure. The parallel performance of the developed software service is assessed measuring the parallel speedup of computations carried out on virtualized heterogeneous resources. The performance measured on Docker containers is compared with that obtained by using the native hardware. The alternative solution algorithms are explored in terms of the parallel performance and power consumption. The investigation of a trade-off between the computing speed and the consumed energy is performed by using Pareto front analysis and a linear scalarization method

    Performance Portable High Performance Conjugate Gradients Benchmark

    Get PDF
    The High Performance Conjugate Gradient Benchmark (HPCG) is an international project to create a more appropriate benchmark test for the world\u27s most powerful computers. The current LINPACK benchmark, which is the standard for measuring the performance of the top 500 fastest computers in the world, is moving computers in a direction that is no longer beneficial to many important parallel applications. HPCG is designed to exercise computations and data access patterns more commonly found in applications. The reference version of HPCG exploits only some parallelism available on existing supercomputers and the main focus of this work was to create a performance portable version of HPCG that gives reasonable performance on hybrid architectures

    Измерение производительности параллельных компьютеров с распределенной памятью

    No full text
    Розглянуто основні механізми вимірювання продуктивності паралельних комп’ютерів із розподіленою пам’яттю. Показано, що результати, отримані на де-факто стандартному тесті LINPACK, здебільшого мало пов’язані з ефективністю програм і застосувань. В результаті знову набули актуальності моделі та методи макроконвеєрних обчислень, сформульовані В.М. Глушковим ще наприкінці 1970-х років Результати цих досліджень висвітлено з позицій сучасної архітектури кластерних комплексів.The basic techniques for measuring the performance of distributed memory parallel computers are reviewed. The results obtained via de-facto standard LINPACK benchmark suite are shown to be weakly related to the efficiency of applied parallel programs. As a result, models and methods of macro-piping computations proposed by V.M. Glushkov in the late 1970s become topical again. These results are presented in the context of the modern architecture of cluster complexes

    Evaluation and Analysis of Distributed Graph-Parallel Processing Frameworks

    Get PDF
    A number of graph-parallel processing frameworks have been proposed to address the needs of processing complex and large-scale graph structured datasets in recent years. Although significant performance improvement made by those frameworks were reported, comparative advantages of each of these frameworks over the others have not been fully studied, which impedes the best utilization of those frameworks for a specific graph computing task and setting. In this work, we conducted a comparison study on parallel processing systems for large-scale graph computations in a systematic manner, aiming to reveal the characteristics of those systems in performing common graph algorithms with real-world datasets on the same ground. We selected three popular graph-parallel processing frameworks (Giraph, GPS and GraphLab) for the study and also include a representative general data-parallel computing system— Spark—in the comparison in order to understand how well a general data-parallel system can run graph problems. We applied basic performance metrics measuring speed, resource utilization, and scalability to answer a basic question of which graph-parallel processing platform is better suited for what applications and datasets. Three widely-used graph algorithms— clustering coefficient, shortest path length, and PageRank score—were used for benchmarking on the targeted computing systems.We ran those algorithms against three real world network datasets with diverse characteristics and scales on a research cluster and have obtained a number of interesting observations. For instance, all evaluated systems showed poor scalability (i.e., the runtime increases with more computing nodes) with small datasets likely due to communication overhead. Further, out of the evaluated graphparallel computing platforms, PowerGraph consistently exhibits better performance than others

    Efficiency analysis methodology of FPGAs based on lost frequencies, area and cycles

    Get PDF
    We propose a methodology to study and to quantify efficiency and the impact of overheads on runtime performance. Most work on High-Performance Computing (HPC) for FPGAs only studies runtime performance or cost, while we are interested in how far we are from peak performance and, more importantly, why. The efficiency of runtime performance is defined with respect to the ideal computational runtime in absence of inefficiencies. The analysis of the difference between actual and ideal runtime reveals the overheads and bottlenecks. A formal approach is proposed to decompose the efficiency into three components: frequency, area and cycles. After quantification of the efficiencies, a detailed analysis has to reveal the reasons for the lost frequencies, lost area and lost cycles. We propose a taxonomy of possible causes and practical methods to identify and quantify the overheads. The proposed methodology is applied on a number of use cases to illustrate the methodology. We show the interaction between the three components of efficiency and show how bottlenecks are revealed
    corecore