343 research outputs found

CACHE OPTIMIZATION AND PERFORMANCE EVALUATION OF A STRUCTURED CFD CODE - GHOST

Author: Palki Anand B.
Publication venue: UKnowledge
Publication date: 01/01/2006

This research focuses on evaluating and enhancing the performance of an in-house, structured, 2D CFD code, GHOST, on modern commodity clusters. The basic philosophy of this work is to optimize the cache performance of the code by splitting the grid into smaller blocks and carrying out the required calculations on these smaller blocks, which in turn improves code performance on commodity clusters. Accordingly, this work discusses and describes in detail two techniques for data access optimization: external and internal blocking. These techniques were tested on steady, unsteady, laminar, and turbulent test cases, and the results are presented. The critical hardware parameters that influenced code performance were identified, and a detailed study of their effect on performance was conducted. The modified version of the code was also ported to current state-of-the-art architectures with successful results.
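The grid-splitting idea in this abstract can be illustrated with generic loop tiling. The sketch below is not GHOST's external or internal blocking scheme, which the abstract does not specify. It assumes a simple four-point Jacobi averaging stencil, and the function names and the `block` parameter are hypothetical. It only shows that visiting the grid block by block changes the traversal order, not the numerical result.

```python
def jacobi_sweep(u):
    """One Jacobi sweep over the interior of a 2D grid, row by row."""
    n, m = len(u), len(u[0])
    new = [row[:] for row in u]  # boundary values are carried over unchanged
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])
    return new


def jacobi_sweep_blocked(u, block=4):
    """Same sweep, but the interior is visited one block x block tile at a time.

    Illustrative assumption: in a compiled code, a tile small enough to fit in
    cache lets neighbouring rows be reused before they are evicted.
    """
    n, m = len(u), len(u[0])
    new = [row[:] for row in u]
    for ib in range(1, n - 1, block):          # tile origin, rows
        for jb in range(1, m - 1, block):      # tile origin, columns
            for i in range(ib, min(ib + block, n - 1)):
                for j in range(jb, min(jb + block, m - 1)):
                    new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])
    return new


grid = [[float((i * j) % 7) for j in range(10)] for i in range(9)]
same = jacobi_sweep(grid) == jacobi_sweep_blocked(grid, block=3)
```

Pure Python will not show the cache benefit itself; the point is that the blocked traversal is a drop-in reordering, so any speedup in a compiled code comes from memory access patterns rather than from a changed algorithm.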

University of Kentucky

The distributed ASCI supercomputer project

The Distributed ASCI Supercomputer (DAS) is a homogeneous wide-area distributed system consisting of four cluster computers at different locations. DAS has been used for research on communication software, parallel languages and programming systems, schedulers, parallel applications, and distributed applications. The paper gives a preview of the most interesting research results obtained so far in the DAS project.

VU Research Portal

Pure OAI Repository

International Migration, Integration and Social Cohesion online publications

Technologies and tools for high-performance distributed computing. Final report

Publication venue: Office of Scientific and Technical Information (OSTI)

Crossref

Scalable data clustering using GPUs

Author: Pangborn Andrew D.
Publication venue: RIT Scholar Works
Publication date: 01/05/2010

The computational demands of multivariate clustering grow rapidly, so processing large data sets, such as those found in flow cytometry, is very time-consuming on a single CPU. Fortunately, these techniques lend themselves naturally to large-scale parallel processing. To address the computational demands, graphics processing units, specifically NVIDIA's CUDA framework and Tesla architecture, were investigated as a low-cost, high-performance solution for a number of clustering algorithms. C-means and Expectation Maximization with Gaussian mixture models were implemented using the CUDA framework. The implementations use a hybrid of CUDA, OpenMP, and MPI to scale to many GPUs on multiple nodes in a high-performance computing environment. This framework is envisioned as part of a larger cloud-based workflow service where biologists can apply multiple algorithms and parameter sweeps to their data sets and quickly receive a thorough set of results that can be further analyzed by experts. Improvements over previous GPU-accelerated implementations range from 1.42x to 21x for C-means and from 3.72x to 5.65x for the Gaussian mixture model on non-trivial data sets. On a single NVIDIA GTX 260, speedups average 90x for C-means and 74x for Gaussians with flow cytometry files, compared to optimized C code running on a single core of a modern Intel CPU. On the TeraGrid Lincoln high-performance cluster at NCSA, C-means achieves 42% parallel efficiency and a CPU speedup of 4794x with 128 Tesla C1060 GPUs. The Gaussian mixture model achieves 72% parallel efficiency and a CPU speedup of 6286x.
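To make concrete why this kind of clustering parallelizes well, here is a minimal CPU sketch of one C-means iteration. It assumes the standard fuzzy C-means update rules with fuzziness exponent m = 2 and is not the thesis's CUDA implementation. The function names and the toy data are hypothetical.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Membership of every point in every cluster (each row sums to 1).

    Each row depends only on one point and the shared centers, which is the
    independence a one-thread-per-point GPU mapping could exploit.
    """
    exponent = 2.0 / (m - 1.0)
    rows = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # Point coincides with a center: split membership among those centers.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            rows.append([h / total for h in hits])
        else:
            rows.append([1.0 / sum((dk / dj) ** exponent for dj in dists) for dk in dists])
    return rows


def fcm_centers(points, memberships, m=2.0):
    """New centers as membership-weighted means of the points."""
    dim = len(points[0])
    centers = []
    for k in range(len(memberships[0])):
        weights = [u[k] ** m for u in memberships]
        total = sum(weights)
        centers.append([sum(w * p[d] for w, p in zip(weights, points)) / total
                        for d in range(dim)])
    return centers


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers = [(0.0, 0.5), (10.0, 10.5)]
u = fcm_memberships(points, centers)
new_centers = fcm_centers(points, u)
```

The center update is a weighted reduction over all points, which is the step that typically needs cross-thread or cross-node aggregation in a parallel version.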

RIT Scholar Works

PERFORMANCE OPTIMIZATION OF A STRUCTURED CFD CODE - GHOST ON COMMODITY CLUSTER ARCHITECTURES

Author: Kristipati Pavan K.
Publication venue: UKnowledge
Publication date: 01/01/2008

This thesis focuses on optimizing the performance of an in-house, structured, 2D CFD code, GHOST, on commodity cluster architectures. The basic philosophy of the work is to optimize the cache usage of the code by implementing efficient coding techniques without changing the underlying numerical algorithm. The optimization techniques that were implemented and the resulting changes in performance are presented. Two techniques implemented earlier to tune the performance of this code, external and internal blocking, are reviewed. Further tuning is then carried out to circumvent the problems associated with the blocking techniques. To establish the generality of the optimization techniques, testing was also done on a more complicated test case. All the techniques presented in this thesis were tested on steady, laminar test cases. The optimized versions of the code are shown to achieve better performance on the variety of commodity cluster architectures chosen in this study.

University of Kentucky

Total Exchange Performance Prediction on Grid Environments: modeling and algorithmic issues

Authors: Jeannot Emmanuel; Steffenel Luiz Angelo
Publication venue: Springer Science and Business Media LLC
Publication date: 01/08/2007

ISBN 978-0-387-72497-3 (Print), 978-0-387-72498-0 (Online). Copyright: 2008. International audience. One of the most important collective communication patterns used in scientific applications is the complete exchange, also called All-to-All. Although efficient algorithms have been studied for specific networks, general solutions like those available in well-known MPI distributions (e.g. the MPI_Alltoall operation) are strongly influenced by the congestion of network resources. In this paper we address the problem of modeling the performance of Total Exchange communication operations in grid environments. Because traditional performance models are unable to predict the real completion time of an All-to-All operation, we approach the problem by identifying the factors that can interfere with both local and distant transmissions. We observe that the traditional MPI_Alltoall implementation is not suited to grid environments, as it is both inefficient and hard to model. We therefore focus on an alternative algorithm for the total exchange redistribution problem. In our approach, communications are performed in two different phases, aiming to minimize the number of communication steps through the wide-area network. This reduction has a direct impact on the performance modeling of the MPI_Alltoall operation, as it minimizes the factors that interfere with wide-area communications. Hence, we are able to define an accurate performance model of a total exchange between two clusters.
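The abstract does not spell out the two-phase algorithm, so the following toy simulation only illustrates the general idea of cutting wide-area traffic in a total exchange between two clusters. It assumes a hypothetical scheme in which each cluster bundles all remote-bound messages into one wide-area transfer. The function names and the bundling scheme are illustrative assumptions, not the paper's method.

```python
def direct_exchange(cluster_a, cluster_b):
    """Naive total exchange: every process sends directly to every other one.

    Returns the delivered (source, destination) pairs per process and the
    number of messages that cross the wide-area link.
    """
    site = {p: "A" for p in cluster_a}
    site.update({p: "B" for p in cluster_b})
    procs = cluster_a + cluster_b
    inbox = {p: set() for p in procs}
    wan_messages = 0
    for src in procs:
        for dst in procs:
            if src == dst:
                continue
            inbox[dst].add((src, dst))
            if site[src] != site[dst]:
                wan_messages += 1
    return inbox, wan_messages


def bundled_exchange(cluster_a, cluster_b):
    """Hypothetical variant: a local phase, then one bundled wide-area
    transfer per direction that is unpacked inside the remote cluster."""
    inbox = {p: set() for p in cluster_a + cluster_b}
    wan_messages = 0
    for local, remote in ((cluster_a, cluster_b), (cluster_b, cluster_a)):
        # Local phase: ordinary exchange inside the cluster.
        for src in local:
            for dst in local:
                if src != dst:
                    inbox[dst].add((src, dst))
        # Wide-area phase: all remote-bound data travels as a single bundle.
        bundle = {(src, dst) for src in local for dst in remote}
        wan_messages += 1
        for src, dst in bundle:
            inbox[dst].add((src, dst))  # unpacked over the remote local network
    return inbox, wan_messages


a = ["a0", "a1", "a2"]
b = ["b0", "b1"]
direct_inbox, direct_wan = direct_exchange(a, b)
bundled_inbox, bundled_wan = bundled_exchange(a, b)
```

With three processes in one cluster and two in the other, the direct pattern sends 12 messages over the wide-area link while the bundled variant sends 2, and both deliver the same data. The simulation counts messages only and ignores message sizes and the extra local traffic that bundling requires.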

INRIA a CCSD electronic archive server

Scaling-up reinforcement learning using parallelization and symbolic planning

Author: Grounds Matthew Jon
Publication venue: University of York
Publication date: 01/01/2007

EThOS - Electronic Theses Online Service, GB, United Kingdom

White Rose E-theses Online

OpenGrey Repository