Assessing the Utility of a Personal Desktop Cluster
The computer workstation, introduced by Sun Microsystems in 1982, was the tool of
choice for scientists and engineers as an interactive computing environment for the development
of scientific codes. However, by the mid-1990s, the performance of workstations
began to lag behind high-end commodity PCs. This, coupled with the disappearance of
BSD-based operating systems in workstations and the emergence of Linux as an open-source
operating system for PCs, arguably led to the demise of the workstation as we
knew it.
Around the same time, computational scientists started to leverage PCs running
Linux to create a commodity-based (Beowulf) cluster that provided dedicated compute
cycles, i.e., supercomputing for the rest of us, as a cost-effective alternative to large
supercomputers, i.e., supercomputing for the few. However, as the cluster movement
has matured, with respect to cluster hardware and open-source software, these clusters
have become much more like their large-scale supercomputing brethren — a shared
datacenter resource that resides in a machine room.
Consequently, the above observations, when coupled with the ever-increasing performance
gap between the PC and cluster supercomputer, provide the motivation for a
personal desktop cluster workstation — a turnkey solution that provides an interactive and parallel computing environment with the approximate form factor of a Sun SPARCstation
1 “pizza box” workstation. In this paper, we present the hardware and software
architecture of such a solution as well as its prowess as a development platform for parallel codes. In short, imagine a 12-node personal desktop cluster that achieves 14 Gflops on Linpack but sips only 150-180 watts of power, resulting in a performance-power ratio that is over 300% better than our test SMP platform.
Parallel cloth simulation using OpenMP and CUDA
The widespread availability of parallel computing architectures has led to research on algorithms and techniques that best exploit the available parallelism. In addition to CPU parallelism, the GPU has emerged as a parallel computational device. The goal of this study was to explore the combined use of CPU and GPU parallelism by developing a hybrid parallel CPU/GPU cloth simulation application. To evaluate the benefits of the hybrid approach, the application was first developed in sequential CPU form, followed by a parallel CPU form. The application uses Backward Euler implicit time integration to solve the differential equations of motion associated with the physical system. The Conjugate Gradient (CG) algorithm is used to determine the solution vector for the system of equations formed by the Backward Euler approach. The matrix/vector, vector/vector, and vector/scalar operations required by CG are handled by calls to BLAS level 1 and level 2 functions. In the sequential CPU and parallel CPU versions, the Intel Math Kernel Library implementation of BLAS is used. In the hybrid parallel CPU/GPU version, the Nvidia CUDA-based BLAS implementation (CUBLAS) is used. In the parallel CPU and hybrid implementations, OpenMP directives are used to parallelize the force application loop that traverses the list of forces acting on the system. Runtimes were collected for each version of the application while simulating cloth meshes with particle resolutions of 20x20, 40x40, and 60x60. The performance of each version was compared at each mesh resolution. The level of performance degradation experienced when transitioning to the larger mesh sizes was also determined. The hybrid parallel CPU/GPU implementation yielded the highest frame rate for the 40x40 and 60x60 meshes. The parallel CPU implementation yielded the highest frame rate for the 20x20 mesh.
The performance of the hybrid parallel CPU/GPU implementation degraded the least as it transitioned to the two larger mesh sizes. The results of this study will potentially lead to further research regarding the use of GPUs to perform the matrix/vector operations associated with the CG algorithm under more complex cloth simulation scenarios.
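As a rough illustration of the solver structure the abstract above describes, the following sketch runs Conjugate Gradient using only three building blocks that correspond to BLAS level 1 and level 2 operations. It is a minimal pure-Python sketch, not the study's code: the helpers `dot`, `axpy`, and `gemv` are simplified stand-ins named after the BLAS routines they imitate, and `conjugate_gradient`, its tolerance, and the 2x2 example system are assumptions made for illustration.

```python
# Minimal Conjugate Gradient sketch (illustrative, pure Python).
# The matrix A must be symmetric positive definite, as CG requires.

def dot(x, y):
    """Vector/vector inner product (stand-in for BLAS level 1 dot)."""
    return sum(a * b for a, b in zip(x, y))


def axpy(alpha, x, y):
    """Return alpha * x + y (stand-in for BLAS level 1 axpy)."""
    return [alpha * a + b for a, b in zip(x, y)]


def gemv(A, x):
    """Return A @ x for a dense row-major A (stand-in for BLAS level 2 gemv)."""
    return [dot(row, x) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b starting from x = 0; tol bounds the residual norm."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x, with x = 0
    p = list(r)              # initial search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = gemv(A, p)
        alpha = rs_old / dot(p, Ap)       # step length along p
        x = axpy(alpha, p, x)             # x <- x + alpha * p
        r = axpy(-alpha, Ap, r)           # r <- r - alpha * A p
        rs_new = dot(r, r)
        p = axpy(rs_new / rs_old, p, r)   # p <- r + beta * p
        rs_old = rs_new
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    print(conjugate_gradient(A, b))  # approximately [1/11, 7/11]
```

Because every numerical step goes through `dot`, `axpy`, or `gemv`, swapping the backend (for example a CPU BLAS versus a GPU BLAS) would leave the loop itself unchanged, which is the kind of substitution the abstract describes.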
Impact of communication times on mixed CPU/GPU applications scheduling using KAAPI
High-performance computing machines increasingly use graphics processing units (GPUs), which are very efficient for homogeneous computations such as matrix operations. However, before using these accelerators, one has to transfer data from the processor to them. Such a transfer can be slow. In this report, our aim is to study the impact of communication times on the makespan of a schedule. Indeed, with better anticipation of these communications, we could use the GPUs even more efficiently. More precisely, we focus on machines with one or more GPUs and on applications with a low ratio of computation to communication. During this study, we implemented two offline scheduling algorithms within XKAAPI's runtime. We then conducted an experimental study combining these algorithms to highlight the impact of communication times. Finally, our study has shown that, by using communication-aware scheduling algorithms, we can substantially reduce the makespan of an application. Our experiments have shown a reduction of this makespan up to on a machine with several GPUs executing homogeneous computations.
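To make the effect of communication times on makespan concrete, here is a small sketch of a greedy earliest-finish-time scheduler that can either account for or ignore the CPU-to-GPU transfer when choosing where to place a task. This is not one of the two XKAAPI algorithms from the report, which the abstract does not specify; the function `schedule`, the task tuples, and the simplification that a transfer occupies the GPU with no overlap are all assumptions made for illustration.

```python
# Toy offline scheduler: one CPU plus n_gpus GPUs (hypothetical model).
# Each task is (cpu_time, gpu_time, transfer_time). A task placed on a GPU
# always pays gpu_time + transfer_time; the only question is whether the
# scheduler knows about the transfer when it picks a resource.

def schedule(tasks, n_gpus, comm_aware=True):
    """Place tasks greedily by estimated finish time and return the makespan."""
    ready = [0.0] * (1 + n_gpus)          # ready[0] is the CPU
    for cpu_t, gpu_t, xfer_t in tasks:
        best = None
        for r in range(len(ready)):
            real_cost = cpu_t if r == 0 else gpu_t + xfer_t
            if comm_aware:
                est_cost = real_cost
            else:
                est_cost = cpu_t if r == 0 else gpu_t   # transfer ignored
            est_finish = ready[r] + est_cost
            if best is None or est_finish < best[0]:
                best = (est_finish, r, real_cost)
        _, chosen, real_cost = best
        ready[chosen] += real_cost        # the real cost is always charged
    return max(ready)


if __name__ == "__main__":
    # Four identical tasks whose transfer costs more than their GPU compute.
    tasks = [(2.0, 1.0, 3.0)] * 4
    print(schedule(tasks, n_gpus=1, comm_aware=True))   # 6.0
    print(schedule(tasks, n_gpus=1, comm_aware=False))  # 8.0
```

In this toy case the communication-oblivious variant sends work to the GPU because its compute time looks shorter, then pays for the transfer anyway, which lengthens the schedule. The numbers are specific to the made-up task set and say nothing about the gains measured in the report.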
INGEN's advanced IT facilities: The least you need to know
The facilities described in this document were made possible in part through funding from Indiana University, the Indiana University Office of the Vice President for
Information Technology, the State of Indiana, Shared University Research Grants from IBM, Inc., and from the Lilly Endowment through their support of the Indiana Genomics Initiative. The Indiana Genomics Initiative (INGEN) of Indiana University is supported in part by Lilly Endowment Inc.
University Information Technology Services' Advanced IT Facilities: The least every researcher needs to know
This is an archived document containing instructions for using IU's advanced IT facilities ca. 2003. A version of this document updated in 2011 is available from http://hdl.handle.net/2022/13620. Further versions are forthcoming. This document is designed to be read as a printed document, and designed to permit anyone at all familiar with computers and the Internet to start at the beginning, get a general overview of UITS' advanced IT facilities and what they offer, and then read the detailed portions of the document that are of interest. In many cases, examples are provided, as well as directions on how to download sample files. And in some cases there is information that one is best off really not learning – for example the process of logging into IU's IBM supercomputer the first time involves setup steps that should be followed, keystroke by keystroke, from the directions presented herein, and then promptly forgotten.
This document is intended to be a starting point, not a comprehensive guide. As such it should get any reader off to a good start, but then point the reader in the direction of consulting staff and online resources that will permit the reader to get additional help and information as needed.
Most of all, this document is provided for the convenience of researchers, who may peruse this information at their leisure. Our hope and expectation is that consultants in UITS will provide extensive help and programming assistance to IU researchers who wish to make use of these excellent IT facilities. The facilities described in this document were made possible in part through funding from Indiana University, the Indiana University Office of the Vice President for Information Technology, the State of Indiana, Shared University Research Grants from IBM, Inc., the National Science Foundation under Grant No. 0116050 and Grant CDA-9601632, and from the Lilly Endowment through their support of the Indiana Genomics Initiative. The Indiana Genomics Initiative (INGEN) of Indiana University is supported in part by Lilly Endowment Inc.
The X-Files: Investigating Alien Performance in a Thin-client World
Many scientific applications use the X11 window environment, an open source
windows GUI standard employing a client/server architecture. X11 promotes
distributed computing, thin-client functionality, cheap desktop displays,
compatibility with heterogeneous servers, remote services and administration,
and greater maturity than newer web technologies. This paper details the
author's investigations into close encounters with alien performance in
X11-based seismic applications running on a 200-node cluster, backed by 2 TB of
mass storage. End-users cited two significant UFOs (Unidentified Faulty
Operations): i) long application launch times and ii) poor interactive response
times. The paper is divided into three major sections describing Close
Encounters of the 1st Kind: sightings of UFO experiences, the 2nd Kind: recording
evidence of a UFO, and the 3rd Kind: contact and analysis. UFOs do exist and
this investigation presents a real case study for evaluating workload analysis
and other diagnostic tools. Comment: 13 pages; Invited Lecture at the High Performance Computing
Conference, University of Tromso, Norway, June 27-30, 199
Hardware implementations of computer-generated holography: a review
Computer-generated holography (CGH) is a technique to generate holographic interference patterns. One of the major issues related to computer hologram generation is the massive computational power required. Hardware accelerators are used to accelerate this process. Previous publications targeting hardware platforms lack performance comparisons between different architectures and do not provide enough information for the evaluation of the suitability of recent hardware platforms for CGH algorithms. We aim to address these limitations and present a comprehensive review of CGH-related hardware implementations.
A Toolkit for Simulation of Desktop Grid Environment
Peer-to-peer networks, clusters, and grids combine heterogeneous distributed resources to solve problems in fields such as science, engineering, and commerce. Organizations within the worldwide grid network offer geographically distributed resources that are administered by schedulers and policies. Studying the behavior of these resources is time consuming because each resource behaves differently. In this type of environment it is nearly impossible to prove the effectiveness of a scheduling algorithm. Hence the main objective of this study is to develop a desktop grid simulator toolkit for measuring and modeling the performance of scheduling algorithms. The application is developed using a prototyping methodology. The prototypes will be developed in Java together with a MySQL database. The core functions of the simulator are job generation, volunteer generation, simulating algorithms, generating graphical charts, and generating reports.
A simulator for desktop grid environments has been developed using Java as the implementation language due to its wide popularity. The final system was developed after the successful delivery of two prototypes. In addition to the core functions of a desktop grid simulator listed above, advanced features such as viewing real-time graphical charts, generating PDF reports of the simulation results, and exporting the final results as CSV files have also been included.