7 research outputs found
Parallel programming environment for OpenMP
We present our effort to provide a comprehensive parallel programming environment for the OpenMP parallel directive language. This environment includes a parallel programming methodology for the OpenMP programming model and a set of tools (Ursa Minor and InterPol) that support this methodology. Our toolset provides automated and interactive assistance to parallel programmers in the time-consuming tasks of the proposed methodology. The features provided by our tools include performance and program structure visualization, interactive optimization, support for performance modeling, and performance advising for finding and correcting performance problems. The presented evaluation demonstrates that our environment offers significant support in general parallel tuning efforts and that the toolset facilitates many common tasks in OpenMP parallel programming in an efficient manner.
Tiled Algorithms for Matrix Computations on Multicore Architectures
The current computer architecture has moved towards the multi/many-core structure. However, the algorithms in the current sequential dense numerical linear algebra libraries (e.g., LAPACK) do not parallelize well on multi/many-core architectures. A new family of algorithms, the tile algorithms, has recently been introduced to circumvent this problem. Previous research has shown that it is possible to write efficient and scalable tile algorithms for performing a Cholesky factorization, a (pseudo) LU factorization, and a QR factorization. The goal of this thesis is to study tiled algorithms in a multi/many-core setting and to provide new algorithms which exploit the current architecture to improve performance relative to current state-of-the-art libraries while maintaining the stability and robustness of these libraries. Comment: PhD Thesis, 2012 http://math.ucdenver.ed
On the Automatic Parallelization of the Perfect Benchmarks
This paper presents the results of the Cedar Hand-Parallelization Experiment, conducted from 1989 through 1992 within the Center for Supercomputing Research and Development (CSRD) at the University of Illinois. In this experiment we manually transformed the Perfect Benchmarks® into parallel program versions. In doing so, we used techniques that may be automated in an optimizing compiler. We then ran these programs on the Cedar multiprocessor (built at CSRD during the 1980s) and measured the speed improvement due to each technique.
On the Automatic Parallelization of the Perfect Benchmarks® Rudolf Eigenmann, Jay Hoeflinger
This paper presents the results of the Cedar Hand-Parallelization Experiment, conducte