
    Quantitative performance evaluation of SCI memory hierarchies

    Get PDF

    Literature Review For Networking And Communication Technology

    Get PDF
    This report documents the results of a literature search performed in the area of networking and communication technology.

    Divide-and-conquer algorithms for multiprocessors

    Get PDF
    During the past decade there has been a tremendous surge in understanding the nature of parallel computation, and a number of parallel computers are now commercially available. However, developing application programs on these computers remains problematic. This dissertation considers various issues involved in implementing parallel algorithms on Multiple Instruction Multiple Data (MIMD) machines with a bounded number of processors. Strategies for implementing divide-and-conquer algorithms on MIMD machines are proposed. Results linking the time complexity, communication complexity, and the complexity of the divide and combine functions of divide-and-conquer algorithms are analyzed. An efficient criterion for partitioning a parallel program is proposed, and a method is developed for obtaining a closed-form expression for the time complexity of a parallel program in terms of problem size and number of processors.
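
    As a rough illustration of the bounded-processor strategy this abstract describes, the sketch below splits a divide-and-conquer computation (merge sort, here) into as many subproblems as there are worker processes and performs the combine phase serially. The function names and the splitting rule are illustrative assumptions, not the dissertation's actual scheme.

```python
from concurrent.futures import ProcessPoolExecutor
import os

def merge(left, right):
    """Combine step: merge two sorted lists into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

def mergesort(xs):
    """Serial divide-and-conquer solver run on each processor."""
    if len(xs) <= 1:
        return xs
    mid = len(xs) // 2
    return merge(mergesort(xs[:mid]), mergesort(xs[mid:]))

def parallel_mergesort(xs, workers=None):
    """Divide-and-conquer with a bounded number of processors: split the
    problem into roughly `workers` subproblems, solve them in parallel,
    then combine the partial results (serially, in this sketch)."""
    workers = workers or os.cpu_count()
    chunk = max(1, len(xs) // workers)
    parts = [xs[i:i + chunk] for i in range(0, len(xs), chunk)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        sorted_parts = list(pool.map(mergesort, parts))
    result = []
    for part in sorted_parts:          # combine phase
        result = merge(result, part)
    return result

if __name__ == "__main__":
    import random
    data = [random.random() for _ in range(10_000)]
    assert parallel_mergesort(data, workers=4) == sorted(data)
```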

    GAMER: a GPU-Accelerated Adaptive Mesh Refinement Code for Astrophysics

    Full text link
    We present the newly developed code GAMER (GPU-accelerated Adaptive MEsh Refinement code), which adopts a novel approach to improving the performance of adaptive mesh refinement (AMR) astrophysical simulations by a large factor through the use of the graphics processing unit (GPU). The AMR implementation is based on a hierarchy of grid patches with an oct-tree data structure. We adopt a three-dimensional relaxing TVD scheme for the hydrodynamic solver and a multi-level relaxation scheme for the Poisson solver. Both solvers have been implemented on the GPU, by which hundreds of patches can be advanced in parallel. The computational overhead associated with data transfer between CPU and GPU is carefully reduced by using the GPU's capability for asynchronous memory copies, and the time spent computing ghost-zone values for each patch is hidden by overlapping it with the GPU computations. We demonstrate the accuracy of the code by performing several standard test problems in astrophysics. GAMER is a parallel code that can be run on a multi-GPU cluster system. We measure the performance of the code by performing purely baryonic cosmological simulations on different hardware configurations, in which detailed timing analyses provide comparisons between computations with and without GPU acceleration. Maximum speed-up factors of 12.19 and 10.47 are demonstrated using 1 GPU with 4096^3 effective resolution and 16 GPUs with 8192^3 effective resolution, respectively. Comment: 60 pages, 22 figures, 3 tables; more accuracy tests are included. Accepted for publication in ApJ.
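
    The sketch below is a minimal, CPU-only illustration of the kind of oct-tree patch hierarchy the abstract describes; PATCH_SIZE, the Patch class, and the refinement call are assumptions made for illustration and do not reflect GAMER's actual data structures or GPU solvers.

```python
import numpy as np

PATCH_SIZE = 8  # cells per side of a patch (assumed for illustration)

class Patch:
    """One grid patch in an oct-tree AMR hierarchy (illustrative only)."""
    def __init__(self, origin, dx, level):
        self.origin = np.asarray(origin, dtype=float)  # lower corner of the patch
        self.dx = dx                                   # cell width at this level
        self.level = level
        self.data = np.zeros((PATCH_SIZE,) * 3)        # cell-centered field values
        self.children = []                             # either 0 or 8 child patches

    def refine(self):
        """Split into 8 children, each covering one octant at half the cell width."""
        half_extent = PATCH_SIZE // 2 * self.dx
        for k in range(2):
            for j in range(2):
                for i in range(2):
                    child_origin = self.origin + np.array([i, j, k]) * half_extent
                    self.children.append(Patch(child_origin, self.dx / 2, self.level + 1))

def leaves(patch):
    """Collect leaf patches; a solver would advance such patches in large
    batches (e.g. hundreds per kernel launch) one refinement level at a time."""
    if not patch.children:
        return [patch]
    return [p for child in patch.children for p in leaves(child)]

root = Patch(origin=(0.0, 0.0, 0.0), dx=1.0, level=0)
root.refine()
root.children[0].refine()
print(len(leaves(root)))  # 15 leaves: 7 at level 1 plus 8 at level 2
```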

    Convergence theories of distributed iterative processes: a survey

    Get PDF
    Bibliography: p. 43-45. October, 1984. (Revision of LIDS-P-1342.) Grant numbers: NSF-ECS-8217668; ONR/N00014-75-C-1183; ONR-N00014-77-C-0532 (NR041-519); ONR N00014-84-K-0519. Dimitri P. Bertsekas, John N. Tsitsiklis, Michael Athans.

    Methodology for modeling high performance distributed and parallel systems

    Get PDF
    Performance modeling of distributed and parallel systems is of considerable importance to the high performance computing community. To achieve high performance, proper task or process assignment and data or file allocation among processing sites is essential. This dissertation describes an approach to modeling distributed and parallel systems which combines optimal static solutions for data allocation with dynamic policies for task assignment. A performance-efficient system model is developed using analytical tools and techniques, and is constructed in three steps. First, a basic client-server model that allows only data transfer is evaluated; a prediction and evaluation method, based on known product-form queueing networks, is developed to examine system behavior and estimate performance measures. The next step extends the model so that each site of the system behaves as both client and server, and a data-allocation strategy, based on the flow-deviation technique in queueing models, is designed to assign data optimally to the processing sites. The third stage considers process-migration policies: a novel on-line adaptive load-balancing algorithm is proposed which dynamically migrates processes and transfers data among sites to minimize the job execution cost. The gradient-descent rule is used to optimize the cost function, which expresses the cost of process execution at the different processing sites. The accuracy of the prediction method and the effectiveness of the analytical techniques are established by simulations. The modeling procedure described here is general and applicable to any message-passing distributed and parallel system, and the proposed techniques and tools can readily be applied in related areas such as networking and operating systems. This work contributes significantly towards the design of distributed and parallel systems where performance is critical.
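
    As a toy illustration of the modeling style the abstract outlines, the sketch below treats two processing sites as M/M/1 queues and applies gradient descent to the routing fraction to minimize mean job response time. The two-site setup, the cost function, and all parameter values are illustrative assumptions rather than the dissertation's actual model.

```python
import numpy as np

def mean_response(p, lam, mu1, mu2):
    """Mean response time when a Poisson job stream of rate lam is split:
    fraction p goes to site 1, the rest to site 2, each an M/M/1 queue."""
    return p / (mu1 - p * lam) + (1 - p) / (mu2 - (1 - p) * lam)

def balance(lam, mu1, mu2, p=0.5, step=1e-3, iters=20_000):
    """Gradient descent on the routing fraction p, kept inside the region
    where both queues remain stable (arrival rate < service rate)."""
    lo = max(1e-6, 1 - (mu2 - 1e-6) / lam)
    hi = min(1 - 1e-6, (mu1 - 1e-6) / lam)
    for _ in range(iters):
        # Analytic derivative of mean_response with respect to p.
        grad = mu1 / (mu1 - p * lam) ** 2 - mu2 / (mu2 - (1 - p) * lam) ** 2
        p = float(np.clip(p - step * grad, lo, hi))
    return p

lam, mu1, mu2 = 8.0, 6.0, 10.0           # illustrative arrival and service rates
p = balance(lam, mu1, mu2)
print(f"route {p:.3f} of jobs to site 1, "
      f"mean response time {mean_response(p, lam, mu1, mu2):.3f}")
```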

    Design and analysis of numerical algorithms for the solution of linear systems on parallel and distributed architectures

    Get PDF
    The increasing availability of parallel computers is having a very significant impact on all aspects of scientific computation, including algorithm research and software development in numerical linear algebra. In particular, the solution of linear systems, which lies at the heart of most calculations in scientific computing, is an important computation found in many engineering and scientific applications. In this thesis, well-known parallel algorithms for the solution of linear systems are compared with implicit parallel algorithms of the Quadrant Interlocking (QI) class. These implicit algorithms are (2x2) block algorithms expressed in explicit point-form notation. [Continues.]
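
    For orientation, the sketch below shows a serial, unpivoted version of the Quadrant Interlocking (WZ-style) elimination idea: each step uses the outermost pair of rows to zero two columns of every interior row via independent 2x2 solves, and back-substitution recovers the unknowns in pairs from the middle outwards. It is an illustrative reconstruction under simplifying assumptions (even order, no pivoting), not the thesis's parallel implementation.

```python
import numpy as np

def qi_solve(A, b):
    """Solve A x = b with a Quadrant Interlocking (WZ-style) elimination.
    The interior-row updates within each step are mutually independent,
    which is the source of parallelism in QI algorithms."""
    A = np.array(A, dtype=float)
    b = np.array(b, dtype=float)
    n = b.size
    assert n % 2 == 0, "sketch assumes an even-order system"

    # Forward elimination: step k uses rows k and n-1-k to zero columns
    # k and n-1-k in every interior row i (k < i < n-1-k).
    for k in range(n // 2):
        k2 = n - 1 - k
        pivot = np.array([[A[k, k],  A[k2, k]],
                          [A[k, k2], A[k2, k2]]])
        for i in range(k + 1, k2):                    # independent updates
            w = np.linalg.solve(pivot, np.array([A[i, k], A[i, k2]]))
            A[i, :] -= w[0] * A[k, :] + w[1] * A[k2, :]
            b[i]    -= w[0] * b[k]   + w[1] * b[k2]

    # Back substitution from the innermost pair outwards: each pair of rows
    # yields a 2x2 system for the pair of unknowns (x[k], x[n-1-k]).
    x = np.zeros(n)
    for k in range(n // 2 - 1, -1, -1):
        k2 = n - 1 - k
        rhs = np.array([b[k]  - A[k,  k + 1:k2] @ x[k + 1:k2],
                        b[k2] - A[k2, k + 1:k2] @ x[k + 1:k2]])
        block = np.array([[A[k, k],  A[k, k2]],
                          [A[k2, k], A[k2, k2]]])
        x[[k, k2]] = np.linalg.solve(block, rhs)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.normal(size=(6, 6)) + 6 * np.eye(6)       # diagonally dominant test matrix
    b = rng.normal(size=6)
    print(np.allclose(qi_solve(A, b), np.linalg.solve(A, b)))  # True
```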

    Activities of the Institute for Computer Applications in Science and Engineering (ICASE)

    Get PDF
    Research conducted at the Institute for Computer Applications in Science and Engineering in applied mathematics, numerical analysis, and computer science during the period October 1, 1984 through March 31, 1985 is summarized.

    Relaxing Synchronization in Distributed Simulated Annealing

    Get PDF
    Simulated annealing is an attractive, but expensive, heuristic for approximating the solution to combinatorial optimization problems. Because simulated annealing is a general-purpose method, it can be applied, with careful control of the cooling schedule, to a broad range of NP-complete problems such as the traveling salesman problem, problems in graph theory, and cell placement. Attempts to parallelize simulated annealing, particularly on distributed-memory multicomputers, are hampered by the algorithm's requirement of a globally consistent system state. In a multicomputer, maintaining the global state S involves explicit message traffic and is a critical performance bottleneck. One way to mitigate this bottleneck is to amortize the overhead of these state updates over as many parallel state changes as possible. This technique, however, introduces errors in the computed cost C(S) of a particular state S used by the annealing process. This dissertation places analytically derived bounds on the cost error in order to assure convergence to the correct result. The resulting parallel simulated annealing algorithm dynamically changes the frequency of global updates as a function of the annealing control parameter, i.e., the temperature. Implementation results on an Intel iPSC/2 are reported.
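
    A minimal serial sketch of simulated annealing on a small traveling salesman instance is given below, with a hypothetical sync_interval hook marking where a distributed worker might batch global state exchanges as a function of temperature. The schedule shown is illustrative only and is not the dissertation's derived bound.

```python
import math
import random

def tour_length(tour, dist):
    """Length of a closed tour (wraps from the last city back to the first)."""
    return sum(dist[tour[i - 1]][tour[i]] for i in range(len(tour)))

def sync_interval(temperature, t0):
    """Hypothetical schedule: batch more moves between global state updates
    at high temperature, fewer as the system cools (illustrative only)."""
    return max(1, int(10 * temperature / t0))

def anneal(dist, t0=10.0, cooling=0.995, steps=20_000, seed=0):
    rng = random.Random(seed)
    n = len(dist)
    tour = list(range(n))
    rng.shuffle(tour)
    cost = tour_length(tour, dist)
    temperature, since_sync = t0, 0
    for _ in range(steps):
        i, j = sorted(rng.sample(range(n), 2))
        cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]   # 2-opt style move
        delta = tour_length(cand, dist) - cost
        # Metropolis acceptance rule.
        if delta < 0 or rng.random() < math.exp(-delta / temperature):
            tour, cost = cand, cost + delta
        since_sync += 1
        if since_sync >= sync_interval(temperature, t0):
            since_sync = 0   # a distributed worker would exchange state here
        temperature *= cooling
    return tour, cost

if __name__ == "__main__":
    rng = random.Random(1)
    pts = [(rng.random(), rng.random()) for _ in range(25)]
    dist = [[math.dist(a, b) for b in pts] for a in pts]
    print(round(anneal(dist)[1], 3))
```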