856 research outputs found
Literature Review For Networking And Communication Technology
Report documents the results of a literature search performed in the area of networking and communication technology
Divide-and-conquer algorithms for multiprocessors
During the past decade there has been a tremendous surge in understanding the nature of parallel computation. A number of parallel computers are commercially available. However, there are some problems in developing application programs on these computers;This dissertation considers various issues involved in implementing parallel algorithms on Multiple Instruction Multiple Data (MIMD) machines with a bounded number of processors. Strategies for implementing divide-and-conquer algorithms on MIMD machines are proposed. Results linking time complexity, communication complexity and the complexity of divide-and-combine functions of divide-and-conquer algorithms are analyzed. An efficient criterion for partitioning a parallel program is proposed and a method for obtaining a closed form expression for time complexity of a parallel program in terms of problem size and number of processors is developed
GAMER: a GPU-Accelerated Adaptive Mesh Refinement Code for Astrophysics
We present the newly developed code, GAMER (GPU-accelerated Adaptive MEsh
Refinement code), which has adopted a novel approach to improve the performance
of adaptive mesh refinement (AMR) astrophysical simulations by a large factor
with the use of the graphic processing unit (GPU). The AMR implementation is
based on a hierarchy of grid patches with an oct-tree data structure. We adopt
a three-dimensional relaxing TVD scheme for the hydrodynamic solver, and a
multi-level relaxation scheme for the Poisson solver. Both solvers have been
implemented in GPU, by which hundreds of patches can be advanced in parallel.
The computational overhead associated with the data transfer between CPU and
GPU is carefully reduced by utilizing the capability of asynchronous memory
copies in GPU, and the computing time of the ghost-zone values for each patch
is made to diminish by overlapping it with the GPU computations. We demonstrate
the accuracy of the code by performing several standard test problems in
astrophysics. GAMER is a parallel code that can be run in a multi-GPU cluster
system. We measure the performance of the code by performing purely-baryonic
cosmological simulations in different hardware implementations, in which
detailed timing analyses provide comparison between the computations with and
without GPU(s) acceleration. Maximum speed-up factors of 12.19 and 10.47 are
demonstrated using 1 GPU with 4096^3 effective resolution and 16 GPUs with
8192^3 effective resolution, respectively.Comment: 60 pages, 22 figures, 3 tables. More accuracy tests are included.
Accepted for publication in ApJ
Convergence theories of distributed iterative processes: a survey
Bibliography: p. 43-45."October, 1984." (Revision of LIDS-P-1342)" NSF-ECS-8217668" " ONR/N00014-75-C-1183" " ONR-N00014-77-C-0532(NR041-519)" ONR N00014-84-K-0519"Dimitri P. Bertsekas, John N. Tsitsiklis, Michael Athans
Methodology for modeling high performance distributed and parallel systems
Performance modeling of distributed and parallel systems is of considerable importance to the high performance computing community. To achieve high performance, proper task or process assignment and data or file allocation among processing sites is essential. This dissertation describes an elegant approach to model distributed and parallel systems, which combines the optimal static solutions for data allocation with dynamic policies for task assignment. A performance-efficient system model is developed using analytical tools and techniques.
The system model is accomplished in three steps. First, the basic client-server model which allows only data transfer is evaluated. A prediction and evaluation method is developed to examine the system behavior and estimate performance measures. The method is based on known product form queueing networks. The next step extends the model so that each site of the system behaves as both client and server. A data-allocation strategy is designed at this stage which optimally assigns the data to the processing sites. The strategy is based on flow deviation technique in queueing models. The third stage considers process-migration policies. A novel on-line adaptive load-balancing algorithm is proposed which dynamically migrates processes and transfers data among different sites to minimize the job execution cost. The gradient-descent rule is used to optimize the cost function, which expresses the cost of process execution at different processing sites.
The accuracy of the prediction method and the effectiveness of the analytical techniques is established by the simulations. The modeling procedure described here is general and applicable to any message-passing distributed and parallel system. The proposed techniques and tools can be easily utilized in other related areas such as networking and operating systems. This work contributes significantly towards the design of distributed and parallel systems where performance is critical
Design and analysis of numerical algorithms for the solution of linear systems on parallel and distributed architectures
The increasing availability of parallel computers is having a very significant impact on
all aspects of scientific computation, including algorithm research and software
development in numerical linear algebra. In particular, the solution of linear systems,
which lies at the heart of most calculations in scientific computing is an important
computation found in many engineering and scientific applications.
In this thesis, well-known parallel algorithms for the solution of linear systems are
compared with implicit parallel algorithms or the Quadrant Interlocking (QI) class of
algorithms to solve linear systems. These implicit algorithms are (2x2) block
algorithms expressed in explicit point form notation. [Continues.
Activities of the Institute for Computer Applications in Science and Engineering (ICASE)
Research conducted at the Institute for Computer Applications in Science and Engineering in applied mathematics, numerical analysis, and computer science during the period October 1, 1984 through March 31, 1985 is summarized
Relaxing Synchronization in Distributed Simulated Annealing
Simulated annealing is an attractive, but expensive, heuristic for approximating the solution to combinatorial optimization problems. Since simulated annealing is a general purpose method, it can be applied to the broad range of NP-complete problems such as the traveling salesman problem, graph theory, and cell placement with a careful control of the cooling schedule.
Attempts to parallelize simulated annealing, particularly on distributed memory multicomputers, are hampered by the algorithm’s requirement of a globally consistent system state. In a multicomputer, maintaining the global state S involves explicit message traffic and is a critical performance bottleneck. One way to mitigate this bottleneck is to amortize the overhead of these state updates over as many parallel state changes as possible. By using this technique, errors in the actual cost C(S) of a particular state S will be introduced into the annealing process.
This dissertation places analytically derived bounds on the cost error in order to assure convergence to the correct result. The resulting parallel Simulated Annealing algorithm dynamically changes the frequency of global updates as a function of the annealing control parameter, i.e. temperature. Implementation results on an Intel iPSC/2 are reported
- …