A bibliography on parallel and vector numerical algorithms
This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming languages, and other topics of interest to scientific computing. Certain conference proceedings and anthologies that have been published in book form are also listed.
Parallel Architectures and Parallel Algorithms for Integrated Vision Systems
Computer vision is regarded as one of the most complex and computationally intensive problems. An integrated vision system (IVS) is a system that uses vision algorithms from all levels of processing to perform a high-level application (e.g., object recognition). An IVS normally involves algorithms from low-level, intermediate-level, and high-level vision. Designing parallel architectures for vision systems is of tremendous interest to researchers. This work addresses several issues in parallel architectures and parallel algorithms for integrated vision systems.
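Low-level vision operations are a natural fit for data parallelism. The sketch below is a minimal illustration and is not taken from the cited work. It uses a 3x3 mean filter, chosen here only as an example operation. The image is split into row strips, and each strip is padded with one halo row from each neighbour, so every strip can be filtered independently (and hence on a separate processor). The function names `mean3x3`, `split_rows`, and `mean3x3_decomposed` are hypothetical.

```python
from typing import List

Image = List[List[float]]

def mean3x3(img: Image) -> Image:
    """3x3 mean filter; neighbours outside the image are simply skipped."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            vals = [img[rr][cc]
                    for rr in range(max(r - 1, 0), min(r + 2, h))
                    for cc in range(max(c - 1, 0), min(c + 2, w))]
            out[r][c] = sum(vals) / len(vals)
    return out

def split_rows(h: int, parts: int) -> List[tuple]:
    """Static block partition of row indices [0, h) into `parts` strips."""
    base, extra = divmod(h, parts)
    bounds, start = [], 0
    for p in range(parts):
        stop = start + base + (1 if p < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds

def mean3x3_decomposed(img: Image, parts: int) -> Image:
    """Filter each strip independently after attaching one halo row per side.
    The loop body has no cross-strip dependencies (hypothetical decomposition)."""
    h = len(img)
    result: Image = []
    for r0, r1 in split_rows(h, parts):
        lo, hi = max(r0 - 1, 0), min(r1 + 1, h)   # strip plus halo rows
        local = mean3x3(img[lo:hi])               # independent work per strip
        result.extend(local[r0 - lo: r0 - lo + (r1 - r0)])  # drop the halo rows
    return result
```

The halo rows make each strip's boundary outputs identical to the sequential result. Rows computed inside the halo itself are discarded.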
Concurrent Probabilistic Simulation of High Temperature Composite Structural Response
A computational structural/material analysis and design tool that would meet industry's future demand for expedience and reduced cost is presented. This unique software, 'GENOA', is dedicated to parallel, high-speed analysis to perform probabilistic evaluation of the high temperature composite response of aerospace systems. The development is based on detailed integration and modification of diverse fields of specialized analysis techniques and mathematical models, combining their latest innovative capabilities into a commercially viable software package. The technique is specifically designed to exploit the availability of processors to perform computationally intense probabilistic analysis assessing uncertainties in structural reliability analysis and composite micromechanics. The primary objectives achieved in the development were: (1) utilization of the power of parallel processing and static/dynamic load balancing optimization to make the complex simulation of the structure, material, and processing of high temperature composites affordable; (2) computational integration and synchronization of probabilistic mathematics, structural/material mechanics, and parallel computing; (3) implementation of an innovative multi-level domain decomposition technique to identify the inherent parallelism and to increase convergence rates through high- and low-level processor assignment; (4) creation of a framework for a portable parallel architecture for machine-independent Multiple Instruction Multiple Data (MIMD), Single Instruction Multiple Data (SIMD), hybrid, and distributed workstation types of computers; and (5) market evaluation. The results of the Phase 2 effort provide a good basis for continuation and warrant a Phase 3 government and industry partnership.
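Probabilistic evaluation of this kind is largely embarrassingly parallel. The following is a minimal sketch, assuming a hypothetical limit state g = R - S with normally distributed resistance R and load S. These are not GENOA's models. It estimates the failure probability P(g < 0) by Monte Carlo sampling. The samples are split by a static partition into independent chunks, each with its own seed, so each chunk could be assigned to a separate processor. Here the chunks run sequentially for clarity.

```python
import math
import random

def failure_count(seed: int, n: int, mu_r: float = 5.0, mu_s: float = 3.0,
                  sd: float = 1.0) -> int:
    """Count samples with g = R - S < 0 (hypothetical normal R and S)."""
    rng = random.Random(seed)        # independent random stream per chunk
    fails = 0
    for _ in range(n):
        if rng.gauss(mu_r, sd) - rng.gauss(mu_s, sd) < 0.0:
            fails += 1
    return fails

def partition(total: int, workers: int) -> list:
    """Static load balancing: near-equal sample counts per worker."""
    base, extra = divmod(total, workers)
    return [base + (1 if w < extra else 0) for w in range(workers)]

def estimate_pf(total: int = 100_000, workers: int = 4, base_seed: int = 12345) -> float:
    sizes = partition(total, workers)
    # Each (seed, size) task is independent; a process pool could map over them.
    counts = [failure_count(base_seed + w, n) for w, n in enumerate(sizes)]
    return sum(counts) / total

def exact_pf(mu_r: float = 5.0, mu_s: float = 3.0, sd: float = 1.0) -> float:
    """Closed form for this toy case: g is normal with mean mu_r - mu_s and sd*sqrt(2)."""
    beta = (mu_r - mu_s) / (sd * math.sqrt(2.0))   # reliability index
    return 0.5 * math.erfc(beta / math.sqrt(2.0))  # Phi(-beta)
```

Giving every chunk a fixed seed keeps the result reproducible no matter how the chunks are scheduled. A dynamic scheme would instead hand out many small chunks on demand.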
Computing infrastructure issues in distributed communications systems: a survey of operating system transport system architectures
The performance of distributed applications (such as file transfer, remote login, teleconferencing, full-motion video, and scientific visualization) is influenced by several factors that interact in complex ways. In particular, application performance is significantly affected by both communication infrastructure factors and computing infrastructure factors. Communication infrastructure factors include channel speed, bit-error rate, and congestion at intermediate switching nodes. Computing infrastructure factors include (among other things) both protocol processing activities (such as connection management, flow control, error detection, and retransmission) and general operating system factors (such as memory latency, CPU speed, interrupt and context switching overhead, process architecture, and message buffering). Because network channel speeds have increased by several orders of magnitude and applications have become more diverse, performance bottlenecks are shifting from network factors to transport system factors. This paper defines an abstraction called an "Operating System Transport System Architecture" (OSTSA) that is used to classify the major components and services in the computing infrastructure. End-to-end network protocols such as TCP, TP4, VMTP, XTP, and Delta-t typically run on general-purpose computers, where they use various operating system resources such as processors, virtual memory, and network controllers. The OSTSA provides services that integrate these resources to support distributed applications running on local and wide area networks. A taxonomy is presented to evaluate OSTSAs in terms of their support for protocol processing activities. We use this taxonomy to compare and contrast five general-purpose commercial and experimental operating systems: System V UNIX, BSD UNIX, the x-kernel, Choices, and Xinu.
Overview of Large-Scale Computing: The Past, the Present, and the Future
Doctor of Philosophy dissertation
Partial differential equations (PDEs) are widely used in science and engineering to model phenomena such as sound, heat, and electrostatics. In many practical applications, solving PDEs requires tessellating the computational domain into an unstructured mesh and entails computationally expensive, time-consuming processes. Efficient PDE solving techniques on unstructured meshes are therefore important in these applications. Compared with CPUs, SIMD streaming processors such as GPUs have shown faster growth in speed and greater power efficiency, which has given them an increasingly important role in high-performance computing. By combining suitable parallel algorithms with these streaming processors, we can develop very efficient numerical PDE solvers. The contributions of this dissertation are twofold: two general strategies for designing efficient PDE solvers on GPUs, and the application of these strategies to solving different types of PDEs. Specifically, this dissertation consists of four parts. First, we describe the general strategies: the domain decomposition strategy and the hybrid gathering strategy. Second, we introduce a parallel algorithm for efficiently solving the eikonal equation on fully unstructured meshes. Third, we present the algorithms and data structures necessary to move the entire finite element method (FEM) pipeline to the GPU. Fourth, we propose a parallel algorithm for solving the level set equation on fully unstructured 2D or 3D meshes or manifolds. This algorithm combines a narrow-band scheme with domain decomposition for efficient level set equation solving.
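The dissertation's eikonal solver targets fully unstructured meshes. The sketch below is a deliberately simplified stand-in: it uses a regular 2D grid with spacing h and a Jacobi-style iteration, where every node is updated only from the previous iterate. Because of this, all node updates within one sweep are independent, the property that lets such solvers run on GPUs. The update is the standard first-order Godunov upwind formula for |∇T| = 1/F. The function name and parameters are hypothetical, and this is not the author's algorithm.

```python
import math

INF = float("inf")

def eikonal_jacobi(nx, ny, sources, h=1.0, speed=1.0, max_iters=10_000):
    """Arrival times T on an nx-by-ny grid (T[row][col]) for constant speed,
    with T = 0 at the given (col, row) source nodes."""
    T = [[INF] * nx for _ in range(ny)]
    for (i, j) in sources:
        T[j][i] = 0.0
    f = h / speed
    for _ in range(max_iters):
        new = [row[:] for row in T]   # Jacobi: read old T, write new
        changed = False
        for j in range(ny):
            for i in range(nx):       # every (i, j) update is independent
                a = min(T[j][i - 1] if i > 0 else INF,
                        T[j][i + 1] if i < nx - 1 else INF)
                b = min(T[j - 1][i] if j > 0 else INF,
                        T[j + 1][i] if j < ny - 1 else INF)
                if a == INF and b == INF:
                    continue          # no upwind information yet
                if abs(a - b) >= f:   # one-sided update
                    t = min(a, b) + f
                else:                 # two-sided (quadratic) update
                    t = 0.5 * (a + b + math.sqrt(2.0 * f * f - (a - b) ** 2))
                if t < new[j][i]:
                    new[j][i] = t
                    changed = True
        T = new
        if not changed:
            break
    return T
```

For a point source at (0, 0), the values along the grid axes come out exactly equal to the distance. Diagonal values are overestimated, as expected from a first-order scheme.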
Parallel implementation of the finite element method on shared memory multiprocessors
PhD thesis
The work presented in this thesis concerns parallel methods for finite element analysis. The research has been funded by British Gas, and some of the presented material involves work on their software. Practical problems involving the finite element method can require a large amount of processing power, and execution times can be very long. It is therefore important to investigate the possibilities for parallel implementation of the method. The research has been carried out on an Encore Multimax, a shared memory multiprocessor with 14 identical CPUs. We first experimented with autoparallelising a large British Gas finite element program (GASP4) using Encore's parallelising Fortran compiler (epf). The parallel program generated by epf proved not to be efficient. The main reasons are the complexity of the code and small-grain parallelism. Because the program is hard for the compiler to analyse at high levels, only small-grain parallelism has been inserted automatically into the code. This involves a great deal of low-level synchronisation, which produces large overheads and causes inefficiency. A detailed analysis of the autoparallelised code has been made with a view to determining the reasons for the inefficiency. Suggestions have also been made about writing programs so that they are suitable for efficient autoparallelisation.
The finite element method consists of the assembly of a stiffness matrix and the solution of a set of simultaneous linear equations. A sparse representation of the stiffness matrix has been used to allow experimentation on large problems. Parallel assembly techniques for the sparse representation have been developed. Some of these methods have proved to be very efficient, giving speed-ups that are near ideal.
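The abstract does not say how the thesis avoids write conflicts during parallel assembly. One common technique, shown here as an assumption rather than the thesis's method, is element colouring. Elements that share a node get different colours, so all elements of one colour touch disjoint rows of the global matrix and can be assembled concurrently without locks. The sketch uses hypothetical 1D linear bar elements and a dictionary keyed by (row, col) as the sparse storage.

```python
from collections import defaultdict

def color_elements(elements):
    """Greedy colouring: elements that share a node receive different colours."""
    node_colors = defaultdict(set)   # node -> colours used by elements touching it
    colors = []
    for elem in elements:
        used = set().union(*(node_colors[n] for n in elem))
        c = 0
        while c in used:
            c += 1
        colors.append(c)
        for n in elem:
            node_colors[n].add(c)
    return colors

def assemble(elements, local_matrix):
    """Assemble a sparse global matrix (dict keyed by (row, col)) colour by colour."""
    colors = color_elements(elements)
    K = defaultdict(float)
    for c in range(max(colors) + 1):
        batch = [e for e, col in zip(elements, colors) if col == c]
        # Elements in `batch` share no nodes, so their scatter-adds hit disjoint
        # entries of K; this loop could run in parallel without synchronisation.
        for elem in batch:
            ke = local_matrix(elem)
            for a, ga in enumerate(elem):
                for b, gb in enumerate(elem):
                    K[(ga, gb)] += ke[a][b]
    return dict(K)

def bar_element(elem, h=1.0):
    """Hypothetical 1D linear element stiffness with a unit material constant."""
    return [[1.0 / h, -1.0 / h], [-1.0 / h, 1.0 / h]]
```

For the chain of elements (0,1), (1,2), (2,3), the colouring alternates 0, 1, 0, so the assembly proceeds in two conflict-free batches.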
For the solution phase, we have used the preconditioned conjugate gradient (PCG) method. An incomplete LU factorization of the stiffness matrix with no fill-in (ILU(0)) has been found to be an effective preconditioner, and its factors can be obtained at low cost. We have parallelised all the steps of the PCG method. The main bottleneck is the triangular solves (preconditioning operations) at each step. Two parallel methods of triangular solution have been implemented: one is based on level scheduling (row-oriented parallelism), and the other is a new approach called independent columns (column-oriented parallelism). The algorithms have been tested for row and red-black orderings of the nodal unknowns in the finite element meshes considered.
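The following is a minimal, sequential sketch of PCG with an ILU(0) preconditioner. It uses dense Python lists for readability, although a real solver, like the thesis's, would use sparse storage. ILU(0) performs Gaussian elimination but only updates entries that are already nonzero in A. Applying the preconditioner costs one forward and one backward triangular solve per iteration; these solves are the bottleneck the abstract refers to. All function names are hypothetical.

```python
import math

def ilu0(A):
    """ILU(0): unit-lower L (strictly below the diagonal) and U (upper, incl.
    diagonal) stored in one matrix, with no fill-in outside A's pattern."""
    n = len(A)
    LU = [row[:] for row in A]
    nz = [[A[i][j] != 0.0 for j in range(n)] for i in range(n)]
    for i in range(1, n):
        for k in range(i):
            if not nz[i][k]:
                continue
            LU[i][k] /= LU[k][k]
            for j in range(k + 1, n):
                if nz[i][j]:                     # drop any would-be fill-in
                    LU[i][j] -= LU[i][k] * LU[k][j]
    return LU

def apply_ilu(LU, r):
    """z = (LU)^{-1} r via a forward then a backward triangular solve."""
    n = len(r)
    y = [0.0] * n
    for i in range(n):                           # forward solve, unit lower L
        y[i] = r[i] - sum(LU[i][j] * y[j] for j in range(i))
    z = [0.0] * n
    for i in reversed(range(n)):                 # backward solve, U
        z[i] = (y[i] - sum(LU[i][j] * z[j] for j in range(i + 1, n))) / LU[i][i]
    return z

def matvec(A, x):
    return [sum(a * xk for a, xk in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pcg(A, b, tol=1e-10, max_iters=1000):
    """Preconditioned conjugate gradient for symmetric positive definite A.
    Returns (x, iterations used)."""
    LU = ilu0(A)
    x = [0.0] * len(b)
    r = list(b)
    z = apply_ilu(LU, r)
    p = list(z)
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b))
    for it in range(1, max_iters + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xk + alpha * pk for xk, pk in zip(x, p)]
        r = [rk - alpha * apk for rk, apk in zip(r, Ap)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
        z = apply_ilu(LU, r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zk + beta * pk for zk, pk in zip(z, p)]
        rz = rz_new
    return x, max_iters
```

For a tridiagonal matrix, ILU(0) produces no dropped fill-in and is therefore the exact LU factorization, so PCG converges in a single iteration. For a 2D five-point matrix, some fill-in is dropped and a few iterations are needed.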
The best speed-ups obtained are 7.29 (on 12 processors) for level scheduling and 7.11 (on 12 processors) for independent columns. In general, red-black ordering gives better parallel performance than row ordering. An analysis of methods for improving the parallel efficiency has been made.
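The ordering result can be illustrated with level scheduling on a small model problem. This sketch assumes a hypothetical m x m grid with a five-point stencil rather than the thesis's finite element meshes, and it does not show the independent-columns method. Row k of the lower triangular factor depends on every earlier-ordered neighbour. A row's level is one more than the highest level among its dependencies, and all rows in the same level can be solved concurrently. Fewer levels means fewer synchronisation points per triangular solve.

```python
def orderings(m):
    """Natural (row-by-row) and red-black orderings of an m x m grid's nodes."""
    natural = [(i, j) for j in range(m) for i in range(m)]
    red = [node for node in natural if (node[0] + node[1]) % 2 == 0]
    black = [node for node in natural if (node[0] + node[1]) % 2 == 1]
    return natural, red + black

def lower_deps(order):
    """For each unknown, the earlier-ordered five-point neighbours it depends on
    in a forward (lower triangular) solve."""
    pos = {node: k for k, node in enumerate(order)}
    deps = []
    for k, (i, j) in enumerate(order):
        nbrs = ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
        deps.append([pos[nb] for nb in nbrs if nb in pos and pos[nb] < k])
    return deps

def level_schedule(deps):
    """level[k] = 1 + max level of the rows that row k depends on.
    Rows sharing a level have no mutual dependencies."""
    level = []
    for d in deps:
        level.append(1 + max((level[j] for j in d), default=0))
    return level
```

For m = 4, the natural ordering yields 7 levels (the 2m - 1 diagonal wavefronts), while red-black ordering yields only 2: all red unknowns first, then all black ones.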
Computational Physics on Graphics Processing Units
The use of graphics processing units for scientific computations is an emerging strategy that can significantly speed up a variety of algorithms. In this review, we discuss advances made in the field of computational physics, focusing on classical molecular dynamics and on quantum simulations for electronic structure calculations using density functional theory, wave function techniques, and quantum field theory.
Comment: Proceedings of the 11th International Conference, PARA 2012, Helsinki, Finland, June 10-13, 201
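Classical molecular dynamics maps well onto GPUs because each particle's force sum can be computed independently. The sketch below is a minimal, sequential version assuming a Lennard-Jones pair potential U(r) = 4ε[(σ/r)^12 - (σ/r)^6]. The per-particle outer loop mirrors the one-thread-per-particle layout often used in GPU kernels. It is not taken from the review, and it omits cutoffs, neighbour lists, and periodic boundaries. The function name is hypothetical.

```python
from typing import List, Tuple

Vec = Tuple[float, float, float]

def lj_forces(positions: List[Vec], epsilon: float = 1.0, sigma: float = 1.0) -> List[Vec]:
    """All-pairs Lennard-Jones forces. Each iteration of the outer loop writes
    only forces[i], so the iterations are independent (one GPU thread each)."""
    n = len(positions)
    forces = []
    for i in range(n):
        fx = fy = fz = 0.0
        xi, yi, zi = positions[i]
        for j in range(n):
            if j == i:
                continue
            dx = xi - positions[j][0]
            dy = yi - positions[j][1]
            dz = zi - positions[j][2]
            r2 = dx * dx + dy * dy + dz * dz
            sr6 = (sigma * sigma / r2) ** 3
            # F = -dU/dr along r_ij: (24 eps / r^2) (2 (sigma/r)^12 - (sigma/r)^6) * d
            coef = 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r2
            fx += coef * dx
            fy += coef * dy
            fz += coef * dz
        forces.append((fx, fy, fz))
    return forces
```

This version skips Newton's third law, so each pair is evaluated twice. That redundancy is a deliberate trade-off commonly made on GPUs to avoid conflicting writes between threads.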