12 research outputs found
Recommended from our members
DOLIB: Distributed Object Library
This report describes the use and implementation of DOLIB (Distributed Object Library), a library of routines that emulates global or virtual shared memory on Intel multiprocessor systems. Access to a distributed global array is through explicit calls to gather and scatter. Advantages of using DOLIB include: dynamic allocation and freeing of huge (gigabyte) distributed arrays, both C and FORTRAN callable interfaces, and the ability to mix shared-memory and message-passing programming models for ease of use and optimal performance. DOLIB is independent of language and compiler extensions and requires no special operating system support. DOLIB also supports automatic caching of read-only data for high performance. The virtual shared memory support provided in DOLIB is well suited for implementing Lagrangian particle tracking techniques. We have also used DOLIB to create DONIO (Distributed Object Network I/O Library), which obtains over a 10-fold improvement in disk I/O performance on the Intel Paragon
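As an illustration of the gather/scatter access model described in this abstract, the following is a minimal single-process sketch, not DOLIB's actual interface. The class name `DistributedArray`, its methods, and the block distribution are all assumptions made for the example; real "processors" are modeled as plain Python lists.

```python
class DistributedArray:
    """Toy model of a global array split into contiguous blocks, one per 'processor'.

    Hypothetical illustration only: names and layout are not taken from DOLIB.
    """

    def __init__(self, n: int, nprocs: int):
        self.n = n
        self.nprocs = nprocs
        self.block = -(-n // nprocs)  # ceiling division: elements per processor
        # Each processor holds its own slice; the last ones may be short or empty.
        self.local = [
            [0.0] * max(0, min(self.block, n - p * self.block))
            for p in range(nprocs)
        ]

    def owner(self, i: int):
        """Map a global index to (processor, local offset)."""
        if not 0 <= i < self.n:
            raise IndexError(i)
        return divmod(i, self.block)

    def gather(self, indices):
        """Fetch the values at arbitrary global indices (explicit read)."""
        return [self.local[p][off] for p, off in map(self.owner, indices)]

    def scatter(self, indices, values):
        """Store values at arbitrary global indices (explicit write)."""
        for i, v in zip(indices, values):
            p, off = self.owner(i)
            self.local[p][off] = v


arr = DistributedArray(n=10, nprocs=4)
arr.scatter([0, 4, 9], [1.0, 2.0, 3.0])
values = arr.gather([9, 4, 0, 5])
```

In a real distributed-memory setting each `gather` or `scatter` would translate into messages to the owning processors; here the lookup through `owner` stands in for that step.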
A New Shared-Memory Programming Paradigm for Molecular Dynamics Simulations on the Intel Paragon
This report describes the use of shared memory emulation with DOLIB (Distributed Object Library) to simplify parallel programming on the Intel Paragon. A molecular dynamics application is used as an example to illustrate the use of the DOLIB shared memory library. SOTON PAR, a parallel molecular dynamics code with explicit message-passing using a Lennard-Jones 6-12 potential, is rewritten using DOLIB primitives. The resulting code has no explicit message primitives and resembles a serial code. The new code can perform dynamic load balancing and achieves better performance than the original parallel code with explicit message-passing
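For reference, the Lennard-Jones 6-12 potential mentioned in this abstract has the standard form V(r) = 4ε[(σ/r)^12 − (σ/r)^6]. The sketch below evaluates it serially over all particle pairs; it is not taken from SOTON PAR or DOLIB, and the function names and default parameters (ε = σ = 1) are assumptions for illustration.

```python
import math


def lj_potential(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Lennard-Jones 6-12 pair potential at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def total_energy(positions, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Sum the pair potential over all distinct particle pairs (serial O(N^2) loop)."""
    energy = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            r = math.dist(positions[i], positions[j])
            energy += lj_potential(r, epsilon, sigma)
    return energy


r_min = 2.0 ** (1.0 / 6.0)  # separation at which the potential reaches its minimum
pair_energy = total_energy([(0.0, 0.0, 0.0), (r_min, 0.0, 0.0)])
```

The potential is zero at r = σ and reaches its minimum of −ε at r = 2^(1/6) σ, which the two-particle example reproduces.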
DONIO: Distributed Object Network I/O Library
This report describes the use and implementation of DONIO (Distributed Object Network I/O), a library of routines that provide fast file I/O capabilities in the Intel iPSC/860 and Paragon distributed memory parallel environments. DONIO caches a copy of the file in memory distributed across all processors. Disk I/O routines (such as read, write, and lseek) are replaced by calls to DONIO routines, which translate these operations into message communication to update the cached data. Experiments on the Intel Paragon show that the cost of concurrent disk I/O using DONIO for large files can be 15-30 times smaller than using standard disk I/O
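The idea of replacing read, write, and lseek with operations on a memory-resident copy of the file can be sketched as follows. This is a single-process toy, not DONIO's interface: the class `CachedFile`, the round-robin block placement, and the fixed file size are assumptions chosen to keep the example short.

```python
class CachedFile:
    """Toy in-memory file cache whose blocks are spread round-robin over 'processors'.

    Hypothetical illustration only; the file size is fixed at creation.
    """

    def __init__(self, data: bytes, nprocs: int, block: int = 4):
        self.size = len(data)
        self.block = block
        self.nprocs = nprocs
        self.pos = 0
        # stores[p] maps block number -> block contents held by processor p.
        self.stores = [dict() for _ in range(nprocs)]
        for b in range(-(-self.size // block)):
            self.stores[b % nprocs][b] = bytearray(data[b * block:(b + 1) * block])

    def lseek(self, offset: int) -> int:
        """Move the file position, clamped to the cached file's extent."""
        self.pos = max(0, min(offset, self.size))
        return self.pos

    def read(self, nbytes: int) -> bytes:
        """Read from the cache, visiting each owning processor's block in turn."""
        end = min(self.pos + nbytes, self.size)
        out = bytearray()
        while self.pos < end:
            b, off = divmod(self.pos, self.block)
            take = min(self.block - off, end - self.pos)
            out += self.stores[b % self.nprocs][b][off:off + take]
            self.pos += take
        return bytes(out)

    def write(self, payload: bytes) -> int:
        """Overwrite cached bytes in place; returns the number of bytes written."""
        end = min(self.pos + len(payload), self.size)
        written = 0
        while self.pos < end:
            b, off = divmod(self.pos, self.block)
            take = min(self.block - off, end - self.pos)
            self.stores[b % self.nprocs][b][off:off + take] = payload[written:written + take]
            self.pos += take
            written += take
        return written


f = CachedFile(b"abcdefghij", nprocs=3)
f.lseek(2)
middle = f.read(5)
f.lseek(3)
count = f.write(b"XYZ")
f.lseek(0)
contents = f.read(10)
```

In a distributed setting, each access to `stores[p]` would become a message to processor `p` rather than a local dictionary lookup.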
Are Bilinear Quadrilaterals Better Than Linear Triangles?
This paper compares the theoretical effectiveness of bilinear approximation over quadrilaterals with linear approximation over triangles. Anisotropic mesh transformation is used to generate asymptotically optimally efficient meshes for piecewise linear interpolation over triangles and bilinear interpolation over quadrilaterals. For approximating a convex function, although bilinear quadrilaterals are more efficient, linear triangles are more accurate and may be preferred in finite element computations; whereas for saddle-shaped functions, quadrilaterals may offer a higher order approximation on a well-designed mesh. A surprising finding is that different grid orientations may yield an order of magnitude improvement in approximation accuracy
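A single-cell example gives some intuition for the saddle-shaped case. The sketch below is not the paper's analysis: it simply interpolates f(x, y) = xy on the unit square, once bilinearly from the four corner values and once piecewise linearly on two triangles. The function names and the choice of diagonal, from (0,0) to (1,1), are assumptions for illustration.

```python
def bilinear(f, x, y):
    """Bilinear interpolant of f on the unit square from its four corner values."""
    f00, f10, f01, f11 = f(0, 0), f(1, 0), f(0, 1), f(1, 1)
    return (f00 * (1 - x) * (1 - y) + f10 * x * (1 - y)
            + f01 * (1 - x) * y + f11 * x * y)


def linear_two_triangles(f, x, y):
    """Piecewise linear interpolant; the square is split along the (0,0)-(1,1) diagonal."""
    f00, f10, f01, f11 = f(0, 0), f(1, 0), f(0, 1), f(1, 1)
    if y <= x:
        # Lower triangle with vertices (0,0), (1,0), (1,1).
        return f00 + (f10 - f00) * x + (f11 - f10) * y
    # Upper triangle with vertices (0,0), (0,1), (1,1).
    return f00 + (f11 - f01) * x + (f01 - f00) * y


def max_error(f, interp, n=20):
    """Largest interpolation error sampled on an (n+1) x (n+1) grid of the unit square."""
    pts = [i / n for i in range(n + 1)]
    return max(abs(f(x, y) - interp(f, x, y)) for x in pts for y in pts)


def saddle(x, y):
    return x * y


err_quad = max_error(saddle, bilinear)
err_tri = max_error(saddle, linear_two_triangles)
```

On this axis-aligned cell the bilinear interpolant reproduces xy exactly, while the two-triangle interpolant has a maximum error of 0.25 at the cell center. This illustrates only the local behavior on one cell, not the asymptotic mesh results reported in the paper.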
A SBS-BD based solver for domain decomposition in BE methods
In boundary element methods (BEM), subregioning may be needed either to model complex solids (with cracks, stiffeners, layers, inclusions, etc.) or simply to decompose a problem for computational reasons (e.g. for parallelization). Since the development of the first BEM codes, many attempts have been made to devise efficient, generic boundary-element subregioning techniques. Crucial points are how to profit from the sparsity of the global matrix, and how to deal with traction discontinuities. In this work, the most fundamental steps for devising reliable and efficient subregioning algorithms are discussed. The subregion-by-subregion (SBS) algorithm and the preconditioning of the embedded Krylov solver are addressed. Besides the BiCG solver, the BiCGSTAB(l) solver is newly incorporated into the BE-SBS code. The 3D microstructural analysis of carbon-nanotube-reinforced composites (CNT composites) is considered to verify the performance of the algorithm. Numerical results showing the efficiency of the preconditioned solvers studied are presented
Two variants of minimum discarded fill ordering
It is well known that the ordering of the unknowns can have a significant effect on the convergence of Preconditioned Conjugate Gradient (PCG) methods. There has been considerable experimental work on the effects of ordering for regular finite difference problems. In many cases, good results have been obtained with preconditioners based on diagonal, spiral or natural row orderings. However, for finite element problems having unstructured grids or grids generated by a local refinement approach, it is difficult to define many of the orderings used for more regular problems. A recently proposed Minimum Discarded Fill (MDF) ordering technique is effective in finding high-quality Incomplete LU (ILU) preconditioners, especially for problems arising from unstructured finite element grids. Testing indicates this algorithm can identify a rather complicated physical structure in an anisotropic problem and order the unknowns in the "preferred" direction. The MDF technique may be viewed as the numerical analogue of the minimum deficiency algorithm in sparse matrix technology. At any stage of the partial elimination, the MDF technique chooses the next pivot node so as to minimize the amount of discarded fill. In this work, two efficient variants of the MDF technique are explored to produce cost-effective high-order ILU preconditioners. The Threshold MDF orderings combine MDF ideas with drop tolerance techniques to identify the sparsity pattern in the ILU preconditioners. These techniques identify an ordering that encourages fast decay of the entries in the ILU factorization. The Minimum Update Matrix (MUM) ordering technique is a simplification of the MDF ordering and is closely related to the minimum degree algorithm. The MUM ordering is especially suited to large problems arising from Navier-Stokes problems. Some interesting pictures of the orderings are presented using a visualization tool. 22 refs., 4 figs., 7 tabs
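The pivot-selection rule described in this abstract (choose the next node so as to minimize the discarded fill) can be sketched for the simplest case, in which no fill outside the original sparsity pattern is kept. This is a simplified dense-matrix illustration, not the report's algorithm or either of its variants. The function name `mdf_ordering`, the tie-breaking rule (lowest index first), and the assumption of nonzero pivots are choices made for this example.

```python
def mdf_ordering(A):
    """Greedy ordering: eliminate next the node whose elimination discards the least fill.

    A is a square list-of-lists matrix; pivots are assumed to stay nonzero
    (for example, A is diagonally dominant). Hypothetical sketch only.
    """
    n = len(A)
    a = [row[:] for row in A]  # working copy, updated as nodes are eliminated
    pattern = {(i, j) for i in range(n) for j in range(n) if A[i][j] != 0}
    remaining = set(range(n))
    order = []

    def discarded(k):
        """Squared Frobenius norm of the fill that eliminating k would throw away."""
        total = 0.0
        others = [i for i in remaining if i != k]
        for i in others:
            if (i, k) not in pattern:
                continue
            for j in others:
                if (k, j) in pattern and (i, j) not in pattern:
                    total += (a[i][k] * a[k][j] / a[k][k]) ** 2
        return total

    while remaining:
        # min() keeps the first minimum, so ties go to the lowest node index.
        k = min(sorted(remaining), key=discarded)
        order.append(k)
        remaining.remove(k)
        # Update only entries inside the original pattern; other fill is discarded.
        for i in remaining:
            if (i, k) not in pattern:
                continue
            for j in remaining:
                if (k, j) in pattern and (i, j) in pattern:
                    a[i][j] -= a[i][k] * a[k][j] / a[k][k]
    return order


# "Arrow" matrix: node 0 is coupled to every other node.
arrow = [
    [4.0, -1.0, -1.0, -1.0],
    [-1.0, 4.0, 0.0, 0.0],
    [-1.0, 0.0, 4.0, 0.0],
    [-1.0, 0.0, 0.0, 4.0],
]
arrow_order = mdf_ordering(arrow)

# Tridiagonal matrix: eliminating nodes in natural order creates no fill at all.
tridiag = [
    [2.0, -1.0, 0.0, 0.0],
    [-1.0, 2.0, -1.0, 0.0],
    [0.0, -1.0, 2.0, -1.0],
    [0.0, 0.0, -1.0, 2.0],
]
tridiag_order = mdf_ordering(tridiag)
```

For the arrow matrix, eliminating the hub node first would create fill between every pair of the other nodes, so the greedy rule postpones it until only one neighbor remains.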