657 research outputs found
A bibliography on parallel and vector numerical algorithms
This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming language, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are listed also
A Jacobi-based algorithm for computing symmetric eigenvalues and eigenvectors in a two-dimensional mesh
The paper proposes an algorithm for computing symmetric eigenvalues and eigenvectors that uses a one-sided Jacobi approach and is targeted to a multicomputer in which nodes can be arranged as a two-dimensional mesh with an arbitrary number of rows and columns. The algorithm is analysed through simple analytical models of execution time, which show that an adequate choice of the mesh configuration (number of rows and columns) can improve performance significantly, with respect to a one-dimensional configuration, which is the most frequently considered scenario in current proposals. This improvement is especially noticeable in large systems.Peer ReviewedPostprint (published version
Activities of the Institute for Computer Applications in Science and Engineering
Research conducted at the Institute for Computer Applications in Science and Engineering in applied mathematics, numerical analysis, and computer science during the period April 1, 1985 through October 2, 1985 is summarized
Adaptive BDDC in Three Dimensions
The adaptive BDDC method is extended to the selection of face constraints in
three dimensions. A new implementation of the BDDC method is presented based on
a global formulation without an explicit coarse problem, with massive
parallelism provided by a multifrontal solver. Constraints are implemented by a
projection and sparsity of the projected operator is preserved by a generalized
change of variables. The effectiveness of the method is illustrated on several
engineering problems.Comment: 28 pages, 9 figures, 9 table
Hypercube algorithms on mesh connected multicomputers
A new methodology named CALMANT (CC-cube Algorithms on Meshes and Tori) for mapping a type of algorithm that we call CC-cube algorithm onto multicomputers with hypercube, mesh, or torus interconnection topology is proposed. This methodology is suitable when the initial problem can be expressed as a set of processes that communicate through a hypercube topology (a CC-cube algorithm). There are many important algorithms that fit into the CC-cube type. CALMANT is based on three different techniques: (a) the standard embedding to assign the processes of the algorithm to the nodes of the mesh multicomputer; (b) the communication pipelining technique to increase the level of communication parallelism inherent in the CC-cube algorithms; and (c) optimal message-scheduling algorithms proposed in this work in order to avoid conflicts and minimizing in this way the communication time. Although CALMANT is proposed for multicomputers with different interconnection network topologies, the paper only focuses on the particular case of meshes.Peer ReviewedPostprint (published version
Statistical Numerical Methods for Eigenvalue Problem. Parallel Implementation
MSC subject classification: 65C05, 65U05.The problem of evaluating the smallest eigenvalue of real symmetric matrices using statistical numerical methods is considered
Summary of research in applied mathematics, numerical analysis, and computer sciences
The major categories of current ICASE research programs addressed include: numerical methods, with particular emphasis on the development and analysis of basic numerical algorithms; control and parameter identification problems, with emphasis on effective numerical methods; computational problems in engineering and physical sciences, particularly fluid dynamics, acoustics, and structural analysis; and computer systems and software, especially vector and parallel computers
A transient FETI methodology for large-scale parallel implicit computations in structural mechanics, part 2
Explicit codes are often used to simulate the nonlinear dynamics of large-scale structural systems, even for low frequency response, because the storage and CPU requirements entailed by the repeated factorizations traditionally found in implicit codes rapidly overwhelm the available computing resources. With the advent of parallel processing, this trend is accelerating because explicit schemes are also easier to parallellize than implicit ones. However, the time step restriction imposed by the Courant stability condition on all explicit schemes cannot yet and perhaps will never be offset by the speed of parallel hardware. Therefore, it is essential to develop efficient and robust alternatives to direct methods that are also amenable to massively parallel processing because implicit codes using unconditionally stable time-integration algorithms are computationally more efficient than explicit codes when simulating low-frequency dynamics. Here we present a domain decomposition method for implicit schemes that requires significantly less storage than factorization algorithms, that is several times faster than other popular direct and iterative methods, that can be easily implemented on both shared and local memory parallel processors, and that is both computationally and communication-wise efficient. The proposed transient domain decomposition method is an extension of the method of Finite Element Tearing and Interconnecting (FETI) developed by Farhat and Roux for the solution of static problems. Serial and parallel performance results on the CRAY Y-MP/8 and the iPSC-860/128 systems are reported and analyzed for realistic structural dynamics problems. These results establish the superiority of the FETI method over both the serial/parallel conjugate gradient algorithm with diagonal scaling and the serial/parallel direct method, and contrast the computational power of the iPSC-860/128 parallel processor with that of the CRAY Y-MP/8 system
On-chip implementation of multiprocessor networks and switch fabrics
On-chip implementation of multiprocessor systems needs to planarise the interconnect networks onto the silicon floorplan. Compared with traditional ASIC/SoC architectures, Multiprocessor Systems on Chips (MPSoC) node processors are homogeneous, and MPSoC network topologies are regular. Therefore, traditional ASIC floorplanning methodologies that perform macro placement are not suitable for MPSoC designs. We propose an automated MPSoC physical planning methodology. REGULAY can generate an optimal floorplan for different topologies under different design constraints. Compared with traditional floorplanning approaches, REGULAY shows significant advantages in reducing the total interconnect wirelength while preserving the regularity and hierarchy of the network topology
- …