20,540 research outputs found

    Software trace cache

    Get PDF
    We explore the use of compiler optimizations, which optimize the layout of instructions in memory. The target is to enable the code to make better use of the underlying hardware resources regardless of the specific details of the processor/architecture in order to increase fetch performance. The Software Trace Cache (STC) is a code layout algorithm with a broader target than previous layout optimizations. We target not only an improvement in the instruction cache hit rate, but also an increase in the effective fetch width of the fetch engine. The STC algorithm organizes basic blocks into chains trying to make sequentially executed basic blocks reside in consecutive memory positions, then maps the basic block chains in memory to minimize conflict misses in the important sections of the program. We evaluate and analyze in detail the impact of the STC, and code layout optimizations in general, on the three main aspects of fetch performance; the instruction cache hit rate, the effective fetch width, and the branch prediction accuracy. Our results show that layout optimized, codes have some special characteristics that make them more amenable for high-performance instruction fetch. They have a very high rate of not-taken branches and execute long chains of sequential instructions; also, they make very effective use of instruction cache lines, mapping only useful instructions which will execute close in time, increasing both spatial and temporal locality.Peer ReviewedPostprint (published version

    Partitioning strategy for efficient nonlinear finite element dynamic analysis on multiprocessor computers

    Get PDF
    A computational procedure is presented for the nonlinear dynamic analysis of unsymmetric structures on vector multiprocessor systems. The procedure is based on a novel hierarchical partitioning strategy in which the response of the unsymmetric and antisymmetric response vectors (modes), each obtained by using only a fraction of the degrees of freedom of the original finite element model. The three key elements of the procedure which result in high degree of concurrency throughout the solution process are: (1) mixed (or primitive variable) formulation with independent shape functions for the different fields; (2) operator splitting or restructuring of the discrete equations at each time step to delineate the symmetric and antisymmetric vectors constituting the response; and (3) two level iterative process for generating the response of the structure. An assessment is made of the effectiveness of the procedure on the CRAY X-MP/4 computers

    Parallel computing for the finite element method

    Full text link
    A finite element method is presented to compute time harmonic microwave fields in three dimensional configurations. Nodal-based finite elements have been coupled with an absorbing boundary condition to solve open boundary problems. This paper describes how the modeling of large devices has been made possible using parallel computation, New algorithms are then proposed to implement this formulation on a cluster of workstations (10 DEC ALPHA 300X) and on a CRAY C98. Analysis of the computation efficiency is performed using simple problems. The electromagnetic scattering of a plane wave by a perfect electric conducting airplane is finally given as example

    Alternating-Direction Line-Relaxation Methods on Multicomputers

    Get PDF
    We study the multicom.puter performance of a three-dimensional Navier–Stokes solver based on alternating-direction line-relaxation methods. We compare several multicomputer implementations, each of which combines a particular line-relaxation method and a particular distributed block-tridiagonal solver. In our experiments, the problem size was determined by resolution requirements of the application. As a result, the granularity of the computations of our study is finer than is customary in the performance analysis of concurrent block-tridiagonal solvers. Our best results were obtained with a modified half-Gauss–Seidel line-relaxation method implemented by means of a new iterative block-tridiagonal solver that is developed here. Most computations were performed on the Intel Touchstone Delta, but we also used the Intel Paragon XP/S, the Parsytec SC-256, and the Fujitsu S-600 for comparison

    Adapting the interior point method for the solution of linear programs on high performance computers

    Get PDF
    In this paper we describe a unified algorithmic framework for the interior point method (IPM) of solving Linear Programs (LPs) which allows us to adapt it over a range of high performance computer architectures. We set out the reasons as to why IPM makes better use of high performance computer architecture than the sparse simplex method. In the inner iteration of the IPM a search direction is computed using Newton or higher order methods. Computationally this involves solving a sparse symmetric positive definite (SSPD) system of equations. The choice of direct and indirect methods for the solution of this system and the design of data structures to take advantage of coarse grain parallel and massively parallel computer architectures are considered in detail. Finally, we present experimental results of solving NETLIB test problems on examples of these architectures and put forward arguments as to why integration of the system within sparse simplex is beneficial
    • …
    corecore