    A Global Communication Optimization Technique Based on Data-Flow Analysis and Linear Algebra

    Reducing communication overhead is extremely important in distributed-memory message-passing architectures. In this paper, we present a technique to improve communication that considers the data access patterns of the entire program. Our approach is based on a combination of traditional data-flow analysis and a linear algebra framework, and works on structured programs with conditional statements and nested loops but without arbitrary goto statements. The distinctive features of the solution are its accuracy in keeping communication set information, its support for general alignments and distributions (including block-cyclic distributions), and its ability to simulate some of the previous approaches with suitable modifications. We also show how optimizations such as message vectorization, message coalescing and redundancy elimination are supported by our framework. Experimental results on several benchmarks show that our technique is effective in reducing the number of messages (an average of 32% reduction), the volume of the data communicated (an average of 37% reduction), and the execution time (an average of 26% reduction).

    Non-BPS Brane Dynamics And Dual Tensor Gauge Theory

    The action for the long wavelength oscillations of a non-BPS p=3 brane embedded in N=1, D=5 superspace is determined by means of the coset method. The D=4 world volume Nambu-Goldstone boson of broken translation invariance and the two D=4 world volume Weyl spinor Goldstinos of the completely broken supersymmetry describe the excitations of the brane into the broken space and superspace directions. The resulting action is an invariant synthesis of the Akulov-Volkov and Nambu-Goto actions. The D=4 antisymmetric tensor gauge theory action dual to the p=3 brane action is determined. Comment: 15 pages, no figure

    A Hoare-like logic of asserted single-pass instruction sequences

    We present a formal system for proving the partial correctness of a single-pass instruction sequence as considered in program algebra by decomposition into proofs of the partial correctness of segments of the single-pass instruction sequence concerned. The system is similar to Hoare logics, but takes into account that, by the presence of jump instructions, segments of single-pass instruction sequences may have multiple entry points and multiple exit points. It is intended to support a sound general understanding of the issues with Hoare-like logics for low-level programming languages. Comment: 22 pages, the preliminaries have textual overlaps with the preliminaries in arXiv:1402.4950 [cs.LO] and earlier papers; introduction and conclusions rewritten, explanatory remarks added; introduction partly rewritten; 24 pages, clarifying examples added

    Development of symbolic algorithms for certain algebraic processes

    This study investigates the problem of computing the exact greatest common divisor of two polynomials relative to an orthogonal basis, defined over the rational number field. The main objective of the study is to design and implement an effective and efficient symbolic algorithm for the general class of dense polynomials, given the rational numbers defining the terms of their basis. From a general algorithm using the comrade matrix approach, the nonmodular and modular techniques are prescribed. If the coefficients of the generalized polynomials are multiprecision integers, multiprecision arithmetic will be required in the construction of the comrade matrix and the corresponding system's coefficient matrix. In addition, applying the nonmodular elimination technique to this coefficient matrix makes extensive use of multiprecision rational number operations. The modular technique is employed to minimize the complexity involved in such computations. A divisor test algorithm that enables the detection of an unlucky reduction is a crucial device for an effective implementation of the modular technique. With the bound of the true solution not known a priori, the test is devised and carefully incorporated into the modular algorithm. The results show that the modular algorithm achieves its best performance for the class of relatively prime polynomials. The empirical computing time results show that the modular algorithm is markedly superior to the nonmodular algorithms in the case of sufficiently dense Legendre basis polynomials with a small GCD solution. In the case of dense Legendre basis polynomials with a big GCD solution, the modular algorithm is significantly superior to the nonmodular algorithms for higher degree polynomials. For more definitive conclusions, the computing time functions of the algorithms that are presented in this report have been worked out. Further investigations have also been suggested.