2,053 research outputs found
A Global Communication Optimization Technique Based on Data-Flow Analysis and Linear Algebra
Reducing communication overhead is extremely important in distributed-memory message-passing architectures. In this paper, we present a technique to improve communication that considers data access patterns of the entire program. Our approach is based on a combination of traditional data-flow analysis and a linear algebra framework, and works on structured programs with conditional statements and nested loops but without arbitrary goto statements. The distinctive features of the solution are the accuracy in keeping communication set information, support for general alignments and distributions including block-cyclic distributions, and the ability to simulate some of the previous approaches with suitable modifications. We also show how optimizations such as message vectorization, message coalescing and redundancy elimination are supported by our framework. Experimental results on several benchmarks show that our technique is effective in reducing the number of messages (an average of 32% reduction), the volume of the data communicated (an average of 37% reduction), and the execution time (an average of 26% reduction).
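The optimizations named in this abstract can be illustrated with a toy model. The sketch below is not the paper's data-flow and linear algebra framework; it assumes a plain block distribution, and the names `owner`, `naive_messages` and `combined_messages` are hypothetical. It only shows how dropping duplicate requests and grouping the rest by owning processor reduces both the message count and the data volume.

```python
from collections import defaultdict


def owner(i, n, p):
    """Processor owning element i when n elements are block-distributed over p processors."""
    block = -(-n // p)  # ceiling division gives the block size
    return i // block


def naive_messages(accesses, me, n, p):
    """One message per remote element access, duplicates included."""
    return [(owner(i, n, p), [i]) for i in accesses if owner(i, n, p) != me]


def combined_messages(accesses, me, n, p):
    """Drop duplicate requests and send one message per owning processor."""
    per_owner = defaultdict(set)
    for i in accesses:
        o = owner(i, n, p)
        if o != me:  # local elements need no communication
            per_owner[o].add(i)
    return [(o, sorted(indices)) for o, indices in sorted(per_owner.items())]


if __name__ == "__main__":
    accesses = [3, 4, 5, 4, 8, 5]  # element indices read by processor 0
    print(naive_messages(accesses, me=0, n=16, p=4))
    print(combined_messages(accesses, me=0, n=16, p=4))
```

With 16 elements on 4 processors, processor 0 issues five single-element messages in the naive scheme and two messages after combining.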
Non-BPS Brane Dynamics And Dual Tensor Gauge Theory
The action for the long wavelength oscillations of a non-BPS p=3 brane
embedded in N=1, D=5 superspace is determined by means of the coset method. The
D=4 world volume Nambu-Goldstone boson of broken translation invariance and the
two D=4 world volume Weyl spinor Goldstinos of the completely broken
supersymmetry describe the excitations of the brane into the broken space and
superspace directions. The resulting action is an invariant synthesis of the
Akulov-Volkov and Nambu-Goto actions. The D=4 antisymmetric tensor gauge theory
action dual to the p=3 brane action is determined. Comment: 15 pages, no figures
A Hoare-like logic of asserted single-pass instruction sequences
We present a formal system for proving the partial correctness of a
single-pass instruction sequence as considered in program algebra by
decomposition into proofs of the partial correctness of segments of the
single-pass instruction sequence concerned. The system is similar to Hoare
logics, but takes into account that, by the presence of jump instructions,
segments of single-pass instruction sequences may have multiple entry points
and multiple exit points. It is intended to support a sound general
understanding of the issues with Hoare-like logics for low-level programming
languages. Comment: 22 pages, the preliminaries have textual overlaps with the
preliminaries in arXiv:1402.4950 [cs.LO] and earlier papers; introduction and
conclusions rewritten, explanatory remarks added; introduction partly
rewritten; 24 pages, clarifying examples added
Efficient recursion termination for function-free horn logic
We present an efficient scheme to terminate infinite recursion in function-free Horn logic. In [BW84], Brough and Walker show that a preorder linear resolution with a goal termination strategy is incomplete, i.e. it must miss some answers. Their theory is true if left-recursion is allowed. The crucial assumption underlying Brough and Walker's theory is that the order of literals in a clause should not be altered. This assumption, however, is not necessary in programs that do not contain any extra-logical features such as the 'cut' symbol of Prolog. This is because the order of literals does not affect the correctness of such programs, only their efficiency. In this paper, we show that left-recursion can always be eliminated. The idea is to transform loops of the input set into safe loops, which are free of left-recursion. Consequently, the goal termination strategy is guaranteed to terminate properly with all possible answers; thus, it is complete in the domain of safe loops. We further show that all rules in a safe loop can be transformed into rules that begin with a base literal. This permits the implementation of a simple scheme to carry out the goal termination strategy more efficiently. The basic idea of this scheme is to distribute the history containing all executed goals over assertions, rather than maintaining it as a centralized data structure. This reduces the amount of work performed during execution.
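A small sketch may help make the goal termination idea concrete. It assumes the familiar reachability program `path(X,Y) :- edge(X,Y).` and `path(X,Y) :- edge(X,Z), path(Z,Y).`, whose recursive rule begins with a base literal, and it stops a goal when the same goal already appears among its ancestors. The function name `path_answers` is hypothetical, and the history is passed along as a single set; this is the centralized variant, not the distributed scheme the abstract proposes.

```python
def path_answers(edges, start, history=frozenset()):
    """All Y such that path(start, Y) holds, for the assumed program

        path(X, Y) :- edge(X, Y).
        path(X, Y) :- edge(X, Z), path(Z, Y).

    `history` holds the goals already open on the current branch.
    """
    if start in history:
        # The same goal is an ancestor of itself: terminate this branch.
        return set()
    history = history | {start}
    answers = set()
    for x, z in edges:
        if x == start:
            answers.add(z)                              # first rule
            answers |= path_answers(edges, z, history)  # second rule
    return answers


if __name__ == "__main__":
    # The cycle a -> b -> c -> a would recurse forever without the history check.
    edges = {("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")}
    print(sorted(path_answers(edges, "a")))
```

Because every reachable node lies on some cycle-free path from the start node, cutting repeated goals on a branch loses no answers in this example.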
Development of symbolic algorithms for certain algebraic processes
This study investigates the problem of computing the exact greatest common divisor of two polynomials relative to an orthogonal basis, defined over the rational number field. The main objective of the study is to design and implement an effective and efficient symbolic algorithm for the general class of dense polynomials, given the rational number defining terms of their basis. From a general algorithm using the comrade matrix approach, the nonmodular and modular techniques are prescribed. If the coefficients of the generalized polynomials are multiprecision integers, multiprecision arithmetic will be required in the construction of the comrade matrix and the corresponding systems coefficient matrix. In addition, the application of the nonmodular elimination technique on this coefficient matrix extensively applies multiprecision rational number operations. The modular technique is employed to minimize the complexity involved in such computations. A divisor test algorithm that enables the detection of an unlucky reduction is a crucial device for an effective implementation of the modular technique. With the bound of the true solution not known a priori, the test is devised and carefully incorporated into the modular algorithm. The results show that the modular algorithm performs best for the class of relatively prime polynomials. The empirical computing time results show that the modular algorithm is markedly superior to the nonmodular algorithms in the case of sufficiently dense Legendre basis polynomials with a small GCD solution. In the case of dense Legendre basis polynomials with a big GCD solution, the modular algorithm is significantly superior to the nonmodular algorithms in higher degree polynomials. For more definitive conclusions, the computing time functions of the algorithms that are presented in this report have been worked out. Further investigations have also been suggested.
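For orientation, exact GCD computation over the rationals can be sketched with Euclid's algorithm. This is a deliberately simpler stand-in: it works in the ordinary power basis rather than an orthogonal basis, and it uses neither the comrade matrix approach nor the modular technique described in the abstract. The helper names `trim`, `poly_rem` and `poly_gcd` are hypothetical, and coefficients are assumed to be listed from lowest to highest degree.

```python
from fractions import Fraction


def trim(p):
    """Drop trailing zero coefficients (lowest degree is stored first)."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def poly_rem(a, b):
    """Remainder of a divided by b, using exact rational arithmetic."""
    a = trim([Fraction(c) for c in a])
    b = trim([Fraction(c) for c in b])
    if not b:
        raise ZeroDivisionError("division by the zero polynomial")
    while len(a) >= len(b):
        factor = a[-1] / b[-1]
        shift = len(a) - len(b)
        for i, c in enumerate(b):
            a[i + shift] -= factor * c  # cancels the leading term of a
        a = trim(a)
    return a


def poly_gcd(a, b):
    """Monic GCD of a and b by Euclid's algorithm; [] if both are zero."""
    a = trim([Fraction(c) for c in a])
    b = trim([Fraction(c) for c in b])
    while b:
        a, b = b, poly_rem(a, b)
    if not a:
        return []
    lead = a[-1]
    return [c / lead for c in a]


if __name__ == "__main__":
    # (x - 1)(x + 2) and (x - 1)(x - 3) share the factor x - 1.
    print(poly_gcd([-2, 1, 1], [3, -4, 1]))
```

The intermediate rationals in this scheme can grow quickly, which is the kind of cost the abstract's modular technique is meant to reduce.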