2,778 research outputs found

    Non-Strict Independence-Based Program Parallelization Using Sharing and Freeness Information.

    Get PDF
    The current ubiquity of multi-core processors has brought renewed interest in program parallelization. Logic programs allow studying the parallelization of programs with complex, dynamic data structures with (declarative) pointers in a comparatively simple semantic setting. In this context, automatic parallelizers which exploit and-parallelism rely on notions of independence in order to ensure certain efficiency properties. “Non-strict” independence is a more relaxed notion than the traditional notion of “strict” independence which still ensures the relevant efficiency properties and can allow considerable more parallelism. Non-strict independence cannot be determined solely at run-time (“a priori”) and thus global analysis is a requirement. However, extracting non-strict independence information from available analyses and domains is non-trivial. This paper provides on one hand an extended presentation of our classic techniques for compile-time detection of non-strict independence based on extracting information from (abstract interpretation-based) analyses using the now well understood and popular Sharing + Freeness domain. This includes algorithms for combined compile-time/run-time detection which involve special run-time checks for this type of parallelism. In addition, we propose herein novel annotation (parallelization) algorithms, URLP and CRLP, which are specially suited to non-strict independence. We also propose new ways of using the Sharing + Freeness information to optimize how the run-time environments of goals are kept apart during parallel execution. Finally, we also describe the implementation of these techniques in our parallelizing compiler and recall some early performance results. We provide as well an extended description of our pictorial representation of sharing and freeness information

    An abstract machine for restricted and-parallel execution of logic programs

    Full text link
    Although the sequential execution speed of logic programs has been greatly improved by the concepts introduced in the Warren Abstract Machine (WAM), parallel execution represents the only way to increase this speed beyond the natural limits of sequential systems. However, most proposed parallel logic programming execution models lack the performance optimizations and storage efficiency of sequential systems. This paper presents a parallel abstract machine which is an extension of the WAM and is thus capable of supporting ANDParallelism without giving up the optimizations present in sequential implementations. A suitable instruction set, which can be used as a target by a variety of logic programming languages, is also included. Special instructions are provided to support a generalized version of "Restricted AND-Parallelism" (RAP), a technique which reduces the overhead traditionally associated with the run-time management of variable binding conflicts to a series of simple run-time checks, which select one out of a series of compiled execution graphs

    Relating goal scheduling, precedence, and memory management in and-parallel execution of logic programs

    Full text link
    The interactions among three important issues involved in the implementation of logic programs in parallel (goal scheduling, precedence, and memory management) are discussed. A simplified, parallel memory management model and an efficient, load-balancing goal scheduling strategy are presented. It is shown how, for systems which support "don't know" non-determinism, special care has to be taken during goal scheduling if the space recovery characteristics of sequential systems are to be preserved. A solution based on selecting only "newer" goals for execution is described, and an algorithm is proposed for efficiently maintaining and determining precedence relationships and variable ages across parallel goals. It is argued that the proposed schemes and algorithms make it possible to extend the storage performance of sequential systems to parallel execution without the considerable overhead previously associated with it. The results are applicable to a wide class of parallel and coroutining systems, and they represent an efficient alternative to "all heap" or "spaghetti stack" allocation models

    Programming with global analysis

    Get PDF
    Global data-flow analysis of (constraint) logic programs, which is generally based on abstract interpretation [7], is reaching a comparatively high level of maturity. A natural question is whether it is time for its routine incorporation in standard compilers, something which, beyond a few experimental systems, has not happened to date. Such incorporation arguably makes good sense only if: • the range of applications of global analysis is large enough to justify the additional complication in the compiler, and • global analysis technology can deal with all the features of "practical" languages (e.g., the ISO-Prolog built-ins) and "scales up" for large programs. We present a tutorial overview of a number of concepts and techniques directly related to the issues above, with special emphasis on the first one. In particular, we concéntrate on novel uses of global analysis during program development and debugging, rather than on the more traditional application área of program optimization. The idea of using abstract interpretation for validation and diagnosis has been studied in the context of imperative programming [2] and also of logic programming. The latter work includes issues such as using approximations to reduce the burden posed on programmers by declarative debuggers [6, 3] and automatically generating and checking assertions [4, 5] (which includes the more traditional type checking of strongly typed languages, such as Gódel or Mercury [1, 8, 9]) We also review some solutions for scalability including modular analysis, incremental analysis, and widening. Finally, we discuss solutions for dealing with meta-predicates, side-effects, delay declarations, constraints, dynamic predicates, and other such features which may appear in practical languages. In the discussion we will draw both from the literature and from our experience and that of others in the development and use of the CIAO system analyzer. In order to emphasize the practical aspects of the solutions discussed, the presentation of several concepts will be illustrated by examples run on the CIAO system, which makes extensive use of global analysis and assertions

    Introduction to the 26th International Conference on Logic Programming Special Issue

    Get PDF
    This is the preface to the 26th International Conference on Logic Programming Special IssueComment: 6 page

    Existence for the steady problem of a mixture of two power-law fluids

    Full text link
    The steady problem resulting from a mixture of two distinct fluids of power-law type is analyzed in this work. Mathematically, the problem results from the superposition of two power laws, one for a constant power-law index with other for a variable one. For the associated boundary-value problem, we prove the existence of very weak solutions, provided the variable power-law index is bounded from above by the constant one. This result requires the lowest possible assumptions on the variable power-law index and, as a particular case, extends the existence result by Ladyzhenskaya dated from 1969 to the case of a variable exponent and for all zones of the pseudoplastic region. In a distinct result, we extend a classical theorem on the existence of weak solutions to the case of our problem.Comment: 22. arXiv admin note: substantial text overlap with arXiv:1203.679

    Memory performance of and-parallel prolog on shared-memory architectures

    Get PDF
    The goal of the RAP-WAM AND-parallel Prolog abstract architecture is to provide inference speeds significantly beyond those of sequential systems, while supporting Prolog semantics and preserving sequential performance and storage efficiency. This paper presents simulation results supporting these claims with special emphasis on memory performance on a two-level sharedmemory multiprocessor organization. Several solutions to the cache coherency problem are analyzed. It is shown that RAP-WAM offers good locality and storage efficiency and that it can effectively take advantage of broadcast caches. It is argued that speeds in excess of 2 ML IPS on real applications exhibiting medium parallelism can be attained with current technology
    • …
    corecore