1,229 research outputs found

    A semantics and implementation of a causal logic programming language

    The increasingly widespread availability of multicore and manycore computers demands new programming languages that make parallel programming dramatically easier and less error prone. This paper describes a semantics for a new class of declarative programming languages that support massive amounts of implicit parallelism.

    Improving Implicit Parallelism

    We propose a new technique for exploiting the inherent parallelism in lazy functional programs. The goal of writing a sequential program and having the compiler improve its performance by determining what can be executed in parallel, known as implicit parallelism, has been studied for many years. Our technique abandons the idea that a compiler should accomplish this feat in ‘one shot’ with static analysis and instead allows the compiler to improve upon the static analysis using iterative feedback. We demonstrate that iterative feedback can be relatively simple when the source language is a lazy purely functional programming language. We present three main contributions to the field: the automatic derivation of parallel strategies from a demand on a structure, and two new methods of feedback-directed auto-parallelisation. The first method treats the runtime of the program as a black box and uses the ‘wall-clock’ time as a fitness function to guide a heuristic search on bitstrings representing the parallel setting of the program. The second feedback approach is profile-directed: the compiler uses profile data gathered by the runtime system as the program executes, which lets it determine which threads are not worth the overhead of creating. Our results show that feedback-directed compilation can be a good source of refinement for static analysis techniques that struggle to account for the cost of a computation. This lifts the burden of ‘is this parallelism worthwhile?’ away from the static phase of compilation and onto the runtime, which is better equipped to answer the question.
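    The first, black-box method lends itself to a compact sketch. Below is a minimal Haskell illustration, assuming a hypothetical runWith action that rebuilds and times the program for a given bitstring of enabled par sites; the thesis's actual search heuristic may well differ.

    ```haskell
    import System.Random (randomRIO)

    -- Hypothetical driver (an assumption, not part of the thesis): rebuild
    -- and run the program with par site i enabled iff bit i is set,
    -- returning the measured wall-clock time in seconds.
    type RunWith = [Bool] -> IO Double

    -- Random single-bit hill climb over par-site bitstrings, keeping a flip
    -- only when it lowers wall-clock time (the fitness function).
    climb :: RunWith -> [Bool] -> Double -> Int -> IO [Bool]
    climb _   bits _ 0 = return bits
    climb run bits t n = do
      i <- randomRIO (0, length bits - 1)
      let bits' = zipWith (\j b -> if j == i then not b else b) [0 ..] bits
      t' <- run bits'
      if t' < t
        then climb run bits' t' (n - 1)
        else climb run bits  t  (n - 1)

    -- Start from the all-off setting and take a fixed number of search steps.
    search :: RunWith -> Int -> Int -> IO [Bool]
    search run nSites steps = do
      let bits0 = replicate nSites False
      t0 <- run bits0
      climb run bits0 t0 steps
    ```

    Each bit toggles one candidate parallel site, so the search explores the space of parallel settings guided only by measured runtime, exactly the "black box" treatment the abstract describes.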

    Exploiting implicit parallelism in SPARC instruction execution

    One way to increase the performance of a processing unit is to exploit implicit parallelism. Exploiting this parallelism requires a processor to dynamically select instructions in a serial instruction stream which can be executed in parallel. As operations are computed concurrently, an execution speedup will occur. This thesis studies how effectively implicit parallelism could be exploited in the Scalable Processor Architecture (SPARC) [9], a reduced instruction set architecture developed by Sun Microsystems. First, an analysis of SPARC instruction traces will determine the optimal speedup that would be realized by a processor with infinite resources. Next, an analytical model of a parallelizing processor will be developed and used to predict the effects of limited resources on optimal speedup. Lastly, a SPARC simulator will be employed to determine the actual speedup of resource-limited configurations, and the results will be correlated with the analytical model.
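    As a toy illustration of the first step (not the thesis's actual tooling), the infinite-resource speedup can be estimated from a trace by tracking the cycle at which each register's latest producer completes: trace length divided by the depth of the dataflow critical path bounds the achievable speedup. The Instr type below is an assumed, heavily simplified stand-in for a real SPARC trace entry.

    ```haskell
    import qualified Data.Map.Strict as M

    -- Simplified trace entry: the register an instruction writes and the
    -- registers it reads (real traces also carry opcodes, memory
    -- dependences, branches, etc.).
    data Instr = Instr { dest :: String, sources :: [String] }

    -- With infinite resources, an instruction can issue one cycle after the
    -- latest producer of any of its sources; the resulting critical-path
    -- depth bounds the speedup. Assumes a non-empty trace.
    optimalSpeedup :: [Instr] -> Double
    optimalSpeedup trace = fromIntegral (length trace) / fromIntegral depth
      where
        (_, depth) = foldl step (M.empty, 1 :: Int) trace
        step (ready, d) (Instr r ss) =
          let cyc = 1 + maximum (0 : [c | s <- ss, Just c <- [M.lookup s ready]])
          in (M.insert r cyc ready, max d cyc)
    ```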

    Identifying, Quantifying, Extracting and Enhancing Implicit Parallelism

    The shift of the microprocessor industry towards multicore architectures has placed a huge burden on programmers by requiring explicit parallelization for performance. Implicit Parallelization is an alternative that could ease the burden on programmers by parallelizing applications “under the covers” while maintaining sequential semantics externally. This thesis develops a novel approach for thinking about parallelism, by casting the problem of parallelization in terms of instruction criticality. Using this approach, parallelism in a program region is readily identified when certain conditions about fetch-criticality are satisfied by the region. The thesis formalizes this approach by developing a criticality-driven model of task-based parallelization. The model can accurately predict the parallelism that would be exposed by potential task choices by capturing a wide set of sources of parallelism as well as costs to parallelization. The criticality-driven model enables the development of two key components for Implicit Parallelization: a task selection policy, and a bottleneck analysis tool. The task selection policy can partition a single-threaded program into tasks that will profitably execute concurrently on a multicore architecture in spite of the costs associated with enforcing data-dependences and with task-related actions. The bottleneck analysis tool gives feedback to the programmers about data-dependences that limit parallelism. In particular, there are several “accidental dependences” that can be easily removed with large improvements in parallelism. These tools combine into a systematic methodology for performance tuning in Implicit Parallelization. Finally, armed with the criticality-driven model, the thesis revisits several architectural design decisions, and finds several encouraging ways forward to increase the scope of Implicit Parallelization.
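    The abstract does not reproduce the model itself, but the kind of prediction it makes can be approximated with the textbook work/span bound: the parallelism a task partition can expose is at most total work divided by the cost-weighted critical path through the data-dependence graph. The sketch below uses that standard bound as a stand-in for the criticality-driven model; the task costs and the deps function are assumed inputs, not anything from the thesis.

    ```haskell
    import qualified Data.Map.Strict as M

    type Task = Int

    -- Predicted parallelism of a candidate partition: work / span, where
    -- span is the longest cost-weighted dependence chain. `tasks` pairs
    -- each task with its cost and must be in topological (dependence)
    -- order; `deps t` lists t's predecessors. Assumes a non-empty partition.
    predictedParallelism :: [(Task, Double)] -> (Task -> [Task]) -> Double
    predictedParallelism tasks deps = work / spanLen
      where
        work    = sum (map snd tasks)
        spanLen = maximum (0 : M.elems finishTimes)
        finishTimes = foldl step M.empty tasks
        step m (t, c) =
          let start = maximum (0 : [m M.! d | d <- deps t])
          in M.insert t (start + c) m
    ```

    A task choice that serializes through a long dependence chain keeps the span close to the work, which is exactly the signature of the “accidental dependences” the bottleneck tool is meant to surface.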

    Or-Parallel Prolog Execution on Clusters of Multicores

    Logic Programming languages, such as Prolog, provide an excellent framework for the parallel execution of logic programs. In particular, the inherent non-determinism in the way logic programs are structured makes Prolog very attractive for the exploitation of implicit parallelism. One of the most noticeable sources of implicit parallelism in Prolog programs is or-parallelism, which arises from the simultaneous evaluation of a subgoal call against the clauses that match that call. Arguably, the most successful model for or-parallelism is environment copying, which has been used efficiently in the implementation of or-parallel Prolog systems on both shared memory and distributed memory architectures. Nowadays, multicores and clusters of multicores are becoming the norm and, although many parallel Prolog systems have been developed in the past, to the best of our knowledge none of them was specially designed to exploit the combination of shared and distributed memory architectures. Motivated by our past experience in designing and developing parallel Prolog systems based on environment copying, we propose a novel computational model to efficiently exploit implicit parallelism from large-scale real-world applications, specialized for the new architectures based on clusters of multicores.
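    To give a flavour of or-parallelism itself (only the idea, not the environment-copying machinery or the proposed cluster model), the clauses matching a call can be evaluated simultaneously and their answers concatenated in the usual left-to-right order. A minimal Haskell sketch using sparks:

    ```haskell
    import Control.DeepSeq (NFData)
    import Control.Parallel.Strategies (parMap, rdeepseq)

    -- Each "clause" maps a call to its list of answers; the alternatives
    -- are evaluated in parallel and concatenated, preserving Prolog's
    -- sequential left-to-right solution order.
    orParallel :: NFData b => [a -> [b]] -> a -> [b]
    orParallel clauses call = concat (parMap rdeepseq (\clause -> clause call) clauses)
    ```

    Environment copying addresses what this pure sketch sidesteps: in a real Prolog engine each worker gets its own copy of the execution stacks, so alternatives can be explored independently without interfering through shared bindings.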

    Threads and Or-Parallelism Unified

    One of the main advantages of Logic Programming (LP) is that it provides an excellent framework for the parallel execution of programs. In this work we investigate novel techniques to efficiently exploit parallelism from real-world applications on low-cost multi-core architectures. To achieve these goals, we revive and redesign the YapOr system to exploit or-parallelism based on a multi-threaded implementation. Our new approach takes full advantage of the state-of-the-art, fast and optimized YAP Prolog engine and shares the underlying execution environment, scheduler and most of the data structures used to support YapOr's model. Initial experiments with our new approach consistently achieve almost linear speedups for most of the applications, proving it a good alternative for exploiting implicit parallelism on the currently available low-cost multi-core architectures.
    Comment: 17 pages, 21 figures, International Conference on Logic Programming (ICLP 2010).
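    As a toy counterpart to the spark-based sketch above, the same or-exploration can be written with explicit lightweight threads, which is the flavour of implementation this paper pursues; this is an illustration only, not YapOr's engine or scheduler. It uses mapConcurrently from the async library.

    ```haskell
    import Control.Concurrent.Async (mapConcurrently)

    -- Each matching alternative runs in its own lightweight thread; the
    -- answer lists are combined in the original clause order once all
    -- threads finish.
    orThreads :: [a -> IO [b]] -> a -> IO [b]
    orThreads clauses call = concat <$> mapConcurrently (\clause -> clause call) clauses
    ```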