1,229 research outputs found
A semantics and implementation of a causal logic programming language
The increasingly widespread availability of multicore and manycore computers demands new programming languages that make parallel programming dramatically easier and less error prone. This paper describes a semantics for a new class of declarative programming languages that support massive amounts of implicit parallelism.
Improving Implicit Parallelism
We propose a new technique for exploiting the inherent parallelism in lazy functional programs. Known as implicit parallelism, the goal of writing a sequential program and having the compiler improve its performance by determining what can be executed in parallel has been studied for many years. Our technique abandons the idea that a compiler should accomplish this feat in ‘one shot’ with static analysis and instead allows the compiler to improve upon the static analysis using iterative feedback.
We demonstrate that iterative feedback can be relatively simple when the source language is a lazy purely functional programming language. We present three main contributions to the field: the automatic derivation of parallel strategies from a demand on a structure, and two new methods of feedback-directed auto-parallelisation. The first method treats the runtime of the program as a black box and uses the ‘wall-clock’ time as a fitness function to guide a heuristic search on bitstrings representing the parallel setting of the program. The second feedback approach is profile directed: the compiler uses profile data gathered by the runtime system as the program executes to determine which threads are not worth the overhead of creating them.
Our results show that the use of feedback-directed compilation can be a good source of refinement for the static analysis techniques that struggle to account for the cost of a computation. This shifts the burden of ‘is this parallelism worthwhile?’ away from the static phase of compilation and onto the runtime, which is better equipped to answer the question.
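The black-box search described in this abstract can be illustrated with a minimal sketch. It assumes a simple greedy bit-flip (hill-climbing) search, which is only one possible heuristic and is not claimed to be the one used in the work. The names `hill_climb` and `synthetic_cost` are hypothetical, and `synthetic_cost` is a deterministic stand-in for a real wall-clock measurement.

```python
from typing import Callable, List


def hill_climb(n_bits: int, cost: Callable[[List[int]], float]) -> List[int]:
    """Greedy bit-flip search over parallel settings.

    Each bit says whether one candidate parallel site is enabled (1) or
    disabled (0). Start with every site enabled and keep any single-bit
    flip that lowers the measured cost, until no flip helps.
    """
    setting = [1] * n_bits
    best = cost(setting)
    improved = True
    while improved:
        improved = False
        for i in range(n_bits):
            candidate = setting.copy()
            candidate[i] ^= 1  # toggle one parallel site
            c = cost(candidate)
            if c < best:
                setting, best, improved = candidate, c, True
    return setting


def synthetic_cost(setting: List[int]) -> float:
    # Hypothetical stand-in for wall-clock time: each enabled site either
    # saves time (positive gain) or costs more than it saves (negative gain).
    gains = [5.0, -1.0, 3.0, -2.0]
    return 20.0 - sum(g for g, bit in zip(gains, setting) if bit)


if __name__ == "__main__":
    print(hill_climb(4, synthetic_cost))  # sites with negative gain end up disabled
```

In a real setting the cost function would run the compiled program with the given setting and time it, so measurements would be noisy and would usually need repeating.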
Exploiting implicit parallelism in SPARC instruction execution
One way to increase the performance of a processing unit is to exploit implicit parallelism. Exploiting this parallelism requires a processor to dynamically select instructions in a serial instruction stream which can be executed in parallel. As operations are computed concurrently, an execution speedup will occur. This thesis studies how effectively implicit parallelism could be exploited in the Scalable Processor Architecture (SPARC) [9], a reduced instruction set architecture developed by Sun Microsystems. First, an analysis of SPARC instruction traces will determine the optimal speedup that would be realized by a processor with infinite resources. Next, an analytical model of a parallelizing processor will be developed and used to predict the effects of limited resources on optimal speedup. Lastly, a SPARC simulator will be employed to determine the actual speedup of resource-limited configurations, and the results will be correlated with the analytical model.
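The idea of an optimal speedup under infinite resources can be sketched as a dataflow-limit calculation over a trace. The sketch below makes several assumptions that are not taken from the thesis: every instruction takes one cycle, only true (read-after-write) register dependences constrain execution, and memory and control dependences are ignored. The trace format and the name `ideal_speedup` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical trace entry: (destination register, source registers).
Instr = Tuple[str, Tuple[str, ...]]


def ideal_speedup(trace: List[Instr]) -> float:
    """Serial instruction count divided by the dependence critical path."""
    ready: Dict[str, int] = {}  # register -> cycle in which its latest value is produced
    critical_path = 0
    for dest, srcs in trace:
        # An instruction can issue one cycle after its latest producer finishes.
        cycle = 1 + max((ready.get(r, 0) for r in srcs), default=0)
        ready[dest] = cycle
        critical_path = max(critical_path, cycle)
    return len(trace) / critical_path if trace else 1.0


if __name__ == "__main__":
    trace = [
        ("r1", ()),
        ("r2", ()),
        ("r3", ("r1", "r2")),
        ("r4", ("r1",)),
        ("r5", ("r3", "r4")),
        ("r6", ()),
    ]
    print(ideal_speedup(trace))  # 6 instructions over a 3-cycle critical path
```

A resource-limited model would additionally cap how many instructions may issue in each cycle, which can only lower this figure.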
Identifying, Quantifying, Extracting and Enhancing Implicit Parallelism
The shift of the microprocessor industry towards multicore architectures has
placed a huge burden on the programmers by requiring explicit parallelization
for performance. Implicit Parallelization is an alternative that could ease the
burden on programmers by parallelizing applications “under the covers” while
maintaining sequential semantics externally. This thesis develops a novel
approach for thinking about parallelism, by casting the problem of
parallelization in terms of instruction criticality. Using this approach,
parallelism in a program region is readily identified when certain conditions
about fetch-criticality are satisfied by the region. The thesis formalizes this
approach by developing a criticality-driven model of task-based
parallelization. The model can accurately predict the parallelism that would be
exposed by potential task choices by capturing a wide set of sources of
parallelism as well as costs to parallelization.
The criticality-driven model enables the development of two key components for
Implicit Parallelization: a task selection policy, and a bottleneck analysis
tool. The task selection policy can partition a single-threaded program into
tasks that will profitably execute concurrently on a multicore architecture in
spite of the costs associated with enforcing data-dependences and with
task-related actions. The bottleneck analysis tool gives feedback to the
programmers about data-dependences that limit parallelism. In particular, there
are several “accidental dependences” that can be easily removed with large
improvements in parallelism. These tools combine into a systematic methodology
for performance tuning in Implicit Parallelization. Finally, armed with the
criticality-driven model, the thesis revisits several architectural design
decisions, and finds several encouraging ways forward to increase the scope of
Implicit Parallelization.
Unpublished, not peer reviewed.
Or-Parallel Prolog Execution on Clusters of Multicores
Logic Programming languages, such as Prolog, provide an excellent framework for the parallel execution of logic programs. In particular, the inherent non-determinism in the way logic programs are structured makes Prolog very attractive for the exploitation of implicit parallelism. One of the most noticeable sources of implicit parallelism in Prolog programs is or-parallelism. Or-parallelism arises from the simultaneous evaluation of a subgoal call against the clauses that match that call. Arguably, the most successful model for or-parallelism is environment copying, which has been used efficiently in the implementation of or-parallel Prolog systems on both shared memory and distributed memory architectures. Nowadays, multicores and clusters of multicores are becoming the norm and, although many parallel Prolog systems have been developed in the past, to the best of our knowledge none of them was specially designed to explore the combination of shared and distributed memory architectures. Motivated by our past experience in designing and developing parallel Prolog systems based on environment copying, we propose a novel computational model to efficiently exploit implicit parallelism from large-scale real-world applications, specialized for the novel architectures based on clusters of multicores.
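As a rough illustration of or-parallelism with environment copying, the sketch below explores the alternative clauses for one call concurrently, giving each alternative its own copy of the variable bindings. It is a toy model, not the computational model proposed in the paper: the names `try_clause` and `solve_or_parallel`, the dictionary environment, and the constraint `X + Y == 10` are all hypothetical. Python threads are used only to show the structure and do not give real speedup for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional

Env = Dict[str, int]  # hypothetical binding environment: variable name -> value


def try_clause(env: Env, value: int) -> Optional[Env]:
    """Explore one alternative clause on a private copy of the bindings."""
    local = dict(env)   # environment copying: branches share no mutable state
    local["Y"] = value  # this clause binds Y to its own candidate value
    return local if local["X"] + local["Y"] == 10 else None


def solve_or_parallel(env: Env, alternatives: List[int]) -> List[Env]:
    """Evaluate all matching alternatives at once and collect the successes."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda v: try_clause(env, v), alternatives)
        return [r for r in results if r is not None]


if __name__ == "__main__":
    print(solve_or_parallel({"X": 4}, [1, 6, 9]))
```

Because every branch works on its own copy, no locking is needed on the bindings, which is the property that makes copying attractive on distributed memory as well as shared memory.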
Threads and Or-Parallelism Unified
One of the main advantages of Logic Programming (LP) is that it provides an
excellent framework for the parallel execution of programs. In this work we
investigate novel techniques to efficiently exploit parallelism from real-world
applications in low cost multi-core architectures. To achieve these goals, we
revive and redesign the YapOr system to exploit or-parallelism based on a
multi-threaded implementation. Our new approach takes full advantage of the
state-of-the-art fast and optimized YAP Prolog engine and shares the underlying
execution environment, scheduler and most of the data structures used to
support YapOr's model. Initial experiments with our new approach consistently
achieve almost linear speedups for most of the applications, proving itself as
a good alternative for exploiting implicit parallelism in the currently
available low cost multi-core architectures.
Comment: 17 pages, 21 figures, International Conference on Logic Programming
(ICLP 2010)