209 research outputs found
Incremental copying garbage collection for WAM-based Prolog systems
The design and implementation of an incremental copying heap garbage
collector for WAM-based Prolog systems is presented. Its heap layout consists
of a number of equal-sized blocks. Other changes to the standard WAM allow
these blocks to be garbage collected independently. The independent collection
of heap blocks forms the basis of an incremental collecting algorithm which
employs copying without marking (contrary to the more frequently used mark©
or mark&slide algorithms in the context of Prolog). Compared to standard
semi-space copying collectors, this approach to heap garbage collection lowers
in many cases the memory usage and reduces pause times. The algorithm also
allows for a wide variety of garbage collection policies including generational
ones. The algorithm is implemented and evaluated in the context of hProlog.Comment: 33 pages, 22 figures, 5 tables. To appear in Theory and Practice of
Logic Programming (TPLP
Optimizing the SICStus Prolog virtual machine instruction set
The Swedish Institute of Computer Science (SICS) is the vendor of SICStus Prolog.
To decrease execution time and reduce space requirements, variants of SICStus
Prolog's virtual instruction set were investigated. Semi-automatic ways of finding
candidate sets of instructions to combine or specialize were developed and used.
Several virtual machines were implemented and the relationship between improvements
by combinations and by specializations were investigated. The benefits of specializations
and combinations of instructions to the performance of the emulator is on the
average of the order of 10%. The code size reduction is 15%
Heap Reference Analysis Using Access Graphs
Despite significant progress in the theory and practice of program analysis,
analysing properties of heap data has not reached the same level of maturity as
the analysis of static and stack data. The spatial and temporal structure of
stack and static data is well understood while that of heap data seems
arbitrary and is unbounded. We devise bounded representations which summarize
properties of the heap data. This summarization is based on the structure of
the program which manipulates the heap. The resulting summary representations
are certain kinds of graphs called access graphs. The boundedness of these
representations and the monotonicity of the operations to manipulate them make
it possible to compute them through data flow analysis.
An important application which benefits from heap reference analysis is
garbage collection, where currently liveness is conservatively approximated by
reachability from program variables. As a consequence, current garbage
collectors leave a lot of garbage uncollected, a fact which has been confirmed
by several empirical studies. We propose the first ever end-to-end static
analysis to distinguish live objects from reachable objects. We use this
information to make dead objects unreachable by modifying the program. This
application is interesting because it requires discovering data flow
information representing complex semantics. In particular, we discover four
properties of heap data: liveness, aliasing, availability, and anticipability.
Together, they cover all combinations of directions of analysis (i.e. forward
and backward) and confluence of information (i.e. union and intersection). Our
analysis can also be used for plugging memory leaks in C/C++ languages.Comment: Accepted for printing by ACM TOPLAS. This version incorporates
referees' comment
Description and Optimization of Abstract Machines in a Dialect of Prolog
In order to achieve competitive performance, abstract machines for Prolog and
related languages end up being large and intricate, and incorporate
sophisticated optimizations, both at the design and at the implementation
levels. At the same time, efficiency considerations make it necessary to use
low-level languages in their implementation. This makes them laborious to code,
optimize, and, especially, maintain and extend. Writing the abstract machine
(and ancillary code) in a higher-level language can help tame this inherent
complexity. We show how the semantics of most basic components of an efficient
virtual machine for Prolog can be described using (a variant of) Prolog. These
descriptions are then compiled to C and assembled to build a complete bytecode
emulator. Thanks to the high level of the language used and its closeness to
Prolog, the abstract machine description can be manipulated using standard
Prolog compilation and optimization techniques with relative ease. We also show
how, by applying program transformations selectively, we obtain abstract
machine implementations whose performance can match and even exceed that of
state-of-the-art, highly-tuned, hand-crafted emulators.Comment: 56 pages, 46 figures, 5 tables, To appear in Theory and Practice of
Logic Programming (TPLP
Parallel backtracking with answer memoing for independent and-parallelism
Goal-level Independent and-parallelism (IAP) is exploited by scheduling for simultaneous execution two or more goals which will not interfere with each other at run time. This can be done safely even if such goals can produce mĂșltiple answers. The most successful IAP implementations to date have used recomputation of answers and sequentially ordered backtracking. While in principie simplifying the
implementation, recomputation can be very inefficient if the granularity of the parallel goals is large enough and they produce several answers, while sequentially ordered backtracking limits parallelism.
And, despite the expected simplification, the implementation of the classic schemes has proved to involve complex engineering, with the consequent difficulty for system maintenance and extensiĂłn, while still frequently running into the well-known trapped goal and garbage slot problems. This work presents an alternative parallel backtracking model for IAP and its implementation. The model features parallel out-of-order (i.e., non-chronological) backtracking and relies on answer memoization to reuse and combine answers. We show that this approach can bring significant performance advantages.
Also, it can bring some simplification to the important engineering task involved in implementing the backtracking mechanism of previous approaches
Region-based memory management for Mercury programs
Region-based memory management (RBMM) is a form of compile time memory
management, well-known from the functional programming world. In this paper we
describe our work on implementing RBMM for the logic programming language
Mercury. One interesting point about Mercury is that it is designed with strong
type, mode, and determinism systems. These systems not only provide Mercury
programmers with several direct software engineering benefits, such as
self-documenting code and clear program logic, but also give language
implementors a large amount of information that is useful for program analyses.
In this work, we make use of this information to develop program analyses that
determine the distribution of data into regions and transform Mercury programs
by inserting into them the necessary region operations. We prove the
correctness of our program analyses and transformation. To execute the
annotated programs, we have implemented runtime support that tackles the two
main challenges posed by backtracking. First, backtracking can require regions
removed during forward execution to be "resurrected"; and second, any memory
allocated during a computation that has been backtracked over must be recovered
promptly and without waiting for the regions involved to come to the end of
their life. We describe in detail our solution of both these problems. We study
in detail how our RBMM system performs on a selection of benchmark programs,
including some well-known difficult cases for RBMM. Even with these difficult
cases, our RBMM-enabled Mercury system obtains clearly faster runtimes for 15
out of 18 benchmarks compared to the base Mercury system with its Boehm runtime
garbage collector, with an average runtime speedup of 24%, and an average
reduction in memory requirements of 95%. In fact, our system achieves optimal
memory consumption in some programs.Comment: 74 pages, 23 figures, 11 tables. A shorter version of this paper,
without proofs, is to appear in the journal Theory and Practice of Logic
Programming (TPLP
Automatic goal distribution strategies for the execution of committed choice logic languages on distributed memory parallel computers
There has been much research interest in efficient implementations of the Committed
Choice Non-Deterministic (CCND) logic languages on parallel computers. To take
full advantage of the speed gains of parallel computers, methods need to be found
to automatically distribute goals over the machine processors, ideally with as little
involvement from the user as possible.In this thesis we explore some automatic goal distribution strategies for the execuÂŹ
tion of the CCND languages on commercially available distributed memory parallel
computers.There are two facets to the goal distribution strategies we have chosen to explore:DEMAND DRIVEN: An idle processor requests work from other processors. We describe
two strategies in this class: one in which an idle processor asks only neighbouring
processors for spare work, the nearest-neighbour strategy; and one where an idle
processor may ask any other processor in the machine for spare work, the allprocessors strategy.WEIGHTS: Using a program analysis technique devised by Tick, weights are attached to
goals; the weights can be used to order the goals so that they can be executed
and distributed out in weighted order, possibly increasing performance.We describe a framework in which to implement and analyse goal distribution strategies, and then go on to describe experiments with demand driven strategies, both with
and without weights. The experiments were made using two of our own implementations of Flat Guarded Horn Clauses â an interpreter and a WAM-like system â
executing on a MEIKO T800 Transputer Array configured in a 2-D mesh topology.Analysis of the results show that the all-processors strategies are promising (AP-NW),
adding weights had little positive effect on performance, and that nearest-neighbours
strategies can reduce performance due to bad load balancing.We also describe some preliminary experiments for a variant of the AP-NW strategy:
goals which suspend on one variable are sent to the processor that controls that variable,
the processes-to-data strategy. And we briefly look at some preliminary results of
executing programs on large numbers of processors (> 30)
- âŠ