The ciao modular, standalone compiler and its generic program processing library
Ciao Prolog incorporates a module system which allows separate compilation and sensible creation of standalone executables. We describe some of the main aspects of the Ciao modular compiler, ciaoc, which takes advantage of the characteristics of the Ciao Prolog module system to automatically perform separate and incremental compilation and to efficiently build small, standalone executables with competitive run-time performance. ciaoc can also statically detect a larger number of programming errors. We also present a generic code processing library for handling modular programs, which provides an important part of the functionality of ciaoc. This library allows the development of program analysis and transformation tools in a way that is to some extent orthogonal to the details of module system design, and has been used in the implementation of ciaoc and other Ciao system tools. We also describe the different types of executables which can be generated by the Ciao compiler, which offer different tradeoffs between executable size, startup time, and portability, depending, among other factors, on the linking regime used (static, dynamic, lazy, etc.). Finally, we provide experimental data which illustrate these tradeoffs.
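The separate/incremental compilation scheme described above hinges on a dependency check over module interfaces. A minimal sketch of that idea follows (in Python with a hypothetical file layout, purely to illustrate the mechanism; it is not ciaoc's actual implementation):

```python
import os

def needs_recompile(src, obj, imported_interfaces=()):
    """A module is recompiled only if its object file is missing, older
    than its source, or older than the interface of any imported module."""
    if not os.path.exists(obj):
        return True
    obj_time = os.path.getmtime(obj)
    if os.path.getmtime(src) > obj_time:
        return True
    # A changed interface in an imported module also forces recompilation;
    # unchanged modules are skipped, which is what makes the scheme incremental.
    return any(os.path.getmtime(itf) > obj_time
               for itf in imported_interfaces)
```

Checking against interfaces rather than whole source files is what lets a compiler avoid recompiling importers when only a module's private implementation has changed.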
The ciao module system: A new module system for prolog
Self-distributing computation
Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2002. Includes bibliographical references (leaves 54-55). In this thesis, I propose a new model for distributing computational work in a parallel or distributed system. This model relies on exposing the topology and performance characteristics of the underlying architecture to the application. Responsibility for task distribution is divided between a run-time system, which determines when tasks should be distributed or consolidated, and the application, which specifies to the run-time system its first-choice distribution based on a representation of the current state of the underlying architecture. Discussing my experience in implementing this model as a Java-based simulator, I argue for the advantages of this approach as they relate to performance on changing architectures and ease of programming. By Thomas R. Woodfin. M.Eng.
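The division of responsibility the abstract describes (the runtime decides *when* to redistribute, the application decides *where*) can be sketched as follows. All names here are hypothetical, and this is a Python illustration of the model only; the original work is a Java-based simulator:

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    load: float       # current utilization estimate
    link_cost: float  # relative cost of reaching this node

def first_choice_distribution(tasks, topology):
    """Application-side policy: place each task on the node that looks
    cheapest given the exposed topology and performance characteristics."""
    placement = {}
    for task in tasks:
        best = min(topology, key=lambda n: n.load + n.link_cost)
        placement[task] = best.name
        best.load += 1.0 / len(tasks)  # crude estimate of the added work
    return placement

class Runtime:
    """Runtime-side policy: decides *when* to redistribute or consolidate;
    the application supplies its first-choice distribution on request."""
    def __init__(self, topology, imbalance_threshold=0.5):
        self.topology = topology
        self.threshold = imbalance_threshold

    def maybe_redistribute(self, tasks):
        loads = [n.load for n in self.topology]
        if max(loads) - min(loads) > self.threshold:
            return first_choice_distribution(tasks, self.topology)
        return None  # load is balanced enough; keep the current placement
```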
Portability and performance in heterogeneous many-core systems
Master's dissertation in Informatics. Current computing systems have a multiplicity of computational resources with different architectures, such as multi-core CPUs and GPUs. These platforms are known as heterogeneous many-core systems (HMS), and as computational resources evolve they are offering more parallelism as well as becoming more heterogeneous. Exploiting these devices requires the programmer to be aware of the multiplicity of associated architectures, computing models, and development frameworks. Portability issues, disjoint memory address spaces, work distribution, and irregular workload patterns are major examples that need to be tackled in order to efficiently exploit the computational resources of an HMS.
The goal of this dissertation is to design and evaluate a base architecture that enables the identification and preliminary evaluation of the potential bottlenecks and limitations of a runtime system that addresses HMS. It proposes a runtime system that eases the programmer's burden of handling all the devices available in a heterogeneous system. The runtime provides a programming and execution model with a unified address space managed by a data management system. An API is proposed to enable the programmer to express applications and data in an intuitive way. Four different scheduling approaches are evaluated, combining different data partitioning mechanisms with different work assignment policies, and a performance model is used to provide performance insights to the scheduler.
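As a rough illustration of how a data partitioning mechanism, a work assignment policy, and a performance model can be combined, consider the following generic sketch (hypothetical interfaces and assumed throughput numbers, not the dissertation's actual API):

```python
def static_partition(n_items, n_chunks):
    """Data partitioning: split [0, n_items) into contiguous chunks."""
    step = -(-n_items // n_chunks)  # ceiling division
    return [(lo, min(lo + step, n_items)) for lo in range(0, n_items, step)]

def model_guided_assignment(chunks, devices, predict):
    """Work assignment: greedily give each chunk to the device that the
    performance model predicts will finish it earliest."""
    finish = {d: 0.0 for d in devices}
    plan = {d: [] for d in devices}
    for lo, hi in chunks:
        dev = min(devices, key=lambda d: finish[d] + predict(d, hi - lo))
        plan[dev].append((lo, hi))
        finish[dev] += predict(dev, hi - lo)
    return plan

# Performance model: a trivial linear model with assumed throughputs.
throughput = {"cpu": 1.0, "gpu": 8.0}             # items per time unit
predict = lambda dev, size: size / throughput[dev]

plan = model_guided_assignment(static_partition(1_000_000, 16),
                               ["cpu", "gpu"], predict)
```

A mispredicted throughput skews every assignment the scheduler makes, which is one way to read the dissertation's observation that model accuracy is itself a limiting factor.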
The runtime's efficiency was evaluated with three different applications (matrix multiplication, image convolution, and an n-body Barnes-Hut simulation) running on multi-core CPUs and GPUs.
In terms of productivity the results look promising; however, combining scheduling and data partitioning revealed some inefficiencies that compromise load balancing and need to be revised, as does the data management system, which plays a crucial role in such systems. Performance-model-driven decisions were also evaluated, revealing that the accuracy of the performance model is also a compromising factor.
A framework for efficient execution of data parallel irregular applications on heterogeneous systems
Exploiting the computing power of the diverse resources available on heterogeneous systems is mandatory but very challenging. The diversity of architectures, execution models, and programming tools, together with disjoint address spaces and different computing capabilities, raises a number of challenges that severely impact application performance and programming productivity. This problem is further compounded in the presence of data-parallel irregular applications.
This paper presents a framework that addresses the development and execution of data-parallel irregular applications on heterogeneous systems. A unified task-based programming and execution model is proposed, together with inter- and intra-device scheduling, which, coupled with a data management system, aim to achieve performance scalability across multiple devices while maintaining high programming productivity. Intra-device scheduling on wide SIMD/SIMT architectures resorts to consumer-producer kernels which, by allowing dynamic generation and rescheduling of new work units, balance irregular workloads and increase resource utilization.
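The consumer-producer mechanism can be illustrated with a small host-side sketch (Python threads standing in for SIMT work groups; the paper's kernels run on GPUs, and all names here are illustrative): processing a work unit may generate new units, which are fed back into a shared queue for dynamic rescheduling.

```python
import queue
import threading

def consumer_producer_pool(initial_units, process, n_workers=4):
    """Workers repeatedly take a work unit; processing it may yield new
    units, which re-enter the queue, dynamically rebalancing the load."""
    q = queue.Queue()
    for unit in initial_units:
        q.put(unit)

    def worker():
        while True:
            unit = q.get()
            for new_unit in process(unit):  # irregular work can expand
                q.put(new_unit)
            q.task_done()

    for _ in range(n_workers):
        threading.Thread(target=worker, daemon=True).start()
    q.join()  # returns once every unit, including generated ones, is done

# Example: split oversized units until they match the basic work size.
def process(unit):
    if unit > 8:            # too coarse: produce two finer units
        yield unit // 2
        yield unit - unit // 2
    # otherwise the unit is fine-grained enough; "execute" it here

consumer_producer_pool([100, 3, 57], process)
```

The same structure also explains the overhead trade-off noted in the results below: every enqueue and dequeue costs something, so the work per basic unit must be large enough to amortize it.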
Results show that regular and irregular applications scale well with the number of devices while requiring minimal programming effort. Consumer-producer kernels are able to sustain significant performance gains as long as the workload per basic work unit is enough to compensate for the overheads associated with intra-device scheduling. Where this is not the case, consumer kernels can still be used for the irregular application. Comparisons with an alternative framework, StarPU, which targets regular workloads, consistently demonstrate significant speedups. This is, to the best of our knowledge, the first published integrated approach that successfully handles irregular workloads over heterogeneous systems.

This work is funded by National Funds through the FCT - Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) and by ERDF - European Regional Development Fund through the COMPETE Programme (operational programme for competitiveness) within projects PEst-OE/EEI/UI0752/2014 and FCOMP-01-0124-FEDER-010067, and by the School of Engineering, Universidade do Minho, within project P2SHOCS - Performance Portability on Scalable Heterogeneous Computing Systems.
Profiling large-scale lazy functional programs
The LOLITA natural language processing system is an example of one of the ever increasing number of large-scale systems written entirely in a functional programming language. The system consists of over 50,000 lines of Haskell code and is able to perform a number of tasks such as semantic and pragmatic analysis of text, context scanning, and query analysis. Such a system is more useful if its results are calculated in real time; the efficiency of the system is therefore paramount. For the past three years we have used the profiling tools supplied with the Haskell compilers GHC and HBC to analyse and reason about our programming solutions, and have achieved good results; however, our experience has shown that the profiling life-cycle is often too long to make a detailed analysis of a large system possible, and the profiling results are often misleading.

A profiling system is developed which provides three types of functionality not previously found in a profiler for lazy functional programs. Firstly, the profiler is able to produce results based on an accurate method of cost inheritance. We have found that this reduces the possibility of the programmer obtaining misleading profiling results. Secondly, the programmer is able to explore the results after the execution of the program. This is done by selecting and deselecting parts of the program using a post-processor, which greatly reduces the analysis time since no further compilation, execution, or profiling of the program is needed. Finally, the new profiling system allows the user to examine aspects of the run-time call structure of the program, which is useful in analysing its run-time behaviour. Previous attempts at extending the results produced by a profiler in this way have failed due to exceptionally high overheads.

Exploration of the overheads produced by the new profiling scheme shows that typical overheads in profiling the LOLITA system are a 10% increase in compilation time, a 7% increase in executable size, and a 70% run-time overhead. These overheads represent a considerable saving of time in the detailed profiling analysis of a large, lazy functional program.
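The cost-inheritance idea can be made concrete with a toy sketch (in Python rather than a Haskell runtime, and ignoring the complications of lazy evaluation, which are the hard part the thesis addresses): each recorded cost is charged to the current cost centre and inherited by every enclosing centre, so post-mortem selection of program parts yields consistent totals without re-running the program.

```python
from collections import defaultdict
from contextlib import contextmanager

class InheritanceProfiler:
    """Charges each cost to the active cost centre and, by inheritance,
    to every enclosing centre on the call stack."""
    def __init__(self):
        self.stack = []
        self.direct = defaultdict(float)     # cost incurred in the centre itself
        self.inherited = defaultdict(float)  # own cost plus callees' costs

    @contextmanager
    def centre(self, name):
        self.stack.append(name)
        try:
            yield
        finally:
            self.stack.pop()

    def charge(self, cost):
        if self.stack:
            self.direct[self.stack[-1]] += cost
        for name in set(self.stack):  # once per centre, even under recursion
            self.inherited[name] += cost

prof = InheritanceProfiler()
with prof.centre("parse"):
    prof.charge(2.0)
    with prof.centre("lex"):
        prof.charge(5.0)
# prof.direct    == {"parse": 2.0, "lex": 5.0}
# prof.inherited == {"parse": 7.0, "lex": 5.0}
```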
A model for interactive computation : applications to speech research
Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998. Includes bibliographical references (p. 157-159). By Michael K. McCandless. Ph.D.
- …