17 research outputs found

    Skeletons for parallel image processing: an overview of the SKiPPER project

    Get PDF
    International audienceThis paper is a general overview of the SKIPPER project, run at Blaise Pascal University between 1996 and 2002. The main goal of the SKIPPER project was to demonstrate the appli- cability of skeleton-based parallel programming techniques to the fast prototyping of reactive vision applications. This project has produced several versions of a full-fledged integrated pa- rallel programming environment (PPE). These PPEs have been used to implement realistic vi- sion applications, such as road following or vehicle tracking for assisted driving, on embedded parallel platforms embarked on semi-autonomous vehicles. All versions of SKIPPER share a common front-end and repertoire of skeletons--presented in previous papers--but differ in the techniques used for implementing skeletons. This paper focuses on these implementation issues, by making a comparative survey, according to a set of four criteria (efficiency, expres- sivity, portability, predictability), of these implementation techniques. It also gives an account of the lessons we have learned, both when dealing with these implementation issues and when using the resulting tools for prototyping vision applications

    Engineering the performance of parallel applications

    Get PDF

    Supporting high-level, high-performance parallel programming with library-driven optimization

    Get PDF
    Parallel programming is a demanding task for developers partly because achieving scalable parallel speedup requires drawing upon a repertoire of complex, algorithm-specific, architecture-aware programming techniques. Ideally, developers of programming tools would be able to build algorithm-specific, high-level programming interfaces that hide the complex architecture-aware details. However, it is a monumental undertaking to develop such tools from scratch, and it is challenging to provide reusable functionality for developing such tools without sacrificing the hosted interface’s performance or ease of use. In particular, to get high performance on a cluster of multicore computers without requiring developers to manually place data and computation onto processors, it is necessary to combine prior methods for shared memory parallelism with new methods for algorithm-aware distribution of computation and data across the cluster. This dissertation presents Triolet, a programming language and compiler for high-level programming of parallel loops for high-performance execution on clusters of multicore computers. Triolet adopts a simple, familiar programming interface based on traversing collections of data. By incorporating semantic knowledge of how traversals behave, Triolet achieves efficient parallel execution and communication. Moreover, Triolet’s performance on sequential loops is comparable to that of low-level C code, ranging from seven percent slower to 2.8× slower on tested benchmarks. Triolet’s design demonstrates that it is possible to decouple the design of a compiler from the implementation of parallelism without sacrificing performance or ease of use: parallel and sequential loops are implemented as library code and compiled to efficient code by an optimizing compiler that is unaware of parallelism beyond the scope of a single thread. All handling of parallel work partitioning, data partitioning, and scheduling is embodied in library code. During compilation, library code is inlined into a program and specialized to yield customized parallel loops. Experimental results from a 128-core cluster (with 8 nodes and 16 cores per node) show that loops in Triolet outperform loops in Eden, a similar high-level language. Triolet achieves significant parallel speedup over sequential C code, with performance ranging from slightly faster to 4.3× slower than manually parallelized C code on compute-intensive loops. Thus, Triolet demonstrates that a library of container traversal functions can deliver cluster-parallel performance comparable to manually parallelized C code without requiring programmers to manage parallelism. This programming approach opens the potential for future research into parallel programming frameworks

    A Tiger Compiler for the Cell Broadband Engine Architecture

    Get PDF
    The modern computing industry tends to build integrated circuits with multiple energy-efficient cores instead of ramping up the clock speed for each single processing unit. While each core may not run as fast as the single core model, such architecture allows more jobs to be handled in parallel and also provides better overall performance. Asymmetric Multiprocessing, also known as Heterogeneous Multiprocessing, involves multiple processors that differ architecturally from one another, especially where each processor has its own memory space. Under power limitations, this design could provide better performance than that attained through symmetric multiprocessing. However, the heterogeneous nature adds difficulty to programming. Each specific architecture requires its own program code. Programmers also need to explicitly transfer code and data between processors. This study describes the implementation of a compiler of the pedagogic Tiger language for the Cell Broadband Engine, an asymmetric multiprocessing platform jointly developed by Sony, Toshiba and IBM. The problem above is solved by introducing multiple backends for the Tiger language, along with a remote call stub (RCS) generator. Functions are compiled into different architectures, and calls across architectures are linked automatically through the stubs. RCS takes care of the execution context switch and hides details of the argument data/return value transfer. TigC simplifies the programming and building procedures. It also provides a high-level view of the whole program execution for future optimization because all of the source files are processed by a single compiler. As an example of this procedure, the possible optimization of data transfer during remote calls is investigated here

    Proceedings of Monterey Workshop 2001 Engineering Automation for Sofware Intensive System Integration

    Get PDF
    The 2001 Monterey Workshop on Engineering Automation for Software Intensive System Integration was sponsored by the Office of Naval Research, Air Force Office of Scientific Research, Army Research Office and the Defense Advance Research Projects Agency. It is our pleasure to thank the workshop advisory and sponsors for their vision of a principled engineering solution for software and for their many-year tireless effort in supporting a series of workshops to bring everyone together.This workshop is the 8 in a series of International workshops. The workshop was held in Monterey Beach Hotel, Monterey, California during June 18-22, 2001. The general theme of the workshop has been to present and discuss research works that aims at increasing the practical impact of formal methods for software and systems engineering. The particular focus of this workshop was "Engineering Automation for Software Intensive System Integration". Previous workshops have been focused on issues including, "Real-time & Concurrent Systems", "Software Merging and Slicing", "Software Evolution", "Software Architecture", "Requirements Targeting Software" and "Modeling Software System Structures in a fastly moving scenario".Office of Naval ResearchAir Force Office of Scientific Research Army Research OfficeDefense Advanced Research Projects AgencyApproved for public release, distribution unlimite

    A unified model for hardware/software codesign

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011.This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.Cataloged from student submitted PDF version of thesis.Includes bibliographical references (p. 179-188).Embedded systems are almost always built with parts implemented in both hardware and software. Market forces encourage such systems to be developed with dierent hardware-software decompositions to meet dierent points on the price-performance-power curve. Current design methodologies make the exploration of dierent hardware-software decompositions difficult because such exploration is both expensive and introduces signicant delays in time-to-market. This thesis addresses this problem by introducing, Bluespec Codesign Language (BCL), a united language model based on guarded atomic actions for hardware-software codesign. The model provides an easy way of specifying which parts of the design should be implemented in hardware and which in software without obscuring important design decisions. In addition to describing BCL's operational semantics, we formalize the equivalence of BCL programs and use this to mechanically verify design refinements. We describe the partitioning of a BCL program via computational domains and the compilation of dierent computational domains into hardware and software, respectively.by Nirav Dave.Ph.D
    corecore