1,647 research outputs found

    05101 Abstracts Collection -- Scheduling for Parallel Architectures: Theory, Applications, Challenges

    Get PDF
    From 06.03.05 to 11.03.05, the Dagstuhl Seminar 05101 ``Scheduling for Parallel Architectures: Theory, Applications, Challenges\u27\u27 was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general

    The hArtes Tool Chain

    Get PDF
    This chapter describes the different design steps needed to go from legacy code to a transformed application that can be efficiently mapped on the hArtes platform

    ASAM : Automatic Architecture Synthesis and Application Mapping; dl. 3.2: Instruction set synthesis

    Get PDF
    No abstract

    Autotuning for Automatic Parallelization on Heterogeneous Systems

    Get PDF

    Enhanced applicability of loop transformations

    Get PDF

    Automatic parallelization for embedded multi-core systems using high level cost models

    Get PDF
    Nowadays, embedded and cyber-physical systems are utilized in nearly all operational areas in order to support and enrich peoples' everyday life. To cope with the demands imposed by modern embedded systems, the employment of MPSoC devices is often the most profitable solution. However, many embedded applications are still written in a sequential way. In order to benefit from the multiple cores available on those devices, the application code has to be divided into concurrently executed tasks. Since performing this partitioning manually is an error-prone and also time-consuming job, many automatic parallelization approaches were developed in the past. Most of these existing approaches were developed in the context of high-performance and desktop computers so that their applicability to embedded devices is limited. Many new challenges arise if applications should be ported to embedded MPSoCs in an efficient way. Therefore, novel parallelization techniques were developed in the context of this thesis that are tailored towards special requirements demanded by embedded multi-core devices. All approaches presented in this thesis are based on sophisticated parallelization techniques employing high-level cost models to estimate the benefit of parallel execution. This enables the creation of well-balanced tasks, which is essential if applications should be parallelized efficiently. In addition, several other requirements of embedded devices are covered, like the consideration of multiple objectives simultaneously. As a result, beneficial trade-offs between several objectives, like, e.g., energy consumption and execution time can be found enabling the extraction of solutions which are highly optimized for a specific application scenario. To be applicable to many embedded application domains, approaches extracting different kinds of parallelism were also developed. The structure of the global parallelization approach facilitates the combination of different approaches in a plug-and-play fashion. Thus, the advantages of multiple parallelization techniques can easily be combined. Finally, in addition to parallelization approaches for homogeneous MPSoCs, optimized ones for heterogeneous devices were also developed in this thesis since the trend towards heterogeneous multi-core architectures is inexorable. To the best of the author's knowledge, most of these objectives and especially their combination were not covered by existing parallelization frameworks, so far. By combining all of them, a parallelization framework that is well optimized for embedded multi-core devices was developed in the context of this thesis

    StreamJIT: A Commensal Compiler for High-Performance Stream Programming

    Get PDF
    There are many domain libraries, but despite the performance benefits of compilation, domain-specific languages are comparatively rare due to the high cost of implementing an optimizing compiler. We propose commensal compilation, a new strategy for compiling embedded domain-specific languages by reusing the massive investment in modern language virtual machine platforms. Commensal compilers use the host language's front-end, use host platform APIs that enable back-end optimizations by the host platform JIT, and use an autotuner for optimization selection. The cost of implementing a commensal compiler is only the cost of implementing the domain-specific optimizations. We demonstrate the concept by implementing a commensal compiler for the stream programming language StreamJIT atop the Java platform. Our compiler achieves performance 2.8 times better than the StreamIt native code (via GCC) compiler with considerably less implementation effort.United States. Dept. of Energy. Office of Science (X-Stack Award DE-SC0008923)Intel Corporation (Science and Technology Center for Big Data)SMART3 Graduate Fellowshi

    Polyhedral+Dataflow Graphs

    Get PDF
    This research presents an intermediate compiler representation that is designed for optimization, and emphasizes the temporary storage requirements and execution schedule of a given computation to guide optimization decisions. The representation is expressed as a dataflow graph that describes computational statements and data mappings within the polyhedral compilation model. The targeted applications include both the regular and irregular scientific domains. The intermediate representation can be integrated into existing compiler infrastructures. A specification language implemented as a domain specific language in C++ describes the graph components and the transformations that can be applied. The visual representation allows users to reason about optimizations. Graph variants can be translated into source code or other representation. The language, intermediate representation, and associated transformations have been applied to improve the performance of differential equation solvers, or sparse matrix operations, tensor decomposition, and structured multigrid methods
    • …
    corecore