
    Dynamic code mapping for limited local memory systems

    This paper presents heuristics for dynamic management of application code on the limited local memories present in high-performance multi-core processors. Previous techniques formulate the problem using call graphs, which do not capture the temporal ordering of functions. In addition, they use only a conservative estimate of the interference cost between functions to obtain a mapping. As a result, previous techniques are unable to achieve efficient code mapping. The techniques proposed in this paper overcome both limitations and achieve superior code mapping. Experimental results from executing MiBench benchmarks on the Cell processor in the Sony PlayStation 3 demonstrate up to 29% and on average 12% performance improvement, at tolerable compile-time overhead.
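    To see why a temporal trace carries more information than a static call graph, consider the rough C sketch below. It is illustrative only (the trace, function IDs, and region mapping are hypothetical, not the paper's heuristics): it walks a recorded call trace and counts how many region reloads a candidate function-to-region mapping would incur, something a call graph alone cannot distinguish for alternating versus consecutive call patterns.

```c
#include <stdio.h>

#define NUM_FUNCS   4   /* functions in the program            */
#define NUM_REGIONS 2   /* overlay regions in the local memory */

/* Count how many times a function must be (re)loaded into its region
 * when execution follows the recorded call trace.  Two functions mapped
 * to the same region interfere only if they actually alternate in time. */
static int reload_cost(const int *trace, int len, const int *region_of)
{
    int resident[NUM_REGIONS];          /* function currently in each region */
    int cost = 0;
    for (int r = 0; r < NUM_REGIONS; r++)
        resident[r] = -1;               /* regions start empty */

    for (int i = 0; i < len; i++) {
        int f = trace[i];
        int r = region_of[f];
        if (resident[r] != f) {         /* miss: the function must be DMA-ed in */
            resident[r] = f;
            cost++;
        }
    }
    return cost;
}

int main(void)
{
    /* Hypothetical call trace: functions 0 and 1 alternate tightly,
     * functions 2 and 3 are called in long consecutive runs. */
    int trace[] = { 0, 1, 0, 1, 0, 1, 2, 2, 2, 3, 3, 3 };
    int len = sizeof trace / sizeof trace[0];

    /* Mapping A co-locates the alternating pair; mapping B separates it. */
    int map_a[NUM_FUNCS] = { 0, 0, 1, 1 };
    int map_b[NUM_FUNCS] = { 0, 1, 0, 1 };

    printf("reloads with mapping A: %d\n", reload_cost(trace, len, map_a));
    printf("reloads with mapping B: %d\n", reload_cost(trace, len, map_b));
    return 0;
}
```

    On this toy trace, co-locating the alternating pair costs twice as many reloads as separating it, which is exactly the kind of temporal interference a conservative call-graph estimate misses.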

    Runtime protection via dataflow flattening

    Software running on an open architecture, such as the PC, is vulnerable to inspection and modification. Since software may process valuable or sensitive information, many defenses against data analysis and modification have been proposed. This paper complements existing work and focuses on hiding data locations throughout program execution. To achieve this, we combine three techniques: (i) periodic reordering of the heap, (ii) migrating local variables from the stack to the heap, and (iii) pointer scrambling. By essentially flattening the dataflow graph of the program, these techniques complicate static dataflow analysis and dynamic data tracking. Our methodology can be viewed as a data-oriented analogue of control-flow flattening techniques. Dataflow flattening is useful in practical scenarios like DRM, information-flow protection, and exploit resistance. Our prototype implementation compiles C programs into binaries in which every access to the heap is redirected through a memory management unit. Stack-based variables may be migrated to the heap, while pointer accesses and arithmetic may be scrambled and redirected. We evaluate our approach experimentally on the SPEC CPU2006 benchmark suite.
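    The pointer-scrambling component can be pictured in a few lines of C. The sketch below is a minimal illustration under stated assumptions (the key value, helper names, and manual heap migration are invented for exposition, not the paper's compiler instrumentation): pointers are kept in memory XOR-ed with a key and are unscrambled only inside a single accessor, so a memory dump or naive dataflow analysis never sees the real data location.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical per-run key; a real defense would derive it at load time. */
static uintptr_t scramble_key = 0xA5A5DEADBEEFull;

/* Store only scrambled pointers so program state does not reveal
 * where the data actually lives. */
static inline void *scramble(void *p)   { return (void *)((uintptr_t)p ^ scramble_key); }
static inline void *unscramble(void *p) { return (void *)((uintptr_t)p ^ scramble_key); }

int main(void)
{
    /* A value that would normally live on the stack is migrated to the
     * heap, mimicking technique (ii) from the abstract. */
    int *secret = malloc(sizeof *secret);
    *secret = 42;

    /* Only the scrambled form is kept around. */
    void *hidden = scramble(secret);
    secret = NULL;

    /* Every use goes through the unscrambling accessor, loosely analogous
     * to redirecting heap accesses through the memory management unit. */
    int *real = unscramble(hidden);
    printf("secret = %d\n", *real);

    free(real);
    return 0;
}
```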

    Hierarchical Dynamic Loop Self-Scheduling on Distributed-Memory Systems Using an MPI+MPI Approach

    Computationally intensive loops are the primary source of parallelism in scientific applications. Such loops are often irregular, and a balanced execution of their iterations is critical for achieving high performance. However, several factors may lead to imbalanced load execution, such as problem characteristics and algorithmic or systemic variations. Dynamic loop self-scheduling (DLS) techniques are devised to mitigate these factors and, consequently, improve application performance. On distributed-memory systems, DLS techniques can be implemented using a hierarchical master-worker execution model and are therefore called hierarchical DLS techniques. These techniques self-schedule loop iterations at two levels of hardware parallelism: across and within compute nodes. Hybrid programming approaches that combine the message passing interface (MPI) with open multi-processing (OpenMP) dominate the implementation of hierarchical DLS techniques. The MPI-3 standard includes the feature of sharing memory regions among MPI processes. This feature introduced the MPI+MPI approach, which simplifies the implementation of parallel scientific applications. The present work designs and implements hierarchical DLS techniques by exploiting the MPI+MPI approach. Four well-known DLS techniques are considered in the evaluation presented herein. The results indicate certain performance advantages of the proposed approach compared to the hybrid MPI+OpenMP approach.
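    The MPI-3 shared-memory feature that enables the MPI+MPI style can be seen in a few standard calls. The sketch below is a minimal, hypothetical illustration (it uses a fixed chunk size rather than the four DLS techniques evaluated in the paper): ranks on a node obtain a node-local communicator with MPI_Comm_split_type, share a loop counter via MPI_Win_allocate_shared, and self-schedule chunks with atomic MPI_Fetch_and_op.

```c
#include <mpi.h>
#include <stdio.h>

#define N_ITERS 1000L   /* loop iterations to self-schedule per node */
#define CHUNK     16L   /* fixed chunk size (simplest scheduling policy) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Split off a communicator containing only the ranks on this node. */
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);
    int node_rank;
    MPI_Comm_rank(node_comm, &node_rank);

    /* Node-local shared counter: rank 0 allocates it, everyone maps it. */
    long *counter = NULL;
    MPI_Win win;
    MPI_Aint size = (node_rank == 0) ? sizeof(long) : 0;
    MPI_Win_allocate_shared(size, sizeof(long), MPI_INFO_NULL,
                            node_comm, &counter, &win);
    if (node_rank != 0) {
        MPI_Aint qsize; int qdisp;
        MPI_Win_shared_query(win, 0, &qsize, &qdisp, &counter);
    }
    if (node_rank == 0) *counter = 0;
    MPI_Barrier(node_comm);

    /* Each rank repeatedly grabs the next chunk with an atomic fetch-add. */
    MPI_Win_lock_all(0, win);
    long start, inc = CHUNK, done = 0;
    for (;;) {
        MPI_Fetch_and_op(&inc, &start, MPI_LONG, 0, 0, MPI_SUM, win);
        MPI_Win_flush(0, win);
        if (start >= N_ITERS) break;
        long end = (start + CHUNK < N_ITERS) ? start + CHUNK : N_ITERS;
        for (long i = start; i < end; i++) done++;   /* loop body goes here */
    }
    MPI_Win_unlock_all(win);

    printf("node rank %d executed %ld iterations\n", node_rank, done);

    MPI_Win_free(&win);
    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}
```

    Replacing the fixed CHUNK with a shrinking chunk calculation would give guided- or factoring-style variants of the same self-scheduling loop.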