Search CORE

91,929 research outputs found

Dynamic code mapping for limited local memory systems

Author: Aviral Shrivastava
Ke Bai
Seung Chul Jung
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2010
Field of study

Abstract—This paper presents heuristics for dynamic man-agement of application code on limited local memories present in high-performance multi-core processors. Previous techniques formulate the problem using call graphs, which do not capture the temporal ordering of functions. In addition, they only use a conservative estimate of the interference cost between functions to obtain a mapping. As a result previous techniques are unable to achieve efficient code mapping. Techniques proposed in this paper overcome both these limitations and achieve superior code mapping. Experimental results from executing benchmarks from MiBench onto the Cell processor in the Sony Playstation 3 demonstrate upto 29 % and average 12 % performance improve-ment, at tolerable compile-time overhead. I

CiteSeerX

Runtime protection via dataﬂow flattening

Author: Anckaert Bertrand
Jakubowski Mariusz H
Saw Chit Wei
Venkatesan Ramarathnam
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2009
Field of study

Software running on an open architecture, such as the PC, is vulnerable to inspection and modiﬁcation. Since software may process valuable or sensitive information, many defenses against data analysis and modiﬁcation have been proposed. This paper complements existing work and focuses on hiding data location throughout program execution. To achieve this, we combine three techniques: (i) periodic reordering of the heap, (ii) migrating local variables from the stack to the heap and (iii) pointer scrambling. By essentialy flattening the dataflow graph of the program, the techniques serve to complicate static dataflow analysis and dynamic data tracking. Our methodology can be viewed as a data-oriented analogue of control-flow flattening techniques. Dataflow flattening is useful in practical scenarios like DRM, information-flow protection, and exploit resistance. Our prototype implementation compiles C programs into a binary for which every access to the heap is redirected through a memory management unit. Stack-based variables may be migrated to the heap, while pointer accesses and arithmetic may be scrambled and redirected. We evaluate our approach experimentally on the SPEC CPU2006 benchmark suit

Hierarchical Dynamic Loop Self-Scheduling on Distributed-Memory Systems Using an MPI+MPI Approach

Author: Ciorba Florina M.
Eleliemy Ahmed
Publication venue
Publication date: 01/01/2019
Field of study

Computationally-intensive loops are the primary source of parallelism in scientific applications. Such loops are often irregular and a balanced execution of their loop iterations is critical for achieving high performance. However, several factors may lead to an imbalanced load execution, such as problem characteristics, algorithmic, and systemic variations. Dynamic loop self-scheduling (DLS) techniques are devised to mitigate these factors, and consequently, improve application performance. On distributed-memory systems, DLS techniques can be implemented using a hierarchical master-worker execution model and are, therefore, called hierarchical DLS techniques. These techniques self-schedule loop iterations at two levels of hardware parallelism: across and within compute nodes. Hybrid programming approaches that combine the message passing interface (MPI) with open multi-processing (OpenMP) dominate the implementation of hierarchical DLS techniques. The MPI-3 standard includes the feature of sharing memory regions among MPI processes. This feature introduced the MPI+MPI approach that simplifies the implementation of parallel scientific applications. The present work designs and implements hierarchical DLS techniques by exploiting the MPI+MPI approach. Four well-known DLS techniques are considered in the evaluation proposed herein. The results indicate certain performance advantages of the proposed approach compared to the hybrid MPI+OpenMP approach

arXiv.org e-Print Archive

edoc