Code scheduling for optimizing parallelism and data locality

Author: Kandemir M., Kultursay E., Muralidhara S.P., Ozturk O., Yemliha T.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2010

As chip multiprocessors proliferate, programming support for these devices is likely to receive a lot of attention in the near future. Parallelism and data locality are two critical issues in a chip multiprocessor environment. Unfortunately, most of the published work in the literature focuses only on one of these problems, and this can prevent one from achieving the best possible performance. The main goal of this paper is to propose and evaluate a compiler-directed code parallelization scheme, which considers both parallelism and data locality at the same time. Our compiler captures the inherent parallelism and data reuse in the application code being analyzed using a novel representation called the locality-parallelism graph (LPG). Our partitioning/scheduling algorithm assigns the nodes of this graph to the processors in the architecture and schedules them for execution. We implemented this algorithm and evaluated its effectiveness using a set of benchmark codes. The results collected so far indicate that our approach improves overall execution latency significantly. In this paper, we also introduce an ILP (Integer Linear Programming) based formulation of the problem, and implement the schedule obtained by the ILP solver. The results indicate that our approach gets within 4% of the ILP solution. © 2010 Springer-Verlag
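
The abstract does not spell out the partitioning/scheduling algorithm, so the following is only a minimal sketch of the general idea, assuming a toy locality-parallelism graph in which dependence edges constrain ordering and reuse edges carry a data-sharing weight. The function name `schedule_lpg`, the greedy scoring rule, and the input format are all illustrative assumptions, not the authors' method.

```python
from collections import defaultdict

def schedule_lpg(nodes, deps, reuse, num_procs):
    """Greedy list scheduling over a toy locality-parallelism graph.

    nodes: dict mapping node -> execution cost (hypothetical units)
    deps:  set of (u, v) pairs meaning v must run after u; assumed acyclic
    reuse: dict mapping (u, v) -> amount of data shared by u and v
    """
    preds = defaultdict(set)
    for u, v in deps:
        preds[v].add(u)

    proc_free = [0] * num_procs   # time at which each processor becomes idle
    placed = {}                   # node -> (processor, start, finish)
    on_proc = defaultdict(list)   # processor -> nodes assigned to it so far
    remaining = set(nodes)

    while remaining:
        # A node is ready once all of its predecessors have been scheduled.
        ready = sorted(n for n in remaining
                       if all(p in placed for p in preds[n]))
        node = ready[0]
        earliest = max((placed[p][2] for p in preds[node]), default=0)

        def score(p):
            start = max(proc_free[p], earliest)
            shared = sum(reuse.get((node, m), 0) + reuse.get((m, node), 0)
                         for m in on_proc[p])
            # Prefer an early start; reward reuse with co-located nodes.
            return (start - shared, p)

        best = min(range(num_procs), key=score)
        start = max(proc_free[best], earliest)
        finish = start + nodes[node]
        placed[node] = (best, start, finish)
        proc_free[best] = finish
        on_proc[best].append(node)
        remaining.remove(node)
    return placed
```

With two independent chains whose members share data, this heuristic keeps each chain on one processor while the chains run in parallel. An exact formulation, such as the ILP mentioned in the abstract, would search over all assignments instead of committing greedily.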

Crossref

Bilkent University Institutional Repository

Performance and Memory Space Optimizations for Embedded Systems

Author: Yemliha Taylan
Publication venue: SURFACE at Syracuse University
Publication date: 01/01/2011

Embedded systems share three common requirements: real-time performance, low power consumption, and low price (limited hardware). Embedded computers use chip multiprocessors (CMPs) to meet these expectations. However, a major problem is the lack of efficient software support for CMPs; in particular, automated code parallelizers are needed. The aim of this study is to explore various ways to increase performance and to reduce resource usage and energy consumption in embedded systems. We use code restructuring, loop scheduling, data transformation, code and data placement, and scratch-pad memory (SPM) management as our tools in different embedded system scenarios. The majority of our work is focused on loop scheduling. The main contributions of our work are:

We propose a memory saving strategy that exploits the value locality in array data by storing arrays in a compressed form. Based on the compressed forms of the input arrays, our approach automatically determines the compressed forms of the output arrays and also automatically restructures the code.

We propose and evaluate a compiler-directed code scheduling scheme, which considers both parallelism and data locality. It analyzes the code using a locality-parallelism graph representation, and assigns the nodes of this graph to processors. We also introduce an Integer Linear Programming based formulation of the scheduling problem.

We propose a compiler-based SPM-conscious loop scheduling strategy for array/loop based embedded applications. The method distributes loop iterations across parallel processors in an SPM-conscious manner. The compiler identifies potential SPM hits and misses, and distributes loop iterations such that the processors have close execution times.

We present an SPM management technique using Markov chain based data access.

We propose a compiler-directed integrated code and data placement scheme for 2-D mesh based CMP architectures. Using a Code-Data Affinity Graph (CDAG) to represent the relationship between loop iterations and array data, it assigns the sets of loop iterations to processing cores and sets of data blocks to on-chip memories.

We present a memory bank aware dynamic loop scheduling scheme for array-intensive applications. The goal is to minimize the number of memory banks needed for executing the group of loop iterations.
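
To illustrate the SPM-conscious loop scheduling idea described above (distributing iterations so that processors finish at close times), here is a small sketch. It assumes the compiler has already predicted per-iteration SPM hit and miss counts. The function name `distribute_iterations`, the latency values `hit_cost` and `miss_cost`, and the longest-first balancing rule are hypothetical choices for illustration and are not taken from the thesis.

```python
import heapq

def distribute_iterations(hit_counts, miss_counts, num_procs,
                          hit_cost=1, miss_cost=10):
    """Assign loop iterations to processors so estimated loads are close.

    hit_counts[i] / miss_counts[i]: predicted SPM hits / misses of iteration i.
    hit_cost and miss_cost are assumed latencies in cycles (illustrative only).
    """
    costs = [h * hit_cost + m * miss_cost
             for h, m in zip(hit_counts, miss_counts)]

    # Longest-processing-time-first: handle expensive iterations first and
    # always give the next one to the currently least-loaded processor.
    heap = [(0, p) for p in range(num_procs)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_procs)]
    for i in sorted(range(len(costs)), key=lambda i: -costs[i]):
        load, p = heapq.heappop(heap)
        assignment[p].append(i)
        heapq.heappush(heap, (load + costs[i], p))

    loads = [0] * num_procs
    for load, p in heap:
        loads[p] = load
    return assignment, loads
```

For four iterations where only the last two miss in the SPM, a plain block split would give one processor all the misses, whereas this sketch pairs one cheap and one expensive iteration per processor.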

Syracuse University Research Facility and Collaborative Environment

Nested-Loops Tiling for Parallelization and Locality Optimization

Author: Hamzei Mohammad, Parsa Saeed
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 06/07/2017

Data locality improvement and nested-loop parallelization are two complementary and competing approaches for optimizing loop nests, which account for a large portion of computation time in scientific and engineering programs. While there are effective methods for each of these, prior studies have paid less attention to addressing the two simultaneously. This paper proposes a unified approach that integrates these two techniques to obtain an appropriate locality-conscious loop transformation that partitions the loop iteration space into outer parallel tiled loops. The approach is based on the polyhedral model and derives a multidimensional affine schedule as a transformation that yields the largest possible groups of tilable loops with maximum coarse-grain parallelism. Furthermore, tiles are scheduled on processor cores to maximize data reuse: tiles that share a high volume of data are scheduled consecutively on the same core, or at around the same time on different cores that share a cache.
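
As a rough illustration of what tiling a loop nest means, the sketch below tiles a matrix multiplication by hand. It is not the polyhedral transformation the paper describes: the tile size is a fixed, assumed parameter and the loop structure is written manually, whereas the paper derives the transformation automatically. The function name `matmul_tiled` is hypothetical.

```python
def matmul_tiled(a, b, tile=2):
    """Tiled matrix multiply; a is n x m and b is m x p (lists of lists)."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    # The outer loops enumerate tiles. Tiles with different (ii, jj) write
    # disjoint blocks of c, so they could run in parallel on different cores.
    for ii in range(0, n, tile):
        for jj in range(0, p, tile):
            for kk in range(0, m, tile):
                # The inner loops stay inside one tile and reuse its data.
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, p)):
                        acc = c[i][j]
                        for k in range(kk, min(kk + tile, m)):
                            acc += a[i][k] * b[k][j]
                        c[i][j] = acc
    return c
```

The result is identical to the untiled triple loop; only the traversal order changes, which is what lets a scheduler place tiles that share data on the same core or on cores with a shared cache.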

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Reorganització del runtime Nanos++ (Reorganization of the Nanos++ runtime)

Author: Bosch Pons Jaume
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/06/2015

This project proposes a reorganization of the task-dependency management system in Nanos++ (the runtime of the OmpSs programming model). The goal is to reduce access contention on the dependency structures, thereby increasing parallelism and making better use of computing resources.

UPCommons. Portal del coneixement obert de la UPC

Code Scheduling for Optimizing Parallelism and Data Locality

As embedded chip multiprocessors proliferate, programming support for these devices is likely to receive a lot of attention in the near future. Parallelism and data locality are two critical issues in a chip multiprocessor environment. These capture the usage of available computation resources and available memory hierarchy, respectively. In order to achieve good performance in a chip multiprocessor based embedded system, an optimizing compiler has to exploit both parallelism and locality. Unfortunately, most of the published work in the literature focuses only on one of these problems, and this can prevent one from achieving the best possible performance. The main goal of this paper is to propose and evaluate a compiler-directed code parallelization scheme, which considers both parallelism and data locality at the same time. Our compiler captures the inherent parallelism and data reuse in the application code being analyzed using a novel representation called the locality-parallelism graph (LPG). Our partitioning/scheduling algorithm assigns the nodes of this graph to the processors in the architecture and schedules them for execution. We implemented this algorithm and evaluated its effectiveness using a set of benchmark codes. The results collected so far indicate that our approach improves overall execution latency significantly. In this paper, we also introduce an ILP (Integer Linear Programming) based formulation of the problem, and implement the schedule obtained by the ILP solver. The results indicate that our approach gets within 4% of the ILP solution.

CiteSeerX