Search CORE

516 research outputs found

Automatic Sequential to Parallel Code Conversion

Author: . M. N. Babu
. Sudhakar Sah
. Vinay Vaidya
Athavale Aditi
Pawar Prasad
Rajguru Chaitanya
Ranadive Priti
Publication venue: GSTF Journal on Computing (JoC)
Publication date: 16/09/2014
Field of study

The way software programs are being written has been redefined since the introduction of multicore processors. Software developers have started writing parallel programs that are robust and scalable. This would ensure use of processor power being made available in the form of multiple cores. Though this trend is increasing, there are legacy applications that have been developed over the past few decades. Most of these applications are inherently sequential making no use of multithreading or parallel programming. If such applications are ported to execute on the multicore hardware as they are then optimal usage of all cores is not guaranteed. Such applications would ideally utilize only one core and the other cores would remain idle, unless the operating system supports some parallelism while scheduling. Hence there is a need to convert such legacy sequential codes to their parallel versions so that multicore hardware is exploited to the fullest. In this paper we present a tool that we have developed to automatically convert a sequential C code to parallel code. This Sequential to Parallel (S2P) tool is still in the development phase. We also discuss other parallelization tools available today, compare such tools with S2P tool and present our performance analysis results on different kind of multicore hardware

GSTF Digital Library (GSTF-DL): Open Journal Systems (Global Science and Technology Forum)

Parallelizing irregular and pointer-based computations automatically: perspectives from logic and constraint programming

Author: Hermenegildo Manuel V.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2000
Field of study

Irregular computations pose sorne of the most interesting and challenging problems in automatic parallelization. Irregularity appears in certain kinds of numerical problems and is pervasive in symbolic applications. Such computations often use dynamic data structures, which make heavy use of pointers. This complicates all the steps of a parallelizing compiler, from independence detection to task partitioning and placement. Starting in the mid 80s there has been significant progress in the development of parallelizing compilers for logic programming (and more recently, constraint programming) resulting in quite capable parallelizers. The typical applications of these paradigms frequently involve irregular computations, and make heavy use of dynamic data structures with pointers, since logical variables represent in practice a well-behaved form of pointers. This arguably makes the techniques used in these compilers potentially interesting. In this paper, we introduce in a tutoríal way, sorne of the problems faced by parallelizing compilers for logic and constraint programs and provide pointers to sorne of the significant progress made in the area. In particular, this work has resulted in a series of achievements in the areas of inter-procedural pointer aliasing analysis for independence detection, cost models and cost analysis, cactus-stack memory management, techniques for managing speculative and irregular computations through task granularity control and dynamic task allocation such as work-stealing schedulers), etc

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

Recommended from our members

An efficient global resource constrained technique for exploiting instruction level parallelism

Author: Nicolau Alexandru
Novack Steven
Publication venue: eScholarship, University of California
Publication date: 21/01/1992
Field of study

A new Global Resource-constrained Percolation (GRiP) scheduling technique is presented for exploiting instruction level parallelism. Other techniques that have been proposed either have been prohibitively expensive in terms of computation or have limited parallelism. The GRiP technique has been implemented and simulation results are presented

eScholarship - University of California

A Survey on Compiler Autotuning using Machine Learning

Author: Ashouri Amir H.
Cavazos John
Killian William
Palermo Gianluca
Silvano Cristina
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 03/09/2018
Field of study

Since the mid-1990s, researchers have been trying to use machine-learning based approaches to solve a number of different compiler optimization problems. These techniques primarily enhance the quality of the obtained results and, more importantly, make it feasible to tackle two main compiler optimization problems: optimization selection (choosing which optimizations to apply) and phase-ordering (choosing the order of applying optimizations). The compiler optimization space continues to grow due to the advancement of applications, increasing number of compiler optimizations, and new target architectures. Generic optimization passes in compilers cannot fully leverage newly introduced optimizations and, therefore, cannot keep up with the pace of increasing options. This survey summarizes and classifies the recent advances in using machine learning for the compiler optimization field, particularly on the two major problems of (1) selecting the best optimizations and (2) the phase-ordering of optimizations. The survey highlights the approaches taken so far, the obtained results, the fine-grain classification among different approaches and finally, the influential papers of the field.Comment: version 5.0 (updated on September 2018)- Preprint Version For our Accepted Journal @ ACM CSUR 2018 (42 pages) - This survey will be updated quarterly here (Send me your new published papers to be added in the subsequent version) History: Received November 2016; Revised August 2017; Revised February 2018; Accepted March 2018

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Politecnico di Milano

Efficient Generation of Parallel Spin-images Using Dynamic Loop Scheduling

Author: Ciorba Florina M.
Eleliemy Ahmed
Mohammed Ali
Publication venue
Publication date: 01/01/2017
Field of study

High performance computing (HPC) systems underwent a significant increase in their processing capabilities. Modern HPC systems combine large numbers of homogeneous and heterogeneous computing resources. Scalability is, therefore, an essential aspect of scientific applications to efficiently exploit the massive parallelism of modern HPC systems. This work introduces an efficient version of the parallel spin-image algorithm (PSIA), called EPSIA. The PSIA is a parallel version of the spin-image algorithm (SIA). The (P)SIA is used in various domains, such as 3D object recognition, categorization, and 3D face recognition. EPSIA refers to the extended version of the PSIA that integrates various well-known dynamic loop scheduling (DLS) techniques. The present work: (1) Proposes EPSIA, a novel flexible version of PSIA; (2) Showcases the benefits of applying DLS techniques for optimizing the performance of the PSIA; (3) Assesses the performance of the proposed EPSIA by conducting several scalability experiments. The performance results are promising and show that using well-known DLS techniques, the performance of the EPSIA outperforms the performance of the PSIA by a factor of 1.2 and 2 for homogeneous and heterogeneous computing resources, respectively

arXiv.org e-Print Archive

Crossref

edoc

Python Programmers Have GPUs Too: Automatic Python Loop Parallelization with Staged Dependence Analysis

Author: Jacob Dejice
Trinder Phil
Singer Jeremy
Publication venue
Publication date: 20/10/2019
Field of study

Python is a popular language for end-user software development in many application domains. End-users want to harness parallel compute resources effectively, by exploiting commodity manycore technology including GPUs. However, existing approaches to parallelism in Python are esoteric, and generally seem too complex for the typical end-user developer. We argue that implicit, or automatic, parallelization is the best way to deliver the benefits of manycore to end-users, since it avoids domain-specific languages, specialist libraries, complex annotations or restrictive language subsets. Auto-parallelization fits the Python philosophy, provides effective performance, and is convenient for non-expert developers. Despite being a dynamic language, we show that Python is a suitable target for auto-parallelization. In an empirical study of 3000+ open-source Python notebooks, we demonstrate that typical loop behaviour ‘in the wild’ is amenable to auto-parallelization. We show that staging the dependence analysis is an effective way to maximize performance. We apply classical dependence analysis techniques, then leverage the Python runtime’s rich introspection capabilities to resolve additional loop bounds and variable types in a just-in-time manner. The parallel loop nest code is then converted to CUDA kernels for GPU execution. We achieve orders of magnitude speedup over baseline interpreted execution and some speedup (up to 50x, although not consistently) over CPU JIT-compiled execution, across 12 loop-intensive standard benchmarks

OPUS Augsburg

Enlighten

Python Programmers Have GPUs Too: Automatic Python Loop Parallelization with Staged Dependence Analysis

Author: Abadi Martín
Beazley David
Pedregosa Fabian
Pope-Carter Finnegan
Rubinsteyn Alex
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 20/10/2019
Field of study

Crossref

Enlighten

CRAUL: Compiler and Run-Time Integration for Adaptation under Load

Author
Publication venue: 'Hindawi Limited'
Publication date: 01/01/1999
Field of study

Crossref

Evaluation of OpenMP for the Cyclops multithreaded architecture

Author: Almasi George
Ayguadé Parra Eduard
Cascaval Calin
Castaños José G.
Labarta Mancho Jesús José
Martorell Bofill Xavier
Martínez Francisco
Moreira José E.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/06/2003
Field of study

Multithreaded architectures have the potential of tolerating large memory and functional unit latencies and increase resource utilization. The Blue Gene/Cyclops architecture, being developed at the IBM T. J. Watson Research Center, is one such systems that offers massive intra-chip parallelism. Although the BG/C architecture was initially designed to execute specific applications, we believe that it can be effectively used on a broad range of parallel numerical applications. Programming such applications for this unconventional design requires a significant porting effort when using the basic built-in mechanisms for thread management and synchronization. In this paper, we describe the implementation of an OpenMP environment for parallelizing applications, currently under development at the CEPBA-IBM Research Institute, targeting BG/C. The environment is evaluated with a set of simple numerical kernels and a subset of the NAS OpenMP benchmarks. We identify issues that were not initially considered in the design of the BG/C architecture to support a programming model such as OpenMP. We also evaluate features currently offered by the BG/C architecture that should be considered in the implementation of an efficient OpenMP layer for massive intra-chip parallel architectures.Peer ReviewedPostprint (author's final draft

UPCommons. Portal del coneixement obert de la UPC