3,301 research outputs found
Federal Corporate Law, Federalism, and the Federal Courts
To parallelize a sequential source code, a parallelization strategy must be defined that transforms the sequential source code into an equivalent parallel version. Since parallelizing compilers can sometimes transform sequential loops and other well-structured codes into parallel ones automatically, we are interested in finding a solution to parallelize semi-automatically codes that compilers are not able to parallelize automatically, mostly because of weakness of classical data and control dependence analysis, in order to simplify the process of transforming the codes for programmers.Invasive Interactive Parallelization (IIP) hypothesizes that by using anintelligent system that guides the user through an interactive process one can boost parallelization in the above direction. The intelligent system's guidance relies on a classical code analysis and pre-defined parallelizing transformation sequences. To support its main hypothesis, IIP suggests to encode parallelizing transformation sequences in terms of IIP parallelization strategies that dictate default ways to parallelize various code patterns by using facts which have been obtained both from classical source code analysis and directly from the user.In this project, we investigate how automated reasoning can supportthe IIP method in order to parallelize a sequential code with an acceptable performance but faster than manual parallelization. We have looked at two special problem areas: Divide and conquer algorithms and loops in the source codes. Our focus is on parallelizing four sequential legacy C programs such as: Quick sort, Merge sort, Jacobi method and Matrix multipliation and summation for both OpenMP and MPI environment by developing an interactive parallelizing assistance tool that provides users with the assistanceneeded for parallelizing a sequential source code
Automatic parallelization of irregular and pointer-based computations: perspectives from logic and constraint programming
Irregular computations pose some of the most interesting and
challenging problems in automatic parallelization. Irregularity appears in certain kinds of numerical problems and is pervasive in symbolic applications. Such computations often use dynamic data structures which make heavy use of pointers. This complicates all the steps of a parallelizing compiler, from independence detection to task partitioning and placement. In the past decade there has been significant progress in the development of parallelizing compilers for logic programming and, more recently, constraint programming. The typical applications of these paradigms frequently involve irregular computations, which arguably makes the techniques used in these compilers potentially interesting. In this paper we
introduce in a tutorial way some of the problems faced by parallelizing compilers for logic and constraint programs. These include the need for inter-procedural pointer aliasing analysis for independence detection and having to manage speculative and irregular computations through task
granularity control and dynamic task allocation. We also provide pointers to some of the progress made in these áreas. In the associated talk we demónstrate representatives of several generations of these parallelizing compilers
Recommended from our members
Applying an abstract data structure description approach to parallelizing scientific pointer programs
Even though impressive progress has been made in the area of parallelizing scientific programs with arrays, the application of similar techniques to programs with pointer data structures has remained difficult. Unlike arrays which have a small number of well-defined properties that can be utilized by a parallelizing compiler, pointer data structures are used to implement a wide variety of structures that exhibit a much more diverse set of properties. The complexity and diversity of such properties means that, in general, scientific programs with pointer data structures cannot be effectively analyzed by an optimizing and parallelizing compiler.In order to provide a system in which the compiler can fully utilize the properties of different types of pointer data structures, we have developed a mechanism for the Abstract Description of Data Structures (ADDS). With our approach, the programmer can explicitly describe important properties such as dimensionality of the pointer data structure, independence of dimensions, and direction of traversal. These abstract descriptions of pointer data structures are then used by the compiler to guide analysis, optimization, and parallelization.In this paper we summarize the ADDS approach through the use of numerous examples of data structures used in scientific computations, we illustrate how such declarations are natural and non-tedious to specify, and we show how the ADDS declarations can be used to improve compile-time analysis. In order to demonstrate the viability of our approach, we show how such techniques can be used to parallelize an important class of scientific codes which naturally use recursive pointer data structures. In particular, we use our approach to develop the parallelization of an N-body simulation that is based on a relatively complicated pointer data structure, and we report the speedup results for a Sequent multiprocessor
A compiler extension for parallelizing arrays automatically on the cell heterogeneous processor
This paper describes the approaches taken to extend an array
programming language compiler using a Virtual SIMD Machine (VSM)
model for parallelizing array operations on Cell Broadband Engine heterogeneous
machine. This development is part of ongoing work at the
University of Glasgow for developing array compilers that are beneficial
for applications in many areas such as graphics, multimedia, image processing
and scientific computation. Our extended compiler, which is built
upon the VSM interface, eases the parallelization processes by allowing
automatic parallelisation without the need for any annotations or process
directives. The preliminary results demonstrate significant improvement
especially on data-intensive applications
PENCIL: Towards a Platform-Neutral Compute Intermediate Language for DSLs
We motivate the design and implementation of a platform-neutral compute
intermediate language (PENCIL) for productive and performance-portable
accelerator programming
Parallelization of a Six Degree of Freedom Entry Vehicle Trajectory Simulation Using OpenMP and OpenACC
The art and science of writing parallelized software, using methods such as Open Multi-Processing (OpenMP) and Open Accelerators (OpenACC), is dominated by computer scientists. Engineers and non-computer scientists looking to apply these techniques to their project applications face a steep learning curve, especially when looking to adapt their original single threaded software to run multi-threaded on graphics processing units (GPUs). There are significant changes in mindset that must occur; such as how to manage memory, the organization of instructions, and the use of if statements (also known as branching). The purpose of this work is twofold: 1) to demonstrate the applicability of parallelized coding methodologies, OpenMP and OpenACC, to tasks outside of the typical large scale matrix mathematics; and 2) to discuss, from an engineers perspective, the lessons learned from parallelizing software using these computer science techniques. This work applies OpenMP, on both multi-core central processing units (CPUs) and Intel Xeon Phi 7210, and OpenACC on GPUs. These parallelization techniques are used to tackle the simulation of thousands of entry vehicle trajectories through the integration of six degree of freedom (DoF) equations of motion (EoM). The forces and moments acting on the entry vehicle, and used by the EoM, are estimated using multiple models of varying levels of complexity. Several benchmark comparisons are made on the execution of six DoF trajectory simulation: single thread Intel Xeon E5-2670 CPU, multi-thread CPU using OpenMP, multi-thread Xeon Phi 7210 using OpenMP, and multi-thread NVIDIA Tesla K40 GPU using OpenACC. These benchmarks are run on the Pleiades Supercomputer Cluster at the National Aeronautics and Space Administration (NASA) Ames Research Center (ARC), and a Xeon Phi 7210 node at NASA Langley Research Center (LaRC)
Recommended from our members
Percolation scheduling for non-VLIW machines
Percolation Scheduling, a technique for compile-time code parallelization, has proven very successful for exploiting fine-grain irregular parallelism in ordinary programs. Currently, this technology is targeted only to VLIW (Very Long Instruction Word) machines, which have the advantages of 'free' synchronization and communication. Shared memory multi-processors can simulate the execution characteristics of VLIW machines with the use of static barriers. Preliminary results show that Percolation Scheduling can be used with good results on this type of architecture by increasing the granularity from operation level to source statement level, removing any redundant synchronization, and providing an efficient implementation of multi-way jumps
- …