582 research outputs found
Recommended from our members
Fine grain software pipelining of non-vectorizable nested loops
This paper presents a new technique to parallelize nested loops at the statement level. It transforms sequential nested loops, either vectorizable or not, into parallel ones. Previously, the wavefront method was used to parallelize non-vectorizable nested loops. However, in order to reduce the complexity of parallelization, the wavefront method regards an iteration as an unbreakable scheduling unit and draws parallelism through iteration overlapping. Our technique takes a statement rather than an iteration as the scheduling unit and exploits parallelism by overlapping the statements in all dimensions. In this paper, we show how this finer grain parallelization can be achieved with reasonable computational complexity, and the effectiveness of the resulting method in exploiting parallelism
Supercomputer optimizations for stochastic optimal control applications
Supercomputer optimizations for a computational method of solving stochastic, multibody, dynamic programming problems are presented. The computational method is valid for a general class of optimal control problems that are nonlinear, multibody dynamical systems, perturbed by general Markov noise in continuous time, i.e., nonsmooth Gaussian as well as jump Poisson random white noise. Optimization techniques for vector multiprocessors or vectorizing supercomputers include advanced data structures, loop restructuring, loop collapsing, blocking, and compiler directives. These advanced computing techniques and superconducting hardware help alleviate Bellman's curse of dimensionality in dynamic programming computations, by permitting the solution of large multibody problems. Possible applications include lumped flight dynamics models for uncertain environments, such as large scale and background random aerospace fluctuations
Parallelizing Julia with a Non-Invasive DSL
Computational scientists often prototype software using productivity
languages that offer high-level programming abstractions. When higher
performance is needed, they are obliged to rewrite their code in a
lower-level efficiency language. Different solutions have been
proposed to address this trade-off between productivity and
efficiency. One promising approach is to create embedded
domain-specific languages that sacrifice generality for productivity
and performance, but practical experience with DSLs points to some
road blocks preventing widespread adoption. This paper proposes a
non-invasive domain-specific language that makes as few visible
changes to the host programming model as possible. We present ParallelAccelerator,
a library and compiler for high-level, high-performance scientific
computing in Julia. ParallelAccelerator\u27s programming model is aligned with existing
Julia programming idioms. Our compiler exposes the implicit
parallelism in high-level array-style programs and compiles them to
fast, parallel native code. Programs can also run in "library-only"
mode, letting users benefit from the full Julia environment and
libraries. Our results show encouraging performance improvements with very few changes to source code required. In particular, few to no additional type annotations are necessary
Automatic Parallelization of Database Queries
Although automatic parallelization of conventional language programs is now widely accepted, relatively little emphasis has been placed on automatic parallelization of database query programs (sometimes referred to as “multiple queries” ). In this paper, we discuss the unique problems associated with automatic parallelization of database programs. From this discussion, we derive a complete approach to automatic parallelization of database programs. Beside integrating a number of existing techniques, our approach relies heavily on several new concepts, including the concepts of “algorithm-level” analysis and hybrid static/dynamic scheduling
Recommended from our members
Percolation scheduling for non-VLIW machines
Percolation Scheduling, a technique for compile-time code parallelization, has proven very successful for exploiting fine-grain irregular parallelism in ordinary programs. Currently, this technology is targeted only to VLIW (Very Long Instruction Word) machines, which have the advantages of 'free' synchronization and communication. Shared memory multi-processors can simulate the execution characteristics of VLIW machines with the use of static barriers. Preliminary results show that Percolation Scheduling can be used with good results on this type of architecture by increasing the granularity from operation level to source statement level, removing any redundant synchronization, and providing an efficient implementation of multi-way jumps
A theoretical foundation for program transformations to reduce cache thrashing due to true data sharing
AbstractCache thrashing due to true data sharing can degrade the performance of parallel programs significantly. Our previous work showed that parallel task alignment via program transformations can be quite effective for the reduction of such cache thrashing. In this paper, we present a theoretical foundation for such program transformations. Based on linear algebra and the theory of numbers, our work analyzes the data dependences among the tasks created by a fork-join parallel program and determines at compile time how these tasks should be assigned to processors in order to reduce cache thrashing due to true data sharing. Our analysis and program transformations can be easily performed by compilers for parallel computers
- …