Recommended from our members
Percolation scheduling with resource constraints
This paper presents a new approach to resource-constrained compiler extraction of fine-grain parallelism, targeted towards VLIW (Very Large Instruction Word) supercomputers and, in particular, the IBM VLIW processor. The algorithms described integrate resource limitations into Percolation Scheduling, a global parallelization technique, to deal with resource constraints without sacrificing the generality and completeness of Percolation Scheduling in the process. This is in sharp contrast with previous approaches, which either applied only to conditional-free code or drastically limited the parallelization process by imposing relatively local heuristic resource constraints early in the scheduling process.
Computer-aided programming for multiprocessing systems
As both the number of processors and the complexity of the problems to be solved increase, programming multiprocessing systems becomes more difficult and error-prone. This report discusses parallel models of computation and tools for computer-aided programming (CAP). Program development tools are necessary because programmers are not able to develop complex parallel programs efficiently. In particular, a CAP tool named Hypertool is described here. It performs scheduling and inserts communication primitives automatically, so that many errors are eliminated. It also generates performance estimates and other program quality measures to help programmers improve their algorithms and programs. Experiments have shown that up to a 300% performance improvement can be achieved by computer-aided programming.
A review of parallel computing for large-scale remote sensing image mosaicking
Interest in image mosaicking has been spurred by a wide variety of research and management needs. For large-scale applications, however, remote sensing image mosaicking usually requires significant computational capabilities. Several studies have applied parallel computing to improve image mosaicking algorithms and to speed up the computation. The state of the art in this field has not yet been summarized, although such a summary is essential for a better understanding of, and further research on, large-scale image mosaicking parallelism. This paper provides a perspective on the current state of image mosaicking parallelization for large-scale applications. We first introduce the motivation for parallel image mosaicking in large-scale applications and analyze its main difficulties, such as scheduling a huge number of dependent tasks, programming a multi-step procedure, and dealing with frequent I/O operations. We then summarize existing studies of parallel computing in image mosaicking for large-scale applications with respect to problem decomposition and parallel strategy, parallel architecture, task scheduling strategy, and implementation of image mosaicking parallelization. Finally, the key problems and potential future research directions for image mosaicking are addressed.
Polly's Polyhedral Scheduling in the Presence of Reductions
The polyhedral model provides a powerful mathematical abstraction to enable effective optimization of loop nests with respect to a given optimization goal, e.g., exploiting parallelism. Unexploited reduction properties are a frequent reason for polyhedral optimizers to assume parallelism-prohibiting dependences. To our knowledge, no polyhedral loop optimizer available in any production compiler provides support for reductions. In this paper, we show that leveraging the parallelism of reductions can lead to a significant performance increase. We give a precise, dependence-based definition of reductions and discuss ways to extend polyhedral optimization to exploit the associativity and commutativity of reduction computations. We have implemented a reduction-enabled scheduling approach in the Polly polyhedral optimizer and evaluate it on the standard Polybench 3.2 benchmark suite. We were able to detect and model all 52 arithmetic reductions and achieve speedups of up to 2.21x on a quad-core machine by exploiting the multidimensional reduction in the BiCG benchmark. Comment: Presented at the IMPACT15 workshop
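As a rough illustration of why associativity and commutativity matter for the abstract above, the following Python sketch splits a reduction into private partial accumulators that are combined at the end. It is not Polly's implementation: the names `chunked` and `parallel_reduce` are hypothetical, integer inputs are assumed so that reordering cannot change the result, and Python threads are used only to show the structure, not to demonstrate a speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator


def chunked(values, parts):
    """Split values into at most `parts` contiguous chunks (hypothetical helper)."""
    size = max(1, -(-len(values) // parts))  # ceiling division, at least 1
    return [values[i:i + size] for i in range(0, len(values), size)]


def parallel_reduce(values, op=operator.add, identity=0, workers=4):
    """Reduce `values` with an associative, commutative `op`.

    Each worker folds its own chunk into a private accumulator, so the
    chunks carry no dependences on one another. The partial results are
    combined in a final sequential step.
    """
    chunks = chunked(values, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda c: reduce(op, c, identity), chunks))
    return reduce(op, partials, identity)
```

A sequential loop `s = op(s, x)` carries a dependence from every iteration to the next. Regrouping the operands as above is only valid because `op` is associative and commutative, which is the property a reduction-aware scheduler has to establish before relaxing those dependences.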