Search CORE

33,032 research outputs found

One-pass transformations of attributed program trees

Author: Alblas H.
Publication venue
Publication date: 01/01/1987
Field of study

The classical attribute grammar framework can be extended by allowing the specification of tree transformation rules. A tree transformation rule consists of an input template, an output template, enabling conditions which are predicates on attribute instances of the input template, and re-evaluation rules which define the values of attribute instances of the output template. A tree transformation may invalidate attribute instances which are needed for additional transformations.\ud \ud In this paper we investigate whether consecutive tree transformations and attribute re-evaluations are safely possible during a single pass over the derivation tree. This check is made at compiler generation time rather than at compilation time.\ud \ud A graph theoretic characterization of attribute dependencies is given, showing in which cases the recomputation of attribute instances can be done in parallel with tree transformations

University of Twente Research Information

Source-to-Source Transformations for Parallel Optimizations in STAPL

Author: Kelley Brian M
Publication venue
Publication date: 10/06/2019
Field of study

Programs that use the STAPL C++ parallel programming library express their control and data flow explicitly through the use of skeletons. Skeletons can be simple parallel operations like map and reduce, or the result of composing several skeletons. Composition is implemented by tracking the dependencies among individual data elements in the STAPL runtime system. However, the operations and dependencies within a compose skeleton can be determined at compile time from the C++ abstract syntax tree. This enables the use of source-to-source transformations to fuse the composed skeletons. Transformations can also be used to replace skeletons entirely with equivalent code. Both transformations greatly reduce STAPL runtime overhead, and zip fusion also allows a compiler to optimize the work functions as a single unit. We present a Clang compiler plugin and wrapper that automatically perform these transformations, and demonstrate its ability to improve performance

Loo.py: transformation-based code generation for GPUs and CPUs

Author: Asanovic K.
Ellson J.
Rubinsteyn A.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 29/05/2014
Field of study

Today's highly heterogeneous computing landscape places a burden on programmers wanting to achieve high performance on a reasonably broad cross-section of machines. To do so, computations need to be expressed in many different but mathematically equivalent ways, with, in the worst case, one variant per target machine. Loo.py, a programming system embedded in Python, meets this challenge by defining a data model for array-style computations and a library of transformations that operate on this model. Offering transformations such as loop tiling, vectorization, storage management, unrolling, instruction-level parallelism, change of data layout, and many more, it provides a convenient way to capture, parametrize, and re-unify the growth among code variants. Optional, deep integration with numpy and PyOpenCL provides a convenient computing environment where the transition from prototype to high-performance implementation can occur in a gradual, machine-assisted form

arXiv.org e-Print Archive

Crossref

Improving 6D Pose Estimation of Objects in Clutter via Physics-aware Monte Carlo Tree Search

Author: Bekris Kostas E.
Boularias Abdeslam
Mitash Chaitanya
Publication venue
Publication date: 23/10/2017
Field of study

This work proposes a process for efficiently searching over combinations of individual object 6D pose hypotheses in cluttered scenes, especially in cases involving occlusions and objects resting on each other. The initial set of candidate object poses is generated from state-of-the-art object detection and global point cloud registration techniques. The best-scored pose per object by using these techniques may not be accurate due to overlaps and occlusions. Nevertheless, experimental indications provided in this work show that object poses with lower ranks may be closer to the real poses than ones with high ranks according to registration techniques. This motivates a global optimization process for improving these poses by taking into account scene-level physical interactions between objects. It also implies that the Cartesian product of candidate poses for interacting objects must be searched so as to identify the best scene-level hypothesis. To perform the search efficiently, the candidate poses for each object are clustered so as to reduce their number but still keep a sufficient diversity. Then, searching over the combinations of candidate object poses is performed through a Monte Carlo Tree Search (MCTS) process that uses the similarity between the observed depth image of the scene and a rendering of the scene given the hypothesized pose as a score that guides the search procedure. MCTS handles in a principled way the tradeoff between fine-tuning the most promising poses and exploring new ones, by using the Upper Confidence Bound (UCB) technique. Experimental results indicate that this process is able to quickly identify in cluttered scenes physically-consistent object poses that are significantly closer to ground truth compared to poses found by point cloud registration methods.Comment: 8 pages, 4 figure

arXiv.org e-Print Archive

Crossref

Recommended from our members

Incremental tree height reduction for code compaction

Author: Nicolau Alexandru
Potasman Roni
Publication venue: eScholarship, University of California
Publication date: 04/03/1990
Field of study

This paper introduces a new Tree Height Reduction (THR) technique for code compaction. THR, which is well known parallelizing method, has two interesting properties: while known compilation techniques can get constant factor of speed-up, THR has speed-up of O(n/logn). Furthermore, THR is able to compact code which seems, at first, uncompactable (due to data dependencies). The algorithm presented is incremental, local (so in each step, it is checking the the current operation and its predecessor rather than the whole expression tree to see whether compaction is possible) and applicable beyond basic block limits. THR is applied after all other optimization techniques, none of which change the semantics of the code, have been applied. THR is changing the semantics of the code, thus preserving, of course, the correctness of the intermediate and final values. Also, the reduction is controlled according to the resources available - so in case the compaction is feasible but there are not enough resources - it moves to the next operation. The algorithm produces compacted code suited for any tightly coupled multiprocessors (e.g. Very Long Instruction Word {or VLIW) machines). To our knowledge, it is the first local and incremental THR algorithm working across basic blocks boundaries published so far for code compaction

eScholarship - University of California