33,032 research outputs found
One-pass transformations of attributed program trees
The classical attribute grammar framework can be extended by allowing the specification of tree transformation rules. A tree transformation rule consists of an input template, an output template, enabling conditions which are predicates on attribute instances of the input template, and re-evaluation rules which define the values of attribute instances of the output template. A tree transformation may invalidate attribute instances which are needed for additional transformations.\ud
\ud
In this paper we investigate whether consecutive tree transformations and attribute re-evaluations are safely possible during a single pass over the derivation tree. This check is made at compiler generation time rather than at compilation time.\ud
\ud
A graph theoretic characterization of attribute dependencies is given, showing in which cases the recomputation of attribute instances can be done in parallel with tree transformations
Source-to-Source Transformations for Parallel Optimizations in STAPL
Programs that use the STAPL C++ parallel programming library express their control and data flow explicitly through the use of skeletons. Skeletons can be simple parallel operations like map and reduce, or the result of composing several skeletons. Composition is implemented by tracking the dependencies among individual data elements in the STAPL runtime system. However, the operations and dependencies within a compose skeleton can be determined at compile time from the C++ abstract syntax tree. This enables the use of source-to-source transformations to fuse the composed skeletons. Transformations can also be used to replace skeletons entirely with equivalent code. Both transformations greatly reduce STAPL runtime overhead, and zip fusion also allows a compiler to optimize the work functions as a single unit. We present a Clang compiler plugin and wrapper that automatically perform these transformations, and demonstrate its ability to improve performance
Loo.py: transformation-based code generation for GPUs and CPUs
Today's highly heterogeneous computing landscape places a burden on
programmers wanting to achieve high performance on a reasonably broad
cross-section of machines. To do so, computations need to be expressed in many
different but mathematically equivalent ways, with, in the worst case, one
variant per target machine.
Loo.py, a programming system embedded in Python, meets this challenge by
defining a data model for array-style computations and a library of
transformations that operate on this model. Offering transformations such as
loop tiling, vectorization, storage management, unrolling, instruction-level
parallelism, change of data layout, and many more, it provides a convenient way
to capture, parametrize, and re-unify the growth among code variants. Optional,
deep integration with numpy and PyOpenCL provides a convenient computing
environment where the transition from prototype to high-performance
implementation can occur in a gradual, machine-assisted form
Improving 6D Pose Estimation of Objects in Clutter via Physics-aware Monte Carlo Tree Search
This work proposes a process for efficiently searching over combinations of
individual object 6D pose hypotheses in cluttered scenes, especially in cases
involving occlusions and objects resting on each other. The initial set of
candidate object poses is generated from state-of-the-art object detection and
global point cloud registration techniques. The best-scored pose per object by
using these techniques may not be accurate due to overlaps and occlusions.
Nevertheless, experimental indications provided in this work show that object
poses with lower ranks may be closer to the real poses than ones with high
ranks according to registration techniques. This motivates a global
optimization process for improving these poses by taking into account
scene-level physical interactions between objects. It also implies that the
Cartesian product of candidate poses for interacting objects must be searched
so as to identify the best scene-level hypothesis. To perform the search
efficiently, the candidate poses for each object are clustered so as to reduce
their number but still keep a sufficient diversity. Then, searching over the
combinations of candidate object poses is performed through a Monte Carlo Tree
Search (MCTS) process that uses the similarity between the observed depth image
of the scene and a rendering of the scene given the hypothesized pose as a
score that guides the search procedure. MCTS handles in a principled way the
tradeoff between fine-tuning the most promising poses and exploring new ones,
by using the Upper Confidence Bound (UCB) technique. Experimental results
indicate that this process is able to quickly identify in cluttered scenes
physically-consistent object poses that are significantly closer to ground
truth compared to poses found by point cloud registration methods.Comment: 8 pages, 4 figure
Recommended from our members
Incremental tree height reduction for code compaction
This paper introduces a new Tree Height Reduction (THR) technique for code compaction. THR, which is well known parallelizing method, has two interesting properties: while known compilation techniques can get constant factor of speed-up, THR has speed-up of O(n/logn). Furthermore, THR is able to compact code which seems, at first, uncompactable (due to data dependencies). The algorithm presented is incremental, local (so in each step, it is checking the the current operation and its predecessor rather than the whole expression tree to see whether compaction is possible) and applicable beyond basic block limits. THR is applied after all other optimization techniques, none of which change the semantics of the code, have been applied. THR is changing the semantics of the code, thus preserving, of course, the correctness of the intermediate and final values. Also, the reduction is controlled according to the resources available - so in case the compaction is feasible but there are not enough resources - it moves to the next operation. The algorithm produces compacted code suited for any tightly coupled multiprocessors (e.g. Very Long Instruction Word {or VLIW) machines). To our knowledge, it is the first local and incremental THR algorithm working across basic blocks boundaries published so far for code compaction
- …