7 research outputs found

    04301 Abstracts Collection -- Cache-Oblivious and Cache-Aware Algorithms

    Get PDF
    The Dagstuhl Seminar 04301 "Cache-Oblivious and Cache-Aware Algorithms" was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl, from 18.07.2004 to 23.07.2004. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available.

    External-memory search trees with fast insertions

    Get PDF
    Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. Includes bibliographical references (p. 65-68). This thesis provides both experimental and theoretical contributions regarding external-memory dynamic search trees with fast insertions. The first contribution is the implementation of the buffered repository B-tree, a data structure that provably outperforms B-trees for updates at the cost of a constant-factor decrease in query performance. This thesis also describes the cache-oblivious lookahead array, which outperforms B-trees for updates at a logarithmic cost in query performance, and does so without knowing the cache parameters of the system it is being run on. The buffered repository B-tree is an external-memory search tree that can be tuned for a tradeoff between queries and updates. Specifically, for any ε ∈ [1/lg B, 1] this data structure achieves O((1/(εB^(1-ε)))(1 + log_B(N/B))) block transfers for INSERT and DELETE and O((1/ε)(1 + log_B(N/B))) block transfers for SEARCH. The update complexity is amortized; in the worst case an update costs O((1/ε)(1 + log_B(N/B))) block transfers. Using the value ε = 1/2, I was able to achieve a 17 times increase in insertion performance at the cost of only a 3 times decrease in search performance on a database with 12-byte items on a disk with a 4-kilobyte block size. This thesis also shows how to build a cache-oblivious data structure, the cache-oblivious lookahead array, which achieves the same bounds as the buffered repository B-tree in the case where ε = 1/lg B. Specifically, it achieves an update complexity of O((1/B) log(N/B)) and a query complexity of O(log(N/B)) block transfers. This is the first data structure to achieve these bounds cache-obliviously. The research involving the cache-oblivious lookahead array represents joint work with Michael A. Bender, Jeremy Fineman, and Bradley C. Kuszmaul. By Jelani Nelson. M.Eng.
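
    A minimal, insert-only C++ sketch of the lookahead-array idea may help make the update bound concrete: level k holds either zero or 2^k sorted keys, and an insert cascades merges of full levels, so each key is rewritten O(log(N/B)) times, for an amortized O((1/B) log(N/B)) cost. This is not the thesis code; the class and function names are illustrative, and the fractional-cascading lookahead pointers that give the stated O(log(N/B)) query bound are omitted (the sketch simply searches every level).

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <iterator>
#include <vector>

// Insert-only sketch of the lookahead-array idea: level k is either empty or
// holds a sorted array of exactly 2^k keys. An insert cascades merges of full
// levels, so each key is rewritten O(log(N/B)) times, giving the amortized
// O((1/B) log(N/B)) update bound. The real COLA adds lookahead (fractional
// cascading) pointers so a query does not have to search every level.
class LookaheadArraySketch {
public:
    void insert(int64_t key) {
        std::vector<int64_t> carry{key};           // sorted run being pushed down
        for (std::size_t k = 0; ; ++k) {
            if (k == levels_.size()) levels_.emplace_back();
            if (levels_[k].empty()) {               // free slot: deposit the run
                levels_[k] = std::move(carry);
                return;
            }
            // Level k is full: merge it with the carry and keep cascading.
            std::vector<int64_t> merged;
            std::merge(levels_[k].begin(), levels_[k].end(),
                       carry.begin(), carry.end(), std::back_inserter(merged));
            levels_[k].clear();
            carry = std::move(merged);
        }
    }

    bool contains(int64_t key) const {
        // Without lookahead pointers, search every nonempty level.
        for (const auto& lvl : levels_)
            if (std::binary_search(lvl.begin(), lvl.end(), key)) return true;
        return false;
    }

private:
    std::vector<std::vector<int64_t>> levels_;      // level k has 0 or 2^k keys
};

int main() {
    LookaheadArraySketch cola;
    for (int64_t i = 0; i < 1000; ++i) cola.insert((i * 37) % 1000);
    std::cout << cola.contains(123) << ' ' << cola.contains(5000) << '\n';  // 1 0
}
```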

    Cache-oblivious data structures for orthogonal range searching

    No full text
    We develop cache-oblivious data structures for orthogonal range searching, the problem of finding all T points in a set of N points in R^d lying in a query hyper-rectangle. Cache-oblivious data structures are designed to be efficient in arbitrary memory hierarchies. We describe a dynamic linear-size data structure that answers d-dimensional queries in O((N/B)^(1-1/d) + T/B) memory transfers, where B is the block size of any two levels of a multilevel memory hierarchy. A point can be inserted into or deleted from this data structure in O(log_B^2 N) memory transfers. We also develop a static structure for the two-dimensional case that answers queries in O(log_B N + T/B) memory transfers using O(N log_2^2 N) space. The analysis of the latter structure requires that B = 2^(2^c) for some nonnegative integer constant c.
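
    As a rough illustration of the query recursion behind the dynamic structure, the sketch below is a plain pointer-based 2-d kd-tree with range reporting; it is not the paper's structure, and all names are illustrative. The classic recursion visits O(sqrt(N) + T) nodes, and the paper's contribution is, roughly, storing such a tree in a cache-oblivious recursive layout so the same traversal costs O((N/B)^(1-1/d) + T/B) memory transfers.

```cpp
#include <algorithm>
#include <iostream>
#include <vector>

// Plain 2-d kd-tree range reporting. The recursion alternates splitting on x
// and y; a cache-oblivious (recursive, van Emde Boas-style) layout of the same
// tree is what turns the node count into an I/O bound for any block size B.
struct Point { double x, y; };
struct Rect  {
    double xlo, ylo, xhi, yhi;
    bool contains(const Point& p) const {
        return xlo <= p.x && p.x <= xhi && ylo <= p.y && p.y <= yhi; }
};

struct KdNode { Point p; KdNode *left = nullptr, *right = nullptr; };

KdNode* build(std::vector<Point>& pts, int lo, int hi, int depth) {
    if (lo >= hi) return nullptr;
    int mid = (lo + hi) / 2;
    auto cmp = (depth % 2 == 0)
        ? +[](const Point& a, const Point& b) { return a.x < b.x; }
        : +[](const Point& a, const Point& b) { return a.y < b.y; };
    std::nth_element(pts.begin() + lo, pts.begin() + mid, pts.begin() + hi, cmp);
    KdNode* n = new KdNode{pts[mid]};
    n->left  = build(pts, lo, mid, depth + 1);
    n->right = build(pts, mid + 1, hi, depth + 1);
    return n;
}

void query(const KdNode* n, const Rect& r, int depth, std::vector<Point>& out) {
    if (!n) return;
    if (r.contains(n->p)) out.push_back(n->p);
    double coord = (depth % 2 == 0) ? n->p.x : n->p.y;
    double lo    = (depth % 2 == 0) ? r.xlo  : r.ylo;
    double hi    = (depth % 2 == 0) ? r.xhi  : r.yhi;
    if (lo <= coord) query(n->left,  r, depth + 1, out);   // range may extend left
    if (coord <= hi) query(n->right, r, depth + 1, out);   // range may extend right
}

int main() {
    std::vector<Point> pts;
    for (int i = 0; i < 100; ++i) pts.push_back({double(i % 10), double(i / 10)});
    KdNode* root = build(pts, 0, (int)pts.size(), 0);
    std::vector<Point> hits;
    query(root, {2, 2, 4, 4}, 0, hits);
    std::cout << hits.size() << " points in [2,4]x[2,4]\n";   // 9
}
```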

    Scaling Robot Motion Planning to Multi-core Processors and the Cloud

    Get PDF
    Imagine a world in which robots safely interoperate with humans, gracefully and efficiently accomplishing everyday tasks. The robot's motions for these tasks, constrained by the design of the robot and the task at hand, must avoid collisions with obstacles. Unfortunately, planning a constrained obstacle-free motion for a robot is computationally complex, often resulting in slow computation of inefficient motions. The methods in this dissertation speed up this motion-plan computation with new algorithms and data structures that leverage readily available parallel processing, whether that processing power is on the robot or in the cloud, enabling robots to operate more safely, more gracefully, and more efficiently. The contributions of this dissertation that enable faster motion planning are novel parallel lock-free algorithms, fast and concurrent nearest neighbor searching data structures, cache-aware operation, and split robot-cloud computation. Parallel lock-free algorithms avoid contention over shared data structures, resulting in empirical speedup proportional to the number of CPU cores working on the problem. Fast nearest neighbor data structures speed up searching in SO(3) and SE(3) metric spaces, which are needed for rigid-body motion planning. Concurrent nearest neighbor data structures improve searching performance on metric spaces common to robot motion planning problems, while providing asymptotically wait-free concurrent operation. Cache-aware operation avoids long memory access times, allowing the algorithm to exhibit superlinear speedup. Split robot-cloud computation enables robots with low-power CPUs to react to changing environments by having the robot compute reactive paths in real time from a set of motion plan options generated by a computationally intensive cloud-based algorithm. We demonstrate the scalability and effectiveness of our contributions in solving motion planning problems both in simulation and on physical robots of varying design and complexity. Problems include finding a solution to a complex motion planning problem, pre-computing motion plans that converge towards the optimal, and reactive interaction with dynamic environments. Robots include 2D holonomic robots, 3D rigid-body robots, a self-driving 1/10-scale car, articulated robot arms with and without mobile bases, and a small humanoid robot. Doctor of Philosophy.
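
    To make the SO(3) nearest-neighbor setting concrete, the sketch below computes the usual quaternion angular metric and answers a query by brute force. It is only a hedged reference point under assumed conventions, not the dissertation's data structure (whose contribution is concurrent tree-based search, not a linear scan), and all identifiers are illustrative.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <iostream>
#include <limits>
#include <vector>

// The SO(3) metric that rigid-body nearest-neighbor searches typically use:
// the angle between two unit quaternions, dist(p, q) = arccos(|p . q|).
// The absolute value identifies q with -q, since both represent the same rotation.
using Quat = std::array<double, 4>;   // (w, x, y, z), assumed unit length

double so3_distance(const Quat& p, const Quat& q) {
    double dot = 0.0;
    for (int i = 0; i < 4; ++i) dot += p[i] * q[i];
    dot = std::min(1.0, std::fabs(dot));          // clamp against rounding error
    return std::acos(dot);                        // in [0, pi/2]
}

// Brute-force nearest neighbor: a reference point for correctness only.
std::size_t nearest(const std::vector<Quat>& pts, const Quat& query) {
    std::size_t best = 0;
    double best_d = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < pts.size(); ++i) {
        double d = so3_distance(pts[i], query);
        if (d < best_d) { best_d = d; best = i; }
    }
    return best;
}

int main() {
    // Identity, 90-degree rotation about z, and 180-degree rotation about z.
    std::vector<Quat> rotations = {
        {1, 0, 0, 0},
        {std::sqrt(0.5), 0, 0, std::sqrt(0.5)},
        {0, 0, 0, 1},
    };
    Quat query = {std::cos(0.1), 0, 0, std::sin(0.1)};   // ~11.5 degrees about z
    std::cout << "nearest index: " << nearest(rotations, query) << '\n';  // 0
}
```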

    Structured Parallel Programming with Trees (木を用いた構造化並列プログラミング)

    Get PDF
    High-level abstractions for parallel programming are still immature. Computations on complicated data structures such as pointer structures are considered irregular algorithms. General graph structures, which irregular algorithms typically deal with, are difficult to divide and conquer. Because the divide-and-conquer paradigm is essential for load balancing in parallel algorithms and a key to parallel programming, general graphs are genuinely difficult. Trees, however, lead to divide-and-conquer computations by definition and are sufficiently general and powerful as a programming tool. We therefore deal with abstractions of tree-based computations. Our study started from Matsuzaki’s work on tree skeletons. We have improved the usability of tree skeletons by enriching their implementation. Specifically, we have dealt with two issues. First, we implemented loose coupling between skeletons and data structures and developed a flexible tree skeleton library. Second, we implemented a parallelizer that transforms sequential recursive functions in C into parallel programs that use tree skeletons implicitly. This parallelizer hides the complicated API of tree skeletons and lets programmers use them with no extra burden. Unfortunately, the practicality of tree skeletons has still not improved. On the basis of observations from the practice of tree skeletons, we deal with two application domains: program analysis and neighborhood computation. In the domain of program analysis, compilers treat input programs as control-flow graphs (CFGs) and perform analysis on CFGs; program analysis is therefore difficult to divide and conquer. To resolve this problem, we have developed divide-and-conquer methods for program analysis in a syntax-directed manner on the basis of Rosen’s high-level approach. Specifically, we have dealt with data-flow analysis based on Tarjan’s formalization and value-graph construction based on a functional formalization. In the domain of neighborhood computations, a primary issue is locality: a naive parallel neighborhood computation without locality enhancement causes many cache misses. The divide-and-conquer paradigm is known to be useful for locality enhancement as well. We have therefore applied algebraic formalizations and a tree-segmenting technique derived from tree skeletons to the locality enhancement of neighborhood computations. The University of Electro-Communications, 201
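
    The divide-and-conquer character of tree computations that the thesis exploits can be sketched as a tiny "reduce" skeleton: the caller supplies a leaf function and a node-combining function, and the skeleton supplies the (optionally parallel) evaluation order. This is only a hedged sketch; the tree skeleton library described above (following Matsuzaki's work) additionally balances arbitrarily shaped trees via tree contraction and segmentation, which this sketch does not attempt, and all identifiers are illustrative.

```cpp
#include <functional>
#include <future>
#include <iostream>
#include <memory>

// Minimal binary-tree "reduce" skeleton: the user supplies how to handle a
// leaf and how to combine a node with its children's results; the skeleton
// supplies the divide-and-conquer (and, near the root, parallel) evaluation.
struct Node {
    int value;
    std::unique_ptr<Node> left, right;
    explicit Node(int v) : value(v) {}
};

template <class Leaf, class Combine>
int tree_reduce(const Node* n, Leaf leaf, Combine comb, int depth = 0) {
    if (!n) return leaf(nullptr);
    if (!n->left && !n->right) return leaf(n);
    // Spawn the left subtree asynchronously near the root; go sequential
    // deeper down so task-creation overhead stays bounded.
    if (depth < 3) {
        auto lf = std::async(std::launch::async, [&] {
            return tree_reduce(n->left.get(), leaf, comb, depth + 1); });
        int r = tree_reduce(n->right.get(), leaf, comb, depth + 1);
        return comb(n, lf.get(), r);
    }
    int l = tree_reduce(n->left.get(), leaf, comb, depth + 1);
    int r = tree_reduce(n->right.get(), leaf, comb, depth + 1);
    return comb(n, l, r);
}

int main() {
    // Build a small complete tree of depth 4 with value 1 at every node.
    std::function<std::unique_ptr<Node>(int)> make = [&](int d) {
        auto n = std::make_unique<Node>(1);
        if (d > 0) { n->left = make(d - 1); n->right = make(d - 1); }
        return n;
    };
    auto root = make(4);
    int total = tree_reduce(root.get(),
                            [](const Node* n) { return n ? n->value : 0; },
                            [](const Node* n, int l, int r) { return n->value + l + r; });
    std::cout << "sum = " << total << '\n';   // 31 nodes, all 1s -> 31
}
```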