394 research outputs found

    MASSIVELY PARALLEL ALGORITHMS FOR POINT CLOUD BASED OBJECT RECOGNITION ON HETEROGENEOUS ARCHITECTURE

    Get PDF
    With the advent of new commodity depth sensors, point cloud data processing plays an increasingly important role in object recognition and perception. However, the computational cost of point cloud data processing is extremely high due to the large data size, high dimensionality, and algorithmic complexity. To address the computational challenges of real-time processing, this work investigates the possibilities of using modern heterogeneous computing platforms and its supporting ecosystem such as massively parallel architecture (MPA), computing cluster, compute unified device architecture (CUDA), and multithreaded programming to accelerate the point cloud based object recognition. The aforementioned computing platforms would not yield high performance unless the specific features are properly utilized. Failing that the result actually produces an inferior performance. To achieve the high-speed performance in image descriptor computing, indexing, and matching in point cloud based object recognition, this work explores both coarse and fine grain level parallelism, identifies the acceptable levels of algorithmic approximation, and analyzes various performance impactors. A set of heterogeneous parallel algorithms are designed and implemented in this work. These algorithms include exact and approximate scalable massively parallel image descriptors for descriptor computing, parallel construction of k-dimensional tree (KD-tree) and the forest of KD-trees for descriptor indexing, parallel approximate nearest neighbor search (ANNS) and buffered ANNS (BANNS) on the KD-tree and the forest of KD-trees for descriptor matching. The results show that the proposed massively parallel algorithms on heterogeneous computing platforms can significantly improve the execution time performance of feature computing, indexing, and matching. Meanwhile, this work demonstrates that the heterogeneous computing architectures, with appropriate architecture specific algorithms design and optimization, have the distinct advantages of improving the performance of multimedia applications

    Matching non-uniformity for program optimizations on heterogeneous many-core systems

    Get PDF
    As computing enters an era of heterogeneity and massive parallelism, it exhibits a distinct feature: the deepening non-uniform relations among the computing elements in both hardware and software. Besides traditional non-uniform memory accesses, much deeper non-uniformity shows in a processor, runtime, and application, exemplified by the asymmetric cache sharing, memory coalescing, and thread divergences on multicore and many-core processors. Being oblivious to the non-uniformity, current applications fail to tap into the full potential of modern computing devices.;My research presents a systematic exploration into the emerging property. It examines the existence of such a property in modern computing, its influence on computing efficiency, and the challenges for establishing a non-uniformity--aware paradigm. I propose several techniques to translate the property into efficiency, including data reorganization to eliminate non-coalesced accesses, asynchronous data transformations for locality enhancement and a controllable scheduling for exploiting non-uniformity among thread blocks. The experiments show much promise of these techniques in maximizing computing throughput, especially for programs with complex data access patterns

    General Purpose Flow Visualization at the Exascale

    Get PDF
    Exascale computing, i.e., supercomputers that can perform 1018 math operations per second, provide significant opportunity for improving the computational sciences. That said, these machines can be difficult to use efficiently, due to their massive parallelism, due to the use of accelerators, and due to the diversity of accelerators used. All areas of the computational science stack need to be reconsidered to address these problems. With this dissertation, we consider flow visualization, which is critical for analyzing vector field data from simulations. We specifically consider flow visualization techniques that use particle advection, i.e., tracing particle trajectories, which presents performance and implementation challenges. The dissertation makes four primary contributions. First, it synthesizes previous work on particle advection performance and introduces a high-level analytical cost model. Second, it proposes an approach for performance portability across accelerators. Third, it studies expected speedups based on using accelerators, including the importance of factors such as duration, particle count, data set, and others. Finally, it proposes an exascale-capable particle advection system that addresses diversity in many dimensions, including accelerator type, parallelism approach, analysis use case, underlying vector field, and more

    Development of a Motion Planner based on Nonlinear Optimization for Free-Floating Robots

    Get PDF
    While the launch of spacecraft and satellites is increasing rapidly, the accumulation of debris in space creates an obstacle for future space missions. Free-floating robots are used to tackle the issue of grasping the tumbling, non-cooperative space debris and stabilizing them. Free-floating robots are manipulator arms attached to the base of an unactuated satellite. The core problem is calculating the robot's trajectory for grasping as the debris has unknown parameters (momentum, pose, et cetera) within a specific time window. The use of an optimization-based motion planner is considered in this research as it provides the criterion for robustness and stability. Trajectory planning is formulated as an Optimal Control Problem with various motion constraints and a cost function that has to be minimized. The drawback of this technique is the time consumption. Parallelization techniques, using multi-core CPU and GPU, will be analyzed to reduce the time. Comparison between different techniques for solution of the equation of motion for the non-holonomic system, optimization methodologies will also be carried out. The methodology undertaken is as follows: The Optimal Control problem is analyzed, and parts that could parallelize within the Optimal Control Framework will be determined. OpenMP-based parallel functions for CPU processing and CUDA-based parallel functions for GPU processing are to be developed. The speedup gain, if any, for the developed parallel functions will be examined. Different optimization methodologies will be implemented and compared. The trajectory optimization problem will be drawn, with new task space. The parallel routines developed, which provide a significant improvement over the state-of-the-art techniques, will be implemented for the motion planning problem. The statistical analysis of the results from the motion planner will be carried out

    A computing origami: Optimized code generation for emerging parallel platforms

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH
    corecore