58 research outputs found

    Acceleration of stereo-matching on multi-core CPU and GPU

    Get PDF
    This paper presents an accelerated version of a dense stereo-correspondence algorithm for two different parallelism enabled architectures, multi-core CPU and GPU. The algorithm is part of the vision system developed for a binocular robot-head in the context of the CloPeMa 1 research project. This research project focuses on the conception of a new clothes folding robot with real-time and high resolution requirements for the vision system. The performance analysis shows that the parallelised stereo-matching algorithm has been significantly accelerated, maintaining 12x and 176x speed-up respectively for multi-core CPU and GPU, compared with non-SIMD singlethread CPU. To analyse the origin of the speed-up and gain deeper understanding about the choice of the optimal hardware, the algorithm was broken into key sub-tasks and the performance was tested for four different hardware architectures

    Acceleration of stereo-matching on multi-core CPU and GPU

    Get PDF
    This paper presents an accelerated version of a dense stereo-correspondence algorithm for two different parallelism enabled architectures, multi-core CPU and GPU. The algorithm is part of the vision system developed for a binocular robot-head in the context of the CloPeMa 1 research project. This research project focuses on the conception of a new clothes folding robot with real-time and high resolution requirements for the vision system. The performance analysis shows that the parallelised stereo-matching algorithm has been significantly accelerated, maintaining 12x and 176x speed-up respectively for multi-core CPU and GPU, compared with non-SIMD singlethread CPU. To analyse the origin of the speed-up and gain deeper understanding about the choice of the optimal hardware, the algorithm was broken into key sub-tasks and the performance was tested for four different hardware architectures

    Navigating the Landscape for Real-time Localisation and Mapping for Robotics, Virtual and Augmented Reality

    Get PDF
    Visual understanding of 3D environments in real-time, at low power, is a huge computational challenge. Often referred to as SLAM (Simultaneous Localisation and Mapping), it is central to applications spanning domestic and industrial robotics, autonomous vehicles, virtual and augmented reality. This paper describes the results of a major research effort to assemble the algorithms, architectures, tools, and systems software needed to enable delivery of SLAM, by supporting applications specialists in selecting and configuring the appropriate algorithm and the appropriate hardware, and compilation pathway, to meet their performance, accuracy, and energy consumption goals. The major contributions we present are (1) tools and methodology for systematic quantitative evaluation of SLAM algorithms, (2) automated, machine-learning-guided exploration of the algorithmic and implementation design space with respect to multiple objectives, (3) end-to-end simulation tools to enable optimisation of heterogeneous, accelerated architectures for the specific algorithmic requirements of the various SLAM algorithmic approaches, and (4) tools for delivering, where appropriate, accelerated, adaptive SLAM solutions in a managed, JIT-compiled, adaptive runtime context.Comment: Proceedings of the IEEE 201

    Paralleizing AwSpPCA for robust facial recognition using CUDA

    Get PDF
    This thesis report is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2014.Cataloged from PDF version of thesis report.This paper was conducted to analyze the performance benefits of parallelizing the Adaptive Weighted Sub-patterned Principle Component Analysis (Aw SP PCA) algorithm, given that the algorithm is implemented so as to retain the accuracy from its serialized version. The serialized execution of this algorithm is analyzed first and then compared against its parallel implementation, both compiled and run on the same computer. Throughout this paper, the methodology is to undergo a step by step procedure which can clearly outline and describe the problems faced when trying to parallelize this algorithm. It will also describe where, how and why parallelizing procedures were used. The results of the research have shown that while not all parts of the algorithm can be implemented in parallel in the first place, some of the sections that can be parallelized does not necessarily yield a considerable amount of benefits. Also, it was seen that not all sections scale well with problem size, meaning that some portions of the algorithm can be left in its serialized state without much loss in time. The sections which can be parallelized were discussed in detail. Some changes were also made to certain variables to ensure the best accuracy possible. Finally, through analysis and experimentation, a speedup of 2.76 was achieved, with a recognition accuracy of 92.6%.Syed Amer ZawadAshfaque AliB. Computer Science and Engineerin

    Mobile robot tank with GPU assistance

    Get PDF
    Robotic research has been costly and tremendously time consuming; this is due to the cost of sensors, motors, computational unit, physical construction, and fabrication. There are also a lot of man hours clocked in for algorithm design, software development, debugging and optimizing. Robotic vision input usually consists of 2 or more color cameras to construct a 3D virtual space. Additional cameras can also be added to enrich the detailed virtual 3D environment, however, the computational complexity increases when more cameras are added. This is due to not only the additional processing power that is required running the software but also the complexity of stitching multiple cameras together to form a sensor. Another method of creating the 3D virtual space is the utilization of range finder sensors. These types of sensors are usually relatively expensive and still require complex algorithms for calibration and correlation to real life distances. Sensing of a robot position can be facilitated by the addition of accelerometers and gyroscope sensors. A significant robotic design is robot interaction. One type of interaction is through verbal exchange. Such interaction requires an audio input receiver and transmitter on the robot. In order to achieve acceptable recognitions, different types of audio receivers may be implemented and many receivers are required to be deployed. Depending on the environment, noise cancellation hardware and software may be needed to enhance the verbal interaction performance. In this thesis different audio sensors are evaluated and implemented with Microsoft Speech Platform. Any robotic research requires a proper control algorithm and logic process. As a result, the majority of these control algorithms rely heavily on mathematics. High performance computational processing power is needed to process all the raw data in real-time. However, any high performance computation proportionally consumes more energy, so to conserve battery life on the robot, many robotic researchers implement remote computation by processing the raw data remotely. This implementation has one major drawback: in order to transmit raw data remotely, a robust communication infrastructure is needed. Without a robust communication the robot will suffers in failures due to the sheer amount of raw data in communication. I am proposing a solution to the computation problem by harvesting the General Purpose Graphic Processing Unit (GPU)’s computational power for complex mathematical raw data processing. This approach utilizes many-cores parallelism for multithreading real-time computation with a minimum of 10x the computational flops of traditional Central Processing Unit (CPU). By shifting the computation on the GPU the entire computation will be done locally on the robot itself to eliminate the need of any external communication system for remote data processing. While the GPU is used to perform image processing for the robot, the CPU is allowed to dedicate all of its processing power to run the other functions of the robot. Computer vision has been an interesting topic for a while; it utilizes complex mathematical techniques and algorithms in an attempt to achieve image processing in real-time. Open Source Computer Vision Library (OpenCV) is a library consisting of pre-programed image processing functions for several different languages, developed by Intel. This library greatly reduces the amount of development time. Microsoft Kinect for XBOX has all the sensors mentioned above in a single convenient package with first party SDK for software development. I perform the robotic implementation utilizing Microsoft Kinect for XBOX as the primary sensor, OpenCV for image processing functions and NVidia GPU to off-load complex mathematical raw data processing for the robot. This thesis’s objective is to develop and implement a low cost autonomous robot tank with Microsoft Kinect for XBOX as the primary sensor, a custom onboard Small Form Factor (SFF) High Performance Computer (HPC) with NVidia GPU assistance for the primary computation, and OpenCV library for image processing functions

    Mobile Robots

    Get PDF
    The objective of this book is to cover advances of mobile robotics and related technologies applied for multi robot systems' design and development. Design of control system is a complex issue, requiring the application of information technologies to link the robots into a single network. Human robot interface becomes a demanding task, especially when we try to use sophisticated methods for brain signal processing. Generated electrophysiological signals can be used to command different devices, such as cars, wheelchair or even video games. A number of developments in navigation and path planning, including parallel programming, can be observed. Cooperative path planning, formation control of multi robotic agents, communication and distance measurement between agents are shown. Training of the mobile robot operators is very difficult task also because of several factors related to different task execution. The presented improvement is related to environment model generation based on autonomous mobile robot observations

    MASSIVELY PARALLEL ALGORITHMS FOR POINT CLOUD BASED OBJECT RECOGNITION ON HETEROGENEOUS ARCHITECTURE

    Get PDF
    With the advent of new commodity depth sensors, point cloud data processing plays an increasingly important role in object recognition and perception. However, the computational cost of point cloud data processing is extremely high due to the large data size, high dimensionality, and algorithmic complexity. To address the computational challenges of real-time processing, this work investigates the possibilities of using modern heterogeneous computing platforms and its supporting ecosystem such as massively parallel architecture (MPA), computing cluster, compute unified device architecture (CUDA), and multithreaded programming to accelerate the point cloud based object recognition. The aforementioned computing platforms would not yield high performance unless the specific features are properly utilized. Failing that the result actually produces an inferior performance. To achieve the high-speed performance in image descriptor computing, indexing, and matching in point cloud based object recognition, this work explores both coarse and fine grain level parallelism, identifies the acceptable levels of algorithmic approximation, and analyzes various performance impactors. A set of heterogeneous parallel algorithms are designed and implemented in this work. These algorithms include exact and approximate scalable massively parallel image descriptors for descriptor computing, parallel construction of k-dimensional tree (KD-tree) and the forest of KD-trees for descriptor indexing, parallel approximate nearest neighbor search (ANNS) and buffered ANNS (BANNS) on the KD-tree and the forest of KD-trees for descriptor matching. The results show that the proposed massively parallel algorithms on heterogeneous computing platforms can significantly improve the execution time performance of feature computing, indexing, and matching. Meanwhile, this work demonstrates that the heterogeneous computing architectures, with appropriate architecture specific algorithms design and optimization, have the distinct advantages of improving the performance of multimedia applications

    Improving GPU performance : reducing memory conflicts and latency

    Get PDF
    corecore