30 research outputs found

    Adaptive Methods for Point Cloud and Mesh Processing

    Point clouds and 3D meshes are widely used in applications ranging from games to virtual reality to autonomous vehicles. This dissertation proposes several approaches for noise removal and calibration of noisy point cloud data, as well as 3D mesh sharpening methods. Order statistic filters have proven very successful in image processing and other domains. Several variations of order statistic filters originally proposed for image processing are extended to point cloud filtering in this dissertation, and a new adaptive vector median filter is proposed for removing noise and outliers from noisy point cloud data. The major contributions of this research lie in four aspects: 1) Four order statistic algorithms are extended, and one adaptive filtering method is proposed, for noisy point clouds, with improved results such as the preservation of significant features. These methods are applied to standard models as well as synthetic models and real scenes. 2) A hardware acceleration of the proposed method for filtering point clouds is implemented on multicore processors using the Microsoft Parallel Patterns Library. 3) A new method for aerial LIDAR data filtering is proposed, with the objective of enabling automatic extraction of ground points from aerial LIDAR data with minimal human intervention. 4) A novel method for mesh color sharpening using the discrete Laplace-Beltrami operator is proposed. Median and order statistic-based filters are widely used in signal processing and image processing because they can easily remove outlier noise and preserve important features. This dissertation demonstrates a wide range of results with the median filter, vector median filter, fuzzy vector median filter, adaptive mean, adaptive median, and adaptive vector median filter on point cloud data.
The experiments show that large-scale noise is removed while important features of the point cloud are preserved, with reasonable computation time. Quantitative criteria (e.g., complexity, Hausdorff distance, and root mean squared error (RMSE)) as well as qualitative criteria (e.g., the perceived visual quality of the processed point cloud) are employed to assess the performance of the filters in various cases corrupted by different noise models. The adaptive vector median is further optimized for denoising or ground filtering of aerial LIDAR point clouds, and is also accelerated on multi-core CPUs using the Microsoft Parallel Patterns Library. In addition, this dissertation presents a new method for mesh color sharpening using the discrete Laplace-Beltrami operator, which is an approximation of second-order derivatives on irregular 3D meshes. The one-ring neighborhood is used to compute the Laplace-Beltrami operator. The color of each vertex is updated by adding the Laplace-Beltrami operator of the vertex color, weighted by a factor, to its original value. Different discretizations of the Laplace-Beltrami operator have been proposed for geometric processing of 3D meshes. This work applies several of these discretizations to sharpening 3D mesh colors and compares their performance. Experimental results demonstrate the effectiveness of the proposed algorithms.
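As an illustration of the vector median idea named in this abstract, the following is a minimal Python sketch, not the dissertation's implementation. It assumes the standard definition of the vector median (the sample in a window whose summed Euclidean distance to the other samples is smallest) and a brute-force k-nearest-neighbour window. The function names and the parameter `k` are hypothetical, and the adaptive variant proposed in the dissertation is not reproduced.

```python
import numpy as np


def vector_median(points):
    """Return the member of `points` whose summed Euclidean distance
    to all other members is smallest (the vector median)."""
    pts = np.asarray(points, dtype=float)
    # Pairwise distance matrix, shape (n, n).
    dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    return pts[np.argmin(dists.sum(axis=1))]


def vector_median_filter(cloud, k=8):
    """Replace every point by the vector median of its k nearest
    neighbours (the point itself counts as one of them).

    Assumption: brute-force neighbour search, O(n^2) memory, which is
    only suitable for small clouds in a sketch like this one."""
    cloud = np.asarray(cloud, dtype=float)
    k = min(k, len(cloud))
    dists = np.linalg.norm(cloud[:, None, :] - cloud[None, :, :], axis=2)
    neighbours = np.argsort(dists, axis=1)[:, :k]
    return np.array([vector_median(cloud[idx]) for idx in neighbours])
```

Because the output of each window is always one of the input points, an isolated outlier is replaced by a point from the surrounding surface rather than by an averaged position, which is the feature-preserving property the abstract attributes to median-type filters.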

    VSLAM and Navigation System of Unmanned Ground Vehicle Based on RGB-D Camera

    In this thesis, ROS (Robot Operating System) is used as the software platform, and a simple unmanned ground vehicle that I designed and built is used as the hardware platform. The thesis studies the most critical issues in navigating unmanned ground vehicles in unknown environments: SLAM (Simultaneous Localization and Mapping) and autonomous navigation. Based on an analysis of the principle and structure of visual SLAM, a visual simultaneous localization and mapping algorithm is built, and it is accelerated through hardware replacement and software optimization. A RealSense D435 is used as the camera for the VSLAM sensor. The algorithm extracts features from the depth camera data and calculates the odometry of the unmanned vehicle by matching features between adjacent images. The vehicle’s location and the map data are then updated using the odometry information. Provided that the visual SLAM algorithm works normally, this thesis also uses the generated 3D map to derive a real-time 2D projection map that can be used by the navigation algorithm. Using the 2D projection map, the navigation algorithm controls the driving speed and direction of the vehicle to achieve autonomous navigation and obstacle avoidance. Path planning for unmanned ground vehicles consists mainly of two parts: global path planning and local path planning. Global path planning is mainly used to plan the optimal path to the destination. Local path planning is mainly used to control the speed and direction of the UGV. This thesis analyzes and compares Dijkstra’s algorithm and the A* algorithm. Considering compatibility with ROS, Dijkstra’s algorithm is finally used as the global path-planning algorithm. The DWA (Dynamic Window Approach) algorithm is used for local path planning.
Under the control of Dijkstra’s algorithm and the DWA algorithm, the unmanned ground vehicle can automatically plan the optimal path to the target point and avoid obstacles. This thesis also designs and builds a simple unmanned ground vehicle as an experimental platform, designs a simple control method based on the differential-wheeled vehicle, and finally realizes autonomous navigation and obstacle avoidance through the visual SLAM algorithm and the autonomous navigation algorithm. Finally, the main work and deficiencies of this thesis are summarized, and the prospects and difficulties of the research field of unmanned ground vehicles are presented.
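The global planning step described in this abstract can be sketched roughly as follows. This is a minimal, self-contained Python illustration of Dijkstra's algorithm on a 2D occupancy grid, not the ROS planner used in the thesis. The grid encoding (0 = free, 1 = obstacle), the 4-connected neighbourhood, the unit step cost, and the name `dijkstra_grid` are all assumptions made for the example.

```python
import heapq


def dijkstra_grid(grid, start, goal):
    """Shortest 4-connected path on an occupancy grid.

    grid  : list of rows, 0 = free cell, 1 = obstacle (assumed encoding)
    start : (row, col) of the start cell
    goal  : (row, col) of the goal cell
    Returns the path as a list of (row, col) cells, or None if the
    goal cannot be reached.
    """
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}          # best known cost to each visited cell
    prev = {}                  # back-pointers for path reconstruction
    heap = [(0, start)]
    while heap:
        d, cell = heapq.heappop(heap)
        if cell == goal:
            # Walk the back-pointers from goal to start.
            path = [cell]
            while cell in prev:
                cell = prev[cell]
                path.append(cell)
            return path[::-1]
        if d > dist.get(cell, float("inf")):
            continue           # stale queue entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                nd = d + 1     # unit cost per step (assumption)
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = cell
                    heapq.heappush(heap, (nd, (nr, nc)))
    return None
```

In a setup like the one described, the grid would come from the 2D projection map, and a local planner such as DWA would then follow the returned path while reacting to nearby obstacles.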

    Feature Learning for RGB-D Data

    RGB-D data has turned out to be a very useful representation for solving fundamental computer vision problems. It combines the advantages of the color image, which provides appearance information about an object, and the depth image, which is immune to variations in color, illumination, rotation angle and scale. With the invention of the low-cost Microsoft Kinect sensor, which was initially used for gaming and later became a popular device for computer vision, high-quality RGB-D data can be acquired easily. RGB-D images and video can facilitate a wide range of application areas, such as computer vision, robotics, construction and medical imaging. However, how to fuse RGB information and depth information is still an open problem in computer vision. Simply concatenating RGB data and depth data is not enough. A new fusion method could fuse RGB images and depth images better, but more powerful algorithms are still needed. In this thesis, to explore more advantages of RGB-D data, we use some popular RGB-D datasets for the evaluation of deep feature learning algorithms, hyper-parameter optimization, local multi-modal feature learning, RGB-D data fusion and recognizing RGB information from RGB-D images: i) With the success of deep neural networks in computer vision, deep features from fused RGB-D data can be shown to give better results than RGB data alone. However, different deep learning algorithms show different performance on different RGB-D datasets. Through large-scale experiments that comprehensively evaluate the performance of deep feature learning models for RGB-D image/video classification, we conclude that RGB-D fusion methods using CNNs always outperform the other selected methods (DBNs, SDAE and LSTM). On the other hand, since LSTM can learn from experience to classify, process and predict time series, it achieved better performance than DBN and SDAE in video classification tasks.
ii) Hyper-parameter optimization can help researchers quickly choose an initial set of hyper-parameters for a new classification task, thus reducing the number of trials needed to search the hyper-parameter space. We present a simple and efficient framework for improving the efficiency and accuracy of hyper-parameter optimization by considering the classification complexity of a particular dataset. We verify this framework on three real-world RGB-D datasets. Our analysis of the experiments confirms that the framework can provide deeper insights into the relationship between dataset classification tasks and hyper-parameter optimization, and can thus quickly choose an accurate initial set of hyper-parameters for a new classification task. iii) We propose a new Convolutional Neural Network (CNN)-based local multi-modal feature learning framework for RGB-D scene classification. This method can effectively capture much of the local structure in RGB-D scene images and automatically learn a fusion strategy for the object-level recognition step, instead of simply training a classifier on top of features extracted from both modalities. Experiments conducted on two popular datasets to thoroughly test the method show that our method with local multi-modal CNNs greatly outperforms state-of-the-art approaches. Our method has the potential to improve RGB-D scene understanding. An extended evaluation shows that CNNs trained on a scene-centric dataset achieve an improvement on scene benchmarks compared to a network trained on an object-centric dataset. iv) We propose a novel method for RGB-D data fusion. We project raw RGB-D data into a complex space and then jointly extract features from the fused RGB-D images. Besides three observations about the fusion methods, the experimental results also show that our method achieves competitive performance against the classical SIFT.
v) We propose a novel method called adaptive Visual-Depth Embedding (aVDE), which first learns a compact shared latent space between the two representations of the labeled RGB and depth modalities in the source domain. The shared latent space then helps transfer the depth information to the unlabeled target dataset. Finally, aVDE matches features and reweights instances jointly across the shared latent space and the projected target domain to obtain an adaptive classifier. This method can utilize the additional depth information in the source domain and simultaneously reduce the domain mismatch between the source and target domains. On two real-world image datasets, the experimental results show that the proposed method significantly outperforms the state-of-the-art methods.

    Scene understanding by robotic interactive perception

    This thesis presents a novel and generic visual architecture for scene understanding by robotic interactive perception. The proposed visual architecture is fully integrated into autonomous systems performing object perception and manipulation tasks, and uses interaction with the scene to improve scene understanding substantially over non-interactive models. Specifically, this thesis presents two experimental validations of an autonomous system interacting with the scene. Firstly, an autonomous gaze control model is investigated, where the vision sensor directs its gaze to satisfy a scene exploration task. Secondly, autonomous interactive perception is investigated, where objects in the scene are repositioned by robotic manipulation. The proposed visual architecture for scene understanding involving perception and manipulation tasks has four components: 1) a reliable vision system; 2) camera and hand-eye calibration to integrate the vision system into an autonomous robot’s kinematic frame chain; 3) a visual model performing perception tasks and providing the knowledge required for interaction with the scene; and finally, 4) a manipulation model which, using knowledge received from the perception model, chooses an appropriate action (from a set of simple actions) to satisfy a manipulation task. This thesis presents contributions for each of the aforementioned components. Firstly, a portable active binocular robot vision architecture that integrates a number of visual behaviours is presented. This active vision architecture has the ability to verge, localise, recognise and simultaneously identify multiple target object instances. The portability and functional accuracy of the proposed vision architecture are demonstrated by carrying out both qualitative and comparative analyses using different robot hardware configurations, feature extraction techniques and scene perspectives.
Secondly, a camera and hand-eye calibration methodology for integrating an active binocular robot head within a dual-arm robot is described. For this purpose, the forward kinematic model of the active robot head is derived, and the methodology for calibrating and integrating the robot head is described in detail. A rigid calibration methodology has been implemented to provide a closed-form hand-to-eye calibration chain, and this has been extended with a mechanism that allows the camera external parameters to be updated dynamically for optimal 3D reconstruction, to meet the requirements of robotic tasks such as grasping and manipulating rigid and deformable objects. Experimental results show that the robot head achieves an overall accuracy of less than 0.3 millimetres while recovering the 3D structure of a scene. In addition, a comparative study between current RGB-D cameras and our active stereo head within two dual-arm robotic test-beds is reported, demonstrating the accuracy and portability of our proposed methodology. Thirdly, this thesis proposes a visual perception model for the task of category-wise object sorting, based on Gaussian Process (GP) classification, that is capable of recognising object categories from point cloud data. In this approach, Fast Point Feature Histogram (FPFH) features are extracted from point clouds to describe the local 3D shape of objects, and a Bag-of-Words coding method is used to obtain an object-level vocabulary representation. Multi-class Gaussian Process classification is employed to provide a probability estimate of the identity of the object and serves the key role of modelling perception confidence in the interactive perception cycle. The interaction stage is responsible for invoking the appropriate action skills as required to confirm the identity of an observed object with high confidence as a result of executing multiple perception-action cycles.
The recognition accuracy of the proposed perception model has been validated on simulated input data using both Support Vector Machine (SVM) and GP-based multi-class classifiers. The results demonstrate that, by using a GP-based classifier, it is possible to obtain true positive classification rates of up to 80%. Experimental validation of the above semi-autonomous object sorting system shows that the proposed GP-based interactive sorting approach outperforms random sorting by up to 30% when applied to scenes comprising configurations of household objects. Finally, a fully autonomous visual architecture is presented that has been developed to accommodate manipulation skills, so that an autonomous system can interact with the scene by object manipulation. This architecture consists mainly of two stages: 1) a perception stage, which is a modified version of the aforementioned visual interaction model, and 2) an interaction stage, which performs a set of ad-hoc actions relying on the information received from the perception stage. More specifically, the interaction stage reasons over the information (class label and associated probabilistic confidence score) received from the perception stage to choose one of the following two actions: 1) If an object class has been identified with high confidence, the object is removed from the scene and placed in the designated basket/bin for that class. 2) If an object class has been identified with lower confidence, an action inspired by the human behaviour of inspecting doubtful objects is chosen to investigate that object further, confirming its identity by capturing more images of it in isolation from different views. The perception stage then processes these views; hence, multiple perception-action/interaction cycles take place.
From an application perspective, the task of autonomous category-based object sorting is performed, and the experimental design for the task is described in detail.
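The two-action rule in this abstract (sort when the classifier is confident, otherwise inspect the object again) can be illustrated with a short Python sketch. This is a hypothetical illustration, not the thesis's system: the `observe` callback stands in for one perception step returning class probabilities, the 0.8 threshold and the cycle limit are invented values, and averaging probabilities across views is an assumption, since the abstract does not say how evidence from multiple views is combined.

```python
def sort_object(observe, threshold=0.8, max_cycles=5):
    """Run perception-action cycles until one class is confident enough.

    observe    : callable returning {label: probability} for a new view
                 (hypothetical stand-in for the perception stage)
    threshold  : confidence needed before sorting (assumed value)
    max_cycles : give up after this many views (assumed value)
    Returns (label, cycles_used), with label None if never confident.
    """
    totals = {}
    for cycle in range(1, max_cycles + 1):
        # Perception: classify the current view of the object.
        for label, p in observe().items():
            totals[label] = totals.get(label, 0.0) + p
        # Assumption: combine views by averaging their probabilities.
        mean = {label: s / cycle for label, s in totals.items()}
        best = max(mean, key=mean.get)
        if mean[best] >= threshold:
            return best, cycle      # action 1: place in the bin for `best`
        # action 2: otherwise loop again, i.e. inspect from another view
    return None, max_cycles


# Usage with three canned views of the same object.
views = iter([
    {"cup": 0.60, "bowl": 0.40},
    {"cup": 0.90, "bowl": 0.10},
    {"cup": 0.95, "bowl": 0.05},
])
result = sort_object(lambda: next(views))
```

With these canned views the averaged confidence for "cup" only crosses the threshold on the third view, so the object is inspected twice before it is sorted.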

    Sensor Network Based Collision-Free Navigation and Map Building for Mobile Robots

    Safe robot navigation is a fundamental research field for autonomous robots, including ground mobile robots and flying robots. The primary objective of a safe robot navigation algorithm is to guide an autonomous robot from its initial position to a target, or along a desired path, while avoiding obstacles. With the development of information technology and sensor technology, recent research has focused on implementations that combine robotics with sensor networks. One such implementation is sensor network based robot navigation. Another important navigation problem in robotics is safe area search and map building. In this report, a global collision-free path planning algorithm for ground mobile robots in dynamic environments is presented first. To exploit the advantages of sensor networks, this path planning algorithm is then developed into a sensor network based navigation algorithm for ground mobile robots. A network of 2D range finder sensors is used to detect static and dynamic obstacles, and the sensor network can guide each ground mobile robot through the detected safe area to the target. Furthermore, the presented navigation algorithm is extended to 3D environments: using the measurements of the sensor network, any flying robot in the workspace is navigated from its initial position to the target. This report also studies safe area search and map building for ground mobile robots and presents two algorithms. The first considers a ground mobile robot equipped with a 2D range finder sensor that searches a bounded 2D area without any collision and builds a complete 2D map of the area. The second extends the first map building algorithm to 3D map building.

    Toward Effective Physical Human-Robot Interaction

    With the fast advancement of technology in recent years, robotics has matured significantly and produced robots that are able to operate in unstructured environments such as domestic environments, offices, hospitals and other human-inhabited locations. In this context, the interaction and cooperation between humans and robots has become an important and challenging aspect of robot development. Among the various kinds of possible interactions, in this Ph.D. thesis I am particularly interested in physical human-robot interaction (pHRI). In order to study how a robot can successfully engage in physical interaction with people, and which factors are crucial during this kind of interaction, I investigated how humans and robots can hand over objects to each other. To study this specific interactive task, I developed two robotic prototypes and conducted human-robot user studies. Although various aspects of human-robot handovers have been deeply investigated in the state of the art, during my studies I focused on three issues that have rarely been investigated so far: human presence and motion analysis during the interaction, in order to infer non-verbal communication cues and to synchronize the robot actions with the human motion; development and evaluation of human-aware pro-active robot behaviors that enable robots to behave actively in the proximity of the human body, in order to negotiate the handover location and to perform the transfer of the object; and consideration of object grasp affordances during the handover, in order to make the interaction more comfortable for the human.

    Situation Assessment for Mobile Robots

    The Internet of Things Will Thrive by 2025

    This report is the latest in a sustained effort throughout 2014 by the Pew Research Center Internet Project to mark the 25th anniversary of the creation of the World Wide Web by Sir Tim Berners-Lee. It is an analysis of opinions about the likely expansion of the Internet of Things (sometimes called the Cloud of Things), a catchall phrase for the array of devices, appliances, vehicles, wearable material, and sensor-laden parts of the environment that connect to each other and feed data back and forth. It covers the more than 1,600 responses that were offered specifically to our question about where the Internet of Things would stand by the year 2025. The report is the next in a series of eight Pew Research and Elon University analyses to be issued this year in which experts share their expectations about the future of such things as privacy, cybersecurity, and net neutrality. It includes some of the best and most provocative of the predictions survey respondents made when specifically asked to share their views about the evolution of embedded and wearable computing and the Internet of Things.