135 research outputs found

    Real-time Hand Tracking under Occlusion from an Egocentric RGB-D Sensor

    No full text
    We present an approach for real-time, robust and accurate hand pose estimation from moving egocentric RGB-D cameras in cluttered real environments. Existing methods typically fail for hand-object interactions in cluttered scenes imaged from egocentric viewpoints, which are common in virtual and augmented reality applications. Our approach uses two sequentially applied Convolutional Neural Networks (CNNs) to localize the hand and regress 3D joint locations. Hand localization is achieved by using a CNN to estimate the 2D position of the hand center in the input, even in the presence of clutter and occlusions. The localized hand position, together with the corresponding input depth value, is used to generate a normalized cropped image that is fed into a second CNN to regress relative 3D hand joint locations in real time. For added accuracy, robustness and temporal stability, we refine the pose estimates using a kinematic pose tracking energy. To train the CNNs, we introduce a new photorealistic dataset that uses a merged-reality approach to capture and synthesize large amounts of annotated data of natural hand interaction in cluttered scenes. Through quantitative and qualitative evaluation, we show that our method is robust to self-occlusion and occlusions by objects, particularly in moving egocentric perspectives.
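    The two-stage pipeline described in this abstract (a localization CNN, a depth-normalized crop around the detected hand center, and a second CNN regressing relative 3D joint positions) can be sketched roughly as below. This is a minimal illustration assuming PyTorch; the layer configurations, crop size, joint count, and normalization are placeholders rather than the authors' networks, and the kinematic tracking refinement is omitted.

```python
import torch
import torch.nn as nn

class HandLocalizer(nn.Module):
    """Placeholder CNN that regresses the 2D hand-center position, normalized to [0, 1]."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 2)  # (u, v) hand center in normalized image coordinates

    def forward(self, depth):
        return torch.sigmoid(self.head(self.features(depth).flatten(1)))

class JointRegressor(nn.Module):
    """Placeholder CNN that regresses 3D joint offsets relative to the hand center."""
    def __init__(self, num_joints=21):
        super().__init__()
        self.num_joints = num_joints
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, num_joints * 3)

    def forward(self, crop):
        return self.head(self.features(crop).flatten(1)).view(-1, self.num_joints, 3)

def estimate_hand_pose(depth, localizer, regressor, crop_size=128):
    """Two-stage inference: localize the hand, crop and depth-normalize around it, regress joints.
    Assumes the depth image is larger than crop_size in both dimensions."""
    b, _, h, w = depth.shape
    center = localizer(depth)                                   # (B, 2) normalized (u, v)
    cu, cv = (center[:, 0] * w).long(), (center[:, 1] * h).long()
    crops = []
    for i in range(b):
        u0 = cu[i].clamp(0, w - crop_size)
        v0 = cv[i].clamp(0, h - crop_size)
        crop = depth[i:i + 1, :, v0:v0 + crop_size, u0:u0 + crop_size]
        d = depth[i, 0, cv[i].clamp(0, h - 1), cu[i].clamp(0, w - 1)]
        crops.append(crop - d)                                  # normalize depth around the hand center
    return regressor(torch.cat(crops, 0))                       # (B, num_joints, 3) relative joint positions
```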

    PointNet++ Grasping: Learning An End-to-end Spatial Grasp Generation Algorithm from Sparse Point Clouds

    Full text link
    Grasping novel objects is important for robot manipulation in unstructured environments. Most current works require a grasp sampling process to obtain grasp candidates, combined with a local feature extractor based on deep learning. This pipeline is time-consuming, especially when grasp points are sparse, such as at the edge of a bowl. In this paper, we propose an end-to-end approach to directly predict the poses, categories and scores (qualities) of all grasps. It takes the whole sparse point cloud as input and requires no sampling or search process. Moreover, to generate training data for multi-object scenes, we propose a fast multi-object grasp detection algorithm based on the Ferrari-Canny metric. A single-object dataset (79 objects from the YCB object set, 23.7k grasps) and a multi-object dataset (20k point clouds with annotations and masks) are generated. A PointNet++ based network combined with a multi-mask loss is introduced to deal with different training points. The whole weight size of our network is only about 11.6M, and a complete prediction takes about 102ms on a GeForce 840M GPU. Our experiments show that our method achieves a 71.43% success rate and a 91.60% completion rate, performing better than current state-of-the-art works. Comment: Accepted at the International Conference on Robotics and Automation (ICRA) 202
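    The core idea of predicting pose, category, and quality for every input point in one forward pass can be sketched as follows. This is an illustrative stand-in, not the paper's network: a shared per-point MLP replaces the PointNet++ backbone, and the 4-parameter pose encoding, head sizes, and category count are assumptions.

```python
import torch
import torch.nn as nn

class PerPointGraspHead(nn.Module):
    """For every point of a sparse cloud, predict a grasp pose (here: 3D approach vector
    plus in-plane angle), object-category logits, and a scalar grasp-quality score."""
    def __init__(self, num_categories=2, feat_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(          # per-point shared MLP, standing in for PointNet++
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        self.pose_head = nn.Linear(feat_dim, 4)             # approach vector (3) + rotation angle (1)
        self.category_head = nn.Linear(feat_dim, num_categories)
        self.score_head = nn.Linear(feat_dim, 1)            # grasp quality

    def forward(self, points):                  # points: (B, N, 3)
        feats = self.backbone(points)           # (B, N, feat_dim)
        pose = self.pose_head(feats)            # (B, N, 4)
        category = self.category_head(feats)    # (B, N, num_categories)
        score = torch.sigmoid(self.score_head(feats)).squeeze(-1)  # (B, N)
        return pose, category, score

# Usage: predict grasps for every point of one scene in a single pass, with no sampling step.
cloud = torch.rand(1, 2048, 3)
pose, category, score = PerPointGraspHead()(cloud)
```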

    A Robotic Visual Grasping Design: Rethinking Convolution Neural Network with High-Resolutions

    Full text link
    High-resolution representations are important for vision-based robotic grasping problems. Existing works generally encode the input images into low-resolution representations via sub-networks and then recover high-resolution representations. This loses spatial information, and the errors introduced by the decoder become more serious when multiple types of objects are considered or objects are far away from the camera. To address these issues, we revisit the design paradigm of CNNs for robotic perception tasks. We demonstrate that using parallel branches, as opposed to serially stacked convolutional layers, is a more powerful design for robotic visual grasping tasks. In particular, we provide guidelines for neural network design for robotic perception tasks, e.g., high-resolution representation and lightweight design, which respond to the challenges of different manipulation scenarios. We then develop a novel visual grasping architecture referred to as HRG-Net, a parallel-branch structure that always maintains a high-resolution representation and repeatedly exchanges information across resolutions. Extensive experiments validate that these two designs effectively enhance the accuracy of vision-based grasping and accelerate network training. A series of comparative experiments in real physical environments is shown on YouTube: https://youtu.be/Jhlsp-xzHFY
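    The parallel-branch idea (keep a full-resolution stream alive while repeatedly exchanging features with lower-resolution streams) is illustrated by the two-branch block below. This is a generic sketch of the design pattern, not the HRG-Net configuration; channel counts, depths, and fusion details are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchExchange(nn.Module):
    """A high-resolution branch and a half-resolution branch are processed in parallel,
    then fused by exchanging downsampled/upsampled features between the two."""
    def __init__(self, c_high=32, c_low=64):
        super().__init__()
        self.high = nn.Sequential(nn.Conv2d(c_high, c_high, 3, padding=1), nn.ReLU())
        self.low = nn.Sequential(nn.Conv2d(c_low, c_low, 3, padding=1), nn.ReLU())
        self.high_to_low = nn.Conv2d(c_high, c_low, 3, stride=2, padding=1)  # downsample high branch
        self.low_to_high = nn.Conv2d(c_low, c_high, 1)                        # project, then upsample

    def forward(self, x_high, x_low):
        h, l = self.high(x_high), self.low(x_low)
        fused_high = h + F.interpolate(self.low_to_high(l), size=h.shape[-2:],
                                       mode="bilinear", align_corners=False)
        fused_low = l + self.high_to_low(h)
        return fused_high, fused_low

# The full-resolution output would then feed per-pixel grasp heads (e.g. quality/angle/width maps).
x_high, x_low = torch.rand(1, 32, 96, 96), torch.rand(1, 64, 48, 48)
y_high, y_low = TwoBranchExchange()(x_high, x_low)
```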

    The State of Lifelong Learning in Service Robots: Current Bottlenecks in Object Perception and Manipulation

    Get PDF
    Service robots are appearing more and more in our daily lives. The development of service robots combines multiple fields of research, from object perception to object manipulation. The state of the art continues to improve the coupling between object perception and manipulation. This coupling is necessary for service robots not only to perform various tasks in a reasonable amount of time but also to continually adapt to new environments and safely interact with non-expert human users. Nowadays, robots are able to recognize various objects and quickly plan a collision-free trajectory to grasp a target object in predefined settings. However, in most cases these systems rely on large amounts of training data, so the knowledge of such robots is fixed after the training phase, and any changes in the environment require complicated, time-consuming, and expensive robot re-programming by human experts. Such approaches are therefore still too rigid for real-life applications in unstructured environments, where a significant portion of the environment is unknown and cannot be directly sensed or controlled. In such environments, no matter how extensive the training data used for batch learning, a robot will always face new objects. Therefore, apart from batch learning, the robot should be able to continually learn about new object categories and grasp affordances from very few training examples on-site. Moreover, apart from robot self-learning, non-expert users could interactively guide the process of experience acquisition by teaching new concepts or by correcting insufficient or erroneous concepts. In this way, the robot will constantly learn how to help humans in everyday tasks by gaining more and more experience, without the need for re-programming.
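    To make the contrast with batch learning concrete, here is a minimal sketch of on-site, open-ended category learning: a nearest-class-mean learner that can be taught new categories from a handful of examples and corrected interactively, without retraining from scratch. This is an illustration of the general idea only; the feature extractor, class names, and feature vectors are hypothetical and not taken from the paper.

```python
import numpy as np

class OpenEndedCategoryLearner:
    """Incremental nearest-class-mean classifier over precomputed feature vectors."""
    def __init__(self):
        self.examples = {}                      # category name -> list of feature vectors

    def teach(self, category, feature):
        """Add one labeled example (e.g. provided interactively by a non-expert user)."""
        self.examples.setdefault(category, []).append(np.asarray(feature, dtype=float))

    def recognize(self, feature):
        """Return the category whose mean representation is closest to the query feature."""
        if not self.examples:
            return None
        feature = np.asarray(feature, dtype=float)
        means = {c: np.mean(v, axis=0) for c, v in self.examples.items()}
        return min(means, key=lambda c: np.linalg.norm(means[c] - feature))

# Usage: teach two categories from a few examples each, then query a new observation.
learner = OpenEndedCategoryLearner()
learner.teach("mug", [0.9, 0.1]); learner.teach("mug", [0.8, 0.2])
learner.teach("bottle", [0.1, 0.9])
print(learner.recognize([0.85, 0.15]))          # -> "mug"
```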

    From Form to Function: Detecting the Affordance of Tool Parts using Geometric Features and Material Cues

    Get PDF
    With recent advances in robotics, general-purpose robots like Baxter are quickly becoming a reality. As robots begin to collaborate with humans in everyday workspaces, they will need to understand the functions of objects and their parts. To cut an apple or hammer a nail, robots need to not just know a tool’s name, but they must find its parts and identify their potential functions, or affordances. As Gibson remarked, “If you know what can be done with a[n] object, what it can be used for, you can call it whatever you please.” We hypothesize that the geometry of a part is closely related to its affordance, since its geometric properties govern the possible physical interactions with the environment. In the first part of this thesis, we investigate how the affordances of tool parts can be predicted using geometric features from RGB-D sensors like Kinect. We develop several approaches to learn affordance from geometric features: superpixel-based hierarchical sparse coding, structured random forests, and convolutional neural networks. To evaluate the proposed methods, we construct a large RGB-D dataset where parts are labeled with multiple affordances. Experiments over sequences containing clutter, occlusions, and viewpoint changes show that the approaches provide precise predictions that can be used in robotics applications. In addition to geometry, the material properties of a part also determine its potential functions. In the second part of this thesis, we investigate how material cues can be integrated into a deep learning framework for affordance prediction. We propose a modular approach for combining high-level material information, or other mid-level cues, in order to improve affordance predictions. We present experiments which demonstrate the efficacy of our approach on an expanded RGB-D dataset, which includes data from non-tool objects and multiple depth sensors. The work presented in this thesis lays a foundation for the development of robots which can predict the potential functions of tool parts, and provides a basis for higher-level reasoning about affordance.
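    The modular combination of geometric and material cues for per-pixel affordance prediction can be sketched as two separate encoders whose features are fused before a per-pixel classifier. This is a toy illustration of the design pattern, not the thesis architecture; the layer sizes, input encodings, and affordance count are assumptions.

```python
import torch
import torch.nn as nn

class ModularAffordanceNet(nn.Module):
    """A geometry stream (depth) and a material-cue stream (RGB or a material map) are
    encoded separately, concatenated, and decoded into per-pixel affordance labels
    (e.g. grasp, cut, pound)."""
    def __init__(self, num_affordances=7):
        super().__init__()
        self.geometry = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())   # depth input
        self.material = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())   # material-cue input
        self.classifier = nn.Conv2d(32, num_affordances, 1)                         # per-pixel logits

    def forward(self, depth, rgb):
        fused = torch.cat([self.geometry(depth), self.material(rgb)], dim=1)
        return self.classifier(fused)           # (B, num_affordances, H, W)

# Usage: per-pixel affordance scores for one RGB-D frame.
logits = ModularAffordanceNet()(torch.rand(1, 1, 120, 160), torch.rand(1, 3, 120, 160))
labels = logits.argmax(dim=1)                   # most likely affordance at each pixel
```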