
    Point Pair Feature based Object Detection for Random Bin Picking

    Point pair features are a popular representation for free-form 3D object detection and pose estimation. In this paper, their performance in an industrial random bin picking context is investigated. A new method to generate representative synthetic datasets is proposed, which allows us to investigate the influence of a high degree of clutter and the presence of self-similar features, both typical of our application. We provide an overview of solutions proposed in the literature and discuss their strengths and weaknesses. A simple heuristic that drastically reduces the computational complexity is introduced, resulting in improved robustness, speed, and accuracy compared to the naive approach.
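    For readers unfamiliar with the representation, a minimal sketch of the classic four-component point pair feature (the distance between two oriented points plus three angles, as in Drost et al.) is given below; the function name and tolerances are ours, not the paper's.

```python
import numpy as np

def point_pair_feature(p1, n1, p2, n2):
    """Four-dimensional point pair feature: distance between the two points
    and three angles relating the surface normals to the difference vector."""
    d = p2 - p1
    dist = np.linalg.norm(d)
    if dist < 1e-9:
        return None  # degenerate pair; skip it

    def angle(a, b):
        # Angle between two vectors, clamped for numerical safety.
        c = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        return np.arccos(np.clip(c, -1.0, 1.0))

    return np.array([dist, angle(n1, d), angle(n2, d), angle(n1, n2)])
```

    Quantising each of the four components yields a hash key under which model point pairs are stored, which is what makes detection by voting efficient.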

    GPU-based proximity query processing on unstructured triangular mesh model

    This paper presents a novel proximity query (PQ) approach that detects collisions and computes the minimal Euclidean distance between two non-convex 3D objects, namely the robot and its environment. Such queries are computationally demanding but important to many applications, such as online simulation of haptic feedback and collision-free robot trajectory planning. Our approach preserves the representation of the unstructured environment in the form of triangular meshes. The proposed PQ algorithm is data-parallel, so it can be implemented effectively on graphics processing units (GPUs). A customized GPU-based computation scheme is developed, which runs more than 200 times faster than an optimized single-core CPU implementation. Comprehensive validation is conducted on two simulated scenarios to demonstrate the practical value of potential applications in image-guided surgical robotics and humanoid robot control.
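    The property the paper exploits is that every primitive-pair test is independent of all others, which is what makes the query data-parallel. The sketch below shows that structure on the CPU with NumPy, using vertex-to-vertex distances only as a simplification of the paper's triangle-level test; on a GPU, each pair would map to one thread. All names are ours.

```python
import numpy as np

def min_vertex_distance(verts_a, verts_b, block=1024):
    """Brute-force minimum vertex-to-vertex distance between two meshes.
    Every (i, j) pair is independent, the property that lets the real
    algorithm evaluate one pair per GPU thread."""
    best = np.inf
    for start in range(0, len(verts_a), block):       # block to bound memory
        chunk = verts_a[start:start + block]          # (b, 3)
        diff = chunk[:, None, :] - verts_b[None, :, :]  # (b, m, 3)
        d2 = np.einsum('ijk,ijk->ij', diff, diff)     # squared distances
        best = min(best, float(d2.min()))
    return np.sqrt(best)
```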

    UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy

    In this work, we tackle the problem of learning universal robotic dexterous grasping from a point cloud observation in a table-top setting. The goal is to grasp and lift objects in high-quality and diverse ways and to generalize across hundreds of categories, including unseen ones. Inspired by successful pipelines for parallel-gripper grasping, we split the task into two stages: 1) grasp proposal (pose) generation and 2) goal-conditioned grasp execution. For the first stage, we propose a novel probabilistic model of grasp pose, conditioned on the point cloud observation, that factorizes rotation from translation and articulation. Trained on our synthesized large-scale dexterous grasp dataset, this model enables us to sample diverse, high-quality dexterous grasp poses for the object point cloud. For the second stage, we propose to replace the motion planning used in parallel-gripper grasping with a goal-conditioned grasp policy, due to the complexity of dexterous grasping execution. Learning such a highly generalizable grasp policy that takes only realistic inputs, without oracle states, is very challenging. We therefore propose several important innovations, including state canonicalization, an object curriculum, and teacher-student distillation. Integrating the two stages, our final pipeline is the first to achieve universal generalization for dexterous grasping, demonstrating an average success rate of more than 60% on thousands of object instances, significantly outperforming all baselines while showing only a minimal generalization gap.
    Comment: Accepted to CVPR 2023
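    The factorisation in the first stage can be pictured as a network that predicts rotation first and then conditions translation and hand articulation on it. The skeleton below shows only that conditioning order; the paper's actual probabilistic sampling heads and point cloud encoder are not reproduced, and all module names and dimensions are assumptions.

```python
import torch

class FactorizedGraspSampler(torch.nn.Module):
    """Skeleton of a grasp proposal model that factorises rotation from
    translation and articulation (illustrative, not the authors' network)."""
    def __init__(self, feat_dim=256, n_joints=22):
        super().__init__()
        self.encoder = torch.nn.Sequential(            # stand-in point encoder
            torch.nn.Linear(3, 64), torch.nn.ReLU(),
            torch.nn.Linear(64, feat_dim))
        self.rot_head = torch.nn.Linear(feat_dim, 6)   # 6D rotation parameters
        self.trans_art_head = torch.nn.Linear(feat_dim + 6, 3 + n_joints)

    def forward(self, points):                         # points: (B, N, 3)
        feat = self.encoder(points).max(dim=1).values  # global shape feature
        rot = self.rot_head(feat)                      # rotation predicted first
        ta = self.trans_art_head(torch.cat([feat, rot], dim=-1))
        return rot, ta[:, :3], ta[:, 3:]               # R, t, joint angles
```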

    Sensing Highly Non-Rigid Objects with RGBD Sensors for Robotic Systems

    The goal of this research is to enable a robotic system to manipulate clothing and other highly non-rigid objects using an RGBD sensor. The focus of this thesis is to define and test various algorithms and models that solve parts of the laundry process (i.e., handling, classifying, sorting, unfolding, and folding).

    First, a system is presented for automatically extracting and classifying items in a pile of laundry. Using only visual sensors, the robot identifies and extracts items sequentially from the pile. When an item is removed and isolated, a model of its shape and appearance is captured and compared against a dataset of known items. The contributions of this part of the laundry process are a novel method for extracting articles of clothing from a pile of laundry, a novel method of classifying clothing using interactive perception, and a multi-layer approach termed L-M-H (more specifically L-C-S-H) for clothing classification. This thesis describes two different approaches to classifying clothing into categories. The first approach relies upon silhouettes, edges, and other low-level image measurements of the articles of clothing. Experiments with this approach demonstrate the ability of the system to efficiently classify and label items into one of six categories (pants, shorts, short-sleeve shirt, long-sleeve shirt, socks, or underwear). These results show that, on average, classification rates using robot interaction are 59% higher than those without interaction. The second approach relies upon color, texture, shape, and edge information from 2D and 3D data within local and global perspectives. The multi-layer approach compartmentalizes the problem into a high (H) layer, multiple mid-level layers (characteristics (C) and selection masks (S)), and a low (L) layer, producing 'local' solutions that together solve the global classification problem. Experiments demonstrate the ability of the system to efficiently classify each article of clothing into one of seven categories (pants, shorts, shirts, socks, dresses, cloths, or jackets). The results presented show that, on average, the classification rates improve by 27.47% for three categories, 17.90% for four categories, and 10.35% for seven categories over the baseline system, using support vector machines.

    Second, an algorithm is presented for automatically unfolding a piece of clothing. The cloth is pulled in different directions at various points in order to flatten it. Features of the cloth, including the peak region, corner locations, and the continuity or discontinuity of the cloth, are extracted to determine a valid location and orientation at which to interact with it. A two-stage algorithm is presented, introducing a novel solution to the unfolding/flattening problem using interactive perception. Simulations in 3D simulation software and experiments with robot hardware demonstrate the ability of the algorithm to flatten pieces of laundry from different starting configurations. These results show that, in the best case, the algorithm flattens a piece of cloth from 11.1% to 95.6% of the canonical configuration.

    Third, an energy minimization algorithm is presented for estimating the configuration of a deformable object. This approach uses an RGBD image to calculate feature correspondences (using SURF features), depth values, and boundary locations. Input from a Kinect sensor is used to segment the deformable surface from the background using an alpha-beta swap algorithm. Using this segmentation, the system creates an initial mesh model without prior information about the surface geometry, and it reinitializes the configuration of the mesh model after a loss of input data. The approach handles in-plane rotation, out-of-plane rotation, and varying changes in translation and scale. Results are presented over a dataset consisting of seven shirts, two pairs of shorts, two posters, and a pair of pants. The approach is compared against a simulated shirt model by computing the mean square error of the distances from the mesh model vertices to the ground truth provided by the simulation.
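    As a concrete illustration of the support-vector-machine layer used for the category decisions, a minimal multi-class SVM over the extracted colour, texture, shape, and edge features might look as follows; the feature extraction itself is not reproduced, and the pipeline choices are ours, not the thesis's.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def train_clothing_classifier(X, y):
    """X: (n_samples, n_features) matrix of per-garment descriptors;
    y: category labels such as 'pants', 'shorts', 'shirts', ...
    Returns a fitted multi-class SVM (one-vs-one under the hood)."""
    clf = make_pipeline(StandardScaler(), SVC(kernel='rbf', C=10.0))
    clf.fit(X, y)
    return clf
```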

    Tactile mesh saliency

    While the concept of visual saliency has previously been explored in mesh and image processing, saliency detection also applies to other sensory stimuli. In this paper, we explore the problem of tactile mesh saliency, where we define salient points on a virtual mesh as those that a human is more likely to grasp, press, or touch if the mesh were a real-world object. We solve the problem of taking a 3D mesh as input and computing the relative tactile saliency of every mesh vertex. Since it is difficult to manually define a tactile saliency measure, we introduce a crowdsourcing and learning framework. Because it is typically easier for humans to provide relative rankings of saliency between vertices than absolute values, we collect crowdsourced data in the form of such relative rankings and take a learning-to-rank approach. We develop a new formulation that combines deep learning and learning-to-rank methods to compute a tactile saliency measure. We demonstrate our framework on a variety of 3D meshes and in various applications, including material suggestion for rendering and fabrication.
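    The learning-to-rank step can be illustrated with a pairwise loss: each crowdsourced judgement states that vertex a should score higher than vertex b, and the network is penalised when it disagrees. The margin form below is one common choice, not necessarily the paper's exact formulation.

```python
import torch

def pairwise_ranking_loss(score_a, score_b, margin=1.0):
    """Hinge-style pairwise ranking loss: pushes the predicted saliency of
    vertex a above that of vertex b by at least `margin`. score_a/score_b
    are batched network outputs for the two vertices of each ranked pair."""
    return torch.clamp(margin - (score_a - score_b), min=0.0).mean()
```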

    Learning meshes for dense visual SLAM

    Estimating the motion and surrounding geometry of a moving camera remains a challenging inference problem. From an information-theoretic point of view, estimates should improve as more information is included, as is done in dense SLAM, but this depends strongly on the validity of the underlying models. In this paper, we use triangular meshes as a compact yet dense geometry representation. To allow simple and fast usage, we propose a view-based formulation in which the in-plane vertex coordinates are predicted directly from images, while the remaining vertex depth components are treated as free variables. Flexible and continuous integration of information is achieved through a residual-based inference technique: a factor graph encodes all information as mappings from the free variables to residuals, the squared sum of which is minimised during inference. We propose several types of learnable residuals, trained end-to-end to increase their suitability as information-bearing models and to enable accurate and reliable estimation. A detailed evaluation of all components on both synthetic and real data confirms the practicality of the presented approach.
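    The residual-based inference can be summarised in a few lines: every factor maps the free variables (here, for example, vertex depths and camera motion) to a residual vector, and the squared sum of all residuals is minimised. A generic stand-in using SciPy is sketched below; the paper's learnable residuals would replace the plain functions, and the solver choice is ours.

```python
import numpy as np
from scipy.optimize import least_squares

def solve_factor_graph(residual_fns, x0):
    """Minimise the squared sum of all factor residuals over the free
    variables x, as in residual-based (factor graph) inference."""
    def stacked(x):
        return np.concatenate([f(x) for f in residual_fns])
    return least_squares(stacked, x0).x

# Toy example with two factors over two free variables:
solution = solve_factor_graph(
    [lambda x: np.array([x[0] - 1.0]),             # prior on x[0]
     lambda x: np.array([0.5 * (x[1] - x[0])])],   # smoothness factor
    x0=np.zeros(2))                                # converges to [1, 1]
```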