8 research outputs found

    Reaching and grasping kitchenware objects

    Get PDF
    We integrate software components that allow ef- ficient and successful grasping of kitchenware objects. The contributed components include: The object pose detector, the gripper reaching motion and the grasp hypothesis selection. The object pose detector of Drost et. al. [10] is improved, considering rotationally symmetric objects. The reaching motion execution combines two independent dynamical systems: The approach direction system and its tangent space [21]. The coupling provides a robust reaching component that copes with several gripper configurations. The grasp hypothesis selection filters the object poses by considering the table orientation

    Structure learning of graphical models for task-oriented robot grasping

    Get PDF
    In the collective imaginaries a robot is a human like machine as any androids in science fiction. However the type of robots that you will encounter most frequently are machinery that do work that is too dangerous, boring or onerous. Most of the robots in the world are of this type. They can be found in auto, medical, manufacturing and space industries. Therefore a robot is a system that contains sensors, control systems, manipulators, power supplies and software all working together to perform a task. The development and use of such a system is an active area of research and one of the main problems is the development of interaction skills with the surrounding environment, which include the ability to grasp objects. To perform this task the robot needs to sense the environment and acquire the object informations, physical attributes that may influence a grasp. Humans can solve this grasping problem easily due to their past experiences, that is why many researchers are approaching it from a machine learning perspective finding grasp of an object using information of already known objects. But humans can select the best grasp amongst a vast repertoire not only considering the physical attributes of the object to grasp but even to obtain a certain effect. This is why in our case the study in the area of robot manipulation is focused on grasping and integrating symbolic tasks with data gained through sensors. The learning model is based on Bayesian Network to encode the statistical dependencies between the data collected by the sensors and the symbolic task. This data representation has several advantages. It allows to take into account the uncertainty of the real world, allowing to deal with sensor noise, encodes notion of causality and provides an unified network for learning. Since the network is actually implemented and based on the human expert knowledge, it is very interesting to implement an automated method to learn the structure as in the future more tasks and object features can be introduced and a complex network design based only on human expert knowledge can become unreliable. Since structure learning algorithms presents some weaknesses, the goal of this thesis is to analyze real data used in the network modeled by the human expert, implement a feasible structure learning approach and compare the results with the network designed by the expert in order to possibly enhance it

    Predicting human intention in visual observations of hand/object interactions

    Full text link
    Abstract—The main contribution of this paper is a prob-abilistic method for predicting human manipulation intention from image sequences of human-object interaction. Predicting intention amounts to inferring the imminent manipulation task when human hand is observed to have stably grasped the object. Inference is performed by means of a probabilistic graphical model that encodes object grasping tasks over the 3D state of the observed scene. The 3D state is extracted from RGB-D image sequences by a novel vision-based, markerless hand-object 3D tracking framework. To deal with the high-dimensional state-space and mixed data types (discrete and continuous) involved in grasping tasks, we introduce a generative vector quantization method using mixture models and self-organizing maps. This yields a compact model for encoding of grasping actions, able of handling uncertain and partial sensory data. Experimentation showed that the model trained on simulated data can provide a potent basis for accurate goal-inference with partial and noisy observations of actual real-world demonstrations. We also show a grasp selection process, guided by the inferred human intention, to illustrate the use of the system for goal-directed grasp imitation. I

    Unocculuded object grasping by using visual data

    Get PDF
    Automatic grasping objects can become important in the areas such as industrial processes, processes which are dangerous for human, or the operations which should be executed in the places, small for people work. In this study, it is aimed to design a robotic system for grasping unocculuded certain objects by using visual data. For this aim an experimental process was implemented. Visual data process can be divided in two main parts: identification and three dimensional positioning. Identification issue suffers from several conditions as rotation, camera position, and location of the subject in the frame. Also obtaining the features invariant from these conditions is important. Therefore Zernike moment method can be used to overcome these negativities. In order to identify the objects an artificial neural network was used to classify the objects by using Zernike moment coefficients. In the experimental system a parallel axis stereovision subsystem, a DSPFPGA embedded media processor, and five-axis robot arm were used. The success rate of artificial neural network was 98%. After identifying the objects, a sequential algebra were performed in the DSP part of the media processor and the position of the object according to robot arm reference point was extracted. After all, desired object in the instant frame was grasped and placed in different location by the robot arm

    3D object retrieval and segmentation: various approaches including 2D poisson histograms and 3D electrical charge distributions.

    Get PDF
    Nowadays 3D models play an important role in many applications: viz. games, cultural heritage, medical imaging etc. Due to the fast growth in the number of available 3D models, understanding, searching and retrieving such models have become interesting fields within computer vision. In order to search and retrieve 3D models, we present two different approaches: one is based on solving the Poisson Equation over 2D silhouettes of the models. This method uses 60 different silhouettes, which are automatically extracted from different viewangles. Solving the Poisson equation for each silhouette assigns a number to each pixel as its signature. Accumulating these signatures generates a final histogram-based descriptor for each silhouette, which we call a SilPH (Silhouette Poisson Histogram). For the second approach, we propose two new robust shape descriptors based on the distribution of charge density on the surface of a 3D model. The Finite Element Method is used to calculate the charge density on each triangular face of each model as a local feature. Then we utilize the Bag-of-Features and concentric sphere frameworks to perform global matching using these local features. In addition to examining the retrieval accuracy of the descriptors in comparison to the state-of-the-art approaches, the retrieval speeds as well as robustness to noise and deformation on different datasets are investigated. On the other hand, to understand new complex models, we have also utilized distribution of electrical charge for proposing a system to decompose models into meaningful parts. Our robust, efficient and fully-automatic segmentation approach is able to specify the segments attached to the main part of a model as well as locating the boundary parts of the segments. The segmentation ability of the proposed system is examined on the standard datasets and its timing and accuracy are compared with the existing state-of-the-art approaches

    Self-supervised learning for transferable representations

    Get PDF
    Machine learning has undeniably achieved remarkable advances thanks to large labelled datasets and supervised learning. However, this progress is constrained by the labour-intensive annotation process. It is not feasible to generate extensive labelled datasets for every problem we aim to address. Consequently, there has been a notable shift in recent times toward approaches that solely leverage raw data. Among these, self-supervised learning has emerged as a particularly powerful approach, offering scalability to massive datasets and showcasing considerable potential for effective knowledge transfer. This thesis investigates self-supervised representation learning with a strong focus on computer vision applications. We provide a comprehensive survey of self-supervised methods across various modalities, introducing a taxonomy that categorises them into four distinct families while also highlighting practical considerations for real-world implementation. Our focus thenceforth is on the computer vision modality, where we perform a comprehensive benchmark evaluation of state-of-the-art self supervised models against many diverse downstream transfer tasks. Our findings reveal that self-supervised models often outperform supervised learning across a spectrum of tasks, albeit with correlations weakening as tasks transition beyond classification, particularly for datasets with distribution shifts. Digging deeper, we investigate the influence of data augmentation on the transferability of contrastive learners, uncovering a trade-off between spatial and appearance-based invariances that generalise to real-world transformations. This begins to explain the differing empirical performances achieved by self-supervised learners on different downstream tasks, and it showcases the advantages of specialised representations produced with tailored augmentation. Finally, we introduce a novel self-supervised pre-training algorithm for object detection, aligning pre-training with downstream architecture and objectives, leading to reduced localisation errors and improved label efficiency. In conclusion, this thesis contributes a comprehensive understanding of self-supervised representation learning and its role in enabling effective transfer across computer vision tasks
    corecore