
    Deep Learning 3D Scans for Footwear Fit Estimation from a Single Depth Map

    In clothing, and particularly in footwear, the variance in the size and shape of people and of clothing poses the problem of how to match items of clothing to a person. This is especially important in footwear, as fit is highly dependent on foot shape, which is not fully captured by shoe size. 3D scanning can be used to determine detailed, personalized shape information, which can then be matched against product shape for a more personalized footwear matching experience. In current implementations, however, this process is typically expensive and cumbersome. Typical scanning techniques require that a camera capture an object from many views in order to reconstruct its shape, which usually requires either many cameras or a moving camera system, both of which are complex engineering tasks to construct. Ideally, to reduce the cost and complexity of scanning systems as much as possible, only a single image from a single camera would be needed. With recent techniques, semantics such as knowing the kind of object in view can be leveraged to determine the full 3D shape from incomplete information. Deep learning methods have been shown to reconstruct 3D shape from limited inputs for highly symmetrical objects such as furniture and vehicles. We apply a deep learning approach to the domain of foot scanning and present methods to reconstruct a 3D point cloud from a single input depth map. Anthropomorphic body parts can be challenging due to their irregular shapes, difficulty of parameterization and limited symmetries. We present two methods leveraging deep learning models to produce complete foot scans from a single input depth map. We utilize 3D data from MPII Human Shape, based on the CAESAR database, and train deep neural networks to learn anthropomorphic shape representations. Our first method completes the point cloud supplied by the input depth map by directly synthesizing the remaining information. We show that this method is capable of synthesizing the remainder of a point cloud with accuracies of 2.92±0.72 mm, improved to 2.55±0.75 mm with an updated network architecture. Our second method fully synthesizes a complete point cloud foot scan from multiple virtual viewpoints. We show that this method can produce foot scans with accuracies of 1.55±0.41 mm from a single input depth map. We performed additional experiments on real-world foot scans captured using Kinect Fusion. We find that, despite being trained only on a low-resolution representation of foot shape, our models are able to recognize and synthesize reasonable complete point cloud scans. Our results suggest that our methods can be extended to work in the real world, given additional domain-specific data.
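    The following is a minimal sketch, in a PyTorch-style setup, of the core idea behind the first method: encode a single depth map into a latent code and decode a point set for the missing geometry, supervised with a Chamfer-style loss against the ground-truth scan. The layer sizes, the 128x128 depth resolution and the 2048-point output are illustrative assumptions, not the architecture used in the thesis.

import torch
import torch.nn as nn

class DepthToPoints(nn.Module):
    """Encode a single-channel depth map and decode an N x 3 point set (illustrative sizes)."""
    def __init__(self, n_points=2048):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.decoder = nn.Sequential(
            nn.Linear(128, 512), nn.ReLU(),
            nn.Linear(512, n_points * 3))
        self.n_points = n_points

    def forward(self, depth):                        # depth: (B, 1, 128, 128)
        return self.decoder(self.encoder(depth)).view(-1, self.n_points, 3)

def chamfer_distance(pred, gt):
    """Symmetric Chamfer distance between batched point sets (B, N, 3) and (B, M, 3)."""
    d = torch.cdist(pred, gt)                        # (B, N, M) pairwise distances
    return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()

# Illustrative training step on random tensors standing in for real scans.
model = DepthToPoints()
depth = torch.rand(4, 1, 128, 128)                   # single input depth maps
target = torch.rand(4, 4096, 3)                      # ground-truth foot point clouds
loss = chamfer_distance(model(depth), target)
loss.backward()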

    DVGG: Deep Variational Grasp Generation for Dextrous Manipulation

    Grasping with anthropomorphic robotic hands involves far richer hand-object interactions than parallel-jaw grippers. Modeling these hand-object interactions is essential to the study of multi-finger hand dextrous manipulation. This work presents DVGG, an efficient grasp generation network that takes a single-view observation as input and predicts high-quality grasp configurations for unknown objects. Our generative model consists of three components: 1) point cloud completion for the target object based on the partial observation; 2) generation of diverse sets of grasps given the complete point cloud; 3) iterative grasp pose refinement for physically plausible grasp optimization. To train our model, we build a large-scale grasping dataset that contains about 300 common object models with 1.5M annotated grasps in simulation. Experiments in simulation show that our model can predict robust grasp poses with a wide variety and a high success rate. Real robot platform experiments demonstrate that the model trained on our dataset performs well in the real world. Remarkably, our method achieves a grasp success rate of 70.7% for novel objects on the real robot platform, a significant improvement over the baseline methods. Comment: Accepted by Robotics and Automation Letters (RA-L), 2021.
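    As a rough structural sketch of the three-stage pipeline described above (completion, diverse grasp sampling, iterative refinement), the snippet below wires the stages together around a CVAE-style grasp sampler. The class and function names, the 22-joint-plus-wrist grasp parameterisation and the latent size are illustrative assumptions, not the published DVGG implementation.

import torch
import torch.nn as nn

class GraspSampler(nn.Module):
    """CVAE-style decoder: draw a latent code and condition it on an object shape code."""
    def __init__(self, shape_dim=256, latent_dim=64, grasp_dim=22 + 7):
        super().__init__()
        self.latent_dim = latent_dim
        self.decoder = nn.Sequential(
            nn.Linear(shape_dim + latent_dim, 256), nn.ReLU(),
            nn.Linear(256, grasp_dim))               # finger joint angles + wrist pose

    def sample(self, shape_code, n_grasps=32):
        z = torch.randn(n_grasps, self.latent_dim)
        cond = shape_code.expand(n_grasps, -1)
        return self.decoder(torch.cat([cond, z], dim=-1))

def grasp_pipeline(partial_cloud, complete_fn, encode_fn, sampler, refine_fn, steps=10):
    """1) complete the partial cloud, 2) sample diverse grasps, 3) refine them iteratively."""
    full_cloud = complete_fn(partial_cloud)          # stage 1: point cloud completion
    grasps = sampler.sample(encode_fn(full_cloud))   # stage 2: diverse grasp hypotheses
    for _ in range(steps):                           # stage 3: physically plausible refinement
        grasps = refine_fn(grasps, full_cloud)       # e.g. a gradient step on a contact score
    return grasps

# Toy usage with stand-in components (identity completion, constant shape code, no-op refine).
grasps = grasp_pipeline(
    partial_cloud=torch.rand(1024, 3),
    complete_fn=lambda pc: pc,
    encode_fn=lambda pc: torch.zeros(1, 256),
    sampler=GraspSampler(),
    refine_fn=lambda g, pc: g)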

    Generalized Anthropomorphic Functional Grasping with Minimal Demonstrations

    This article investigates the challenge of achieving functional tool-use grasping with high-DoF anthropomorphic hands, with the aim of enabling such hands to perform tasks that require human-like manipulation and tool-use. Accomplishing human-like grasping on real robots presents many challenges, including obtaining diverse functional grasps for a wide variety of objects, generalizing across kinematically diverse robot hands, and precisely completing object shapes from single-view perception. To tackle these challenges, we propose a six-step grasp synthesis algorithm based on fine-grained contact modeling that generates physically plausible and human-like functional grasps for category-level objects with minimal human demonstrations. With contact-based optimization and learned dense shape correspondence, the proposed algorithm is adaptable to various objects in the same category and a broad range of robot hand models. To further demonstrate the robustness of the framework, over 10K functional grasps are synthesized to train our neural network, named DexFG-Net, which generates diverse sets of human-like functional grasps based on the reconstructed object model produced by a shape completion module. The proposed framework is extensively validated in simulation and on a real robot platform. Simulation experiments demonstrate that our method outperforms baseline methods by a large margin in terms of grasp functionality and success rate. Real robot experiments show that our method achieves overall success rates of 79% and 68% for tool-use grasps on 3D-printed and real test objects, respectively, using a 5-Finger Schunk Hand. The experimental results indicate a step towards human-like grasping with anthropomorphic hands. Comment: 20 pages, 23 figures and 7 tables.
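    To make the idea of fine-grained contact modelling concrete, here is a minimal sketch of the kind of contact-based cost such an optimization might minimise: predefined fingertip contact points are pulled towards corresponding target contacts on the object surface while penetration into the object is penalised. The correspondence source, the signed-distance helper and the weights are illustrative assumptions, not the paper's six-step algorithm.

import numpy as np

def contact_cost(fingertips, target_contacts, sdf, w_contact=1.0, w_pen=10.0):
    """fingertips, target_contacts: (K, 3) arrays; sdf(points) -> signed distances (K,)."""
    attraction = np.sum(np.linalg.norm(fingertips - target_contacts, axis=1) ** 2)
    penetration = np.sum(np.clip(-sdf(fingertips), 0.0, None) ** 2)   # points inside the object
    return w_contact * attraction + w_pen * penetration

# Example: a sphere of radius 0.05 m stands in for the object, with three fingertip contacts.
sphere_sdf = lambda p: np.linalg.norm(p, axis=1) - 0.05
tips = np.array([[0.06, 0.0, 0.0], [0.0, 0.04, 0.0], [0.0, 0.0, 0.07]])
targets = np.array([[0.05, 0.0, 0.0], [0.0, 0.05, 0.0], [0.0, 0.0, 0.05]])
print(contact_cost(tips, targets, sphere_sdf))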

    Reconstruction and recognition of confusable models using three-dimensional perception

    Perception is one of the key topics in robotics research. It is about the processing of external sensor data and its interpretation. The need for fully autonomous robots makes it crucial to help them perform tasks more reliably, flexibly, and efficiently. As these platforms obtain more refined manipulation capabilities, they also require expressive and comprehensive environment models: for manipulation and affordance purposes, their models have to include every object present in the world, together with its location, pose, shape and other aspects. This dissertation provides solutions to several of the challenges that arise when tackling the object grasping problem, with the aim of improving the autonomy of the mobile manipulator robot MANFRED-2. Through the analysis and interpretation of 3D perception, this thesis first covers the localization of supporting planes in the scene. As the environment contains many other things apart from the planar surface, the problem in cluttered scenarios has been solved by means of Differential Evolution, a particle-based evolutionary algorithm that evolves over time towards the solution that yields the lowest value of the cost function. Since the final purpose of this thesis is to provide valuable information for grasping applications, a complete model reconstructor has been developed. The proposed method offers many features, such as robustness against abrupt rotations, multi-dimensional optimization, feature extensibility, compatibility with other scan matching techniques, management of uncertain information and an initialization process to reduce convergence time. It has been designed using an evolutionary scan matching optimizer that takes into account surface features of the object, its global form, and texture and color information. The last challenge tackled concerns the recognition problem. In order to provide the robot with useful information about the environment, a meta-classifier that efficiently discerns the observed objects has been implemented. It is capable of distinguishing between confusable objects, such as mugs or dishes with similar shapes but different size or color. The contributions presented in this thesis have been fully implemented and empirically evaluated on the platform. A continuous grasping pipeline covering from perception to grasp planning, including visual recognition of confusable objects, has been developed. For that purpose, an indoor environment with several objects on a table is set up in the vicinity of the robot. Items are recognized from a database and, if one is chosen, the robot calculates how to grasp it, taking into account the kinematic restrictions associated with the anthropomorphic hand and the 3D model of that particular object.
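    As a minimal sketch of the supporting-plane localization step described above, the snippet below uses SciPy's Differential Evolution to fit a plane to a cluttered point cloud. The plane is parameterised by two angles and an offset, and the cost uses a truncated point-to-plane distance so that clutter contributes little; the truncation threshold, the bounds and the synthetic scene are illustrative assumptions, not the cost function used in the thesis.

import numpy as np
from scipy.optimize import differential_evolution

def plane_cost(params, points, trunc=0.02):
    """Truncated mean point-to-plane distance for a plane given by (theta, phi, d)."""
    theta, phi, d = params
    normal = np.array([np.sin(theta) * np.cos(phi),
                       np.sin(theta) * np.sin(phi),
                       np.cos(theta)])
    dist = np.abs(points @ normal + d)
    return np.mean(np.minimum(dist, trunc))      # truncation keeps clutter from dominating

# Synthetic scene: a table plane at z = 0.7 m plus random clutter above it (illustrative data).
rng = np.random.default_rng(0)
table = np.c_[rng.uniform(-0.5, 0.5, (500, 2)), np.full((500, 1), 0.7)]
clutter = rng.uniform([-0.5, -0.5, 0.7], [0.5, 0.5, 1.0], (200, 3))
cloud = np.vstack([table, clutter])

result = differential_evolution(plane_cost,
                                bounds=[(0, np.pi), (-np.pi, np.pi), (-2, 2)],
                                args=(cloud,), seed=0)
print(result.x)                                  # recovered (theta, phi, d)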

    Bridging the gap between reconstruction and synthesis

    3D reconstruction and image synthesis are two of the main pillars of computer vision. Early works focused on simple tasks such as multi-view reconstruction and texture synthesis. With the advent of Deep Learning, the field has progressed rapidly, making it possible to achieve more complex and higher-level tasks. For example, the 3D reconstruction results of traditional multi-view approaches can now be obtained with single-view methods. Similarly, early pattern-based texture synthesis works have led to techniques that allow generating novel high-resolution images. In this thesis we have developed a hierarchy of tools that covers this whole range of problems, lying at the intersection of computer vision, graphics and machine learning. We tackle the problem of 3D reconstruction and synthesis in the wild. Importantly, we advocate a paradigm in which not everything should be learned. Instead of applying Deep Learning naively, we propose novel representations, layers and architectures that directly embed prior 3D geometric knowledge for the task of 3D reconstruction and synthesis. We apply these techniques to problems including scene/person reconstruction and photo-realistic rendering. We first address methods to reconstruct a scene and the clothed people in it while estimating the camera position. Then, we tackle image and video synthesis for clothed people in the wild. Finally, we bridge the gap between reconstruction and synthesis under the umbrella of a single novel formulation. Extensive experiments conducted throughout this thesis show that the proposed techniques improve the performance of Deep Learning models in terms of the quality of the reconstructed 3D shapes / synthesised images, while reducing the amount of supervision and training data required to train them. In summary, we provide a variety of low-, mid- and high-level algorithms that can be used to incorporate prior knowledge into different stages of the Deep Learning pipeline and improve performance in tasks of 3D reconstruction and image synthesis.
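    As one small, generic example of embedding prior geometric knowledge in a network instead of learning it, the snippet below implements a fixed, differentiable pinhole projection "layer" that maps predicted 3D points into the image plane so that losses can be applied in 2D. The intrinsics values are illustrative assumptions; the thesis's actual representations and layers are not reproduced here.

import torch

def project_points(points_3d, fx, fy, cx, cy):
    """Pinhole projection of (B, N, 3) camera-frame points to (B, N, 2) pixel coordinates."""
    x, y, z = points_3d.unbind(dim=-1)
    z = z.clamp(min=1e-6)                        # guard against points at (or behind) the camera
    return torch.stack([fx * x / z + cx, fy * y / z + cy], dim=-1)

# Gradients flow through the fixed geometric layer back to the 3D prediction.
pred = torch.rand(2, 100, 3, requires_grad=True)
pixels = project_points(pred + torch.tensor([0.0, 0.0, 2.0]),
                        fx=500.0, fy=500.0, cx=320.0, cy=240.0)   # illustrative intrinsics
pixels.sum().backward()
print(pred.grad.shape)                           # torch.Size([2, 100, 3])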

    Scene understanding by robotic interactive perception

    This thesis presents a novel and generic visual architecture for scene understanding by robotic interactive perception. The proposed visual architecture is fully integrated into autonomous systems performing object perception and manipulation tasks, and it uses interaction with the scene in order to improve scene understanding substantially over non-interactive models. Specifically, this thesis presents two experimental validations of an autonomous system interacting with the scene: firstly, an autonomous gaze control model is investigated, where the vision sensor directs its gaze to satisfy a scene exploration task; secondly, autonomous interactive perception is investigated, where objects in the scene are repositioned by robotic manipulation. The proposed visual architecture for scene understanding involving perception and manipulation tasks has four components: 1) A reliable vision system; 2) Camera and hand-eye calibration to integrate the vision system into an autonomous robot’s kinematic frame chain; 3) A visual model performing perception tasks and providing the knowledge required for interaction with the scene; and finally, 4) A manipulation model which, using knowledge received from the perception model, chooses an appropriate action (from a set of simple actions) to satisfy a manipulation task. This thesis presents contributions for each of the aforementioned components. Firstly, a portable active binocular robot vision architecture that integrates a number of visual behaviours is presented. This active vision architecture has the ability to verge, localise, recognise and simultaneously identify multiple target object instances. The portability and functional accuracy of the proposed vision architecture is demonstrated by carrying out both qualitative and comparative analyses using different robot hardware configurations, feature extraction techniques and scene perspectives. Secondly, a camera and hand-eye calibration methodology for integrating an active binocular robot head within a dual-arm robot is described. For this purpose, the forward kinematic model of the active robot head is derived and the methodology for calibrating and integrating the robot head is described in detail. A rigid calibration methodology has been implemented to provide a closed-form hand-to-eye calibration chain, and this has been extended with a mechanism that allows the camera external parameters to be updated dynamically for optimal 3D reconstruction, meeting the requirements of robotic tasks such as grasping and manipulating rigid and deformable objects. Experimental results show that the robot head achieves an overall accuracy of less than 0.3 millimetres while recovering the 3D structure of a scene. In addition, a comparative study between current RGB-D cameras and our active stereo head within two dual-arm robotic test-beds is reported, demonstrating the accuracy and portability of our proposed methodology. Thirdly, this thesis proposes a visual perception model for the task of category-wise object sorting, based on Gaussian Process (GP) classification, that is capable of recognising object categories from point cloud data. In this approach, Fast Point Feature Histogram (FPFH) features are extracted from point clouds to describe the local 3D shape of objects, and a Bag-of-Words coding method is used to obtain an object-level vocabulary representation.
Multi-class Gaussian Process classification is employed to provide a probability estimate of the identity of the object, and it serves the key role of modelling perception confidence in the interactive perception cycle. The interaction stage is responsible for invoking the appropriate action skills as required to confirm the identity of an observed object with high confidence as a result of executing multiple perception-action cycles. The recognition accuracy of the proposed perception model has been validated on simulated input data using both Support Vector Machine (SVM) and GP-based multi-class classifiers. Results obtained during this investigation demonstrate that, by using a GP-based classifier, it is possible to obtain true positive classification rates of up to 80%. Experimental validation of the above semi-autonomous object sorting system shows that the proposed GP-based interactive sorting approach outperforms random sorting by up to 30% when applied to scenes comprising configurations of household objects. Finally, a fully autonomous visual architecture is presented that has been developed to accommodate manipulation skills, so that an autonomous system can interact with the scene by object manipulation. This proposed visual architecture is made up of two stages: 1) A perception stage, which is a modified version of the aforementioned visual interaction model; 2) An interaction stage, which performs a set of ad-hoc actions relying on the information received from the perception stage. More specifically, the interaction stage reasons over the information (class label and associated probabilistic confidence score) received from the perception stage to choose one of the following two actions: 1) If an object class has been identified with high confidence, the object is removed from the scene and placed in the designated basket/bin for that particular class. 2) If an object class has been identified with lower confidence then, inspired by the human behaviour of inspecting doubtful objects, an action is chosen to further investigate that object and confirm its identity by capturing more images from different views in isolation. The perception stage then processes these views, and hence multiple perception-action/interaction cycles take place. From an application perspective, the task of autonomous category-based object sorting is performed and the experimental design for the task is described in detail.
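    The recognition stage described above can be sketched with off-the-shelf components: local 3D descriptors for each point cloud are quantised into a Bag-of-Words histogram and classified with a multi-class Gaussian Process. In the sketch below the FPFH extraction is stubbed out with random 33-dimensional descriptors, and the vocabulary size and kernel are illustrative choices rather than those used in the thesis.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

def bow_histogram(descriptors, vocabulary):
    """Assign each local descriptor to its nearest visual word and normalise the counts."""
    words = vocabulary.predict(descriptors)
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

def train_recogniser(clouds_descriptors, labels, n_words=50):
    """Build a visual vocabulary, code each cloud as a BoW histogram, fit a multi-class GP."""
    vocabulary = KMeans(n_clusters=n_words, n_init=10, random_state=0)
    vocabulary.fit(np.vstack(clouds_descriptors))
    X = np.array([bow_histogram(d, vocabulary) for d in clouds_descriptors])
    clf = GaussianProcessClassifier(kernel=RBF(length_scale=1.0), random_state=0)
    clf.fit(X, labels)                           # one-vs-rest multi-class GP classification
    return vocabulary, clf

# Toy stand-in for FPFH descriptors: 40 "objects", each with 200 local 33-D features.
rng = np.random.default_rng(0)
descs = [rng.normal(loc=c % 4, size=(200, 33)) for c in range(40)]
vocab, clf = train_recogniser(descs, labels=[c % 4 for c in range(40)])
print(clf.predict_proba(bow_histogram(descs[0], vocab).reshape(1, -1)))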

    The State of Lifelong Learning in Service Robots: Current Bottlenecks in Object Perception and Manipulation

    Service robots are appearing more and more in our daily life. The development of service robots combines multiple fields of research, from object perception to object manipulation. The state of the art continues to improve towards a proper coupling between object perception and manipulation. This coupling is necessary for service robots not only to perform various tasks in a reasonable amount of time but also to continually adapt to new environments and safely interact with non-expert human users. Nowadays, robots are able to recognize various objects and quickly plan a collision-free trajectory to grasp a target object in predefined settings. However, in most cases these approaches rely on large amounts of training data. The knowledge of such robots is therefore fixed after the training phase, and any changes in the environment require complicated, time-consuming, and expensive robot re-programming by human experts. These approaches are consequently still too rigid for real-life applications in unstructured environments, where a significant portion of the environment is unknown and cannot be directly sensed or controlled. In such environments, no matter how extensive the training data used for batch learning, a robot will always face new objects. Therefore, apart from batch learning, the robot should be able to continually learn about new object categories and grasp affordances from very few training examples on-site. Moreover, apart from robot self-learning, non-expert users could interactively guide the process of experience acquisition by teaching new concepts, or by correcting insufficient or erroneous concepts. In this way, the robot will constantly learn how to help humans in everyday tasks by gaining more and more experience, without the need for re-programming.

    Active Vision for Scene Understanding

    Visual perception is one of the most important sources of information for both humans and robots. A particular challenge is the acquisition and interpretation of complex unstructured scenes. This work contributes to active vision for humanoid robots. A semantic model of the scene is created, which is extended by successively changing the robot's view in order to explore the interaction possibilities of the scene.
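    A minimal sketch of the kind of view-selection loop such an active-vision system might use is given below: candidate viewpoints are scored by the new information they are expected to add to the scene model, the best one is visited, and the observation is fused back in. All helper functions (candidate_views, expected_new_information, observe_from, fuse) and the stopping criterion are illustrative assumptions, not the system described in this work.

def explore_scene(scene_model, candidate_views, expected_new_information,
                  observe_from, fuse, max_views=5, min_gain=0.01):
    """Greedy next-best-view loop; all helpers are supplied by the caller (assumed interfaces)."""
    for _ in range(max_views):
        views = candidate_views(scene_model)
        gains = [expected_new_information(scene_model, v) for v in views]
        best_gain = max(gains)
        if best_gain < min_gain:                 # scene considered sufficiently explored
            break
        best_view = views[gains.index(best_gain)]
        observation = observe_from(best_view)    # move the head/camera and capture
        scene_model = fuse(scene_model, observation, best_view)
    return scene_model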