8 research outputs found

    Efficient Dense Registration, Segmentation, and Modeling Methods for RGB-D Environment Perception

    Get PDF
    One perspective for artificial intelligence research is to build machines that perform tasks autonomously in our complex everyday environments. This setting poses challenges to the development of perception skills: A robot should be able to perceive its location and objects in its surrounding, while the objects and the robot itself could also be moving. Objects may not only be composed of rigid parts, but could be non-rigidly deformable or appear in a variety of similar shapes. Furthermore, it could be relevant to the task to observe object semantics. For a robot acting fluently and immediately, these perception challenges demand efficient methods. This theses presents novel approaches to robot perception with RGB-D sensors. It develops efficient registration, segmentation, and modeling methods for scene and object perception. We propose multi-resolution surfel maps as a concise representation for RGB-D measurements. We develop probabilistic registration methods that handle rigid scenes, scenes with multiple rigid parts that move differently, and scenes that undergo non-rigid deformations. We use these methods to learn and perceive 3D models of scenes and objects in both static and dynamic environments. For learning models of static scenes, we propose a real-time capable simultaneous localization and mapping approach. It aligns key views in RGB-D video using our rigid registration method and optimizes the pose graph of the key views. The acquired models are then perceived in live images through detection and tracking within a Bayesian filtering framework. An assumption frequently made for environment mapping is that the observed scene remains static during the mapping process. Through rigid multi-body registration, we take advantage of releasing this assumption: Our registration method segments views into parts that move independently between the views and simultaneously estimates their motion. Within simultaneous motion segmentation, localization, and mapping, we separate scenes into objects by their motion. Our approach acquires 3D models of objects and concurrently infers hierarchical part relations between them using probabilistic reasoning. It can be applied for interactive learning of objects and their part decomposition. Endowing robots with manipulation skills for a large variety of objects is a tedious endeavor if the skill is programmed for every instance of an object class. Furthermore, slight deformations of an instance could not be handled by an inflexible program. Deformable registration is useful to perceive such shape variations, e.g., between specific instances of a tool. We develop an efficient deformable registration method and apply it for the transfer of robot manipulation skills between varying object instances. On the object-class level, we segment images using random decision forest classifiers in real-time. The probabilistic labelings of individual images are fused in 3D semantic maps within a Bayesian framework. We combine our object-class segmentation method with simultaneous localization and mapping to achieve online semantic mapping in real-time. The methods developed in this thesis are evaluated in experiments on publicly available benchmark datasets and novel own datasets. We publicly demonstrate several of our perception approaches within integrated robot systems in the mobile manipulation context.Effiziente Dichte Registrierungs-, Segmentierungs- und Modellierungsmethoden für die RGB-D Umgebungswahrnehmung In dieser Arbeit beschäftigen wir uns mit Herausforderungen der visuellen Wahrnehmung für intelligente Roboter in Alltagsumgebungen. Solche Roboter sollen sich selbst in ihrer Umgebung zurechtfinden, und Wissen über den Verbleib von Objekten erwerben können. Die Schwierigkeit dieser Aufgaben erhöht sich in dynamischen Umgebungen, in denen ein Roboter die Bewegung einzelner Teile differenzieren und auch wahrnehmen muss, wie sich diese Teile bewegen. Bewegt sich ein Roboter selbständig in dieser Umgebung, muss er auch seine eigene Bewegung von der Veränderung der Umgebung unterscheiden. Szenen können sich aber nicht nur durch die Bewegung starrer Teile verändern. Auch die Teile selbst können ihre Form in nicht-rigider Weise ändern. Eine weitere Herausforderung stellt die semantische Interpretation von Szenengeometrie und -aussehen dar. Damit intelligente Roboter unmittelbar und flüssig handeln können, sind effiziente Algorithmen für diese Wahrnehmungsprobleme erforderlich. Im ersten Teil dieser Arbeit entwickeln wir effiziente Methoden zur Repräsentation und Registrierung von RGB-D Messungen. Zunächst stellen wir Multi-Resolutions-Oberflächenelement-Karten (engl. multi-resolution surfel maps, MRSMaps) als eine kompakte Repräsentation von RGB-D Messungen vor, die unseren effizienten Registrierungsmethoden zugrunde liegt. Bilder können effizient in dieser Repräsentation aggregiert werde, wobei auch mehrere Bilder aus verschiedenen Blickpunkten integriert werden können, um Modelle von Szenen und Objekte aus vielfältigen Ansichten darzustellen. Für die effiziente, robuste und genaue Registrierung von MRSMaps wird eine Methode vorgestellt, die Rigidheit der betrachteten Szene voraussetzt. Die Registrierung schätzt die Kamerabewegung zwischen den Bildern und gewinnt ihre Effizienz durch die Ausnutzung der kompakten multi-resolutionalen Darstellung der Karten. Die Registrierungsmethode erzielt hohe Bildverarbeitungsraten auf einer CPU. Wir demonstrieren hohe Effizienz, Genauigkeit und Robustheit unserer Methode im Vergleich zum bisherigen Stand der Forschung auf Vergleichsdatensätzen. In einem weiteren Registrierungsansatz lösen wir uns von der Annahme, dass die betrachtete Szene zwischen Bildern statisch ist. Wir erlauben nun, dass sich rigide Teile der Szene bewegen dürfen, und erweitern unser rigides Registrierungsverfahren auf diesen Fall. Unser Ansatz segmentiert das Bild in Bereiche einzelner Teile, die sich unterschiedlich zwischen Bildern bewegen. Wir demonstrieren hohe Segmentierungsgenauigkeit und Genauigkeit in der Bewegungsschätzung unter Echtzeitbedingungen für die Verarbeitung. Schließlich entwickeln wir ein Verfahren für die Wahrnehmung von nicht-rigiden Deformationen zwischen zwei MRSMaps. Auch hier nutzen wir die multi-resolutionale Struktur in den Karten für ein effizientes Registrieren von grob zu fein. Wir schlagen Methoden vor, um aus den geschätzten Deformationen die lokale Bewegung zwischen den Bildern zu berechnen. Wir evaluieren Genauigkeit und Effizienz des Registrierungsverfahrens. Der zweite Teil dieser Arbeit widmet sich der Verwendung unserer Kartenrepräsentation und Registrierungsmethoden für die Wahrnehmung von Szenen und Objekten. Wir verwenden MRSMaps und unsere rigide Registrierungsmethode, um dichte 3D Modelle von Szenen und Objekten zu lernen. Die räumlichen Beziehungen zwischen Schlüsselansichten, die wir durch Registrierung schätzen, werden in einem Simultanen Lokalisierungs- und Kartierungsverfahren (engl. simultaneous localization and mapping, SLAM) gegeneinander abgewogen, um die Blickposen der Schlüsselansichten zu schätzen. Für das Verfolgen der Kamerapose bezüglich der Modelle in Echtzeit, kombinieren wir die Genauigkeit unserer Registrierung mit der Robustheit von Partikelfiltern. Zu Beginn der Posenverfolgung, oder wenn das Objekt aufgrund von Verdeckungen oder extremen Bewegungen nicht weiter verfolgt werden konnte, initialisieren wir das Filter durch Objektdetektion. Anschließend wenden wir unsere erweiterten Registrierungsverfahren für die Wahrnehmung in nicht-rigiden Szenen und für die Übertragung von Objekthandhabungsfähigkeiten von Robotern an. Wir erweitern unseren rigiden Kartierungsansatz auf dynamische Szenen, in denen sich rigide Teile bewegen. Die Bewegungssegmente in Schlüsselansichten werden zueinander in Bezug gesetzt, um Äquivalenz- und Teilebeziehungen von Objekten probabilistisch zu inferieren, denen die Segmente entsprechen. Auch hier liefert unsere Registrierungsmethode die Bewegung der Kamera bezüglich der Objekte, die wir in einem SLAM Verfahren optimieren. Aus diesen Blickposen wiederum können wir die Bewegungssegmente in dichten Objektmodellen vereinen. Objekte einer Klasse teilen oft eine gemeinsame Topologie von funktionalen Elementen, die durch Formkorrespondenzen ermittelt werden kann. Wir verwenden unsere deformierbare Registrierung, um solche Korrespondenzen zu finden und die Handhabung eines Objektes durch einen Roboter auf neue Objektinstanzen derselben Klasse zu übertragen. Schließlich entwickeln wir einen echtzeitfähigen Ansatz, der Kategorien von Objekten in RGB-D Bildern erkennt und segmentiert. Die Segmentierung basiert auf Ensemblen randomisierter Entscheidungsbäume, die Geometrie- und Texturmerkmale zur Klassifikation verwenden. Wir fusionieren Segmentierungen von Einzelbildern einer Szene aus mehreren Ansichten in einer semantischen Objektklassenkarte mit Hilfe unseres SLAM-Verfahrens. Die vorgestellten Methoden werden auf öffentlich verfügbaren Vergleichsdatensätzen und eigenen Datensätzen evaluiert. Einige unserer Ansätze wurden auch in integrierten Robotersystemen für mobile Objekthantierungsaufgaben öffentlich demonstriert. Sie waren ein wichtiger Bestandteil für das Gewinnen der RoboCup-Roboterwettbewerbe in der RoboCup@Home Liga in den Jahren 2011, 2012 und 2013

    Perception of Deformable Objects and Compliant Manipulation for Service Robots

    Full text link
    Abstract We identified softness in robot control as well as robot perception as key enabling technologies for future service robots. Compliance in motion control compensates for small errors in model acquisition and estimation and enables safe physical interaction with humans. The perception of shape similarities and defor-mations allows a robot to adapt its skills to the object at hand, given a description of the skill that generalizes between different objects. In this paper, we present our approaches to compliant control and object manipulation skill transfer for service robots. We report on evaluation results and public demonstrations of our ap-proaches. 1

    Differentiable world programs

    Full text link
    L'intelligence artificielle (IA) moderne a ouvert de nouvelles perspectives prometteuses pour la création de robots intelligents. En particulier, les architectures d'apprentissage basées sur le gradient (réseaux neuronaux profonds) ont considérablement amélioré la compréhension des scènes 3D en termes de perception, de raisonnement et d'action. Cependant, ces progrès ont affaibli l'attrait de nombreuses techniques ``classiques'' développées au cours des dernières décennies. Nous postulons qu'un mélange de méthodes ``classiques'' et ``apprises'' est la voie la plus prometteuse pour développer des modèles du monde flexibles, interprétables et exploitables : une nécessité pour les agents intelligents incorporés. La question centrale de cette thèse est : ``Quelle est la manière idéale de combiner les techniques classiques avec des architectures d'apprentissage basées sur le gradient pour une compréhension riche du monde 3D ?''. Cette vision ouvre la voie à une multitude d'applications qui ont un impact fondamental sur la façon dont les agents physiques perçoivent et interagissent avec leur environnement. Cette thèse, appelée ``programmes différentiables pour modèler l'environnement'', unifie les efforts de plusieurs domaines étroitement liés mais actuellement disjoints, notamment la robotique, la vision par ordinateur, l'infographie et l'IA. Ma première contribution---gradSLAM--- est un système de localisation et de cartographie simultanées (SLAM) dense et entièrement différentiable. En permettant le calcul du gradient à travers des composants autrement non différentiables tels que l'optimisation non linéaire par moindres carrés, le raycasting, l'odométrie visuelle et la cartographie dense, gradSLAM ouvre de nouvelles voies pour intégrer la reconstruction 3D classique et l'apprentissage profond. Ma deuxième contribution - taskography - propose une sparsification conditionnée par la tâche de grandes scènes 3D encodées sous forme de graphes de scènes 3D. Cela permet aux planificateurs classiques d'égaler (et de surpasser) les planificateurs de pointe basés sur l'apprentissage en concentrant le calcul sur les attributs de la scène pertinents pour la tâche. Ma troisième et dernière contribution---gradSim--- est un simulateur entièrement différentiable qui combine des moteurs physiques et graphiques différentiables pour permettre l'estimation des paramètres physiques et le contrôle visuomoteur, uniquement à partir de vidéos ou d'une image fixe.Modern artificial intelligence (AI) has created exciting new opportunities for building intelligent robots. In particular, gradient-based learning architectures (deep neural networks) have tremendously improved 3D scene understanding in terms of perception, reasoning, and action. However, these advancements have undermined many ``classical'' techniques developed over the last few decades. We postulate that a blend of ``classical'' and ``learned'' methods is the most promising path to developing flexible, interpretable, and actionable models of the world: a necessity for intelligent embodied agents. ``What is the ideal way to combine classical techniques with gradient-based learning architectures for a rich understanding of the 3D world?'' is the central question in this dissertation. This understanding enables a multitude of applications that fundamentally impact how embodied agents perceive and interact with their environment. This dissertation, dubbed ``differentiable world programs'', unifies efforts from multiple closely-related but currently-disjoint fields including robotics, computer vision, computer graphics, and AI. Our first contribution---gradSLAM---is a fully differentiable dense simultaneous localization and mapping (SLAM) system. By enabling gradient computation through otherwise non-differentiable components such as nonlinear least squares optimization, ray casting, visual odometry, and dense mapping, gradSLAM opens up new avenues for integrating classical 3D reconstruction and deep learning. Our second contribution---taskography---proposes a task-conditioned sparsification of large 3D scenes encoded as 3D scene graphs. This enables classical planners to match (and surpass) state-of-the-art learning-based planners by focusing computation on task-relevant scene attributes. Our third and final contribution---gradSim---is a fully differentiable simulator that composes differentiable physics and graphics engines to enable physical parameter estimation and visuomotor control, solely from videos or a still image

    Robot tool use: A survey

    Get PDF
    Using human tools can significantly benefit robots in many application domains. Such ability would allow robots to solve problems that they were unable to without tools. However, robot tool use is a challenging task. Tool use was initially considered to be the ability that distinguishes human beings from other animals. We identify three skills required for robot tool use: perception, manipulation, and high-level cognition skills. While both general manipulation tasks and tool use tasks require the same level of perception accuracy, there are unique manipulation and cognition challenges in robot tool use. In this survey, we first define robot tool use. The definition highlighted the skills required for robot tool use. The skills coincide with an affordance model which defined a three-way relation between actions, objects, and effects. We also compile a taxonomy of robot tool use with insights from animal tool use literature. Our definition and taxonomy lay a theoretical foundation for future robot tool use studies and also serve as practical guidelines for robot tool use applications. We first categorize tool use based on the context of the task. The contexts are highly similar for the same task (e.g., cutting) in non-causal tool use, while the contexts for causal tool use are diverse. We further categorize causal tool use based on the task complexity suggested in animal tool use studies into single-manipulation tool use and multiple-manipulation tool use. Single-manipulation tool use are sub-categorized based on tool features and prior experiences of tool use. This type of tool may be considered as building blocks of causal tool use. Multiple-manipulation tool use combines these building blocks in different ways. The different combinations categorize multiple-manipulation tool use. Moreover, we identify different skills required in each sub-type in the taxonomy. We then review previous studies on robot tool use based on the taxonomy and describe how the relations are learned in these studies. We conclude with a discussion of the current applications of robot tool use and open questions to address future robot tool use

    Reconstruction and recognition of confusable models using three-dimensional perception

    Get PDF
    Perception is one of the key topics in robotics research. It is about the processing of external sensor data and its interpretation. The necessity of fully autonomous robots makes it crucial to help them to perform tasks more reliably, flexibly, and efficiently. As these platforms obtain more refined manipulation capabilities, they also require expressive and comprehensive environment models: for manipulation and affordance purposes, their models have to involve each one of the objects present in the world, coincidentally with their location, pose, shape and other aspects. The aim of this dissertation is to provide a solution to several of these challenges that arise when meeting the object grasping problem, with the aim of improving the autonomy of the mobile manipulator robot MANFRED-2. By the analysis and interpretation of 3D perception, this thesis covers in the first place the localization of supporting planes in the scenario. As the environment will contain many other things apart from the planar surface, the problem within cluttered scenarios has been solved by means of Differential Evolution, which is a particlebased evolutionary algorithm that evolves in time to the solution that yields the cost function lowest value. Since the final purpose of this thesis is to provide with valuable information for grasping applications, a complete model reconstructor has been developed. The proposed method holdsmany features such as robustness against abrupt rotations, multi-dimensional optimization, feature extensibility, compatible with other scan matching techniques, management of uncertain information and an initialization process to reduce convergence timings. It has been designed using a evolutionarybased scan matching optimizer that takes into account surface features of the object, global form and also texture and color information. The last tackled challenge regards the recognition problem. In order to procure with worthy information about the environment to the robot, a meta classifier that discerns efficiently the observed objects has been implemented. It is capable of distinguishing between confusable objects, such as mugs or dishes with similar shapes but different size or color. The contributions presented in this thesis have been fully implemented and empirically evaluated in the platform. A continuous grasping pipeline covering from perception to grasp planning including visual object recognition for confusable objects has been developed. For that purpose, an indoor environment with several objects on a table is presented in the nearby of the robot. Items are recognized from a database and, if one is chosen, the robot will calculate how to grasp it taking into account the kinematic restrictions associated to the anthropomorphic hand and the 3D model for this particular object. -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------La percepción es uno de los temas más relevantes en el mundo de la investigaci ón en robótica. Su objetivo es procesar e interpretar los datos recibidos por un sensor externo. La gran necesidad de desarrollar robots autónomos hace imprescindible proporcionar soluciones que les permita realizar tareas más precisas, flexibles y eficientes. Dado que estas plataformas cada día adquieren mejores capacidades para manipular objetos, también necesitarán modelos expresivos y comprensivos: para realizar tareas de manipulación y prensión, sus modelos han de tener en cuenta cada uno de los objetos presentes en su entorno, junto con su localizaci ón, orientación, forma y otros aspectos. El objeto de la presente tesis doctoral es proponer soluciones a varios de los retos que surgen al enfrentarse al problema del agarre, con el propósito final de aumentar la capacidad de autonomía del robot manipulador MANFRED-2. Mediante el análisis e interpretación de la percepción tridimensional, esta tesis cubre en primer lugar la localización de planos de soporte en sus alrededores. Dado que el entorno contendrá muchos otros elementos aparte de la superficie de apoyo buscada, el problema en entornos abarrotados ha sido solucionado mediante Evolución Diferencial, que es un algoritmo evolutivo basado en partículas que evoluciona temporalmente a la solución que contempla el menor resultado en la función de coste. Puesto que el propósito final de este trabajo de investigación es proveer de información valiosa a las aplicaciones de prensión, se ha desarrollado un reconstructor de modelos completos. El método propuesto posee diferentes características como robustez a giros abruptos, optimización multidimensional, extensión a otras características, compatibilidad con otras técnicas de reconstrucción, manejo de incertidumbres y un proceso de inicialización para reducir el tiempo de convergencia. Ha sido diseñado usando un registro optimizado mediante técnicas evolutivas que tienen en cuenta las particularidades de la superficie del objeto, su forma global y la información relativa a la textura. El último problema abordado está relacionado con el reconocimiento de objetos. Con la intención de abastecer al robot con la mayor información posible sobre el entorno, se ha implementado un meta clasificador que diferencia de manera eficaz los objetos observados. Ha sido capacitado para distinguir objetos confundibles como tazas o platos con formas similares pero con diferentes colores o tamaños. Las contribuciones presentes en esta tesis han sido completamente implementadas y probadas de manera empírica en la plataforma. Se ha desarrollado un sistema que cubre el problema de agarre desde la percepción al cálculo de la trayectoria incluyendo el sistema de reconocimiento de objetos confundibles. Para ello, se ha presentado una mesa con objetos en un entorno cerrado cercano al robot. Los elementos son comparados con una base de datos y si se desea agarrar uno de ellos, el robot estimará cómo cogerlo teniendo en cuenta las restricciones cinemáticas asociadas a una mano antropomórfica y el modelo tridimensional generado del objeto en cuestión

    The attentive robot companion: learning spatial information from observation and verbal interaction

    Get PDF
    Ziegler L. The attentive robot companion: learning spatial information from observation and verbal interaction. Bielefeld: Universität Bielefeld; 2015.This doctoral thesis investigates how a robot companion can gain a certain degree of situational awareness through observation and interaction with its surroundings. The focus lies on the representation of the spatial knowledge gathered constantly over time in an indoor environment. However, from the background of research on an interactive service robot, methods for deployment in inference and verbal communication tasks are presented. The design and application of the models are guided by the requirements of referential communication. The approach here involves the analysis of the dynamic properties of structures in the robot’s field of view allowing it to distinguish objects of interest from other agents and background structures. The use of multiple persistent models representing these dynamic properties enables the robot to track changes in multiple scenes over time to establish spatial and temporal references. This work includes building a coherent representation considering allocentric and egocentric aspects of spatial knowledge for these models. Spatial analysis is extended with a semantic interpretation of objects and regions. This top-down approach for generating additional context information enhances the grounding process in communication. A holistic, boosting-based classification approach using a wide range of 2D and 3D visual features anchored in the spatial representation allows the system to identify room types. The process of grounding referential descriptions from a human interlocutor in the spatial representation is evaluated through referencing furniture. This method uses a probabilistic network for handling ambiguities in the descriptions and employs a strategy for resolving conflicts. In order to approve the real-world applicability of these approaches, this system was deployed on the mobile robot BIRON in a realistic apartment scenario involving observation and verbal interaction with an interlocutor

    Efficient deformable registration of multi-resolution surfel maps for object manipulation skill transfer

    No full text

    Efficient dense registration, segmentation, and modeling methods for RGB-D environment perception

    No full text
    One perspective for artificial intelligence research is to build machines that perform tasks autonomously in our complex everyday environments. This setting poses challenges to the development of perception skills: A robot should be able to perceive its location and objects in its surrounding, while the objects and the robot itself could also be moving. Objects may not only be composed of rigid parts, but could be non-rigidly deformable or appear in a variety of similar shapes. Furthermore, it could be relevant to the task to observe object semantics. For a robot acting fluently and immediately, these perception challenges demand efficient methods. This theses presents novel approaches to robot perception with RGB-D sensors. It develops efficient registration, segmentation, and modeling methods for scene and object perception. We propose multi-resolution surfel maps as a concise representation for RGB-D measurements. We develop probabilistic registration methods that handle rigid scenes, scenes with multiple rigid parts that move differently, and scenes that undergo non-rigid deformations. We use these methods to learn and perceive 3D models of scenes and objects in both static and dynamic environments. For learning models of static scenes, we propose a real-time capable simultaneous localization and mapping approach. It aligns key views in RGB-D video using our rigid registration method and optimizes the pose graph of the key views. The acquired models are then perceived in live images through detection and tracking within a Bayesian filtering framework. An assumption frequently made for environment mapping is that the observed scene remains static during the mapping process. Through rigid multi-body registration, we take advantage of releasing this assumption: Our registration method segments views into parts that move independently between the views and simultaneously estimates their motion. Within simultaneous motion segmentation, localization, and mapping, we separate scenes into objects by their motion. Our approach acquires 3D models of objects and concurrently infers hierarchical part relations between them using probabilistic reasoning. It can be applied for interactive learning of objects and their part decomposition. Endowing robots with manipulation skills for a large variety of objects is a tedious endeavor if the skill is programmed for every instance of an object class. Furthermore, slight deformations of an instance could not be handled by an inflexible program. Deformable registration is useful to perceive such shape variations, e.g., between specific instances of a tool. We develop an efficient deformable registration method and apply it for the transfer of robot manipulation skills between varying object instances. On the object-class level, we segment images using random decision forest classifiers in real-time. The probabilistic labelings of individual images are fused in 3D semantic maps within a Bayesian framework. We combine our object-class segmentation method with simultaneous localization and mapping to achieve online semantic mapping in real-time. The methods developed in this thesis are evaluated in experiments on publicly available benchmark datasets and novel own datasets. We publicly demonstrate several of our perception approaches within integrated robot systems in the mobile manipulation context
    corecore