
    Hybrid Architectures for Object Pose and Velocity Tracking at the Intersection of Kalman Filtering and Machine Learning

    The study of object perception algorithms is fundamental for the development of robotic platforms capable of planning and executing actions involving objects with high precision, reliability and safety. Indeed, this topic has been vastly explored in both the robotics and computer vision research communities using diverse techniques, ranging from classical Bayesian filtering to more modern Machine Learning methods, and complementary sensing modalities such as vision and touch. Recently, the ever-growing availability of tools for synthetic data generation has substantially increased the adoption of Deep Learning for both 2D tasks, such as object detection and segmentation, and 6D tasks, such as object pose estimation and tracking. The proposed methods exhibit promising performance on computer vision benchmarks and robotic tasks, e.g. using object pose estimation for grasp planning purposes. Nonetheless, they generally do not consider useful information connected with the physics of the object motion or the peculiarities and requirements of robotic systems. Examples are the necessity to provide well-behaved output signals for robot motion control and the possibility to integrate modelling priors on the motion of the object as well as algorithmic priors. These help exploit the temporal correlation of the object poses, handle pose uncertainties and mitigate the effect of outliers. Most of these concepts are considered in classical approaches, e.g. from the Bayesian and Kalman filtering literature, which however are not as powerful as Deep Learning in handling visual data. As a consequence, the development of hybrid architectures that combine the best features of both worlds is particularly appealing in a robotic setting. Motivated by these considerations, in this thesis I aimed to devise hybrid architectures for object perception, focusing on the task of object pose and velocity tracking. The proposed architectures use Kalman filtering supported by state-of-the-art Deep Neural Networks to track the 6D pose and velocity of objects from images. The devised solutions exhibit state-of-the-art performance and increased modularity, and do not require training to implement the actual tracking behaviors. Furthermore, they can track even fast object motions despite the possibly non-negligible inference times of the adopted neural networks. Finally, by relying on data-driven Kalman filtering, I explored a paradigm that enables tracking the state of systems that cannot easily be modeled analytically. Specifically, I used this approach to learn the measurement model of soft 3D tactile sensors and to address the problem of tracking the sliding motion of hand-held objects.
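    As a minimal illustration of the hybrid tracking idea described above, the sketch below combines a constant-velocity Kalman filter with pose measurements supplied by an external neural network. It is a simplified, assumption-laden example rather than the architecture from the thesis: only the translational part of the 6D pose is filtered (orientation would be handled analogously, e.g. with an error-state formulation), and all noise values and the measurement interface are invented for illustration.

```python
# Minimal sketch of a hybrid tracker: a constant-velocity Kalman filter whose
# measurement update consumes 3D position estimates from a (hypothetical)
# neural pose estimator. Orientation would be handled analogously; all numeric
# values are illustrative assumptions, not taken from the thesis.
import numpy as np

class ConstantVelocityKF:
    def __init__(self, dt, sigma_a=1.0, sigma_meas=0.01):
        self.x = np.zeros(6)                      # [px, py, pz, vx, vy, vz]
        self.P = np.eye(6)
        self.F = np.eye(6)
        self.F[:3, 3:] = dt * np.eye(3)           # constant-velocity motion prior
        self.Q = sigma_a**2 * dt * np.eye(6)      # simplified process noise
        self.H = np.hstack([np.eye(3), np.zeros((3, 3))])  # only position is measured
        self.R = sigma_meas**2 * np.eye(3)        # assumed neural-network measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q

    def update(self, z):
        # z: 3D position reported by the neural pose estimator for this frame
        y = z - self.H @ self.x                   # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)  # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(6) - K @ self.H) @ self.P

kf = ConstantVelocityKF(dt=1 / 30)
for z in [np.array([0.10, 0.00, 0.50]), np.array([0.11, 0.00, 0.50])]:
    kf.predict()
    kf.update(z)          # velocity is estimated even though only pose is measured
print(kf.x[:3], kf.x[3:])
```

    Because the filter carries a motion prior, it produces smooth pose and velocity estimates even though the network only reports poses per frame, which is the kind of well-behaved output signal that robot motion control requires.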

    Robot Learning Assembly Tasks from Human Demonstrations

    Industrial robots are widely deployed in assembly and production lines as they are efficient at performing highly repetitive tasks. They are mainly position-controlled and pre-programmed to work in well-structured environments. However, they cannot deal with dynamic changes and unexpected events in their operations as they do not have sufficient sensing and learning capabilities. Conducting robotic assembly operations in unstructured environments therefore remains a big challenge today. This thesis research focuses on the development of robot learning from demonstration (LfD) for robotic assembly tasks using visual teaching. Firstly, the human kinesthetic teaching method is adopted for the robot to learn an effective grasping skill in an unstructured environment. During this teaching process, the robot learns the object's SIFT features and grasping pose from human demonstrations. Secondly, a novel skeleton-joint mapping framework is proposed for robot learning from human demonstrations. The mapping algorithm transfers the human motion from the human joint space to the robot motor space so that the robot can be taught intuitively from a remote place. Thirdly, a novel visual-mapping demonstration framework is built for robot learning of assembly tasks, in which the demonstrator is able to teach the robot with real-time feedback. Gaussian Mixture Models and Gaussian Mixture Regression are used to encode the learned skills for the robot. Finally, the effectiveness of the approach is evaluated with practical assembly tasks on the Baxter robot. The significance of this thesis research lies in its comprehensive insight into robot learning from demonstration for assembly tasks. The proposed LfD paradigm has the potential to effectively transfer human skills to robots in both industrial and domestic environments. It paves the way for the general public to use robots without the need for programming skills.
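    The abstract above mentions Gaussian Mixture Models and Gaussian Mixture Regression for skill encoding. The sketch below shows the general GMM/GMR recipe on synthetic one-dimensional data using scikit-learn; the data, number of components and function names are illustrative assumptions rather than the thesis implementation.

```python
# Hedged sketch of Gaussian Mixture Regression (GMR) as commonly used for skill
# encoding in learning from demonstration: fit a GMM over (time, position)
# pairs from demonstrations, then condition on time to reproduce a trajectory.
import numpy as np
from sklearn.mixture import GaussianMixture

# Fake demonstrations: noisy 1-D end-effector positions over time (assumed data).
t = np.tile(np.linspace(0, 1, 100), 5)
x = np.sin(2 * np.pi * t) + 0.05 * np.random.randn(t.size)
data = np.column_stack([t, x])

gmm = GaussianMixture(n_components=5, covariance_type="full").fit(data)

def gmr(t_query):
    """Condition the joint GMM on time to get the expected position."""
    means, covs, weights = gmm.means_, gmm.covariances_, gmm.weights_
    var_t = covs[:, 0, 0]
    # Responsibility of each component for the query time (1-D Gaussian in t).
    resp = weights * np.exp(-0.5 * (t_query - means[:, 0])**2 / var_t) / np.sqrt(var_t)
    resp /= resp.sum()
    # Conditional mean of x given t for each component, then mix by responsibility.
    cond = means[:, 1] + covs[:, 1, 0] / var_t * (t_query - means[:, 0])
    return float(resp @ cond)

print([round(gmr(tq), 2) for tq in (0.0, 0.25, 0.5)])
```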

    Development of an Augmented Reality Interface for Intuitive Robot Programming

    As the demand for advanced robotic systems continues to grow, new technologies and techniques that can improve the efficiency and effectiveness of robot programming are imperative. The latter relies heavily on the effective communication of tasks between the user and the robot. To address this issue, we developed an Augmented Reality (AR) interface that incorporates Head Mounted Display (HMD) capabilities and integrated it with an active learning framework for intuitive programming of robots. This integration enables the execution of conditional tasks, bridging the gap between user and robot knowledge. The active learning model, with the user's guidance, incrementally programs a complex task and, after encoding the skills, generates a high-level task graph. The holographic robot then visualises the individual skills of the task, using sensory information retrieved from the physical robot in real time, in order to increase the user's intuition of the whole procedure. The interactive aspect of the interface can be utilised in this phase by giving the user the option of actively validating the learnt skills or potentially changing them and thus generating a new skill sequence. The user can also teach the real robot through HMD-based teleoperation, increasing the directness and immersion of the teaching procedure while safely manipulating the physical robot from a distance. The evaluation of the proposed framework is conducted through a series of experiments employing the developed interface on the real system. These experiments aim to assess the degree of intuitiveness the interface features provide to the user and to determine the extent of similarity between the virtual system's behavior during the robot programming procedure and that of its physical counterpart.
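    For readers unfamiliar with the term, a high-level task graph of the kind mentioned above can be pictured as skill nodes connected by condition-labelled transitions. The toy structure below is purely an assumed illustration (names and conditions invented), not the representation used in the work.

```python
# Assumed minimal structure of a high-level task graph: each node is a learned
# skill and edges carry the conditions under which the next skill executes,
# which is what a holographic robot could step through during validation.
from dataclasses import dataclass, field

@dataclass
class SkillNode:
    name: str                                         # e.g. "pick", "inspect", "place"
    transitions: dict = field(default_factory=dict)   # condition label -> next skill

graph = {
    "pick":    SkillNode("pick",    {"grasped": "inspect"}),
    "inspect": SkillNode("inspect", {"part_ok": "place", "part_defective": "discard"}),
    "place":   SkillNode("place"),
    "discard": SkillNode("discard"),
}

def execute(start, observe):
    """Walk the task graph, asking observe(skill) which condition fired."""
    node = graph[start]
    while node.transitions:
        condition = observe(node.name)                # e.g. sensory feedback from the robot
        node = graph[node.transitions[condition]]
    return node.name

print(execute("pick", lambda skill: {"pick": "grasped", "inspect": "part_ok"}[skill]))
```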

    Adaptive Robot Systems in Highly Dynamic Environments: A Table Tennis Robot

    Background: Robotic table tennis systems offer an ideal platform for pushing camera-based robotic manipulation systems to the limit. The unique challenge arises from the fast-paced play and the wide variation in spin and speed between strokes. The range of scenarios under which existing table tennis robots are able to operate is, however, limited, requiring slow play with low rotational velocity of the ball (spin). Research Goal: We aim to develop a table tennis robot system with learning capabilities able to handle spin against a human opponent. Methods: The robot system presented in this thesis consists of six components: ball position detection, ball spin detection, ball trajectory prediction, stroke parameter suggestion, robot trajectory generation, and robot control. For ball detection, the camera images pass through a conventional image processing pipeline. The ball's 3D positions are determined using iterative triangulation and these are then used to estimate the current ball state (position and velocity). We propose three methods for estimating the spin. The first two estimate spin by tracking the movement of the logo printed on the ball in high-resolution images using either conventional computer vision or convolutional neural networks. The third approach analyzes the trajectory of the ball using Magnus force fitting. Once the ball's position, velocity, and spin are known, the future trajectory is predicted by forward-solving a physical ball model involving gravitational, drag, and Magnus forces. With the predicted ball state at hitting time as input, we train a reinforcement learning algorithm to suggest the racket state at hitting time (stroke parameters). We use the Reflexxes library to generate a robot trajectory that achieves the suggested racket state. Results: Quantitative evaluation showed that all system components achieve results as good as or better than those of comparable robots. Regarding the research goal of this thesis, the robot was able to (i) maintain stable counter-hitting rallies of up to 60 balls with a human player, (ii) return balls with different spin types (topspin and backspin) in the same rally, and (iii) learn multiple table tennis drills in 200 strokes or fewer. Conclusion: Our spin detection system and reinforcement learning-based stroke parameter suggestion introduce significant algorithmic novelties. In contrast to previous work, our robot succeeds in more difficult spin scenarios and drills.
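    The ball-flight model mentioned in the methods, with gravitational, drag, and Magnus forces, can be forward-solved with a simple numerical integrator. The sketch below uses explicit Euler steps and typical table-tennis coefficients assumed for illustration; the actual thesis implementation and parameter values may differ.

```python
# Illustrative sketch of the ball-flight model described above: the future
# trajectory is predicted by forward integration of gravity, drag and Magnus
# forces. Mass and coefficients are typical table-tennis values assumed here.
import numpy as np

M = 0.0027          # ball mass [kg]
KD = 0.00015        # lumped drag coefficient   (assumed)
KM = 0.000005       # lumped Magnus coefficient (assumed)
G = np.array([0.0, 0.0, -9.81])

def step(p, v, w, dt=0.002):
    """One explicit Euler step of the ball state (position, velocity, spin)."""
    drag = -KD * np.linalg.norm(v) * v          # opposes motion
    magnus = KM * np.cross(w, v)                # deflects spinning balls
    a = G + (drag + magnus) / M
    return p + dt * v, v + dt * a

# Topspin ball travelling in +x (spin vector w in rad/s), assumed initial state.
p, v, w = np.array([0.0, 0.0, 0.3]), np.array([4.0, 0.0, 1.0]), np.array([0.0, 150.0, 0.0])
for _ in range(150):                            # predict 0.3 s ahead
    p, v = step(p, v, w)
print("predicted position:", np.round(p, 3))
```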

    Collaborative Localization and Mapping for Autonomous Planetary Exploration: Distributed Stereo Vision-Based 6D SLAM in GNSS-Denied Environments

    Mobile robots are a crucial element of present and future scientific missions to explore the surfaces of foreign celestial bodies such as the Moon and Mars. The deployment of teams of robots makes it possible to improve efficiency and robustness in such challenging environments. As long communication round-trip times to Earth render the teleoperation of robotic systems inefficient or outright impossible, on-board autonomy is a key to success. The robots operate in Global Navigation Satellite System (GNSS)-denied environments and thus have to rely on space-suitable on-board sensors such as stereo camera systems. They need to be able to localize themselves online, to model their surroundings, and to share information about the environment and their position therein. These capabilities constitute the basis for the local autonomy of each system as well as for any coordinated joint action within the team, such as collaborative autonomous exploration. In this thesis, we present a novel approach for stereo vision-based on-board and online Simultaneous Localization and Mapping (SLAM) for multi-robot teams given the challenges imposed by planetary exploration missions. We combine distributed local and decentralized global estimation methods to get the best of both worlds: a local reference filter on each robot provides real-time local state estimates required for robot control and fast reactive behaviors. We designed a novel graph topology to incorporate these state estimates into an online incremental graph optimization that computes global pose and map estimates serving as input to higher-level autonomy functions. In order to model the 3D geometry of the environment, we generate dense 3D point cloud and probabilistic voxel-grid maps from noisy stereo data. We distribute the computational load and reduce the required communication bandwidth between robots by locally aggregating high-bandwidth vision data into partial maps that are then exchanged between robots and composed into global models of the environment. We developed methods for intra- and inter-robot map matching to recognize previously visited locations in semi- and unstructured environments based on their estimated local geometry, which is largely invariant to lighting conditions as well as to different sensors and viewpoints in heterogeneous multi-robot teams. A decoupling of observable and unobservable states in the local filter allows us to introduce a novel optimization: by enforcing all submaps to be gravity-aligned, we can reduce the dimensionality of the map matching from 6D to 4D. In addition to map matches, the robots use visual fiducial markers to detect each other. In this context, we present a novel method for modeling the errors of the loop closure transformations estimated from these detections. We demonstrate the robustness of our methods by integrating them on a total of five different ground-based and aerial mobile robots that were deployed in 31 real-world experiments for quantitative evaluations in semi- and unstructured indoor and outdoor settings. In addition, we validated our SLAM framework through several demonstrations at four public events in Moon- and Mars-like environments. These include, among others, autonomous multi-robot exploration tests at a Moon-analogue site on top of the volcano Mt. Etna, Italy, as well as the collaborative mapping of a Mars-like environment with a heterogeneous robotic team of flying and driving robots in more than 35 public demonstration runs.
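    The gravity-alignment trick mentioned above is worth a small illustration: if every submap shares the gravity direction as its z-axis, two submaps can only differ by a translation and a yaw angle, so map matching estimates a 4-DoF instead of a 6-DoF transform. The sketch below shows a toy closed-form 4-DoF alignment of matched point pairs; it is an assumed simplification, not the matching method used in the thesis.

```python
# Hedged sketch of gravity-aligned (x, y, z, yaw) map matching: rotation is
# restricted to the gravity axis, so a tiny Procrustes-style fit suffices here.
import numpy as np

def transform_4dof(points, yaw, t):
    """Apply a gravity-aligned (yaw + translation) transform to an Nx3 point cloud."""
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])   # rotation about the gravity axis only
    return points @ R.T + t

def estimate_yaw_and_translation(src, dst):
    """Closed-form 4-DoF alignment of matched point pairs (toy, noiseless case)."""
    src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
    # Yaw from the 2-D (x, y) cross- and dot-correlations of the centred points.
    num = np.sum(src_c[:, 0] * dst_c[:, 1] - src_c[:, 1] * dst_c[:, 0])
    den = np.sum(src_c[:, 0] * dst_c[:, 0] + src_c[:, 1] * dst_c[:, 1])
    yaw = np.arctan2(num, den)
    t = dst.mean(0) - transform_4dof(src.mean(0, keepdims=True), yaw, 0.0)[0]
    return yaw, t

src = np.random.rand(50, 3)
dst = transform_4dof(src, yaw=0.7, t=np.array([1.0, -2.0, 0.5]))
yaw, t = estimate_yaw_and_translation(src, dst)
print(round(yaw, 3), np.round(t, 3))   # recovers 0.7 and [1, -2, 0.5]
```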

    Informed Data Selection For Dynamic Multi-Camera Clusters

    Traditional multi-camera systems require a fixed calibration between cameras to provide the solution at the correct scale, which places many limitations on their performance. This thesis investigates the calibration of dynamic camera clusters, or DCCs, where one or more of the cluster cameras is mounted to an actuated mechanism, such as a gimbal or robotic manipulator. Our novel calibration approach parameterizes the actuated mechanism using the Denavit-Hartenberg convention, then determines the calibration parameters which allow for the estimation of the time-varying extrinsic transformations between the static and dynamic camera frames. A degeneracy analysis is also presented, which identifies redundant parameters of the DCC calibration system. In order to automate the calibration process, this thesis also presents two information-theoretic methods which select the optimal calibration viewpoints using a next-best-view strategy. The first strategy minimizes the entropy of the calibration parameters, while the second selects the viewpoints which maximize the mutual information between the joint angle input and the calibration parameters. Finally, the effective selection of key-frames is also an essential aspect of robust visual navigation algorithms, as it ensures metrically consistent mapping solutions while reducing the computational complexity of the bundle adjustment process. To that end, we propose two entropy-based methods which aim to insert key-frames that will directly improve the system's ability to localize. The first approach inserts key-frames based on the cumulative point entropy reduction in the existing map, while the second approach uses the predicted point flow discrepancy to select key-frames which best initialize new features for the camera to track against in the future. The DCC calibration methods are verified both in simulation and on physical hardware consisting of a 5-DOF Fanuc manipulator and a 3-DOF Aeryon Skyranger gimbal. We demonstrate that the proposed methods are able to achieve high-quality calibrations using RMSE pixel error metrics, as well as through analysis of the estimator covariance matrix. The key-frame insertion methods are implemented within the Multi-Camera Parallel Mapping and Tracking (MCPTAM) framework, and we confirm the effectiveness of these approaches using high-quality ground truth collected with an indoor positioning system.
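    The Denavit-Hartenberg parameterization mentioned above assigns one homogeneous transform to each actuated joint; chaining these transforms with the static offsets yields the time-varying extrinsic between the static and dynamic cameras. The sketch below illustrates this composition for an assumed two-joint gimbal with made-up parameters; it is not the calibration code from the thesis.

```python
# Hedged sketch: standard Denavit-Hartenberg transform per joint, chained with
# assumed static offsets to produce a joint-angle-dependent camera extrinsic.
import numpy as np

def dh(theta, d, a, alpha):
    """Standard DH homogeneous transform for one joint."""
    ct, st, ca, sa = np.cos(theta), np.sin(theta), np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ])

def dynamic_camera_extrinsic(joint_angles, dh_params, T_static_base, T_end_cam):
    """Compose the static-camera -> dynamic-camera transform for given joint angles."""
    T = T_static_base.copy()
    for theta, (d, a, alpha) in zip(joint_angles, dh_params):
        T = T @ dh(theta, d, a, alpha)
    return T @ T_end_cam

# Toy 2-DoF gimbal: pan joint then tilt joint; (d, a, alpha) per joint are assumed.
dh_params = [(0.05, 0.0, np.pi / 2), (0.0, 0.02, 0.0)]
T_static_base = np.eye(4); T_static_base[:3, 3] = [0.10, 0.0, 0.0]
T_end_cam = np.eye(4)
print(np.round(dynamic_camera_extrinsic([0.3, -0.1], dh_params, T_static_base, T_end_cam), 3))
```

    In a calibration setting, the fixed (d, a, alpha) values and the static offset transforms would be the unknowns estimated from image observations, while the joint angles are read from the actuated mechanism at each time step.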

    Applied Cognitive Sciences

    Cognitive science is an interdisciplinary field in the study of the mind and intelligence. The term cognition refers to a variety of mental processes, including perception, problem solving, learning, decision making, language use, and emotional experience. The basis of the cognitive sciences is the contribution of philosophy and computing to the study of cognition. Computing is very important in the study of cognition because computer-aided research helps to model mental processes, and computers are used to test scientific hypotheses about mental organization and functioning. This book provides a platform for reviewing these disciplines and presenting cognitive research as a separate discipline.

    Visual Perception For Robotic Spatial Understanding

    Humans understand the world through vision without much effort. We perceive the structure, objects, and people in the environment and pay little direct attention to most of it, until it becomes useful. Intelligent systems, especially mobile robots, have no such biologically engineered vision mechanism to take for granted. Instead, we must devise algorithmic methods of taking raw sensor data and converting it to something useful very quickly. Vision is such a necessary part of building a robot or any intelligent system that is meant to interact with the world that it is somewhat surprising we don't have off-the-shelf libraries for this capability. Why is this? The simple answer is that the problem is extremely difficult. There has been progress, but the current state of the art is impressive and depressing at the same time. We now have neural networks that can recognize many objects in 2D images, in some cases performing better than a human. Some algorithms can also provide bounding boxes or pixel-level masks to localize the object. We have visual odometry and mapping algorithms that can build reasonably detailed maps over long distances with the right hardware and conditions. On the other hand, we have robots with many sensors and no efficient way to compute their relative extrinsic poses for integrating the data in a single frame. The same networks that produce good object segmentations and labels in a controlled benchmark still miss obvious objects in the real world and have no mechanism for learning on the fly while the robot is exploring. Finally, while we can detect pose for very specific objects, we don't yet have a mechanism that detects pose that generalizes well over categories or that can describe new objects efficiently. We contribute algorithms in four of the areas mentioned above. First, we describe a practical and effective system for calibrating many sensors on a robot with up to 3 different modalities. Second, we present our approach to visual odometry and mapping that exploits the unique capabilities of RGB-D sensors to efficiently build detailed representations of an environment. Third, we describe a 3-D over-segmentation technique that utilizes the models and ego-motion output of the previous step to generate temporally consistent segmentations under camera motion. Finally, we develop a synthesized dataset of chair objects with part labels and investigate the influence of parts on RGB-D based object pose recognition using a novel network architecture we call PartNet.
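    A concrete building block behind the RGB-D mapping approach mentioned above is back-projecting each depth image into a 3D point cloud with the pinhole intrinsics, so successive clouds can be registered and fused into a map. The sketch below shows that single step with assumed intrinsics and a synthetic depth image; it is illustrative only, not the thesis pipeline.

```python
# Hedged sketch of RGB-D back-projection: convert a depth image into a 3-D
# point cloud in the camera frame using assumed pinhole intrinsics.
import numpy as np

fx, fy, cx, cy = 525.0, 525.0, 319.5, 239.5   # assumed pinhole intrinsics
depth = np.full((480, 640), 1.5)              # synthetic depth image in metres

v, u = np.indices(depth.shape)                # pixel row (v) and column (u) grids
z = depth
x = (u - cx) / fx * z                         # back-project along the camera rays
y = (v - cy) / fy * z
cloud = np.stack([x, y, z], axis=-1).reshape(-1, 3)
cloud = cloud[np.isfinite(cloud[:, 2]) & (cloud[:, 2] > 0)]   # drop invalid depths
print(cloud.shape)                            # (307200, 3) for a full 640x480 frame
```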