
    Visual Servoing from Deep Neural Networks

    We present a deep neural network-based method to perform high-precision, robust and real-time 6 DOF visual servoing. The paper describes how to create a dataset simulating various perturbations (occlusions and lighting conditions) from a single real-world image of the scene. A convolutional neural network is fine-tuned using this dataset to estimate the relative pose between two images of the same scene. The output of the network is then employed in a visual servoing control scheme. The method converges robustly even in difficult real-world settings with strong lighting variations and occlusions. A positioning error of less than one millimeter is obtained in experiments with a 6 DOF robot.
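    As a rough illustration of how a network-predicted relative pose can drive such a control scheme, the sketch below implements the classic pose-based servoing law from the visual servoing literature (camera velocity proportional to the translation and axis-angle errors). Here pose_network is a hypothetical stand-in for the paper's fine-tuned CNN, and the authors' actual control law may differ in its details.

        import numpy as np
        from scipy.spatial.transform import Rotation

        def servo_velocity(t_err, R_err, lam=0.5):
            # Classic pose-based law: command a 6-DOF camera velocity that
            # drives the translation and axis-angle rotation errors to zero
            # exponentially.
            theta_u = Rotation.from_matrix(R_err).as_rotvec()
            return np.hstack([-lam * t_err, -lam * theta_u])

        # Each control cycle, the network estimates the relative pose between
        # the current image and the reference image of the desired view:
        #   t_err, R_err = pose_network(current_img, reference_img)  # hypothetical
        #   robot.apply_camera_velocity(servo_velocity(t_err, R_err))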

    Direct Visual Servoing Based on Discrete Orthogonal Moments

    This paper proposes a new approach to achieve direct visual servoing (DVS) based on discrete orthogonal moments (DOM). DVS is conducted such that the extraction of geometric primitives, and the matching and tracking steps of the conventional feature-based visual servoing pipeline, can be bypassed. Although DVS enables highly precise positioning, it suffers from a small convergence domain and poor robustness, due to the high non-linearity of the cost function to be minimized and the presence of redundant data among the visual features. To tackle these issues, we propose a generic and augmented framework that takes DOM as visual features. Taking Tchebichef, Krawtchouk and Hahn moments as examples, we not only present strategies for adaptively adjusting the parameters and orders of the visual features, but also exhibit the analytical formulation of the associated interaction matrix. Simulations demonstrate the robustness and accuracy of our method, as well as its advantages over the state of the art. Real-world experiments have also been performed to validate the effectiveness of our approach.
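    For a concrete picture of what using discrete orthogonal moments as visual features entails, the sketch below computes Tchebichef moments of a grayscale image. As an implementation shortcut it builds the orthonormal polynomial basis by QR-orthonormalizing monomials on the pixel grid, which agrees with the usual three-term recurrence up to sign conventions; the paper's adaptive parameter/order selection and its analytical interaction matrix are not reproduced.

        import numpy as np

        def tchebichef_basis(N, order):
            # Orthonormal discrete Tchebichef polynomials t_0..t_order
            # sampled on {0, ..., N-1}, obtained by orthonormalizing the
            # monomial (Vandermonde) basis.
            x = np.arange(N, dtype=float)
            V = np.vander(x, order + 1, increasing=True)
            Q, _ = np.linalg.qr(V)          # orthonormal columns
            return Q * np.sign(Q[-1, :])    # fix the sign convention

        def tchebichef_moments(img, order=8):
            # Moment matrix T[p, q] = sum_{x,y} t_p(y) t_q(x) img[y, x];
            # flattened, it serves as the feature vector s in a control
            # law of the form v = -lambda * L^+ (s - s*).
            H, W = img.shape
            return tchebichef_basis(H, order).T @ img @ tchebichef_basis(W, order)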

    3D Spectral Domain Registration-Based Visual Servoing

    This paper presents a spectral domain registration-based visual servoing scheme that works on 3D point clouds. Specifically, we propose a 3D model/point cloud alignment method, which works by finding a global transformation between reference and target point clouds using spectral analysis. A 3D Fast Fourier Transform (FFT) in R^3 is used for the translation estimation, and the real spherical harmonics in SO(3) are used for the rotation estimation. Such an approach allows us to derive a decoupled 6 degrees of freedom (DoF) controller, where we use gradient ascent optimisation to minimise the translation and rotation costs. We then show how this methodology can be used to regulate a robot arm to perform a positioning task. In contrast to the existing state-of-the-art depth-based visual servoing methods, which require either dense depth maps or dense point clouds, our method works well with partial point clouds and can effectively handle larger transformations between the reference and the target positions. Furthermore, the use of spectral data (instead of spatial data) for transformation estimation makes our method robust to sensor-induced noise and partial occlusions. We validate our approach by performing experiments using point clouds acquired by a robot-mounted depth camera. The obtained results demonstrate the effectiveness of our visual servoing approach. Accepted to the 2023 IEEE International Conference on Robotics and Automation (ICRA'23).
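    The translation part of such a scheme can be illustrated with textbook 3D phase correlation on voxelized point clouds; this is only a sketch of the underlying principle, and the paper's gradient-ascent refinement and spherical-harmonics rotation step are not reproduced here.

        import numpy as np

        def shift_from_phase_correlation(vol_ref, vol_cur):
            # The normalized cross-power spectrum of two shifted volumes is
            # a pure phase ramp; its inverse FFT peaks at the integer
            # displacement d such that vol_cur ~ vol_ref shifted by d.
            cross = np.fft.fftn(vol_cur) * np.conj(np.fft.fftn(vol_ref))
            cross /= np.abs(cross) + 1e-12
            corr = np.real(np.fft.ifftn(cross))
            peak = np.unravel_index(np.argmax(corr), corr.shape)
            # Indices past N/2 correspond to negative shifts (FFT wrap-around).
            return np.array([p if p <= n // 2 else p - n
                             for p, n in zip(peak, corr.shape)])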

    Methods for visual servoing of robotic systems: A state of the art survey

    This paper surveys methods for visual servoing of robotic systems, with a primary focus on differential-drive mobile robots. The three main areas of research are Direct Visual Servoing, stereo vision systems, and artificial intelligence in visual servoing. The standard methods, (i) Image-Based Visual Servoing (IBVS), driven by errors in image parameters, and (ii) Position-Based Visual Servoing (PBVS), driven by image features used to estimate the pose of a chosen object, are analyzed and compared with the newer Direct Visual Servoing (DVS) method. DVS offers better accuracy than IBVS and PBVS, but has a smaller convergence domain; because of their high accuracy, DVS methods are well suited for integration into hybrid visual servoing systems. Furthermore, works that improve visual servoing through stereo systems (two-camera systems) are analyzed: compared to alternative methods, a stereo system enables more accurate depth estimation of characteristic objects in the image, which is essential for many visual servoing tasks. The use of artificial intelligence (AI) for visual servoing has also gained popularity over the years; AI techniques give visual servoing controllers the ability to learn from predefined examples or empirical knowledge, which considerably widens their domain of application and is crucial for deploying robotic systems in real-world dynamic manufacturing environments. Finally, the integration of visual odometry with a visual servoing controller is analyzed as a way of making the whole robotic system more robust and reliable.
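    To make the IBVS baseline discussed above concrete, here is the standard point-feature formulation found in visual servoing textbooks (a generic sketch, not code from any of the surveyed papers). Note that the depth Z entering the interaction matrix is exactly the quantity the stereo systems above help estimate.

        import numpy as np

        def interaction_matrix_point(x, y, Z):
            # 2x6 interaction matrix of a normalized image point (x, y)
            # at depth Z, relating image motion to camera velocity.
            return np.array([
                [-1.0 / Z, 0.0, x / Z, x * y, -(1.0 + x * x), y],
                [0.0, -1.0 / Z, y / Z, 1.0 + y * y, -x * y, -x],
            ])

        def ibvs_velocity(points, points_des, depths, lam=0.5):
            # Camera velocity v = -lambda * L^+ (s - s*), stacking one
            # interaction matrix per tracked point feature.
            L = np.vstack([interaction_matrix_point(x, y, Z)
                           for (x, y), Z in zip(points, depths)])
            err = (np.asarray(points) - np.asarray(points_des)).ravel()
            return -lam * np.linalg.pinv(L) @ err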

    Vision-based methods for state estimation and control of robotic systems with application to mobile and surgical robots

    For autonomous systems that need to perceive the surrounding environment to accomplish a given task, vision is a highly informative exteroceptive sensory source. Indeed, when gathering information from the available sensors, the richness of visual data makes it possible to build a complete description of the environment, collecting geometric and semantic information (e.g., object pose, distances, shapes, colors, lights). The large amount of collected data allows one to consider either methods exploiting the totality of the data (dense approaches) or a reduced set obtained from feature extraction procedures (sparse approaches). This manuscript presents dense and sparse vision-based methods for control and sensing of robotic systems. First, a safe navigation scheme for mobile robots moving in unknown environments populated by obstacles is presented. For this task, dense visual information is used to perceive the environment (i.e., detect the ground plane and obstacles) and, in combination with other sensory sources, to estimate the robot motion with a linear observer. Sparse visual data, on the other hand, are extracted in the form of geometric primitives in order to implement a visual servoing control scheme satisfying proper navigation behaviours. This controller relies on visually estimated information and is designed to guarantee safety during navigation. In addition, redundant structures are taken into account to re-arrange the internal configuration of the robot and reduce its encumbrance when the workspace is highly cluttered. Vision-based estimation methods are relevant in other contexts as well. In the field of surgical robotics, having reliable estimates of quantities that cannot be measured directly is both highly important and critical. In this manuscript, we present a Kalman-based observer to estimate the 3D pose of a suturing needle held by a surgical manipulator for robot-assisted suturing. The method exploits images acquired by the endoscope of the robot platform to extract relevant geometric information and obtain projected measurements of the tool pose. The method has also been validated with a novel simulator designed for the da Vinci robotic platform, built to ease interfacing and use under ideal conditions for testing and validation. The Kalman-based observers mentioned above are classical passive estimators, whose inputs are, in principle, arbitrary; there is no mechanism for actively adapting the input trajectories to optimize specific requirements on the estimation performance. For this purpose, the active estimation paradigm is introduced and some related strategies are presented. More specifically, a novel active sensing algorithm employing dense visual information is described for a typical Structure-from-Motion (SfM) problem. The algorithm generates an optimal estimate of a scene observed by a moving camera while minimizing the maximum uncertainty of the estimation. The approach can be applied to any robotic platform and has been validated with a manipulator arm equipped with a monocular camera.
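    As a minimal sketch of the kind of Kalman-based observer described above, the following linear filter tracks a 3D position under a constant-velocity model from position measurements (e.g., triangulated from endoscopic images); the thesis's actual needle-pose observer, which handles the full 3D pose, is necessarily more elaborate.

        import numpy as np

        class KalmanObserver:
            # Constant-velocity Kalman filter: state = (position, velocity).
            def __init__(self, dim=3, dt=0.033, q=1e-3, r=1e-2):
                n = 2 * dim
                self.F = np.eye(n)
                self.F[:dim, dim:] = dt * np.eye(dim)   # x' = x + dt * v
                self.H = np.hstack([np.eye(dim), np.zeros((dim, dim))])
                self.Q, self.R = q * np.eye(n), r * np.eye(dim)
                self.x, self.P = np.zeros(n), np.eye(n)

            def step(self, z):
                # Predict, then correct with the measurement z.
                self.x = self.F @ self.x
                self.P = self.F @ self.P @ self.F.T + self.Q
                S = self.H @ self.P @ self.H.T + self.R
                K = self.P @ self.H.T @ np.linalg.inv(S)
                self.x = self.x + K @ (z - self.H @ self.x)
                self.P = (np.eye(len(self.x)) - K @ self.H) @ self.P
                return self.x[:len(z)]   # filtered position estimate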

    Image Analysis via Applied Harmonic Analysis : Perceptual Image Quality Assessment, Visual Servoing, and Feature Detection

    Certain systems of analyzing functions developed in the field of applied harmonic analysis are specifically designed to yield efficient representations of structures characteristic of common classes of two-dimensional signals, such as images. In particular, functions in these systems are typically sensitive to features that define the geometry of a signal, like edges and curves in the case of images. These properties make them ideal candidates for a wide variety of tasks in image processing and image analysis. This thesis discusses three recently developed approaches to utilizing systems of wavelets, shearlets, and alpha-molecules in specific image analysis tasks. First, a perceptual image similarity measure is introduced that is based solely on the coefficients obtained from six discrete Haar wavelet filters, yet yields state-of-the-art correlations with human opinion scores on large benchmark databases. The second application concerns visual servoing, a technique for controlling the motion of a robot using feedback from a visual sensor. In particular, it is investigated how the coefficients yielded by discrete wavelet and shearlet transforms can be used as the visual features that control the motion of a robot with six degrees of freedom. Finally, a novel framework for the detection and characterization of features such as edges, ridges, and blobs in two-dimensional images is presented and evaluated in extensive numerical experiments. Here, versatile and robust feature detectors are obtained by exploiting the special symmetry properties of directionally sensitive analyzing functions in systems created within the recently introduced alpha-molecule framework.
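    For intuition about the six Haar wavelet filters mentioned above, the sketch below computes horizontal and vertical Haar high-pass responses at three dyadic scales (2 orientations x 3 scales = 6 filters). The exact filter normalization and the perceptual similarity measure built on top of these coefficients are not reproduced here, so treat the constants as assumptions.

        import numpy as np
        from scipy.signal import convolve2d

        def haar_responses(img, scales=3):
            # Responses of 2*scales separable Haar high-pass filters:
            # at scale j the 1-D kernel is [1...1, -1...-1] / 2^j.
            responses = []
            for j in range(1, scales + 1):
                n = 2 ** j
                h = np.ones((1, n)) / n
                h[:, n // 2:] *= -1
                responses.append(convolve2d(img, h, mode='same'))    # horizontal
                responses.append(convolve2d(img, h.T, mode='same'))  # vertical
            return responses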

    Multi-Speaker Tracking with Audio-Visual Information for Robot Perception

    Robot perception plays a crucial role in human-robot interaction (HRI). The perception system provides the robot with information about its surroundings and enables it to give feedback. In a conversational scenario, a group of people may chat in front of the robot and move freely. In such situations, the robot is expected to understand where the people are, who is speaking, and what they are talking about. This thesis concentrates on answering the first two questions, namely speaker tracking and diarization. We use different modalities of the robot's perception system to achieve this goal. Just as seeing and hearing are for a human being, audio and visual information are the critical cues for a robot in a conversational scenario. The advances in computer vision and audio processing of the last decade have revolutionized robot perception abilities. This thesis makes the following contributions. We first develop a variational Bayesian framework for tracking multiple objects; it yields closed-form, tractable solutions, which makes the tracking process efficient. The framework is first applied to visual multiple-person tracking, with birth and death processes built jointly into the framework to deal with the varying number of people in the scene. Furthermore, we exploit the complementarity of vision and robot motor information: on the one hand, the robot's active motion can be integrated into the visual tracking system to stabilize the tracking; on the other hand, visual information can be used to perform motor servoing. Audio and visual information are then combined in the variational framework to estimate the smooth trajectories of speaking people and to infer the acoustic status of a person: speaking or silent. In addition, we apply the model to acoustic-only speaker localization and tracking, where online dereverberation techniques are applied first, followed by the tracking system. Finally, a variant of the acoustic speaker tracking model based on the von Mises distribution is proposed, which is specifically adapted to directional data. All the proposed methods are validated on datasets appropriate to each application.
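    The von Mises distribution mentioned in the last contribution is the circular analogue of a Gaussian, which makes it a natural observation model for directional data such as a sound source's direction of arrival. A minimal sketch follows (the full variational tracker is not reproduced; kappa plays the role of an inverse variance):

        import numpy as np
        from scipy.special import i0   # modified Bessel function, order 0

        def von_mises_logpdf(theta, mu, kappa):
            # log p(theta | mu, kappa) on the circle:
            # p = exp(kappa * cos(theta - mu)) / (2 * pi * I0(kappa)).
            return kappa * np.cos(theta - mu) - np.log(2.0 * np.pi * i0(kappa))

        # e.g., weight tracking hypotheses by how well each predicted bearing
        # mu_i explains a measured direction of arrival theta_obs:
        #   w_i proportional to exp(von_mises_logpdf(theta_obs, mu_i, kappa))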