Search CORE

3 research outputs found

Hybrid Multi-camera Visual Servoing to Moving Target

Author: Cuevas Velasquez Hanz
Fisher Robert
Li Nanbo
Saval-Calvo Marcelo
Tylecek Radim
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 02/08/2018
Field of study

Visual servoing is a well-known task in robotics. However, there are still challenges when multiple visual sources are combined to accurately guide the robot or occlusions appear. In this paper we present a novel visual servoing approach using hybrid multi-camera input data to lead a robot arm accurately to dynamically moving target points in the presence of partial occlusions. The approach uses four RGBD sensors as Eye-to-Hand (EtoH) visual input, and an arm-mounted stereo camera as Eye-in-Hand (EinH). A Master supervisor task selects between using the EtoH or the EinH, depending on the distance between the robot and target. The Master also selects the subset of EtoH cameras that best perceive the target. When the EinH sensor is used, if the target becomes occluded or goes out of the sensor's view-frustum, the Master switches back to the EtoH sensors to re-track the object. Using this adaptive visual input data, the robot is then controlled using an iterative planner that uses position, orientation and joint configuration to estimate the trajectory. Since the target is dynamic, this trajectory is updated every time-step. Experiments show good performance in four different situations: tracking a ball, targeting a bulls-eye, guiding a straw to a mouth and delivering an item to a moving hand. The experiments cover both simple situations such as a ball that is mostly visible from all cameras, and more complex situations such as the mouth which is partially occluded from some of the sensors.Comment: 6 pages, Published in IROS 201

arXiv.org e-Print Archive

Crossref

Edinburgh Research Explorer

Two Heads are Better than One: Geometric-Latent Attention for Point Cloud Classification and Segmentation

Author: Cuevas-Velasquez Hanz
Fisher Robert B
Gallego Antonio Javier
Publication venue
Publication date: 30/10/2021
Field of study

We present an innovative two-headed attention layer that combines geometric and latent features to segment a 3D scene into semantically meaningful subsets. Each head combines local and global information, using either the geometric or latent features, of a neighborhood of points and uses this information to learn better local relationships. This Geometric-Latent attention layer (Ge-Latto) is combined with a sub-sampling strategy to capture global features. Our method is invariant to permutation thanks to the use of shared-MLP layers, and it can also be used with point clouds with varying densities because the local attention layer does not depend on the neighbor order. Our proposal is simple yet robust, which allows it to achieve competitive results in the ShapeNetPart and ModelNet40 datasets, and the state-of-the-art when segmenting the complex dataset S3DIS, with 69.2% IoU on Area 5, and 89.7% overall accuracy using K-fold cross-validation on the 6 areas.Comment: Accepted in BMVC 202

arXiv.org e-Print Archive

Edinburgh Research Explorer

3D segmentation and localization using visual cues in uncontrolled environments

Author: Cuevas Velasquez Hanz
Publication venue: The University of Edinburgh
Publication date: 23/03/2022
Field of study

3D scene understanding is an important area in robotics, autonomous vehicles, and virtual reality. The goal of scene understanding is to recognize and localize all the objects around the agent. This is done through semantic segmentation and depth estimation. Current approaches focus on improving the robustness to solve each task but fail in making them efficient for real-time usage. This thesis presents four efficient methods for scene understanding that work in real environments. The methods also aim to provide a solution for 2D and 3D data. The first approach presents a pipeline that combines the block matching algorithm for disparity estimation, an encoder-decoder neural network for semantic segmentation, and a refinement step that uses both outputs to complete the regions that were not labelled or did not have any disparity assigned to them. This method provides accurate results in 3D reconstruction and morphology estimation of complex structures like rose bushes. Due to the lack of datasets of rose bushes and their segmentation, we also made three large datasets. Two of them have real roses that were manually labelled, and the third one was created using a scene modeler and 3D rendering software. The last dataset aims to capture diversity, realism and obtain different types of labelling. The second contribution provides a strategy for real-time rose pruning using visual servoing of a robotic arm and our previous approach. Current methods obtain the structure of the plant and plan the cutting trajectory using only a global planner and assume a constant background. Our method works in real environments and uses visual feedback to refine the location of the cutting targets and modify the planned trajectory. The proposed visual servoing allows the robot to reach the cutting points 94% of the time. This is an improvement compared to only using a global planner without visual feedback, which reaches the targets 50% of the time. To the best of our knowledge, this is the first robot able to prune a complete rose bush in a natural environment. Recent deep learning image segmentation and disparity estimation networks provide accurate results. However, most of these methods are computationally expensive, which makes them impractical for real-time tasks. Our third contribution uses multi-task learning to learn the image segmentation and disparity estimation together end-to-end. The experiments show that our network has at most 1/3 of the parameters of the state-of-the-art of each individual task and still provides competitive results. The last contribution explores the area of scene understanding using 3D data. Recent approaches use point-based networks to do point cloud segmentation and find local relations between points using only the latent features provided by the network, omitting the geometric information from the point clouds. Our approach aggregates the geometric information into the network. Given that the geometric and latent features are different, our network also uses a two-headed attention mechanism to do local aggregation at the latent and geometric level. This additional information helps the network to obtain a more accurate semantic segmentation, in real point cloud data, using fewer parameters than current methods. Overall, the method obtains the state-of-the-art segmentation in the real datasets S3DIS with 69.2% and competitive results in the ModelNet40 and ShapeNetPart datasets

Edinburgh Research Archive