3D objects and scenes classification, recognition, segmentation, and reconstruction using 3D point cloud data: A review
Three-dimensional (3D) point cloud analysis has become an attractive
subject in realistic imaging and machine vision due to its simplicity,
flexibility, and powerful visualization capacity. Indeed, representing
scenes and buildings with 3D shapes and formats has enabled many
applications, including autonomous driving and scene and object
reconstruction. Nevertheless, working with this emerging type of data has
been a challenging task for objects representation, scenes recognition,
segmentation, and reconstruction. In this regard, a significant effort has
recently been devoted to developing novel strategies, using different
techniques such as deep learning models. To that end, we present in this paper
a comprehensive review of existing tasks on 3D point cloud: a well-defined
taxonomy of existing techniques is performed based on the nature of the adopted
algorithms, application scenarios, and main objectives. Various tasks performed
on 3D point cloud data are investigated, including objects and scenes
detection, recognition, segmentation and reconstruction. In addition, we
introduce a list of used datasets, we discuss respective evaluation metrics and
we compare the performance of existing solutions to better inform the
state-of-the-art and identify their limitations and strengths. Lastly, we
elaborate on current challenges facing the technology and on future
trends attracting considerable interest, which could be a starting point for
upcoming research studies.
Implicit Object Pose Estimation on RGB Images Using Deep Learning Methods
With the rise of robotic and camera systems and the success of deep learning in computer vision,
there is growing interest in precisely determining object positions and orientations. This is crucial for
tasks like automated bin picking, where a camera sensor analyzes images or point clouds to guide a
robotic arm in grasping objects. Pose recognition has broader applications, such as predicting a
car's trajectory in autonomous driving or adapting objects in virtual reality based on the viewer's
perspective.
This dissertation focuses on RGB-based pose estimation methods that use depth information only
for refinement, which is a challenging problem. Recent advances in deep learning have made it
possible to predict object poses in RGB images, despite challenges like object overlap, object
symmetries and more.
We introduce two implicit deep learning-based pose estimation methods for RGB images, covering
the entire process from data generation to pose selection. Furthermore, theoretical findings on
Fourier embeddings are shown to improve the performance of so-called implicit neural
representations, which are then successfully applied to the task of implicit pose estimation.
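As a rough illustration of what a Fourier embedding is, the sketch below maps each input coordinate to sines and cosines at geometrically increasing frequencies before it would be passed to a network. The abstract does not state which embedding the dissertation uses, so the frequency schedule (powers of two times pi) and the names `fourier_embed`, `fourier_embed_vector`, and `num_bands` are assumptions chosen for illustration.

```python
import math


def fourier_embed(x, num_bands=4):
    """Embed a scalar as [sin(f_0 x), cos(f_0 x), ..., sin(f_k x), cos(f_k x)].

    Assumed frequency schedule: f_k = 2**k * pi (hypothetical choice,
    not taken from the dissertation).
    """
    features = []
    for k in range(num_bands):
        freq = (2.0 ** k) * math.pi
        features.append(math.sin(freq * x))
        features.append(math.cos(freq * x))
    return features


def fourier_embed_vector(v, num_bands=4):
    """Embed every coordinate of a vector and concatenate the results."""
    out = []
    for x in v:
        out.extend(fourier_embed(x, num_bands))
    return out


if __name__ == "__main__":
    # A 3D coordinate becomes a 3 * 2 * num_bands = 24-dimensional feature.
    print(len(fourier_embed_vector([0.1, 0.2, 0.3], num_bands=4)))
```

The embedding gives a network access to high-frequency variation in low-dimensional inputs, which is the usual motivation for pairing it with implicit neural representations.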
Map-Based Localization for Unmanned Aerial Vehicle Navigation
Unmanned Aerial Vehicles (UAVs) require precise pose estimation when navigating in indoor and GNSS-denied / GNSS-degraded outdoor environments. The possibility of crashing in these environments is high, as spaces are confined, with many moving obstacles. There are many solutions for localization in GNSS-denied environments, and many different technologies are used. Common solutions involve setting up or using existing infrastructure, such as beacons, Wi-Fi, or surveyed targets. These solutions were avoided because the cost should be proportional to the number of users, not the coverage area. Heavy and expensive sensors, for example a high-end IMU, were also avoided. Given these requirements, a camera-based localization solution was selected for the sensor pose estimation. Several camera-based localization approaches were investigated. Map-based localization methods were shown to be the most efficient because they close loops using a pre-existing map; the amount of data and the time spent collecting it are therefore reduced, as there is no need to re-observe the same areas multiple times. This dissertation proposes a solution to the task of fully localizing a monocular camera onboard a UAV with respect to a known environment (i.e., it is assumed that a 3D model of the environment is available) for the purpose of UAV navigation in structured environments.
Incremental map-based localization involves tracking a map through an image sequence. When the map is a 3D model, this task is referred to as model-based tracking. A by-product of the tracker is the relative 3D pose (position and orientation) between the camera and the object being tracked. State-of-the-art solutions advocate that tracking geometry is more robust than tracking image texture because edges are more invariant to changes in object appearance and lighting. However, model-based trackers have been limited to tracking small, simple objects in small environments. An assessment was performed in tracking larger, more complex building models in larger environments. A state-of-the-art model-based tracker called ViSP (Visual Servoing Platform) was applied in tracking outdoor and indoor buildings using a UAV's low-cost camera. The assessment revealed weaknesses at large scales. Specifically, ViSP failed when tracking was lost, and needed to be manually re-initialized. Failure occurred when there was a lack of model features in the camera's field of view, and because of rapid camera motion. Experiments revealed that ViSP achieved positional accuracies similar to single point positioning solutions obtained from single-frequency (L1) GPS observations, with standard deviations around 10 metres. These errors were considered to be large, considering that the geometric accuracy of the 3D model used in the experiments was 10 to 40 cm. The first contribution of this dissertation proposes to increase the performance of the localization system by combining ViSP with map-building incremental localization, also referred to as simultaneous localization and mapping (SLAM). Experimental results in both indoor and outdoor environments show that sub-metre positional accuracies were achieved, while reducing the number of tracking losses throughout the image sequence.
It is shown that by integrating model-based tracking with SLAM, not only does SLAM improve model tracking performance, but the model-based tracker alleviates the computational expense of SLAM's loop closing procedure to improve runtime performance. Experiments also revealed that ViSP was unable to handle occlusions when a complete 3D building model was used, resulting in large errors in its pose estimates. The second contribution of this dissertation is a novel map-based incremental localization algorithm that improves tracking performance and increases pose estimation accuracy relative to ViSP. The novelty of this algorithm is the implementation of an efficient matching process that identifies corresponding linear features from the UAV's RGB image data and a large, complex, and untextured 3D model. The proposed model-based tracker improved positional accuracies from 10 m (obtained with ViSP) to 46 cm in outdoor environments, and achieved 2 cm positional accuracies in large indoor environments, where ViSP could not produce a result.
The main disadvantage of any incremental algorithm is that it requires the camera pose of the first frame. Initialization is often a manual process. The third contribution of this dissertation is a map-based absolute localization algorithm that automatically estimates the camera pose when no prior pose information is available. The method benefits from vertical line matching to accomplish a registration procedure of the reference model views with a set of initial input images via geometric hashing. Results demonstrate that sub-metre positional accuracies were achieved and that a proposed enhancement of conventional geometric hashing produced more correct matches: 75% of the correct matches were identified, compared to 11%. Further, the number of incorrect matches was reduced by 80%.
Tactile Perception And Visuotactile Integration For Robotic Exploration
As the close perceptual sibling of vision, the sense of touch has historically received less than deserved attention in both human psychology and robotics. In robotics, this may be attributed to at least two reasons. First, it suffers from the vicious cycle of immature sensor technology, which causes industry demand to be low, and then there is even less incentive to make existing sensors in research labs easy to manufacture and marketable. Second, the situation stems from a fear of making contact with the environment, avoided in every way so that visually perceived states do not change before a carefully estimated and ballistically executed physical interaction. Fortunately, the latter viewpoint is starting to change. Work in interactive perception and contact-rich manipulation is on the rise. Good reasons are steering the manipulation and locomotion communities’ attention towards deliberate physical interaction with the environment prior to, during, and after a task.
We approach the problem of perception prior to manipulation, using the sense of touch, for the purpose of understanding the surroundings of an autonomous robot. The overwhelming majority of work in perception for manipulation is based on vision. While vision is a fast and global modality, it is insufficient as the sole modality, especially in environments where the ambient light or the objects therein do not lend themselves to vision, such as in darkness, smoky or dusty rooms in search and rescue, underwater, with transparent and reflective objects, and when retrieving items inside a bag. Even in normal lighting conditions, during a manipulation task, the target object and fingers are usually occluded from view by the gripper. Moreover, vision-based grasp planners, typically trained in simulation, often make errors that cannot be foreseen until contact. As a step towards addressing these problems, we first present a global shape-based feature descriptor for object recognition using non-prehensile tactile probing alone. Then, we investigate making the tactile modality, local and slow by nature, more efficient for the task by predicting the most cost-effective moves using active exploration. To combine the local and physical advantages of touch and the fast and global advantages of vision, we propose and evaluate a learning-based method for visuotactile integration for grasping.
Automated freeform assembly of threaded fasteners
Over the past two decades, a major part of the manufacturing and assembly market has been driven by its customer requirements. Increasing customer demand for personalised products creates the demand for smaller batch sizes, shorter production times, lower costs, and the flexibility to produce families of products, or different parts, with the same sets of equipment. Consequently, manufacturing companies have deployed various automation systems and production strategies to improve their resource efficiency and move towards right-first-time production. However, many of these automated systems, which are involved with robot-based, repeatable assembly automation, require component-specific fixtures for accurate positioning and extensive robot programming to achieve flexibility in their production.
Threaded fastening operations are widely used in assembly. In high-volume production, the fastening processes are commonly automated using jigs, fixtures, and semi-automated tools. This form of automation delivers reliable assembly results at the expense of flexibility and requires component variability to be adequately controlled. On the other hand, in low-volume, high-value manufacturing, fastening processes are typically carried out manually by skilled workers.
This research is aimed at addressing the aforementioned issues by developing a freeform automated threaded fastener assembly system that uses 3D visual guidance. The proof-of-concept system developed focuses on picking up fasteners from clutter, identifying a hole feature in an imprecisely positioned target component, and carrying out torque-controlled fastening. This approach has achieved flexibility and adaptability without the use of dedicated fixtures and robot programming.
This research also investigates and evaluates different 3D imaging technologies to identify the one best suited to fastener assembly in a non-structured industrial environment. The proposed solution utilises commercially available technologies to enhance the precision and speed of identification of components for assembly processes, thereby improving and validating the possibility of reliably implementing this solution for industrial applications.
As a part of this research, a number of novel algorithms are developed to robustly identify assembly components located in a random environment by enhancing the existing methods and technologies within the domain of the fastening processes. A bolt identification algorithm was developed to identify bolts located in random clutter by enhancing the existing surface-based matching algorithm. A novel hole feature identification algorithm was developed to detect threaded holes and identify their size and location in 3D.
The developed bolt and feature identification algorithms are robust and have the sub-millimetre accuracy required to perform successful fastener assembly in industrial conditions. In addition, the processing time these algorithms need to identify and localise bolts and hole features is less than a second, thereby increasing the speed of fastener assembly.
Towards Generalist Robots through Visual World Modeling
Moving from narrow robots specializing in specific tasks to generalist robots excelling in multiple tasks in various environmental conditions is the future of next-generation robotics. The key to generalist robots is the ability to learn world models that are reusable, generalizable, and adaptable. Having a general understanding of how the physical world works will enable robots to acquire transferable knowledge across different tasks, predict possible outcomes of future actions before execution, and constantly update their knowledge through continual interactions. While the majority of robot learning frameworks tend to mix task-related and task-agnostic components altogether throughout the learning process, these two components are often not intertwined when one of them is changed. For example, a task-agnostic component such as the computational model of the robot body remains the same even under different task settings, while a task-related component such as the dynamics of a moving object remains the same for different embodiments.
This thesis studies the key steps towards building generalist robots by decomposing the world modeling problem into task-agnostic and task-related elements: (1) robot self-modeling; (2) robot modeling other agents; and (3) robot modeling the physical environment. This framework has produced powerful and efficient learning-based robotic systems for a variety of tasks and physical embodiments, such as computational models of physical robots that can be reused and adapted to numerous task objectives and changing environments, behavior modeling frameworks for complex multi-robot applications, and dynamical system understanding algorithms to distill compact physics knowledge from high-dimensional and multi-modal sensory data. The approach in this thesis could help catalyze the understanding, prediction, and control of increasingly complex systems.
Reconstruction and recognition of confusable models using three-dimensional perception
Perception is one of the key topics in robotics research. It is about the processing
of external sensor data and its interpretation. The necessity of fully autonomous
robots makes it crucial to help them to perform tasks more reliably, flexibly, and
efficiently. As these platforms obtain more refined manipulation capabilities, they
also require expressive and comprehensive environment models: for manipulation
and affordance purposes, their models have to include every object
present in the world, together with its location, pose, shape and other aspects.
The aim of this dissertation is to provide a solution to several of the challenges
that arise in the object grasping problem, with the goal of improving
the autonomy of the mobile manipulator robot MANFRED-2. By the analysis
and interpretation of 3D perception, this thesis first covers the
localization of supporting planes in the scene. As the environment will contain
many other things apart from the planar surface, the problem within cluttered
scenarios has been solved by means of Differential Evolution, which is a particle-based
evolutionary algorithm that evolves over time towards the solution that yields the
lowest value of the cost function.
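To make the idea concrete, the following is a minimal sketch of Differential Evolution (the common rand/1/bin variant) used to fit a supporting plane z = a*x + b*y + c to a point cloud that also contains clutter. The abstract does not give the thesis's cost function or parameters, so the sum-of-absolute-residuals cost, the plane parameterization, and all names and settings (`plane_cost`, `differential_evolution`, `pop_size`, `f`, `cr`) are assumptions for illustration only.

```python
import random


def plane_cost(params, points):
    """Sum of absolute vertical residuals to the plane z = a*x + b*y + c.

    Assumed cost (not from the thesis): absolute residuals are less
    sensitive to clutter above the plane than squared residuals.
    """
    a, b, c = params
    return sum(abs(z - (a * x + b * y + c)) for x, y, z in points)


def differential_evolution(cost, bounds, pop_size=30, f=0.7, cr=0.9,
                           generations=200, seed=0):
    """Minimise `cost` over box `bounds` with the rand/1/bin scheme."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [cost(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than the current one.
            r1, r2, r3 = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # at least one mutated coordinate
            trial = []
            for d in range(dim):
                if rng.random() < cr or d == j_rand:
                    v = pop[r1][d] + f * (pop[r2][d] - pop[r3][d])
                    lo, hi = bounds[d]
                    v = min(max(v, lo), hi)  # clip to the search box
                else:
                    v = pop[i][d]
                trial.append(v)
            s = cost(trial)
            if s <= scores[i]:  # greedy selection: keep the better candidate
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


if __name__ == "__main__":
    rng = random.Random(1)
    pts = []
    for _ in range(40):  # points lying on the supporting plane
        x, y = rng.random(), rng.random()
        pts.append((x, y, 0.1 * x - 0.2 * y + 0.5))
    for _ in range(10):  # clutter: objects standing above the plane
        x, y = rng.random(), rng.random()
        pts.append((x, y, 0.1 * x - 0.2 * y + 0.5 + rng.uniform(0.2, 0.6)))
    best, score = differential_evolution(
        lambda p: plane_cost(p, pts), [(-1, 1), (-1, 1), (0, 1)])
    print(best, score)
```

Each candidate plane is one member of the population; a member is replaced only when a mutated and recombined trial has an equal or lower cost, so the population drifts towards the lowest-cost plane without requiring gradients.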
Since the final purpose of this thesis is to provide valuable information for
grasping applications, a complete model reconstructor has been developed. The
proposed method has many features, such as robustness against abrupt rotations,
multi-dimensional optimization, feature extensibility, compatibility with other scan
matching techniques, management of uncertain information and an initialization
process to reduce convergence times. It has been designed using an evolutionary-based
scan matching optimizer that takes into account surface features of the object,
global form and also texture and color information.
The last challenge tackled concerns the recognition problem. In order to provide the robot
with useful information about the environment, a meta classifier that efficiently discerns the observed objects has been implemented. It is capable
of distinguishing between confusable objects, such as mugs or dishes with similar
shapes but different size or color.
The contributions presented in this thesis have been fully implemented and
empirically evaluated in the platform. A continuous grasping pipeline covering
from perception to grasp planning including visual object recognition for confusable
objects has been developed. For that purpose, an indoor environment with
several objects on a table is set up near the robot. Items are recognized
from a database and, if one is chosen, the robot calculates how to grasp
it, taking into account the kinematic restrictions associated with the anthropomorphic
hand and the 3D model for this particular object.