
    3D objects and scenes classification, recognition, segmentation, and reconstruction using 3D point cloud data: A review

    Three-dimensional (3D) point cloud analysis has become an attractive subject in realistic imaging and machine vision due to its simplicity, flexibility and powerful visualization capacity. The representation of scenes and buildings using 3D shapes and formats has enabled many applications, including autonomous driving and scene and object reconstruction. Nevertheless, working with this emerging type of data remains challenging for object representation, scene recognition, segmentation, and reconstruction. In this regard, significant effort has recently been devoted to developing novel strategies using techniques such as deep learning models. To that end, we present in this paper a comprehensive review of existing tasks on 3D point clouds: a well-defined taxonomy of existing techniques is established based on the nature of the adopted algorithms, application scenarios, and main objectives. Various tasks performed on 3D point cloud data are investigated, including object and scene detection, recognition, segmentation and reconstruction. In addition, we introduce a list of the datasets used, discuss the respective evaluation metrics, and compare the performance of existing solutions to better inform the state of the art and identify their strengths and limitations. Lastly, we elaborate on current challenges facing this technology and on future trends attracting considerable interest, which could be a starting point for upcoming research studies.

    Implicit Object Pose Estimation on RGB Images Using Deep Learning Methods

    With the rise of robotic and camera systems and the success of deep learning in computer vision, there is growing interest in precisely determining object positions and orientations. This is crucial for tasks such as automated bin picking, where a camera sensor analyzes images or point clouds to guide a robotic arm in grasping objects. Pose recognition has broader applications, such as predicting a car's trajectory in autonomous driving or adapting objects in virtual reality to the viewer's perspective. This dissertation focuses on RGB-based pose estimation methods that use depth information only for refinement, which is a challenging problem. Recent advances in deep learning have made it possible to predict object poses in RGB images despite challenges such as object overlap and object symmetries. We introduce two implicit deep learning-based pose estimation methods for RGB images, covering the entire process from data generation to pose selection. Furthermore, theoretical findings on Fourier embeddings are shown to improve the performance of so-called implicit neural representations, which are then successfully used for the task of implicit pose estimation.
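The abstract does not state which Fourier embedding its theoretical findings concern. As a hedged illustration only, the sketch below shows the widely used random Fourier feature mapping for implicit neural representations, gamma(v) = [cos(2*pi*B v), sin(2*pi*B v)], where the rows of B are sampled from a Gaussian. The function names, the number of frequencies and the `scale` value are assumptions chosen for illustration, not parameters from the dissertation.

```python
import math
import random

def make_fourier_matrix(in_dim, num_freqs, scale, seed=0):
    """Sample a (num_freqs x in_dim) frequency matrix B with entries ~ N(0, scale^2)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(num_freqs)]

def fourier_embed(v, B):
    """Map input v to [cos(2*pi*B v), sin(2*pi*B v)], a vector of length 2 * len(B)."""
    projections = [2.0 * math.pi * sum(b_i * v_i for b_i, v_i in zip(row, v)) for row in B]
    return [math.cos(p) for p in projections] + [math.sin(p) for p in projections]

# Usage: embed a 3D input coordinate before it would be fed to an MLP (hypothetical setup).
B = make_fourier_matrix(in_dim=3, num_freqs=16, scale=10.0)
features = fourier_embed([0.1, -0.2, 0.3], B)
print(len(features))  # 32
```

In this family of embeddings, a larger `scale` lets the downstream network fit higher-frequency detail, at the risk of noisier interpolation between training samples.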

    Map-Based Localization for Unmanned Aerial Vehicle Navigation

    Unmanned Aerial Vehicles (UAVs) require precise pose estimation when navigating in indoor and GNSS-denied / GNSS-degraded outdoor environments. The risk of crashing in these environments is high, as spaces are confined and contain many moving obstacles. Many solutions, based on many different technologies, exist for localization in GNSS-denied environments. Common solutions involve setting up or using existing infrastructure, such as beacons, Wi-Fi, or surveyed targets. These solutions were avoided because the cost should be proportional to the number of users, not to the coverage area. Heavy and expensive sensors, such as a high-end IMU, were also avoided. Given these requirements, a camera-based localization solution was selected for sensor pose estimation. Several camera-based localization approaches were investigated. Map-based localization methods were shown to be the most efficient because they close loops using a pre-existing map; this reduces both the amount of data and the time spent collecting it, since there is no need to re-observe the same areas multiple times. This dissertation proposes a solution for fully localizing a monocular camera onboard a UAV with respect to a known environment (i.e., a 3D model of the environment is assumed to be available) for the purpose of UAV navigation in structured environments. Incremental map-based localization involves tracking a map through an image sequence. When the map is a 3D model, this task is referred to as model-based tracking. A by-product of the tracker is the relative 3D pose (position and orientation) between the camera and the tracked object. State-of-the-art solutions hold that tracking geometry is more robust than tracking image texture, because edges are more invariant to changes in object appearance and lighting. However, model-based trackers have been limited to tracking small, simple objects in small environments.
An assessment was performed of tracking larger, more complex building models in larger environments. A state-of-the-art model-based tracker called ViSP (Visual Servoing Platform) was applied to tracking outdoor and indoor buildings using a UAV's low-cost camera. The assessment revealed weaknesses at large scales. Specifically, ViSP failed when tracking was lost and needed to be manually re-initialized. Failure occurred when there was a lack of model features in the camera's field of view, and because of rapid camera motion. Experiments revealed that ViSP achieved positional accuracies similar to single point positioning solutions obtained from single-frequency (L1) GPS observations, with standard deviations of around 10 metres. These errors were considered large, given that the geometric accuracy of the 3D model used in the experiments was 10 to 40 cm. The first contribution of this dissertation increases the performance of the localization system by combining ViSP with map-building incremental localization, also referred to as simultaneous localization and mapping (SLAM). Experimental results in both indoor and outdoor environments show that sub-metre positional accuracies were achieved while reducing the number of tracking losses throughout the image sequence. It is shown that, by integrating model-based tracking with SLAM, not only does SLAM improve model tracking performance, but the model-based tracker also alleviates the computational expense of SLAM's loop-closing procedure, improving runtime performance. Experiments also revealed that ViSP was unable to handle occlusions when a complete 3D building model was used, resulting in large errors in its pose estimates. The second contribution of this dissertation is a novel map-based incremental localization algorithm that improves tracking performance and increases pose estimation accuracy relative to ViSP.
The novelty of this algorithm is an efficient matching process that identifies corresponding linear features between the UAV's RGB image data and a large, complex, and untextured 3D model. The proposed model-based tracker improved positional accuracies from 10 m (obtained with ViSP) to 46 cm in outdoor environments, and achieved 2 cm positional accuracy in large indoor environments, where ViSP could not produce a result at all. The main disadvantage of any incremental algorithm is that it requires the camera pose of the first frame, and initialization is often a manual process. The third contribution of this dissertation is a map-based absolute localization algorithm that automatically estimates the camera pose when no prior pose information is available. The method uses vertical line matching to register reference model views with a set of initial input images via geometric hashing. Results demonstrate that sub-metre positional accuracies were achieved and that a proposed enhancement of conventional geometric hashing produced more correct matches: 75% of the correct matches were identified, compared with 11% for conventional geometric hashing. Furthermore, the number of incorrect matches was reduced by 80%.
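The absolute localization step relies on geometric hashing. The dissertation hashes vertical line features and proposes an enhanced variant; neither is reproduced here. The sketch below is a minimal version of conventional geometric hashing on 2D points, assuming a two-point basis that makes the hashed coordinates invariant to translation, rotation and uniform scale. All function names and the `cell` quantization size are hypothetical.

```python
import math
from collections import defaultdict

def basis_coords(p, b0, b1):
    """Coordinates of p in the frame defined by the ordered basis pair (b0, b1)."""
    ex = (b1[0] - b0[0], b1[1] - b0[1])
    ey = (-ex[1], ex[0])                 # ex rotated by +90 degrees, same length
    norm2 = ex[0] ** 2 + ex[1] ** 2      # squared basis length normalizes out scale
    d = (p[0] - b0[0], p[1] - b0[1])
    return ((d[0] * ex[0] + d[1] * ex[1]) / norm2,
            (d[0] * ey[0] + d[1] * ey[1]) / norm2)

def quantize(c, cell):
    """Map continuous invariant coordinates to a hash-table bin."""
    return (round(c[0] / cell), round(c[1] / cell))

def build_table(model_points, cell=0.25):
    """Offline stage: hash every model point under every ordered basis pair."""
    table = defaultdict(list)
    n = len(model_points)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            for k in range(n):
                if k in (i, j):
                    continue
                c = basis_coords(model_points[k], model_points[i], model_points[j])
                table[quantize(c, cell)].append((i, j))
    return table

def vote(table, scene_points, b0_idx, b1_idx, cell=0.25):
    """Online stage: fix one scene basis pair; other scene points vote for model bases."""
    votes = defaultdict(int)
    b0, b1 = scene_points[b0_idx], scene_points[b1_idx]
    for k, p in enumerate(scene_points):
        if k in (b0_idx, b1_idx):
            continue
        for basis in table.get(quantize(basis_coords(p, b0, b1), cell), []):
            votes[basis] += 1
    return max(votes.items(), key=lambda kv: kv[1]) if votes else None

# Usage: the scene is the model under a similarity transform, plus one clutter point.
model = [(0.0, 0.0), (1.0, 0.0), (0.37, 0.81), (1.23, 0.56), (0.64, -0.42), (-0.29, 0.47)]
theta, s, t = 0.7, 2.3, (5.0, -3.0)
scene = [(s * (x * math.cos(theta) - y * math.sin(theta)) + t[0],
          s * (x * math.sin(theta) + y * math.cos(theta)) + t[1]) for x, y in model]
scene.append((4.2, -1.9))  # clutter point that is not part of the model
best_basis, n_votes = vote(build_table(model), scene, 0, 1)
print(best_basis, n_votes)
```

In a full system each scene basis pair would be tried in turn, and the winning model basis only proposes a correspondence that still has to be verified, for example by estimating the full transform and checking how many points it explains.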

    Tactile Perception And Visuotactile Integration For Robotic Exploration

    As the close perceptual sibling of vision, the sense of touch has historically received less attention than it deserves in both human psychology and robotics. In robotics, this may be attributed to at least two reasons. First, touch suffers from a vicious cycle: immature sensor technology keeps industry demand low, which in turn leaves even less incentive to make existing research-lab sensors easy to manufacture and marketable. Second, the situation stems from a fear of making contact with the environment, which is avoided in every way so that visually perceived states do not change before a carefully estimated and ballistically executed physical interaction. Fortunately, the latter viewpoint is starting to change. Work in interactive perception and contact-rich manipulation is on the rise. Good reasons are steering the manipulation and locomotion communities’ attention towards deliberate physical interaction with the environment prior to, during, and after a task. We approach the problem of perception prior to manipulation, using the sense of touch, for the purpose of understanding the surroundings of an autonomous robot. The overwhelming majority of work in perception for manipulation is based on vision. While vision is a fast and global modality, it is insufficient as the sole modality, especially in environments where the ambient light or the objects therein do not lend themselves to vision: darkness, smoky or dusty rooms in search and rescue, underwater scenes, transparent and reflective objects, and items to be retrieved from inside a bag. Even in normal lighting conditions, the target object and fingers are usually occluded from view by the gripper during a manipulation task. Moreover, vision-based grasp planners, typically trained in simulation, often make errors that cannot be foreseen until contact.
As a step towards addressing these problems, we first present a global shape-based feature descriptor for object recognition using non-prehensile tactile probing alone. Then, we investigate making the tactile modality, local and slow by nature, more efficient for the task by predicting the most cost-effective moves using active exploration. To combine the local and physical advantages of touch with the fast and global advantages of vision, we propose and evaluate a learning-based method for visuotactile integration for grasping.
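The abstract does not detail how the most cost-effective moves are predicted. One common way to frame this kind of active exploration, shown below as a hedged sketch rather than the dissertation's method, is to choose the probing action with the highest expected information gain about the object's identity per unit of execution cost. The discrete observation model, the action names and their costs are all hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)

def expected_info_gain(prior, likelihood):
    """Expected reduction in class entropy after one probe.

    likelihood[c][o] = P(observation o | object class c) for this probing action.
    """
    n_classes, n_obs = len(prior), len(likelihood[0])
    expected_posterior_entropy = 0.0
    for o in range(n_obs):
        p_o = sum(prior[c] * likelihood[c][o] for c in range(n_classes))
        if p_o == 0:
            continue
        # Bayes' rule: posterior over classes given observation o.
        posterior = [prior[c] * likelihood[c][o] / p_o for c in range(n_classes)]
        expected_posterior_entropy += p_o * entropy(posterior)
    return entropy(prior) - expected_posterior_entropy

def choose_action(prior, actions):
    """Greedy choice; actions maps name -> (likelihood table, execution cost)."""
    return max(actions, key=lambda a: expected_info_gain(prior, actions[a][0]) / actions[a][1])

# Usage with three hypothetical object classes and two hypothetical probing moves.
prior = [1 / 3, 1 / 3, 1 / 3]
actions = {
    "probe_top": ([[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]], 1.0),   # cannot tell classes apart
    "probe_side": ([[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]], 2.0),  # separates class 0 from class 1
}
print(choose_action(prior, actions))  # probe_side
```

Dividing the gain by cost is what makes the choice "cost-effective": the cheaper probe is rejected here because it carries no information, even though it costs half as much.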

    Automated freeform assembly of threaded fasteners

    Over the past two decades, a major part of the manufacturing and assembly market has been driven by customer requirements. Increasing customer demand for personalised products creates the need for smaller batch sizes, shorter production times, lower costs, and the flexibility to produce families of products (or different parts) with the same sets of equipment. Consequently, manufacturing companies have deployed various automation systems and production strategies to improve their resource efficiency and move towards right-first-time production. However, many of these automated systems, which involve robot-based, repeatable assembly automation, require component-specific fixtures for accurate positioning and extensive robot programming to achieve flexibility in production. Threaded fastening operations are widely used in assembly. In high-volume production, fastening processes are commonly automated using jigs, fixtures, and semi-automated tools. This form of automation delivers reliable assembly results at the expense of flexibility and requires component variability to be adequately controlled. In low-volume, high-value manufacturing, on the other hand, fastening processes are typically carried out manually by skilled workers. This research addresses these issues by developing a freeform automated threaded fastener assembly system that uses 3D visual guidance. The proof-of-concept system focuses on picking up fasteners from clutter, identifying a hole feature in an imprecisely positioned target component, and carrying out torque-controlled fastening. This approach achieves flexibility and adaptability without dedicated fixtures or robot programming. This research also investigates and evaluates different 3D imaging technologies to identify the technology best suited to fastener assembly in an unstructured industrial environment.
The proposed solution uses commercially available technologies to enhance the precision and speed of identifying components for assembly processes, thereby supporting and validating the feasibility of reliably implementing this solution in industrial applications. As part of this research, several novel algorithms were developed to robustly identify assembly components located in a random environment by enhancing existing methods and technologies within the domain of fastening processes. A bolt identification algorithm was developed to identify bolts located in random clutter by enhancing an existing surface-based matching algorithm. A novel hole feature identification algorithm was developed to detect threaded holes and identify their size and location in 3D. The developed bolt and feature identification algorithms are robust and have the sub-millimetre accuracy required to perform successful fastener assembly in industrial conditions. In addition, the processing time these algorithms need to identify and localise bolts and hole features is less than a second, thereby increasing the speed of fastener assembly.

    Reconstruction and recognition of confusable models using three-dimensional perception

    Perception is one of the key topics in robotics research. It concerns the processing and interpretation of external sensor data. The need for fully autonomous robots makes it crucial to enable them to perform tasks more reliably, flexibly, and efficiently. As these platforms gain more refined manipulation capabilities, they also require expressive and comprehensive environment models: for manipulation and affordance purposes, their models have to include each of the objects present in the world, together with their location, pose, shape and other attributes. The aim of this dissertation is to solve several of the challenges that arise when addressing the object grasping problem, with the goal of improving the autonomy of the mobile manipulator robot MANFRED-2. Through the analysis and interpretation of 3D perception, this thesis first covers the localization of supporting planes in the scene. Because the environment contains many other things besides the planar surface, the problem in cluttered scenes has been solved by means of Differential Evolution, a particle-based evolutionary algorithm that evolves over time towards the solution with the lowest cost-function value. Since the final purpose of this thesis is to provide valuable information for grasping applications, a complete model reconstructor has been developed. The proposed method has many features, such as robustness against abrupt rotations, multi-dimensional optimization, feature extensibility, compatibility with other scan matching techniques, management of uncertain information, and an initialization process to reduce convergence time. It has been designed using an evolutionary scan matching optimizer that takes into account the surface features of the object, its global form, and its texture and color information. The last challenge tackled concerns the recognition problem.
To provide the robot with useful information about its environment, a meta-classifier that efficiently discerns the observed objects has been implemented. It is capable of distinguishing between confusable objects, such as mugs or dishes with similar shapes but different size or color. The contributions presented in this thesis have been fully implemented and empirically evaluated on the platform. A continuous grasping pipeline, covering perception through grasp planning and including visual object recognition for confusable objects, has been developed. For that purpose, an indoor scene with several objects on a table is set up near the robot. Items are recognized from a database and, if one is chosen, the robot calculates how to grasp it, taking into account the kinematic restrictions associated with the anthropomorphic hand and the 3D model of that particular object.
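The supporting-plane search in the MANFRED-2 abstract uses Differential Evolution, but the abstract gives neither the plane parameterization nor the cost function. The sketch below is a minimal DE/rand/1/bin minimizer applied to a synthetic cluttered table scene, under these assumptions: the plane is written as z = a*x + b*y + c (so it only represents non-vertical planes, which suits tables and floors), and each point contributes a bounded Geman-McClure residual so that clutter has limited influence. The cost function, bounds and DE settings (`F`, `CR`, population size) are illustrative choices, not values from the thesis.

```python
import random

def plane_cost(params, points, s=0.05):
    """Robust (Geman-McClure) cost of the plane z = a*x + b*y + c.

    Each point adds r^2 / (r^2 + s^2), which is bounded by 1, so clutter far
    from the plane cannot dominate the sum the way squared residuals would.
    """
    a, b, c = params
    total = 0.0
    for x, y, z in points:
        r = z - (a * x + b * y + c)
        total += r * r / (r * r + s * s)
    return total

def differential_evolution(cost, bounds, pop_size=30, generations=200, F=0.8, CR=0.9, seed=0):
    """Classic DE/rand/1/bin minimizer over box bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [cost(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct donors, all different from the target vector i.
            r1, r2, r3 = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated coordinate
            trial = []
            for d in range(dim):
                if rng.random() < CR or d == j_rand:
                    lo, hi = bounds[d]
                    v = pop[r1][d] + F * (pop[r2][d] - pop[r3][d])
                    trial.append(min(max(v, lo), hi))  # clip to the search box
                else:
                    trial.append(pop[i][d])
            f_trial = cost(trial)
            if f_trial <= fit[i]:  # greedy one-to-one selection
                pop[i], fit[i] = trial, f_trial
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]

# Usage: a noisy table at z = 0.5 plus a cluttering object standing on it.
rng = random.Random(1)
table = [(rng.uniform(0, 1), rng.uniform(0, 1), 0.5 + rng.gauss(0, 0.003)) for _ in range(200)]
clutter = [(rng.uniform(0.3, 0.5), rng.uniform(0.3, 0.5), rng.uniform(0.55, 0.8)) for _ in range(60)]
points = table + clutter
(a, b, c), best_cost = differential_evolution(
    lambda p: plane_cost(p, points), bounds=[(-0.3, 0.3), (-0.3, 0.3), (0.0, 1.0)])
print(round(a, 3), round(b, 3), round(c, 3))
```

Because DE only compares cost values and never needs gradients, the robust cost could be swapped for another objective, such as a negative inlier count, without changing the optimizer.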