
    A robotic engine assembly pick-place system based on machine learning

    The industrial revolution brought humans and machines together in building a better future. While on the one hand there is a need to replace repetitive jobs with machines to increase efficiency and production volume, on the other hand intelligent and autonomous machines still have a long way to go before matching the dexterity of a human. The current scenario calls for a system that utilises the best of both the human and the machine. This thesis studies an industrial use case in which human and machine combine their skills to build an autonomous pick-place system. The study takes a small step towards the human-robot consortium, primarily focusing on developing a vision-based system for object detection followed by a manipulator pick-place operation. The thesis can be divided into two parts: 1. Scene analysis, where a Convolutional Neural Network (CNN) is used for object detection, followed by the generation of grasping points from the object edge image using an algorithm developed during this thesis. 2. Implementation, which focuses on motion generation while handling external disturbances to perform a successful pick-place operation. In addition, human involvement is required to teach trajectory points for the robot to follow. This trajectory is used to generate an image dataset for a new object type and thereafter to train a new object detection model. The author primarily focuses on building a system framework in which the complexities of robot programming, such as generating trajectory points and specifying grasping positions, are not required: the system automatically detects the object and performs the pick-place operation, relieving the user of robot programming. The system is composed of a depth camera and a manipulator. The camera is the only sensor available for scene analysis, and the action is performed using a Franka manipulator; the two components work in request-response mode over ROS. The thesis introduces new approaches such as dividing a workspace image into its constituent object images before performing object detection, creating training data, and generating grasp points based on object shape along the length of the object. The thesis also presents a case study in which three different objects are chosen as test objects. The experiments demonstrate the methods applied and the efficiency attained, and the case study also provides a glimpse of future research and development areas.
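    The grasp-point generation step is only described at a high level above. The following is a minimal Python/OpenCV sketch of one plausible way to derive grasp candidates spaced along the length of a detected object from its binary edge image; the minimum-area-rectangle heuristic, the function name and the spacing parameters are illustrative assumptions, not the algorithm developed in the thesis.

    import cv2
    import numpy as np

    def grasp_points_from_edges(edge_img, n_points=3):
        """Sketch: grasp candidates spaced along the major axis of the
        largest contour in a binary edge image (hypothetical helper)."""
        # OpenCV 4.x is assumed here (findContours returns two values).
        contours, _ = cv2.findContours(edge_img, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        if not contours:
            return []
        contour = max(contours, key=cv2.contourArea)

        # The minimum-area rectangle gives the object's centre, size and orientation.
        (cx, cy), (w, h), angle = cv2.minAreaRect(contour)
        theta = np.deg2rad(angle if w >= h else angle + 90.0)  # major-axis angle
        length = max(w, h)

        # Sample grasp candidates along the major axis, centred on the object.
        offsets = np.linspace(-0.5, 0.5, n_points) * 0.8 * length
        axis = np.array([np.cos(theta), np.sin(theta)])
        return [np.array([cx, cy]) + t * axis for t in offsets]

    In a request-response setup over ROS like the one described, a service callback could run such a function on the edge image of each detected object and return the candidate pixel coordinates to the manipulator node.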

    Obstacle Avoidance and Path Planning for Smart Indoor Agents

    Although joysticks on motorized wheelchairs have improved the lives of many, patients with Parkinson's disease, stroke, limb injury, or vision problems need alternative solutions. Further, navigating a wheelchair through cluttered environments without colliding with objects or people can be a challenging task. For these reasons, many patients rely on a caretaker for daily tasks. To aid persons with disabilities, the Machine Intelligence Laboratory Personal Electronic Transport (Milpet) provides a solution. Milpet is an effective access wheelchair with speech recognition capabilities. Commands such as "Milpet, take me to room 237" or "Milpet, move forward" can be given. As Milpet executes the patient's commands, it calculates the optimal route, avoids obstacles, and recalculates the path if necessary. This thesis describes the development of modular obstacle avoidance and path planning algorithms for indoor agents. Due to the modularity of the system, the navigation system can be extended to different robots. The obstacle avoidance system is configurable to exhibit various behaviors: according to need, the agent can be influenced by a path or the environment, exhibit wall following or hallway centering, or simply wander in free space while avoiding obstacles. The navigation system has been tested under various conditions to demonstrate the robustness of the obstacle avoidance and path planning modules. Measures of obstacle proximity and destination proximity are introduced to show the practicality of the navigation system. The capabilities introduced to Milpet are a significant step toward returning independence and privacy to the many people who rely on caregivers or loved ones.
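    As a rough illustration of the kind of configurable reactive behaviour described above, the sketch below blends a desired velocity command with a repulsive term computed from a 2D laser scan. The gains, distances and scan format are illustrative assumptions, not Milpet's actual parameters or code.

    import numpy as np

    def avoid_obstacles(ranges, angles, v_des=0.5, w_des=0.0,
                        influence_dist=1.0, repulse_gain=0.8):
        """Sketch: adjust a (linear, angular) command using a laser scan
        (ranges in metres, angles in radians, 0 rad = straight ahead)."""
        ranges = np.asarray(ranges, dtype=float)
        angles = np.asarray(angles, dtype=float)

        near = ranges < influence_dist
        if not np.any(near):
            return v_des, w_des          # free space: follow the planned command

        # Closer returns push harder (weights in [0, 1]).
        weights = (influence_dist - ranges[near]) / influence_dist
        # Obstacles on the left (positive angles) induce a right turn, and vice versa.
        w_rep = -repulse_gain * np.sum(weights * np.sin(angles[near]))
        # Obstacles straight ahead scale the forward speed down.
        v_scale = 1.0 - np.max(weights * np.cos(angles[near]).clip(min=0.0))

        return v_des * max(v_scale, 0.0), w_des + w_rep

    Tuning the gains, or deriving the desired command from a planned path instead of a fixed value, is one way such a module could be biased toward path following or free wandering.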

    Real-time object detection using monocular vision for low-cost automotive sensing systems

    This work addresses the problem of real-time object detection in automotive environments using monocular vision. The focus is on real-time feature detection, tracking, depth estimation using monocular vision and, finally, object detection by fusing visual saliency and depth information. Firstly, a novel feature detection approach is proposed for extracting stable and dense features even in images with a very low signal-to-noise ratio. This methodology is based on image gradients, which are redefined to take account of noise as part of their mathematical model. Each gradient is based on a vector connecting a negative to a positive intensity centroid, where both centroids are symmetric about the centre of the area for which the gradient is calculated. Multiple gradient vectors define a feature, with its strength being proportional to the underlying gradient vector magnitude. The evaluation of the Dense Gradient Features (DeGraF) shows superior performance over other contemporary detectors in terms of keypoint density, tracking accuracy, illumination invariance, rotation invariance, noise resistance and detection time. The DeGraF features form the basis for two new approaches that perform dense 3D reconstruction from a single vehicle-mounted camera. The first approach tracks DeGraF features in real time while performing image stabilisation with minimal computational cost. This means that, despite camera vibration, the algorithm can accurately predict the real-world coordinates of each image pixel in real time by comparing each motion vector to the ego-motion vector of the vehicle. The performance of this approach has been compared to different 3D reconstruction methods in order to determine their accuracy, depth-map density, noise resistance and computational complexity. The second approach proposes the use of local frequency analysis of gradient features for estimating relative depth. This novel method is based on the fact that DeGraF gradients can accurately measure local image variance with sub-pixel accuracy. It is shown that the local frequency with which the centroid oscillates around the gradient window centre is proportional to the depth of each gradient centroid in the real world. The lower computational complexity of this methodology comes at the expense of depth-map accuracy as the camera velocity increases, but it is at least five times faster than the other evaluated approaches. This work also proposes a novel technique for deriving visual saliency maps using Division of Gaussians (DIVoG). In this context, saliency maps express how different each image pixel is from its surrounding pixels across multiple pyramid levels. This approach is shown to be both fast and accurate when evaluated against other state-of-the-art approaches. Subsequently, the saliency information is combined with depth information to identify salient regions close to the host vehicle. The fused map allows faster detection of high-risk areas where obstacles are likely to exist. As a result, existing object detection algorithms, such as the Histogram of Oriented Gradients (HOG), can execute at least five times faster. In conclusion, through a step-wise approach, computationally expensive algorithms have been optimised or replaced by novel methodologies to produce a fast object detection system that is aligned with the requirements of the automotive domain.
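    The centroid-based gradient is described only verbally above. The snippet below is a minimal sketch of one interpretation for a single window, assuming the "positive" and "negative" centroids are the intensity-weighted and inverse-intensity-weighted centroids of that window; it is an illustration, not the published DeGraF implementation.

    import numpy as np

    def centroid_gradient(window):
        """Sketch: gradient of an image window as the vector from its
        'negative' (dark-weighted) centroid to its 'positive'
        (bright-weighted) centroid."""
        window = np.asarray(window, dtype=float)
        h, w = window.shape
        ys, xs = np.mgrid[0:h, 0:w]
        eps = 1e-9

        pos_w = window - window.min()        # weights emphasising bright pixels
        neg_w = window.max() - window        # weights emphasising dark pixels

        pos_c = np.array([np.sum(xs * pos_w), np.sum(ys * pos_w)]) / (pos_w.sum() + eps)
        neg_c = np.array([np.sum(xs * neg_w), np.sum(ys * neg_w)]) / (neg_w.sum() + eps)

        grad = pos_c - neg_c                 # points towards increasing intensity
        strength = np.linalg.norm(grad)      # feature strength ~ gradient magnitude
        return grad, strength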

    Machine learning algorithms for structured decision making


    Using encoder-decoder architecture for material segmentation based on beam profile analysis

    Abstract. Recognition and segmentation of materials has proven to be a challenging problem because of the wide divergence in appearance within and between categories. Many recent material segmentation approaches treat materials as just another set of labels, like objects. However, materials are fundamentally different from objects in that they have no basic shape or defined spatial extent. Our approach largely ignores this distinction and relies primarily on the limited implicit context (local appearance) seen during training, because our training images contain almost no global image context: (I) the materials used have no inherent shape or defined spatial extent, e.g. apple, orange and potato all have approximately the same spherical shape; (II) in addition, the images were taken against a black background, which largely removes the spatial features of the materials. We introduce a new material segmentation dataset captured with a Beam Profile Analysis sensing device. The dataset contains 10 material categories and consists of image pair samples of grayscale images with and without laser spots (grayscale and laser images), in addition to annotated segmented images. To the best of our knowledge, this is the first material segmentation dataset for Beam Profile Analysis images. As a second step, we propose a deep learning approach to perform material segmentation on our dataset; the proposed CNN is an encoder-decoder model based on the DeeplabV3+ model. Our main goal is to obtain segmented material maps and to discover how the laser spots contribute to the segmentation results; therefore, we perform a comparative analysis across different types of architectures to observe how the laser spots contribute to the overall segmentation. We build our experiments on three main types of models, each using a different type of input; for each model, we implement various backbone architectures. Our experimental results show that the laser spots contribute effectively to the segmentation results. The GrayLaser model achieves a significant accuracy improvement compared to the other models: the fine-tuned version of this model reaches 94% on the MIoU metric, and the one trained from scratch reaches 62% MIoU.
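    As a minimal PyTorch sketch of the kind of two-channel (grayscale + laser) encoder-decoder discussed above, the toy model below maps an image pair stacked as two input channels to per-pixel material logits. The tiny architecture and layer sizes are illustrative assumptions, not the DeeplabV3+-based network used in the thesis.

    import torch
    import torch.nn as nn

    class TinyEncoderDecoder(nn.Module):
        """Sketch: 2-channel input (grayscale + laser image), per-pixel
        logits over n_classes material categories."""
        def __init__(self, in_channels=2, n_classes=10):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            )
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(32, n_classes, 4, stride=2, padding=1),
            )

        def forward(self, x):
            return self.decoder(self.encoder(x))

    # Usage sketch: stack the grayscale and laser images as two channels.
    model = TinyEncoderDecoder(in_channels=2, n_classes=10)
    x = torch.randn(1, 2, 256, 256)   # (batch, channels, height, width)
    logits = model(x)                 # -> (1, 10, 256, 256)

    Grayscale-only or laser-only baselines of the kind compared in the thesis would simply change in_channels and how the input is stacked.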

    Local user mapping via multi-modal fusion for social robots

    User detection, recognition and tracking are at the heart of Human-Robot Interaction, and yet, to date, no universal robust method exists for making a robot aware of the people in its surroundings. The presented work aims at bringing into existing social robotics platforms different techniques, some classical and others novel, for detecting, recognizing and tracking human users. These algorithms are based on a variety of sensors, mainly cameras and depth imaging devices, but also lasers and microphones. The results of these parallel algorithms are then merged so as to obtain a modular, expandable and fast architecture, resulting in a local user map built through multi-modal fusion. Thanks to this user-awareness architecture, user detection, recognition and tracking capabilities can be given easily and quickly to any robot by re-using the modules that match its sensors and its processing performance. The architecture provides all the relevant information about the users around the robot, which can then be used by end-user applications that adapt their behavior to those users. The variety of social robots in which the architecture has been successfully implemented includes a car-like mobile robot, an articulated flower and a humanoid assistance robot. Some modules of the architecture are very lightweight but have low reliability; others need more CPU, but the associated confidence is higher. All configurations of modules are possible and fit the range of possible robotics hardware configurations. All the modules are independent and highly configurable, so no code needs to be developed to build a new configuration: the user only writes a ROS launch file, a simple text file that lists the wanted modules. The architecture has been developed with modularity and speed in mind. It is based on the Robot Operating System (ROS), a de facto software standard in robotics. The different people detectors comply with a common interface called PeoplePoseList Publisher, while the people recognition algorithms comply with an interface called PeoplePoseList Matcher. The fusion of all these different modules is based on Unscented Kalman Filter techniques. Extensive benchmarks of the sub-components and of the whole architecture, using both academic datasets and data acquired in our lab, together with end-user application samples, demonstrate the validity and interest of all levels of the architecture.
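    The fusion layer is described above only at the architectural level. Below is a minimal numpy sketch of how 2D position detections from several modules could be fused recursively; a linear constant-velocity Kalman filter is used here as a simplified stand-in for the Unscented Kalman Filter techniques the thesis relies on, and the noise values are illustrative assumptions.

    import numpy as np

    class UserTracker:
        """Sketch: constant-velocity Kalman filter fusing 2D position
        detections of one user coming from several detector modules."""
        def __init__(self, dt=0.1):
            self.x = np.zeros(4)                      # state: [px, py, vx, vy]
            self.P = np.eye(4)
            self.F = np.eye(4)
            self.F[0, 2] = self.F[1, 3] = dt          # constant-velocity model
            self.H = np.array([[1., 0., 0., 0.],
                               [0., 1., 0., 0.]])     # only position is measured
            self.Q = 0.01 * np.eye(4)                 # process noise (assumed)

        def predict(self):
            self.x = self.F @ self.x
            self.P = self.F @ self.P @ self.F.T + self.Q

        def update(self, z, meas_std):
            """Fuse one detection z = [px, py]; meas_std encodes how reliable
            the detector module is (lightweight modules -> larger std)."""
            R = (meas_std ** 2) * np.eye(2)
            y = np.asarray(z, dtype=float) - self.H @ self.x
            S = self.H @ self.P @ self.H.T + R
            K = self.P @ self.H.T @ np.linalg.inv(S)
            self.x = self.x + K @ y
            self.P = (np.eye(4) - K @ self.H) @ self.P

    # Usage sketch: predict once per cycle, then update with each module's output.
    tracker = UserTracker(dt=0.1)
    tracker.predict()
    tracker.update([1.2, 0.4], meas_std=0.3)  # e.g. depth-camera detector
    tracker.update([1.1, 0.5], meas_std=0.8)  # e.g. lightweight, noisier detector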

    OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation

    Recent advances in modeling 3D objects mostly rely on synthetic datasets due to the lack of large-scale real-scanned 3D databases. To facilitate the development of 3D perception, reconstruction, and generation in the real world, we propose OmniObject3D, a large-vocabulary 3D object dataset with massive high-quality real-scanned 3D objects. OmniObject3D has several appealing properties: 1) Large Vocabulary: It comprises 6,000 scanned objects in 190 daily categories, sharing common classes with popular 2D datasets (e.g., ImageNet and LVIS), benefiting the pursuit of generalizable 3D representations. 2) Rich Annotations: Each 3D object is captured with both 2D and 3D sensors, providing textured meshes, point clouds, multi-view rendered images, and multiple real-captured videos. 3) Realistic Scans: The professional scanners support high-quality object scans with precise shapes and realistic appearances. With the vast exploration space offered by OmniObject3D, we carefully set up four evaluation tracks: a) robust 3D perception, b) novel-view synthesis, c) neural surface reconstruction, and d) 3D object generation. Extensive studies are performed on these four benchmarks, revealing new observations, challenges, and opportunities for future research in realistic 3D vision. Project page: https://omniobject3d.github.io

    Toward a Deep Learning Approach for Automatic Semantic Segmentation of 3D Lidar Point Clouds in Urban Areas

    Semantic segmentation of Lidar data using Deep Learning (DL) is a fundamental step towards a deep and rigorous understanding of large-scale urban areas. Indeed, the increasing development of Lidar technology in terms of accuracy and spatial resolution offers an excellent opportunity for delivering reliable semantic segmentation in large-scale urban environments. Significant progress has been reported in this direction. However, the literature lacks a thorough comparison of the existing methods and algorithms in terms of strengths and weaknesses. The aim of the present paper is therefore to propose an objective review of these methods, highlighting their strengths and limitations. We then propose a new approach based on the combination of Lidar data and other sources, in conjunction with a Deep Learning technique, whose objective is to automatically extract semantic information from airborne Lidar point clouds while improving both accuracy and semantic precision compared to existing methods. We finally present the first results of our approach.
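    The deep-learning step is only outlined above. As a generic illustration (not the authors' network), the PyTorch sketch below assigns a semantic class to each Lidar point from per-point attributes such as coordinates and intensity; the feature set, class count and architecture are assumptions made for the example.

    import torch
    import torch.nn as nn

    class PointwiseSegmenter(nn.Module):
        """Sketch: shared MLP assigning a semantic class to every Lidar
        point from per-point features, e.g. (x, y, z, intensity)."""
        def __init__(self, in_features=4, n_classes=6):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(in_features, 64), nn.ReLU(),
                nn.Linear(64, 128), nn.ReLU(),
                nn.Linear(128, n_classes),
            )

        def forward(self, points):       # points: (batch, n_points, in_features)
            return self.mlp(points)      # logits: (batch, n_points, n_classes)

    # Usage sketch: a cloud of 2048 points with xyz + intensity per point.
    model = PointwiseSegmenter(in_features=4, n_classes=6)
    cloud = torch.randn(1, 2048, 4)
    labels = model(cloud).argmax(dim=-1)  # per-point class predictions

    A purely pointwise classifier ignores neighbourhood context; point-cloud networks used in practice add neighbourhood aggregation and, as proposed above, features derived from additional data sources.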