
    Towards Visual Ego-motion Learning in Robots

    Many model-based Visual Odometry (VO) algorithms have been proposed in the past decade, often restricted to a particular type of camera optics or to the underlying motion manifold observed. We envision robots that can learn and perform these tasks in a minimally supervised setting as they gain more experience. To this end, we propose a fully trainable solution to visual ego-motion estimation for varied camera optics: a visual ego-motion learning architecture that maps observed optical flow vectors to an ego-motion density estimate via a Mixture Density Network (MDN). By modeling the architecture as a Conditional Variational Autoencoder (C-VAE), our model is able to provide introspective reasoning and prediction for ego-motion-induced scene flow. Additionally, our proposed model is especially amenable to bootstrapped ego-motion learning in robots, where supervision for ego-motion estimation with a particular camera sensor can be obtained from standard navigation-based sensor fusion strategies (GPS/INS and wheel-odometry fusion). Through experiments, we show the utility of our approach in enabling self-supervised learning for visual ego-motion estimation in autonomous robots.
    Comment: Conference paper; submitted to the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2017, Vancouver, CA; 8 pages, 8 figures, 2 tables
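An MDN of the kind described above is typically trained by minimizing the negative log-likelihood of the ground-truth ego-motion under the predicted mixture. The sketch below shows only that loss, under stated assumptions: diagonal Gaussian components, strictly positive mixture weights, and a 6-DoF motion vector. The function name and argument layout are hypothetical, not taken from the paper, and the network that produces the mixture parameters from optical flow is omitted.

```python
import math


def mdn_negative_log_likelihood(weights, means, sigmas, target):
    """Negative log-likelihood of `target` under a diagonal-Gaussian mixture.

    Hypothetical helper for illustration:
    weights: K mixture weights (strictly positive, summing to 1)
    means:   K mean vectors, each of length D (e.g. D = 6 for a 6-DoF ego-motion)
    sigmas:  K standard-deviation vectors, each of length D (all > 0)
    target:  observed ego-motion vector of length D
    """
    log_terms = []
    for w, mu, sd in zip(weights, means, sigmas):
        # A diagonal Gaussian factorizes, so its log-density is a sum of 1-D log-densities.
        log_pdf = sum(
            -0.5 * ((y - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2 * math.pi)
            for y, m, s in zip(target, mu, sd)
        )
        log_terms.append(math.log(w) + log_pdf)
    # Log-sum-exp over components, shifted by the largest term for numerical stability.
    peak = max(log_terms)
    return -(peak + math.log(sum(math.exp(t - peak) for t in log_terms)))


# Usage: two hypothetical components over a 6-DoF motion (tx, ty, tz, roll, pitch, yaw).
loss = mdn_negative_log_likelihood(
    weights=[0.7, 0.3],
    means=[[0.0, 0.0, 1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.5, 0.0, 0.1, 0.0]],
    sigmas=[[0.1] * 6, [0.2] * 6],
    target=[0.0, 0.0, 0.9, 0.0, 0.0, 0.0],
)
```

Predicting a mixture rather than a single pose lets the model keep several motion hypotheses alive when the flow field is ambiguous, which is what makes a density estimate (rather than a point estimate) useful here.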

    Egomotion estimation using binocular spatiotemporal oriented energy

    Camera egomotion estimation is concerned with recovering a camera's motion (e.g., instantaneous translation and rotation) as it moves through its environment. The problem has been shown to be of both theoretical and practical interest. This thesis documents a novel algorithm for egomotion estimation based on binocularly matched spatiotemporal oriented energy distributions. Basing the estimation on oriented energy measurements makes it possible to recover egomotion without establishing temporal correspondences or converting disparity into 3D world coordinates. The resulting algorithm has been implemented in software and evaluated quantitatively on a novel laboratory dataset with ground truth, as well as qualitatively on both indoor and outdoor real-world datasets. Its performance is evaluated relative to comparable alternative algorithms, and it is shown to exhibit the best overall performance.
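To illustrate what a spatiotemporal oriented energy measurement captures, here is a deliberately simplified sketch. It is an assumption, not the thesis's filter bank: it uses one spatial dimension plus time, a single Gabor quadrature pair, and a monocular signal, with no binocular matching. A translating pattern traces oriented lines in (x, t), so the filter tuned to the true velocity responds most strongly, and squaring and summing the even and odd responses makes the measurement independent of the pattern's phase. No temporal correspondences are computed. All names and parameter values are hypothetical.

```python
import numpy as np


def oriented_energy(video, k, velocity, sigma):
    """Phase-invariant energy of a Gabor quadrature pair tuned to `velocity`.

    Hypothetical helper for illustration:
    video: 2-D array indexed [t, x] (one spatial dimension plus time)
    k: spatial frequency of the filter, in radians per pixel
    velocity: image velocity the filter is tuned to, in pixels per frame
    sigma: standard deviation of the Gaussian envelope, in pixels and frames
    """
    nt, nx = video.shape
    t = np.arange(nt) - nt // 2
    x = np.arange(nx) - nx // 2
    tt, xx = np.meshgrid(t, x, indexing="ij")
    envelope = np.exp(-(xx**2 + tt**2) / (2 * sigma**2))
    # The carrier phase is constant along x - velocity * t = const,
    # so the filter is oriented in (x, t) according to the tuned velocity.
    phase = k * (xx - velocity * tt)
    even = np.sum(video * envelope * np.cos(phase))
    odd = np.sum(video * envelope * np.sin(phase))
    # Sum of squared quadrature responses: insensitive to the signal's phase.
    return even**2 + odd**2


# Usage: a grating translating at 1 pixel per frame, probed with five tuned velocities.
k = 0.8
t = np.arange(33) - 16
x = np.arange(33) - 16
tt, xx = np.meshgrid(t, x, indexing="ij")
video = np.cos(k * (xx - 1.0 * tt) + 0.3)
candidates = [-2.0, -1.0, 0.0, 1.0, 2.0]
energies = [oriented_energy(video, k, v, sigma=3.0) for v in candidates]
best = candidates[int(np.argmax(energies))]  # velocity whose filter responds most strongly
```

A full system would pool many such filters over orientations and scales, and, in the binocular setting the thesis describes, compare distributions across the two views.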

    Automatic Food Intake Assessment Using Camera Phones

    Obesity is becoming an epidemic in most developed countries. The fundamental cause of obesity and overweight is an energy imbalance between calories consumed and calories expended, so monitoring everyday food intake is essential for obesity prevention and management. Existing dietary assessment methods usually require manual recording and recall of food types and portions. The accuracy of the results therefore depends on many uncertain factors, such as the user's memory, food knowledge, and portion estimation, and is often compromised. Accurate and convenient dietary assessment methods are still lacking and are needed by both the general population and the research community. In this thesis, an automatic food intake assessment method using the cameras and inertial measurement units (IMUs) on smartphones was developed to help people foster a healthy lifestyle. With this method, users capture images or videos of the meal with their smartphones before and after eating. The smartphone then recognizes the food items, calculates the volume of food consumed, and reports the results to the user. The technical objective is to explore the feasibility of image-based food recognition and image-based volume estimation. This thesis comprises five publications that address four specific goals: (1) to develop a prototype system from existing methods in order to review them, identify their drawbacks, and explore the feasibility of developing novel methods; (2) building on the prototype system, to investigate new food classification methods that raise recognition accuracy to a level suitable for field application; (3) to design indexing methods for large-scale image databases to facilitate the development of new food image recognition and retrieval algorithms; and (4) to develop novel, convenient, and accurate food volume estimation methods using only smartphones with cameras and IMUs. A prototype system was implemented to review existing methods. 
An image feature detector and descriptor were developed, and a nearest-neighbor classifier was implemented to classify food items. A credit card marker method was introduced for metric-scale 3D reconstruction and volume calculation. To increase recognition accuracy, novel multi-view food recognition algorithms were developed to recognize regularly shaped food items. To further increase accuracy and make the algorithm applicable to arbitrary food items, new food features and new classifiers were designed. The efficiency of the algorithm was increased by developing a novel indexing method for large-scale image databases. Finally, the volume calculation was enhanced by reducing reliance on the marker and introducing IMUs. Sensor fusion techniques combining camera and IMU measurements were explored to infer the metric scale of the 3D model and to reduce sensor noise.
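The credit card marker works because a monocular 3D reconstruction is correct only up to an unknown scale, and an object of known size fixes that scale. A minimal sketch follows, assuming each reconstruction (before and after the meal) has its own scale, that the card's width can be measured in that model's units, and that the card is a standard ISO/IEC 7810 ID-1 card (85.60 mm wide). The function names are hypothetical, and the thesis's actual procedure may differ. Lengths scale linearly with the scale factor, so volumes scale with its cube.

```python
# Width of an ISO/IEC 7810 ID-1 card (the standard credit-card size), in millimetres.
CARD_WIDTH_MM = 85.60


def metric_volume_ml(model_volume, card_width_in_model_units):
    """Convert a volume measured in an up-to-scale model to millilitres.

    Hypothetical helper: the card's known width gives millimetres per model unit,
    and a volume picks up that factor cubed.
    """
    mm_per_unit = CARD_WIDTH_MM / card_width_in_model_units
    volume_mm3 = model_volume * mm_per_unit**3
    return volume_mm3 / 1000.0  # 1 mL = 1000 mm^3


def consumed_volume_ml(before_volume, before_card_width, after_volume, after_card_width):
    """Consumed volume from two independently scaled reconstructions (hypothetical helper)."""
    before_ml = metric_volume_ml(before_volume, before_card_width)
    after_ml = metric_volume_ml(after_volume, after_card_width)
    return before_ml - after_ml


# Usage: the card spans 2.0 units in a model whose food volume is 0.5 cubic units.
volume = metric_volume_ml(0.5, 2.0)
```

The cubic dependence is why scale accuracy matters so much here: a 5% error in the measured card width becomes roughly a 15% error in the estimated volume.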

    Precise and Robust Visual SLAM with Inertial Sensors and Deep Learning

    Endowing robots with the sense of perception stands out as the most important component for achieving fully autonomous machines. Once machines are able to perceive the world, they will be able to interact with it. In this regard, simultaneous localization and mapping (SLAM) comprises all the techniques that allow robots to estimate their position and reconstruct a map of their environment at the same time, using only their onboard sensors. SLAM is the key element of machine perception and is already present in technologies and applications such as autonomous driving, virtual and augmented reality, and service robots. Increasing the robustness of SLAM would expand its use and application, making machines safer and requiring less human intervention. In this thesis, we have combined inertial (IMU) and visual sensors to increase the robustness of SLAM against fast motions, brief occlusions, and low-texture environments. First, we proposed two fast techniques for initializing the inertial sensor with low scale error. These make it possible to start using the IMU as early as 2 seconds after the system is launched. One of these initializations has been integrated into a new visual-inertial SLAM system, named ORB-SLAM3, which represents the main contribution of this thesis. It is the most complete open-source visual-inertial SLAM system to date, working with monocular or stereo cameras, pinhole or fisheye lenses, and with multi-map capabilities. ORB-SLAM3 is based on a Maximum-a-Posteriori formulation, both in initialization and in refinement and visual-inertial bundle adjustment. It also exploits short-, mid-, and long-term data association. 
All of this makes ORB-SLAM3 the most accurate visual-inertial SLAM system, as our results on public benchmarks demonstrate. In addition, we have explored the application of deep learning techniques to improve the robustness of SLAM. In this respect, we first proposed DynaSLAM II, a stereo SLAM system for dynamic environments. Dynamic objects are segmented by a neural network, and their points and measurements are efficiently included in the bundle adjustment optimization. This makes it possible to estimate and track moving objects while also improving the estimate of the camera trajectory. Second, we developed a monocular direct SLAM system based on depth predictions from neural networks. We jointly optimize the depth-prediction residuals and the multi-view photometric residuals, which yields a monocular system capable of estimating scale. It does not suffer from scale drift, and it is more robust and several times more accurate than classical monocular systems.
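The abstract states that ORB-SLAM3 relies on a Maximum-a-Posteriori formulation. In the simplest linear-Gaussian setting, a MAP estimate weighs a prior against measurements by their precisions (inverse variances). The scalar sketch below illustrates only that principle. It is not ORB-SLAM3's inertial initialization, which works on full visual-inertial states rather than a single number, and the function name is hypothetical.

```python
def gaussian_map_estimate(prior_mean, prior_sigma, measurements, meas_sigma):
    """MAP estimate of a scalar under a Gaussian prior and i.i.d. Gaussian measurements.

    Hypothetical helper for illustration. It minimizes
        (x - prior_mean)^2 / prior_sigma^2 + sum_i (z_i - x)^2 / meas_sigma^2,
    which is twice the negative log-posterior up to an additive constant.
    The minimizer is the precision-weighted average of the prior mean and the data.
    Returns (estimate, posterior standard deviation).
    """
    prior_precision = 1.0 / prior_sigma**2
    meas_precision = 1.0 / meas_sigma**2
    numerator = prior_precision * prior_mean + meas_precision * sum(measurements)
    denominator = prior_precision + meas_precision * len(measurements)
    return numerator / denominator, (1.0 / denominator) ** 0.5


# Usage: a weak prior centred at 0 pulled toward four measurements of 2.0.
estimate, posterior_sigma = gaussian_map_estimate(0.0, 1.0, [2.0, 2.0, 2.0, 2.0], 1.0)
```

With no measurements the estimate falls back to the prior; as measurements accumulate, their combined precision dominates and the prior's influence shrinks. This is why a prior helps most early in initialization, when little data is available.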