
    Unsupervised Learning-based Depth Estimation aided Visual SLAM Approach

    The RGB-D camera has a limited working range and struggles to measure depth accurately at long distances. It is also easily affected by strong lighting and other external factors, which degrades the accuracy of the acquired environmental depth information. Recently, deep learning has achieved great success in visual SLAM: high-level features can be learned directly from visual inputs, improving the accuracy of depth estimation. Deep learning therefore has the potential to broaden the sources of depth information and improve the performance of SLAM systems. However, existing deep learning-based methods are mainly supervised and require large amounts of ground-truth depth data, which are hard to acquire under realistic constraints. In this paper, we first present an unsupervised learning framework that not only uses image reconstruction for supervision but also exploits pose estimation to strengthen the supervisory signal and add training constraints for monocular depth and camera motion estimation. Furthermore, we exploit this unsupervised framework to assist the traditional ORB-SLAM system when its initialization module cannot match enough features. Qualitative and quantitative experiments show that our unsupervised framework performs depth estimation comparably to supervised methods and outperforms the previous state-of-the-art approach by 13.5% on the KITTI dataset. Moreover, our framework significantly accelerates the initialization of ORB-SLAM and improves mapping accuracy in scenes with strong lighting and weak texture. Comment: 27 pages
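
    The image-reconstruction supervision described above can be illustrated with a minimal view-synthesis loss in the style of SfMLearner-type methods; the function names, tensor layouts, and the simple L1 photometric error are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def inverse_warp(source, depth, T, K):
    """Warp `source` into the target view using predicted depth and relative pose.
    depth: (B,1,H,W); T: (B,3,4) rigid transform target->source; K: (B,3,3) intrinsics."""
    B, _, H, W = depth.shape
    device = depth.device
    ys, xs = torch.meshgrid(torch.arange(H, device=device),
                            torch.arange(W, device=device), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).float()
    pix = pix.view(1, 3, -1).expand(B, -1, -1)                    # (B,3,HW) homogeneous pixels
    cam = torch.linalg.inv(K) @ pix * depth.view(B, 1, -1)        # back-project to 3D
    cam = torch.cat([cam, torch.ones(B, 1, H * W, device=device)], dim=1)
    proj = K @ (T @ cam)                                          # project into the source view
    uv = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)
    grid = torch.stack([2 * uv[:, 0] / (W - 1) - 1,               # normalize for grid_sample
                        2 * uv[:, 1] / (H - 1) - 1], dim=-1).view(B, H, W, 2)
    return F.grid_sample(source, grid, align_corners=True)

def photometric_loss(target, source, depth_net, pose_net, K):
    """Image reconstruction as the supervisory signal for depth and ego-motion."""
    depth = depth_net(target)
    T = pose_net(torch.cat([target, source], dim=1))              # (B,3,4) relative pose
    return (inverse_warp(source, depth, T, K) - target).abs().mean()
```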

    Dynamic Pose-Robust Facial Expression Recognition by Multi-View Pairwise Conditional Random Forests

    Automatic facial expression recognition (FER) from videos is a critical problem for the development of intelligent human-computer interaction systems. It remains challenging because it involves capturing high-dimensional spatio-temporal patterns that describe the variation of a person's appearance over time. Such representations are subject to great variability in facial morphology and environmental factors, as well as head pose variations. In this paper, we use Conditional Random Forests to capture low-level expression transition patterns. More specifically, heterogeneous derivative features (e.g. feature point movements or texture variations) are evaluated on pairs of images. When testing on a video frame, pairs are created between the current frame and previous ones, and the predictions for each previous frame are used to draw trees from Pairwise Conditional Random Forests (PCRF), whose pairwise outputs are averaged over time to produce robust estimates. Moreover, PCRF collections can also be conditioned on head pose estimates for multi-view dynamic FER. As such, our approach is a natural extension of Random Forests for learning spatio-temporal patterns, potentially from multiple viewpoints. Experiments on popular datasets show that our method leads to significant improvements over standard Random Forests as well as state-of-the-art approaches in several scenarios, including a novel multi-view video corpus generated from a publicly available database. Comment: Extension of an ICCV 2015 paper
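
    The pairwise testing scheme described above, conditioning trees on the label estimated for each earlier frame and averaging the pairwise outputs over a temporal window, might be sketched as follows; the forest interface, the `pairwise_features` extractor, and the window size are hypothetical placeholders rather than the authors' code.

```python
import numpy as np

def pcrf_predict(frames, t, prev_probs, forests, pairwise_features, n_classes, window=10):
    """Average pairwise predictions over previous frames (illustrative sketch).

    prev_probs[k] : class probabilities already estimated for frame k
    forests[c]    : forest trained on transitions starting from expression class c;
                    predict_proba maps pairwise derivative features to class probabilities
    """
    probs = np.zeros(n_classes)
    for dt in range(1, min(window, t) + 1):
        feats = pairwise_features(frames[t - dt], frames[t])
        # Trees are drawn from forests conditioned on the previous frame's label,
        # weighted by how likely each label was for that frame.
        probs += sum(prev_probs[t - dt][c] * forests[c].predict_proba(feats)
                     for c in range(n_classes))
    return probs / probs.sum()
```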

    Deep Auxiliary Learning for Visual Localization and Odometry

    Localization is an indispensable component of a robot's autonomy stack: it enables the robot to determine where it is in the environment and is therefore a precursor to any action execution or planning. Although convolutional neural networks have shown promising results for visual localization, they are still grossly outperformed by state-of-the-art local feature-based techniques. In this work, we propose VLocNet, a new convolutional neural network architecture for 6-DoF global pose regression and odometry estimation from consecutive monocular images. Our multitask model incorporates hard parameter sharing, making it compact, end-to-end trainable, and capable of real-time inference. We propose a novel loss function that uses auxiliary learning to leverage relative pose information during training, thereby constraining the search space and yielding consistent pose estimates. We evaluate VLocNet on indoor and outdoor datasets and show that even our single-task model exceeds the performance of state-of-the-art deep architectures for global localization, while achieving competitive performance for visual odometry estimation. Furthermore, extensive experimental evaluations of our proposed Geometric Consistency Loss show the effectiveness of multitask learning and demonstrate that our model is the first deep learning technique to be on par with, and in some cases outperform, state-of-the-art SIFT-based approaches. Comment: Accepted for ICRA 2018
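
    A minimal sketch of the auxiliary relative-pose idea behind such a geometric consistency term is given below: consecutive global pose predictions are constrained to agree with the ground-truth relative motion. The 7-D pose layout, the quaternion handling, and the weighting are illustrative assumptions rather than the paper's exact formulation.

```python
import torch

def quat_conjugate(q):
    # q = [w, x, y, z]; the conjugate inverts a unit quaternion.
    return torch.cat([q[:, :1], -q[:, 1:]], dim=1)

def quat_multiply(a, b):
    # Hamilton product of two batches of quaternions, shape (B, 4).
    w1, x1, y1, z1 = a.unbind(dim=1)
    w2, x2, y2, z2 = b.unbind(dim=1)
    return torch.stack([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                        w1*x2 + x1*w2 + y1*z2 - z1*y2,
                        w1*y2 - x1*z2 + y1*w2 + z1*x2,
                        w1*z2 + x1*y2 - y1*x2 + z1*w2], dim=1)

def geometric_consistency_loss(pose_t, pose_prev, rel_gt, alpha=1.0):
    """Auxiliary relative-pose constraint: the change between consecutive global
    pose predictions should agree with the ground-truth relative motion.
    Poses are (B, 7) = translation + unit quaternion [w, x, y, z]."""
    # Translation difference expressed in the global frame for brevity.
    rel_t_pred = pose_t[:, :3] - pose_prev[:, :3]
    rel_q_pred = quat_multiply(quat_conjugate(pose_prev[:, 3:]), pose_t[:, 3:])
    trans_err = torch.norm(rel_t_pred - rel_gt[:, :3], dim=1)
    rot_err = torch.norm(rel_q_pred - rel_gt[:, 3:], dim=1)
    return (trans_err + alpha * rot_err).mean()
```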

    The Intelligent ICU Pilot Study: Using Artificial Intelligence Technology for Autonomous Patient Monitoring

    Currently, many critical care indices are repetitively assessed and recorded by overburdened nurses, e.g. the physical function or facial pain expressions of nonverbal patients. In addition, much essential information about patients and their environment is not captured at all, or is captured only at a coarse granularity, e.g. sleep disturbance factors such as bright light, loud background noise, or excessive visitations. In this pilot study, we examined the feasibility of using pervasive sensing technology and artificial intelligence for autonomous and granular monitoring of critically ill patients and their environment in the Intensive Care Unit (ICU). As an exemplar prevalent condition, we also characterized delirious and non-delirious patients and their environments. We used wearable sensors, light and sound sensors, and a high-resolution camera to collect data on patients and their environment, and analyzed the collected data using deep learning and statistical analysis. Our system performed face detection, face recognition, facial action unit detection, head pose detection, facial expression recognition, posture recognition, actigraphy analysis, sound pressure and light level detection, and visitation frequency detection. We were able to detect patients' faces (mean average precision (mAP)=0.94), recognize patients' faces (mAP=0.80), and recognize their postures (F1=0.94). We also found that all facial expressions, 11 activity features, visitation frequency during the day, visitation frequency during the night, light levels, and sound pressure levels during the night were significantly different between delirious and non-delirious patients (p-value<0.05). In summary, we showed that granular and autonomous monitoring of critically ill patients and their environment is feasible and can be used to characterize critical care conditions and related environmental factors.
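
    The group comparison reported above (delirious vs. non-delirious, p < 0.05) can be sketched as a per-feature significance test; the choice of a Mann-Whitney U test, the feature name, and the synthetic values are assumptions for illustration only, not the study's actual analysis or data.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def compare_groups(delirious, control, alpha=0.05):
    """Per-feature comparison of delirious vs. non-delirious patients.
    Each argument maps a feature name to a 1-D array of per-patient values."""
    results = {}
    for name in delirious:
        stat, p = mannwhitneyu(delirious[name], control[name], alternative="two-sided")
        results[name] = {"U": float(stat), "p": float(p), "significant": p < alpha}
    return results

# Example with synthetic values (not study data):
rng = np.random.default_rng(0)
print(compare_groups({"night_sound_dB": rng.normal(60, 5, 20)},
                     {"night_sound_dB": rng.normal(50, 5, 20)}))
```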

    Physics-based Scene-level Reasoning for Object Pose Estimation in Clutter

    This paper focuses on vision-based pose estimation for multiple rigid objects placed in clutter, especially in cases involving occlusions and objects resting on each other. Progress has recently been achieved in object recognition thanks to advances in deep learning. Nevertheless, such tools typically require a large amount of training data and significant manual effort to label objects. This limits their applicability in robotics, where solutions must scale to a large number of objects and a variety of conditions. Moreover, the combinatorial nature of the scenes that can arise from the placement of multiple objects is hard to capture in a training dataset, so the learned models might not reach the level of precision required for tasks such as robotic manipulation. This work proposes an autonomous process for pose estimation that spans data generation, scene-level reasoning, and self-learning. In particular, the proposed framework first generates a labeled dataset for training a Convolutional Neural Network (CNN) for object detection in clutter. These detections are used to guide a scene-level optimization process, which considers the interactions between the different objects present in the clutter to output pose estimates of high precision. Furthermore, confident estimates are used to label real images from multiple views online and retrain the detector in a self-learning pipeline. Experimental results indicate that this process quickly identifies physically consistent object poses in cluttered scenes that are more precise than those found by reasoning over individual object instances, and that the quality of the pose estimates increases over time through self-learning. Comment: 18 pages, 13 figures, International Journal of Robotics Research (IJRR) 2019. arXiv admin note: text overlap with arXiv:1710.0857
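
    The detect / reason / relabel / retrain loop described above might look roughly like the following; every component interface (detector, scene-level optimizer, confidence score and threshold) is an illustrative placeholder rather than the authors' code.

```python
def self_learning_pipeline(detector, scene_optimizer, scenes,
                           confidence_threshold=0.9, rounds=3):
    """High-level sketch of the detect / reason / relabel / retrain loop."""
    labeled = []
    for _ in range(rounds):
        for views in scenes:                          # multiple camera views of one scene
            detections = [detector.detect(view) for view in views]
            # Scene-level reasoning: jointly refine poses so that objects rest on
            # supporting surfaces and do not interpenetrate.
            poses, confidence = scene_optimizer.refine(detections)
            if confidence >= confidence_threshold:
                # Confident scene-level estimates become labels for real images.
                labeled.extend(zip(views, poses))
        detector.retrain(labeled)                     # self-learning: retrain on new labels
    return detector
```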

    Mini-Unmanned Aerial Vehicle-Based Remote Sensing: Techniques, Applications, and Prospects

    The past few decades have witnessed great progress in unmanned aerial vehicles (UAVs) in civilian fields, especially in photogrammetry and remote sensing. In contrast to manned aircraft and satellite platforms, the UAV platform offers many promising characteristics: flexibility, efficiency, high spatial/temporal resolution, low cost, easy operation, etc., which make it an effective complement to other remote-sensing platforms and a cost-effective means for remote sensing. Considering the popularity and expansion of UAV-based remote sensing in recent years, this paper provides a systematic survey of recent advances and future prospects of UAVs in the remote-sensing community. Specifically, the main challenges and key technologies of UAV-based remote-sensing data processing are first discussed and summarized. Then, we provide an overview of the widespread applications of UAVs in remote sensing. Finally, some prospects for future work are discussed. We hope this paper will provide remote-sensing researchers with an overall picture of recent UAV-based remote-sensing developments and help guide further research on this topic.

    Multi-Expert Gender Classification on Age Group by Integrating Deep Neural Networks

    Generally, facial age variations significantly affect gender classification accuracy, because facial shape and skin texture change as people grow older. This calls for gender classification systems that take facial age information into account. In this paper, we propose Multi-expert Gender Classification on Age Group (MGA), an end-to-end multi-task learning scheme for age estimation and gender classification. First, two types of deep neural networks are utilized: a Convolutional Appearance Network (CAN) for facial appearance features and a Deep Geometry Network (DGN) for facial geometric features. Then, CAN and DGN are integrated by the proposed model integration strategy and fine-tuned to improve age and gender classification accuracy. Facial images are categorized into one of three age groups (young, adult, and elder) based on their estimated age, and the system makes a gender prediction according to an average-fusion strategy over three gender classification experts, each trained to fit the gender characteristics of one age group. Rigorous experiments conducted on challenging databases suggest that the proposed MGA outperforms several state-of-the-art methods at a lower computational cost. Comment: 12 pages
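
    One way to read the age-aware average-fusion rule described above is sketched below; the probability-weighted fusion of the three experts and the 0.5 decision threshold are interpretations for illustration, not the authors' exact rule.

```python
def predict_gender(face, age_group_probs, experts):
    """Sketch of age-aware average fusion of gender experts.

    age_group_probs : {"young": p, "adult": p, "elder": p} from the age estimator
    experts         : {"young": f, "adult": f, "elder": f}, each mapping a face
                      image to P(male)"""
    # Average fusion of the three group-specific experts, weighted by how likely
    # the face is to belong to each age group.
    p_male = sum(age_group_probs[g] * experts[g](face) for g in experts)
    return ("male" if p_male >= 0.5 else "female"), p_male
```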

    Toward Low-Flying Autonomous MAV Trail Navigation using Deep Neural Networks for Environmental Awareness

    We present a micro aerial vehicle (MAV) system, built with inexpensive off-the-shelf hardware, for autonomously following trails in unstructured, outdoor environments such as forests. The system introduces a deep neural network (DNN) called TrailNet for estimating the view orientation and lateral offset of the MAV with respect to the trail center. The DNN-based controller achieves stable flight without oscillations by avoiding overconfident behavior through a loss function that includes both label smoothing and an entropy reward. In addition to the TrailNet DNN, the system also utilizes vision modules for environmental awareness, including another DNN for object detection and a visual odometry component that estimates depth for low-level obstacle detection. All vision systems run in real time on board the MAV via a Jetson TX1. We describe the hardware, software, and implementation details, and present experiments showing that our system navigates forest trails more robustly than previous techniques, including autonomous flights of 1 km. Comment: 7 pages, 9 figures, IROS 2017 conference submission 1657; accompanying videos are posted on YouTube at: https://www.youtube.com/watch?v=H7Ym3DMSGms , https://www.youtube.com/watch?v=USYlt9t0lZ
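
    The two loss ingredients credited above for avoiding overconfident steering, label smoothing and an entropy reward, can be sketched as follows; the smoothing factor and entropy weight are illustrative values, not the paper's.

```python
import torch
import torch.nn.functional as F

def trailnet_style_loss(logits, target, smoothing=0.1, entropy_weight=0.1):
    """Cross-entropy with label smoothing plus an entropy reward.

    logits: (B, C) network outputs; target: (B,) class indices."""
    n_classes = logits.size(1)
    log_p = F.log_softmax(logits, dim=1)
    # Smoothed targets spread `smoothing` probability mass over the other classes.
    smooth_target = torch.full_like(log_p, smoothing / (n_classes - 1))
    smooth_target.scatter_(1, target.unsqueeze(1), 1.0 - smoothing)
    ce = -(smooth_target * log_p).sum(dim=1).mean()
    # Entropy reward: subtracting the entropy term encourages less
    # overconfident (higher-entropy) predictions.
    entropy = -(log_p.exp() * log_p).sum(dim=1).mean()
    return ce - entropy_weight * entropy
```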

    DeepLO: Geometry-Aware Deep LiDAR Odometry

    Recently, learning-based ego-motion estimation has drawn strong interest, mostly in studies focusing on visual perception. These groundbreaking works concentrate on unsupervised learning for odometry estimation, but mostly for visual sensors. In contrast, learning-based approaches using Light Detection and Ranging (LiDAR) have been reported in only a few studies, which most often propose a supervised learning framework. In this paper, we propose a novel approach to geometry-aware deep LiDAR odometry that is trainable in both supervised and unsupervised settings. We incorporate the Iterative Closest Point (ICP) algorithm into a deep-learning framework and show the reliability of the proposed pipeline. We provide two loss functions that allow switching between supervised and unsupervised learning depending on the validity of the ground truth during training. An evaluation on the KITTI and Oxford RobotCar datasets demonstrates the strong pose accuracy and efficiency of the proposed method. Comment: 8 pages
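
    A minimal sketch of switching between supervised and unsupervised objectives based on ground-truth validity, as described above, is given below; the specific terms (an L2 pose error and an ICP-style geometric residual) and the weighting are assumptions, not the paper's formulation.

```python
import torch

def switched_odometry_loss(pred_pose, gt_pose, gt_valid, icp_residual, w=1.0):
    """Switch between supervised and unsupervised terms per sample.

    pred_pose, gt_pose : (B, D) predicted and ground-truth pose vectors
    gt_valid           : (B,) boolean mask, True where ground truth exists
    icp_residual       : (B,) geometric alignment error from an ICP-style module"""
    supervised = torch.norm(pred_pose - gt_pose, dim=1)   # uses pose labels
    unsupervised = w * icp_residual                        # uses geometry only
    per_sample = torch.where(gt_valid, supervised, unsupervised)
    return per_sample.mean()
```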

    VLocNet++: Deep Multitask Learning for Semantic Visual Localization and Odometry

    Semantic understanding and localization are fundamental enablers of robot autonomy that have, for the most part, been tackled as disjoint problems. While deep learning has enabled recent breakthroughs across a wide spectrum of scene understanding tasks, its applicability to state estimation has been limited by a direct formulation that cannot encode scene-specific constraints. In this work, we propose the VLocNet++ architecture, which employs multitask learning to exploit the inter-task relationship between learning semantics, regressing the 6-DoF global pose, and estimating odometry, for the mutual benefit of all these tasks. Our network overcomes the aforementioned limitation by simultaneously embedding geometric and semantic knowledge of the world into the pose regression network. We propose a novel adaptive weighted fusion layer to aggregate motion-specific temporal information and to fuse semantic features into the localization stream based on region activations. Furthermore, we propose a self-supervised warping technique that uses the relative motion to warp intermediate network representations in the segmentation stream for learning consistent semantics. Finally, we introduce a first-of-its-kind urban outdoor localization dataset with pixel-level semantic labels and multiple loops for training deep networks. Extensive experiments on the challenging Microsoft 7-Scenes benchmark and our DeepLoc dataset demonstrate that our approach exceeds the state of the art, outperforming local feature-based methods while simultaneously performing multiple tasks and exhibiting substantial robustness in challenging scenarios. Comment: Demo and dataset available at http://deeploc.cs.uni-freiburg.d
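
    The adaptive weighted fusion idea, fusing semantic features into the localization stream with weights driven by region activations, might be sketched as the following module; this is an interpretation under assumed tensor shapes, not the published layer.

```python
import torch
import torch.nn as nn

class AdaptiveWeightedFusion(nn.Module):
    """Region-adaptive fusion of semantic features into the localization stream."""
    def __init__(self, channels):
        super().__init__()
        # A 1x1 convolution predicts a per-location gate from both streams.
        self.gate = nn.Sequential(nn.Conv2d(2 * channels, channels, kernel_size=1),
                                  nn.Sigmoid())
        self.project = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, loc_feat, sem_feat):
        # Gate values near 1 let semantic evidence dominate a region;
        # values near 0 preserve the localization features there.
        w = self.gate(torch.cat([loc_feat, sem_feat], dim=1))
        return self.project(w * sem_feat + (1.0 - w) * loc_feat)
```

    In this reading, the fusion weights are computed per spatial location, so regions with strong semantic activations contribute more semantic evidence to the pose regression stream.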