Unsupervised Learning-based Depth Estimation aided Visual SLAM Approach
RGB-D cameras have a limited working range and struggle to measure depth
accurately at long distances. Moreover, they are easily affected by strong
lighting and other external factors, which degrades the accuracy of the
acquired environmental depth information. Recently, deep learning
technologies, which can directly learn high-level features from visual inputs
and improve the accuracy of depth estimation, have achieved great success in
the visual SLAM area. Deep learning therefore has the potential to extend the
sources of depth information and improve the performance of SLAM systems.
However, existing deep learning-based methods are mainly supervised and
require large amounts of ground-truth depth data, which are hard to acquire
in practice. In this paper, we first present an
unsupervised learning framework, which not only uses image reconstruction for
supervision but also exploits pose estimation to strengthen the supervisory
signal and add training constraints for the tasks of monocular depth and
camera motion estimation. Furthermore, we successfully exploit our
unsupervised learning framework to assist the traditional ORB-SLAM system when
the initialization module of ORB-SLAM cannot match enough features.
Qualitative and quantitative experiments show that our unsupervised learning
framework performs depth estimation comparably to supervised methods and
outperforms the previous state-of-the-art approach on the KITTI dataset.
Moreover, our unsupervised learning framework significantly accelerates the
initialization process of the ORB-SLAM system and effectively improves the
accuracy of environmental mapping in strong-lighting and weak-texture scenes.
Comment: 27 pages
Dynamic Pose-Robust Facial Expression Recognition by Multi-View Pairwise Conditional Random Forests
Automatic facial expression recognition (FER) from videos is a critical
problem for the development of intelligent human-computer interaction systems.
Still, it is a challenging problem that involves capturing high-dimensional
spatio-temporal patterns describing the variation of one's appearance over
time. Such representations are subject to great variability in facial
morphology and environmental factors, as well as head pose. In this paper, we
use Conditional Random Forests to capture low-level expression transition
patterns. More specifically, heterogeneous derivative features (e.g. feature
point movements or texture variations) are computed over pairs of images. When
testing on a video frame, pairs are created between the current frame and
previous ones, and the predictions for each previous frame are used to draw trees
from Pairwise Conditional Random Forests (PCRF) whose pairwise outputs are
averaged over time to produce robust estimates. Moreover, PCRF collections can
also be conditioned on head pose estimation for multi-view dynamic FER. As
such, our approach appears as a natural extension of Random Forests for
learning spatio-temporal patterns, potentially from multiple viewpoints.
Experiments on popular datasets show that our method leads to significant
improvements over standard Random Forests as well as state-of-the-art
approaches on several scenarios, including a novel multi-view video corpus
generated from a publicly available database.
Comment: Extension of an ICCV 2015 paper
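As a rough illustration of the inference scheme described above, the sketch below draws trees from per-class conditional forests in proportion to the predictions for earlier frames, then averages the pairwise outputs over time. The `forests[c].predict_proba` interface and the multinomial tree sampling are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def pcrf_predict(pair_features, prev_probs, forests, n_trees=100, rng=None):
    """Hypothetical PCRF inference sketch.

    pair_features[k]: derivative features between frame t and previous frame t-k.
    prev_probs[k]:    class distribution previously predicted for frame t-k.
    forests[c]:       a pairwise forest trained on pairs whose first frame has
                      label c, exposing predict_proba(features) -> distribution.
    """
    rng = rng or np.random.default_rng()
    votes = np.zeros(len(forests))
    for feats, probs in zip(pair_features, prev_probs):
        # Draw trees from the conditional forests in proportion to the
        # class distribution of the earlier frame in the pair.
        counts = rng.multinomial(n_trees, probs)
        for c, n in enumerate(counts):
            if n:
                votes += n * forests[c].predict_proba(feats)
    # Averaging over all pairs and trees yields the robust temporal estimate.
    return votes / votes.sum()
```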
Deep Auxiliary Learning for Visual Localization and Odometry
Localization is an indispensable component of a robot's autonomy stack that
enables it to determine where it is in the environment, essentially making it a
precursor for any action execution or planning. Although convolutional neural
networks have shown promising results for visual localization, they are still
grossly outperformed by state-of-the-art local feature-based techniques. In
this work, we propose VLocNet, a new convolutional neural network architecture
for 6-DoF global pose regression and odometry estimation from consecutive
monocular images. Our multitask model incorporates hard parameter sharing, thus
being compact and enabling real-time inference, in addition to being end-to-end
trainable. We propose a novel loss function that utilizes auxiliary learning to
leverage relative pose information during training, thereby constraining the
search space to obtain consistent pose estimates. We evaluate our proposed
VLocNet on indoor as well as outdoor datasets and show that even our single
task model exceeds the performance of state-of-the-art deep architectures for
global localization, while achieving competitive performance for visual
odometry estimation. Furthermore, we present extensive experimental evaluations
utilizing our proposed Geometric Consistency Loss that show the effectiveness
of multitask learning and demonstrate that our model is the first deep learning
technique to be on par with, and in some cases outperform, state-of-the-art
SIFT-based approaches.
Comment: Accepted for ICRA 201
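A minimal sketch of the auxiliary-learning idea, assuming flat 7-D pose vectors (translation plus quaternion) and a simple subtraction-based relative pose rather than a proper SE(3) composition; the weighting and norms are assumptions, not the paper's Geometric Consistency Loss.

```python
import torch

def auxiliary_pose_loss(p_t, p_t1, gt_t, gt_rel, beta=1.0):
    """Global pose regression plus an auxiliary relative-pose term.

    p_t, p_t1: predicted poses for consecutive frames, shape (7,).
    gt_t:      ground-truth global pose for the current frame.
    gt_rel:    known relative motion between the two frames.
    The crude vector subtraction standing in for pose composition and the
    weight beta are illustrative assumptions.
    """
    global_loss = torch.norm(p_t - gt_t, p=1)
    pred_rel = p_t - p_t1  # stand-in for a proper SE(3) relative pose
    relative_loss = torch.norm(pred_rel - gt_rel, p=1)
    # The relative term constrains the search space so consecutive global
    # estimates stay consistent with the measured motion.
    return global_loss + beta * relative_loss
```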
The Intelligent ICU Pilot Study: Using Artificial Intelligence Technology for Autonomous Patient Monitoring
Currently, many critical care indices, e.g. the physical function or facial
pain expressions of nonverbal patients, are repetitively assessed and recorded
by overburdened nurses. In addition, much essential information on patients and
their environment is not captured at all, or is captured in a non-granular
manner, e.g. sleep disturbance factors such as bright light, loud background
noise, or excessive visitations. In this pilot study, we examined the
feasibility of using pervasive sensing technology and artificial intelligence
for autonomous and granular monitoring of critically ill patients and their
environment in the Intensive Care Unit (ICU). As an exemplar prevalent
condition, we also characterized delirious and non-delirious patients and their
environment. We used wearable sensors, light and sound sensors, and a
high-resolution camera to collect data on patients and their environment. We
analyzed collected data using deep learning and statistical analysis. Our
system performed face detection, face recognition, facial action unit
detection, head pose detection, facial expression recognition, posture
recognition, actigraphy analysis, sound pressure and light level detection, and
visitation frequency detection. We were able to detect patients' faces (mean
average precision (mAP) = 0.94), recognize patients' faces (mAP = 0.80), and
recognize their postures (F1 = 0.94). We also found that all facial expressions, 11 activity
features, visitation frequency during the day, visitation frequency during the
night, light levels, and sound pressure levels during the night were
significantly different between delirious and non-delirious patients
(p-value<0.05). In summary, we showed that granular and autonomous monitoring
of critically ill patients and their environment is feasible and can be used
for characterizing critical care conditions and related environmental factors.
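A small sketch of the kind of group comparison reported above; the abstract only states p-value < 0.05, so the specific choice of a Mann-Whitney U test here is an assumption.

```python
from scipy.stats import mannwhitneyu

def compare_groups(delirious_values, nondelirious_values, alpha=0.05):
    """Test whether a monitored feature (e.g. nighttime sound pressure
    level) differs between delirious and non-delirious patients.
    The nonparametric Mann-Whitney U test is an assumed choice."""
    stat, p = mannwhitneyu(delirious_values, nondelirious_values,
                           alternative="two-sided")
    return p, p < alpha  # p-value and whether the difference is significant
```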
Physics-based Scene-level Reasoning for Object Pose Estimation in Clutter
This paper focuses on vision-based pose estimation for multiple rigid objects
placed in clutter, especially in cases involving occlusions and objects resting
on each other. Progress has been achieved recently in object recognition given
advancements in deep learning. Nevertheless, such tools typically require a
large amount of training data and significant manual effort to label objects.
This limits their applicability in robotics, where solutions must scale to a
large number of objects and a variety of conditions. Moreover, the combinatorial
nature of the scenes that could arise from the placement of multiple objects is
hard to capture in the training dataset. Thus, the learned models might not
produce the desired level of precision required for tasks, such as robotic
manipulation. This work proposes an autonomous process for pose estimation that
spans from data generation to scene-level reasoning and self-learning. In
particular, the proposed framework first generates a labeled dataset for
training a Convolutional Neural Network (CNN) for object detection in clutter.
These detections are used to guide a scene-level optimization process, which
considers the interactions between the different objects present in the clutter
to output pose estimates of high precision. Furthermore, confident estimates
are used to label online real images from multiple views and re-train the
process in a self-learning pipeline. Experimental results indicate that this
process quickly identifies physically consistent object poses in cluttered
scenes, which are more precise than those found by reasoning over individual
object instances. Furthermore, the quality of the pose estimates
increases over time given the self-learning process.
Comment: 18 pages, 13 figures, International Journal of Robotics Research
(IJRR) 2019. arXiv admin note: text overlap with arXiv:1710.0857
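The overall pipeline might be outlined as below; all object names (`detector`, `scene_optimizer`) and the confidence threshold are hypothetical stand-ins for the components described in the abstract.

```python
def self_learning_loop(detector, scene_optimizer, camera_views, rounds=3,
                       confidence_threshold=0.9):
    """Hypothetical outline of the self-learning pose-estimation pipeline:
    detect objects, refine poses with physics-aware scene-level optimization,
    keep only confident estimates, use them to label every view of the
    scene, and retrain the detector on the grown label set."""
    labeled = []
    for _ in range(rounds):
        for views in camera_views:  # multiple camera views of one scene
            detections = [detector.detect(v) for v in views]
            # Scene-level reasoning considers object interactions jointly.
            poses, confidence = scene_optimizer.refine(detections)
            if confidence >= confidence_threshold:
                # Confident scene-level poses label all views of the scene.
                labeled += [(v, poses) for v in views]
        detector.retrain(labeled)
    return detector
```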
Mini-Unmanned Aerial Vehicle-Based Remote Sensing: Techniques, Applications, and Prospects
The past few decades have witnessed the great progress of unmanned aerial
vehicles (UAVs) in civilian fields, especially in photogrammetry and remote
sensing. In contrast with the platforms of manned aircraft and satellite, the
UAV platform holds many promising characteristics: flexibility, efficiency,
high-spatial/temporal resolution, low cost, easy operation, etc., which make it
an effective complement to other remote-sensing platforms and a cost-effective
means for remote sensing. Considering the popularity and expansion of UAV-based
remote sensing in recent years, this paper provides a systematic survey on the
recent advances and future prospects of UAVs in the remote-sensing
community. Specifically, the main challenges and key technologies of
UAV-based remote-sensing data processing are first discussed and summarized.
Then, we provide an overview of the widespread applications of UAVs in
remote sensing. Finally, some prospects for future work are discussed. We hope
this paper will provide remote-sensing researchers with an overall picture of
recent UAV-based remote-sensing developments and help guide further research
on this topic.
Multi-Expert Gender Classification on Age Group by Integrating Deep Neural Networks
Generally, facial age variations significantly affect gender classification
accuracy, because facial shape and skin texture change as people grow older.
This calls for re-examining gender classification systems to take facial age
information into account. In this paper, we propose Multi-expert Gender
Classification on Age Group (MGA), an end-to-end multi-task learning scheme
for age estimation and gender classification. First, two types of deep neural
networks are utilized: a Convolutional Appearance Network (CAN) for facial
appearance features and a Deep Geometry Network (DGN) for facial geometric
features. Then, CAN and DGN are integrated by the proposed model integration
strategy and fine-tuned in order to improve age and gender classification
accuracy. Facial images are categorized into one of three age groups
(young, adult, and elderly) based on their estimated age, and the system
makes a gender prediction according to an average fusion strategy over three
gender classification experts, each trained to fit the gender characteristics
of one age group. Rigorous experiments conducted on challenging databases
suggest that the proposed MGA outperforms several state-of-the-art methods at
a smaller computational cost.
Comment: 12 pages
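As an illustration of the expert-fusion decision rule, the sketch below soft-weights the three age-group experts by the estimated age-group posterior and averages their gender probabilities; the interfaces, class ordering, and the soft (rather than hard) group assignment are assumptions.

```python
import numpy as np

def mga_predict(face, age_estimator, experts):
    """Hypothetical MGA-style decision rule: weight each age-group expert's
    gender probabilities by the estimated age-group posterior, then fuse
    by averaging. `age_estimator` and `experts` are assumed models exposing
    predict_proba; class order [female, male] is an assumption."""
    group_probs = age_estimator.predict_proba(face)          # shape (3,)
    gender_probs = np.stack([e.predict_proba(face)           # shape (3, 2)
                             for e in experts])
    fused = (group_probs[:, None] * gender_probs).sum(axis=0)
    return "female" if fused[0] > fused[1] else "male"
```

A hard variant would instead pick the single expert matching the argmax age group; the soft weighting shown here simply degrades more gracefully near group boundaries.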
Toward Low-Flying Autonomous MAV Trail Navigation using Deep Neural Networks for Environmental Awareness
We present a micro aerial vehicle (MAV) system, built with inexpensive
off-the-shelf hardware, for autonomously following trails in unstructured,
outdoor environments such as forests. The system introduces a deep neural
network (DNN) called TrailNet for estimating the view orientation and lateral
offset of the MAV with respect to the trail center. The DNN-based controller
achieves stable flight without oscillations by avoiding overconfident behavior
through a loss function that includes both label smoothing and entropy reward.
In addition to the TrailNet DNN, the system also utilizes vision modules for
environmental awareness, including another DNN for object detection and a
visual odometry component for estimating depth for the purpose of low-level
obstacle detection. All vision systems run in real time on board the MAV via a
Jetson TX1. We provide details on the hardware and software used, as well as
implementation details. We present experiments showing the ability of our
system to navigate forest trails more robustly than previous techniques,
including autonomous flights of 1 km.
Comment: 7 pages, 9 figures, IROS 2017 conference submission 1657; accompanying
videos are posted on YouTube at: https://www.youtube.com/watch?v=H7Ym3DMSGms
, https://www.youtube.com/watch?v=USYlt9t0lZ
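The anti-overconfidence loss can be sketched as cross-entropy over label-smoothed targets minus an entropy reward; the specific smoothing and weight values here are assumptions, not the paper's tuned settings.

```python
import torch
import torch.nn.functional as F

def trailnet_style_loss(logits, target, smoothing=0.1, entropy_weight=0.01):
    """Classification loss with label smoothing plus an entropy reward,
    discouraging overconfident predictions (the TrailNet idea); the values
    of `smoothing` and `entropy_weight` are illustrative assumptions."""
    n = logits.size(-1)
    log_p = F.log_softmax(logits, dim=-1)
    # Build the smoothed target distribution: mass 1-s on the true class,
    # s spread uniformly over the remaining classes.
    smooth = torch.full_like(log_p, smoothing / (n - 1))
    smooth.scatter_(-1, target.unsqueeze(-1), 1.0 - smoothing)
    ce = -(smooth * log_p).sum(dim=-1).mean()
    entropy = -(log_p.exp() * log_p).sum(dim=-1).mean()
    return ce - entropy_weight * entropy  # subtracting rewards high entropy
```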
DeepLO: Geometry-Aware Deep LiDAR Odometry
Recently, learning-based ego-motion estimation approaches have drawn strong
interest, mostly in studies focusing on visual perception. These
groundbreaking works pursue unsupervised learning for odometry estimation,
but mostly for visual sensors. Compared to images, a learning-based approach using
Light Detection and Ranging (LiDAR) has been reported in a few studies where,
most often, a supervised learning framework is proposed. In this paper, we
propose a novel approach to geometry-aware deep LiDAR odometry trainable via
both supervised and unsupervised frameworks. We incorporate the Iterative
Closest Point (ICP) algorithm into a deep-learning framework and show the
reliability of the proposed pipeline. We provide two loss functions that allow
switching between supervised and unsupervised learning depending on the
ground-truth validity in the training phase. An evaluation on the KITTI and
Oxford RobotCar datasets demonstrates the strong performance and efficiency
of the proposed method in achieving accurate pose estimates.
Comment: 8 pages
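The supervised/unsupervised switching might look like the per-sample masking below; the names, the L1 pose loss, and treating the ICP residual as a precomputed tensor are assumptions.

```python
import torch

def deeplo_style_loss(pred_pose, gt_pose, gt_valid, icp_residual):
    """Hypothetical per-sample loss switching: regress against ground truth
    where it is valid, otherwise fall back to an ICP-style geometric
    residual (e.g. a mean point-to-plane error computed from the scans).

    pred_pose, gt_pose: (B, 6) pose tensors; gt_valid: (B,) bool mask;
    icp_residual: (B,) precomputed unsupervised residuals.
    """
    supervised = torch.norm(pred_pose - gt_pose, p=1, dim=-1)
    mask = gt_valid.float()
    # Each sample contributes exactly one of the two loss terms.
    return (mask * supervised + (1.0 - mask) * icp_residual).mean()
```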
VLocNet++: Deep Multitask Learning for Semantic Visual Localization and Odometry
Semantic understanding and localization are fundamental enablers of robot
autonomy that have for the most part been tackled as disjoint problems. While
deep learning has enabled recent breakthroughs across a wide spectrum of scene
understanding tasks, its applicability to state estimation tasks has been
limited due to the direct formulation that renders it incapable of encoding
scene-specific constraints. In this work, we propose the VLocNet++ architecture
that employs a multitask learning approach to exploit the inter-task
relationship between learning semantics, regressing the 6-DoF global pose, and
estimating odometry, for the mutual benefit of each of these tasks. Our network overcomes
the aforementioned limitation by simultaneously embedding geometric and
semantic knowledge of the world into the pose regression network. We propose a
novel adaptive weighted fusion layer to aggregate motion-specific temporal
information and to fuse semantic features into the localization stream based on
region activations. Furthermore, we propose a self-supervised warping technique
that uses the relative motion to warp intermediate network representations in
the segmentation stream for learning consistent semantics. Finally, we
introduce a first-of-a-kind urban outdoor localization dataset with pixel-level
semantic labels and multiple loops for training deep networks. Extensive
experiments on the challenging Microsoft 7-Scenes benchmark and our DeepLoc
dataset demonstrate that our approach exceeds the state of the art,
outperforming local feature-based methods while simultaneously performing
multiple tasks and exhibiting substantial robustness in challenging scenarios.
Comment: Demo and dataset available at http://deeploc.cs.uni-freiburg.d
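One plausible reading of an adaptive weighted fusion layer is a learned, per-location gating between the semantic and localization streams, sketched below; the 1x1-convolution gating is an assumption, not the paper's exact layer.

```python
import torch
import torch.nn as nn

class AdaptiveWeightedFusion(nn.Module):
    """Hypothetical fusion layer: per-pixel, per-channel weights are
    predicted from the concatenated feature maps and used to mix the
    semantic stream into the localization stream based on region
    activations. An illustrative sketch, not the paper's exact design."""

    def __init__(self, channels):
        super().__init__()
        self.weight_net = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),  # weights in (0, 1) at every location
        )

    def forward(self, loc_feat, sem_feat):
        # Predict fusion weights from both streams, then blend them.
        w = self.weight_net(torch.cat([loc_feat, sem_feat], dim=1))
        return w * sem_feat + (1.0 - w) * loc_feat
```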