Co-Fusion: Real-time Segmentation, Tracking and Fusion of Multiple Objects
In this paper we introduce Co-Fusion, a dense SLAM system that takes a live
stream of RGB-D images as input and segments the scene into different objects
(using either motion or semantic cues) while simultaneously tracking and
reconstructing their 3D shape in real time. We use a multiple model fitting
approach where each object can move independently from the background and still
be effectively tracked and its shape fused over time using only the information
from pixels associated with that object label. Previous attempts to deal with
dynamic scenes have typically considered moving regions as outliers, and
consequently do not model their shape or track their motion over time. In
contrast, we enable the robot to maintain 3D models for each of the segmented
objects and to improve them over time through fusion. As a result, our system
can enable a robot to maintain a scene description at the object level which
has the potential to allow interactions with its working environment, even in
the case of dynamic scenes.
Comment: International Conference on Robotics and Automation (ICRA) 2017,
http://visual.cs.ucl.ac.uk/pubs/cofusion,
https://github.com/martinruenz/co-fusion
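A minimal sketch of the per-object fusion idea, under a strong simplification: every pixel carries an object label, and each labelled object maintains its own model that is updated only from pixels with that label. The per-object "model" here is a running weighted average of depth values, whereas the real system fuses full 3D models per object; the names (ObjectModel, fuse_frame) are illustrative, not from the paper's code.

```python
import numpy as np

class ObjectModel:
    """Simplified per-object model: a confidence-weighted depth average."""
    def __init__(self, shape):
        self.depth = np.zeros(shape, dtype=np.float64)   # fused depth estimate
        self.weight = np.zeros(shape, dtype=np.float64)  # per-pixel confidence

    def fuse(self, depth_frame, mask):
        # Weighted running average over pixels assigned to this object only.
        w_new = mask.astype(np.float64)
        total = self.weight + w_new
        valid = total > 0
        self.depth[valid] = (self.depth[valid] * self.weight[valid]
                             + depth_frame[valid] * w_new[valid]) / total[valid]
        self.weight = total

def fuse_frame(models, depth_frame, labels):
    """Route each pixel to the model of its segmentation label."""
    for label in np.unique(labels):
        models.setdefault(label, ObjectModel(depth_frame.shape))
        models[label].fuse(depth_frame, labels == label)

# Usage: label 0 is the background model, labels 1..N are moving objects.
models = {}
depth = np.random.rand(120, 160)
labels = np.zeros((120, 160), dtype=np.int32)
labels[40:80, 60:100] = 1  # a segmented, independently moving object
fuse_frame(models, depth, labels)
```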
Multi-View Deep Learning for Consistent Semantic Mapping with RGB-D Cameras
Visual scene understanding is an important capability that enables robots to
purposefully act in their environment. In this paper, we propose a novel
approach to object-class segmentation from multiple RGB-D views using deep
learning. We train a deep neural network to predict object-class semantics that
is consistent from several viewpoints in a semi-supervised way. At test time,
the semantics predictions of our network can be fused more consistently in
semantic keyframe maps than predictions of a network trained on individual
views. We base our network architecture on a recent single-view deep learning
approach to RGB and depth fusion for semantic object-class segmentation and
enhance it with multi-scale loss minimization. We obtain the camera trajectory
using RGB-D SLAM and warp the predictions of RGB-D images into ground-truth
annotated frames in order to enforce multi-view consistency during training. At
test time, predictions from multiple views are fused into keyframes. We propose
and analyze several methods for enforcing multi-view consistency during
training and testing. We evaluate the benefit of multi-view consistency
training and demonstrate that pooling of deep features and fusion over multiple
views outperforms single-view baselines on the NYUDv2 benchmark for semantic
segmentation. Our end-to-end trained network achieves state-of-the-art
performance on the NYUDv2 dataset in single-view segmentation as well as
multi-view semantic fusion.
Comment: 2017 IEEE/RSJ International Conference on Intelligent Robots and
Systems (IROS 2017)
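A hedged sketch of the multi-view fusion step, assuming per-view class scores are expressed as log-probabilities and warped into keyframe coordinates using the SLAM poses; summing log-probabilities corresponds to a Bayesian update under an independence assumption across views. The warp functions are stand-ins for the pose-based warping, not the paper's code.

```python
import numpy as np

def fuse_into_keyframe(view_logprobs, warps):
    """
    view_logprobs: list of (H, W, C) per-pixel class log-probabilities.
    warps: list of functions mapping a view's prediction into keyframe
           coordinates (stand-ins for pose-based warping).
    """
    fused = None
    for logp, warp in zip(view_logprobs, warps):
        warped = warp(logp)
        fused = warped if fused is None else fused + warped
    return np.argmax(fused, axis=-1)  # per-pixel fused class labels

# Usage with three already-aligned views (identity warps):
H, W, C = 48, 64, 5
views = [np.log(np.random.dirichlet(np.ones(C), size=(H, W))) for _ in range(3)]
identity = lambda x: x  # placeholder warp
labels = fuse_into_keyframe(views, [identity] * 3)
```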
EchoFusion: Tracking and Reconstruction of Objects in 4D Freehand Ultrasound Imaging without External Trackers
Ultrasound (US) is the most widely used fetal imaging technique. However, US
images have limited capture range, and suffer from view dependent artefacts
such as acoustic shadows. Compounding of overlapping 3D US acquisitions into a
high-resolution volume can extend the field of view and remove image artefacts,
which is useful for retrospective analysis including population based studies.
However, such volume reconstructions require information about relative
transformations between probe positions from which the individual volumes were
acquired. In prenatal US scans, the fetus can move independently of the
mother, so external tracking systems such as electromagnetic or optical
trackers cannot recover the relative motion between the probe and the moving
fetus. We
provide a novel methodology for image-based tracking and volume reconstruction
by combining recent advances in deep learning and simultaneous localisation and
mapping (SLAM). Tracking semantics are established through the use of a
Residual 3D U-Net and the output is fed to the SLAM algorithm. As a proof of
concept, experiments are conducted on US volumes taken from a whole-body fetal
phantom, and from the heads of real fetuses. For the fetal head segmentation,
we also introduce a novel weak annotation approach to minimise the required
manual effort for ground truth annotation. We evaluate our method
qualitatively, and quantitatively with respect to tissue discrimination
accuracy and tracking robustness.
Comment: MICCAI Workshop on Perinatal, Preterm and Paediatric Image Analysis
(PIPPI), 201
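A schematic sketch of the tracking-and-compounding loop, heavily simplified from the paper: a segmentation isolates the structure of interest in each 3D US volume, inter-volume motion is estimated, and the aligned volumes are averaged into a compounded volume. Reducing motion to the integer translation between segmentation centroids is my simplification; the actual system feeds the Residual 3D U-Net output to a SLAM back end that estimates full poses.

```python
import numpy as np

def centroid(mask):
    """Centre of mass (z, y, x) of a non-empty binary segmentation."""
    return np.array(np.nonzero(mask)).mean(axis=1)

def track_and_compound(volumes, masks):
    ref_c = centroid(masks[0])
    acc = np.zeros(volumes[0].shape, dtype=np.float64)
    count = np.zeros(volumes[0].shape, dtype=np.float64)
    for vol, mask in zip(volumes, masks):
        # Estimate inter-volume motion as the integer shift between centroids.
        shift = tuple(int(s) for s in np.round(ref_c - centroid(mask)))
        aligned = np.roll(vol, shift, axis=(0, 1, 2))
        aligned_m = np.roll(mask, shift, axis=(0, 1, 2)).astype(np.float64)
        acc += aligned * aligned_m       # accumulate intensities inside the mask
        count += aligned_m
    return acc / np.maximum(count, 1.0)  # averaged (compounded) volume

# Usage: two volumes of the same structure, shifted by 2 voxels along z.
vols = [np.random.rand(32, 32, 32) for _ in range(2)]
masks = [np.zeros((32, 32, 32), bool) for _ in range(2)]
masks[0][10:20, 10:20, 10:20] = True
masks[1][12:22, 10:20, 10:20] = True
fused = track_and_compound(vols, masks)
```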
WPU-Net: Boundary Learning by Using Weighted Propagation in Convolution Network
Deep learning has driven great progress in natural and biological image
processing. In materials science and engineering, however, microscopic images
often contain flaws and ambiguities, introduced by complex sample preparation
or even by the material itself, that hinder the detection of target objects.
In this work, we propose WPU-Net, which redesigns the architecture and
weighted loss of U-Net, forcing the network to integrate information from
adjacent slices and to pay more attention to topology in the boundary
detection task. We then apply WPU-Net to a typical materials problem: grain
boundary detection in polycrystalline materials. Experiments demonstrate that
the proposed method achieves promising performance and outperforms
state-of-the-art methods. In addition, we propose a new method for object
tracking between adjacent slices, which can effectively reconstruct the 3D
structure of the whole material. Finally, we present a material microscopic
image dataset with the goal of advancing the state of the art in image
processing for materials science.
Comment: technical report
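A hedged sketch of a boundary-weighted loss in the spirit of WPU-Net; the exact weighting scheme is the paper's, and this stand-in reuses the classic U-Net distance-based weight map: pixels close to a grain boundary receive larger weights, so errors on thin boundaries dominate the loss instead of being averaged away.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def boundary_weighted_bce(prob, target, w0=10.0, sigma=3.0):
    """prob/target: (H, W) predicted boundary probability and binary mask."""
    # Distance of each pixel to the nearest boundary (target == 1) pixel.
    dist = distance_transform_edt(target == 0)
    # Up-weight pixels near boundaries; far pixels keep weight ~1.
    weight = 1.0 + w0 * np.exp(-(dist ** 2) / (2 * sigma ** 2))
    eps = 1e-7
    bce = -(target * np.log(prob + eps)
            + (1 - target) * np.log(1 - prob + eps))
    return float((weight * bce).mean())

# Usage on a synthetic thin vertical boundary:
target = np.zeros((64, 64))
target[:, 32] = 1
prob = np.clip(target + 0.1 * np.random.rand(64, 64), 0, 1)
print(boundary_weighted_bce(prob, target))
```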
Human mobility monitoring in very low resolution visual sensor network
This paper proposes an automated system for monitoring mobility patterns using a network of very low resolution visual sensors (30×30 pixels). The use of very low resolution sensors reduces privacy concerns, cost, computational requirements, and power consumption. The core of our proposed system is a robust people tracker that uses the low resolution videos provided by the visual sensor network. The distributed processing architecture of our tracking system allows all image processing tasks to be done on the digital signal controller in each visual sensor. In this paper, we experimentally show that reliable tracking of people is possible using very low resolution imagery. We also compare the performance of our tracker against a state-of-the-art tracking method and show that our method outperforms it. Moreover, mobility statistics of tracks, such as the total distance traveled and the average speed derived from trajectories, are compared with those derived from the ground truth given by Ultra-Wide Band sensors. The results of this comparison show that the trajectories from our system are accurate enough to obtain useful mobility statistics.
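A minimal sketch of how the mobility statistics named above (total distance traveled, average speed) can be derived from a trajectory; the trajectory format, timestamped 2D positions in metres, is an assumption for illustration, not the paper's data structure.

```python
import numpy as np

def mobility_stats(track):
    """track: array of (t_seconds, x_m, y_m) rows, time-ordered."""
    t, xy = track[:, 0], track[:, 1:]
    steps = np.linalg.norm(np.diff(xy, axis=0), axis=1)  # per-step distances
    total_distance = steps.sum()
    duration = t[-1] - t[0]
    avg_speed = total_distance / duration if duration > 0 else 0.0
    return total_distance, avg_speed

track = np.array([[0.0, 0.0, 0.0],
                  [1.0, 0.5, 0.0],
                  [2.0, 0.5, 0.7]])
print(mobility_stats(track))  # (1.2 m, 0.6 m/s)
```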
Box-level Segmentation Supervised Deep Neural Networks for Accurate and Real-time Multispectral Pedestrian Detection
Effective fusion of complementary information captured by multi-modal sensors
(visible and infrared cameras) enables robust pedestrian detection under
various surveillance situations (e.g. daytime and nighttime). In this paper, we
present a novel box-level segmentation supervised learning framework for
accurate and real-time multispectral pedestrian detection by incorporating
features extracted in visible and infrared channels. Specifically, our method
takes pairs of aligned visible and infrared images with easily obtained
bounding box annotations as input and estimates accurate prediction maps to
highlight the existence of pedestrians. It offers two major advantages over the
existing anchor-box-based multispectral detection methods. Firstly, it
avoids the hyperparameter tuning problem that arises when training
anchor-box-based detectors and obtains more accurate detection results,
especially for small and occluded pedestrian instances. Secondly, it can
generate accurate detection results from small input images, improving
computational efficiency for real-time autonomous driving applications.
Experimental results on the KAIST multispectral dataset show that our
proposed method outperforms state-of-the-art approaches in terms of both
accuracy and speed.
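A hedged sketch of what box-level segmentation supervision can look like: the easily obtained bounding-box annotations are rasterised into a per-pixel target map that the network's prediction map is trained against. Filling entire box interiors with 1 is a simplification for illustration, not necessarily the paper's exact labelling scheme.

```python
import numpy as np

def boxes_to_target(h, w, boxes):
    """boxes: list of (x1, y1, x2, y2) pedestrian boxes in pixel coords."""
    target = np.zeros((h, w), dtype=np.float32)
    for x1, y1, x2, y2 in boxes:
        target[y1:y2, x1:x2] = 1.0  # mark box interiors as pedestrian
    return target

# Usage: two annotated pedestrians in a 120x160 frame.
target = boxes_to_target(120, 160, [(30, 40, 50, 90), (100, 20, 115, 70)])
```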