84,908 research outputs found
Real-Time Human Motion Capture with Multiple Depth Cameras
Commonly used human motion capture systems require intrusive attachment of
markers that are visually tracked with multiple cameras. In this work we
present an efficient and inexpensive solution to markerless motion capture
using only a few Kinect sensors. Unlike the previous work on 3d pose estimation
using a single depth camera, we relax constraints on the camera location and do
not assume a co-operative user. We apply recent image segmentation techniques
to depth images and use curriculum learning to train our system on purely
synthetic data. Our method accurately localizes body parts without requiring an
explicit shape model. The body joint locations are then recovered by combining
evidence from multiple views in real-time. We also introduce a dataset of ~6
million synthetic depth frames for pose estimation from multiple cameras and
exceed state-of-the-art results on the Berkeley MHAD dataset.Comment: Accepted to computer robot vision 201
Sensor Fusion of Structure-from-Motion, Bathymetric 3D, and Beacon-Based Navigation Modalities
This paper describes an approach for the fusion of 30
data underwater obtained from multiple sensing modalities.
In particular, we examine the combination of imagebased
Structure-From-Motion (SFM) data with bathymetric
data obtained using pencil-beam underwater sonar, in
order to recover the shape of the seabed terrain. We also
combine image-based egomotion estimation with acousticbased
and inertial navigation data on board the underwater
vehicle.
We examine multiple types of fusion. When fusion is
pe?$ormed at the data level, each modality is used to extract
30 information independently. The 30 representations
are then aligned and compared. In this case, we use
the bathymetric data as ground truth to measure the accuracy
and drijl of the SFM approach. Similarly we use
the navigation data as ground truth against which we measure
the accuracy of the image-based ego-motion estimation.
To our knowledge, this is the frst quantitative evaluation
of image-based SFM and egomotion accuracy in a
large-scale outdoor environment.
Fusion at the signal level uses the raw signals from multiple
sensors to produce a single coherent 30 representation
which takes optimal advantage of the sensors' complementary
strengths. In this papel; we examine how lowresolution
bathymetric data can be used to seed the higherresolution
SFM algorithm, improving convergence rates,
and reducing drift error. Similarly, acoustic-based and inertial
navigation data improves the convergence and driji
properties of egomotion estimation.Peer Reviewedhttp://deepblue.lib.umich.edu/bitstream/2027.42/86044/1/hsingh-35.pd
Extrinsic Calibration and Ego-Motion Estimation for Mobile Multi-Sensor Systems
Autonomous robots and vehicles are often equipped with multiple sensors to perform vital tasks such as localization or mapping. The joint system of various sensors with different sensing modalities can often provide better localization or mapping results than individual sensor alone in terms of accuracy or completeness. However, to enable improved performance, two important challenges have to be addressed when dealing with multi-sensor systems. Firstly, how to accurately determine the spatial relationship between individual sensor on the robot? This is a vital task known as extrinsic calibration. Without this calibration information, measurements from different sensors cannot be fused. Secondly, how to combine data from multiple sensors to correct for the deficiencies of each sensor, and thus, provides better estimations? This is another important task known as data fusion. The core of this thesis is to provide answers to these two questions. We cover, in the first part of the thesis, aspects related to improving the extrinsic calibration accuracy, and present, in the second part, novel data fusion algorithms designed to address the ego-motion estimation problem using data from a laser scanner and a monocular camera. In the extrinsic calibration part, we contribute by revealing and quantifying the relative calibration accuracies of three common types of calibration methods, so as to offer an insight into choosing the best calibration method when multiple options are available. Following that, we propose an optimization approach for solving common motion-based calibration problems. By exploiting the Gauss-Helmert model, our approach is more accurate and robust than classical least squares model. In the data fusion part, we focus on camera-laser data fusion and contribute with two new ego-motion estimation algorithms that combine complementary information from a laser scanner and a monocular camera. The first algorithm utilizes camera image information to guide the laser scan-matching. It can provide accurate motion estimates and yet can work in general conditions without requiring a field-of-view overlap between the camera and laser scanner, nor an initial guess of the motion parameters. The second algorithm combines the camera and the laser scanner information in a direct way, assuming the field-of-view overlap between the sensors is substantial. By maximizing the information usage of both the sparse laser point cloud and the dense image, the second algorithm is able to achieve state-of-the-art estimation accuracy. Experimental results confirm that both algorithms offer excellent alternatives to state-of-the-art camera-laser ego-motion estimation algorithms
Event-based Vision: A Survey
Event cameras are bio-inspired sensors that differ from conventional frame
cameras: Instead of capturing images at a fixed rate, they asynchronously
measure per-pixel brightness changes, and output a stream of events that encode
the time, location and sign of the brightness changes. Event cameras offer
attractive properties compared to traditional cameras: high temporal resolution
(in the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low
power consumption, and high pixel bandwidth (on the order of kHz) resulting in
reduced motion blur. Hence, event cameras have a large potential for robotics
and computer vision in challenging scenarios for traditional cameras, such as
low-latency, high speed, and high dynamic range. However, novel methods are
required to process the unconventional output of these sensors in order to
unlock their potential. This paper provides a comprehensive overview of the
emerging field of event-based vision, with a focus on the applications and the
algorithms developed to unlock the outstanding properties of event cameras. We
present event cameras from their working principle, the actual sensors that are
available and the tasks that they have been used for, from low-level vision
(feature detection and tracking, optic flow, etc.) to high-level vision
(reconstruction, segmentation, recognition). We also discuss the techniques
developed to process events, including learning-based techniques, as well as
specialized processors for these novel sensors, such as spiking neural
networks. Additionally, we highlight the challenges that remain to be tackled
and the opportunities that lie ahead in the search for a more efficient,
bio-inspired way for machines to perceive and interact with the world
Shift Estimation Algorithm for Dynamic Sensors With Frame-to-Frame Variation in Their Spectral Response
This study is motivated by the emergence of a new class of tunable infrared spectral-imaging sensors that offer the ability to dynamically vary the sensor\u27s intrinsic spectral response from frame to frame in an electronically controlled fashion. A manifestation of this is when a sequence of dissimilar spectral responses is periodically realized, whereby in every period of acquired imagery, each frame is associated with a distinct spectral band. Traditional scene-based global shift estimation algorithms are not applicable to such spectrally heterogeneous video sequences, as a pixel value may change from frame to frame as a result of both global motion and varying spectral response. In this paper, a novel algorithm is proposed and examined to fuse a series of coarse global shift estimates between periodically sampled pairs of nonadjacent frames to estimate motion between consecutive frames; each pair corresponds to two nonadjacent frames of the same spectral band. The proposed algorithm outperforms three alternative methods, with the average error being one half of that obtained by using an equal weights version of the proposed algorithm, one-fourth of that obtained by using a simple linear interpolation method, and one-twentieth of that obtained by using a naiÂżve correlation-based direct method
Temporal shape super-resolution by intra-frame motion encoding using high-fps structured light
One of the solutions of depth imaging of moving scene is to project a static
pattern on the object and use just a single image for reconstruction. However,
if the motion of the object is too fast with respect to the exposure time of
the image sensor, patterns on the captured image are blurred and reconstruction
fails. In this paper, we impose multiple projection patterns into each single
captured image to realize temporal super resolution of the depth image
sequences. With our method, multiple patterns are projected onto the object
with higher fps than possible with a camera. In this case, the observed pattern
varies depending on the depth and motion of the object, so we can extract
temporal information of the scene from each single image. The decoding process
is realized using a learning-based approach where no geometric calibration is
needed. Experiments confirm the effectiveness of our method where sequential
shapes are reconstructed from a single image. Both quantitative evaluations and
comparisons with recent techniques were also conducted.Comment: 9 pages, Published at the International Conference on Computer Vision
(ICCV 2017
Real-time marker-less multi-person 3D pose estimation in RGB-Depth camera networks
This paper proposes a novel system to estimate and track the 3D poses of
multiple persons in calibrated RGB-Depth camera networks. The multi-view 3D
pose of each person is computed by a central node which receives the
single-view outcomes from each camera of the network. Each single-view outcome
is computed by using a CNN for 2D pose estimation and extending the resulting
skeletons to 3D by means of the sensor depth. The proposed system is
marker-less, multi-person, independent of background and does not make any
assumption on people appearance and initial pose. The system provides real-time
outcomes, thus being perfectly suited for applications requiring user
interaction. Experimental results show the effectiveness of this work with
respect to a baseline multi-view approach in different scenarios. To foster
research and applications based on this work, we released the source code in
OpenPTrack, an open source project for RGB-D people tracking.Comment: Submitted to the 2018 IEEE International Conference on Robotics and
Automatio
- …