84,908 research outputs found

    Real-Time Human Motion Capture with Multiple Depth Cameras

    Full text link
    Commonly used human motion capture systems require intrusive attachment of markers that are visually tracked with multiple cameras. In this work we present an efficient and inexpensive solution to markerless motion capture using only a few Kinect sensors. Unlike the previous work on 3d pose estimation using a single depth camera, we relax constraints on the camera location and do not assume a co-operative user. We apply recent image segmentation techniques to depth images and use curriculum learning to train our system on purely synthetic data. Our method accurately localizes body parts without requiring an explicit shape model. The body joint locations are then recovered by combining evidence from multiple views in real-time. We also introduce a dataset of ~6 million synthetic depth frames for pose estimation from multiple cameras and exceed state-of-the-art results on the Berkeley MHAD dataset.Comment: Accepted to computer robot vision 201

    Sensor Fusion of Structure-from-Motion, Bathymetric 3D, and Beacon-Based Navigation Modalities

    Full text link
    This paper describes an approach for the fusion of 30 data underwater obtained from multiple sensing modalities. In particular, we examine the combination of imagebased Structure-From-Motion (SFM) data with bathymetric data obtained using pencil-beam underwater sonar, in order to recover the shape of the seabed terrain. We also combine image-based egomotion estimation with acousticbased and inertial navigation data on board the underwater vehicle. We examine multiple types of fusion. When fusion is pe?$ormed at the data level, each modality is used to extract 30 information independently. The 30 representations are then aligned and compared. In this case, we use the bathymetric data as ground truth to measure the accuracy and drijl of the SFM approach. Similarly we use the navigation data as ground truth against which we measure the accuracy of the image-based ego-motion estimation. To our knowledge, this is the frst quantitative evaluation of image-based SFM and egomotion accuracy in a large-scale outdoor environment. Fusion at the signal level uses the raw signals from multiple sensors to produce a single coherent 30 representation which takes optimal advantage of the sensors' complementary strengths. In this papel; we examine how lowresolution bathymetric data can be used to seed the higherresolution SFM algorithm, improving convergence rates, and reducing drift error. Similarly, acoustic-based and inertial navigation data improves the convergence and driji properties of egomotion estimation.Peer Reviewedhttp://deepblue.lib.umich.edu/bitstream/2027.42/86044/1/hsingh-35.pd

    Extrinsic Calibration and Ego-Motion Estimation for Mobile Multi-Sensor Systems

    Get PDF
    Autonomous robots and vehicles are often equipped with multiple sensors to perform vital tasks such as localization or mapping. The joint system of various sensors with different sensing modalities can often provide better localization or mapping results than individual sensor alone in terms of accuracy or completeness. However, to enable improved performance, two important challenges have to be addressed when dealing with multi-sensor systems. Firstly, how to accurately determine the spatial relationship between individual sensor on the robot? This is a vital task known as extrinsic calibration. Without this calibration information, measurements from different sensors cannot be fused. Secondly, how to combine data from multiple sensors to correct for the deficiencies of each sensor, and thus, provides better estimations? This is another important task known as data fusion. The core of this thesis is to provide answers to these two questions. We cover, in the first part of the thesis, aspects related to improving the extrinsic calibration accuracy, and present, in the second part, novel data fusion algorithms designed to address the ego-motion estimation problem using data from a laser scanner and a monocular camera. In the extrinsic calibration part, we contribute by revealing and quantifying the relative calibration accuracies of three common types of calibration methods, so as to offer an insight into choosing the best calibration method when multiple options are available. Following that, we propose an optimization approach for solving common motion-based calibration problems. By exploiting the Gauss-Helmert model, our approach is more accurate and robust than classical least squares model. In the data fusion part, we focus on camera-laser data fusion and contribute with two new ego-motion estimation algorithms that combine complementary information from a laser scanner and a monocular camera. The first algorithm utilizes camera image information to guide the laser scan-matching. It can provide accurate motion estimates and yet can work in general conditions without requiring a field-of-view overlap between the camera and laser scanner, nor an initial guess of the motion parameters. The second algorithm combines the camera and the laser scanner information in a direct way, assuming the field-of-view overlap between the sensors is substantial. By maximizing the information usage of both the sparse laser point cloud and the dense image, the second algorithm is able to achieve state-of-the-art estimation accuracy. Experimental results confirm that both algorithms offer excellent alternatives to state-of-the-art camera-laser ego-motion estimation algorithms

    Event-based Vision: A Survey

    Get PDF
    Event cameras are bio-inspired sensors that differ from conventional frame cameras: Instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes, and output a stream of events that encode the time, location and sign of the brightness changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (in the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz) resulting in reduced motion blur. Hence, event cameras have a large potential for robotics and computer vision in challenging scenarios for traditional cameras, such as low-latency, high speed, and high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available and the tasks that they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world

    Shift Estimation Algorithm for Dynamic Sensors With Frame-to-Frame Variation in Their Spectral Response

    Get PDF
    This study is motivated by the emergence of a new class of tunable infrared spectral-imaging sensors that offer the ability to dynamically vary the sensor\u27s intrinsic spectral response from frame to frame in an electronically controlled fashion. A manifestation of this is when a sequence of dissimilar spectral responses is periodically realized, whereby in every period of acquired imagery, each frame is associated with a distinct spectral band. Traditional scene-based global shift estimation algorithms are not applicable to such spectrally heterogeneous video sequences, as a pixel value may change from frame to frame as a result of both global motion and varying spectral response. In this paper, a novel algorithm is proposed and examined to fuse a series of coarse global shift estimates between periodically sampled pairs of nonadjacent frames to estimate motion between consecutive frames; each pair corresponds to two nonadjacent frames of the same spectral band. The proposed algorithm outperforms three alternative methods, with the average error being one half of that obtained by using an equal weights version of the proposed algorithm, one-fourth of that obtained by using a simple linear interpolation method, and one-twentieth of that obtained by using a naiÂżve correlation-based direct method

    Temporal shape super-resolution by intra-frame motion encoding using high-fps structured light

    Full text link
    One of the solutions of depth imaging of moving scene is to project a static pattern on the object and use just a single image for reconstruction. However, if the motion of the object is too fast with respect to the exposure time of the image sensor, patterns on the captured image are blurred and reconstruction fails. In this paper, we impose multiple projection patterns into each single captured image to realize temporal super resolution of the depth image sequences. With our method, multiple patterns are projected onto the object with higher fps than possible with a camera. In this case, the observed pattern varies depending on the depth and motion of the object, so we can extract temporal information of the scene from each single image. The decoding process is realized using a learning-based approach where no geometric calibration is needed. Experiments confirm the effectiveness of our method where sequential shapes are reconstructed from a single image. Both quantitative evaluations and comparisons with recent techniques were also conducted.Comment: 9 pages, Published at the International Conference on Computer Vision (ICCV 2017

    Real-time marker-less multi-person 3D pose estimation in RGB-Depth camera networks

    Get PDF
    This paper proposes a novel system to estimate and track the 3D poses of multiple persons in calibrated RGB-Depth camera networks. The multi-view 3D pose of each person is computed by a central node which receives the single-view outcomes from each camera of the network. Each single-view outcome is computed by using a CNN for 2D pose estimation and extending the resulting skeletons to 3D by means of the sensor depth. The proposed system is marker-less, multi-person, independent of background and does not make any assumption on people appearance and initial pose. The system provides real-time outcomes, thus being perfectly suited for applications requiring user interaction. Experimental results show the effectiveness of this work with respect to a baseline multi-view approach in different scenarios. To foster research and applications based on this work, we released the source code in OpenPTrack, an open source project for RGB-D people tracking.Comment: Submitted to the 2018 IEEE International Conference on Robotics and Automatio
    • …
    corecore