
    Video Processing with Additional Information

    Cameras are frequently deployed alongside many additional sensors on aerial and ground-based platforms, and many video datasets carry metadata containing measurements from inertial sensors, GPS units, etc. Developing better video processing algorithms that exploit this additional information is therefore of particular importance.

    We first describe an intensity-based algorithm for stabilizing low-resolution, low-quality aerial videos. The primary contribution is the idea of minimizing the discrepancy in the intensity of selected pixels between two images. This is an application of inverse compositional alignment to registering images of low resolution and low quality, for which minimizing the intensity difference over salient, high-gradient pixels yields faster and better convergence than using all the pixels.

    Secondly, we describe a feature-based method for stabilizing aerial videos and segmenting small moving objects. We use the coherency of background motion to jointly track features through the sequence, which enables accurate tracking of large numbers of features in the presence of repetitive texture, a lack of well-conditioned feature windows, and similar difficulties. We incorporate the segmentation problem within the joint feature tracking framework and propose the first combined joint-tracking and segmentation algorithm. The proposed approach enables highly accurate tracking, and the resulting segmentation of feature tracks is used in a MAP-MRF framework to obtain a dense pixelwise labeling of the scene. We demonstrate competitive moving object detection on challenging video sequences from the VIVID dataset containing moving vehicles and humans that are small enough to cause background subtraction approaches to fail.

    Structure from Motion (SfM) has matured to a stage where the emphasis is on developing fast, scalable, and robust algorithms for large reconstruction problems. The availability of additional sensors such as inertial units and GPS alongside video cameras motivates the development of SfM algorithms that leverage these additional measurements. In the third part, we study the benefits of a specific form of additional information for 3D urban modeling from a monocular video sequence: the vertical (gravity) direction and the height of the camera, both of which can be conveniently measured using inertial sensors. We show that in the presence of this information the SfM equations can be rewritten in a bilinear form, which allows us to derive a fast, robust, and scalable SfM algorithm for large-scale applications. The proposed SfM algorithm is experimentally shown to have favorable properties compared to the sparse bundle adjustment algorithm: we provide evidence that it converges in many cases to solutions with lower error than state-of-the-art implementations of bundle adjustment, and that for large reconstruction problems it takes less time to reach its solution. We also present SfM results using our algorithm on the Google StreetView research dataset and several other datasets.
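    To make the salient-pixel idea in the first part concrete, the minimal NumPy sketch below selects the highest-gradient pixels of a template image and evaluates the intensity residual only at those locations. The function names and the keep-fraction are illustrative choices, not taken from the thesis; the full stabilizer would minimize this residual over warp parameters using inverse compositional updates.

```python
import numpy as np

def select_salient_pixels(image, keep_fraction=0.1):
    """Return the row/column indices of the highest-gradient pixels.

    Restricting the alignment residual to these pixels is the salient-pixel
    idea described above; keep_fraction is an illustrative choice.
    """
    gy, gx = np.gradient(image.astype(np.float64))
    magnitude = np.hypot(gx, gy)
    n_keep = max(1, int(keep_fraction * magnitude.size))
    flat_idx = np.argpartition(magnitude.ravel(), -n_keep)[-n_keep:]
    return np.unravel_index(flat_idx, magnitude.shape)

def intensity_discrepancy(template, warped, rows, cols):
    """Sum of squared intensity differences evaluated only at the selected pixels."""
    diff = template[rows, cols].astype(np.float64) - warped[rows, cols].astype(np.float64)
    return float(np.sum(diff ** 2))
```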

    Sparse-to-Dense: Depth Prediction from Sparse Depth Samples and a Single Image

    We consider the problem of dense depth prediction from a sparse set of depth measurements and a single RGB image. Since depth estimation from monocular images alone is inherently ambiguous and unreliable, to attain a higher level of robustness and accuracy we introduce additional sparse depth samples, which are either acquired with a low-resolution depth sensor or computed via visual Simultaneous Localization and Mapping (SLAM) algorithms. We propose the use of a single deep regression network to learn directly from the raw RGB-D data, and explore the impact of the number of depth samples on prediction accuracy. Our experiments show that, compared to using only RGB images, the addition of 100 spatially random depth samples reduces the prediction root-mean-square error by 50% on the NYU-Depth-v2 indoor dataset. It also boosts the percentage of reliable predictions from 59% to 92% on the KITTI dataset. We demonstrate two applications of the proposed algorithm: a plug-in module in SLAM to convert sparse maps into dense maps, and super-resolution for LiDARs. Software and a video demonstration are publicly available.
    Comment: accepted to ICRA 2018. 8 pages, 8 figures, 3 tables. Video at https://www.youtube.com/watch?v=vNIIT_M7x7Y. Code at https://github.com/fangchangma/sparse-to-dens
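    As a rough illustration of the input side of this setup (not the authors' code; the helper names and the zero-fill convention for missing depth are assumptions), the sketch below draws spatially random samples from a dense depth map and stacks them with the RGB image into a single 4-channel input for a regression network.

```python
import numpy as np

def sample_sparse_depth(depth, num_samples=100, seed=None):
    """Keep num_samples spatially random valid depth pixels and zero out the rest."""
    rng = np.random.default_rng(seed)
    sparse = np.zeros_like(depth)
    rows, cols = np.nonzero(depth > 0)  # only sample where ground-truth depth is valid
    if rows.size == 0:
        return sparse
    pick = rng.choice(rows.size, size=min(num_samples, rows.size), replace=False)
    sparse[rows[pick], cols[pick]] = depth[rows[pick], cols[pick]]
    return sparse

def make_rgbd_input(rgb, sparse_depth):
    """Stack the RGB image and the sparse depth map into one 4-channel array."""
    return np.concatenate([rgb.astype(np.float32),
                           sparse_depth[..., None].astype(np.float32)], axis=-1)
```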

    High Speed and High Dynamic Range Video with an Event Camera

    Event cameras are novel sensors that report brightness changes in the form of a stream of asynchronous "events" instead of intensity frames. They offer significant advantages over conventional cameras: high temporal resolution, high dynamic range, and no motion blur. While the stream of events in principle encodes the complete visual signal, reconstructing an intensity image from a stream of events is an ill-posed problem in practice. Existing reconstruction approaches are based on hand-crafted priors and strong assumptions about the imaging process as well as the statistics of natural images. In this work we propose to learn to reconstruct intensity images from event streams directly from data instead of relying on any hand-crafted priors. We propose a novel recurrent network to reconstruct videos from a stream of events and train it on a large amount of simulated event data. During training we use a perceptual loss to encourage reconstructions to follow natural image statistics. We further extend our approach to synthesize color images from color event streams. Our network surpasses state-of-the-art reconstruction methods by a large margin in terms of image quality (>20%), while comfortably running in real time. We show that the network is able to synthesize high-framerate videos (>5,000 frames per second) of high-speed phenomena (e.g. a bullet hitting an object) and is able to provide high dynamic range reconstructions in challenging lighting conditions. We also demonstrate the effectiveness of our reconstructions as an intermediate representation for event data: off-the-shelf computer vision algorithms can be applied to our reconstructions for tasks such as object classification and visual-inertial odometry, and this strategy consistently outperforms algorithms that were specifically designed for event data.
    Comment: arXiv admin note: substantial text overlap with arXiv:1904.0829
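    To illustrate how an asynchronous event stream might be packaged for such a network, the sketch below accumulates events into a spatio-temporal voxel grid. This discretisation is one common input representation for learned event-based reconstruction; the function and its parameters are illustrative rather than the paper's exact pipeline.

```python
import numpy as np

def events_to_voxel_grid(xs, ys, ts, ps, height, width, num_bins=5):
    """Accumulate events (x, y, timestamp, polarity) into a spatio-temporal grid.

    Each event is split bilinearly between its two neighbouring temporal bins;
    positive and negative polarities contribute with opposite signs.
    """
    xs = np.asarray(xs, dtype=np.int64)
    ys = np.asarray(ys, dtype=np.int64)
    ts = np.asarray(ts, dtype=np.float64)
    pol = np.where(np.asarray(ps) > 0, 1.0, -1.0)

    grid = np.zeros((num_bins, height, width), dtype=np.float32)
    if ts.size == 0:
        return grid

    # Normalise timestamps to the range [0, num_bins - 1].
    t_norm = (num_bins - 1) * (ts - ts[0]) / max(ts[-1] - ts[0], 1e-9)
    left = np.floor(t_norm).astype(np.int64)
    right = np.clip(left + 1, 0, num_bins - 1)
    w_right = t_norm - left

    np.add.at(grid, (left, ys, xs), pol * (1.0 - w_right))
    np.add.at(grid, (right, ys, xs), pol * w_right)
    return grid
```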

    Reconstruction and analysis of dynamic shapes

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from the PDF version of the thesis. Includes bibliographical references (p. 122-141).
    Motion capture has revolutionized entertainment and influenced fields as diverse as the arts, sports, and medicine, despite the limitation that it tracks only a small set of surface points. On the other hand, 3D scanning techniques digitize complete surfaces of static objects but are not applicable to moving shapes. I present methods that overcome both limitations: they can obtain the moving geometry of dynamic shapes (such as people and clothes in motion) and analyze it in order to advance computer animation. Further understanding of dynamic shapes will enable various industries to enhance virtual characters, advance robot locomotion, improve sports performance, and aid in medical rehabilitation, thus directly affecting our daily lives. My methods efficiently recover much of the expressiveness of dynamic shapes from silhouettes alone, and the reconstruction quality is greatly improved by including surface orientations (normals). In order to make reconstruction more practical, I strive to capture dynamic shapes in their natural environment, which I do by using hybrid inertial and acoustic sensors. After capture, the reconstructed dynamic shapes are analyzed in order to enhance their utility; my algorithms then allow animators to generate novel motions, such as transferring facial performances from one actor onto another using multilinear models. The presented research provides some of the first and most accurate reconstructions of complex moving surfaces, and is among the few approaches that establish a relationship between different dynamic shapes.
    by Daniel Vlasic. Ph.D.
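    As a toy illustration of the multilinear-model idea behind the performance transfer mentioned above (the bilinear identity/expression parameterisation, tensor dimensions, and function names are assumptions, not the thesis' exact model), the sketch below evaluates a face model as a core tensor contracted with identity and expression weights, and replays one actor's per-frame expression weights on another identity.

```python
import numpy as np

def reconstruct_face(core, identity_w, expression_w):
    """Evaluate a bilinear face model: vertices = core contracted with id and expr weights.

    core has shape (num_vertex_coords, id_dims, expr_dims).
    """
    return np.einsum('vie,i,e->v', core, identity_w, expression_w)

def transfer_performance(core, source_expression_weights, target_identity_w):
    """Replay a source actor's per-frame expression weights on a target identity."""
    return [reconstruct_face(core, target_identity_w, w)
            for w in source_expression_weights]
```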

    GPS-Denied Geo-Localisation Using Visual Odometry

