Object-Based Rendering and 3D Reconstruction Using a Moveable Image-Based System
GlobalFlowNet: Video Stabilization using Deep Distilled Global Motion Estimates
Videos shot by laymen using hand-held cameras contain undesirable shaky
motion. Estimating the global motion between successive frames, in a manner not
influenced by moving objects, is central to many video stabilization
techniques, but poses significant challenges. A large body of work uses 2D affine transformations or homographies to model the global motion. However, in this
work, we introduce a more general representation scheme, which adapts any
existing optical flow network to ignore the moving objects and obtain a
spatially smooth approximation of the global motion between video frames. We
achieve this through a knowledge distillation approach: we first introduce a low-pass filter module into the optical flow network to constrain the predicted
optical flow to be spatially smooth. This becomes our student network, named GlobalFlowNet. Then, using the original optical flow network as the teacher network, we train the student network with a robust loss function. Given a trained GlobalFlowNet, we stabilize videos using a two-stage
process. In the first stage, we correct the instability in affine parameters
using a quadratic programming approach constrained by a user-specified cropping
limit to control loss of field of view. In the second stage, we stabilize the
video further by smoothing global motion parameters, expressed using a small
number of discrete cosine transform coefficients. In extensive experiments on a
variety of videos, our technique outperforms state-of-the-art methods in terms of subjective quality and several quantitative measures of video stability. The source code is publicly available at https://github.com/GlobalFlowNet/GlobalFlowNet.

Comment: Accepted in WACV 2023
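As a rough illustration of the second stage, smoothing a per-frame motion parameter by truncating its discrete cosine transform can be sketched as follows; the coefficient count, window length, and synthetic pan are illustrative assumptions, not the paper's actual settings:

```python
# Minimal sketch: keep only low-frequency DCT coefficients of a
# per-frame global motion parameter, discarding the jittery terms.
import numpy as np
from scipy.fft import dct, idct

def smooth_trajectory(params, n_coeffs=8):
    """params: (T,) per-frame motion parameter (e.g., x-translation)."""
    c = dct(params, norm='ortho')   # frequency representation
    c[n_coeffs:] = 0.0              # drop high-frequency (shaky) terms
    return idct(c, norm='ortho')    # smooth trajectory, same length

# Example: a slow pan corrupted by hand-held jitter.
t = np.linspace(0, 1, 120)
shaky = 50 * t + np.random.normal(0, 1.5, t.size)
correction = smooth_trajectory(shaky) - shaky  # offset to warp each frame by
```

Keeping only the first few coefficients preserves intentional low-frequency camera motion while removing the high-frequency shake.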
Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age
Simultaneous Localization and Mapping (SLAM) consists of the concurrent
construction of a model of the environment (the map), and the estimation of the
state of the robot moving within it. The SLAM community has made astonishing
progress over the last 30 years, enabling large-scale real-world applications,
and witnessing a steady transition of this technology to industry. We survey
the current state of SLAM. We start by presenting what is now the de facto
standard formulation for SLAM. We then review related work, covering a broad
set of topics including robustness and scalability in long-term mapping, metric
and semantic representations for mapping, theoretical performance guarantees,
active SLAM and exploration, and other new frontiers. This paper simultaneously serves as a position paper and a tutorial for users of SLAM. By looking at the published research with a critical eye, we delineate open challenges and new research issues that still deserve careful scientific investigation. The paper also contains the authors' take on two questions that often animate discussions during robotics conferences: Do robots need SLAM? and Is SLAM solved?
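For context, the de facto standard formulation the survey refers to is maximum a posteriori estimation over a factor graph; under Gaussian noise it is commonly written as follows (the notation here is the field's usual one, assumed rather than quoted from the paper):

```latex
X^{\star} = \arg\max_{X} \, p(X \mid Z)
          = \arg\min_{X} \sum_{k} \left\lVert h_k(X_k) - z_k \right\rVert^{2}_{\Omega_k}
```

where X stacks the robot poses and landmark positions, each measurement z_k depends on a subset X_k of the variables through a measurement model h_k, and the norm is weighted by the information matrix Omega_k.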
Visibility Constrained Generative Model for Depth-based 3D Facial Pose Tracking
In this paper, we propose a generative framework that unifies depth-based 3D
facial pose tracking and on-the-fly face model adaptation in unconstrained scenarios with heavy occlusions and arbitrary facial expression variations.
Specifically, we introduce a statistical 3D morphable model that flexibly
describes the distribution of points on the surface of the face model, with an
efficient switchable online adaptation that gradually captures the identity of
the tracked subject and rapidly constructs a suitable face model when the
subject changes. Moreover, unlike prior art that employed ICP-based facial pose estimation, we propose a ray visibility constraint that regularizes the pose based on the face model's visibility with respect to the input point cloud, improving robustness to occlusions. Ablation studies and experimental results on the Biwi and ICT-3DHP datasets demonstrate that the proposed framework is effective and outperforms competing state-of-the-art depth-based methods.
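A ray visibility test of the kind the abstract describes can be sketched as follows; the intrinsics and the occlusion margin are illustrative assumptions, and the paper's actual constraint is a soft regularizer rather than this hard mask:

```python
# Minimal sketch: a face-model point should only constrain the pose if
# the input depth map does not see a closer surface along its ray.
import numpy as np

def visible_mask(points_cam, depth_map, fx, fy, cx, cy, margin=0.10):
    """points_cam: (N,3) model points in camera coordinates (meters)."""
    x, y, z = points_cam[:, 0], points_cam[:, 1], points_cam[:, 2]
    u = np.round(fx * x / z + cx).astype(int)   # project to pixel coords
    v = np.round(fy * y / z + cy).astype(int)
    h, w = depth_map.shape
    ok = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    vis = np.zeros(len(points_cam), dtype=bool)
    d = depth_map[v[ok], u[ok]]                 # observed depth on the ray
    # Occluded if the sensor sees a surface clearly in front of the point;
    # missing depth (0) is treated as no evidence of occlusion.
    vis[ok] = (d <= 0) | (z[ok] <= d + margin)
    return vis
```

Points flagged as occluded can then be excluded from, or down-weighted in, the pose objective, which is what makes such a tracker robust to hands, hair, and other occluders.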
MonoPerfCap: Human Performance Capture from Monocular Video
We present the first marker-less approach for temporally coherent 3D
performance capture of a human with general clothing from monocular video. Our
approach reconstructs articulated human skeleton motion as well as medium-scale
non-rigid surface deformations in general scenes. Human performance capture is
a challenging problem due to the large range of articulation, potentially fast
motion, and considerable non-rigid deformations, even from multi-view data.
Reconstruction from monocular video alone is drastically more challenging,
since strong occlusions and the inherent depth ambiguity lead to a highly
ill-posed reconstruction problem. We tackle these challenges by a novel
approach that employs sparse 2D and 3D human pose detections from a
convolutional neural network using a batch-based pose estimation strategy.
Joint recovery of per-batch motion allows us to resolve the ambiguities of the
monocular reconstruction problem based on a low dimensional trajectory
subspace. In addition, we propose refinement of the surface geometry based on
fully automatically extracted silhouettes to enable medium-scale non-rigid
alignment. We demonstrate state-of-the-art performance capture results that
enable exciting applications such as video editing and free viewpoint video,
previously infeasible from monocular video. Our qualitative and quantitative
evaluation demonstrates that our approach significantly outperforms previous
monocular methods in terms of accuracy, robustness and scene complexity that
can be handled.

Comment: Accepted to ACM TOG 2018, to be presented at SIGGRAPH 2018
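The batch-based strategy can be illustrated with a toy joint optimization over a window of frames; the synthetic detections, camera intrinsics, and the simple temporal-smoothness prior below are stand-ins (the paper uses a low-dimensional trajectory subspace rather than this prior):

```python
# Minimal sketch: fit 3D joints for a whole batch of frames at once, so
# that temporally adjacent frames help disambiguate monocular depth.
import numpy as np
from scipy.optimize import least_squares

F, J = 10, 15                                   # frames per batch, joints
K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1.0]])

def residuals(x, detections):
    P = x.reshape(F, J, 3)                      # 3D joints per frame
    proj = P @ K.T
    proj = proj[..., :2] / proj[..., 2:3]       # perspective projection
    data = (proj - detections).ravel()          # 2D reprojection term
    smooth = np.diff(P, axis=0).ravel()         # temporal coherence term
    return np.concatenate([data, 0.1 * smooth])

rng = np.random.default_rng(0)
gt = rng.normal([0.0, 0.0, 3.0], 0.3, size=(F, J, 3))  # synthetic "truth"
proj = gt @ K.T
detections = proj[..., :2] / proj[..., 2:3]            # stand-in CNN detections
x0 = (gt + rng.normal(0, 0.05, gt.shape)).ravel()      # perturbed init
sol = least_squares(residuals, x0, args=(detections,))
```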
Panoramic Video Stitching
Digital camera and smartphone technologies have made high-quality images and video pervasive and abundant. Combining or stitching collections of images from a variety of viewpoints into an extended panoramic image is a common and popular function of such devices. Extending this functionality to video, however, poses many new challenges due to the demand for both spatial and temporal continuity. Multi-view video stitching (also called panoramic video stitching) is an active research area in computer vision, image/video processing, and computer graphics, with wide applications in virtual reality, virtual tourism, surveillance, and human-computer interaction. In this thesis, I explore the technical and practical problems in the complete process of stitching a high-resolution multi-view video into a high-resolution panoramic video. The challenges addressed include video stabilization, efficient multi-view video alignment and panoramic video stitching, color correction, and blurred frame detection and repair.
Specifically, I propose a continuity-aware Kalman filtering scheme on rotation angles for video stabilization and jitter removal. For efficient stitching of long, high-resolution panoramic videos, I propose constrained and multigrid SIFT matching schemes, concatenated image projection and warping, and min-space feathering. Together, these three approaches greatly reduce the computational time and memory requirements of panoramic video stitching, making it feasible to stitch high-resolution (e.g., 1920x1080 pixels) and long panoramic video sequences on standard workstations.
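A bare-bones version of Kalman filtering a rotation-angle sequence might look like the following; the constant-velocity model and noise variances are illustrative assumptions, and the thesis's continuity-aware extensions are not reproduced here:

```python
# Minimal sketch: smooth per-frame camera roll with a 1D Kalman filter
# whose state is [angle, angular rate].
import numpy as np

def kalman_smooth_angles(angles, q=1e-4, r=1e-2):
    """angles: (T,) estimated rotation angle per frame (radians)."""
    F = np.array([[1.0, 1.0], [0.0, 1.0]])     # constant-velocity model
    H = np.array([[1.0, 0.0]])                 # we observe the angle only
    Q, R = q * np.eye(2), np.array([[r]])
    x, P = np.array([angles[0], 0.0]), np.eye(2)
    out = []
    for z in angles:
        x, P = F @ x, F @ P @ F.T + Q                  # predict
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # Kalman gain
        x = x + K @ (np.array([z]) - H @ x)            # update state
        P = (np.eye(2) - K @ H) @ P
        out.append(x[0])
    return np.asarray(out)
```

The filtered angles replace the jittery estimates, and each frame is rotated by the difference between the two.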
Color correction is the emphasis of my research. On this topic I first performed a systematic survey and performance evaluation of nine state-of-the-art color correction approaches in the context of two-view image stitching. My evaluation work not only gives useful insights and conclusions about the relative performance of these approaches, but also points out the remaining challenges and possible directions for future color correction research. Based on the conclusions of this evaluation work, I proposed a hybrid and scalable color correction approach for general n-view image stitching, and designed a two-view video color correction approach for panoramic video stitching.
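One of the simplest approaches in that design space, per-channel gain compensation fitted on the overlap region, can be sketched as follows; this global linear model is only a stand-in, not the hybrid n-view approach the thesis proposes:

```python
# Minimal sketch: fit a least-squares gain per color channel on the
# overlap so the source view's colors match the reference view's.
import numpy as np

def fit_gain(src_overlap, ref_overlap):
    """Overlap pixels of both views as (N,3) float arrays."""
    num = (src_overlap * ref_overlap).sum(axis=0)
    den = (src_overlap ** 2).sum(axis=0) + 1e-8
    return num / den                            # per-channel gain

def correct(src_image, gain):
    return np.clip(src_image.astype(np.float64) * gain, 0, 255).astype(np.uint8)
```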
For blurred frame detection and repair, I have completed preliminary work on image partial blur detection and classification, in which I proposed an SVM-based blur block classifier using improved and new local blur features. Then, based on partial blur classification results, I designed a statistical thresholding scheme for blurred frame identification. I repaired the detected blurred frames using polynomial data fitting from neighboring unblurred frames.
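A common per-block sharpness cue underlying this kind of partial blur detection is the variance of the Laplacian; the sketch below uses a fixed threshold as a stand-in for the thesis's SVM classifier, and the block size and threshold are illustrative:

```python
# Minimal sketch: flag image blocks whose Laplacian response has low
# variance (few strong edges) as blurry.
import numpy as np
from scipy.ndimage import laplace

def blurred_blocks(gray, block=64, thresh=50.0):
    """gray: (H,W) float image; returns a boolean grid of blurry blocks."""
    resp = laplace(gray)
    h, w = gray.shape
    grid = np.zeros((h // block, w // block), dtype=bool)
    for i in range(grid.shape[0]):
        for j in range(grid.shape[1]):
            tile = resp[i*block:(i+1)*block, j*block:(j+1)*block]
            grid[i, j] = tile.var() < thresh    # low variance => blurry
    return grid
```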
Many of the techniques and ideas in this thesis are novel and general solutions to technical or practical problems in panoramic video stitching. At the end of this thesis, I summarize the contributions this thesis makes to the research and popularization of panoramic video stitching, and describe the remaining open research issues.
Video Stabilisation Based on Spatial Transformer Networks
User-Generated Content is normally recorded with mobile phones by non-professionals, which leads to a low viewing experience due to artifacts such as jitter and blur. Other jittery videos are those recorded with mounted cameras or from moving platforms. In these scenarios, Digital Video Stabilization (DVS) has been utilized to create high-quality, professional-level videos. In industry and academia, there are a number of traditional and Deep Learning (DL)-based DVS systems; however, both approaches have limitations: the former struggles to extract and track features in a number of scenarios, and the latter struggles with camera path smoothing, a hard problem to define in this context. On the other hand, traditional methods have shown good performance in smoothing the camera path, whereas DL methods are effective in feature extraction, tracking, and motion parameter estimation. Hence, to the best of our knowledge, the available DVS systems struggle to stabilize videos in a wide variety of scenarios, especially with high motion and certain scene content, such as textureless areas, dark scenes, close objects, and lack of depth, amongst others. Another challenge faced by current DVS implementations is the artifacts that such systems add to the stabilized videos, degrading the viewing experience. These artifacts are mainly distortion, blur, zoom, and ghosting effects.

In this thesis, we utilize the strengths of Deep Learning and traditional methods for video stabilization. Our approach is robust to a wide variety of scene content and camera motion, and avoids adding artifacts to the stabilized video. First, we provide a dataset and evaluation framework for Deep Learning-based DVS. Then, we present our image alignment module, which contains a Spatial Transformer Network (STN). Next, we leverage this module to propose a homography-based video stabilization system. Aiming to avoid the blur and distortion caused by homographies, our next proposal is a translation-based video stabilization method, which uses Exponentially Weighted Moving Averages (EWMAs) to smooth the camera path. Finally, instead of using EWMAs, we study the use of filters in our approach; in this case, we compare a number of filters and choose those with the best performance. Since a viewer's quality of experience depends not only on video stability but also on blur and distortion, we consider it a good trade-off to leave some jitter in the video while avoiding added distortion and blur. In all three cases, we show that this approach pays off, since our systems outperform state-of-the-art proposals.
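The EWMA idea referenced above can be sketched in a few lines; the smoothing factor is an illustrative assumption, not a value from the thesis:

```python
# Minimal sketch: smooth the accumulated translation path with an EWMA
# and warp each frame by the gap between smoothed and actual paths.
import numpy as np

def ewma_stabilize(translations, alpha=0.9):
    """translations: (T,2) per-frame (dx, dy) motion estimates."""
    path = np.cumsum(translations, axis=0)      # accumulated camera path
    smooth = np.empty_like(path)
    smooth[0] = path[0]
    for t in range(1, len(path)):
        smooth[t] = alpha * smooth[t - 1] + (1 - alpha) * path[t]
    return smooth - path                        # per-frame correction shift
```

A larger alpha smooths more aggressively but lags behind intentional camera motion, which is the usual stability-versus-responsiveness trade-off.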
Video Magnification for Structural Analysis Testing
The goal of this thesis is to allow a user to see the minute motion of an object at different frequencies, using a computer program, to aid in vibration testing analysis without complex accelerometer setups or expensive laser vibrometers. MIT's phase-based video motion processing was modified to enable modal determination of structures in the field using a cell phone camera. The algorithm was modified by implementing a stabilization algorithm and by permitting the magnification filter to operate on multiple frequency ranges, enabling visualization of the natural frequencies of structures in the field. To support multiple frequency ranges, a new function was developed to apply the magnification filter at each relevant frequency range within the original video. The stabilization algorithm was intended to allow the camera to be hand-held instead of requiring a tripod mount. Two stabilization methods were tested: fixed-point video stabilization and image registration. Neither method removed the global motion from the hand-held video, even after masking was implemented, which resulted in poor results: fixed-point stabilization removed little motion or introduced sharp motions, and image registration introduced a pulsing effect. The best results occurred when the object being observed had contrast with the background, was the largest feature in the video frame, and the video was captured from a tripod at an appropriate angle. The final program can amplify the motion in user-selected frequency bands and can be used as an aid in structural analysis testing.
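A simplified intensity-domain version of multi-band magnification might look like the following; the MIT method the thesis builds on filters the phase of complex steerable pyramid coefficients rather than raw intensities, and the filter order, gain, and example bands are illustrative:

```python
# Minimal sketch: temporally band-pass the video in each frequency range
# of interest and add the amplified result back.
import numpy as np
from scipy.signal import butter, filtfilt

def magnify(frames, fps, bands, gain=10.0):
    """frames: (T,H,W) grayscale video; bands: list of (lo_hz, hi_hz)."""
    base = frames.astype(np.float64)
    out = base.copy()
    for lo, hi in bands:
        b, a = butter(2, [lo, hi], btype='band', fs=fps)
        out += gain * filtfilt(b, a, base, axis=0)  # amplify this band only
    return np.clip(out, 0, 255)

# e.g., magnify(video, fps=240, bands=[(10, 14), (27, 31)])
```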