Multiple View Geometry For Video Analysis And Post-production
Multiple view geometry is the foundation of an important class of computer vision techniques for the simultaneous recovery of camera motion and scene structure from a set of images. There are numerous important applications in this area, including video post-production, scene reconstruction, registration, surveillance, tracking, and segmentation. In video post-production, the topic addressed in this dissertation, computer analysis of the motion of the camera can replace the manual methods currently used to correctly align an artificially inserted object in a scene. However, existing single-view methods typically require multiple vanishing points, and therefore fail when only one vanishing point is available. In addition, current multiple-view techniques, which make use of either epipolar geometry or the trifocal tensor, do not fully exploit the properties of constant or known camera motion. Finally, there is no general solution to the problem of synchronizing N video sequences of distinct general scenes captured by cameras undergoing similar ego-motions, a necessary step for video post-production across different input videos. This dissertation proposes several advancements that overcome these limitations and uses them to develop an efficient framework for video analysis and post-production with multiple cameras. In the first part of the dissertation, novel inter-image constraints are introduced that are particularly useful for scenes where minimal information is available. This result extends the current state of the art in single-view geometry to situations where only one vanishing point is available. The property of constant or known camera motion is also exploited for applications such as the calibration of a network of cameras in video surveillance systems, and Euclidean reconstruction from turntable image sequences in the presence of zoom and focus. We then propose a new framework for the estimation and alignment of camera motions, including both simple (panning, tracking, and zooming) and complex (e.g. hand-held) camera motions. The accuracy of these results is demonstrated by applying our approach to video post-production applications such as video cut-and-paste and shadow synthesis. As realistic image-based rendering problems, these applications require extreme accuracy in the estimation of the camera geometry, the position and orientation of the light source, and the photometric properties of the resulting cast shadows. In each case, the theoretical results are fully supported and illustrated by both numerical simulations and thorough experimentation on real data.
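The single-view limitation mentioned above concerns vanishing-point estimation. As background, the following sketch shows the textbook least-squares construction of a single vanishing point from image line segments that are parallel in the scene (illustrative only; the function name and input format are assumptions, and this is the standard method rather than the dissertation's new inter-image constraints):

    import numpy as np

    def vanishing_point(segments):
        """Least-squares vanishing point from parallel scene lines.

        segments: (N, 4) array of endpoints (x1, y1, x2, y2) of image
        segments whose 3D counterparts are parallel. Returns the vanishing
        point in pixel coordinates (assumed finite here).
        """
        n = len(segments)
        # Homogeneous image line through each pair of endpoints: l = p1 x p2.
        p1 = np.column_stack([segments[:, 0], segments[:, 1], np.ones(n)])
        p2 = np.column_stack([segments[:, 2], segments[:, 3], np.ones(n)])
        lines = np.cross(p1, p2)

        # The vanishing point v satisfies l . v = 0 for every line; solve in
        # least squares via the right singular vector of the smallest
        # singular value.
        _, _, vt = np.linalg.svd(lines)
        v = vt[-1]
        return v[:2] / v[2]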
Self-Calibration of Cameras with Euclidean Image Plane in Case of Two Views and Known Relative Rotation Angle
The internal calibration of a pinhole camera is given by five parameters that are combined into an upper-triangular calibration matrix. If the skew parameter is zero and the aspect ratio is equal to one, then the camera is said to have a Euclidean image plane. In this paper, we propose a non-iterative self-calibration algorithm for a camera with a Euclidean image plane in the case where the remaining three internal parameters, the focal length and the principal point coordinates, are fixed but unknown. The algorithm requires a set of point correspondences in two views, as well as the measured relative rotation angle between the views. We show that the problem generically has six solutions (including complex ones). The algorithm has been implemented and tested both on synthetic data and on a publicly available real dataset. The experiments demonstrate that the method is correct, numerically stable, and robust.
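In the conventional notation (the symbols f_x, f_y, s, u_0, v_0 are standard textbook names, not taken from the paper), the five-parameter calibration matrix and its Euclidean-image-plane specialization, with the three remaining unknowns f, u_0, v_0, are:

    K = \begin{pmatrix} f_x & s & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{pmatrix}
    \quad\xrightarrow{\; s = 0,\ f_x = f_y = f \;}\quad
    K = \begin{pmatrix} f & 0 & u_0 \\ 0 & f & v_0 \\ 0 & 0 & 1 \end{pmatrix}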
Motion Segmentation - Segmentation of Independently Moving Objects in Video
The ability to recognize motion is one of the most important functions of our visual system. Motion allows us both to recognize objects and to better understand the 3D world in which we are moving. Because of its importance, motion is used to answer a wide variety of fundamental questions in computer vision, such as: (1) Which objects are moving independently in the world? (2) Which objects are close and which are far away? (3) How is the camera moving? My work addresses the problem of moving object segmentation in unconstrained videos. I developed a probabilistic approach to segment independently moving objects in a video sequence, connecting aspects of camera motion estimation, relative depth, and flow statistics. My work consists of three major parts: (1) modeling motion using a simple (rigid) motion model that strictly follows the principles of perspective projection, and segmenting the video into its different motion components by assigning each pixel to its most likely motion model in a Bayesian fashion; (2) combining piecewise rigid motions into more complex, deformable, and articulated objects, guided by learned semantic object segmentations; and (3) learning highly variable motion patterns using a neural network trained on synthetic (and therefore unlimited) training data. The training data are generated automatically, strictly following the principles of perspective projection, so that well-known geometric constraints are precisely characterized during training and the network learns the principles of motion segmentation rather than identifying well-known structures that are likely to move.
This work shows that a careful analysis of the motion field not only leads to a consistent segmentation of moving objects in a video sequence, but also helps us understand the scene geometry of the world we are moving in.
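The Bayesian assignment step in part (1) can be sketched as follows (a minimal illustration, not the dissertation's exact formulation; the function name, the isotropic Gaussian residual model, and the inputs are assumptions): each pixel is assigned to the rigid motion model whose predicted flow best explains the observed flow.

    import numpy as np

    def assign_motion_models(flow, model_flows, priors, sigma=1.0):
        """Assign each pixel to its most likely rigid motion model.

        flow:        (H, W, 2) observed optical flow.
        model_flows: (K, H, W, 2) flow predicted by K rigid motion models.
        priors:      (K,) prior probability of each model.
        Returns an (H, W) array of model indices.
        """
        # Squared flow residual per model and pixel, under an isotropic
        # Gaussian noise assumption.
        residuals = np.sum((model_flows - flow[None]) ** 2, axis=-1)  # (K, H, W)

        # Negative log-posterior = residual / (2 sigma^2) - log prior (+ const).
        neg_log_post = residuals / (2.0 * sigma**2) - np.log(priors)[:, None, None]

        return np.argmin(neg_log_post, axis=0)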
Fast Multi-frame Stereo Scene Flow with Motion Segmentation
We propose a new multi-frame method for efficiently computing scene flow
(dense depth and optical flow) and camera ego-motion for a dynamic scene
observed from a moving stereo camera rig. Our technique also segments out
moving objects from the rigid scene. In our method, we first estimate the
disparity map and the 6-DOF camera motion using stereo matching and visual
odometry. We then identify regions inconsistent with the estimated camera
motion and compute per-pixel optical flow only at these regions. This flow
proposal is fused with the camera motion-based flow proposal using fusion moves
to obtain the final optical flow and motion segmentation. This unified
framework benefits all four tasks (stereo, optical flow, visual odometry, and
motion segmentation), leading to overall higher accuracy and efficiency. Our
method is currently ranked third on the KITTI 2015 scene flow benchmark.
Furthermore, our CPU implementation runs in 2-3 seconds per frame, which is 1-3
orders of magnitude faster than the top six methods. We also report a thorough
evaluation on challenging Sintel sequences with fast camera and object motion,
where our method consistently outperforms OSF [Menze and Geiger, 2015], which
is currently ranked second on the KITTI benchmark.
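The consistency test at the heart of this pipeline can be sketched as follows (a simplified illustration, not the authors' implementation; the function name, inputs, and threshold are assumptions): the flow induced by the estimated camera motion and disparity-derived depth is compared with the observed flow, and pixels with large residuals are flagged as independently moving, and only those are passed to the per-pixel flow estimator.

    import numpy as np

    def rigid_flow_residual_mask(depth, flow, K, R, t, thresh=3.0):
        """Flag pixels whose observed flow disagrees with camera-motion flow.

        depth: (H, W) depth from stereo matching.
        flow:  (H, W, 2) observed optical flow.
        K:     (3, 3) camera intrinsics; R (3, 3), t (3,) camera motion
               from visual odometry.
        Returns a boolean (H, W) mask of likely independently moving pixels.
        """
        H, W = depth.shape
        u, v = np.meshgrid(np.arange(W), np.arange(H))
        pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # (3, N)

        # Back-project to 3D, apply the camera motion, re-project.
        pts = np.linalg.inv(K) @ pix * depth.reshape(1, -1)
        proj = K @ (R @ pts + t[:, None])
        uv2 = (proj[:2] / proj[2]).T.reshape(H, W, 2)

        rigid_flow = uv2 - np.stack([u, v], axis=-1)
        residual = np.linalg.norm(flow - rigid_flow, axis=-1)
        return residual > thresh  # pixels inconsistent with the camera motion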
Image-Based View Synthesis
We present a new method for rendering novel images of flexible 3D objects from a small number of example images in correspondence. The strength of the method is its ability to synthesize images whose viewing position lies significantly outside the viewing cone of the example images ("view extrapolation"), yet without ever modeling the 3D structure of the scene. The method relies on synthesizing a chain of "trilinear tensors" that governs the warping function from the example images to the novel image, together with a multi-dimensional interpolation function that synthesizes the non-rigid motions of the viewed object from the virtual camera position. We show that two closely spaced example images alone are sufficient in practice to synthesize a significant viewing cone, thus demonstrating that an object can be represented by a relatively small number of model images, for the purpose of cheap and fast viewers that can run on standard hardware.
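The tensor-governed warping can be illustrated with standard trilinear point transfer (a generic sketch of trifocal-tensor transfer, not the paper's chain-synthesis method; the function name and line choice are assumptions): given a point in the first view and any line through its match in the second view, the corresponding point in the third (novel) view follows by contracting the tensor.

    import numpy as np

    def transfer_point(T, x1, x2):
        """Transfer a matched point into a third view via a trifocal tensor.

        T:  (3, 3, 3) trifocal tensor of views 1-3, indexed T[i, j, k].
        x1: (3,) homogeneous point in view 1.
        x2: (3,) homogeneous point in view 2.
        Returns the homogeneous point in view 3 (assumed finite here).
        """
        # Any line l2 through x2 works except the epipolar line; pick one by
        # crossing x2 with a nearby point.
        l2 = np.cross(x2, x2 + np.array([1.0, 0.0, 0.0]))

        # Point transfer: x3^k ~ sum_{i,j} x1^i * l2_j * T[i, j, k].
        x3 = np.einsum('i,j,ijk->k', x1, l2, T)
        return x3 / x3[2]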
EV-IMO: Motion Segmentation Dataset and Learning Pipeline for Event Cameras
We present the first event-based learning approach for motion segmentation in
indoor scenes, and the first event-based dataset, EV-IMO, which includes
accurate pixel-wise motion masks, egomotion and ground truth depth. Our
approach is based on an efficient implementation of the SfM learning pipeline
using a low-parameter neural network architecture on event data. In addition to
camera egomotion and a dense depth map, the network estimates pixel-wise
independently moving object segmentation and computes per-object 3D
translational velocities for moving objects. We also train a shallow network
with just 40k parameters, which is able to compute depth and egomotion.
Our EV-IMO dataset features 32 minutes of indoor recording with up to 3 fast
moving objects simultaneously in the camera field of view. The objects and the
camera are tracked by the VICON motion capture system. By 3D scanning the room
and the objects, accurate depth map ground truth and pixel-wise object masks
are obtained, which are reliable even in poor lighting conditions and during
fast motion. We then train and evaluate our learning pipeline on EV-IMO and
demonstrate that our approach far surpasses its rivals and is well suited for
scene-constrained robotics applications.
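For a sense of scale, a shallow depth-and-egomotion network in the tens-of-thousands-of-parameters range can be sketched as follows (an illustrative PyTorch module, not the EV-IMO architecture; the layer sizes are assumptions chosen only to keep the parameter count small):

    import torch
    import torch.nn as nn

    class TinyDepthEgo(nn.Module):
        """Illustrative shallow net predicting coarse depth and 6-DOF egomotion."""
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 32, 3, stride=2, padding=1), nn.ReLU(),
            )
            self.depth_head = nn.Conv2d(32, 1, 3, padding=1)  # coarse dense depth
            self.pose_head = nn.Linear(32, 6)                 # translation + rotation

        def forward(self, x):
            f = self.encoder(x)
            depth = self.depth_head(f)
            pose = self.pose_head(f.mean(dim=(2, 3)))         # global average pool
            return depth, pose

    # Prints the parameter count (about 15k for this particular sketch).
    print(sum(p.numel() for p in TinyDepthEgo().parameters()))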
Comparison of in vivo human knee joint kinematics using axodes
The human knee is of particular interest because of its importance in mobility. Pain and stability can be directly related to the motion, or kinematics, of the knee. Many studies have been conducted to quantify human knee kinematics, both in vitro and in vivo. One of the inherent issues with in vivo, skin-mounted measurement systems is that they do not account for soft tissue artifact. Compensation for soft tissue artifact has been a difficult challenge for skin-mounted tracking systems and has not yet been achieved. Therefore, bone-mounted skeletal pins were chosen as the method of gathering kinematic data for this study. Mounting bone pins is not the ideal method to study motion because of its invasive nature; nevertheless, it provides a great amount of trustworthy, useful insight. Murphy conducted an in vivo experiment to capture the 3D kinematics of the normal human knee. The kinematic data were used to find the instantaneous screw axes, also called instantaneous helical axes (IHAs). When successive IHAs are plotted together, the surface they trace out is called the moving axode of the motion. Several degrees of freedom are needed to accurately describe the kinematics of the human knee during normal movement. The current study further analyzes the data that Murphy reported in 1990. The goal is to find an effective way to express kinematic information in a coordinate-system-independent manner so that comparison is meaningful and feasible between gait/ROM trials, subjects, and knee repair/replacement methods. Axodes were used to compare knee kinematics, trial to trial, for gait, range of motion (ROM), and pivot step. It was established that 6 independent screws are required to fully describe the motion during gait. Thus, the knee behaves like a 6-DOF mechanism during gait, and therefore two-, three-, four-, or five-screw system models are insufficient to adequately and uniquely define the screw system. Screw invariants were found to be a viable means of understanding knee kinematics. Axodes were plotted with the pre-stance, stance, and post-stance phases indicated. The screw invariants, pitch and moment, were plotted as a function of flexion angle.
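The IHA construction underlying this analysis follows from standard screw theory (a generic sketch; the study's actual processing of the bone-pin marker data is not shown, and the function name is an assumption): given the angular velocity of the moving bone relative to the fixed one and the linear velocity of a reference point, the axis direction, a point on the axis, and the pitch follow directly.

    import numpy as np

    def instantaneous_helical_axis(omega, v):
        """Instantaneous helical (screw) axis of a rigid body motion.

        omega: (3,) angular velocity of the moving segment.
        v:     (3,) linear velocity of the body point at the origin.
        Returns (direction, point_on_axis, pitch).
        """
        w2 = np.dot(omega, omega)
        if w2 < 1e-12:
            raise ValueError("pure translation: helical axis undefined")

        direction = omega / np.sqrt(w2)
        point_on_axis = np.cross(omega, v) / w2  # point on the IHA closest to origin
        pitch = np.dot(omega, v) / w2            # translation per radian of rotation
        return direction, point_on_axis, pitch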