Multi-Scale 3D Scene Flow from Binocular Stereo Sequences
Scene flow methods estimate the three-dimensional motion field for points in the world, using multi-camera video data. Such methods combine multi-view reconstruction with motion estimation. This paper describes an alternative formulation for dense scene flow estimation that provides reliable results using only two cameras by fusing stereo and optical flow estimation into a single coherent framework. Internally, the proposed algorithm generates probability distributions for optical flow and disparity. Taking into account the uncertainty in the intermediate stages allows for more reliable estimation of the 3D scene flow than previous methods allow. To handle the aperture problems inherent in the estimation of optical flow and disparity, a multi-scale method along with a novel region-based technique is used within a regularized solution. This combined approach both preserves discontinuities and prevents over-regularization – two problems commonly associated with the basic multi-scale approaches. Experiments with synthetic and real test data demonstrate the strength of the proposed approach. Supported by the National Science Foundation (CNS-0202067, IIS-0208876) and the Office of Naval Research (N00014-03-1-0108).
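As a rough illustration of how disparity and optical flow combine into 3D scene flow, the sketch below back-projects one pixel at two time steps for a rectified stereo pair and takes the difference. It is a deterministic point estimate only and does not reproduce the paper's probabilistic, multi-scale formulation. The function names and the camera parameters (`f`, `b`, `cx`, `cy`) are assumptions introduced for this example.

```python
import numpy as np

def backproject(x, y, d, f, b, cx, cy):
    """Triangulate a pixel (x, y) with disparity d in a rectified stereo pair.

    f is the focal length in pixels, b the baseline, (cx, cy) the principal
    point. All names are illustrative assumptions, not taken from the paper.
    """
    z = f * b / d                      # depth from disparity
    return np.array([(x - cx) * z / f, (y - cy) * z / f, z])

def scene_flow(x, y, d0, u, v, d1, f, b, cx, cy):
    """3D motion of the point seen at (x, y) at time t.

    (u, v) is the optical flow at (x, y); d0 and d1 are the disparities at
    time t and at the flowed position at time t + 1.
    """
    p0 = backproject(x, y, d0, f, b, cx, cy)
    p1 = backproject(x + u, y + v, d1, f, b, cx, cy)
    return p1 - p0

# A point that moves 10 pixels to the right at constant depth.
flow = scene_flow(320.0, 240.0, 10.0, 10.0, 0.0, 10.0,
                  f=500.0, b=0.1, cx=320.0, cy=240.0)
```

With these values the depth is 5 units at both time steps, so the recovered motion is purely lateral.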
A flexible and versatile studio for synchronized multi-view video recording
In recent years, the convergence of Computer Vision and Computer Graphics has opened new research areas concerned with reconstructing and analyzing scenes from multi-view video footage. In free-viewpoint video, for example, new views of a scene are generated from an arbitrary viewpoint in real-time from a set of real multi-view input video streams. The analysis of real-world scenes from multi-view video to extract motion information or reflection models is another field of research that greatly benefits from high-quality input data. Building a recording setup for multi-view video involves great effort on the hardware as well as the software side. The amount of image data to be processed is huge, a good lighting and camera setup is essential for a naturalistic scene appearance and robust background subtraction, and the computing infrastructure has to enable real-time processing of the recorded material. This paper describes a recording setup for multi-view video acquisition that enables the synchronized recording of dynamic scenes from multiple camera positions under controlled conditions. The requirements for the room and their implementation in the separate components of the studio are described in detail. The efficiency and flexibility of the room are demonstrated on the basis of the results that we obtain with a real-time 3D scene reconstruction system, a system for non-intrusive optical motion capture and a model-based free-viewpoint video system for human actors.
Multi-body Non-rigid Structure-from-Motion
Conventional structure-from-motion (SFM) research is primarily concerned with
the 3D reconstruction of a single, rigidly moving object seen by a static
camera, or a static and rigid scene observed by a moving camera; in both cases
there is only one relative rigid motion involved. Recent progress has
extended SFM to the areas of multi-body SFM (where there are multiple rigid
relative motions in the scene), as well as non-rigid SFM (where there is a
single non-rigid, deformable object or scene). Along this line of thinking,
there is an apparent gap, "multi-body non-rigid SFM", in which the
task would be to jointly reconstruct and segment multiple 3D structures of the
multiple, non-rigid objects or deformable scenes from images. Such a multi-body
non-rigid scenario is common in reality (e.g. two persons shaking hands,
multi-person social event), and how to solve it represents a natural
next step in SFM research. By leveraging recent results in subspace
clustering, this paper proposes, for the first time, an effective framework for
multi-body NRSFM, which simultaneously reconstructs and segments each 3D
trajectory into its respective low-dimensional subspace. Under our
formulation, 3D trajectories for each non-rigid structure can be well
approximated with a sparse affine combination of other 3D trajectories from the
same structure (self-expressiveness). We solve the resultant optimization with
the alternating direction method of multipliers (ADMM). We demonstrate the
efficacy of the proposed framework through extensive experiments on both
synthetic and real data sequences. Our method clearly outperforms
alternative methods, such as first clustering the 2D feature tracks into groups
and then doing non-rigid reconstruction in each group, or first conducting 3D
reconstruction under a single-subspace assumption and then clustering the 3D
trajectories into groups. Comment: 21 pages, 16 figures
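For orientation, the self-expressiveness idea mentioned in this abstract is commonly written, in the standard sparse subspace clustering setting, as the program below. This is the generic textbook form, not necessarily the paper's exact objective, which additionally couples it with the 3D reconstruction. Here $X$ is assumed to stack one 3D trajectory per column and $C$ is the coefficient matrix; both symbols are introduced only for this sketch.

```latex
\min_{C} \; \lVert C \rVert_{1}
\quad \text{subject to} \quad
X = XC, \qquad
\operatorname{diag}(C) = 0, \qquad
\mathbf{1}^{\top} C = \mathbf{1}^{\top}
```

The constraint $\operatorname{diag}(C) = 0$ prevents a trajectory from representing itself, and $\mathbf{1}^{\top} C = \mathbf{1}^{\top}$ makes each combination affine. The nonzero entries of $C$ then indicate which trajectories share a subspace.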
A middleware for a large array of cameras
Large arrays of cameras are increasingly being employed to produce the high-quality image sequences needed for motion analysis research. This creates the logistical problem of coordinating and controlling a large number of cameras. In this paper, we use a lightweight multi-agent system for coordinating such camera arrays. The agent framework provides more than a remote sensor access API. It allows reconfigurable and transparent access to cameras, as well as software agents capable of intelligent processing. Furthermore, it eases maintenance by encouraging code reuse. Additionally, our agent system includes an automatic discovery mechanism at startup, and multiple language bindings. Performance tests showed the lightweight nature of the framework while validating its correctness and scalability. Two different camera agents were implemented to provide access to a large array of distributed cameras. Correct operation of these camera agents was confirmed via several image processing agents.
Identifying First-person Camera Wearers in Third-person Videos
We consider scenarios in which we wish to perform joint scene understanding,
object tracking, activity recognition, and other tasks in environments in which
multiple people are wearing body-worn cameras while a third-person static
camera also captures the scene. To do this, we need to establish person-level
correspondences across first- and third-person videos, which is challenging
because the camera wearer is not visible from his/her own egocentric video,
preventing the use of direct feature matching. In this paper, we propose a new
semi-Siamese Convolutional Neural Network architecture to address this novel
challenge. We formulate the problem as learning a joint embedding space for
first- and third-person videos that considers both spatial- and motion-domain
cues. A new triplet loss function is designed to minimize the distance between
correct first- and third-person matches while maximizing the distance between
incorrect ones. This end-to-end approach performs significantly better than
several baselines, in part by learning the first- and third-person features
optimized for matching jointly with the distance measure itself.
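The abstract describes a loss that pulls correct first- and third-person matches together and pushes incorrect ones apart. A minimal sketch of the standard hinge-based triplet loss follows. It is the generic form, not the new loss the paper proposes, and the function name, the squared Euclidean distance and the `margin` value are assumptions made for this example.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet hinge loss on batches of embedding vectors.

    anchor, positive, negative: arrays of shape (batch, dim). The loss is
    zero once each negative is farther from its anchor than the positive
    by at least `margin` (in squared Euclidean distance).
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return np.maximum(0.0, d_pos - d_neg + margin).mean()

# Hypothetical 2-D embeddings: a first-person clip (anchor), the matching
# third-person detection (near) and a non-matching detection (far).
anchor = np.array([[0.0, 0.0]])
near = np.array([[0.0, 1.0]])
far = np.array([[3.0, 0.0]])

loss_good = triplet_loss(anchor, near, far)   # correct match is closer
loss_bad = triplet_loss(anchor, far, near)    # roles swapped
```

When the matching pair is already closer than the non-matching pair by more than the margin, the loss vanishes; swapping the roles yields a positive penalty.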