59 research outputs found
Flight Dynamics-based Recovery of a UAV Trajectory using Ground Cameras
We propose a new method to estimate the 6-dof trajectory of a flying object
such as a quadrotor UAV within a 3D airspace monitored using multiple fixed
ground cameras. It is based on a new structure from motion formulation for the
3D reconstruction of a single moving point with known motion dynamics. Our main
contribution is a new bundle adjustment procedure which in addition to
optimizing the camera poses, regularizes the point trajectory using a prior
based on motion dynamics (or specifically flight dynamics). Furthermore, we can
infer the underlying control input sent to the UAV's autopilot that determined
its flight trajectory.
Our method requires neither perfect single-view tracking nor appearance
matching across views. For robustness, we allow the tracker to generate
multiple detections per frame in each video. The true detections and the data
association across videos are estimated using robust multi-view triangulation
and subsequently refined during our bundle adjustment procedure. Quantitative
evaluation on simulated data and experiments on real videos from indoor and
outdoor scenes demonstrate the effectiveness of our method.
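As a rough illustration of how triangulation can separate true detections from false ones, the sketch below tests every pairing of candidate viewing rays from two cameras and keeps the pair whose rays pass closest to each other. It is a two-view, brute-force stand-in for the robust multi-view triangulation mentioned in the abstract, not the authors' procedure. The function names, camera centres, and ray directions are all hypothetical, and the rays are assumed to be already expressed in a common world frame.

```python
from itertools import product


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def scale(a, s):
    return tuple(x * s for x in a)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of the shortest segment between rays c1 + t1*d1 and c2 + t2*d2.

    Returns (point, gap), or None if the rays are parallel or the closest
    points lie behind either camera.
    """
    w = sub(c1, c2)
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        return None  # parallel rays: no unique closest point
    t1 = (b * e - c * d) / denom
    t2 = (a * e - b * d) / denom
    if t1 <= 0 or t2 <= 0:
        return None  # closest point is behind a camera
    p1 = add(c1, scale(d1, t1))
    p2 = add(c2, scale(d2, t2))
    diff = sub(p1, p2)
    return scale(add(p1, p2), 0.5), dot(diff, diff) ** 0.5


def best_pair(c1, dirs1, c2, dirs2):
    """Try every pairing of candidate rays and keep the most consistent one."""
    best = None
    for d1, d2 in product(dirs1, dirs2):
        result = triangulate_midpoint(c1, d1, c2, d2)
        if result is None:
            continue
        point, gap = result
        if best is None or gap < best[1]:
            best = (point, gap)
    return best


# Hypothetical setup: two cameras, each with one true and one false detection
# of a target located at (2, 0, 5).
cam1, cam2 = (0.0, 0.0, 0.0), (4.0, 0.0, 0.0)
dirs1 = [(0.0, 1.0, 5.0), (2.0, 0.0, 5.0)]    # second ray is the true one
dirs2 = [(-2.0, 0.0, 5.0), (1.0, -2.0, 5.0)]  # first ray is the true one
point, gap = best_pair(cam1, dirs1, cam2, dirs2)
```

With more than two cameras the exhaustive pairing grows quickly, which is one reason a robust estimator followed by a refinement step, as the abstract describes, is the more practical choice.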
Reconstruction of the pose of uncalibrated cameras via user-generated videos
Extraction of 3D geometry from hand-held unsteady uncalibrated cameras faces multiple difficulties: finding usable frames, feature-matching and unknown variable focal length to name three. We have built a prototype system to allow a user to spatially navigate playback viewpoints of an event of interest, using geometry automatically recovered from casually captured videos. The system, whose workings we present in this paper, necessarily estimates not only scene geometry, but also relative viewpoint position, overcoming the mentioned difficulties in the process. The only inputs required are video sequences from various viewpoints of a common scene, as are readily available online from sporting and music events. Our methods make no assumption of the synchronization of the input and do not require file metadata, instead exploiting the video to self-calibrate. The footage need only contain some camera rotation with little translation, a likely occurrence for hand-held event footage. This is the author accepted manuscript. The final version is available from IEEE via http://dx.doi.org/10.1145/2659021.265902
Probabilistic Triangulation for Uncalibrated Multi-View 3D Human Pose Estimation
3D human pose estimation has been a long-standing challenge in computer
vision and graphics, where multi-view methods have significantly progressed but
are limited by the tedious calibration processes. Existing multi-view methods
are restricted to fixed camera pose and therefore lack generalization ability.
This paper presents a novel Probabilistic Triangulation module that can be
embedded in a calibrated 3D human pose estimation method, generalizing it to
uncalibrated scenes. The key idea is to use a probability distribution to
model the camera pose and iteratively update the distribution from 2D features
instead of using a fixed camera pose. Specifically, we maintain a camera pose
distribution and then iteratively update this distribution by computing the
posterior probability of the camera pose through Monte Carlo sampling. This
way, the gradients can be directly back-propagated from the 3D pose estimation
to the 2D heatmap, enabling end-to-end training. Extensive experiments on
Human3.6M and CMU Panoptic demonstrate that our method outperforms other
uncalibrated methods and achieves results comparable to state-of-the-art
calibrated methods. Thus, our method achieves a trade-off between estimation
accuracy and generalizability. Our code is available at
https://github.com/bymaths/probabilistic_triangulation
Comment: 9 pages, 5 figures, conferenc
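The idea of keeping a distribution over camera pose and refreshing it through Monte Carlo sampling can be sketched with a toy importance-sampling loop. This is not the module from the paper. It assumes a single unknown yaw angle, a Gaussian pose distribution, known 2D landmarks, and noise-free bearing observations, all of which are simplifications introduced here for illustration. It also leaves out the gradient flow and end-to-end training.

```python
import math
import random


def wrap(angle):
    """Wrap an angle to (-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def bearings(yaw, landmarks):
    """Bearing of each landmark as seen by a camera at the origin with the given yaw."""
    return [wrap(math.atan2(y, x) - yaw) for x, y in landmarks]


def update_pose_distribution(mean, std, observed, landmarks, rng,
                             n_samples=500, sigma=0.05):
    """One Monte Carlo update: sample poses, weight by likelihood, refit a Gaussian."""
    samples = [rng.gauss(mean, std) for _ in range(n_samples)]
    weights = []
    for yaw in samples:
        predicted = bearings(yaw, landmarks)
        sq_err = sum(wrap(p - o) ** 2 for p, o in zip(predicted, observed))
        weights.append(math.exp(-sq_err / (2 * sigma ** 2)))
    total = sum(weights)
    if total == 0.0:
        return mean, std  # no sample explains the data; keep the old distribution
    new_mean = sum(w * s for w, s in zip(weights, samples)) / total
    new_var = sum(w * (s - new_mean) ** 2 for w, s in zip(weights, samples)) / total
    # Floor the spread so later iterations can still explore.
    return new_mean, max(math.sqrt(new_var), 1e-3)


# Hypothetical data: three landmarks and a true yaw of 0.3 rad.
rng = random.Random(0)
landmarks = [(5.0, 1.0), (4.0, -2.0), (6.0, 3.0)]
true_yaw = 0.3
observed = bearings(true_yaw, landmarks)

mean, std = 0.0, 0.5  # broad initial belief about the camera yaw
for _ in range(3):
    mean, std = update_pose_distribution(mean, std, observed, landmarks, rng)
```

Reusing the same observations in every iteration counts the evidence more than once, so the shrinking spread here should be read as a convergence heuristic rather than a calibrated posterior.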
SmartMocap: Joint Estimation of Human and Camera Motion using Uncalibrated RGB Cameras
Markerless human motion capture (mocap) from multiple RGB cameras is a widely
studied problem. Existing methods either need calibrated cameras or calibrate
them relative to a static camera, which acts as the reference frame for the
mocap system. The calibration step has to be done a priori for every capture
session, which is a tedious process, and re-calibration is required whenever
cameras are intentionally or accidentally moved. In this paper, we propose a
mocap method which uses multiple static and moving extrinsically uncalibrated
RGB cameras. The key components of our method are as follows. First, since the
cameras and the subject can move freely, we select the ground plane as a common
reference to represent both the body and the camera motions, unlike existing
methods, which represent bodies in the camera coordinate frame. Second, we learn a
probability distribution of short human motion sequences (1 sec) relative
to the ground plane and leverage it to disambiguate between the camera and
human motion. Third, we use this distribution as a motion prior in a novel
multi-stage optimization approach to fit the SMPL human body model and the
camera poses to the human body keypoints on the images. Finally, we show that
our method can work on a variety of datasets ranging from aerial cameras to
smartphones. It also gives more accurate results compared to the
state-of-the-art on the task of monocular human mocap with a static camera. Our
code is available for research purposes on
https://github.com/robot-perception-group/SmartMocap
Camera Network Calibration and Synchronization from Silhouettes in Archived Video
In this paper we present an automatic method for calibrating a network of cameras that works by analyzing only the motion of silhouettes in the multiple video streams. This is particularly useful for automatic reconstruction of a dynamic event using a camera network in a situation where precalibration of the cameras is impractical or even impossible. The key contribution of this work is a RANSAC-based algorithm that simultaneously computes the epipolar geometry and synchronization of a pair of cameras only from the motion of silhouettes in video. Our approach involves first independently computing the fundamental matrix and synchronization for multiple pairs of cameras in the network. In the next stage the calibration and synchronization for the complete network are recovered from the pairwise information. Finally, a visual-hull algorithm is used to reconstruct the shape of the dynamic object from its silhouettes in video. For unsynchronized video streams with sub-frame temporal offsets, we interpolate silhouettes between successive frames to get more accurate visual hulls. We show the effectiveness of our method by remotely calibrating several different indoor camera networks from archived video streams.
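The sub-frame interpolation step can be pictured with a minimal sketch that blends two silhouette contours linearly. The abstract does not say how the interpolation is performed, so this is only one plausible reading. It assumes the two contours are already sampled as point lists in one-to-one correspondence, and the function name and the `alpha` parameter are hypothetical.

```python
def interpolate_contour(contour_a, contour_b, alpha):
    """Linearly blend two corresponding contours.

    alpha is the sub-frame offset in [0, 1]: 0 returns contour_a (frame k),
    1 returns contour_b (frame k + 1).
    """
    if len(contour_a) != len(contour_b):
        raise ValueError("contours must have the same number of points")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [
        ((1.0 - alpha) * xa + alpha * xb, (1.0 - alpha) * ya + alpha * yb)
        for (xa, ya), (xb, yb) in zip(contour_a, contour_b)
    ]


# Hypothetical contours from two successive frames, and a camera that lags
# the reference stream by a quarter of a frame.
frame_k = [(0.0, 0.0), (4.0, 0.0)]
frame_k1 = [(4.0, 4.0), (8.0, 4.0)]
shifted = interpolate_contour(frame_k, frame_k1, 0.25)
```

Linear blending is only reasonable when the silhouette moves little between frames. Fast or articulated motion would need a correspondence step before any blending.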
Raum-Zeit Interpolationstechniken (Space-Time Interpolation Techniques)
The photo-realistic modeling and animation of complex scenes in 3D requires a lot of work and skill of artists even with modern acquisition techniques. This is especially true if the rendering should additionally be performed in real-time. In this thesis we follow another direction in computer graphics to generate photo-realistic results based on recorded video sequences of one or multiple cameras. We propose several methods to handle scenes showing natural phenomena and also multi-view footage of general complex 3D scenes. In contrast to other approaches, we make use of relaxed geometric constraints and focus especially on image properties important to create perceptually plausible in-between images. The results are novel photo-realistic video sequences rendered in real-time, allowing for interactive manipulation or interactive exploration of novel viewpoints and time points.