On the Two-View Geometry of Unsynchronized Cameras
We present new methods for simultaneously estimating camera geometry and time
shift from video sequences from multiple unsynchronized cameras. Algorithms for
simultaneous computation of a fundamental matrix or a homography with unknown
time shift between images are developed. Our methods use minimal correspondence
sets (eight for fundamental matrix and four and a half for homography) and
therefore are suitable for robust estimation using RANSAC. Furthermore, we
present an iterative algorithm that extends the applicability to sequences
which are significantly unsynchronized, finding the correct time shift up to
several seconds. We evaluated the methods on synthetic data and on a wide range
of real-world datasets; the results show broad applicability to the problem of
camera synchronization.
Comment: 12 pages, 9 figures, Computer Vision and Pattern Recognition (CVPR) 201
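The minimal solvers above (eight correspondences for a fundamental matrix with time shift, four and a half for a homography) are designed to plug into a standard RANSAC loop. As a hedged illustration of that robust-estimation pattern only, here is a generic RANSAC sketch with two-point line fitting as a stand-in minimal solver; the function names are hypothetical and this is not the paper's solver:

```python
import random

def fit_line(p, q):
    # Minimal two-point model: y = a*x + b (returns None if degenerate/vertical).
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        return None
    a = (y2 - y1) / (x2 - x1)
    return a, y1 - a * x1

def ransac_line(points, iters=100, thresh=0.1, rng=None):
    """Generic RANSAC: sample a minimal set, fit, count inliers, keep the best."""
    rng = rng or random.Random(0)
    best, best_inliers = None, []
    for _ in range(iters):
        model = fit_line(*rng.sample(points, 2))
        if model is None:
            continue
        a, b = model
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) < thresh]
        if len(inliers) > len(best_inliers):
            best, best_inliers = model, inliers
    return best, best_inliers

# Hypothetical demo: 8 exact inliers on y = 2x + 1 plus two gross outliers.
points = [(float(x), 2.0 * x + 1.0) for x in range(8)] + [(0.0, 10.0), (5.0, -3.0)]
model, inliers = ransac_line(points)
```

The smaller the minimal set, the fewer iterations RANSAC needs for a given outlier ratio, which is why minimal solvers matter for this problem.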
Automatic alignment of surgical videos using kinematic data
Over the past one hundred years, the classic teaching methodology of "see
one, do one, teach one" has governed the surgical education systems worldwide.
With the advent of Operation Room 2.0, recording video, kinematic and many
other types of data during the surgery became an easy task, thus allowing
artificial intelligence systems to be deployed and used in surgical and medical
practice. Recently, surgical videos have been shown to provide a structure for
peer coaching enabling novice trainees to learn from experienced surgeons by
replaying those videos. However, the high inter-operator variability in
surgical gesture duration and execution renders learning from comparing novice
to expert surgical videos a very difficult task. In this paper, we propose a
novel technique to align multiple videos based on the alignment of their
corresponding kinematic multivariate time series data. By leveraging the
Dynamic Time Warping measure, our algorithm synchronizes a set of videos in
order to show the same gesture being performed at different speeds. We believe
that the proposed approach is a valuable addition to the existing learning
tools for surgery.
Comment: Accepted at AIME 201
Circulant temporal encoding for video retrieval and temporal alignment
We address the problem of specific video event retrieval. Given a query video
of a specific event, e.g., a concert of Madonna, the goal is to retrieve other
videos of the same event that temporally overlap with the query. Our approach
encodes the frame descriptors of a video to jointly represent their appearance
and temporal order. It exploits the properties of circulant matrices to
efficiently compare the videos in the frequency domain. This offers a
significant gain in complexity and accurately localizes the matching parts of
videos. The descriptors can be compressed in the frequency domain with a
product quantizer adapted to complex numbers. In this case, video retrieval is
performed without decompressing the descriptors. We also consider the temporal
alignment of a set of videos. We exploit the matching confidence and an
estimate of the temporal offset computed for all pairs of videos by our
retrieval approach. Our robust algorithm aligns the videos on a global timeline
by maximizing the set of temporally consistent matches. The global temporal
alignment enables synchronous playback of the videos of a given scene.
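The core trick is that comparing a query against all time shifts of another video is a correlation, which the circulant structure lets the method evaluate in the frequency domain in O(n log n) for all shifts at once. For intuition only, here is a naive time-domain version of the pairwise offset estimation on scalar per-frame signatures (names are hypothetical; the actual method operates on compressed frame descriptors via the FFT):

```python
def estimate_offset(a, b, max_shift):
    """Return the integer shift s maximizing the correlation sum a[i] * b[i + s].

    A positive result means b lags a by that many frames. This brute-force
    loop is O(n * max_shift); the circulant/FFT formulation gets all shifts
    in O(n log n).
    """
    best_shift, best_score = 0, float("-inf")
    for s in range(-max_shift, max_shift + 1):
        score = 0.0
        for i, x in enumerate(a):
            j = i + s
            if 0 <= j < len(b):
                score += x * b[j]
        if score > best_score:
            best_shift, best_score = s, score
    return best_shift
```

Given such pairwise offsets and their confidences, placing all videos on one global timeline is then a consistency-maximization problem, as the abstract describes.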
Reconstruction of the pose of uncalibrated cameras via user-generated videos
Extraction of 3D geometry from hand-held, unsteady, uncalibrated cameras faces multiple difficulties: finding usable frames, feature matching, and unknown variable focal length, to name three. We have built a prototype system that allows a user to spatially navigate playback viewpoints of an event of interest, using geometry automatically recovered from casually captured videos. The system, whose workings we present in this paper, necessarily estimates not only scene geometry but also relative viewpoint position, overcoming the mentioned difficulties in the process. The only inputs required are video sequences from various viewpoints of a common scene, as are readily available online from sporting and music events. Our methods make no assumption of the synchronization of the input and do not require file metadata, instead exploiting the video to self-calibrate. The footage need only contain some camera rotation with little translation, a likely occurrence for hand-held event footage.
This is the author accepted manuscript. The final version is available from IEEE via http://dx.doi.org/10.1145/2659021.265902
Flight Dynamics-based Recovery of a UAV Trajectory using Ground Cameras
We propose a new method to estimate the 6-dof trajectory of a flying object
such as a quadrotor UAV within a 3D airspace monitored using multiple fixed
ground cameras. It is based on a new structure from motion formulation for the
3D reconstruction of a single moving point with known motion dynamics. Our main
contribution is a new bundle adjustment procedure which in addition to
optimizing the camera poses, regularizes the point trajectory using a prior
based on motion dynamics (or specifically flight dynamics). Furthermore, we can
infer the underlying control input sent to the UAV's autopilot that determined
its flight trajectory.
Our method requires neither perfect single-view tracking nor appearance
matching across views. For robustness, we allow the tracker to generate
multiple detections per frame in each video. The true detections and the data
association across videos are estimated using robust multi-view triangulation
and subsequently refined during our bundle adjustment procedure. Quantitative
evaluation on simulated data and experiments on real videos from indoor and
outdoor scenes demonstrate the effectiveness of our method.
Marker-less motion capture in general scenes with sparse multi-camera setups
Human motion-capture from videos is one of the fundamental problems in computer vision and computer graphics. Its applications can be found in a wide range of industries. Even with all the developments in the past years, industry and academia alike still rely on complex and expensive marker-based systems. Many state-of-the-art marker-less motion-capture methods come close to the performance of marker-based algorithms, but only when recording in highly controlled studio environments with exactly synchronized, static, and sufficiently many cameras. While, relative to marker-based systems, this yields a simpler apparatus with a reduced setup time, the hurdles towards practical application are still large and the costs are considerable. By being constrained to a controlled studio, marker-less methods fail to fully play out their advantage of being able to capture scenes without actively modifying them.
In the area of marker-less human motion-capture, this thesis proposes several novel algorithms that simplify motion capture so it becomes applicable in general outdoor scenes. The first is an optical multi-video synchronization method which achieves subframe accuracy in general scenes; in this step, the synchronization parameters of multiple videos are estimated. Then, we propose a spatio-temporal motion-capture method which uses the synchronization parameters for accurate motion capture with unsynchronized cameras. Afterwards, we propose a motion-capture method that works with moving cameras, where multiple people are tracked even in front of cluttered and dynamic backgrounds. Finally, we reduce the number of cameras employed by proposing a novel motion-capture method which uses as few as two cameras to capture high-quality motion in general environments, even outdoors.
The methods proposed in this thesis can be adopted in many practical applications to achieve similar performance as complex motion-capture studios with a few consumer-grade cameras, such as mobile phones or GoPros, even for uncontrolled outdoor scenes.