363 research outputs found

    On the Two-View Geometry of Unsynchronized Cameras

    We present new methods for simultaneously estimating camera geometry and time shift from video sequences captured by multiple unsynchronized cameras. We develop algorithms for the simultaneous computation of a fundamental matrix or a homography together with an unknown time shift between the images. Our methods use minimal correspondence sets (eight for the fundamental matrix and four and a half for the homography) and are therefore suitable for robust estimation using RANSAC. Furthermore, we present an iterative algorithm that extends the applicability to sequences that are significantly unsynchronized, recovering the correct time shift even when it amounts to several seconds. We evaluated the methods on synthetic data and a wide range of real-world datasets, and the results show broad applicability to the problem of camera synchronization. Comment: 12 pages, 9 figures, Computer Vision and Pattern Recognition (CVPR) 201
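
    The paper's own contribution is a set of minimal solvers that estimate the time shift jointly with the epipolar geometry. As a rough, hedged sketch of the overall idea only, the Python snippet below grid-searches integer frame shifts and runs a standard RANSAC fundamental-matrix fit at each candidate shift, keeping the shift with the most inliers. The inputs traj1 and traj2 are hypothetical per-frame tracks of matched features, and the function name and parameters are illustrative, not from the paper.

# Simplified stand-in for joint time-shift / epipolar-geometry estimation:
# grid search over integer shifts + standard 8-point RANSAC at each shift.
import numpy as np
import cv2

def estimate_F_and_shift(traj1, traj2, max_shift=50, ransac_thresh=1.0):
    best = (None, 0, -1)                      # (F, shift, inlier count)
    for shift in range(-max_shift, max_shift + 1):
        # Pair frame t of video 1 with frame t + shift of video 2.
        if shift >= 0:
            p1, p2 = traj1[:len(traj1) - shift], traj2[shift:]
        else:
            p1, p2 = traj1[-shift:], traj2[:len(traj2) + shift]
        n = min(len(p1), len(p2))
        if n < 8:
            continue
        p1, p2 = np.float32(p1[:n]), np.float32(p2[:n])
        F, mask = cv2.findFundamentalMat(p1, p2, cv2.FM_RANSAC, ransac_thresh)
        if F is not None and mask is not None and int(mask.sum()) > best[2]:
            best = (F, shift, int(mask.sum()))
    return best                                # F, integer time shift, inliers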

    Automatic alignment of surgical videos using kinematic data

    Over the past one hundred years, the classic teaching methodology of "see one, do one, teach one" has governed surgical education systems worldwide. With the advent of Operating Room 2.0, recording video, kinematic, and many other types of data during surgery has become an easy task, allowing artificial intelligence systems to be deployed and used in surgical and medical practice. Recently, surgical videos have been shown to provide a structure for peer coaching, enabling novice trainees to learn from experienced surgeons by replaying those videos. However, the high inter-operator variability in surgical gesture duration and execution renders learning by comparing novice and expert surgical videos a very difficult task. In this paper, we propose a novel technique to align multiple videos based on the alignment of their corresponding kinematic multivariate time series data. By leveraging the Dynamic Time Warping measure, our algorithm synchronizes a set of videos so that the same gesture is shown being performed at different speeds. We believe that the proposed approach is a valuable addition to the existing learning tools for surgery. Comment: Accepted at AIME 201
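
    The alignment rests on Dynamic Time Warping of the kinematic signals. Below is a minimal, hedged sketch of that core step, assuming two hypothetical (T, D) kinematic arrays x and y recorded at the same rate as their videos; it computes a classic DTW warping path whose (i, j) pairs can then be used to play frame i of one video alongside frame j of the other. This illustrates the measure the paper leverages, not the authors' implementation.

import numpy as np

def dtw_path(x, y):
    """Classic O(T1*T2) DTW on Euclidean frame-to-frame distances."""
    d = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)
    T1, T2 = d.shape
    acc = np.full((T1 + 1, T2 + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, T1 + 1):
        for j in range(1, T2 + 1):
            acc[i, j] = d[i - 1, j - 1] + min(acc[i - 1, j],
                                              acc[i, j - 1],
                                              acc[i - 1, j - 1])
    # Backtrack the optimal warping path.
    path, i, j = [], T1, T2
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

# Each (i, j) pair says: show frame i of video 1 together with frame j of
# video 2, so the same gesture appears at the same point of the playback.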

    Circulant temporal encoding for video retrieval and temporal alignment

    We address the problem of specific video event retrieval. Given a query video of a specific event, e.g., a concert of Madonna, the goal is to retrieve other videos of the same event that temporally overlap with the query. Our approach encodes the frame descriptors of a video to jointly represent their appearance and temporal order. It exploits the properties of circulant matrices to compare videos efficiently in the frequency domain. This yields a significant reduction in computational complexity and accurately localizes the matching parts of the videos. The descriptors can be compressed in the frequency domain with a product quantizer adapted to complex numbers; in this case, video retrieval is performed without decompressing the descriptors. We also consider the temporal alignment of a set of videos. We exploit the matching confidence and the estimate of the temporal offset computed for all pairs of videos by our retrieval approach. Our robust algorithm aligns the videos on a global timeline by maximizing the set of temporally consistent matches. The global temporal alignment enables synchronous playback of the videos of a given scene.
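
    The central trick is that comparing a query against all circular temporal shifts of another video can be done with element-wise products in the Fourier domain. The sketch below illustrates that idea only, under simplifying assumptions (raw, zero-padded frame descriptors rather than the paper's encoded and quantized representations); a and b are hypothetical (T, D) arrays of L2-normalized frame descriptors padded to a common length T.

import numpy as np

def temporal_match(a, b):
    T = a.shape[0]
    # Per-dimension circular correlation: IFFT( conj(FFT(a)) * FFT(b) ),
    # summed over descriptor dimensions -> one score per candidate shift.
    A = np.fft.fft(a, axis=0)
    B = np.fft.fft(b, axis=0)
    corr = np.fft.ifft(np.conj(A) * B, axis=0).real.sum(axis=1)
    shift = int(np.argmax(corr))
    if shift > T // 2:          # interpret large shifts as negative offsets
        shift -= T
    return corr.max(), shift    # matching confidence, temporal offset (frames)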

    Reconstruction of the pose of uncalibrated cameras via user-generated videos

    Extraction of 3D geometry from hand-held, unsteady, uncalibrated cameras faces multiple difficulties: finding usable frames, feature matching, and unknown, variable focal length, to name three. We have built a prototype system that allows a user to spatially navigate playback viewpoints of an event of interest, using geometry automatically recovered from casually captured videos. The system, whose workings we present in this paper, estimates not only scene geometry but also relative viewpoint positions, overcoming the aforementioned difficulties in the process. The only inputs required are video sequences of a common scene from various viewpoints, as are readily available online from sporting and music events. Our methods make no assumption about the synchronization of the input and do not require file metadata; instead they exploit the video itself to self-calibrate. The footage need only contain some camera rotation with little translation, a likely occurrence for hand-held event footage. This is the author accepted manuscript. The final version is available from IEEE via http://dx.doi.org/10.1145/2659021.265902
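
    One standard way to self-calibrate from footage with rotation and little translation, in the spirit of what the abstract describes, exploits the fact that an inter-frame homography of a purely rotating camera has the form H ~ K R K^-1. The hedged sketch below searches for the focal length that makes K^-1 H K closest to a rotation; H, the principal point, and the candidate range are hypothetical inputs, and this is an illustrative technique rather than the paper's actual pipeline.

import numpy as np

def focal_from_rotation_homography(H, cx, cy, f_candidates):
    best_f, best_err = None, np.inf
    for f in f_candidates:
        K = np.array([[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]])
        R = np.linalg.inv(K) @ H @ K                  # should be ~ a rotation
        R /= np.cbrt(np.linalg.det(R))                # remove the unknown scale of H
        err = np.linalg.norm(R @ R.T - np.eye(3))     # deviation from orthogonality
        if err < best_err:
            best_f, best_err = f, err
    return best_f, best_err

# e.g. f, err = focal_from_rotation_homography(H, w / 2, h / 2,
#                                              np.linspace(300, 3000, 200))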

    Flight Dynamics-based Recovery of a UAV Trajectory using Ground Cameras

    We propose a new method to estimate the 6-DoF trajectory of a flying object, such as a quadrotor UAV, within a 3D airspace monitored by multiple fixed ground cameras. It is based on a new structure-from-motion formulation for the 3D reconstruction of a single moving point with known motion dynamics. Our main contribution is a new bundle adjustment procedure which, in addition to optimizing the camera poses, regularizes the point trajectory using a prior based on motion dynamics (specifically, flight dynamics). Furthermore, we can infer the underlying control input sent to the UAV's autopilot that determined its flight trajectory. Our method requires neither perfect single-view tracking nor appearance matching across views. For robustness, we allow the tracker to generate multiple detections per frame in each video. The true detections and the data association across videos are estimated using robust multi-view triangulation and subsequently refined during our bundle adjustment procedure. Quantitative evaluation on simulated data and experiments on real videos from indoor and outdoor scenes demonstrate the effectiveness of our method.
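
    To make the regularized bundle adjustment idea concrete, here is a hedged, simplified sketch: it refines only the 3D trajectory (camera poses are held fixed, unlike in the paper) by minimizing reprojection error plus a discrete constant-acceleration penalty standing in for the flight-dynamics prior. P_list, obs, and the weight are hypothetical inputs and names, not the paper's.

import numpy as np
from scipy.optimize import least_squares

def refine_trajectory(X0, P_list, obs, smooth_weight=1.0):
    """X0: (T, 3) initial trajectory, P_list: 3x4 projection matrices,
    obs: per-camera (T, 2) image detections."""
    T = X0.shape[0]

    def residuals(x):
        X = x.reshape(T, 3)
        Xh = np.hstack([X, np.ones((T, 1))])          # homogeneous 3D points
        res = []
        for P, uv in zip(P_list, obs):                # one camera at a time
            proj = (P @ Xh.T).T                       # (T, 3) projective coords
            res.append((proj[:, :2] / proj[:, 2:3] - uv).ravel())
        accel = X[2:] - 2 * X[1:-1] + X[:-2]          # discrete acceleration
        res.append(smooth_weight * accel.ravel())     # dynamics/smoothness prior
        return np.concatenate(res)

    sol = least_squares(residuals, X0.ravel())
    return sol.x.reshape(T, 3)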

    Marker-less motion capture in general scenes with sparse multi-camera setups

    Get PDF
    Human motion capture from videos is one of the fundamental problems in computer vision and computer graphics, with applications across a wide range of industries. Even with all the developments of the past years, industry and academia alike still rely on complex and expensive marker-based systems. Many state-of-the-art marker-less motion-capture methods come close to the performance of marker-based algorithms, but only when recording in highly controlled studio environments with precisely synchronized, static, and sufficiently many cameras. While this yields a simpler apparatus and shorter setup time than marker-based systems, the hurdles to practical application remain large and the costs considerable. By being constrained to a controlled studio, marker-less methods fail to fully play out their advantage of being able to capture scenes without actively modifying them. This thesis proposes several novel marker-less motion-capture algorithms that make the approach applicable to general outdoor scenes. The first is an optical multi-video synchronization method which achieves sub-frame accuracy in general scenes; in this step, the synchronization parameters of multiple videos are estimated. Then, we propose a spatio-temporal motion-capture method which uses these synchronization parameters for accurate motion capture with unsynchronized cameras. Afterwards, we propose a motion-capture method that works with potentially moving cameras and tracks multiple people even in front of cluttered and dynamic backgrounds. Finally, we reduce the number of cameras employed by proposing a novel motion-capture method which uses as few as two cameras to capture high-quality motion in general environments, even outdoors. The methods proposed in this thesis can be adopted in many practical applications to achieve performance similar to that of complex motion-capture studios with only a few consumer-grade cameras, such as mobile phones or GoPros, even for uncontrolled outdoor scenes.
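
    As a hedged illustration of sub-frame synchronization in general (the thesis uses an optical, feature-based formulation, not necessarily this one): cross-correlate a per-frame 1-D signal derived from each video, e.g. mean frame-difference magnitude, and refine the integer correlation peak with a parabolic fit to obtain a sub-frame offset. sig1 and sig2 below are hypothetical signals of this kind.

import numpy as np

def subframe_offset(sig1, sig2):
    a = (sig1 - sig1.mean()) / (sig1.std() + 1e-9)    # zero-mean, unit-variance
    b = (sig2 - sig2.mean()) / (sig2.std() + 1e-9)
    corr = np.correlate(a, b, mode="full")            # scores for all integer lags
    k = int(np.argmax(corr))
    # Parabolic interpolation around the integer peak -> sub-frame accuracy.
    if 0 < k < len(corr) - 1:
        c_m, c_0, c_p = corr[k - 1], corr[k], corr[k + 1]
        k = k + 0.5 * (c_m - c_p) / (c_m - 2 * c_0 + c_p)
    # Positive result: events appear that many frames later in sig1 than in sig2.
    return k - (len(sig2) - 1)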