
    Calibration by correlation using metric embedding from non-metric similarities

    This paper presents a new intrinsic calibration method that allows us to calibrate a generic single-viewpoint camera just by waving it around. From the video sequence obtained while the camera undergoes random motion, we compute the pairwise time correlation of the luminance signal for a subset of the pixels. We show that, if the camera undergoes a random uniform motion, then the pairwise correlation of any pixel pair is a function of the distance between the pixel directions on the visual sphere. This leads to formalizing calibration as a problem of metric embedding from non-metric measurements: we want to find the disposition of pixels on the visual sphere from similarities that are an unknown function of the distances. This problem is a generalization of multidimensional scaling (MDS) that has so far resisted a comprehensive observability analysis (can we reconstruct a metrically accurate embedding?) and a solid generic solution (how to do so?). We show that the observability depends both on the local geometric properties (curvature) and on the global topological properties (connectedness) of the target manifold. We show that, in contrast to the Euclidean case, on the sphere we can recover the scale of the point distribution, therefore obtaining a metrically accurate solution from non-metric measurements. We describe an algorithm that is robust across manifolds and can recover a metrically accurate solution when the metric information is observable. We demonstrate the performance of the algorithm for several cameras (pin-hole, fish-eye, omnidirectional), and we obtain results comparable to calibration using classical methods. Additional synthetic benchmarks show that the algorithm performs as theoretically predicted for all corner cases of the observability analysis
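
    The first step described above, computing the pairwise time correlation of the luminance signal for a subset of pixels, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `pairwise_correlation`, the `(T, H, W)` array layout, and the use of the Pearson correlation are all assumptions made for the example. The embedding step (recovering pixel directions on the sphere from these similarities) is not shown.

```python
import numpy as np

def pairwise_correlation(video, pixel_idx):
    """Pearson correlation over time between selected pixels.

    video     : array of shape (T, H, W), luminance per frame (assumed layout)
    pixel_idx : list of (row, col) pairs selecting N pixels
    returns   : (N, N) symmetric correlation matrix
    """
    rows = np.array([r for r, _ in pixel_idx])
    cols = np.array([c for _, c in pixel_idx])
    # Each column of `signals` is the time series of one selected pixel.
    signals = np.asarray(video, dtype=float)[:, rows, cols]  # shape (T, N)
    # rowvar=False: treat columns as variables, rows as observations.
    return np.corrcoef(signals, rowvar=False)
```

    The resulting matrix only carries similarities. According to the abstract, these are an unknown function of the angular distance between pixel directions, so they cannot be read as distances directly.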

    Automatic camera selection for activity monitoring in a multi-camera system for tennis

    In professional tennis training matches, the coach needs to be able to view play from the most appropriate angle in order to monitor players' activities. In this paper, we describe and evaluate a system for automatic camera selection from a network of synchronised cameras within a tennis sporting arena. This work combines synchronised video streams from multiple cameras into a single summary video suitable for critical review by both tennis players and coaches. Using an overhead camera view, our system automatically determines the 2D tennis-court calibration resulting in a mapping that relates a player's position in the overhead camera to their position and size in another camera view in the network. This allows the system to determine the appearance of a player in each of the other cameras and thereby choose the best view for each player via a novel technique. The video summaries are evaluated in end-user studies and shown to provide an efficient means of multi-stream visualisation for tennis player activity monitoring

    A group-theoretic approach to formalizing bootstrapping problems

    The bootstrapping problem consists in designing agents that learn a model of themselves and the world, and utilize it to achieve useful tasks. It is different from other learning problems as the agent starts with uninterpreted observations and commands, and with minimal prior information about the world. In this paper, we give a mathematical formalization of this aspect of the problem. We argue that the vague constraint of having "no prior information" can be recast as a precise algebraic condition on the agent: that its behavior is invariant to particular classes of nuisances on the world, which we show can be well represented by actions of groups (diffeomorphisms, permutations, linear transformations) on observations and commands. We then introduce the class of bilinear gradient dynamics sensors (BGDS) as a candidate for learning generic robotic sensorimotor cascades. We show how framing the problem as rejection of group nuisances allows a compact and modular analysis of typical preprocessing stages, such as learning the topology of the sensors. We demonstrate learning and using such models on real-world range-finder and camera data from publicly available datasets

    Learned Multi-Patch Similarity

    Estimating a depth map from multiple views of a scene is a fundamental task in computer vision. As soon as more than two viewpoints are available, one faces the very basic question of how to measure similarity across more than two image patches. Surprisingly, no direct solution exists; instead, it is common to fall back on more or less robust averaging of two-view similarities. Encouraged by the success of machine learning, and in particular convolutional neural networks, we propose to learn a matching function which directly maps multiple image patches to a scalar similarity score. Experiments on several multi-view datasets demonstrate that this approach has advantages over methods based on pairwise patch similarity. Comment: 10 pages, 7 figures, Accepted at ICCV 201

    Per-Pixel Calibration for RGB-Depth Natural 3D Reconstruction on GPU

    Ever since the Kinect brought low-cost depth cameras into the consumer market, there has been great interest in Red-Green-Blue-Depth (RGBD) sensors. Without calibration, an RGBD camera’s horizontal and vertical field of view (FoV) can be used to generate a 3D reconstruction in camera space naturally on a graphics processing unit (GPU); this reconstruction, however, is badly deformed by lens distortions and imperfect depth resolution (depth distortion). Calibration based on a pinhole-camera model and a high-order distortion removal model requires a lot of calculation in the fragment shader. To remove both the lens distortion and the depth distortion while keeping the calculations in the GPU fragment shader simple, a novel per-pixel calibration method with look-up-table-based real-time 3D reconstruction is proposed, using a rail calibration system. This rail calibration system makes it possible to collect an unlimited number of densely distributed calibration points that cover all pixels in a sensor, so that not only lens distortions but also depth distortion can be handled, by a per-pixel D to ZW mapping. Instead of the traditional pinhole camera model, two polynomial mapping models are employed. One is a two-dimensional high-order polynomial mapping from R/C to XW/YW respectively, which handles lens distortions; the other is a per-pixel linear mapping from D to ZW, which handles depth distortion. With only six parameters and three linear equations in the fragment shader, the undistorted 3D world coordinates (XW, YW, ZW) for every single pixel can be generated in real time. The per-pixel calibration method can be applied universally to any RGBD camera. With the alignment of RGB values using a pinhole camera matrix, it can even work on a combination of an arbitrary depth sensor and an arbitrary RGB sensor
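
    The per-pixel linear mapping from D to ZW mentioned above can be illustrated with an ordinary least-squares fit done independently for each pixel. This is only a sketch under stated assumptions, not the method from the paper: the function name `fit_per_pixel_depth_map`, the `(K, H, W)` data layout, and the idea that every pixel sees a flat target at K known rail distances are hypothetical choices made for the example. The R/C to XW/YW polynomial mapping is not covered.

```python
import numpy as np

def fit_per_pixel_depth_map(D, Z):
    """Fit Z ~ a * D + b separately for every pixel by least squares.

    D : array (K, H, W), raw depth readings at K rail positions (assumed layout)
    Z : array (K,), known world depth of the target at each rail position
    returns (a, b), each of shape (H, W)
    Note: a pixel whose readings do not vary across positions has zero
    variance and would produce a division by zero; that case is not handled.
    """
    D = np.asarray(D, dtype=float)
    Z = np.asarray(Z, dtype=float)[:, None, None]   # broadcast over pixels
    D_mean = D.mean(axis=0)                         # (H, W)
    Z_mean = Z.mean(axis=0)                         # (1, 1)
    cov = ((D - D_mean) * (Z - Z_mean)).sum(axis=0)
    var = ((D - D_mean) ** 2).sum(axis=0)
    a = cov / var                                   # per-pixel slope
    b = Z_mean - a * D_mean                         # per-pixel intercept
    return a, b

def apply_depth_map(a, b, D_frame):
    """Convert one raw depth frame (H, W) to world depth with the fitted maps."""
    return a * np.asarray(D_frame, dtype=float) + b
```

    Storing `a` and `b` as two look-up tables means that converting a frame costs one multiply and one add per pixel, which is in the spirit of the simple fragment-shader computation the abstract describes.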

    Real-time refocusing using an FPGA-based standard plenoptic camera

    Plenoptic cameras are receiving increased attention in scientific and commercial applications because they capture the entire structure of light in a scene, enabling optical transforms (such as focusing) to be applied computationally after the fact, rather than once and for all at the time a picture is taken. In many settings, real-time interactive performance is also desired, which in turn requires significant computational power due to the large amount of data required to represent a plenoptic image. Although GPUs have been shown to provide acceptable performance for real-time plenoptic rendering, their cost and power requirements make them prohibitive for embedded uses (such as in-camera). On the other hand, the computation to accomplish plenoptic rendering is well structured, suggesting the use of specialized hardware. Accordingly, this paper presents an array of switch-driven finite impulse response filters, implemented on an FPGA to accomplish high-throughput spatial-domain rendering. The proposed architecture provides a power-efficient rendering hardware design suitable for full-video applications as required in broadcasting or cinematography. A benchmark assessment of the proposed hardware implementation shows that real-time performance can readily be achieved, with a performance improvement of one order of magnitude over a GPU implementation and three orders of magnitude over a general-purpose CPU implementation

    FVV Live: A real-time free-viewpoint video system with consumer electronics hardware

    FVV Live is a novel end-to-end free-viewpoint video system, designed for low cost and real-time operation, based on off-the-shelf components. The system has been designed to yield high-quality free-viewpoint video using consumer-grade cameras and hardware, which enables low deployment costs and easy installation for immersive event-broadcasting or videoconferencing. The paper describes the architecture of the system, including acquisition and encoding of multiview plus depth data in several capture servers and virtual view synthesis on an edge server. All the blocks of the system have been designed to overcome the limitations imposed by hardware and network, which impact directly on the accuracy of depth data and thus on the quality of virtual view synthesis. The design of FVV Live allows for an arbitrary number of cameras and capture servers, and the results presented in this paper correspond to an implementation with nine stereo-based depth cameras. FVV Live presents low motion-to-photon and end-to-end delays, which enables seamless free-viewpoint navigation and bilateral immersive communications. Moreover, the visual quality of FVV Live has been assessed through subjective assessment with satisfactory results, and additional comparative tests show that it is preferred over state-of-the-art DIBR alternatives

    Temporally coherent 4D reconstruction of complex dynamic scenes

    This paper presents an approach for reconstruction of 4D temporally coherent models of complex dynamic scenes. No prior knowledge is required of scene structure or camera calibration, allowing reconstruction from multiple moving cameras. Sparse-to-dense temporal correspondence is integrated with joint multi-view segmentation and reconstruction to obtain a complete 4D representation of static and dynamic objects. Temporal coherence is exploited to overcome visual ambiguities, resulting in improved reconstruction of complex scenes. Robust joint segmentation and reconstruction of dynamic objects is achieved by introducing a geodesic star convexity constraint. Comparative evaluation is performed on a variety of unstructured indoor and outdoor dynamic scenes with hand-held cameras and multiple people. This demonstrates reconstruction of complete temporally coherent 4D scene models with improved nonrigid object segmentation and shape reconstruction. Comment: To appear in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2016. Video available at: https://www.youtube.com/watch?v=bm_P13_-Ds

    General Dynamic Scene Reconstruction from Multiple View Video

    This paper introduces a general approach to dynamic scene reconstruction from multiple moving cameras without prior knowledge or limiting constraints on the scene structure, appearance, or illumination. Existing techniques for dynamic scene reconstruction from multiple wide-baseline camera views primarily focus on accurate reconstruction in controlled environments, where the cameras are fixed and calibrated and background is known. These approaches are not robust for general dynamic scenes captured with sparse moving cameras. Previous approaches for outdoor dynamic scene reconstruction assume prior knowledge of the static background appearance and structure. The primary contributions of this paper are twofold: an automatic method for initial coarse dynamic scene segmentation and reconstruction without prior knowledge of background appearance or structure; and a general robust approach for joint segmentation refinement and dense reconstruction of dynamic scenes from multiple wide-baseline static or moving cameras. Evaluation is performed on a variety of indoor and outdoor scenes with cluttered backgrounds and multiple dynamic non-rigid objects such as people. Comparison with state-of-the-art approaches demonstrates improved accuracy in both multiple view segmentation and dense reconstruction. The proposed approach also eliminates the requirement for prior knowledge of scene structure and appearance

    Design and construction of a configurable full-field range imaging system for mobile robotic applications

    Mobile robotic devices rely critically on extrospection sensors to determine the range to objects in the robot’s operating environment. This provides the robot with the ability both to navigate safely around obstacles and to map its environment and hence facilitate path planning and navigation. There is a requirement for a full-field range imaging system that can determine the range to any obstacle in a camera lens’ field of view accurately and in real-time. This paper details the development of a portable full-field ranging system whose bench-top version has demonstrated sub-millimetre precision. However, this precision required non-real-time acquisition rates and expensive hardware. By iterative replacement of components, a portable, modular and inexpensive version of this full-field ranger has been constructed, capable of real-time operation with some (user-defined) trade-off with precision