
    Calibration and Metrology Using Still and Video Images

    Metrology, the measurement of real-world metrics, has been investigated extensively in computer vision for many applications. The prevalence of video cameras and sequences has led to the demand for fully automated systems. Most of the existing video metrology methods are simple extensions of still-image algorithms, which have certain limitations, requiring constraints such as parallelism of lines. New techniques are needed in order to achieve accurate results for broader applications. An important preprocessing step, and a topic closely related to metrology, is calibration using planar patterns. Existing approaches lack flexibility and robustness when extended to video sequences. This dissertation advances the state of the art in calibration and video metrology in three directions: (1) the concept of partial rectification is proposed along with new calibration techniques using a circle with diverse types of constraints; (2) new calibration methods for video sequences using planar patterns undergoing planar motion are proposed; and (3) new algorithms to extend video metrology to a wide range of applications are presented. A fully automated system using the new technique has been built for measuring the wheelbases of vehicles.

    Circular motion geometry using minimal data


    Relating vanishing points to catadioptric camera calibration

    This paper presents the analysis and derivation of the geometric relation between vanishing points and the camera parameters of central catadioptric camera systems. These vanishing points correspond to the three mutually orthogonal directions of the 3D real-world coordinate system (i.e. the X, Y and Z axes). Compared to vanishing points (VPs) in the perspective projection, VPs under central catadioptric projection have two advantages: there are normally two vanishing points for each set of parallel lines, since lines are projected to conics in the catadioptric image plane, and the vanishing points are usually located inside the image frame. We show that knowledge of the VPs corresponding to the XYZ axes from a single image can lead to a simple derivation of both the intrinsic and extrinsic parameters of the central catadioptric system. The derived theory is demonstrated and tested on both synthetic and real data with respect to noise sensitivity.

    Fast and Interpretable 2D Homography Decomposition: Similarity-Kernel-Similarity and Affine-Core-Affine Transformations

    In this paper, we present two fast and interpretable decomposition methods for 2D homography, which are named Similarity-Kernel-Similarity (SKS) and Affine-Core-Affine (ACA) transformations respectively. Under the minimal 4-point configuration, the first and the last similarity transformations in SKS are computed by two anchor points on target and source planes, respectively. Then, the other two point correspondences can be exploited to compute the middle kernel transformation with only four parameters. Furthermore, ACA uses three anchor points to compute the first and the last affine transformations, followed by computation of the middle core transformation utilizing the other one point correspondence. ACA can compute a homography up to a scale with only 85 floating-point operations (FLOPs), without even any division operations. Therefore, as a plug-in module, ACA facilitates the traditional feature-based Random Sample Consensus (RANSAC) pipeline, as well as deep homography pipelines estimating 4-point offsets. In addition to the advantages of geometric parameterization and computational efficiency, SKS and ACA can express each element of homography by a polynomial of input coordinates (7th degree to 9th degree), extend the existing essential Similarity-Affine-Projective (SAP) decomposition and calculate 2D affine transformations in a unified way. Source code is released at https://github.com/cscvlab/SKS-Homography
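
    For context, the minimal 4-point problem mentioned in the abstract can be sketched with the conventional linear solve. This is not the SKS or ACA decomposition: it is the textbook baseline that fixes h33 = 1 and solves an 8x8 system, and unlike ACA it uses divisions. The function names (solve_linear, homography_from_4_points, apply_homography) are illustrative assumptions, not names from the authors' repository.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography_from_4_points(src, dst):
    """Return a 3x3 H (with h33 = 1) such that dst ~ H * src.

    Assumes exactly four correspondences, no three points collinear,
    and a true homography whose h33 entry is non-zero.
    """
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h1 x + h2 y + h3) / (h7 x + h8 y + 1), rearranged to be linear
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        # v = (h4 x + h5 y + h6) / (h7 x + h8 y + 1), rearranged to be linear
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_homography(h, point):
    """Map a 2D point through H and dehomogenize."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

    Calling homography_from_4_points with the corners of a unit square as src and the corners of any convex quadrilateral as dst returns a matrix that apply_homography uses to map each source corner onto its target corner, up to floating-point error.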

    Euclidean Structure from N>=2 Parallel Circles: Theory and Algorithms

    Our problem is that of recovering, in one view, the 2D Euclidean structure induced by the projections of N parallel circles. This structure is a prerequisite for camera calibration and pose computation. Until now, no general method has been described for N > 2. The main contribution of this work is to state the problem in terms of a system of linear equations to solve. We give a closed-form solution as well as bundle adjustment-like refinements, increasing the technical applicability and numerical stability. Our theoretical approach generalizes and extends all those described in existing works for N = 2 in several respects, as we can treat simultaneously pairs of orthogonal lines and pairs of circles within a unified framework. The proposed algorithm may be easily implemented, using well-known numerical algorithms. Its performance is illustrated by simulations and experiments with real images.

    Towards A Self-calibrating Video Camera Network For Content Analysis And Forensics

    Due to growing security concerns, video surveillance and monitoring have received immense attention from both federal agencies and private firms. The main concern is that a single camera, even if allowed to rotate or translate, is not sufficient to cover a large area for video surveillance. A more general solution with a wide range of applications is to allow the deployed cameras to have non-overlapping fields of view (FoV) and, if possible, to allow these cameras to move freely in 3D space. This thesis addresses the issue of how cameras in such a network can be calibrated and how the network as a whole can be calibrated, such that each camera as a unit in the network is aware of its orientation with respect to all the other cameras in the network. Different types of cameras might be present in a multiple-camera network, and novel techniques are presented for efficient calibration of these cameras. Specifically: (i) for a stationary camera, we derive new constraints on the Image of the Absolute Conic (IAC), and these constraints are shown to be intrinsic to the IAC; (ii) for a scene where object shadows are cast on a ground plane, we track the shadows cast on the ground plane by at least two unknown stationary points, and use the tracked shadow positions to compute the horizon line and hence the camera intrinsic and extrinsic parameters; (iii) a novel solution is presented for a scenario where a camera is observing pedestrians, whose formulation is unique in recognizing two harmonic homologies present in the geometry obtained by observing pedestrians; (iv) for a freely moving camera, a novel practical method is proposed for its self-calibration, which even allows it to change its internal parameters by zooming; and (v) given the increased use of pan-tilt-zoom (PTZ) cameras, a technique is presented that uses only two images to estimate five camera parameters. 
For an automatically configurable multi-camera network, having non-overlapping fields of view and possibly containing moving cameras, a practical framework is proposed that determines the geometry of such a dynamic camera network. It is shown that one automatically computed vanishing point and a line lying on any plane orthogonal to the vertical direction are sufficient to infer the geometry of a dynamic network. Our method generalizes previous work, which considers restricted camera motions. Using minimal assumptions, we are able to demonstrate promising results on synthetic as well as real data. Applications to path modeling, GPS coordinate estimation, and configuring mixed-reality environments are explored.

    텍슀튞와 íŠč징점 Ʞ반의 ëȘ©ì í•šìˆ˜ 씜적화넌 읎용한 ëŹžì„œì™€ 텍슀튞 평활화 êž°ëȕ

    Thesis (Ph.D.) -- Seoul National University Graduate School, Department of Electrical and Computer Engineering, August 2014. Nam Ik Cho. There are many techniques and applications that detect and recognize text information in images, e.g., document retrieval using camera-captured document images, book readers for the visually impaired, and augmented reality based on text recognition. In these applications, the planar surfaces which contain the text are often distorted in the captured image due to the perspective view (e.g., road signs), curvature (e.g., unfolded books), and wrinkles (e.g., old documents). Recovering the original document texture by removing these distortions from camera-captured document images is called document rectification. In this dissertation, new text surface rectification algorithms are proposed for improving text recognition accuracy and visual quality. The proposed methods are categorized into three types depending on the type of input. The contributions of the proposed methods can be summarized as follows. In the first rectification algorithm, the dense text-lines in the documents are employed to rectify the images. Unlike the conventional approaches, the proposed method does not directly use the text-lines. Instead, it uses a discrete representation of text-lines and text-blocks, which are sets of connected components. The geometric distortions caused by page curl and perspective view are modeled as generalized cylindrical surfaces and camera rotation, respectively. With these distortion models and the discrete representation of the features, a cost function is developed whose minimization yields the parameters of the distortion model. The cost function encodes properties of the pages such as text-block alignment, line spacing, and the straightness of text-lines. By describing the text features using sets of discrete points, the cost function can be easily defined and solved well by the Levenberg-Marquardt algorithm. 
Experiments show that the proposed method works well for various layouts and curved surfaces, and compares favorably with the conventional methods on the standard dataset. The second algorithm is a unified framework to rectify and stitch multiple document images using visual feature points instead of text lines. This is similar to the method employed in general image stitching algorithms. However, general image stitching usually assumes a fixed camera center, which cannot be taken for granted when capturing a document. To deal with the camera motion between images, a new parametric family of motion models is proposed in this dissertation. In addition, to remove the ambiguity in the reference plane, a new cost function is developed to impose constraints on the reference plane. This enables the estimation of a physically correct reference plane without prior knowledge. The estimated reference plane can also be used to rectify the stitching result. Furthermore, because it employs general features, the proposed method can be applied not only to camera-captured document images but also to any other planar object, such as building facades or mural paintings. The third rectification method is based on a scene text detection algorithm that is independent of the language model. The conventional methods assume that a character consists of a single connected component (CC), as in the English alphabet. However, this assumption breaks down for Asian scripts such as Korean, Chinese, and Japanese, where a single character consists of several CCs. It is therefore difficult to divide CCs into text lines without a language model. To alleviate this problem, the proposed method clusters the candidate regions based on a similarity measure that considers inter-character relations. The adjacency measure is trained on a data set labeled with the bounding boxes of text regions. Non-text regions that remain after clustering are filtered out in the text/non-text classification step. 
Final text regions are merged or divided into text lines considering their orientation and location. The detected text is rectified using the orientation of the text-lines and vertical strokes. The proposed method outperforms state-of-the-art algorithms on English as well as Asian characters in extensive experiments.

Contents:
1 Introduction: 1.1 Document rectification via text-line based optimization; 1.2 A unified approach of rectification and stitching for document images; 1.3 Rectification via scene text detection; 1.4 Contents.
2 Related work: 2.1 Document rectification (2.1.1 Document dewarping without text-lines; 2.1.2 Document dewarping with text-lines; 2.1.3 Text-block identification and text-line extraction); 2.2 Document stitching; 2.3 Scene text detection.
3 Document rectification based on text-lines: 3.1 Proposed approach (3.1.1 Image acquisition model; 3.1.2 Proposed approach to document dewarping); 3.2 Proposed cost function and its optimization (3.2.1 Design of Estr(·); 3.2.2 Minimization of Estr(·); 3.2.3 Alignment type classification; 3.2.4 Design of Ealign(·); 3.2.5 Design of Espacing(·)); 3.3 Extension to unfolded book surfaces; 3.4 Experimental result (3.4.1 Experiments on synthetic data; 3.4.2 Experiments on real images; 3.4.3 Comparison with existing methods; 3.4.4 Limitations).
4 Document rectification based on feature detection: 4.1 Proposed approach; 4.2 Proposed cost function and its optimization (4.2.1 Notations; 4.2.2 Homography between the i-th image and E; 4.2.3 Proposed cost function; 4.2.4 Optimization; 4.2.5 Relation to the model in [17]); 4.3 Post-processing (4.3.1 Classification of two cases; 4.3.2 Skew removal); 4.4 Experimental results (4.4.1 Quantitative evaluation on metric reconstruction performance; 4.4.2 Experiments on real images).
5 Scene text detection and rectification: 5.1 Introduction (5.1.1 Contribution; 5.1.2 Proposed approach); 5.2 Candidate region detection (5.2.1 CC extraction; 5.2.2 Computation of similarity between CCs; 5.2.3 CC clustering); 5.3 Rectification of candidate region; 5.4 Text/non-text classification; 5.5 Experimental result (5.5.1 Experimental results on ICDAR 2011 dataset; 5.5.2 Experimental results on the Asian character dataset).
6 Conclusion. Bibliography. Abstract (Korean).

    Efficient 2D SLAM for a Mobile Robot with a Downwards Facing Camera

    As digital cameras become cheaper and better, computers more powerful, and robots more abundant, the merging of these three technologies also becomes more common and capable. The combination of these technologies is often inspired by the human visual system and often strives to give machines the same capabilities that humans already have, such as object identification, navigation, limb coordination, and event detection. One such field that is particularly popular is SLAM, or Simultaneous Localization and Mapping, which has high-profile applications in self-driving cars and delivery drones. This thesis proposes and describes an online SLAM algorithm for a specific scenario: that of a robot with a downwards-facing camera exploring a flat surface (e.g., a floor). The method is based on building homographies from robot odometry data, which are then used to rectify the images so that the tilt of the camera with respect to the floor is eliminated, thereby moving the problem from 3D to 2D. The 2D pose of the robot in the plane is estimated using registrations of SURF features, and then a bundle adjustment algorithm is used to consolidate the most recent measurements with the older ones in order to optimize the map. The algorithm is implemented and tested with an AR.Drone 2.0 quadcopter. The results are mixed, but hardware seems to be the limiting factor: the algorithm performs well and runs at 5-20 Hz on an i5 desktop computer, but the poor quality, high compression and low resolution of the drone's bottom camera make the algorithm unstable, and this cannot be overcome even with several tiers of outlier filtering.
Summary (translated from Swedish): For robots to be practical, they need a flexible perception of their surroundings and of their own position within them, but the methods available for this today are often very demanding. In this project, a simplified method for real-time mapping with a drone has been developed. The algorithm addresses a simpler problem than the usual three-dimensional ones: instead of looking forward into the room, the drone looks down and tries to build a map by piecing together images of the floor. The method is efficient, but the quality of the drone camera that was used is too poor for the method to give reliable results.
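
    The tilt-removal step described in the abstract can be illustrated with the standard pure-rotation homography H = K R^T K^-1. This is a minimal sketch under stated assumptions, not the thesis implementation: it assumes a zero-skew pinhole camera with known intrinsics, and that odometry supplies the rotation R taking coordinates in a level (straight-down) virtual camera frame to the actual tilted camera frame. All function names and the numeric values in the usage note are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(a):
    """Transpose a 3x3 matrix (the inverse, for a rotation)."""
    return [[a[j][i] for j in range(3)] for i in range(3)]


def rot_x(angle):
    """Rotation about the camera x-axis by angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(angle):
    """Rotation about the camera y-axis by angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def intrinsics(fx, fy, cx, cy):
    """Return K and its closed-form inverse for a zero-skew pinhole camera."""
    k = [[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]]
    k_inv = [[1.0 / fx, 0.0, -cx / fx],
             [0.0, 1.0 / fy, -cy / fy],
             [0.0, 0.0, 1.0]]
    return k, k_inv


def tilt_removal_homography(k, k_inv, r_cam_from_level):
    """H = K R^T K^-1: maps tilted-camera pixels to level-camera pixels."""
    return matmul(matmul(k, transpose(r_cam_from_level)), k_inv)


def project(m, vec):
    """Apply a 3x3 matrix to a 3-vector and dehomogenize."""
    x, y, w = (sum(m[i][j] * vec[j] for j in range(3)) for i in range(3))
    return (x / w, y / w)
```

    As a usage example, with k, k_inv = intrinsics(600.0, 600.0, 320.0, 180.0) and r = matmul(rot_x(0.10), rot_y(-0.05)), a floor point imaged by the tilted camera at project(matmul(k, r), point) is mapped by project(tilt_removal_homography(k, k_inv, r), [u, v, 1.0]) to the pixel that a level camera at the same position would have produced, project(k, point). Because the two views share a camera center, the relation holds for every scene point, which is why rotation alone suffices to undo the tilt.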

    Ground plane rectification from crowd motion

    This work focuses on the estimation of the ground-plane parameters needed to rectify and reconstruct crowded pedestrian scenes, projected into 2D by an uncalibrated, monocular camera. Distortions introduced during the imaging process affect metrics such as size, velocity and distance, which are often useful when examining the behaviour of agents within the scene. A framework is presented to reverse this perspective distortion by calculating the ground plane upon which motion within the scene occurs. Existing methods use geometric features, such as parallel lines, or objects of known size, such as the height of individuals in the scene; however, these features are often unavailable in densely crowded scenes due to occlusions. By measuring only the imaged velocity of tracked features, assumed to be constant in the world, the issue of occlusion can be largely overcome. A novel framework is presented for estimation of the ground plane and camera focal length for scenes modelled with a single plane. The constant-velocity assumption is validated against simulations, and the method outperforms an existing technique [12] on real-world benchmark data. This framework is extended to a two-plane world, which introduces the additional challenge of determining the respective topology of the planes. Several methods for locating the intersection line between the two planes are evaluated on simulations, and the effect of variation in velocity and in the height of tracked features on reconstruction accuracy is investigated; the results indicate that this technique is suitable in real-world conditions. The framework is then generalised, removing the need for prior knowledge of the number of planes. The problem is reformulated as a linear series of planes, each connected by a single hinge, allowing the calculation of a single rotation for each new plane. 
Again, results are shown against simulations on scenes of varying complexity, as well as real-world datasets, validating the success of this method given realistic variations in velocity.
