
    Affine multi-view modelling for close range object measurement

    In photogrammetry, sensor modelling with 3D point estimation is a fundamental research topic. Perspective frame cameras offer the mathematical basis for close range modelling approaches, where the norm is to employ robust bundle adjustments for simultaneous parameter estimation and 3D object measurement. In 2D-to-3D modelling strategies, image resolution, scale, sampling and geometric distortion are prior factors. Non-conventional image geometries that employ uncalibrated cameras are established in computer vision; these aim for fast solutions at the expense of precision. The projective camera is defined in homogeneous terms and linear algorithms are employed. An attractive sensor model free of projective distortions is the affine camera. Affine modelling has been studied in the contexts of geometry recovery, feature detection and texturing in vision; however, multi-view approaches for precise object measurement are not yet widely available. This project investigates affine multi-view modelling from a photogrammetric standpoint. A new affine bundle adjustment system has been developed for point-based data observed in close range image networks. The system allows calibration, orientation and 3D point estimation, processed as a least squares solution whose high redundancy supports statistical analysis. Starting values are recovered from a combination of implicit perspective and explicit affine approaches. System development focuses on retrieval of orientation parameters, 3D point coordinates and internal calibration, with definition of the system datum, sensor scale and radial lens distortion. Algorithm development is supported by method description through simulation. Initialization and implementation are evaluated with statistical indicators, algorithm convergence and parameter correlations. Object space is assessed through evaluation of the 3D point correlation coefficients and error ellipsoids. Sensor scale is checked by comparing camera systems using quality and accuracy metrics. For independent evaluation, the method is tested against a perspective bundle adjustment tool with similar indicators. Test datasets are initialized from precise reference image networks. Real affine image networks are acquired with an optical system (~1M pixel CCD cameras with a 0.16x telecentric lens). Analysis of the tests establishes that the affine method achieves an RMS image misclosure at the sub-pixel level and precisions of a few tenths of microns in object space.
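    As a minimal sketch of the core machinery described here, the code below parameterises each view with an affine camera (x = A X + t, with A a 2x3 matrix) and refines cameras and points jointly by least squares on a synthetic network. The parameterisation, data and solver are illustrative assumptions, not the thesis's implementation, and the datum (gauge) is left free rather than fixed as the thesis does.

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(params, n_cams, n_pts, obs):
    """Stack reprojection errors over all (camera, point) observations."""
    cams = params[:n_cams * 8].reshape(n_cams, 8)   # per camera: A (6) + t (2)
    pts = params[n_cams * 8:].reshape(n_pts, 3)     # 3D object points
    res = []
    for cam_i, pt_j, uv in obs:                     # uv: measured image point
        A = cams[cam_i, :6].reshape(2, 3)
        t = cams[cam_i, 6:]
        res.append(A @ pts[pt_j] + t - uv)          # affine projection misclosure
    return np.concatenate(res)

# Synthetic network: 3 affine views of 10 points, with sub-pixel image noise.
rng = np.random.default_rng(0)
n_cams, n_pts = 3, 10
true_cams = rng.normal(size=(n_cams, 8))
true_pts = rng.normal(size=(n_pts, 3))
obs = [(i, j,
        true_cams[i, :6].reshape(2, 3) @ true_pts[j] + true_cams[i, 6:]
        + rng.normal(scale=1e-3, size=2))
       for i in range(n_cams) for j in range(n_pts)]

# Perturbed starting values stand in for the initialization stage;
# note the datum (gauge) is not fixed in this toy version.
x0 = (np.concatenate([true_cams.ravel(), true_pts.ravel()])
      + rng.normal(scale=0.05, size=n_cams * 8 + n_pts * 3))
sol = least_squares(residuals, x0, args=(n_cams, n_pts, obs))
print("RMS image misclosure:", np.sqrt(np.mean(sol.fun ** 2)))
```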

    Adding the Third Dimension to Spatial Relation Detection in 2D Images

    Detection of spatial relations between objects in images is currently a popular subject in image description research. A range of different language and geometric object features have been used in this context, but methods have so far not used explicit information about the third dimension (depth), except when it is manually added to annotations. The lack of such information hampers detection of spatial relations that are inherently 3D. In this paper, we use a fully automatic method for creating a depth map of an image and derive several object-level depth features from it, which we add to an existing feature set to test the effect on spatial relation detection. We show that performance increases are obtained from adding depth features in all scenarios tested.
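    To illustrate what object-level depth features of this kind might look like, the sketch below derives a few simple statistics from a depth map given two object bounding boxes. The feature names and the box-based pooling are assumptions for illustration; the paper's exact feature set is not reproduced here.

```python
import numpy as np

def depth_features(depth, box_a, box_b):
    """Compute simple per-object and pairwise depth features.
    depth: HxW array of per-pixel depth; boxes: (x0, y0, x1, y1)."""
    def region(box):
        x0, y0, x1, y1 = box
        return depth[y0:y1, x0:x1]
    da, db = region(box_a), region(box_b)
    return {
        "mean_depth_a": float(da.mean()),
        "mean_depth_b": float(db.mean()),
        "depth_gap": float(da.mean() - db.mean()),   # signed in-front/behind cue
        "depth_range_overlap": float(max(0.0, min(da.max(), db.max())
                                              - max(da.min(), db.min()))),
    }

depth = np.random.rand(240, 320)                     # stand-in depth map
print(depth_features(depth, (10, 20, 60, 90), (100, 40, 180, 150)))
```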

    The application of range imaging for improved local feature representations

    This thesis presents an investigation into the integration of information extracted from co-aligned range and intensity images to achieve pose-invariant object recognition. Local feature matching is a fundamental technique in image analysis that underpins many computer vision applications; the approach comprises identifying a collection of interest points in an image, characterising the local image region surrounding each interest point by means of a descriptor, and matching these descriptors between example images. Such local feature descriptors are formed from a measure of the local image statistics in the region surrounding the interest point. The interest point locations and the means of measuring local image statistics should be chosen such that the resultant descriptor remains stable across a range of common image transformations. Recently, the availability of low-cost, high-quality range imaging devices has motivated an interest in local feature extraction from range images. It has been widely assumed in the vision community that the range imaging domain has properties which remain quasi-invariant through a wide range of changes in illumination and pose. Accordingly, it has been suggested that local feature extraction in the range domain should allow the calculation of local feature descriptors that are potentially more robust than those calculated from the intensity imaging domain alone. However, range images capture characteristics that differ from those of intensity images, which are frequently used, independently of range images, to create robust local features. Therefore, this work attempts to establish the best means of combining information from these two imaging modalities to further increase the reliability of matching local features. Local feature extraction comprises a series of processes applied to an image location such that a collection of repeatable descriptors can be established. Using co-aligned range and intensity images, this work investigates the choice of modality and method for each step in the extraction process as an approach to optimising the resulting descriptor. Additionally, multimodal features are formed by combining information from both domains in a single stage of the extraction process. To further improve the quality of the feature descriptors, surface normals are calculated and the 3D structure from the range image is used to correct the 3D appearance of a local sample patch, thereby increasing the similarity between observations. The matching performance of local features is evaluated using an experimental setup comprising a turntable and a stereo pair of cameras. This setup is used to create a database of intensity and range images for 5 objects imaged at 72 calibrated viewpoints, giving 360 object observations. The use of a calibrated turntable, in combination with the 3D object surface coordinates supplied by the range image, allows location correspondences between object observations to be established, and therefore descriptor matches to be labelled as either true positive or false positive. Applying this methodology to the formulated local features shows that two approaches demonstrate state-of-the-art performance, with a ~40% increase in area under the ROC curve at a false positive rate of 10% when compared with standard SIFT. These approaches are range affine corrected intensity SIFT and element corrected surface gradients SIFT.
    Furthermore, this work uses the 3D structure encoded in the range image to organise collections of interest points from a series of observations into a collection of canonical views, forming a new local feature model. The canonical views for an interest point are stored in a view-compartmentalised structure which allows the appearance of a local interest point to be characterised across the view sphere. Each canonical view is assigned a confidence measure based on the 3D pose of the interest point at observation; this confidence measure is then used to match similar canonical views of model and query interest points, thereby achieving a pose-invariant interest point description. This approach does not produce a statistically significant performance increase; however, it does contribute a validated methodology for combining multiple descriptors with differing confidence weightings into a single keypoint.
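    The labelling-and-ROC protocol above reduces to simple bookkeeping, sketched below: sort matches by descriptor distance, sweep a threshold, and read off the true positive rate at a fixed false positive rate. The match distances here are synthetic stand-ins; only the evaluation mechanics mirror the described methodology.

```python
import numpy as np

def roc_from_matches(dists, labels):
    """Sweep a distance threshold over matches; labels: 1 = true match."""
    order = np.argsort(dists)       # accept matches from smallest distance up
    labels = labels[order]
    tp = np.cumsum(labels)          # true positives accepted at each threshold
    fp = np.cumsum(1 - labels)      # false positives accepted at each threshold
    return fp / max(fp[-1], 1), tp / max(tp[-1], 1)   # (fpr, tpr)

rng = np.random.default_rng(1)
true_d = rng.normal(0.3, 0.10, 500)    # true matches: smaller distances
false_d = rng.normal(0.7, 0.15, 500)   # false matches: larger distances
dists = np.concatenate([true_d, false_d])
labels = np.concatenate([np.ones(500, int), np.zeros(500, int)])

fpr, tpr = roc_from_matches(dists, labels)
print("TPR at 10% FPR:", tpr[np.searchsorted(fpr, 0.10)])
print("AUC:", np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))  # trapezoid rule
```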

    Reverse-engineering of architectural buildings based on a hybrid modeling approach

    We thank the MENSI and REALVIZ companies for their helpful comments, and the following people for providing us with images from their works: Francesca De Domenico (Fig. 1), Kyung-Tae Kim (Fig. 9). The CMN (the French national centre for heritage buildings) is also acknowledged for the opportunity to demonstrate our approach on the Hotel de Sully in Paris. We thank Tudor Driscu for his help with the English translation.
    This article presents a set of theoretical reflections and technical demonstrations that constitute a new methodological base for architectural surveying and representation using computer graphics techniques. The problem we treat relates to three distinct concerns: the surveying of architectural objects, the construction and semantic enrichment of their geometrical models, and their handling for the extraction of dimensional information. A hybrid approach to 3D reconstruction is described. This new approach combines range-based and image-based modeling techniques and integrates the concept of architectural feature-based modeling. To develop this concept, a first process of extraction and formalization of architectural knowledge, based on the analysis of architectural treatises, is carried out. The identified features are then used to produce a template shape library. Finally, the problem of the overall model structure and organization is addressed.

    RGB-D datasets using Microsoft Kinect or similar sensors: a survey

    RGB-D data has turned out to be a very useful representation of an indoor scene for solving fundamental computer vision problems. It takes advantage of the color image, which provides appearance information about an object, and of the depth image, which is immune to variations in color, illumination, rotation angle and scale. With the invention of the low-cost Microsoft Kinect sensor, which was initially used for gaming and later became a popular device for computer vision, high-quality RGB-D data can be acquired easily. In recent years, more and more RGB-D image/video datasets dedicated to various applications have become available, and these are of great importance for benchmarking the state of the art. In this paper, we systematically survey popular RGB-D datasets for different applications including object recognition, scene classification, hand gesture recognition, 3D simultaneous localization and mapping, and pose estimation. We provide insights into the characteristics of each important dataset, and compare the popularity and difficulty of those datasets. Overall, the main goal of this survey is to give a comprehensive description of the available RGB-D datasets and thus to guide researchers in the selection of suitable datasets for evaluating their algorithms.
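    To make the RGB-D representation concrete, the sketch below back-projects a depth image into a 3D point cloud with the pinhole model, the first step in most of the applications surveyed. The Kinect-like intrinsics are assumed default values, not taken from any particular dataset.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (metres) to 3D camera coordinates."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx      # pinhole model, inverted per pixel
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)

depth = np.full((480, 640), 1.5)                 # stand-in: flat scene at 1.5 m
pts = depth_to_points(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
print(pts.shape)                                 # (480, 640, 3)
```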

    Grounding semantics in robots for Visual Question Answering

    In this thesis I describe an operational implementation of an object detection and description system that is incorporated into an end-to-end Visual Question Answering system, and evaluate it on two visual question answering datasets for compositional language and elementary visual reasoning.

    T-LESS: An RGB-D Dataset for 6D Pose Estimation of Texture-less Objects

    We introduce T-LESS, a new public dataset for estimating the 6D pose, i.e. translation and rotation, of texture-less rigid objects. The dataset features thirty industry-relevant objects with no significant texture and no discriminative color or reflectance properties. The objects exhibit symmetries and mutual similarities in shape and/or size. Compared to other datasets, a unique property is that some of the objects are parts of others. The dataset includes training and test images that were captured with three synchronized sensors, specifically a structured-light RGB-D sensor, a time-of-flight RGB-D sensor and a high-resolution RGB camera. There are approximately 39K training and 10K test images from each sensor. Additionally, two types of 3D models are provided for each object, i.e. a manually created CAD model and a semi-automatically reconstructed one. Training images depict individual objects against a black background. Test images originate from twenty test scenes of varying complexity, which increases from simple scenes with several isolated objects to very challenging ones with multiple instances of several objects and a high amount of clutter and occlusion. The images were captured from a systematically sampled view sphere around the object/scene, and are annotated with accurate ground-truth 6D poses of all modeled objects. Initial evaluation results indicate that the state of the art in 6D object pose estimation has ample room for improvement, especially in difficult cases with significant occlusion. The T-LESS dataset is available online at cmp.felk.cvut.cz/t-less.
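    For context on how ground-truth 6D poses like these are often scored, the sketch below implements one widely used error, the average closest-point distance (commonly written ADD-S or ADI), which tolerates the object symmetries this dataset emphasises. This is a generic illustration, not the dataset's official evaluation protocol; the point cloud and poses are made up.

```python
import numpy as np

def adi_error(model_pts, R_gt, t_gt, R_est, t_est):
    """Mean distance from each ground-truth model point to its nearest
    estimated point; tolerant of symmetric (indistinguishable) poses."""
    gt = model_pts @ R_gt.T + t_gt       # model under ground-truth pose
    est = model_pts @ R_est.T + t_est    # model under estimated pose
    d = np.linalg.norm(gt[:, None, :] - est[None, :, :], axis=-1)
    return d.min(axis=1).mean()          # closest-point match, then average

pts = np.random.rand(200, 3) * 0.1       # stand-in ~10 cm CAD model (metres)
R = np.eye(3)
print(adi_error(pts, R, np.zeros(3), R, np.array([0.002, 0.0, 0.0])))
```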

    Disparity map generation based on trapezoidal camera architecture for multiview video

    Get PDF
    Visual content acquisition is a strategic functional block of any visual system. Despite its wide possibilities, the arrangement of cameras for the acquisition of good-quality visual content for use in multi-view video remains a huge challenge. This paper presents a mathematical description of the trapezoidal camera architecture and the relationships which facilitate the determination of camera positions for visual content acquisition in multi-view video, and for depth map generation. The strong point of the trapezoidal camera architecture is that it allows for an adaptive camera topology by which points within the scene, especially the occluded ones, can be optically and geometrically viewed from several different viewpoints, either on the edge of the trapezoid or inside it. The concept of a maximum independent set, the characteristics of the trapezoid, and the fact that the camera positions (with the exception of a few) differ in their vertical coordinate could well be used to address the issue of occlusion, which continues to be a major problem in computer vision with regard to the generation of depth maps.
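    Whatever the camera topology, depth recovery from any rectified pair in such an arrangement ultimately rests on the textbook disparity relation Z = fB/d; the sketch below computes it for a toy disparity map. The focal length and baseline values are illustrative assumptions, not parameters from the paper, which concerns the trapezoidal arrangement itself rather than this building block.

```python
import numpy as np

def depth_from_disparity(disparity, focal_px, baseline_m):
    """Z = f * B / d for a rectified pair; zero disparity maps to infinity."""
    with np.errstate(divide="ignore"):
        return np.where(disparity > 0,
                        focal_px * baseline_m / disparity,
                        np.inf)

disp = np.array([[8.0, 16.0], [32.0, 0.0]])      # disparities in pixels
print(depth_from_disparity(disp, focal_px=800.0, baseline_m=0.12))
```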