
    Image based automatic vehicle damage detection

    No full text
    Automatically detecting vehicle damage using photographs taken at the accident scene is very useful, as it can greatly reduce the cost of processing insurance claims and provide greater convenience for vehicle users. An ideal scenario would be one where the vehicle user can upload a few photographs of the damaged car taken with a mobile phone and have the damage assessment and insurance claim processing done automatically. However, such a solution remains challenging due to a number of factors. For a start, the scene of the accident is typically an unknown and uncontrolled outdoor environment with a plethora of factors beyond our control, including scene illumination and the presence of surrounding objects which are not known a priori. In addition, since vehicles have very reflective metallic bodies, photographs taken in such an uncontrolled environment can be expected to contain a considerable amount of inter-object reflection. Therefore, the application of standard computer vision techniques in this context is a very challenging task. Moreover, solving this task opens up a fascinating repertoire of computer vision problems which need to be addressed in the context of a very challenging scenario. This thesis describes research undertaken to address the problem of automatic vehicle damage detection using photographs. A pipeline addressing a vertical slice of the broad problem is considered, focusing on mild vehicle damage detection. We propose to use 3D CAD models of undamaged vehicles to obtain ground truth information, in order to infer what the mildly damaged vehicle in the photograph should have looked like had it not been damaged. To this end, we develop 3D pose estimation algorithms to register an undamaged 3D CAD model over a photograph of the known damaged vehicle. We present a 3D pose estimation method using image gradient information of the photograph and the 3D model projection. We show how the 3D model projection at the recovered 3D pose can be used to identify components of the vehicle in the photograph which may have mild damage. In addition, we present a more robust 3D pose estimation method that minimizes a novel illumination-invariant distance measure, based on a Mahalanobis distance between attributes of the 3D model projection and the pixels in the photograph. In principle, image edges which are not present in the 3D CAD model projection can be considered to be vehicle damage. However, since the vehicle body is very reflective, there is a large amount of inter-object reflection in the photograph which may be misclassified as damage. In order to detect image edges caused by inter-object reflection, we propose to apply multi-view geometry techniques on two photographs of the vehicle taken from different viewpoints. To this end, we also develop a robust method to obtain reliable point correspondences across photographs which are dominated by large reflective and mostly homogeneous regions. The performance of the proposed methods is experimentally evaluated on real photographs using 3D CAD models of varying accuracy. We expect that the research presented in this thesis will provide the groundwork for designing an automatic photograph-based vehicle damage detection system. Moreover, we hope that our methods will provide the foundation for interesting future research.
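    The pose-scoring idea above can be illustrated with a minimal sketch, assuming the per-pixel attributes are simple image measurements and the attribute covariance is estimated offline; the thesis's actual attribute choice, covariance estimation and pose optimizer are not specified in this abstract.

```python
# Minimal sketch: scoring a candidate 3D pose with a Mahalanobis-style distance
# between per-pixel attributes of the rendered CAD projection and the photograph.
# Attribute choice and covariance estimation here are assumptions for illustration.
import numpy as np

def pose_score(photo_attrs: np.ndarray, render_attrs: np.ndarray, cov: np.ndarray) -> float:
    """photo_attrs, render_attrs: (N, d) attributes sampled at the same pixel
    locations on the model projection; cov: (d, d) attribute covariance."""
    diff = photo_attrs - render_attrs                           # (N, d) residuals
    cov_inv = np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))  # regularized inverse
    d2 = np.einsum('nd,de,ne->n', diff, cov_inv, diff)          # squared Mahalanobis per pixel
    return float(d2.mean())

# A pose search would minimize pose_score over candidate 6-DoF poses, e.g. with a
# generic optimizer such as scipy.optimize.minimize (an assumption, not the thesis's method).
```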

    A Voxel-Based Approach for Imaging Voids in Three-Dimensional Point Clouds

    Get PDF
    Geographically accurate scene models have enormous potential beyond simple visualization with regard to automated scene generation. In recent years, thanks to ever-increasing computational efficiencies, there has been significant growth in both the computer vision and photogrammetry communities pertaining to automatic scene reconstruction from multiple-view imagery. The result of these algorithms is a three-dimensional (3D) point cloud which can be used to derive a final model using surface reconstruction techniques. However, the fidelity of these point clouds has not been well studied, and voids often exist within the point cloud. Voids exist in texturally difficult areas, in areas where multiple views were not obtained during collection, where constant occlusion existed due to collection angles or overlapping scene geometry, or in regions that failed to triangulate accurately. It may be possible to fill in small voids in the scene using surface reconstruction or hole-filling techniques, but this is not the case with larger, more complex voids, and attempting to reconstruct them using only the knowledge of the incomplete point cloud is neither accurate nor aesthetically pleasing. A method is presented for identifying voids in point clouds by using a voxel-based approach to partition the 3D space. By using collection geometry and information derived from the point cloud, it is possible to detect unsampled voxels such that voids can be identified. This analysis takes into account the location of the camera and the 3D points themselves to capitalize on the idea of free space: voxels that lie on the ray between the camera and a point must be devoid of obstruction, as a clear line of sight is a necessary requirement for reconstruction. Using this approach, voxels are classified into three categories: occupied (contains points from the point cloud), free (rays from the camera to the point passed through the voxel), and unsampled (does not contain points and no rays passed through it). Voids in the voxel space are manifested as unsampled voxels. A similar line-of-sight analysis can then be used to pinpoint locations at aircraft altitude at which the voids in the point clouds could theoretically be imaged. This work is based on the assumption that inclusion of more images of the void areas in the 3D reconstruction process will reduce the number of voids in the point cloud that were a result of lack of coverage. Voids resulting from texturally difficult areas will not benefit from more imagery in the reconstruction process, and thus are identified and removed prior to the determination of future potential imaging locations.
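    A minimal sketch of the occupied / free / unsampled labelling follows, assuming a simple uniform grid and a point-sampled march along each camera-to-point ray; the grid resolution, sampling step and helper names are illustrative, not the paper's implementation.

```python
# Minimal sketch of the occupied / free / unsampled voxel labelling.
import numpy as np

OCCUPIED, FREE, UNSAMPLED = 2, 1, 0

def classify_voxels(points, cameras, bounds_min, voxel_size, grid_shape):
    """points: (N, 3) reconstructed cloud; cameras: (M, 3) camera centres."""
    labels = np.full(grid_shape, UNSAMPLED, dtype=np.uint8)
    grid_shape = np.asarray(grid_shape)

    def to_index(p):
        idx = ((p - bounds_min) / voxel_size).astype(int)
        return idx if np.all(idx >= 0) and np.all(idx < grid_shape) else None

    # Voxels containing reconstructed points are occupied.
    for p in points:
        idx = to_index(p)
        if idx is not None:
            labels[tuple(idx)] = OCCUPIED

    # March along each camera-to-point ray: a clear line of sight was required
    # for triangulation, so traversed voxels are marked as free space.
    for cam in cameras:
        for p in points:
            seg = p - cam
            n_steps = int(np.linalg.norm(seg) / (0.5 * voxel_size)) + 1
            for t in np.linspace(0.0, 1.0, n_steps, endpoint=False):
                idx = to_index(cam + t * seg)
                if idx is not None and labels[tuple(idx)] == UNSAMPLED:
                    labels[tuple(idx)] = FREE

    return labels  # voids manifest as UNSAMPLED voxels inside the scene extent
```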

    Efficient volumetric reconstruction from multiple calibrated cameras

    Get PDF
    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, February 2005. Includes bibliographical references (p. 137-142). The automatic reconstruction of large-scale 3-D models from real images is of significant value to the field of computer vision in the understanding of images. As a consequence, many techniques have emerged to perform scene reconstruction from calibrated images, where the position and orientation of the camera are known. Feature-based methods using points and lines have enjoyed much success and have been shown to be robust against noise and changing illumination conditions. The models produced by these techniques, however, can often appear crude when untextured, due to the sparse set of points from which they are created. Other reconstruction methods, such as volumetric techniques, use image pixel intensities rather than features, reconstructing the scene as small volumetric units called voxels. The direct use of pixel values in the images has restricted current methods to operating on scenes with static illumination conditions. Creating a volumetric representation of the scene may also require millions of interdependent voxels which must be efficiently processed. This has limited most techniques to constrained camera locations and small indoor scenes. The primary goal of this thesis is to perform efficient voxel-based reconstruction of urban environments using a large set of pose-instrumented images. In addition to the 3-D scene reconstruction, the algorithm will also generate estimates of surface reflectance and illumination. Designing an algorithm that operates in a discretized 3-D scene space allows for the recovery of intrinsic scene color and for the integration of visibility constraints, while avoiding the pitfalls of image-based feature correspondence. The algorithm demonstrates how, in principle, it is possible to reduce computational effort over more naive methods. The algorithm is intended to perform the reconstruction of large-scale 3-D models from controlled imagery without human intervention. by Manish Jethwa, Ph.D.
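    As a rough illustration of the voxel-based viewpoint, the sketch below shows a generic photo-consistency test for a single voxel; it is a simplification in the spirit of volumetric reconstruction and does not reproduce the thesis's reflectance and illumination estimation or its visibility ordering. The `project` and `visible` helpers are assumptions.

```python
# Minimal sketch of a generic per-voxel photo-consistency test.
import numpy as np

def photo_consistent(voxel_center, cameras, threshold=15.0):
    """cameras: list of (project, image, visible) triples, where `project`
    maps a 3D point to pixel coordinates, `image` is an (H, W, 3) array and
    `visible` tests occlusion against the current reconstruction (assumptions)."""
    samples = []
    for project, image, visible in cameras:
        if not visible(voxel_center):
            continue                            # skip views where the voxel is occluded
        u, v = project(voxel_center)
        h, w = image.shape[:2]
        if 0 <= int(v) < h and 0 <= int(u) < w:
            samples.append(image[int(v), int(u)].astype(np.float64))
    if len(samples) < 2:
        return True                             # too few observations to refute the voxel
    samples = np.stack(samples)
    # A voxel on a roughly Lambertian surface should look similar in all views.
    return float(samples.std(axis=0).mean()) < threshold
```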

    Towards Object-Centric Scene Understanding

    Get PDF
    Visual perception for autonomous agents continues to attract community attention due to disruptive technologies and the wide applicability of such solutions. Autonomous Driving (AD), a major application in this domain, promises to revolutionize our approach to mobility while bringing critical advantages in limiting accident fatalities. Fueled by recent advances in Deep Learning (DL), more computer vision tasks are being addressed using a learning paradigm. Deep Neural Networks (DNNs) have consistently succeeded in pushing performance to unprecedented levels and in demonstrating the ability of such approaches to generalize to an increasing number of difficult problems, such as 3D vision tasks. In this thesis, we address two main challenges arising from current approaches: the computational complexity of multi-task pipelines, and the increasing need for manual annotations. On the one hand, AD systems need to perceive the surrounding environment at different levels of detail and, subsequently, take timely actions. This multitasking further limits the time available for each perception task. On the other hand, the need for universal generalization of such systems to massively diverse situations requires the use of large-scale datasets covering long-tailed cases. Such a requirement renders the use of traditional supervised approaches, despite the data readily available in the AD domain, unsustainable in terms of annotation costs, especially for 3D tasks. Driven by the nature of the AD environment, whose complexity (unlike indoor scenes) is dominated by the presence of other scene elements (mainly cars and pedestrians), we focus on the above-mentioned challenges in object-centric tasks. We then situate our contributions appropriately in the fast-paced literature, while supporting our claims with extensive experimental analysis leveraging up-to-date state-of-the-art results and community-adopted benchmarks.

    View generated database

    Get PDF
    This document represents the final report for the View Generated Database (VGD) project, NAS7-1066. It documents the work done on the project up to the point at which all project work was terminated due to lack of project funds. The VGD was to provide the capability to accurately represent any real-world object or scene as a computer model. Such models include both an accurate spatial/geometric representation of the surfaces of the object or scene and any surface detail present on the object. Applications of such models are numerous, including acquisition and maintenance of work models for tele-autonomous systems, generation of accurate 3-D geometric/photometric models for various 3-D vision systems, and graphical models for realistic rendering of 3-D scenes via computer graphics.

    Forest structure from terrestrial laser scanning – in support of remote sensing calibration/validation and operational inventory

    Get PDF
    Forests are an important part of the natural ecosystem, providing resources such as timber and fuel, performing services such as energy exchange and carbon storage, and presenting risks such as fire damage and invasive species impacts. Improved characterization of forest structural attributes is desirable, as it could improve our understanding and management of these natural resources. However, the traditional, systematic collection of forest information – dubbed “forest inventory” – is time-consuming, expensive, and coarse when compared to novel 3-D measurement technologies. Remote sensing estimates, on the other hand, provide synoptic coverage, but often fail to capture the fine-scale structural variation of the forest environment. Terrestrial laser scanning (TLS) has demonstrated a potential to address these limitations, but its operational use has remained limited due to unsatisfactory performance characteristics relative to the budgetary constraints of many end-users. To address this gap, my dissertation advanced affordable mobile laser scanning capabilities for operational forest structure assessment. We developed geometric reconstruction of forest structure from rapid-scan, low-resolution point cloud data, providing for automatic extraction of standard forest inventory metrics. To augment these results over larger areas, we designed a view-invariant feature descriptor to enable marker-free registration of TLS data pairs, without knowledge of the initial sensor pose. Finally, a graph-theory framework was integrated to perform multi-view registration between a network of disconnected scans, which provided improved assessment of forest inventory variables. This work addresses a major limitation related to the inability of TLS to assess forest structure at an operational scale, and may facilitate improved understanding of the phenomenology of airborne sensing systems by providing fine-scale reference data with which to interpret active or passive electromagnetic radiation interactions with forest structure. Outputs are being utilized to provide antecedent science data for NASA’s HyspIRI mission and to support the National Ecological Observatory Network’s (NEON) long-term environmental monitoring initiatives.
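    The multi-view registration step can be pictured with the following sketch, which chains pairwise rigid transforms through a scan graph to a common reference frame; the dissertation's feature descriptor, pairwise registration method and graph-construction criteria are not reproduced here.

```python
# Minimal sketch: chaining pairwise TLS registrations into a common frame via a
# scan graph whose nodes are scans and whose edges carry estimated rigid transforms.
from collections import deque
import numpy as np

def compose_to_reference(pairwise, n_scans, reference=0):
    """pairwise: dict {(i, j): T_ij} of 4x4 transforms mapping scan j into scan i's
    frame. Returns {scan: T} mapping each reachable scan into the reference frame."""
    # Undirected adjacency list; reversed edges carry the inverse transform.
    adj = {k: [] for k in range(n_scans)}
    for (i, j), T in pairwise.items():
        adj[i].append((j, T))
        adj[j].append((i, np.linalg.inv(T)))

    poses = {reference: np.eye(4)}
    queue = deque([reference])
    while queue:                                 # breadth-first traversal of the scan graph
        i = queue.popleft()
        for j, T_ij in adj[i]:
            if j not in poses:
                poses[j] = poses[i] @ T_ij       # chain transforms along the path
                queue.append(j)
    return poses                                 # scans missing from the result are disconnected
```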

    Learning to Predict Dense Correspondences for 6D Pose Estimation

    Get PDF
    Object pose estimation is an important problem in computer vision with applications in robotics, augmented reality and many other areas. An established strategy for object pose estimation consists of, firstly, finding correspondences between the image and the object’s reference frame and, secondly, estimating the pose from outlier-free correspondences using Random Sample Consensus (RANSAC). The first step, namely finding correspondences, is difficult because object appearance varies depending on perspective, lighting and many other factors. Traditionally, correspondences have been established using handcrafted methods like sparse feature pipelines. In this thesis, we introduce a dense correspondence representation for objects, called object coordinates, which can be learned. By learning object coordinates, our pose estimation pipeline adapts to various aspects of the task at hand. It works well for diverse object types, from small objects to entire rooms, varying object attributes, like textured or texture-less objects, and different input modalities, like RGB-D or RGB images. The concept of object coordinates allows us to easily model and exploit uncertainty as part of the pipeline, such that even repeating structures or areas with little texture can contribute to a good solution. Although we can train object coordinate predictors independently of the full pipeline and achieve good results, training the pipeline in an end-to-end fashion is desirable. It enables the object coordinate predictor to adapt its output to the specificities of the following steps in the pose estimation pipeline. Unfortunately, the RANSAC component of the pipeline is non-differentiable, which prohibits end-to-end training. Adopting techniques from reinforcement learning, we introduce Differentiable Sample Consensus (DSAC), a formulation of RANSAC which allows us to train the pose estimation pipeline in an end-to-end fashion by minimizing the expectation of the final pose error.
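    A minimal sketch of the second stage follows, assuming a learned per-pixel predictor (the hypothetical `predict_object_coords`) and a standard PnP-RANSAC solver; the object-coordinate network and the DSAC end-to-end training themselves are not shown.

```python
# Minimal sketch: recovering a 6D pose from dense predicted object coordinates
# with OpenCV's PnP-RANSAC. `predict_object_coords` is a hypothetical stand-in
# for a learned per-pixel predictor.
import numpy as np
import cv2

def pose_from_object_coords(rgb, camera_matrix, predict_object_coords):
    coords = predict_object_coords(rgb)           # (H, W, 3) object-space coordinates
    h, w = coords.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    img_pts = np.stack([xs, ys], axis=-1).reshape(-1, 2).astype(np.float32)
    obj_pts = coords.reshape(-1, 3).astype(np.float32)

    # Robust fitting; RANSAC tolerates the outliers a dense predictor inevitably makes.
    # In practice one would subsample pixels rather than use every correspondence.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        obj_pts, img_pts, camera_matrix, None, reprojectionError=3.0)
    return (rvec, tvec) if ok else None
```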

    New 3D scanning techniques for complex scenes

    Get PDF
    This thesis presents new 3D scanning methods for complex scenes, such as surfaces with fine-scale geometric details, translucent objects, low-albedo objects, glossy objects, scenes with interreflection, and discontinuous scenes. Starting from the observation that specular reflection is a reliable visual cue for surface mesostructure perception, we propose a progressive acquisition system that captures a dense specularity field as the only information for mesostructure reconstruction. Our method can efficiently recover surfaces with fine-scale geometric details from complex real-world objects. Translucent objects pose a difficult problem for traditional optical 3D scanning techniques. We analyze and compare two descattering methods, phase-shifting and polarization, and further present several phase-shifting- and polarization-based methods for high-quality 3D scanning of translucent objects. We introduce the concept of modulation-based separation, where a high-frequency signal is multiplied on top of another signal. The modulated signal inherits the separation properties of the high-frequency signal and allows us to remove artifacts due to global illumination. This method can be used for efficient 3D scanning of scenes with significant subsurface scattering and interreflections.
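    One of the building blocks mentioned above, three-step phase-shifting, can be sketched as follows; pattern projection, phase unwrapping and the descattering (polarization / modulation) stages of the thesis are omitted.

```python
# Minimal sketch of three-step phase-shifting decoding.
import numpy as np

def decode_three_step(i1, i2, i3):
    """i1, i2, i3: images captured under sinusoidal patterns shifted by 0,
    2*pi/3 and 4*pi/3. Returns the wrapped phase and a modulation map that
    serves as a per-pixel confidence measure."""
    i1, i2, i3 = (np.asarray(x, dtype=np.float64) for x in (i1, i2, i3))
    num = np.sqrt(3.0) * (i3 - i2)
    den = 2.0 * i1 - i2 - i3
    phase = np.arctan2(num, den)                  # wrapped to (-pi, pi]
    modulation = np.sqrt(num**2 + den**2) / 3.0   # low values flag unreliable pixels
    return phase, modulation
```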

    Differential Tracking through Sampling and Linearizing the Local Appearance Manifold

    Get PDF
    Recovering motion information from input camera image sequences is a classic problem of computer vision. Conventional approaches estimate motion from either dense optical flow or sparse feature correspondences identified across successive image frames. Among other things, performance depends on the accuracy of the feature detection, which can be problematic in scenes that exhibit view-dependent geometric or photometric behaviors such as occlusion, semi-transparency, specularity and curved reflections. Beyond feature measurements, researchers have also developed approaches that directly utilize appearance (intensity) measurements. Such appearance-based approaches eliminate the need for feature extraction and avoid the difficulty of identifying correspondences. However, the simplicity of on-line processing of image features is usually traded for complexity in off-line modeling of the appearance function. Because the appearance function is typically very nonlinear, learning it usually requires an impractically large number of training samples. I will present a novel appearance-based framework that can be used to estimate rigid motion in a manner that is computationally simple and does not require global modeling of the appearance function. The basic idea is as follows. An n-pixel image can be considered as a point in an n-dimensional appearance space. When an object in the scene or the camera moves, the image point moves along a low-dimensional appearance manifold. While globally nonlinear, the appearance manifold can be locally linearized using a small number of nearby image samples. This linear approximation of the local appearance manifold defines a mapping between the images and the underlying motion parameters, allowing the motion estimation to be formulated as solving a linear system. I will address three key issues related to motion estimation: how to acquire local appearance samples, how to derive a local linear approximation given appearance samples, and whether the linear approximation is sufficiently close to the real local appearance manifold. In addition, I will present a novel approach to motion segmentation that utilizes the same appearance-based framework to classify individual image pixels into groups associated with different underlying rigid motions.
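    The local linearization can be sketched as follows, assuming a hypothetical `render_or_capture` routine that acquires an image for a given motion parameter vector; sample placement and the use of finite differences are illustrative simplifications, not the dissertation's acquisition scheme.

```python
# Minimal sketch: nearby image samples define a linear map between appearance
# differences and motion parameters, so a new frame's motion follows from least squares.
import numpy as np

def build_local_basis(render_or_capture, theta0, deltas):
    """theta0: (p,) reference motion parameters; deltas: (p,) small offsets.
    Returns the flattened reference image and an (n_pixels, p) appearance Jacobian."""
    i0 = render_or_capture(theta0).ravel().astype(np.float64)
    cols = []
    for k, d in enumerate(deltas):
        theta = theta0.copy()
        theta[k] += d
        ik = render_or_capture(theta).ravel().astype(np.float64)
        cols.append((ik - i0) / d)               # finite-difference appearance derivative
    return i0, np.stack(cols, axis=1)

def estimate_motion(i_new, i0, jacobian):
    """Solve J * dtheta ~= (i_new - i0) in the least-squares sense."""
    residual = i_new.ravel().astype(np.float64) - i0
    dtheta, *_ = np.linalg.lstsq(jacobian, residual, rcond=None)
    return dtheta
```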