    Shape from periodic texture using the eigenvectors of local affine distortion

    This paper shows how the local slant and tilt angles of regularly textured curved surfaces can be estimated directly, without the need for iterative numerical optimization. We work in the frequency domain and measure texture distortion using the affine distortion of the pattern of spectral peaks. The key theoretical contribution is to show that the directions of the eigenvectors of the affine distortion matrices can be used to estimate local slant and tilt angles of tangent planes to curved surfaces. In particular, the leading eigenvector points in the tilt direction. Although not as geometrically transparent, the direction of the second eigenvector can be used to estimate the slant direction. The required affine distortion matrices are computed using the correspondences between spectral peaks, established on the basis of their energy ordering. We apply the method to a variety of real-world and synthetic imagery.
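
    The eigenvector step described above can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the function names are hypothetical, the peak correspondences are assumed to be already established, and the least-squares fit is only one plausible way to obtain the 2x2 distortion matrix. The synthetic check uses a simplified orthographic planar model (an assumption made here for testing) in which spectral peaks are stretched by 1/cos(slant) along the tilt direction.

```python
import numpy as np

def estimate_affine(src_peaks, dst_peaks):
    """Least-squares 2x2 matrix A such that dst ≈ A @ src for each matched peak.

    Both inputs are (N, 2) arrays of peak positions in the frequency plane,
    with row i of src_peaks matched to row i of dst_peaks.
    """
    src = np.asarray(src_peaks, dtype=float)
    dst = np.asarray(dst_peaks, dtype=float)
    # Row form of the model is dst = src @ A.T, so solve for A.T and transpose.
    a_transposed, *_ = np.linalg.lstsq(src, dst, rcond=None)
    return a_transposed.T

def tilt_from_affine(affine):
    """Angle in [0, pi) of the eigenvector with the largest-magnitude eigenvalue."""
    eigenvalues, eigenvectors = np.linalg.eig(affine)
    leading = eigenvectors[:, np.argmax(np.abs(eigenvalues))].real
    # An eigenvector has no sign, so the direction is reported modulo pi.
    return float(np.arctan2(leading[1], leading[0]) % np.pi)

# Synthetic check under the assumed orthographic model (hypothetical values).
tilt_true = np.deg2rad(30.0)
slant_true = np.deg2rad(40.0)
rot = np.array([[np.cos(tilt_true), -np.sin(tilt_true)],
                [np.sin(tilt_true),  np.cos(tilt_true)]])
distortion = rot @ np.diag([1.0 / np.cos(slant_true), 1.0]) @ rot.T

reference_peaks = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 2.0]])
distorted_peaks = reference_peaks @ distortion.T

estimated = estimate_affine(reference_peaks, distorted_peaks)
tilt_estimate = tilt_from_affine(estimated)
```

    The slant estimate from the second eigenvector is not sketched, because the abstract does not give the relation it relies on.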

    Computational theory of line drawing interpretation

    This work studied the recovery of the three-dimensional structure of visible surfaces depicted in an image, emphasizing the role of geometric cues present in line drawings. Three key components are line classification, line interpretation, and surface interpolation. A model for three-dimensional line interpretation and surface orientation was refined, and a theory for the recovery of surface shape from surface marking geometry was developed. A new approach to the classification of edges was developed and implemented; signatures were deduced for each of several edge types, expressed in terms of correlational properties of the image intensities in the vicinity of the edge. A computer program was developed that evaluates image edges against these prototype signatures.

    Relating Multimodal Imagery Data in 3D

    This research develops and improves the fundamental mathematical approaches and techniques required to relate imagery and imagery-derived multimodal products in 3D. Image registration, in a 2D sense, will always be limited by the 3D effects of viewing geometry on the target. Therefore, effects such as occlusion, parallax, shadowing, and terrain/building elevation can often be mitigated with even a modest amount of 3D target modeling. Additionally, the imaged scene may appear radically different based on the sensed modality of interest; this is evident from the differences in visible, infrared, polarimetric, and radar imagery of the same site. This thesis develops a 'model-centric' approach to relating multimodal imagery in a 3D environment. By correctly modeling a site of interest, both geometrically and physically, it is possible to remove or mitigate some of the most difficult challenges associated with multimodal image registration. To accomplish this, the mathematical framework necessary to relate imagery to geometric models is thoroughly examined. Since geometric models may need to be generated to apply this 'model-centric' approach, this research develops methods to derive 3D models from imagery and LIDAR data. Of critical note is the implementation of complementary techniques for relating multimodal imagery that utilize the geometric model in concert with physics-based modeling to simulate scene appearance under diverse imaging scenarios. Finally, the often neglected final phase of mapping localized image registration results back to the world coordinate system model for final data archival is addressed. In short, once a target site is properly modeled, both geometrically and physically, it is possible to orient the 3D model to the same viewing perspective as a captured image to enable proper registration. If done accurately, the synthetic model's physical appearance can simulate the imaged modality of interest while simultaneously removing the 3D ambiguity between the model and the captured image. Once registered, the captured image can then be archived as a texture map on the geometric site model. In this way, the 3D information that was lost when the image was acquired can be regained and properly related with other datasets for data fusion and analysis.

    Sedimentological characterization of Antarctic moraines using UAVs and Structure-from-Motion photogrammetry

    In glacial environments, particle-size analysis of moraines provides insights into clast origin, transport history, depositional mechanism and processes of reworking. Traditional methods for grain-size classification are labour-intensive, physically intrusive and limited to patch-scale (1 m²) observation. We develop emerging, high-resolution ground- and unmanned aerial vehicle-based ‘Structure-from-Motion’ (UAV-SfM) photogrammetry to recover grain-size information across a moraine surface in the Heritage Range, Antarctica. SfM data products were benchmarked against equivalent datasets acquired using terrestrial laser scanning, and were found to be accurate to within 1.7 and 50 mm for patch- and site-scale modelling, respectively. Grain-size distributions were obtained through digital grain classification, or ‘photo-sieving’, of patch-scale SfM orthoimagery. Photo-sieved distributions were accurate to <2 mm compared to control distributions derived from dry sieving. A relationship between patch-scale median grain size and the standard deviation of local surface elevations was applied to a site-scale UAV-SfM model to facilitate upscaling and the production of a spatially continuous map of the median grain size across a 0.3 km² area of moraine. This highly automated workflow for site-scale sedimentological characterization eliminates much of the subjectivity associated with traditional methods and forms a sound basis for subsequent glaciological process interpretation and analysis.
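
    The upscaling step, from local elevation variability to a map of median grain size, could look roughly like the sketch below. The abstract does not state the form of the relationship, so the linear fit is an assumption, as are the function names, the non-overlapping window scheme, and all numeric values.

```python
import numpy as np

def local_elevation_std(dem, window):
    """Standard deviation of elevation in non-overlapping window x window blocks.

    Rows and columns that do not fill a complete block are discarded.
    """
    dem = np.asarray(dem, dtype=float)
    rows = (dem.shape[0] // window) * window
    cols = (dem.shape[1] // window) * window
    blocks = dem[:rows, :cols].reshape(rows // window, window,
                                       cols // window, window)
    return blocks.std(axis=(1, 3))

def fit_roughness_to_d50(patch_roughness, patch_d50):
    """Fit d50 = slope * roughness + intercept (assumed linear form)."""
    slope, intercept = np.polyfit(patch_roughness, patch_d50, deg=1)
    return float(slope), float(intercept)

def map_median_grain_size(dem, window, slope, intercept):
    """Apply the patch-scale relationship to every block of a site-scale DEM."""
    return slope * local_elevation_std(dem, window) + intercept

# Hypothetical calibration data: roughness and photo-sieved median grain size.
slope, intercept = fit_roughness_to_d50([1.0, 2.0, 3.0], [12.0, 22.0, 32.0])

# Hypothetical site-scale elevation grid; only the top-right block is rough.
site_dem = np.array([[0, 0, 1, 3],
                     [0, 0, 3, 1],
                     [2, 2, 5, 5],
                     [2, 2, 5, 5]])
d50_map = map_median_grain_size(site_dem, window=2,
                                slope=slope, intercept=intercept)
```

    In practice the elevation grid would probably need detrending before the standard deviation is taken, so that hillslope gradient is not mistaken for grain-scale roughness; that step is omitted here.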

    Joint-SRVDNet: Joint Super Resolution and Vehicle Detection Network

    In many domestic and military applications, aerial vehicle detection and super-resolution algorithms are frequently developed and applied independently. However, aerial vehicle detection on super-resolved images remains a challenging task due to the lack of discriminative information in the super-resolved images. To address this problem, we propose a Joint Super-Resolution and Vehicle Detection Network (Joint-SRVDNet) that tries to generate discriminative, high-resolution images of vehicles from low-resolution aerial images. First, aerial images are up-scaled by a factor of 4x using a Multi-scale Generative Adversarial Network (MsGAN), which has multiple intermediate outputs with increasing resolutions. Second, a detector is trained on super-resolved images that are upscaled by a factor of 4x using the MsGAN architecture. Finally, the detection loss is minimized jointly with the super-resolution loss to encourage the target detector to be sensitive to the subsequent super-resolution training. The network jointly learns hierarchical and discriminative features of targets and produces optimal super-resolution results. We perform both quantitative and qualitative evaluation of our proposed network on the VEDAI, xView and DOTA datasets. The experimental results show that our proposed framework achieves better visual quality than the state-of-the-art methods for aerial super-resolution with a 4x up-scaling factor and improves the accuracy of aerial vehicle detection.

    A survey of visual preprocessing and shape representation techniques

    Many recent theories and methods proposed for visual preprocessing and shape representation are summarized. The survey brings together research from the fields of biology, psychology, computer science, electrical engineering, and most recently, neural networks. It was motivated by the need to preprocess images for a sparse distributed memory (SDM), but the techniques presented may also prove useful for applying other associative memories to visual pattern recognition. The material of this survey is divided into three sections: an overview of biological visual processing; methods of preprocessing (extracting parts of shape, texture, motion, and depth); and shape representation and recognition (form invariance, primitives and structural descriptions, and theories of attention).

    Models for Motion Perception

    As observers move through the environment or shift their direction of gaze, the world moves past them. In addition, there may be objects that are moving differently from the static background, either rigid-body motions or nonrigid (e.g., turbulent) ones. This dissertation discusses several models for motion perception. The models rely on first measuring motion energy, a multi-resolution representation of motion information extracted from image sequences. The image flow model combines the outputs of a set of spatiotemporal motion-energy filters to estimate image velocity, consonant with current views regarding the neurophysiology and psychophysics of motion perception. A parallel implementation computes a distributed representation of image velocity that encodes both a velocity estimate and the uncertainty in that estimate. In addition, a numerical measure of image-flow uncertainty is derived. The egomotion model poses the detection of moving objects and the recovery of depth from motion as sensor fusion problems that necessitate combining information from different sensors in the presence of noise and uncertainty. Image sequences are segmented by finding image regions corresponding to entire objects that are moving differently from the stationary background. The turbulent flow model utilizes a fractal-based model of turbulence, and estimates the fractal scaling parameter of fractal image sequences from the outputs of motion-energy filters. Some preliminary results demonstrate the model's potential for discriminating image regions based on fractal scaling.

    Statistical Approaches to Inferring Object Shape from Single Images

    Depth inference is a fundamental problem of computer vision with a broad range of potential applications. Monocular depth inference techniques, particularly shape from shading, date back to as early as the 1940s, when it was first used to study the shape of the lunar surface. Since then there has been ample research to develop depth inference algorithms using monocular cues. Most of these are based on physical models of image formation and rely on a number of simplifying assumptions that do not hold for real-world and natural imagery. Very few make use of the rich statistical information contained in real-world images and their 3D information, though there have been a few notable exceptions. The study of statistics of natural scenes has concentrated on outdoor scenes, which are cluttered. The statistics of scenes of single objects have been less studied, although such scenes are an essential part of daily human interaction with the environment. Inferring the shape of single objects is a very important computer vision problem which has captured the interest of many researchers over the past few decades and has applications in object recognition, robotic grasping, fault detection and Content Based Image Retrieval (CBIR). This thesis focuses on studying the statistical properties of single objects and their range images which can benefit shape inference techniques. I acquired two databases: Single Object Range and HDR (SORH) and the Eton Myers Database of single objects, including laser-acquired depth, binocular stereo, photometric stereo and High Dynamic Range (HDR) photography. I took a data-driven approach and studied the statistics of color and range images of real scenes of single objects along with whole 3D objects and uncovered some interesting trends in the data. The fractal structure of natural images was previously well known, and thought to be a universal property. However, my research showed that the fractal structure of single objects and surfaces is governed by a wholly different set of rules. Classical computer vision problems of binocular and multi-view stereo, photometric stereo, shape from shading, structure from motion, and others, all rely on accurate and complete models of which 3D shapes and textures are plausible in nature, to avoid producing unlikely outputs. Bayesian approaches are common for these problems, and hopefully the findings on the statistics of the shape of single objects from this work and others will both inform new and more accurate Bayesian priors on shape, and also enable more efficient probabilistic inference procedures.

    Multimodal Three Dimensional Scene Reconstruction, The Gaussian Fields Framework

    The focus of this research is on building 3D representations of real-world scenes and objects using different imaging sensors: primarily range acquisition devices (such as laser scanners and stereo systems) that allow the recovery of 3D geometry, and multi-spectral image sequences, including visual and thermal IR images, that provide additional scene characteristics. The crucial technical challenge that we addressed is the automatic point-sets registration task. In this context our main contribution is the development of an optimization-based method at the core of which lies a unified criterion that solves simultaneously for the dense point correspondence and transformation recovery problems. The new criterion has a straightforward expression in terms of the datasets and the alignment parameters and was used primarily for 3D rigid registration of point-sets. However, it also proved useful for feature-based multimodal image alignment. We derived our method from simple Boolean matching principles by approximation and relaxation. One of the main advantages of the proposed approach, as compared to the widely used class of Iterative Closest Point (ICP) algorithms, is convexity in the neighborhood of the registration parameters and continuous differentiability, allowing for the use of standard gradient-based optimization techniques. Physically, the criterion is interpreted in terms of a Gaussian Force Field exerted by one point-set on the other. Such a formulation proved useful for controlling and increasing the region of convergence, and hence allowing for more autonomy in correspondence tasks. Furthermore, the criterion can be computed with linear complexity using recently developed Fast Gauss Transform numerical techniques. In addition, we introduced a new local feature descriptor that was derived from visual saliency principles and which significantly enhanced the performance of the registration algorithm. The resulting technique was subjected to a thorough experimental analysis that highlighted its strengths and showed its limitations. Our current applications are in the field of 3D modeling for inspection, surveillance, and biometrics. However, since this matching framework can be applied to any type of data that can be represented as N-dimensional point-sets, the scope of the method is shown to reach many more pattern analysis applications.
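
    A criterion of the kind described above can be sketched as a sum of Gaussian-weighted distances between every pair of points from the two sets. The exact expression used in this work is not given in the abstract, so the form below is an assumption (it omits, for instance, any feature-descriptor term), and the function name and test values are hypothetical. The evaluation is the naive quadratic-cost version; the Fast Gauss Transform speed-up is not implemented.

```python
import numpy as np

def gaussian_fields_criterion(moving, fixed, rotation, translation, sigma):
    """Sum over all point pairs of exp(-||R p_i + t - q_j||^2 / sigma^2).

    moving: (N, D) points p_i, fixed: (M, D) points q_j,
    rotation: (D, D) matrix R, translation: (D,) vector t.
    The value is smooth in R and t and grows as the two sets overlap.
    """
    transformed = moving @ rotation.T + translation
    # Pairwise differences, shape (N, M, D).
    diff = transformed[:, None, :] - fixed[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return float(np.exp(-sq_dist / sigma ** 2).sum())

# Hypothetical example: a known rigid motion about the z axis.
angle = np.deg2rad(20.0)
true_rotation = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                          [np.sin(angle),  np.cos(angle), 0.0],
                          [0.0,            0.0,           1.0]])
true_translation = np.array([0.5, -0.2, 0.1])

fixed_points = np.array([[0.0, 0.0, 0.0],
                         [1.0, 0.0, 0.0],
                         [0.0, 2.0, 0.0],
                         [0.0, 0.0, 3.0]])
# Build the moving set so that the true motion maps it exactly onto fixed_points.
moving_points = (fixed_points - true_translation) @ true_rotation

score_aligned = gaussian_fields_criterion(
    moving_points, fixed_points, true_rotation, true_translation, sigma=1.0)
score_identity = gaussian_fields_criterion(
    moving_points, fixed_points, np.eye(3), np.zeros(3), sigma=1.0)
```

    A larger sigma widens the basin around the optimum at the cost of blurring fine alignment, which is one way to read the abstract's remark about controlling the region of convergence.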

    Automatic Registration of Optical Aerial Imagery to a LiDAR Point Cloud for Generation of City Models

    This paper presents a framework for automatic registration of both the optical and 3D structural information extracted from oblique aerial imagery to a Light Detection and Ranging (LiDAR) point cloud without prior knowledge of an initial alignment. The framework employs a coarse-to-fine strategy in the estimation of the registration parameters. First, a dense 3D point cloud and the associated relative camera parameters are extracted from the optical aerial imagery using a state-of-the-art 3D reconstruction algorithm. Next, a digital surface model (DSM) is generated from both the LiDAR and the optical imagery-derived point clouds. Coarse registration parameters are then computed from salient features extracted from the LiDAR and optical imagery-derived DSMs. The registration parameters are further refined using the iterative closest point (ICP) algorithm to minimize global error between the registered point clouds. The novelty of the proposed approach is in the computation of salient features from the DSMs, and the selection of matching salient features using geometric invariants coupled with Normalized Cross Correlation (NCC) match validation. The feature extraction and matching process enables the automatic estimation of the coarse registration parameters required for initializing the fine registration process. The registration framework is tested on a simulated scene and aerial datasets acquired in real urban environments. Results demonstrate the robustness of the framework for registering optical and 3D structural information extracted from aerial imagery to a LiDAR point cloud, when co-existing initial registration parameters are unavailable.
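
    The NCC match-validation step can be illustrated with a small sketch that scores pairs of equally sized DSM patches and keeps those above a threshold. The function names, the patch values, and the 0.9 threshold are hypothetical; the abstract does not say how patches are extracted or which threshold is used.

```python
import numpy as np

def normalized_cross_correlation(patch_a, patch_b):
    """Zero-mean NCC of two equally sized patches, in [-1, 1].

    Returns 0.0 when either patch is constant, because the score
    is undefined in that case.
    """
    a = np.asarray(patch_a, dtype=float).ravel()
    b = np.asarray(patch_b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0
    return float(a @ b / denom)

def validate_matches(patch_pairs, threshold):
    """Indices of candidate matches whose NCC score reaches the threshold."""
    return [i for i, (pa, pb) in enumerate(patch_pairs)
            if normalized_cross_correlation(pa, pb) >= threshold]

# Hypothetical elevation patches around candidate salient features.
ramp = np.arange(9.0).reshape(3, 3)
candidates = [
    (ramp, 2.0 * ramp + 5.0),  # same shape, different height scale and offset
    (ramp, ramp.T),            # similar values, different orientation
]
accepted = validate_matches(candidates, threshold=0.9)
```

    Because the score subtracts each patch mean and divides by the patch norms, a constant elevation offset or scale difference between the two DSMs does not affect it.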
