
    Fully Automatic Registration of 3D Point Clouds

    We propose a novel technique for the registration of 3D point clouds which makes very few assumptions: we avoid any manual rough alignment or the use of landmarks, displacement can be arbitrarily large, and the two point sets can have very little overlap. Crude alignment is achieved by estimating the 3D rotation from two Extended Gaussian Images (EGIs), even when the data sets inducing them have only partial overlap. The technique is based on the correlation of the two EGIs in the Fourier domain and makes use of the spherical and rotational harmonic transforms. For pairs with low overlap which fail a critical verification step, the rotational alignment can be obtained by aligning constellation images generated from the EGIs. Rotationally aligned sets are matched by correlation using the Fourier transform of volumetric functions. A fine alignment is obtained in the final step by running Iterative Closest Point for just a few iterations.
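
    For illustration only, here is a minimal sketch of how an Extended Gaussian Image could be accumulated from estimated unit normals using equiangular binning; the bin counts, normalisation and the helper name extended_gaussian_image are assumptions, not the construction used in the paper.

```python
import numpy as np

def extended_gaussian_image(normals, n_theta=32, n_phi=64):
    """Bin unit surface normals into an equiangular spherical histogram (a simple EGI).

    normals : (N, 3) array of unit normals estimated from the point cloud.
    Returns an (n_theta, n_phi) histogram over (colatitude, longitude).
    """
    theta = np.arccos(np.clip(normals[:, 2], -1.0, 1.0))              # colatitude in [0, pi]
    phi = np.mod(np.arctan2(normals[:, 1], normals[:, 0]), 2 * np.pi)  # longitude in [0, 2*pi)
    hist, _, _ = np.histogram2d(
        theta, phi,
        bins=[n_theta, n_phi],
        range=[[0.0, np.pi], [0.0, 2 * np.pi]],
    )
    return hist / max(len(normals), 1)  # normalise so partial overlap changes mass, not shape

# Usage: egi_a = extended_gaussian_image(normals_a); egi_b = extended_gaussian_image(normals_b).
# The rotation is then sought by correlating the two EGIs over all 3D rotations,
# e.g. via spherical/rotational harmonic transforms as described above.
```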

    Learning Equivariant Representations

    State-of-the-art deep learning systems often require large amounts of data and computation. For this reason, leveraging known or unknown structure of the data is paramount. Convolutional neural networks (CNNs) are successful examples of this principle, their defining characteristic being shift-equivariance: because a filter slides over the input, when the input shifts, the response shifts by the same amount, exploiting the structure of natural images, in which semantic content is independent of absolute pixel position. This property is essential to the success of CNNs in audio, image and video recognition tasks. In this thesis, we extend equivariance to other kinds of transformations, such as rotation and scaling. We propose equivariant models for different transformations defined by groups of symmetries. The main contributions are (i) polar transformer networks, achieving equivariance to the group of similarities on the plane, (ii) equivariant multi-view networks, achieving equivariance to the group of symmetries of the icosahedron, (iii) spherical CNNs, achieving equivariance to the continuous 3D rotation group, (iv) cross-domain image embeddings, achieving equivariance to 3D rotations for 2D inputs, and (v) spin-weighted spherical CNNs, generalizing the spherical CNNs and achieving equivariance to 3D rotations for spherical vector fields. Applications include image classification, 3D shape classification and retrieval, panoramic image classification and segmentation, shape alignment and pose estimation. What these models have in common is that they leverage symmetries in the data to reduce sample and model complexity and improve generalization performance. The advantages are more significant on (but not limited to) challenging tasks where data is limited or input perturbations such as arbitrary rotations are present.
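
    As a minimal sketch of the shift-equivariance property described above (not code from the thesis), the following checks numerically that a circular 1D convolution commutes with a circular shift; the signal, filter and shift amount are arbitrary.

```python
import numpy as np

def circular_conv(x, w):
    """Circular 1D convolution implemented via the FFT."""
    n = len(x)
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(w, n)))

rng = np.random.default_rng(0)
x = rng.standard_normal(64)   # input signal
w = rng.standard_normal(7)    # filter
shift = 5

# Shift-equivariance: convolving a shifted input equals shifting the convolved output.
lhs = circular_conv(np.roll(x, shift), w)
rhs = np.roll(circular_conv(x, w), shift)
print(np.allclose(lhs, rhs))  # True
```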

    A Fast and Accurate Algorithm for Spherical Harmonic Analysis on HEALPix Grids with Applications to the Cosmic Microwave Background Radiation

    The Hierarchical Equal Area isoLatitude Pixelation (HEALPix) scheme is used extensively in astrophysics for data collection and analysis on the sphere. The scheme was originally designed for studying the Cosmic Microwave Background (CMB) radiation, which represents the first light to travel during the early stages of the universe's development and gives the strongest evidence for the Big Bang theory to date. Refined analysis of the CMB angular power spectrum can lead to revolutionary developments in understanding the nature of dark matter and dark energy. In this paper, we present a new method for performing spherical harmonic analysis for HEALPix data, which is a central component of computing and analyzing the angular power spectrum of the massive CMB data sets. The method uses a novel combination of a non-uniform fast Fourier transform, the double Fourier sphere method, and Slevinsky's fast spherical harmonic transform (Slevinsky, 2019). For a HEALPix grid with $N$ pixels (points), the computational complexity of the method is $\mathcal{O}(N \log^2 N)$, with an initial set-up cost of $\mathcal{O}(N^{3/2} \log N)$. This compares favorably with the $\mathcal{O}(N^{3/2})$ runtime complexity of the current methods available in the HEALPix software when multiple maps need to be analyzed at the same time. Using numerical experiments, we demonstrate that the new method also appears to provide better accuracy over the entire angular power spectrum of synthetic data when compared to the current methods, with a convergence rate at least two times higher.
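
    The fast transform itself is not reproduced here; as a hedged illustration of the conventional route that the paper compares against, the sketch below computes the angular power spectrum of a synthetic HEALPix map with the healpy package, where the resolution nside and the input spectrum cl_in are placeholder choices, not CMB physics.

```python
import numpy as np
import healpy as hp

nside = 64                                      # HEALPix resolution parameter
npix = hp.nside2npix(nside)                     # N = 12 * nside**2 pixels

# Synthesise a toy map from an assumed angular power spectrum, then recover C_ell.
lmax = 2 * nside
cl_in = 1.0 / (np.arange(lmax + 1) + 1.0) ** 2  # placeholder spectrum
cmb_map = hp.synfast(cl_in, nside, lmax=lmax)

cl_out = hp.anafast(cmb_map, lmax=lmax)         # spherical harmonic analysis + power spectrum
print(cl_out[:5])
```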

    Analysing and Enhancing the Coarse Registration Pipeline

    The current and continual development of sensors and imaging systems capable of acquiring three-dimensional data provides a novel form in which the world can be expressed and examined. The acquisition process, however, is often limited by imaging systems being able to view only a portion of a scene or object from a single pose at a given time. A full representation can still be produced by shifting the system and registering subsequent acquisitions together. While many solutions to the registration problem have been proposed, there is no quintessential approach appropriate for all situations. This dissertation aims to coarsely register range images or point-clouds of a priori unknown pose by matching their overlapping regions. Using spherical harmonics to correlate normals in a coarse registration pipeline has previously been shown to be an effective means of registering partially overlapping point-clouds. The advantage of normals is their translation invariance, which permits the rotation and translation to be decoupled and determined separately. Examining each step of this pipeline in depth allows its registration capability to be quantified and identifies aspects which can be enhanced to further improve registration performance. The pipeline consists of three primary steps: identifying the rotation using spherical harmonics, identifying the translation in the Fourier domain, and automatically verifying whether the alignment is correct. Once coarse registration has been achieved, a fine registration algorithm can be used to refine and complete the alignment. This dissertation provides major contributions to knowledge at each step of the pipeline. Point-clouds with known ground-truth are used to examine the pipeline's capability, allowing its limitations to be determined, an analysis which has not been performed previously. This examination allowed modifications to individual components to be introduced and measured, establishing the benefit they provide. The rotation step received the greatest attention, as it is the primary weakness of the pipeline, especially when the nature of the overlap between point-clouds is unknown. Examining three schemes for binning normals found that equiangular binning, when appropriately normalised, had only a marginal decrease in accuracy with respect to the icosahedron and the introduced Fibonacci schemes. Overall, equiangular binning was the most appropriate due to its natural affinity for fast spherical-harmonic conversion. Weighting normals was found to provide the greatest benefit to registration performance. The introduction of a straightforward method of combining two different weighting schemes using the orthogonality of complex values increased correct alignments by approximately 80% with respect to the next-best scheme; additionally, point-cloud pairs with overlap as low as 5% were able to be brought into correct alignment. Transform transitivity, one of two introduced verification strategies, correctly classified almost 100% of point-cloud pair registrations when there were sufficient correct alignments. The enhancements made to the coarse registration pipeline throughout this dissertation provide significant improvements to its performance. The result is a pipeline with state-of-the-art capabilities that allow it to register point-clouds with minimal overlap and to correct alignments that are classified as misaligned. Even with its exceptional performance, it is unlikely that this pipeline has yet reached its pinnacle, as the introduced enhancements have the potential for further development.
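
    As a rough sketch of the translation step only (assuming the clouds are already rotationally aligned and voxelised onto a common occupancy grid, which is not necessarily how the dissertation implements it), phase correlation via the 3D FFT can recover the integer voxel offset:

```python
import numpy as np

def estimate_translation(vol_a, vol_b):
    """Estimate the integer voxel shift d such that vol_b(x) ~ vol_a(x - d),
    by phase correlation of two occupancy volumes of identical shape."""
    fa = np.fft.fftn(vol_a)
    fb = np.fft.fftn(vol_b)
    cross_power = fb * np.conj(fa)
    cross_power /= np.maximum(np.abs(cross_power), 1e-12)   # keep phase only
    corr = np.real(np.fft.ifftn(cross_power))
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Wrap peaks in the upper half of each axis to negative shifts.
    return np.array([p if p <= s // 2 else p - s for p, s in zip(peak, corr.shape)])

# Usage: shift_voxels = estimate_translation(voxelise(cloud_a), voxelise(cloud_b)),
# where voxelise() is a hypothetical helper that rasterises a point-cloud onto a
# common occupancy grid; the final metric translation is shift_voxels * voxel_size.
```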

    A Panorama on Multiscale Geometric Representations, Intertwining Spatial, Directional and Frequency Selectivity

    The richness of natural images makes the quest for optimal representations in image processing and computer vision challenging. This observation has not prevented the design of image representations that trade off between efficiency and complexity, while achieving accurate rendering of smooth regions as well as reproducing faithful contours and textures. The most recent ones, proposed in the past decade, share a hybrid heritage highlighting the multiscale and oriented nature of edges and patterns in images. This paper presents a panorama of the aforementioned literature on decompositions in multiscale, multi-orientation bases or dictionaries. They typically exhibit redundancy to improve sparsity in the transformed domain and, sometimes, invariance with respect to simple geometric deformations (translation, rotation). Oriented multiscale dictionaries extend traditional wavelet processing and may offer rotation invariance. Highly redundant dictionaries require specific algorithms to simplify the search for an efficient (sparse) representation. We also discuss the extension of multiscale geometric decompositions to non-Euclidean domains such as the sphere or arbitrary meshed surfaces. The etymology of panorama suggests an overview based on a choice of partially overlapping "pictures". We hope that this paper will contribute to the appreciation and apprehension of a stream of current research directions in image understanding.
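
    As a toy, hedged illustration of a multiscale decomposition (a plain separable wavelet transform, far simpler than the oriented, redundant dictionaries surveyed in the paper), the following uses PyWavelets on a random stand-in image:

```python
import numpy as np
import pywt

rng = np.random.default_rng(0)
image = rng.standard_normal((128, 128))   # stand-in for a natural image

# Two-level separable 2D wavelet decomposition: one coarse approximation plus,
# at each scale, three oriented detail bands (horizontal, vertical, diagonal).
coeffs = pywt.wavedec2(image, wavelet='db2', level=2)
approx, details_level2, details_level1 = coeffs
print(approx.shape, [d.shape for d in details_level1])

# Perfect reconstruction from the multiscale coefficients.
recon = pywt.waverec2(coeffs, wavelet='db2')
print(np.allclose(recon, image))
```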

    On unifying sparsity and geometry for image-based 3D scene representation

    Demand has emerged for next-generation visual technologies that go beyond conventional 2D imaging. Such technologies should capture and communicate all perceptually relevant three-dimensional information about an environment to a distant observer, providing a satisfying, immersive experience. Camera networks offer a low-cost solution to the acquisition of 3D visual information by capturing multi-view images from different viewpoints. However, the camera's representation of the data is not ideal for common tasks such as data compression or 3D scene analysis, as it does not make the 3D scene geometry explicit. Image-based scene representations fundamentally require a multi-view image model that facilitates extraction of the underlying geometrical relationships between the cameras and scene components. Developing new, efficient multi-view image models is thus one of the major challenges in image-based 3D scene representation methods. This dissertation focuses on defining and exploiting a new method for multi-view image representation, from which the 3D geometry information is easily extractable, and which is additionally highly compressible. The method is based on sparse image representation using an overcomplete dictionary of geometric features, where a single image is represented as a linear combination of a few fundamental image features (edges, for example). We construct the dictionary by applying a unitary operator to an analytic function, which introduces a composition of geometric transforms (translations, rotations and anisotropic scaling) to that function. The advantage of this approach is that the features across multiple views can be related with a single composition of transforms. We then establish a connection between image components and scene geometry by defining the transforms that satisfy the multi-view geometry constraint, and obtain a new geometric multi-view correlation model. We first address the construction of dictionaries for images acquired by omnidirectional cameras, which are particularly convenient for scene representation due to their wide field of view. Since most omnidirectional images can be uniquely mapped to spherical images, we form a dictionary by applying motions on the sphere, rotations, and anisotropic scaling to a function that lives on the sphere. We have used this dictionary and a sparse approximation algorithm, Matching Pursuit, for compression of omnidirectional images, and additionally for coding 3D objects represented as spherical signals. Both methods offer better rate-distortion performance than state-of-the-art schemes at low bit rates. The novel multi-view representation method and the dictionary on the sphere are then exploited for the design of a distributed coding method for multi-view omnidirectional images. In a distributed scenario, cameras compress acquired images without communicating with each other. Using a reliable model of correlation between views, distributed coding can achieve higher compression ratios than independent compression of each image. However, the lack of a proper model has been an obstacle for distributed coding in camera networks for many years. We propose to use our geometric correlation model for distributed multi-view image coding with side information. The encoder employs a coset coding strategy, developed by dictionary partitioning based on atom shape similarity and multi-view geometry constraints. Our method results in significant rate savings compared to independent coding. An additional contribution of the proposed correlation model is that it gives information about the scene geometry, leading to a new camera pose estimation method using an extremely small amount of data from each camera. Finally, we develop a method for learning stereo visual dictionaries based on the new multi-view image model. Although dictionary learning for still images has received a lot of attention recently, dictionary learning for stereo images has been investigated only sparingly. Our method maximizes the likelihood that a set of natural stereo images is efficiently represented with selected stereo dictionaries, where the multi-view geometry constraint is included in the probabilistic modeling. Experimental results demonstrate that including the geometric constraints in learning leads to stereo dictionaries that give better distributed stereo matching and better approximation properties than randomly selected dictionaries. We show that learning dictionaries for optimal scene representation based on the novel correlation model improves the camera pose estimation and can be beneficial for distributed coding.
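
    The geometric dictionaries on the sphere are not reproduced here; as a hedged sketch of the greedy sparse approximation step only, the following runs generic Matching Pursuit over a random unit-norm overcomplete dictionary with illustrative sizes:

```python
import numpy as np

def matching_pursuit(signal, dictionary, n_atoms=5):
    """Greedy sparse approximation: repeatedly pick the (unit-norm) atom most
    correlated with the residual and subtract its contribution."""
    residual = signal.astype(float).copy()
    coeffs = np.zeros(dictionary.shape[1])
    for _ in range(n_atoms):
        correlations = dictionary.T @ residual
        k = np.argmax(np.abs(correlations))
        coeffs[k] += correlations[k]
        residual -= correlations[k] * dictionary[:, k]
    return coeffs, residual

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)               # unit-norm atoms (overcomplete: 256 > 64)
x = D[:, [3, 100]] @ np.array([2.0, -1.5])    # signal built from two atoms
coeffs, residual = matching_pursuit(x, D, n_atoms=10)
print(np.count_nonzero(coeffs), np.linalg.norm(residual))
```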

    Discrimination Analysis Using Multi-Object Statistics of Shape and Pose

    A main focus of statistical shape analysis is the description of variability of a population of geometric objects. In this paper, we present work towards modeling the shape and pose variability of sets of multiple objects. Principal geodesic analysis (PGA) is the extension of the standard technique of principal component analysis (PCA) into the nonlinear Riemannian symmetric space of pose and our medial m-rep shape description, a space in which the use of PCA would be incorrect. In this paper, we discuss the decoupling of pose and shape in multi-object sets using different normalization settings. Further, we introduce methods of describing the statistics of object pose and object shape, both separately and simultaneously, using a novel extension of PGA. We demonstrate our methods in an application to a longitudinal pediatric autism study with object sets of 10 subcortical structures in a population of 47 subjects. The results show that global scale accounts for most of the major mode of variation across time. Furthermore, the PGA components and the corresponding distribution of different subject groups vary significantly depending on the choice of normalization, which illustrates the importance of global and local pose alignment in multi-object shape analysis. Finally, we present results of using distance-weighted discrimination analysis (DWD) in an attempt to use pose and shape features to separate subjects according to diagnosis, as well as to visualize discriminating differences.
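
    As a minimal, hedged sketch of the linear analogue underlying PGA (ordinary Euclidean PCA on flattened shape features, not PGA itself, which replaces these operations with Riemannian exponential and log maps), one might write:

```python
import numpy as np

def pca_modes(data, n_modes=3):
    """Standard PCA via the SVD: returns the mean, the leading modes of
    variation, and the per-mode standard deviations.

    data : (n_subjects, n_features) matrix, e.g. flattened landmark coordinates.
    """
    mean = data.mean(axis=0)
    centered = data - mean
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    std = s / np.sqrt(data.shape[0] - 1)
    return mean, vt[:n_modes], std[:n_modes]

rng = np.random.default_rng(0)
population = rng.standard_normal((47, 30))   # 47 subjects, toy 30-dim shape features
mean, modes, std = pca_modes(population, n_modes=3)
print(modes.shape, std)
```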

    Bayesian Variational Regularisation for Dark Matter Reconstruction with Uncertainty Quantification

    Despite the great wealth of cosmological knowledge accumulated since the early 20th century, the nature of dark matter, which accounts for ~85% of the matter content of the universe, remains elusive. Unfortunately, though dark matter is scientifically interesting, with implications for our fundamental understanding of the Universe, it cannot be directly observed. Instead, dark matter may be inferred from, e.g., the optical distortion (lensing) of distant galaxies which, at linear order, manifests as a perturbation to the apparent magnitude (convergence) and ellipticity (shearing). Ensemble observations of the shear are collected and leveraged to construct estimates of the convergence, which can be related directly to the universal dark-matter distribution. Imminent stage IV surveys are forecast to accrue an unprecedented quantity of cosmological information; a discriminative partition of this information is accessible through the convergence and is disproportionately concentrated at high angular resolutions, where the echoes of cosmological evolution under gravity are most apparent. Capitalising on advances in probability concentration theory, this thesis merges the paradigms of Bayesian inference and optimisation to develop hybrid convergence inference techniques which are scalable, statistically principled, and operate over the Euclidean plane, the celestial sphere, and the 3-dimensional ball. Such techniques can quantify the plausibility of inferences at one-millionth of the computational overhead of competing sampling methods. These Bayesian techniques are applied to the hotly debated Abell 520 merging cluster, concluding that observational catalogues contain insufficient information to determine the existence of dark-matter self-interactions. Further, these techniques are applied to all public lensing catalogues, recovering what was then the largest global dark-matter mass-map. The primary methodological contributions of this thesis depend only on posterior log-concavity, paving the way towards a complete, potentially revolutionary, hybridisation with artificial intelligence techniques. These next-generation techniques are the first to operate over the full 3-dimensional ball, laying the foundations for statistically principled universal dark-matter cartography, and for the cosmological insights such advances may provide.
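
    As a hedged illustration of the standard direct convergence estimator that such Bayesian methods are typically compared against (planar Kaiser-Squires inversion, not the variational method of the thesis), with a synthetic field and an illustrative grid size:

```python
import numpy as np

def kaiser_squires(gamma1, gamma2):
    """Planar Kaiser-Squires: recover convergence kappa from shear (gamma1, gamma2)
    via kappa_hat(k) = conj(D(k)) * gamma_hat(k), with D = (k1 + i*k2)^2 / |k|^2."""
    n1, n2 = gamma1.shape
    k1 = np.fft.fftfreq(n1)[:, None]
    k2 = np.fft.fftfreq(n2)[None, :]
    k_sq = k1**2 + k2**2
    k_sq[0, 0] = 1.0                            # avoid division by zero; mean mode is unconstrained
    D = ((k1 + 1j * k2) ** 2) / k_sq
    gamma_hat = np.fft.fft2(gamma1 + 1j * gamma2)
    kappa_hat = np.conj(D) * gamma_hat
    kappa_hat[0, 0] = 0.0
    return np.real(np.fft.ifft2(kappa_hat))     # E-mode convergence

# Round trip: simulate shear from a toy convergence field, then invert.
rng = np.random.default_rng(0)
n = 128
kappa = rng.standard_normal((n, n))
k1 = np.fft.fftfreq(n)[:, None]
k2 = np.fft.fftfreq(n)[None, :]
k_sq = k1**2 + k2**2
k_sq[0, 0] = 1.0
D = ((k1 + 1j * k2) ** 2) / k_sq
gamma = np.fft.ifft2(D * np.fft.fft2(kappa))
kappa_rec = kaiser_squires(np.real(gamma), np.imag(gamma))
print(np.allclose(kappa_rec, kappa - kappa.mean(), atol=1e-8))  # True up to the mean mode
```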