Variational methods for shape and image registrations.
Estimation and analysis of deformation, either rigid or non-rigid, is an active area of research in various medical imaging and computer vision applications. Its importance stems from the inherent inter- and intra-variability in biological and biomedical object shapes and from the dynamic nature of the scenes usually dealt with in computer vision research. For instance, quantifying the growth of a tumor, recognizing a person's face, tracking a facial expression, or retrieving an object from a database requires estimating some sort of motion or deformation undergone by the object of interest. To solve these and similar problems, registration comes into play. This is the process of bringing two or more data sets into correspondence. Depending on the application at hand, these data sets can be, for instance, gray scale/color images or objects' outlines. In the latter case, one talks about shape registration, while in the former case, one talks about image/volume registration. In some situations, different types of data can be combined to establish point correspondences. One of the most important image analysis tools that benefits greatly from registration, and which is addressed in this dissertation, is image segmentation. This process consists of localizing objects in images. Several challenges are encountered in image segmentation, including noise, gray scale inhomogeneities, and occlusions. To cope with such issues, shape information is often incorporated into the segmentation process as a statistical model. Building such statistical models requires an accurate shape alignment approach. In addition, anatomical structures can be segmented accurately by registering the input data set with a predefined anatomical atlas. Variational approaches for shape/image registration and segmentation have received considerable interest in the past few years. 
Unlike traditional discrete approaches, variational methods are based on continuous modelling of the input data through the use of Partial Differential Equations (PDEs). This makes it possible to draw on the extensive literature on the theory and numerical methods for solving PDEs. This dissertation addresses the registration problem from a variational point of view, with more focus on shape registration. First, a novel variational framework for global-to-local shape registration is proposed. The input shapes are implicitly represented through their signed distance maps. A new Sum-of-Squared-Differences (SSD) criterion, which measures the disparity between the implicit representations of the input shapes, is introduced to recover the global alignment parameters. This new criterion has the advantage over some existing ones of accurately handling scale variations. In addition, the proposed alignment model is computationally less expensive. Complementary to the global registration field, the local deformation field is explicitly established between the two globally aligned shapes by minimizing a new energy functional. This functional incrementally and simultaneously updates the displacement field while keeping the corresponding implicit representation of the globally warped source shape as close to a signed distance function as possible. This is done under regularization constraints that enforce the smoothness of the recovered deformations. The overall process leads to a set of coupled equations that are solved simultaneously through a gradient descent scheme. Several applications in which the developed tools play a major role are addressed throughout this dissertation. For instance, some insight is given into how one can solve the challenging problem of three-dimensional face recognition in the presence of facial expressions. Statistical modelling of shapes is presented as a way of benefiting from the proposed shape registration framework. 
Second, this dissertation will visit th
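As a rough illustration of an SSD criterion on implicit shape representations, the following Python sketch compares the signed distance maps of two circles under a candidate similarity transform. It is not the dissertation's formulation: the circle shapes, the sampling grid, and the names `circle_sdf` and `ssd` are assumptions made for this example. It relies only on the fact that a signed distance function scales linearly under uniform scaling, which is why the source map is multiplied by the scale factor before it is compared with the target map.

```python
import math

def circle_sdf(x, y, cx, cy, r):
    """Signed distance to a circle: negative inside, positive outside."""
    return math.hypot(x - cx, y - cy) - r

def ssd(scale, tx, ty, n=41, half_width=3.0):
    """Sum of squared differences between scale * phi_source(p) and
    phi_target(scale * p + t), sampled on an n-by-n grid of points p.

    Hypothetical setup: the source is the unit circle at the origin and
    the target is a circle of radius 2 centred at (1, 0).
    """
    total = 0.0
    for i in range(n):
        for j in range(n):
            x = -half_width + 2.0 * half_width * i / (n - 1)
            y = -half_width + 2.0 * half_width * j / (n - 1)
            src = circle_sdf(x, y, 0.0, 0.0, 1.0)
            tgt = circle_sdf(scale * x + tx, scale * y + ty, 1.0, 0.0, 2.0)
            total += (scale * src - tgt) ** 2
    return total

aligned = ssd(2.0, 1.0, 0.0)      # the transform that maps source onto target
misaligned = ssd(1.0, 0.0, 0.0)   # identity transform
```

With the aligning transform (scale 2, translation (1, 0)) the criterion vanishes up to rounding, while the identity transform leaves a clearly positive residual. A gradient descent scheme would adjust the transform parameters to reduce this value.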
Recognizing human activity using RGBD data
Traditional computer vision algorithms try to understand the world using visible light cameras. However, this type of data source has inherent limitations. First, visible light images are sensitive to illumination changes and background clutter. Second, the 3D structural information of the scene is lost when projecting the 3D world onto 2D images. Recovering the 3D information from 2D images is a challenging problem. Range sensors, which capture 3D characteristics of the scene, have existed for over thirty years. However, earlier range sensors were either too expensive, difficult to use in human environments, slow at acquiring data, or provided a poor estimation of distance. Recently, easy access to RGBD data at real-time frame rates has been leading to a revolution in perception and has inspired much new research using RGBD data. I propose algorithms to detect persons and understand their activities using RGBD data. I demonstrate that the solutions to many computer vision problems may be improved with the added depth channel. The 3D structural information may give rise to algorithms with real-time and view-invariant properties in a faster and easier fashion. When both data sources are available, the features extracted from the depth channel may be combined with traditional features computed from RGB channels to generate more robust systems with enhanced recognition abilities, which may be able to deal with more challenging scenarios. As a starting point, the first problem is to find persons in various poses in the scene, whether moving or static. Localizing humans in RGB images is limited by the lighting conditions and background clutter. Depth images give alternative ways to find the humans in the scene. In the past, detection of humans from range data was usually achieved by tracking, which does not work for indoor person detection. 
In this thesis, I propose a model-based approach to detect persons using the structural information embedded in the depth image. I propose a 2D head contour model and a 3D head surface model to look for the head-shoulder part of the person. Then, a segmentation scheme is proposed to segment the full human body from the background and extract the contour. I also give a tracking algorithm based on the detection result. I further investigate the recognition of human actions and activities, and propose two features for this purpose. The first feature is drawn from the skeletal joint locations estimated from a depth image. It is a compact representation of the human posture called histograms of 3D joint locations (HOJ3D). This representation is view-invariant and the whole algorithm runs in real time. This feature may benefit many applications that need a fast estimate of the posture and action of the human subject. The second feature is a spatio-temporal feature for depth video, called the Depth Cuboid Similarity Feature (DCSF). The interest points are extracted using an algorithm that effectively suppresses noise and finds salient human motions. DCSF is extracted around each interest point and forms the description of the video contents. This descriptor can be used to recognize activities with no dependence on skeleton information or pre-processing steps such as motion segmentation, tracking, or even image de-noising or hole-filling. It is more flexible and widely applicable to many scenarios. Finally, all the features developed herein are combined to solve a novel problem: first-person human activity recognition using RGBD data. Traditional activity recognition algorithms focus on recognizing activities from a third-person perspective. I propose to recognize activities from a first-person perspective with RGBD data. 
This task is novel and extremely challenging because of the large amount of camera motion, caused either by self-exploration or by the response to the interaction. I extract 3D optical flow features as motion descriptors, 3D skeletal joint features as posture descriptors, and spatio-temporal features as local appearance descriptors to describe the first-person videos. To address the ego-motion of the camera, I propose an attention mask to guide the recognition procedures and separate the features in the ego-motion region from those in the independent-motion region. The 3D features are very useful for summarizing the discriminative information of the activities. In addition, combining the 3D features with existing 2D features yields more robust recognition results and makes the algorithm capable of dealing with more challenging cases.
Electrical and Computer Engineerin
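To make the idea of a histogram of 3D joint locations more concrete, here is a simplified Python sketch that bins the directions of skeletal joints, measured from a reference joint, into a spherical histogram. It is not the HOJ3D descriptor as defined in the thesis: the hard binning, the bin counts, and the name `joint_histogram` are assumptions. The sketch also omits any alignment of the reference frame to the body, so on its own it is invariant to translation and scale but not to viewpoint.

```python
import math

def joint_histogram(joints, center, n_azimuth=8, n_inclination=4):
    """Normalised histogram of joint directions relative to `center`.

    Each joint votes for one (inclination, azimuth) bin. The radius is
    discarded, so the result does not depend on the subject's size.
    """
    hist = [0.0] * (n_azimuth * n_inclination)
    counted = 0
    for (x, y, z) in joints:
        dx, dy, dz = x - center[0], y - center[1], z - center[2]
        r = math.sqrt(dx * dx + dy * dy + dz * dz)
        if r == 0.0:
            continue  # a joint at the reference point has no direction
        azimuth = math.atan2(dy, dx) % (2.0 * math.pi)         # [0, 2*pi)
        inclination = math.acos(max(-1.0, min(1.0, dz / r)))   # [0, pi]
        a = min(int(azimuth / (2.0 * math.pi) * n_azimuth), n_azimuth - 1)
        b = min(int(inclination / math.pi * n_inclination), n_inclination - 1)
        hist[b * n_azimuth + a] += 1.0
        counted += 1
    if counted:
        hist = [h / counted for h in hist]
    return hist

# Hypothetical joint positions, for illustration only.
joints = [(1.0, 0.5, 1.0), (-1.0, 2.0, 0.5), (0.3, -1.0, -2.0)]
h = joint_histogram(joints, (0.0, 0.0, 0.0))
```

A sequence of such per-frame histograms could then be fed to a classifier. Obtaining view invariance would additionally require expressing the joints in a body-centred coordinate frame before binning.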
The anisotropic grain size effect on the mechanical response of polycrystals: The role of columnar grain morphology in additively manufactured metals
Additively manufactured (AM) metals exhibit highly complex microstructures,
particularly with respect to grain morphology which typically features
heterogeneous grain size distribution, anomalous and anisotropic grain shapes,
and the so-called columnar grains. In general, the conventional morphological
descriptors are not suitable to represent complex and anisotropic grain
morphology of AM microstructures. The principal aspect of microstructural grain
morphology is the state of grain boundary spacing or grain size whose effect on
the mechanical response is known to be crucial. In this paper, we formally
introduce the notions of axial grain size and grain size anisotropy as robust
morphological descriptors which can concisely represent highly complex grain
morphologies. We instantiated a discrete sample of polycrystalline aggregate as
a representative volume element (RVE) which has random crystallographic
orientation and misorientation distributions. However, the instantiated RVE
incorporates the typical morphological features of AM microstructures including
distinctive grain size heterogeneity and anisotropic grain size owing to its
pronounced columnar grain morphology. We ensured that any anisotropy arising in
the macroscopic mechanical response of the instantiated sample is mainly
associated with its underlying anisotropic grain size. The RVE was then used
for meso-scale full-field crystal plasticity simulations corresponding to
uniaxial tensile deformation along different axes via a spectral solver and a
physics-based crystal plasticity constitutive model. Through the numerical
analyses, we were able to isolate the contribution of anisotropic grain size to
the anisotropy in the mechanical response of polycrystalline aggregates,
particularly those with the characteristic complex grain morphology of AM
metals. Such a contribution can be described by an inverse square relation
Retrieval of 3-Dimensional Rigid and Non-Rigid Objects
This dissertation focuses on the problem of 3D object retrieval from large
datasets in near real time. In order to address this task we focus on
three major subproblems of the field: (i) pose normalization of rigid 3D models
with applications to 3D object retrieval, (ii) non-rigid 3D object description
and (iii) search over rigid 3D object datasets based on 2D image queries.
Regarding the first of the three subproblems, 3D model pose normalization,
three novel pose normalization methods are presented, based on: (i) 3D
Reflective Object Symmetry (ROSy) and (ii, iii) 2D Reflective Object Symmetry
computed on Panoramic Views (SymPan and SymPan+). Considering the second
subproblem, a non-rigid 3D object retrieval methodology, based on the
properties of conformal geometry and graph-based topological information
(ConTopo++) has been developed. Furthermore, a string matching strategy for the
comparison of graphs that describe 3D objects is proposed. Regarding the third
subproblem, a 3D object retrieval method based on 2D range image queries that
represent partial views of real 3D objects, is presented. The complete 3D
objects of the database are described by a set of panoramic views and a
Bag-of-Visual-Words model is built using SIFT features extracted from them. The
methodologies developed and described in this dissertation are evaluated in
terms of retrieval accuracy and demonstrated using both quantitative and
qualitative measures via an extensive consistent evaluation against
state-of-the-art methods on standard datasets
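The Bag-of-Visual-Words step can be sketched in Python as follows: each local descriptor is assigned to its nearest codebook entry, and the assignments are accumulated into a normalised histogram. This is a generic illustration, not the dissertation's pipeline. The two-dimensional toy descriptors stand in for real SIFT descriptors, the codebook is given by hand rather than learned by clustering, and the function names are assumptions.

```python
def nearest_word(descriptor, codebook):
    """Index of the codebook entry closest to `descriptor` (squared Euclidean)."""
    best_index, best_distance = 0, float("inf")
    for k, word in enumerate(codebook):
        distance = sum((a - b) ** 2 for a, b in zip(descriptor, word))
        if distance < best_distance:
            best_index, best_distance = k, distance
    return best_index

def bovw_histogram(descriptors, codebook):
    """Normalised histogram of visual-word occurrences for one image."""
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[nearest_word(d, codebook)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * len(codebook)

# Toy data: two visual words and four descriptors from one panoramic view.
codebook = [(0.0, 0.0), (10.0, 10.0)]
descriptors = [(1.0, 0.0), (0.0, 1.0), (9.0, 9.0), (11.0, 10.0)]
signature = bovw_histogram(descriptors, codebook)
```

Retrieval would then compare the signature of a query image against the signatures of the stored panoramic views with a histogram distance of one's choice.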
Geometric Approaches for 3D Shape Denoising and Retrieval
A key issue in developing an accurate 3D shape recognition system is to design an efficient shape
descriptor for which an index can be built, and similarity queries can be answered efficiently. While
the overwhelming majority of prior work on 3D shape analysis has concentrated primarily on rigid
shape retrieval, many real objects such as articulated motions of humans are nonrigid and hence
can exhibit a variety of poses and deformations.
Motivated by the recent surge of interest in content-based analysis of 3D objects in computer-aided
design and multimedia computing, we develop in this thesis a unified theoretical and computational
framework for 3D shape denoising and retrieval by incorporating insights gained from
algebraic graph theory and spectral geometry. We first present a regularized kernel diffusion for
3D shape denoising by solving partial differential equations in the weighted graph-theoretic framework.
Then, we introduce a computationally fast approach for surface denoising using the vertex-centered
finite volume method coupled with the mesh covariance fractional anisotropy. Additionally,
we propose a spectral-geometric shape skeleton for 3D object recognition based on the second
eigenfunction of the Laplace-Beltrami operator in a bid to capture the global and local geometry
of 3D shapes. To further enhance the 3D shape retrieval accuracy, we introduce a graph matching
approach by assigning geometric features to each endpoint of the shape skeleton. Extensive experiments
are carried out on two 3D shape benchmarks to assess the performance of the proposed
shape retrieval framework in comparison with state-of-the-art methods. The experimental results
show that the proposed shape descriptor delivers best-in-class shape retrieval performance
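As a small discrete analogue of the second eigenfunction of the Laplace-Beltrami operator, the Python sketch below approximates the eigenvector of the second-smallest eigenvalue of a graph Laplacian (the Fiedler vector) by shifted power iteration. It is only an illustration under stated assumptions: a plain unweighted graph Laplacian replaces a proper mesh discretisation of the operator, the graph is assumed connected, and the name `fiedler_vector`, the ramp start vector, and the iteration count are choices made for the example.

```python
import math

def fiedler_vector(adjacency, iterations=500):
    """Approximate the second eigenvector of L = D - A by power iteration
    on (shift * I - L), with the constant eigenvector projected out."""
    n = len(adjacency)
    degree = [sum(row) for row in adjacency]
    shift = 2.0 * max(degree)  # upper bound on the largest eigenvalue of L
    v = [float(i) for i in range(n)]  # deterministic start vector (a ramp)
    for _ in range(iterations):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the constant component
        # w = (shift * I - L) v = (shift - degree) * v + A v
        w = [(shift - degree[i]) * v[i]
             + sum(adjacency[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Path graph with six vertices, a stand-in for an elongated shape.
path = [[1 if abs(i - j) == 1 else 0 for j in range(6)] for i in range(6)]
v = fiedler_vector(path)
```

On the path graph the resulting vector varies monotonically from one end to the other, so its extrema sit at the two endpoints. This is the kind of global structure that makes the second eigenfunction useful for locating the extremities of a shape. The start vector must not be orthogonal to the sought eigenvector, which holds here but is not guaranteed for every graph.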
Signal processing with Fourier analysis, novel algorithms and applications
Fourier analysis is the study of the way general functions may be represented or approximated by sums of simpler trigonometric functions, also analogously known as sinusoidal modeling. The original idea of Fourier had a profound impact on mathematical analysis, physics and engineering because it diagonalizes time-invariant convolution operators. In the past, signal processing was a topic that stayed almost exclusively in electrical engineering, where only experts could cancel noise and compress and reconstruct signals. Nowadays it is almost ubiquitous, as everyone deals with modern digital signals. The medical imaging, wireless communication and power systems of the future will face more demanding data processing conditions and a wider range of application requirements than the systems of today. Such systems will require more powerful, efficient and flexible signal processing algorithms that are designed to handle these needs. No matter how advanced our hardware technology becomes, we will still need intelligent and efficient algorithms to address the growing demands in signal processing. In this thesis, we investigate novel techniques to solve a suite of four fundamental problems in signal processing that have a wide range of applications. The relevant equations, the literature on signal processing applications, the analysis, and the final numerical algorithms/methods that solve these problems using Fourier analysis are discussed for different applications in electrical engineering and computer science. 
Chapters 2 through 5 cover the following topics of central importance in the field of signal processing:
• Fast Phasor Estimation using Adaptive Signal Processing (Chapter 2)
• Frequency Estimation from Nonuniform Samples (Chapter 3)
• 2D Polar and 3D Spherical Polar Nonuniform Discrete Fourier Transform (Chapter 4)
• Robust 3D registration using Spherical Polar Discrete Fourier Transform and Spherical Harmonics (Chapter 5)
Even though these four methods may seem completely disparate, the underlying motivation remains the same: more efficient processing by exploiting the signal structure in the Fourier domain. The main contribution of this thesis is the innovation in the analysis, synthesis and discretization of certain well-known problems such as phasor estimation, frequency estimation, computation of a particular non-uniform Fourier transform, and signal registration in the transformed domain. We propose and evaluate algorithms relevant to these applications, such as a frequency estimation algorithm using non-uniform sampling and polar and spherical polar Fourier transforms. The proposed techniques are also useful in the fields of computer vision and medical imaging. From a practical perspective, the proposed algorithms are shown to improve on the existing solutions in the respective fields where they are applied and evaluated. The formulation and final proposition are shown to have a variety of benefits. Future work with potential in medical imaging, directional wavelets, volume rendering, video/3D object classification and high-dimensional registration is also discussed in the final chapter. Finally, in the spirit of reproducible research, we release the implementation of these algorithms to the public on GitHub
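The following Python sketch illustrates frequency estimation from nonuniform samples in its most basic form: the nonuniform discrete Fourier sum is evaluated directly at a set of candidate frequencies and the strongest one is returned. This brute-force periodogram search is not the algorithm proposed in the thesis. The jittered sampling times, the 5 Hz test tone, the candidate grid, and the function names are all assumptions made for the example.

```python
import cmath
import math

def ndft_magnitude(times, samples, frequency):
    """Magnitude of sum_n x_n * exp(-2*pi*i*f*t_n) at arbitrary times t_n."""
    return abs(sum(x * cmath.exp(-2j * math.pi * frequency * t)
                   for t, x in zip(times, samples)))

def estimate_frequency(times, samples, candidates):
    """Candidate frequency with the largest nonuniform-DFT magnitude."""
    return max(candidates, key=lambda f: ndft_magnitude(times, samples, f))

# Deterministically jittered sampling instants around a 40 Hz mean rate.
times = [(n + 0.3 * math.sin(1.7 * n)) / 40.0 for n in range(200)]
samples = [math.cos(2.0 * math.pi * 5.0 * t) for t in times]
candidates = [0.1 * k for k in range(1, 151)]  # 0.1 Hz to 15 Hz
estimate = estimate_frequency(times, samples, candidates)
```

The direct sum costs one pass over the samples per candidate frequency, which is one reason fast nonuniform transforms are of practical interest.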
On incorporating inductive biases into deep neural networks
A machine learning (ML) algorithm can be interpreted as a system that learns to capture patterns in data distributions. Before the modern deep learning era, emulating the human brain, the use of structured representations and strong inductive bias was prevalent in building ML models, partly due to expensive computational resources and the limited availability of data. In contrast, armed with increasingly cheaper hardware and abundant data, deep learning has made unprecedented progress during the past decade, showcasing remarkable performance on a diverse set of ML tasks. Unlike classical ML models, deep learning seeks to minimize structured representations and inductive bias when learning, implicitly favoring the flexibility of learning over manual intervention. Despite the impressive performance, attention is being drawn towards enhancing the (relatively) weaker areas of deep models, such as learning with limited resources, robustness, minimal overhead to realize simple relationships, and the ability to generalize the learned representations beyond the training conditions, which were (arguably) the forte of classical ML. Consequently, a hybrid trend has recently surfaced that aims to blend structured representations and substantial inductive bias into deep models, in the hope of improving them. Based on this motivation, this thesis investigates methods to improve the performance of deep models using inductive bias and structured representations across multiple problem domains. To this end, we inject a priori knowledge into deep models in the form of enhanced feature extraction techniques, geometrical priors, engineered features, and optimization constraints. In particular, we show that by leveraging prior knowledge about the task at hand and the structure of the data, the performance of deep learning models can be significantly improved. We begin by exploring equivariant representation learning. 
In general, real-world observations are subject to fundamental transformations (e.g., translation, rotation), and deep models typically demand expensive data augmentation and a high number of filters to handle such variation. In comparison, carefully designed equivariant filters possess this ability by nature. Hence, we propose a novel volumetric convolution operation that can convolve arbitrary functions in the unit ball while preserving rotational equivariance by projecting the input data onto the Zernike basis. We conduct extensive experiments and show that our formulations can be used to construct significantly cheaper ML models. Next, we study generative modeling of 3D objects and propose a principled approach to synthesize 3D point clouds in the spectral domain by obtaining a structured representation of 3D points as functions on the unit sphere. Using prior knowledge about the spectral moments and the output data manifold, we design an architecture that can maximally utilize the information in the inputs and generate high-resolution point clouds with minimal computational overhead. Finally, we propose a framework to build normalizing flows (NF) based on increasing triangular maps and Bernstein-type polynomials. Compared to existing NF approaches, our framework has favorable characteristics for fusing inductive bias within the model, namely theoretical upper bounds for the approximation error, robustness, higher interpretability, suitability for compactly supported densities, and the ability to employ higher-degree polynomials without training instability. Most importantly, we present a constructive universality proof, which permits us to analytically derive the optimal model coefficients for known transformations without training
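One property that makes Bernstein-type polynomials a natural building block for increasing maps can be shown in a few lines of Python: a polynomial in Bernstein form with non-decreasing coefficients is itself non-decreasing on [0, 1], because its derivative is again a Bernstein polynomial whose coefficients are the scaled coefficient differences. The sketch below is a one-dimensional illustration only, not the thesis's flow construction. The coefficient values and the name `bernstein` are assumptions.

```python
import math

def bernstein(coefficients, x):
    """Evaluate sum_k c_k * C(n, k) * x**k * (1 - x)**(n - k) for x in [0, 1]."""
    n = len(coefficients) - 1
    return sum(c * math.comb(n, k) * x ** k * (1.0 - x) ** (n - k)
               for k, c in enumerate(coefficients))

# Hypothetical non-decreasing coefficients: the map sends 0 to 0 and 1 to 1.
coefficients = [0.0, 0.1, 0.5, 0.6, 1.0]
grid = [i / 100.0 for i in range(101)]
values = [bernstein(coefficients, x) for x in grid]
```

Because the endpoint values equal the first and last coefficients, fixing those two coefficients pins the range of the map. Constraining the coefficients to be ordered therefore gives an invertible transformation of a compact interval.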