9 research outputs found

    Regression models on Riemannian symmetric spaces

    The aim of this paper is to develop a general regression framework for the analysis of manifold-valued response in a Riemannian symmetric space (RSS) and its association with multiple covariates of interest, such as age or gender, in Euclidean space. Such RSS-valued data arises frequently in medical imaging, surface modeling, and computer vision, among many others. We develop an intrinsic regression model solely based on an intrinsic conditional moment assumption, avoiding specifying any parametric distribution in RSS. We propose various link functions to map from the Euclidean space of multiple covariates to the RSS of responses. We develop a two-stage procedure to calculate the parameter estimates and determine their asymptotic distributions. We construct the Wald and geodesic test statistics to test hypotheses of unknown parameters. We systematically investigate the geometric invariant property of these estimates and test statistics. Simulation studies and a real data analysis are used to evaluate the finite sample properties of our methods

    Zero-Shot Domain Adaptation via Kernel Regression on the Grassmannian

    Most visual recognition methods implicitly assume the data distribution remains unchanged from training to testing. However, in practice domain shift often exists, where real-world factors such as lighting and sensor type change between train and test, and classifiers do not generalise from source to target domains. It is impractical to train separate models for all possible situations because collecting and labelling the data is expensive. Domain adaptation algorithms aim to ameliorate domain shift, allowing a model trained on a source to perform well on a different target domain. However, even for the setting of unsupervised domain adaptation, where the target domain is unlabelled, collecting data for every possible target domain is still costly. In this paper, we propose a new domain adaptation method that has no need to access either data or labels of the target domain when it can be described by a parametrised vector and there exist several related source domains within the same parametric space. It greatly reduces the burden of data collection and annotation, and our experiments show some promising results. Comment: Accepted to BMVC 2015 Workshop on Differential Geometry in Computer Vision (DIFF-CV).

    De Casteljau's algorithm in geometric data analysis: Theory and application

    For decades, de Casteljau's algorithm has been used as a fundamental building block in curve and surface design and has found a wide range of applications in fields such as scientific computing and discrete geometry, to name but a few. With increasing interest in nonlinear data science, its constructive approach has been shown to provide a principled way to generalize parametric smooth curves to manifolds. These curves have found remarkable new applications in the analysis of parameter-dependent, geometric data. This article provides a survey of the recent theoretical developments in this exciting area as well as its applications in fields such as geometric morphometrics and longitudinal data analysis in medicine, archaeology, and meteorology
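
As a rough illustration of how de Casteljau's construction carries over to a manifold, the sketch below replaces the straight-line interpolation of the classical algorithm with geodesic interpolation on the unit sphere. The choice of the sphere, the function names `slerp` and `de_casteljau_sphere`, and the sample control points are assumptions made for this example; they are not taken from the surveyed article.

```python
import math

def slerp(p, q, t):
    """Point at parameter t on the shortest geodesic from p to q on the unit sphere.

    Assumes p and q are unit vectors that are not antipodal.
    """
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(p, q))))
    theta = math.acos(dot)          # geodesic distance between p and q
    if theta < 1e-12:               # p and q coincide (numerically)
        return tuple(p)
    s = math.sin(theta)
    w_p = math.sin((1.0 - t) * theta) / s
    w_q = math.sin(t * theta) / s
    return tuple(w_p * a + w_q * b for a, b in zip(p, q))

def de_casteljau_sphere(control_points, t):
    """Evaluate a generalized Bezier curve on the sphere at parameter t.

    Each round replaces neighbouring points by the point at parameter t
    on the geodesic joining them, until a single point remains.
    """
    pts = [tuple(float(x) for x in p) for p in control_points]
    if not pts:
        raise ValueError("at least one control point is required")
    while len(pts) > 1:
        pts = [slerp(a, b, t) for a, b in zip(pts, pts[1:])]
    return pts[0]

# Hypothetical control points: the three coordinate axes.
ctrl = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
midpoint = de_casteljau_sphere(ctrl, 0.5)
```

The curve starts at the first control point and ends at the last one, and every intermediate point stays on the sphere because each step only moves along geodesics.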

    Statistical methods for sparse functional object data: elastic curves, shapes and densities

    Many applications naturally yield data that can be viewed as elements in non-linear spaces. Consequently, there is a need for non-standard statistical methods capable of handling such data. The work presented here deals with the analysis of data in complex spaces derived from functional L2-spaces as quotient spaces (or subsets of such spaces). These data types include elastic curves represented as d-dimensional functions modulo re-parametrization, planar shapes represented as 2-dimensional functions modulo rotation, scaling and translation, and elastic planar shapes combining all of these invariances. Moreover, probability densities can also be thought of as non-negative functions modulo scaling. Since these functional object data spaces lack a natural Hilbert space structure, this work proposes specialized methods that integrate techniques from functional data analysis with those for metric and manifold data. In particular, but not exclusively, novel regression methods for specific metric quotient spaces are discussed. Special attention is given to handling discrete observations, since in practice curves and shapes are typically observed only as a discrete (often sparse or irregular) set of points. Similarly, density functions are usually not directly observed, but a (small) sample from the corresponding probability distribution is available. Overall, this work comprises six contributions that propose new methods for sparse functional object data and apply them to relevant real-world datasets, predominantly in a biomedical context.

    Geometric Data Analysis: Advancements of the Statistical Methodology and Applications

    Data analysis has become fundamental to our society and comes in multiple facets and approaches. Nevertheless, in research and applications, the focus was primarily on data from Euclidean vector spaces. Consequently, the majority of methods that are applied today are not suited for more general data types. Driven by needs from fields like image processing, (medical) shape analysis, and network analysis, more and more attention has recently been given to data from non-Euclidean spaces, particularly (curved) manifolds. It has led to the field of geometric data analysis whose methods explicitly take the structure (for example, the topology and geometry) of the underlying space into account. This thesis contributes to the methodology of geometric data analysis by generalizing several fundamental notions from multivariate statistics to manifolds. We thereby focus on two different viewpoints. First, we use Riemannian structures to derive a novel regression scheme for general manifolds that relies on splines of generalized Bézier curves. It can accurately model non-geodesic relationships, for example, time-dependent trends with saturation effects or cyclic trends. Since Bézier curves can be evaluated with the constructive de Casteljau algorithm, working with data from manifolds of high dimensions (for example, a hundred thousand or more) is feasible. Relying on the regression, we further develop a hierarchical statistical model for an adequate analysis of longitudinal data in manifolds, and a method to control for confounding variables. We secondly focus on data that is not only manifold- but even Lie group-valued, which is frequently the case in applications. We can only achieve this by endowing the group with an affine connection structure that is generally not Riemannian. Utilizing it, we derive generalizations of several well-known dissimilarity measures between data distributions that can be used for various tasks, including hypothesis testing. Invariance under data translations is proven, and a connection to continuous distributions is given for one measure. A further central contribution of this thesis is that it shows use cases for all notions in real-world applications, particularly in problems from shape analysis in medical imaging and archaeology. We can replicate or further quantify several known findings for shape changes of the femur and the right hippocampus under osteoarthritis and Alzheimer's, respectively. Furthermore, in an archaeological application, we obtain new insights into the construction principles of ancient sundials. Last but not least, we use the geometric structure underlying human brain connectomes to predict cognitive scores. Utilizing a sample selection procedure, we obtain state-of-the-art results.

    Development of statistical methods for the analysis of single-cell RNA-seq data

    Single-cell RNA-sequencing profiles the transcriptome of cells from diverse populations. A popular intermediate data format is a large count matrix of genes x cells. This type of data brings several analytical challenges. Here, I present three projects that I worked on during my PhD that address particular aspects of working with such datasets:
    - The large number of cells in the count matrix is a challenge for fitting gamma-Poisson generalized linear models with existing tools. I developed a new R package called glmGamPoi to address this gap. I optimized the overdispersion estimation procedure to be quick and robust for datasets with many cells and small counts. I compared the performance against two popular tools (edgeR and DESeq2) and found that my inference is 6x to 13x faster and achieves a higher likelihood for a majority of the genes in four single-cell datasets.
    - The variance of single-cell RNA-seq counts depends on their mean, but many existing statistical tools have optimal performance when the variance is uniform. Accordingly, variance-stabilizing transformations are applied to unlock the large number of methods with such a requirement. I compared four approaches to variance-stabilize the data based on the delta method, model residuals, inferred latent expression state or count factor analysis. I describe the theoretical strengths and weaknesses, and compare their empirical performance in a benchmark on simulated and real single-cell data. I find that none of the mathematically more sophisticated transformations consistently outperforms the simple log(y/s+1) transformation.
    - Multi-condition single-cell data offers the opportunity to find differentially expressed genes for individual cell subpopulations. However, the prevalent approach to analyze such data is to start by dividing the cells into discrete populations and then test for differential expression within each group. The results are interpretable but may miss interesting cases by (1) choosing the cluster size too small and lacking power to detect effects or (2) choosing the cluster size too large and obscuring interesting effects apparent on a smaller scale. I developed a new statistical framework for the analysis of multi-condition single-cell data that avoids the premature discretization. The approach performs regression on the latent subspaces occupied by the cells in each condition. The method is implemented as an R package called lemur.
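
The log(y/s+1) transformation named in the abstract can be sketched in a few lines. The abstract does not define s, so this example assumes s is a per-cell size factor computed as the cell's total count divided by the mean total count across cells. That definition, the function names, and the toy matrix are illustrative assumptions and not the thesis's implementation.

```python
import math

def size_factors(counts):
    """Per-cell size factors for a genes x cells count matrix (list of rows).

    Assumption for this sketch: s_j = total count of cell j divided by the
    mean total count over all cells. Every cell must have a non-zero total.
    """
    n_cells = len(counts[0])
    totals = [sum(row[j] for row in counts) for j in range(n_cells)]
    mean_total = sum(totals) / n_cells
    return [total / mean_total for total in totals]

def shifted_log(counts, s):
    """Apply log(y / s + 1) entry-wise, using the size factor of each cell."""
    return [[math.log(y / s_j + 1.0) for y, s_j in zip(row, s)] for row in counts]

# Toy matrix: 2 genes (rows) x 2 cells (columns).
counts = [[0, 2],
          [4, 6]]
s = size_factors(counts)
transformed = shifted_log(counts, s)
```

Zero counts map to exactly zero, and dividing by the size factor puts cells with different sequencing depths on a comparable scale before the logarithm is taken.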

    Multivariate General Linear Models (MGLM) on Riemannian Manifolds with Applications to Statistical Analysis of Diffusion Weighted Images

    Linear regression is a parametric model which is ubiquitous in scientific analysis. The classical setup where the observations and responses, i.e., (x_i, y_i) pairs, are Euclidean is well studied. The setting where y_i is manifold valued is a topic of much interest, motivated by applications in shape analysis, topic modeling, and medical imaging. Recent work gives strategies for max-margin classifiers, principal components analysis, and dictionary learning on certain types of manifolds. For parametric regression specifically, results within the last year provide mechanisms to regress one real-valued parameter, x_i ∈ R, against a manifold-valued variable, y_i ∈ M. We seek to substantially extend the operating range of such methods by deriving schemes for multivariate multiple linear regression: a manifold-valued dependent variable against multiple independent variables, i.e., f: R^n → M. Our variational algorithm efficiently solves for multiple geodesic bases on the manifold concurrently via gradient updates. This allows us to answer questions such as: what is the relationship of the measurement at voxel y to disease when conditioned on age and gender? We show applications to statistical analysis of diffusion weighted images, which give rise to regression tasks on the manifold GL(n)/O(n) for diffusion tensor images (DTI) and the Hilbert unit sphere for orientation distribution functions (ODF) from high angular resolution acquisition. The companion open-source code is available on nitrc.org/projects/riem_mglm.
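
To make the map f: R^n → M concrete, the sketch below shows one common way to write a multivariate geodesic model: a base point p on the manifold, one tangent vector per covariate, and a prediction obtained by applying the exponential map at p to the covariate-weighted sum of those vectors. The unit sphere as the manifold, the names `exp_map_sphere` and `predict`, and the sample values are assumptions for illustration. This is not the paper's released code, and the fitting step by gradient updates is not shown.

```python
import math

def exp_map_sphere(p, v):
    """Exponential map on the unit sphere: follow the geodesic from p along v.

    Assumes p is a unit vector and v is tangent at p (orthogonal to p).
    """
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_v < 1e-12:              # zero tangent vector: stay at p
        return tuple(p)
    c = math.cos(norm_v)
    s = math.sin(norm_v) / norm_v
    return tuple(c * a + s * b for a, b in zip(p, v))

def predict(p, bases, x):
    """Predicted response Exp_p(sum_j x_j * V_j) for covariates x.

    `bases` holds one tangent vector V_j per covariate (hypothetical values).
    """
    v = [sum(x_j * V[k] for x_j, V in zip(x, bases)) for k in range(len(p))]
    return exp_map_sphere(p, v)

# Hypothetical model on the 2-sphere with two covariates.
p = (0.0, 0.0, 1.0)
bases = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
y_hat = predict(p, bases, (math.pi / 2, 0.0))
```

With all covariates at zero the prediction is the base point itself, which plays the role of the intercept in ordinary linear regression.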