339 research outputs found

    Geometric Data Analysis: Advancements of the Statistical Methodology and Applications

    Data analysis has become fundamental to our society and comes in multiple facets and approaches. Nevertheless, in research and applications, the focus was primarily on data from Euclidean vector spaces. Consequently, the majority of methods that are applied today are not suited for more general data types. Driven by needs from fields like image processing, (medical) shape analysis, and network analysis, more and more attention has recently been given to data from non-Euclidean spaces–particularly (curved) manifolds. It has led to the field of geometric data analysis whose methods explicitly take the structure (for example, the topology and geometry) of the underlying space into account. This thesis contributes to the methodology of geometric data analysis by generalizing several fundamental notions from multivariate statistics to manifolds. We thereby focus on two different viewpoints. First, we use Riemannian structures to derive a novel regression scheme for general manifolds that relies on splines of generalized Bézier curves. It can accurately model non-geodesic relationships, for example, time-dependent trends with saturation effects or cyclic trends. Since Bézier curves can be evaluated with the constructive de Casteljau algorithm, working with data from manifolds of high dimensions (for example, a hundred thousand or more) is feasible. Relying on the regression, we further develop a hierarchical statistical model for an adequate analysis of longitudinal data in manifolds, and a method to control for confounding variables. We secondly focus on data that is not only manifold- but even Lie group-valued, which is frequently the case in applications. We can only achieve this by endowing the group with an affine connection structure that is generally not Riemannian. Utilizing it, we derive generalizations of several well-known dissimilarity measures between data distributions that can be used for various tasks, including hypothesis testing. 
Invariance under data translations is proven, and a connection to continuous distributions is given for one measure. A further central contribution of this thesis is that it shows use cases for all notions in real-world applications, particularly in problems from shape analysis in medical imaging and archaeology. We can replicate or further quantify several known findings for shape changes of the femur and the right hippocampus under osteoarthritis and Alzheimer's, respectively. Furthermore, in an archaeological application, we obtain new insights into the construction principles of ancient sundials. Last but not least, we use the geometric structure underlying human brain connectomes to predict cognitive scores. Utilizing a sample selection procedure, we obtain state-of-the-art results.

    Mining Time-aware Actor-level Evolution Similarity for Link Prediction in Dynamic Network

    Topological evolution over time in a dynamic network triggers both the addition and deletion of actors and the links among them. A dynamic network can be represented as a time series of network snapshots where each snapshot represents the state of the network over an interval of time (for example, a minute, hour or day). The duration of each snapshot denotes the temporal scale/sliding window of the dynamic network and all the links within the duration of the window are aggregated together irrespective of their order in time. The inherent trade-off in selecting the timescale in analysing dynamic networks is that choosing a short temporal window may lead to chaotic changes in network topology and measures (for example, the actors’ centrality measures and the average path length); however, choosing a long window may compromise the study and the investigation of network dynamics. Therefore, to facilitate the analysis and understand different patterns of actor-oriented evolutionary aspects, it is necessary to define an optimal window length (temporal duration) with which to sample a dynamic network. In addition to determining the optimal temporal duration, another key task for understanding the dynamics of evolving networks is being able to predict the likelihood of future links among pairs of actors given the existing states of link structure at present time. This task is known as the link prediction problem in network science. Instead of considering a static state of a network where the associated topology does not change, dynamic link prediction attempts to predict emerging links by considering different types of historical/temporal information, for example the different types of temporal evolutions experienced by the actors in a dynamic network due to the topological evolution over time, known as actor dynamicities. 
Although there has been some success in developing various methodologies and metrics for the purpose of dynamic link prediction, mining actor-oriented evolutions to address this problem has received little attention from the research community. In addition to this, the existing methodologies were developed without considering the sampling window size of the dynamic network, even though the sampling duration has a large impact on mining the network dynamics of an evolutionary network. Therefore, although the principal focus of this thesis is link prediction in dynamic networks, the optimal sampling window determination was also considered.

    The statistical analysis of acoustic phonetic data: exploring differences between spoken Romance languages

    The historical and geographical spread from older to more modern languages has long been studied by examining textual changes and in terms of changes in phonetic transcriptions. However, it is more difficult to analyze language change from an acoustic point of view, although this is usually the dominant mode of transmission. We propose a novel analysis approach for acoustic phonetic data, where the aim will be to statistically model the acoustic properties of spoken words. We explore phonetic variation and change using a time-frequency representation, namely the log-spectrograms of speech recordings. We identify time and frequency covariance functions as a feature of the language; in contrast, mean spectrograms depend mostly on the particular word that has been uttered. We build models for the mean and covariances (taking into account the restrictions placed on the statistical analysis of such objects) and use these to define a phonetic transformation that models how an individual speaker would sound in a different language, allowing the exploration of phonetic differences between languages. Finally, we map back these transformations to the domain of sound recordings, allowing us to listen to the output of the statistical analysis. The proposed approach is demonstrated using recordings of the words corresponding to the numbers from "one" to "ten" as pronounced by speakers from five different Romance languages. John Coleman appreciates the support of UK Arts and Humanities Research Council grant AH/M002993/1, “Ancient Sounds: mixing acoustic phonetics, statistics and comparative philology to bring speech back from the past”. John Aston appreciates the support of UK Engineering and Physical Sciences Research Council grant EP/K021672/2, “Functional Object Data Analysis and its Applications”.

    Applications of Gradient Representations in Resting-State fMRI

    Classical models of brain organization have often considered the brain to be made up of a mosaic of patches that are demarcated by discrete boundaries, often defined histologically. In contrast, emerging views have pointed towards an alternative paradigm – referred to as gradients – by conceptualizing brain organization as sets of organizational axes that characterize spatial variation of differing connectivity principles over the extent of a region. Such organizational axes provide a well-suited framework for elucidating underpinnings of brain connectivity and have garnered widespread attention across various domains of neuroimaging. This work seeks to explore various applications of gradient estimation techniques, in combination with resting-state functional connectivity data, across the fields of basic, comparative, and clinical neuroscience. First, gradient estimation was performed on resting-state functional connectivity (RSFC) patterns of the primary somatosensory cortex to unveil a secondary organizational axis that spans the region’s anterior-posterior axis, akin to circuitry fundamental to sensory cortical information processing. Second, gradient techniques were used in a cross-species comparison study to unify connectivity principles of humans and marmosets by mapping them simultaneously onto a set of organizational axes. Doing so provided a systematic framework to compare the functional architecture of both species, facilitating novel insight into a well-integrated default-mode network in humans, compared to marmosets. Third, connectivity gradients, along with a myriad of other resting-state fMRI features, were used to explore the implications of focal lesion pathophysiology on functional organization of the thalamus in individuals with Multiple Sclerosis. A lack of focal changes to resting-state related features was observed, suggesting a limited role of focal thalamic lesions in functional organization in MS. 
Together, these different avenues of research highlight the capacity for a gradient-centric view in neuroimaging to provide profound insights into brain organization, and its utility across the applications of basic, comparative, and clinical neuroscience.

    Statistical Issues in the Analysis of Correlated Data.

    We first extend the original classification and regression trees (CART) paradigm (Breiman et al. 1984) to clustered binary outcomes, where individuals within a cluster are correlated. We propose to generate tree models using residuals from a null generalized linear mixed model (with fixed and random intercepts only) as the outcome, which circumvents modeling the correlation structure explicitly while still accounting for the cluster-correlated design, thereby allowing us to adopt the original CART machinery in tree growing, pruning and cross-validation. Based on simulations, we find that our residual-based tree is more appropriate for analyzing clustered binary data, and provides more accurate classification predictions than the standard CART that ignores the clustering. We also illustrated our method using data from clinical studies, and residual-based trees identified clinically meaningful subgroups. Clinical attachment level (CAL) is a tooth-level measure that quantifies the severity of periodontal disease. The within-mouth correlation of tooth-level CAL is difficult to model because it must reflect the three-dimensional spatial geography of teeth and their functional similarity. In the second project, we propose two linear mixed effects (LME) models with random effects that quantify the within-mouth correlation of teeth and their shared functionality. Via simulations, we demonstrate that our mixed models give consistent and more efficient estimates than a t-test and generalized estimating equations that fail to model the within-mouth correlation accurately. We also evaluate the performance of the approaches when data are missing under different biologically plausible missing data mechanisms. Inference for the fixed effects in an LME model is dependent upon the correlation structure implied by the random effects included in the LME model. 
However, limited methods are available for making inference about the fit of the assumed covariance structure in the LME model. In the third project, we propose three permutation tests, all of which are based on comparing the estimated assumed covariance matrix to the covariance matrix of the marginal residuals. Cholesky residuals, which are exchangeable both within and among subjects, are employed in the permutations. Through simulations, we show that two of our tests have valid size and comparable power in testing different covariance structure assumptions. PhD, Biostatistics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/116687/1/rongxia_1.pd

    Statistical learning of random probability measures

    The study of random probability measures is a lively research topic that has attracted interest from different fields in recent years. In this thesis, we consider random probability measures in the context of Bayesian nonparametrics, where the law of a random probability measure is used as prior distribution, and in the context of distributional data analysis, where the goal is to perform inference given a sample from the law of a random probability measure. The contributions contained in this thesis can be subdivided according to three different topics: (i) the use of almost surely discrete repulsive random measures (i.e., whose support points are well separated) for Bayesian model-based clustering, (ii) the proposal of new laws for collections of random probability measures for Bayesian density estimation of partially exchangeable data subdivided into different groups, and (iii) the study of principal component analysis and regression models for probability distributions seen as elements of the 2-Wasserstein space. Specifically, for point (i) above we propose an efficient Markov chain Monte Carlo algorithm for posterior inference, which sidesteps the need for split-merge reversible jump moves typically associated with poor performance; we propose a model for clustering high-dimensional data by introducing a novel class of anisotropic determinantal point processes; and we study the distributional properties of the repulsive measures, shedding light on important theoretical results which enable more principled prior elicitation and more efficient posterior simulation algorithms. For point (ii) above, we consider several models suitable for clustering homogeneous populations, inducing spatial dependence across groups of data, and extracting the characteristic traits common to all the data-groups, and we propose a novel vector autoregressive model to study growth curves of Singaporean kids. 
Finally, for point (iii), we propose a novel class of projected statistical methods for distributional data analysis for measures on the real line and on the unit circle.

    Contributions to the study of Austism Spectrum Brain conectivity

    164 p. Autism Spectrum Disorder (ASD) is a highly prevalent neurodevelopmental condition with a large social and economic impact, affecting the entire life of families. There is an intense search for biomarkers that can be assessed as early as possible in order to initiate treatment and prepare the family to deal with the challenges imposed by the condition. Brain imaging biomarkers are of special interest. Specifically, functional connectivity data extracted from resting state functional magnetic resonance imaging (rs-fMRI) should allow the detection of brain connectivity alterations. Machine learning pipelines encompass the estimation of the functional connectivity matrix from brain parcellations, feature extraction, and the building of classification models for ASD prediction. The works reported in the literature are very heterogeneous from the computational and methodological point of view. In this Thesis we carry out a comprehensive computational exploration of the impact of the choices involved in building these machine learning pipelines.