Geometric Data Analysis: Advancements of the Statistical Methodology and Applications
Data analysis has become fundamental to our society and comes in multiple facets and approaches. Nevertheless, in research and applications, the focus was primarily on data from Euclidean vector spaces. Consequently, the majority of methods that are applied today are not suited for more general data types. Driven by needs from fields like image processing, (medical) shape analysis, and network analysis, more and more attention has recently been given to data from non-Euclidean spaces–particularly (curved) manifolds. It has led to the field of geometric data analysis whose methods explicitly take the structure (for example, the topology and geometry) of the underlying space into account.
This thesis contributes to the methodology of geometric data analysis by generalizing several fundamental notions from multivariate statistics to manifolds. We thereby focus on two different viewpoints.
First, we use Riemannian structures to derive a novel regression scheme for general manifolds that relies on splines of generalized Bézier curves. It can accurately model non-geodesic relationships, for example, time-dependent trends with saturation effects or cyclic trends. Since Bézier curves can be evaluated with the constructive de Casteljau algorithm, working with data from manifolds of high dimensions (for example, a hundred thousand or more) is feasible. Relying on the regression, we further develop
a hierarchical statistical model for an adequate analysis of longitudinal data in manifolds, and a method to control for confounding variables.
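As an illustration of the de Casteljau construction mentioned above, the following is a minimal sketch on the unit sphere, where straight-line interpolation is replaced by interpolation along geodesics. The choice of the sphere, the function names `slerp` and `de_casteljau`, and the control points in the usage note are assumptions made for illustration; they are not taken from the thesis.

```python
import math


def slerp(p, q, t):
    """Point at parameter t on the geodesic from p to q on the unit sphere.

    Assumes p and q are unit vectors that are not antipodal.
    """
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(p, q))))
    theta = math.acos(dot)  # geodesic distance between p and q
    if theta < 1e-12:       # coincident points: nothing to interpolate
        return list(p)
    s = math.sin(theta)
    w0 = math.sin((1.0 - t) * theta) / s
    w1 = math.sin(t * theta) / s
    return [w0 * a + w1 * b for a, b in zip(p, q)]


def de_casteljau(control_points, t, geodesic=slerp):
    """Evaluate a generalized Bezier curve at t by repeated geodesic interpolation."""
    pts = [list(p) for p in control_points]
    while len(pts) > 1:
        # Each pass replaces n points by n - 1 points on the connecting geodesics.
        pts = [geodesic(pts[i], pts[i + 1], t) for i in range(len(pts) - 1)]
    return pts[0]
```

For example, `de_casteljau([[1, 0, 0], [0, 1, 0], [0, 0, 1]], 0.5)` returns a point on the sphere between the first and last control points. Only the `geodesic` argument depends on the manifold, so another space could be handled by passing a different interpolation function.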
We secondly focus on data that is not only manifold- but even Lie group-valued, which is frequently the case in applications. We can only achieve this by endowing the group with an affine connection structure that is generally not Riemannian. Utilizing it, we derive generalizations of several well-known dissimilarity measures between data distributions that can be used for various tasks, including hypothesis testing. Invariance under data translations is proven, and a connection to continuous distributions is given for one measure.
A further central contribution of this thesis is that it shows use cases for all notions in real-world applications, particularly in problems from shape analysis in medical imaging and archaeology. We can replicate or further quantify several known findings for shape changes of the femur and the right hippocampus under osteoarthritis and Alzheimer's, respectively. Furthermore, in an archaeological application, we obtain new insights into the construction principles of ancient sundials. Last but not least, we use the geometric structure underlying human brain connectomes to predict cognitive scores. Utilizing a sample selection procedure, we obtain state-of-the-art results
High-quality dense stereo vision for whole body imaging and obesity assessment
The prevalence of obesity has necessitated the development of safe and convenient tools for the timely assessment and monitoring of this condition across a broad range of the population. Three-dimensional (3D) body imaging has become a new means of obesity assessment. Moreover, it generates body shape information that is meaningful for fitness, ergonomics, and personalized clothing. In previous work, our lab developed a prototype active stereo vision system that demonstrated the potential to fulfill this goal. However, the prototype required four computer projectors to cast artificial textures onto the body to facilitate stereo matching on texture-deficient images (e.g., skin). This reduced the mobility of the system when collecting data from a large population. In addition, the resolution of the generated 3D images was limited by the cameras and projectors available during the project. The study reported in this dissertation continues our effort to improve the capability of 3D body imaging through simplified hardware for passive stereo and advanced computation techniques.
The system utilizes high-resolution single-lens reflex (SLR) cameras, which have recently become widely available, and is configured in a two-stance design to image the front and back surfaces of a person. A total of eight cameras are used to form four pairs of stereo units. Each unit covers a quarter of the body surface. The stereo units are individually calibrated with a specific pattern to determine the cameras' intrinsic and extrinsic parameters for stereo matching. The global orientation and position of each stereo unit within a common world coordinate system are calculated through a 3D registration step. The stereo calibration and 3D registration procedures do not need to be repeated for a deployed system if the cameras' relative positions have not changed. This property contributes to the portability of the system and greatly reduces the maintenance effort. The image acquisition time is around two seconds for a whole-body capture. The system works in an indoor environment with moderate ambient light.
Advanced stereo computation algorithms are developed by taking advantage of high-resolution images and by tackling the ambiguity problem in stereo matching. A multi-scale, coarse-to-fine matching framework is proposed to match large-scale textures at a low resolution and refine the matched results over higher resolutions. This matching strategy reduces the complexity of the computation and avoids ambiguous matching at the native resolution. The pixel-to-pixel stereo matching algorithm follows a classic, four-step strategy which consists of matching cost computation, cost aggregation, disparity computation and disparity refinement.
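The classic four-step strategy named above can be sketched on a single scanline. This is a simplified illustration under stated assumptions, not the dissertation's algorithm: the cost is the absolute intensity difference, aggregation is a box average, disparity selection is winner-takes-all, and refinement is a median filter. The function name `match_scanline` and its parameters are hypothetical, and the coarse-to-fine pyramid is omitted.

```python
def match_scanline(left, right, max_disp, radius=1):
    """Estimate per-pixel disparity for one rectified scanline pair.

    Convention (an assumption): pixel x in `left` matches pixel x - d in `right`.
    """
    n = len(left)
    big = float("inf")

    # Step 1: matching cost for every candidate disparity d and pixel x.
    cost = [[abs(left[x] - right[x - d]) if x - d >= 0 else big
             for x in range(n)]
            for d in range(max_disp + 1)]

    # Step 2: cost aggregation, averaging costs over a small window.
    agg = []
    for d in range(max_disp + 1):
        row = []
        for x in range(n):
            window = cost[d][max(0, x - radius):x + radius + 1]
            row.append(sum(window) / len(window))
        agg.append(row)

    # Step 3: disparity computation, winner-takes-all per pixel.
    disp = [min(range(max_disp + 1), key=lambda d: agg[d][x])
            for x in range(n)]

    # Step 4: disparity refinement with a median filter.
    refined = []
    for x in range(n):
        window = sorted(disp[max(0, x - radius):x + radius + 1])
        refined.append(window[len(window) // 2])
    return refined
```

In a coarse-to-fine scheme, a routine of this kind would first run on downsampled images, and the resulting disparities would narrow the search range at each finer level.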
The system performance has been evaluated on mannequins and human subjects in comparison with other measurement methods. It was found that the geometrical measurements from reconstructed 3D body models, including body circumferences and whole volume, are highly repeatable and consistent with manual and other instrumental measurements (CV 0.99). The agreement of percent body fat (%BF) estimation on human subjects between stereo and dual-energy X-ray absorptiometry (DEXA) was found to be improved over the previous active stereo system, and the limits of agreement with 95% confidence were reduced by half. The limits of agreement achieved for %BF estimation are among the lowest reported in comparative studies of commercial air displacement plethysmography (ADP) and DEXA. In practice, %BF estimation through a two-component model is sensitive to body volume measurement, and the estimation of lung volume could be a source of variation. Protocols for this type of measurement should still be created with an awareness of this factor. Biomedical Engineerin
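To make the sensitivity to body volume concrete, the sketch below uses Siri's equation, one widely used two-component formula. The abstract does not state which formula the dissertation used, so the choice of Siri's equation, the function name `percent_body_fat`, and the sample values in the usage note are assumptions.

```python
def percent_body_fat(mass_kg, volume_l):
    """Siri's two-component estimate of percent body fat.

    Body density is mass / volume in kg/L, numerically equal to g/cm^3.
    The volume is assumed to be already corrected for air in the lungs.
    """
    density = mass_kg / volume_l
    return 495.0 / density - 450.0
```

For a 70 kg subject, comparing `percent_body_fat(70, 66.0)` with `percent_body_fat(70, 66.5)` shows how a difference of half a litre in measured volume shifts the estimate by several percentage points, which is why the lung volume correction matters.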
Mining Time-aware Actor-level Evolution Similarity for Link Prediction in Dynamic Network
Topological evolution over time in a dynamic network triggers both the addition and deletion of actors and the links among them. A dynamic network can be represented as a time series of network snapshots where each snapshot represents the state of the network over an interval of time (for example, a minute, hour or day). The duration of each snapshot denotes the temporal scale/sliding window of the dynamic network and all the links within the duration of the window are aggregated together irrespective of their order in time. The inherent trade-off in selecting the timescale in analysing dynamic networks is that choosing a short temporal window may lead to chaotic changes in network topology and measures (for example, the actors’ centrality measures and the average path length); however, choosing a long window may compromise the study and the investigation of network dynamics. Therefore, to facilitate the analysis and understand different patterns of actor-oriented evolutionary aspects, it is necessary to define an optimal window length (temporal duration) with which to sample a dynamic network. In addition to determining the optimal temporal duration, another key task for understanding the dynamics of evolving networks is being able to predict the likelihood of future links among pairs of actors given the existing states of link structure at present time. This is known as the link prediction problem in network science. Instead of considering a static state of a network where the associated topology does not change, dynamic link prediction attempts to predict emerging links by considering different types of historical/temporal information, for example the different types of temporal evolutions experienced by the actors in a dynamic network due to the topological evolution over time, known as actor dynamicities.
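The snapshot representation described above can be sketched as follows. This is a minimal illustration, assuming undirected links given as hypothetical `(actor_u, actor_v, timestamp)` tuples; the function name `snapshots` is not from the thesis.

```python
from collections import defaultdict


def snapshots(links, window):
    """Aggregate timestamped links into consecutive snapshots of equal duration.

    Links inside the same window are merged irrespective of their order in time.
    Each snapshot is a set of undirected edges stored as frozensets.
    """
    buckets = defaultdict(set)
    for u, v, t in links:
        buckets[int(t // window)].add(frozenset((u, v)))
    if not buckets:
        return []
    first, last = min(buckets), max(buckets)
    # Windows with no activity are kept as empty snapshots.
    return [buckets.get(i, set()) for i in range(first, last + 1)]
```

Calling the function with a short `window` yields many sparse snapshots, while a long `window` merges most links into a few dense ones, which is the trade-off the abstract describes.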
Although there has been some success in developing various methodologies and metrics for the purpose of dynamic link prediction, mining actor-oriented evolutions to address this problem has received little attention from the research community. In addition to this, the existing methodologies were developed without considering the sampling window size of the dynamic network, even though the sampling duration has a large impact on mining the network dynamics of an evolutionary network. Therefore, although the principal focus of this thesis is link prediction in dynamic networks, the optimal sampling window determination was also considered
The statistical analysis of acoustic phonetic data: exploring differences between spoken Romance languages
The historical and geographical spread from older to more modern languages has long been studied by examining textual changes and in terms of changes in phonetic transcriptions. However, it is more difficult to analyze language change from an acoustic point of view, although this is usually the dominant mode of transmission. We propose a novel analysis approach for acoustic phonetic data, where the aim will be to statistically model the acoustic properties of spoken words. We explore phonetic variation and change using a time-frequency representation, namely the log-spectrograms of speech recordings. We identify time and frequency covariance functions as a feature of the language; in contrast, mean spectrograms depend mostly on the particular word that has been uttered. We build models for the mean and covariances (taking into account the restrictions placed on the statistical analysis of such objects) and use these to define a phonetic transformation that models how an individual speaker would sound in a different language, allowing the exploration of phonetic differences between languages. Finally, we map back these transformations to the domain of sound recordings, allowing us to listen to the output of the statistical analysis. The proposed approach is demonstrated using recordings of the words corresponding to the numbers from “one” to “ten” as pronounced by speakers from five different Romance languages. John Coleman appreciates the support of UK Arts and Humanities Research Council grant AH/M002993/1, “Ancient Sounds: mixing acoustic phonetics, statistics and comparative philology to bring speech back from the past”. John Aston appreciates the support of UK Engineering and Physical Sciences Research Council grant EP/K021672/2, “Functional Object Data Analysis and its Applications”
Applications of Gradient Representations in Resting-State fMRI
Classical models of brain organization have often considered the brain to be made up of a mosaic of patches that are demarcated by discrete boundaries, often defined histologically. In contrast, emerging views have pointed towards an alternative paradigm – referred to as gradients – by conceptualizing brain organization as sets of organizational axes that characterize spatial variation of differing connectivity principles over the extent of a region. Such organizational axes provide a well-suited framework for elucidating underpinnings of brain connectivity and have garnered widespread attention across various domains of neuroimaging. This work seeks to explore various applications of gradient estimation techniques, in combination with resting-state functional connectivity data, across the fields of basic, comparative, and clinical neuroscience.
First, gradient estimation was performed on resting-state functional connectivity (RSFC) patterns of the primary somatosensory cortex to unveil a secondary organizational axis that spans the region’s anterior-posterior axis, akin to circuitry fundamental to sensory cortical information processing. Second, gradient techniques were used in a cross-species comparison study to unify connectivity principles of humans and marmosets by mapping them simultaneously onto a set of organizational axes. This provided a systematic framework to compare the functional architecture of both species, facilitating novel insight into a well-integrated default-mode network in humans, compared to marmosets. Third, connectivity gradients, along with a myriad of other resting-state fMRI features, were used to explore the implications of focal lesion pathophysiology on functional organization of the thalamus in individuals with Multiple Sclerosis. A lack of focal changes to resting-state related features was observed, suggesting a limited role of focal thalamic lesions in functional organization in MS.
Together, these different avenues of research highlight the capacity for a gradient-centric view in neuroimaging to provide profound insights into brain organization, and its utility across the applications of basic, comparative, and clinical neuroscience
Statistical Issues in the Analysis of Correlated Data.
We first extend the original classification and regression trees (CART) paradigm (Breiman et al. 1984) to clustered binary outcomes, where individuals within a cluster are correlated. We propose to generate tree models using residuals from a null generalized linear mixed model (with fixed and random intercepts only) as the outcome, which circumvents modeling the correlation structure explicitly while still accounting for the cluster-correlated design, thereby allowing us to adopt the original CART machinery in tree growing, pruning and cross-validation. Based on simulations, we find that our residual-based tree is more appropriate for analyzing clustered binary data, and provides more accurate classification predictions than the standard CART that ignores the clustering. We also illustrated our method using data from clinical studies, and residual-based trees identified clinically meaningful subgroups.
Clinical attachment level (CAL) is a tooth-level measure that quantifies the severity of periodontal disease. The within-mouth correlation of tooth-level CAL is difficult to model because it must reflect the three-dimensional spatial geography of teeth and their functional similarity. In the second project, we propose two linear mixed effects (LME) models with random effects that quantify the within-mouth correlation of teeth and their shared functionality. Via simulations, we demonstrate that our mixed models give consistent and more efficient estimates than a t-test and generalized estimating equations that fail to model the within-mouth correlation accurately. We also evaluate the performance of the approaches when data are missing under different biologically plausible missing data mechanisms.
Inference for the fixed effects in an LME model is dependent upon the correlation structure implied by the random effects included in the LME model. However, limited methods are available for making inference about the fit of the assumed covariance structure in the LME model. In the third project, we propose three permutation tests, all of which are based on comparing the estimated assumed covariance matrix to the covariance matrix of the marginal residuals. Cholesky residuals, which are exchangeable both within and among subjects, are employed in the permutations. Through simulations, we show that two of our tests have valid size and comparable power in testing different covariance structure assumptions. PhD, Biostatistics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/116687/1/rongxia_1.pd
Statistical learning of random probability measures
The study of random probability measures is a lively research topic that has
attracted interest from different fields in recent years. In this thesis, we consider
random probability measures in the context of Bayesian nonparametrics,
where the law of a random probability measure is used as prior distribution,
and in the context of distributional data analysis, where
the goal is to perform inference given a sample from the law of a random probability measure.
The contributions contained in this thesis can be subdivided according to three
different topics: (i) the use of almost surely discrete repulsive random measures
(i.e., whose support points are well separated) for Bayesian model-based
clustering, (ii) the proposal of new laws for collections of random probability
measures for Bayesian density estimation of partially
exchangeable data subdivided into different groups, and (iii) the study
of principal component analysis and regression models for probability distributions
seen as elements of the 2-Wasserstein space. Specifically, for point
(i) above we propose an efficient Markov chain Monte Carlo algorithm for
posterior inference, which sidesteps the need for split-merge reversible jump
moves typically associated with poor performance; we propose a model for
clustering high-dimensional data by introducing a novel class of anisotropic
determinantal point processes; and we study the distributional properties of the
repulsive measures, shedding light on important theoretical results which enable
more principled prior elicitation and more efficient posterior simulation
algorithms. For point (ii) above, we consider several models suitable for clustering
homogeneous populations, inducing spatial dependence across groups of
data, extracting the characteristic traits common to all the data-groups, and
propose a novel vector autoregressive model to study the growth
curves of Singaporean kids. Finally, for point (iii), we propose a novel class of
projected statistical methods for distributional data analysis for measures
on the real line and on the unit circle
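For measures on the real line, the 2-Wasserstein distance has a simple form: for two empirical measures with the same number of equally weighted atoms, it is the root mean squared difference between the sorted samples. The sketch below illustrates only this standard fact and is not the thesis's projected methodology; the function name `wasserstein2` is an assumption.

```python
import math


def wasserstein2(xs, ys):
    """2-Wasserstein distance between two empirical measures on the real line.

    Assumes both samples have the same size and uniform weights, so the
    optimal coupling pairs the i-th smallest points of each sample.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    a, b = sorted(xs), sorted(ys)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))
```

Because sorting is equivalent to comparing quantile functions, each distribution can be treated as a point in a space of quantile functions, where the distance above is an ordinary L2 distance.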
Contributions to the study of Autism Spectrum Brain connectivity
164 p. Autism Spectrum Disorder (ASD) is a highly prevalent neurodevelopmental condition with a large social and economic impact that affects the entire life of families. There is an intense search for biomarkers that can be assessed as early as possible in order to initiate treatment and prepare the family to deal with the challenges imposed by the condition. Brain imaging biomarkers are of special interest. Specifically, functional connectivity data extracted from resting-state functional magnetic resonance imaging (rs-fMRI) should allow the detection of brain connectivity alterations. Machine learning pipelines encompass the estimation of the functional connectivity matrix from brain parcellations, feature extraction, and the building of classification models for ASD prediction. The works reported in the literature are very heterogeneous from the computational and methodological point of view. In this Thesis we carry out a comprehensive computational exploration of the impact of the choices involved while building these machine learning pipelines
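The first two pipeline stages named above can be sketched in a few lines. This is one possible choice among the many the thesis compares, assuming Pearson correlation as the connectivity measure and the upper triangle of the matrix as the feature vector; the function names and the input layout (one non-constant time series per brain region) are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant time series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def connectivity_matrix(series):
    """Functional connectivity matrix: correlation between every pair of regions."""
    k = len(series)
    return [[pearson(series[i], series[j]) for j in range(k)] for i in range(k)]


def upper_triangle(matrix):
    """Flatten the entries above the diagonal into a feature vector."""
    k = len(matrix)
    return [matrix[i][j] for i in range(k) for j in range(i + 1, k)]
```

The resulting vector, one per subject, would then be passed to a classifier. Swapping the correlation measure, the parcellation, or the feature selection step gives the kind of pipeline variants whose impact the thesis explores.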