149 research outputs found
Stochastic Convergence of Persistence Landscapes and Silhouettes
Persistent homology is a widely used tool in Topological Data Analysis that
encodes multiscale topological information as a multi-set of points in the
plane called a persistence diagram. It is difficult to apply statistical theory
directly to a random sample of diagrams. Instead, we can summarize the
persistent homology with the persistence landscape, introduced by Bubenik,
which converts a diagram into a well-behaved real-valued function. We
investigate the statistical properties of landscapes, such as weak convergence
of the average landscapes and convergence of the bootstrap. In addition, we
introduce an alternate functional summary of persistent homology, which we call
the silhouette, and derive an analogous statistical theory.
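As a rough illustration of the two functional summaries named in this abstract, the sketch below evaluates a persistence landscape and a power-weighted silhouette on a grid. It assumes the diagram is given as a list of (birth, death) pairs with finite deaths; the function names `tent_functions`, `persistence_landscape`, and `power_weighted_silhouette` are hypothetical and are not taken from the paper or from any library.

```python
import numpy as np

def tent_functions(diagram, grid):
    """Evaluate one 'tent' per diagram point: max(0, min(t - b, d - t))."""
    diagram = np.asarray(diagram, dtype=float).reshape(-1, 2)
    grid = np.asarray(grid, dtype=float)
    births = diagram[:, 0][:, None]
    deaths = diagram[:, 1][:, None]
    # Result has shape (number of points, number of grid values).
    return np.maximum(0.0, np.minimum(grid[None, :] - births, deaths - grid[None, :]))

def persistence_landscape(diagram, k, grid):
    """k-th landscape function (k = 1 is the outermost) sampled on grid."""
    tents = tent_functions(diagram, grid)
    if k > tents.shape[0]:
        # Fewer than k points in the diagram: the k-th landscape is zero.
        return np.zeros(tents.shape[1])
    # Sort each grid column in descending order and take the k-th largest value.
    return -np.sort(-tents, axis=0)[k - 1]

def power_weighted_silhouette(diagram, grid, p=1.0):
    """Weighted average of the tents, with weights |d - b| ** p."""
    pts = np.asarray(diagram, dtype=float).reshape(-1, 2)
    tents = tent_functions(pts, grid)
    weights = np.abs(pts[:, 1] - pts[:, 0]) ** p
    if weights.sum() == 0:
        return np.zeros(tents.shape[1])
    return weights @ tents / weights.sum()

# Hypothetical toy diagram with two points.
diagram = [(0.0, 4.0), (1.0, 3.0)]
grid = np.linspace(0.0, 4.0, 5)
print(persistence_landscape(diagram, 1, grid))
print(persistence_landscape(diagram, 2, grid))
print(power_weighted_silhouette(diagram, grid, p=1.0))
```

Averaging the arrays returned by `persistence_landscape` over a sample of diagrams gives the kind of average landscape whose convergence the abstract discusses.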
Persistent homology analysis of brain artery trees
New representations of tree-structured data objects, using ideas from topological data analysis, enable improved statistical analyses of a population of brain artery trees. A number of representations of each data tree arise from persistence diagrams that quantify branching and looping of vessels at multiple scales. Novel approaches to the statistical analysis, through various summaries of the persistence diagrams, lead to heightened correlations with covariates such as age and sex, relative to earlier analyses of this data set. The correlation with age continues to be significant even after controlling for correlations from earlier significant summaries.
New methods for fixed-margin binary matrix sampling, Fréchet covariance, and MANOVA tests for random objects in multiple metric spaces
2022 Summer. Includes bibliographical references. Many approaches to the analysis of network data essentially view the data as Euclidean and apply standard multivariate techniques. In this dissertation, we refrain from this approach, exploring two alternate approaches to the analysis of networks and other structured data. The first approach seeks to determine how unique an observed simple, directed network is by comparing it to similar networks that share its degree distribution. Generating networks for comparison requires sampling from the space of all binary matrices with the prescribed row and column margins, since enumeration of all such matrices is often infeasible for even moderately sized networks with 20-50 nodes. We propose two new sampling methods for this problem. First, we extend two Markov chain Monte Carlo methods to sample from the space non-uniformly, allowing flexibility in the case that some networks are more likely than others. We show that non-uniform sampling could impede the MCMC process, but in certain special cases is still valid. Critically, we illustrate the differential conclusions that could be drawn from uniform vs. non-uniform sampling. Second, we develop a generalized divide-and-conquer approach that recursively divides matrices into smaller subproblems, which are much easier to count and sample. Each division step reveals interesting mathematics involving the enumeration of integer partitions and points in convex lattice polytopes. The second broad approach we explore is comparing random objects in metric spaces lacking a coordinate system. Traditional definitions of the mean and variance no longer apply, and standard statistical tests have needed reconceptualization in terms of only distances in the metric space. We consider the multivariate setting where random objects exist in multiple metric spaces, which can be thought of as distinct views of the random object.
We define the notion of Fréchet covariance to measure dependence between two metric spaces, and establish consistency for the sample estimator. We then propose several tests for differences in means and covariance matrices among two or more groups in multiple metric spaces, and compare their performance on scenarios involving random probability distributions and networks with node covariates.
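To make the fixed-margin sampling problem concrete, here is a minimal sketch of a standard checkerboard-swap Markov chain, which changes a binary matrix while preserving its row and column sums. This is a generic textbook-style chain used purely for illustration; the abstract does not say which two MCMC methods the dissertation extends, so this should not be read as either of them. The function name `checkerboard_swap_chain` is hypothetical, and the sketch does not enforce the zero diagonal that a simple directed network without self-loops would require.

```python
import numpy as np

def checkerboard_swap_chain(matrix, n_steps, seed=0):
    """Run n_steps of a swap chain that preserves row and column sums.

    Assumes a binary matrix with at least two rows and two columns.
    """
    rng = np.random.default_rng(seed)
    m = np.array(matrix, dtype=int)
    n_rows, n_cols = m.shape
    for _ in range(n_steps):
        # Propose a random 2x2 submatrix (two distinct rows, two distinct columns).
        r1, r2 = rng.choice(n_rows, size=2, replace=False)
        c1, c2 = rng.choice(n_cols, size=2, replace=False)
        block = np.ix_([r1, r2], [c1, c2])
        sub = m[block]
        # A checkerboard is [[1, 0], [0, 1]] or [[0, 1], [1, 0]].
        is_checkerboard = (
            sub[0, 0] == sub[1, 1]
            and sub[0, 1] == sub[1, 0]
            and sub[0, 0] != sub[0, 1]
        )
        if is_checkerboard:
            # Flipping the checkerboard leaves every margin unchanged.
            m[block] = 1 - sub
        # Otherwise the chain stays where it is for this step.
    return m

# Hypothetical 4x4 example: margins before and after should agree.
start = np.array([[1, 1, 0, 0],
                  [0, 1, 1, 0],
                  [0, 0, 1, 1],
                  [1, 0, 0, 1]])
sample = checkerboard_swap_chain(start, n_steps=1000, seed=1)
print(sample)
print(sample.sum(axis=0), sample.sum(axis=1))
```

Because the proposal is symmetric, a chain of this kind targets the uniform distribution over matrices with the given margins; sampling non-uniformly, as the abstract describes, would need an additional acceptance step that is not shown here.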
Interpretable statistics for complex modelling: quantile and topological learning
As the complexity of our data has increased exponentially in recent decades, so has our
need for interpretable features. This thesis revolves around two paradigms to approach
this quest for insights.
In the first part we focus on parametric models, where the problem of interpretability
can be seen as a “parametrization selection”. We introduce a quantile-centric
parametrization and we show the advantages of our proposal in the context of regression,
where it allows us to bridge the gap between classical generalized linear (mixed)
models and increasingly popular quantile methods.
The second part of the thesis, concerned with topological learning, tackles the
problem from a non-parametric perspective. As topology can be thought of as a way
of characterizing data in terms of their connectivity structure, it allows us to represent
complex and possibly high-dimensional data through a few features, such as the number of
connected components, loops and voids. We illustrate how the emerging branch of
statistics devoted to recovering topological structures in the data, Topological Data
Analysis, can be exploited both for exploratory and inferential purposes with a special
emphasis on kernels that preserve the topological information in the data.
Finally, we show with an application how these two approaches can borrow strength
from one another in the identification and description of brain activity through fMRI
data from the ABIDE project.
SuPP & MaPP: Adaptable Structure-Based Representations for MIR Tasks
Accurate and flexible representations of music data are paramount to addressing MIR tasks, yet many of the existing approaches are difficult to interpret or rigid in nature. This work introduces two new song representations for structure-based retrieval methods: Surface Pattern Preservation (SuPP), a continuous song representation, and Matrix Pattern Preservation (MaPP), SuPP’s discrete counterpart. These representations come equipped with several user-defined parameters so that they are adaptable for a range of MIR tasks. Experimental results show MaPP as successful in addressing the cover song task on a set of Mazurka scores, with a mean precision of 0.965 and recall of 0.776. SuPP and MaPP also show promise in other MIR applications, such as novel-segment detection and genre classification, the latter of which demonstrates their suitability as inputs for machine learning problems.
Unified Topological Inference for Brain Networks in Temporal Lobe Epilepsy Using the Wasserstein Distance
Persistent homology can extract hidden topological signals present in brain
networks. Persistent homology summarizes the changes of topological structures
over multiple different scales called filtrations. Doing so detects hidden
topological signals that persist over multiple scales. However, a key obstacle
to applying persistent homology to brain network studies has always been the
lack of a coherent statistical inference framework. To address this problem, we
present a unified topological inference framework based on the Wasserstein
distance. Our approach requires no explicit models or distributional assumptions.
The inference is performed in a completely data-driven fashion. The method is
applied to the resting-state functional magnetic resonance images (rs-fMRI) of
temporal lobe epilepsy patients collected at two different sites:
University of Wisconsin-Madison and the Medical College of Wisconsin. However,
the topological method is robust to variations due to sex and acquisition, and
thus there is no need to account for sex and site as categorical nuisance
covariates. We are able to localize brain regions that contribute the most to
topological differences. We made a MATLAB package available at
https://github.com/laplcebeltrami/dynamicTDA that was used to perform all the
analyses in this study.
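For readers unfamiliar with the Wasserstein distance this abstract relies on, the following sketch computes it in the simplest setting: two equally sized sets of one-dimensional values, each treated as an empirical distribution. In one dimension the optimal matching pairs the sorted values in order, so no general optimal-transport solver is needed. Whether and how the paper reduces its diagrams to one-dimensional values is not stated in the abstract, so that reduction is an assumption here, and the function name `wasserstein_1d` is hypothetical (it is not part of the linked MATLAB package).

```python
import numpy as np

def wasserstein_1d(x, y, p=2):
    """p-Wasserstein distance between two equal-size 1D empirical distributions."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes both samples have the same size")
    # In 1D the optimal transport plan matches the i-th smallest values.
    return float(np.mean(np.abs(x - y) ** p) ** (1.0 / p))

# Hypothetical values, e.g. birth or death values read off two diagrams.
a = [0.1, 0.4, 0.9]
b = [0.2, 0.5, 0.7]
print(wasserstein_1d(a, b, p=2))
```

A distance of this kind can then serve as a test statistic in a resampling scheme such as a permutation test, which is one way to obtain inference without distributional assumptions.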
Topological and geometric inference of data
The overarching problem under consideration is to determine the structure
of the subspace on which a distribution is supported, given
only a finite noisy sample thereof. The special case in
which the subspace is an embedded manifold is given particular
attention owing to its conceptual elegance, and asymptotic bounds are
obtained on the admissible level of noise such that the
manifold can be recovered up to homotopy equivalence.
Attention is then turned to how to accomplish this in practice.
Following ideas from topological data analysis, simplicial complexes are used
as discrete analogues of spaces suitable for computation. By utilising
the prior assumption that the data lie on a manifold, topologically
inspired techniques are proposed for refining the simplicial complex
to better approximate this manifold. This is applied to the
problem of nonlinear dimensionality reduction and found to improve accuracy
of reconstructing several synthetic and real-world datasets.
The second chapter focuses on extending this work to the
case where the ambient space is non-Euclidean. The interfaces between
topological data analysis, functional data analysis, and shape analysis
are thoroughly explored. Lipschitz bounds are proved which relate several
metrics on the space of positive semidefinite matrices; they are then
interpreted in the context of topological data analysis. This is
applied to diffusion tensor imaging and phonology.
The final chapter explores the case where the points are
non-uniformly distributed over the embedded subspace. In particular, a method
is proposed to overcome the shortcomings of witness complex construction
when there are large deviations in the density. The theory
of multidimensional persistence is leveraged to provide a succinct setting
in which the structure of the data can be interpreted
as a generalised stratified space. EPSR