
10 research outputs found

Dimension Detection with Local Homology

Author: Dey Tamal K.
Fan Fengtao
Wang Yusu
Publication venue
Publication date: 14/05/2014
Field of study

Detecting the dimension of a hidden manifold from a point sample has become an important problem in the current data-driven era. Indeed, estimating the shape dimension is often the first step in studying the processes or phenomena associated with the data. Among the many dimension detection algorithms proposed in various fields, a few can provide theoretical guarantees on the correctness of the estimated dimension. However, the correctness usually requires certain regularity of the input: the input points are either uniformly randomly sampled in a statistical setting, or they form the so-called $(\varepsilon,\delta)$-sample, which can be neither too dense nor too sparse. Here, we propose a purely topological technique to detect dimensions. Our algorithm is provably correct and works under a more relaxed sampling condition: we do not require uniformity, and we also allow Hausdorff noise. Our approach detects dimension by determining local homology. The computation of this topological structure is much less sensitive to the local distribution of points, which leads to the relaxation of the sampling conditions. Furthermore, by leveraging various developments in computational topology, we show that this local homology at a point $z$ can be computed exactly for manifolds using Vietoris-Rips complexes whose vertices are confined within a local neighborhood of $z$. We implement our algorithm and demonstrate the accuracy and robustness of our method using both synthetic and real data sets.

arXiv.org e-Print Archive

CiteSeerX
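The local-homology idea can be illustrated with a much simpler heuristic. For a point $z$ on an $m$-manifold, a thin annulus around $z$ is homotopy equivalent to the sphere $S^{m-1}$, so its reduced homology is nonzero in degree $m-1$. The Python sketch below builds a Vietoris-Rips complex at an assumed scale `eps` on the sample points in the annulus $s \le |p - z| \le r$ and computes its Betti numbers $\beta_0, \beta_1$ over $\mathbb{Z}/2$. It only separates dimensions 1 and 2, and the function names and the parameters `s`, `r`, `eps` are hypothetical choices. It is not the authors' algorithm, which computes local homology exactly under their sampling conditions.

```python
import math
from itertools import combinations


def gf2_rank(columns):
    """Rank over Z/2 of a matrix whose columns are encoded as int bitmasks."""
    basis = {}  # leading bit -> basis vector with that leading bit
    for col in columns:
        while col:
            lead = col.bit_length() - 1
            if lead not in basis:
                basis[lead] = col
                break
            col ^= basis[lead]  # eliminate the leading bit
    return len(basis)


def annulus_betti(points, z, s, r, eps):
    """(beta_0, beta_1) of the Rips complex at scale eps on points with s <= |p - z| <= r."""
    ring = [p for p in points if s <= math.dist(p, z) <= r]
    n = len(ring)
    edges = [(i, j) for i, j in combinations(range(n), 2)
             if math.dist(ring[i], ring[j]) <= eps]
    edge_id = {e: k for k, e in enumerate(edges)}
    # A Rips triangle is present exactly when all three of its edges are.
    triangles = [(i, j, k) for i, j, k in combinations(range(n), 3)
                 if (i, j) in edge_id and (i, k) in edge_id and (j, k) in edge_id]
    # Boundary matrices over Z/2, one bitmask per column.
    d1 = [(1 << i) | (1 << j) for i, j in edges]
    d2 = [(1 << edge_id[(i, j)]) | (1 << edge_id[(i, k)]) | (1 << edge_id[(j, k)])
          for i, j, k in triangles]
    rank1, rank2 = gf2_rank(d1), gf2_rank(d2)
    return n - rank1, len(edges) - rank1 - rank2


def guess_dimension(points, z, s, r, eps):
    """Heuristic: two components ~ S^0 (a curve); one component with a loop ~ S^1 (a surface)."""
    b0, b1 = annulus_betti(points, z, s, r, eps)
    if b0 >= 2:
        return 1
    if b0 == 1 and b1 >= 1:
        return 2
    return None  # undecided: outside the two cases this sketch handles


if __name__ == "__main__":
    circle = [(math.cos(2 * math.pi * t / 120), math.sin(2 * math.pi * t / 120))
              for t in range(120)]
    grid = [(x, y) for x in range(-8, 9) for y in range(-8, 9)]
    print(guess_dimension(circle, (1.0, 0.0), 0.2, 0.4, 0.1))  # 1
    print(guess_dimension(grid, (0, 0), 3, 6, 1.5))  # 2
```

With `eps = 1.5` on the integer grid, the Rips edges are exactly the horizontal, vertical and diagonal neighbours, so the annulus complex is a thickened ring with a single loop.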

Only distances are required to reconstruct submanifolds

Author: Boissonnat Jean-Daniel
Dyer Ramsay
Ghosh Arijit
Oudot Steve Y.
Publication venue
Publication date: 01/01/2017
Field of study

In this paper, we give the first algorithm that outputs a faithful reconstruction of a submanifold of Euclidean space without maintaining or even constructing complicated data structures such as Voronoi diagrams or Delaunay complexes. Our algorithm uses the witness complex and relies on the stability of power protection, a notion introduced in this paper. The complexity of the algorithm depends exponentially on the intrinsic dimension of the manifold, rather than on the dimension of the ambient space, and linearly on the dimension of the ambient space. Another interesting feature of this work is that no explicit coordinates of the points in the point sample are needed: the algorithm takes only the distance matrix as input, i.e., only the distances between points in the point sample.
Comment: Major revision, 16 figures, 47 pages

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot
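To illustrate how a simplicial complex can be built from the distance matrix alone, here is a Python sketch of a plain witness complex. It is not the paper's algorithm, which additionally relies on power protection and its stability. Here a point `w` is assumed to witness a simplex when the simplex's vertices are exactly its $k+1$ nearest landmarks (ties broken by index, an assumption), and a witnessed simplex is kept only when all of its facets are kept. The function name `witness_complex`, the landmark choice and `max_dim` are hypothetical.

```python
from itertools import combinations


def witness_complex(D, landmarks, max_dim=2):
    """Witness complex computed only from the pairwise distance matrix D.

    Every point index w in range(len(D)) acts as a witness. The simplex formed by
    the k+1 landmarks nearest to w (ties broken by index) counts as witnessed;
    a witnessed simplex is kept only if all of its facets were kept.
    Simplices are returned as sorted tuples of landmark indices.
    """
    witnessed = set()
    for w in range(len(D)):
        order = sorted(landmarks, key=lambda lm: (D[w][lm], lm))
        for k in range(min(max_dim + 1, len(order))):
            witnessed.add(tuple(sorted(order[: k + 1])))
    kept = set()
    for dim in range(max_dim + 1):  # vertices first, then edges, then triangles
        for simplex in sorted(s for s in witnessed if len(s) == dim + 1):
            if dim == 0 or all(f in kept for f in combinations(simplex, dim)):
                kept.add(simplex)
    return kept


if __name__ == "__main__":
    xs = [0, 1, 2, 3, 4]  # points on a line; only their distances are used
    D = [[abs(a - b) for b in xs] for a in xs]
    print(sorted(witness_complex(D, [0, 2, 4]), key=lambda s: (len(s), s)))
    # [(0,), (2,), (4,), (0, 2), (2, 4)]
```

In the usage example, the triangle on landmarks 0, 2, 4 is witnessed but dropped, because its edge (0, 4) is never the pair of nearest landmarks of any point.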

A simplicial complex-based approach to unmixing tumor progression data

Author: Theodore Roman
Amir Nayyeri
Brittany Terese Fasy
Russell Schwartz
Publication venue: 'Springer Science and Business Media LLC'
Publication date
Field of study

Crossref

Structures in High-Dimensional Data: Intrinsic Dimension and Cluster Analysis

Author: Johnsson Kerstin
Publication venue: Centre for Mathematical Sciences, Lund University
Publication date: 16/08/2016
Field of study

With today's improved measurement and data storage technologies, it has become common to collect data in search of hypotheses instead of for testing hypotheses, that is, to do exploratory data analysis. Finding patterns and structures in data is the main goal. This thesis deals with two kinds of structures that can convey relationships between different parts of data in a high-dimensional space: manifolds and clusters. They are in a way opposites of each other: a manifold structure shows that it is plausible to connect two distant points through the manifold, whereas a clustering shows that it is plausible to separate two nearby points by assigning them to different clusters. But clusters and manifolds can also be the same: each cluster can be a manifold of its own.

The first paper in this thesis concerns one specific aspect of a manifold structure, namely its dimension, also called the intrinsic dimension of the data. A novel estimator of intrinsic dimension, taking advantage of "the curse of dimensionality", is proposed and evaluated. It is shown that it has in general less bias than estimators from the literature and can therefore better distinguish manifolds with different dimensions.

The second and third papers in this thesis concern cluster analysis of data generated by flow cytometry, a high-throughput single-cell measurement technology. In this area, clustering is routinely performed by manually assigning data in two-dimensional plots to identify cell populations. This is a tedious and subjective task, especially since the data often have four, eight, twelve or even more dimensions, and the analysts need to decide which two dimensions to look at together, and in which order.

In the second paper of the thesis, a new pipeline for automated cell population identification is proposed. It can process multiple flow cytometry samples in parallel using a hierarchical model that shares information between the clusterings of the samples, thus making corresponding clusters in different samples similar while allowing for variation in cluster location and shape.

In the third and final paper of the thesis, statistical tests for unimodality are investigated as a tool for quality control of automated cell population identification algorithms. It is shown that the different tests have different interpretations of unimodality and thus accept different kinds of clusters as sufficiently close to unimodal.

Lund University Publications
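For context on what an intrinsic dimension estimator computes, here is a Python sketch of a well-known estimator from the literature, the maximum-likelihood estimator of Levina and Bickel, which uses the distances from each point to its $k$ nearest neighbours. It is not the low-bias estimator proposed in the thesis; the choice `k = 10`, averaging the local estimates over all points, and the function name are assumptions made for illustration.

```python
import heapq
import math
import random


def mle_intrinsic_dimension(points, k=10):
    """Levina-Bickel MLE of intrinsic dimension, averaged over all sample points.

    For each x, with T_j the distance to its j-th nearest neighbour,
    m(x) = [ (1/(k-1)) * sum_{j<k} log(T_k / T_j) ]^(-1).
    Brute-force O(n^2) neighbour search; adequate for a few hundred points.
    """
    estimates = []
    for i, x in enumerate(points):
        dists = heapq.nsmallest(
            k, (math.dist(x, y) for j, y in enumerate(points) if j != i))
        t_k = dists[-1]
        s = sum(math.log(t_k / t_j) for t_j in dists[:-1])
        estimates.append((k - 1) / s)
    return sum(estimates) / len(estimates)


if __name__ == "__main__":
    rng = random.Random(0)
    # A flat 2-dimensional patch embedded in R^3 (third coordinate tied to the first).
    plane = [(u, v, 0.5 * u) for u, v in ((rng.random(), rng.random()) for _ in range(400))]
    # A straight segment embedded in R^3.
    line = [(t, 2 * t, -t) for t in (rng.random() for _ in range(400))]
    print(round(mle_intrinsic_dimension(plane), 2), round(mle_intrinsic_dimension(line), 2))
```

Estimates of this kind are typically biased for small `k` and near the boundary of the sample, which is the kind of bias the thesis's estimator aims to reduce.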

Dimension detection via slivers

Author
Publication venue: 'The Hong Kong University of Science and Technology Library'
Publication date
Field of study

Crossref

Dimension Detection via Slivers

Author
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date
Field of study

Crossref

Dimension detection via slivers

Author: Chiu Man Kwun
Publication venue
Publication date
Field of study

We present a method to estimate the manifold dimension by analyzing the shape of simplices formed by point samples in some small neighborhoods. Approximate tangent spaces at these neighborhoods can also be reported. Let $P$ be a set of point samples drawn from a manifold $M \subset \mathbb{R}^d$ with dimension $m$ according to a Poisson distribution with parameter $\lambda$. Both $M$ and $\lambda$ are unknown to our algorithm. For sufficiently large $\lambda$, given any integer $k \geq 1$, our algorithm correctly outputs $m$ in $O(kdm^3|P|\log|P|)$ time with probability $1 - O(\lambda^{-\kappa\beta})$ for some constant $\beta \in (0,1)$. It holds with the same probability that the angular error of each approximate tangent space reported is $O(\varepsilon)$, where $\varepsilon$ is a value depending on $M$, $\lambda$ and $m$, and $\lim_{\lambda\to\infty}\varepsilon = 0$. We experimented with a practical variant of our algorithm and demonstrated that it does not require very high sampling density and that it is competitive with several previous methods.

Hong Kong University of Science and Technology Institutional Repository
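The slivers idea rests on a simple geometric fact: on an $m$-dimensional manifold, a simplex with more than $m+1$ nearby sample points as vertices comes out nearly flat. The Python sketch below computes the shape measure underlying that observation. It takes a $j$-simplex's volume from its Gram determinant and normalizes by the $j$-th power of the longest edge, so flat simplices (slivers) score near 0. This "thickness" normalization and the function names are assumptions; the paper's actual sliver test and its parameters are not reproduced here.

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n, det = len(a), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            det = -det
        det *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for cc in range(c, n):
                a[r][cc] -= f * a[c][cc]
    return det


def simplex_volume(vertices):
    """j-dimensional volume of a simplex with j+1 vertices in R^d (Gram determinant)."""
    v0, rest = vertices[0], vertices[1:]
    vecs = [[x - y for x, y in zip(v, v0)] for v in rest]
    gram = [[sum(a * b for a, b in zip(u, w)) for w in vecs] for u in vecs]
    j = len(vecs)
    # Clamp tiny negative values caused by rounding in nearly flat simplices.
    return math.sqrt(max(determinant(gram), 0.0)) / math.factorial(j)


def thickness(vertices):
    """Scale-invariant flatness score (needs >= 2 vertices): volume / (longest edge)^j."""
    j = len(vertices) - 1
    longest = max(math.dist(p, q) for i, p in enumerate(vertices) for q in vertices[i + 1:])
    return simplex_volume(vertices) / longest ** j


if __name__ == "__main__":
    def on_sphere(x, y):
        return (x, y, math.sqrt(1 - x * x - y * y))

    # Four nearby points on the unit sphere (a 2-manifold) span a sliver tetrahedron.
    sliver = [on_sphere(0, 0), on_sphere(0.1, 0), on_sphere(0, 0.1), on_sphere(0.1, 0.1)]
    corner = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
    print(thickness(sliver), thickness(corner))
```

The tetrahedron sampled from the sphere scores orders of magnitude lower than the corner tetrahedron, which is the contrast a sliver-based dimension test exploits.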