Manifold-adaptive dimension estimation revisited
Data dimensionality informs us about data complexity and sets limits on the structure of successful signal processing pipelines. In this work we revisit and improve the manifold-adaptive Farahmand-Szepesvári-Audibert (FSA) dimension estimator, making it one of the best nearest-neighbor-based dimension estimators available. We compute the probability density function of local FSA estimates under the assumption that the local manifold density is uniform. Based on this density, we propose the median of the local estimates as a basic global measure of intrinsic dimensionality, and we demonstrate the advantages of this asymptotically unbiased estimator over the previously proposed statistics, the mode and the mean. From the same density we also derive a maximum likelihood formula for the global intrinsic dimensionality under an i.i.d. assumption. We tackle edge and finite-sample effects with an exponential correction formula calibrated on hypercube datasets. We compare the performance of the corrected median-FSA estimator with kNN estimators: maximum likelihood (Levina-Bickel), 2NN, and two implementations of DANCo (R and MATLAB). The corrected median-FSA estimator beats the maximum likelihood estimator and is on an equal footing with DANCo on standard synthetic benchmarks according to mean percentage error and error-rate metrics. With the median-FSA algorithm, we reveal diverse changes in neural dynamics during the resting state and during epileptic seizures, and we identify brain areas with lower-dimensional dynamics that are possible causal sources and candidates for seizure onset zones.
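As an illustrative sketch (not the paper's corrected estimator; the exponential finite-sample correction is omitted), the median-FSA idea fits in a few lines of Python, using the standard FSA local estimate d_hat = ln 2 / ln(R_2k / R_k):

```python
import numpy as np
from scipy.spatial import cKDTree

def fsa_local_dims(X, k=10):
    """Local FSA estimates: d_hat = ln 2 / ln(R_2k / R_k), where R_k is
    the distance from a point to its k-th nearest neighbor."""
    tree = cKDTree(X)
    # query 2k+1 neighbors; column 0 is the point itself (distance 0)
    dists, _ = tree.query(X, k=2 * k + 1)
    return np.log(2.0) / np.log(dists[:, 2 * k] / dists[:, k])

def median_fsa(X, k=10):
    """Global intrinsic dimension: the median of the local FSA estimates."""
    return float(np.median(fsa_local_dims(X, k)))

rng = np.random.default_rng(0)
X = rng.uniform(size=(5000, 3))      # uniformly sampled 3D hypercube
print(median_fsa(X, k=20))           # typically close to 3
```

Without the edge correction, the raw median slightly underestimates the dimension on bounded supports such as the hypercube, which is exactly the bias the paper's calibration addresses.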
Dimension Detection with Local Homology
Detecting the dimension of a hidden manifold from a point sample has become
an important problem in the current data-driven era. Indeed, estimating the
shape dimension is often the first step in studying the processes or phenomena
associated with the data. Among the many dimension detection algorithms proposed
in various fields, only a few provide theoretical guarantees on the correctness
of the estimated dimension. Moreover, this correctness usually requires certain
regularity of the input: the input points are either uniformly randomly sampled
in a statistical setting, or they form the so-called
-sample, which can be neither too dense nor too sparse.
Here, we propose a purely topological technique to detect dimension. Our
algorithm is provably correct and works under a more relaxed sampling
condition: we do not require uniformity, and we also allow Hausdorff noise. Our
approach detects dimension by determining local homology. The computation of
this topological structure is much less sensitive to the local distribution of
points, which leads to the relaxed sampling conditions. Furthermore,
by leveraging various developments in computational topology, we show that this
local homology at a point can be computed exactly for manifolds
using Vietoris-Rips complexes whose vertices are confined to a local
neighborhood of the point. We implement our algorithm and demonstrate the accuracy
and robustness of our method on both synthetic and real data sets.
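To give a flavour of the local-homology idea in the simplest case, the hedged sketch below handles a 1-manifold: the link of a point on a curve is a pair of points, so a small annulus around a sample point should have two connected components (degree-0 homology only, on a toy example; this is not the paper's provably correct algorithm, and the radii and graph threshold are arbitrary choices):

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def annulus_components(X, center, r_in, r_out, eps):
    """beta_0 of a small annulus around `center`: build an eps-neighborhood
    graph on the shell points and count its connected components."""
    d = np.linalg.norm(X - center, axis=1)
    shell = X[(d > r_in) & (d < r_out)]
    adj = csr_matrix(cdist(shell, shell) < eps)
    n_comp, _ = connected_components(adj, directed=False)
    return n_comp

# sample a circle (a 1-manifold) in the plane
rng = np.random.default_rng(1)
t = rng.uniform(0, 2 * np.pi, 2000)
X = np.c_[np.cos(t), np.sin(t)]

# the link of a point on a curve is two points, so beta_0 = 2
b0 = annulus_components(X, X[0], r_in=0.1, r_out=0.3, eps=0.06)
print(b0)  # 2, consistent with local dimension 1
```

In general the link of a point in a d-manifold is a (d-1)-sphere, so detecting higher dimensions requires higher-degree homology (e.g. persistent H_1 of the annulus for surfaces), which is where the Vietoris-Rips machinery of the paper comes in.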
Quasi-orthogonality and intrinsic dimensions as measures of learning and generalisation
Finding the best architectures of learning machines, such as deep neural
networks, is a well-known technical and theoretical challenge. Recent work by
Mellor et al. (2021) showed that there may exist correlations between the
accuracies of trained networks and the values of some easily computable
measures defined on randomly initialised networks, which may enable searching
tens of thousands of neural architectures without training. Mellor et al. used
the Hamming distance evaluated over all ReLU neurons as such a measure.
Motivated by these findings, we ask whether other, perhaps more principled,
measures exist that could serve as
determinants of the success of a given neural architecture. In particular, we
examine whether the dimensionality and quasi-orthogonality of a neural network's
feature space are correlated with the network's performance after
training. Using the same setup as Mellor et al., we show that dimensionality
and quasi-orthogonality may jointly serve as discriminants of a network's
performance. In addition to offering new opportunities to accelerate neural
architecture search, our findings suggest important relationships between a
network's final performance and properties of its randomly initialised
feature space: data dimension and quasi-orthogonality.
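As a rough illustration of the two quantities involved, the sketch below computes a simple quasi-orthogonality score (mean absolute off-diagonal cosine similarity of the feature vectors) and a PCA participation ratio as a dimensionality proxy, for a randomly initialised ReLU network. Both scores are common proxies chosen here for illustration; the paper's exact measures may differ:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu_features(X, widths):
    """Features of a randomly initialised ReLU MLP (He-style init)."""
    H = X
    for w in widths:
        W = rng.normal(0, np.sqrt(2.0 / H.shape[1]), size=(H.shape[1], w))
        H = np.maximum(H @ W, 0.0)
    return H

def quasi_orthogonality(H):
    """Mean absolute off-diagonal cosine similarity (0 = fully orthogonal)."""
    Hn = H / np.linalg.norm(H, axis=1, keepdims=True)
    C = Hn @ Hn.T
    return float(np.abs(C[~np.eye(len(C), dtype=bool)]).mean())

def participation_ratio(H):
    """PCA participation ratio, a simple proxy for feature-space dimension."""
    lam = np.clip(np.linalg.eigvalsh(np.cov(H.T)), 0, None)
    return float(lam.sum() ** 2 / (lam ** 2).sum())

X = rng.normal(size=(256, 32))            # a batch of random inputs
H = relu_features(X, widths=[128, 128])   # untrained feature space
print(quasi_orthogonality(H), participation_ratio(H))
```

The point of measures like these is that they can be evaluated on thousands of untrained architectures at negligible cost, which is what makes training-free architecture search feasible.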
Complete Inference of Causal Relations between Dynamical Systems
From the philosophers of ancient times to modern economists, biologists, and other
researchers, people have been engaged in revealing causal relations. The most challenging
problem is inferring the type of a causal relationship: whether it is uni- or
bi-directional, or only apparent, i.e., implied by a hidden common cause. Modern
technology provides us with tools to record data from complex systems, such as the
ecosystem of our planet or the human brain, but understanding their functioning
requires detecting and distinguishing the causal relationships of the system's
components without interventions. Here we present a new method that
distinguishes and assigns probabilities to the presence of all possible
causal relations between two or more time series from dynamical systems. The
new method is validated on synthetic datasets and applied to EEG
(electroencephalographic) data recorded from epileptic patients. Given the
universality of our method, it may find applications in many fields of science.
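The full method assigns probabilities to every possible causal configuration; as a simplified illustration of the state-space reasoning behind such approaches, here is a minimal convergent-cross-mapping-style sketch (a related but simpler technique, not the paper's method) on coupled logistic maps where x drives y unidirectionally:

```python
import numpy as np
from scipy.spatial import cKDTree

def embed(x, dim=3, tau=1):
    """Takens delay embedding of a scalar time series."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau: i * tau + n] for i in range(dim)])

def cross_map_skill(x, y, dim=3, tau=1, k=4):
    """Correlation between y and its prediction from x's reconstructed
    state space (simplex-style nearest-neighbour weighted average)."""
    Mx = embed(x, dim, tau)
    y = y[(dim - 1) * tau:]
    dists, idx = cKDTree(Mx).query(Mx, k=k + 1)  # neighbour 0 is the point itself
    w = np.exp(-dists[:, 1:] / (dists[:, 1:2] + 1e-12))
    w /= w.sum(axis=1, keepdims=True)
    y_hat = (w * y[idx[:, 1:]]).sum(axis=1)
    return float(np.corrcoef(y, y_hat)[0, 1])

# coupled logistic maps: x is autonomous, y is driven by x
n = 3000
x, y = np.empty(n), np.empty(n)
x[0], y[0] = 0.4, 0.2
for t in range(n - 1):
    x[t + 1] = x[t] * (3.8 - 3.8 * x[t])
    y[t + 1] = y[t] * (3.5 - 3.5 * y[t] - 0.1 * x[t])

# the driven variable's states encode information about the driver, not vice versa
print(cross_map_skill(y, x), cross_map_skill(x, y))
```

The asymmetry in cross-map skill recovers the direction of coupling without intervention, which is the basic phenomenon the paper's probabilistic framework builds on and extends to hidden common causes.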
A Language and Its Dimensions: Intrinsic Dimensions of Language Fractal Structures
The present paper introduces a novel object of study: a language fractal
structure. We hypothesize that the set of embeddings of all n-grams of a
natural language constitutes a representative sample of this fractal set. (We
use the term Hailonakea to refer to the sum total of all language fractal
structures, over all n.) The paper estimates the intrinsic (genuine) dimensions
of language fractal structures for the Russian and English languages. To this
end, we employ methods based on (1) topological data analysis and (2) the minimum
spanning tree of a data graph for the cloud of points considered (Steele's
theorem). For both languages, for all n, the intrinsic dimensions appear to
be non-integer values (typical of fractal sets), close to 9 for both
Russian and English.
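The minimum-spanning-tree route, method (2) above, can be sketched directly: by Steele's theorem the Euclidean MST length grows as L(n) ~ n^((d-1)/d), so the slope of log L against log n yields the dimension. The sample sizes, subsampling scheme, and synthetic 2D data below are illustrative assumptions:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def mst_length(X):
    """Total edge length of the Euclidean minimum spanning tree."""
    return minimum_spanning_tree(squareform(pdist(X))).sum()

def mst_dimension(X, sizes=(200, 400, 800, 1600), seed=0):
    """Steele's theorem: L(n) ~ n^((d-1)/d), so the log-log slope a of
    MST length vs sample size gives d = 1 / (1 - a)."""
    rng = np.random.default_rng(seed)
    lengths = [mst_length(X[rng.choice(len(X), n, replace=False)])
               for n in sizes]
    a, _ = np.polyfit(np.log(sizes), np.log(lengths), 1)
    return 1.0 / (1.0 - a)

rng = np.random.default_rng(0)
X = rng.uniform(size=(3200, 2))   # planar point cloud, true dimension 2
print(mst_dimension(X))           # roughly 2
```

Replacing the synthetic cloud with a cloud of n-gram embeddings gives the kind of estimate the paper reports, though real embedding clouds need larger samples and care with the fitting range.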
Neural frames: A Tool for Studying the Tangent Bundles Underlying Image Datasets and How Deep Learning Models Process Them
The assumption that many forms of high-dimensional data, such as images,
actually live on low-dimensional manifolds, sometimes known as the manifold
hypothesis, underlies much of our intuition for how and why deep learning
works. Despite the central role that they play in our intuition, data manifolds
are surprisingly hard to measure in the case of high-dimensional, sparsely
sampled image datasets. This is particularly frustrating since the capability
to measure data manifolds would provide a revealing window into the inner
workings and dynamics of deep learning models. Motivated by this, we introduce
neural frames, a novel and easy-to-use tool inspired by the notion of a frame
from differential geometry. Neural frames can be used to explore the local
neighborhoods of data manifolds as they pass through the hidden layers of
neural networks even when one only has a single datapoint available. We present
a mathematical framework for neural frames and explore some of their
properties. We then use them to make a range of observations about how modern
model architectures and training routines, such as heavy augmentation and
adversarial training, affect the local behavior of a model.
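The underlying idea, pushing an orthonormal frame at a datapoint through a layer via the layer's Jacobian and observing how each direction is stretched, can be sketched in a few lines. The toy tanh layer, finite-difference Jacobian, and random frame below are illustrative assumptions, not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, W, b):
    """One tanh layer of a toy randomly initialised network."""
    return np.tanh(W @ x + b)

def jacobian(f, x, eps=1e-6):
    """Finite-difference Jacobian of f at x."""
    fx = f(x)
    J = np.empty((fx.size, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        J[:, i] = (f(x + dx) - fx) / eps
    return J

d_in, d_out = 8, 16
W = rng.normal(0, 1 / np.sqrt(d_in), (d_out, d_in))
b = rng.normal(0, 0.1, d_out)
x = rng.normal(size=d_in)          # a single datapoint suffices

# an orthonormal frame of tangent directions at x (random for illustration)
frame, _ = np.linalg.qr(rng.normal(size=(d_in, 3)))

# push the frame through the layer with the Jacobian
J = jacobian(lambda v: layer(v, W, b), x)
pushed = J @ frame

# how the layer stretches or shrinks each frame direction
print(np.linalg.norm(pushed, axis=0))
```

Repeating this layer by layer traces how a neighborhood of the data manifold is deformed as it flows through the network, which is the kind of observation the paper makes about augmentation and adversarial training.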