Manifold-adaptive dimension estimation revisited
Data dimensionality informs us about data complexity and sets limits on the structure of successful signal processing pipelines. In this work we revisit and improve the manifold-adaptive Farahmand-Szepesvári-Audibert (FSA) dimension estimator, making it one of the best nearest-neighbor-based dimension estimators available. We compute the probability density function of local FSA estimates under the assumption that the local manifold density is uniform. Based on this density, we propose the median of the local estimates as a basic global measure of intrinsic dimensionality, and we demonstrate the advantages of this asymptotically unbiased estimator over the previously proposed statistics, the mode and the mean. From the same density we also derive a maximum likelihood formula for the global intrinsic dimensionality under an i.i.d. assumption. We tackle edge and finite-sample effects with an exponential correction formula calibrated on hypercube datasets. We compare the performance of the corrected median-FSA estimator with kNN estimators: maximum likelihood (Levina-Bickel), 2NN, and two implementations of DANCo (R and MATLAB). The corrected median-FSA estimator beats the maximum likelihood estimator and is on an equal footing with DANCo on standard synthetic benchmarks according to mean percentage error and error-rate metrics. With the median-FSA algorithm, we reveal diverse changes in neural dynamics during the resting state and during epileptic seizures, and we identify brain areas with lower-dimensional dynamics that are possible causal sources and candidates for seizure onset zones.
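As an illustrative sketch (not the paper's corrected estimator; the exponential finite-sample correction is omitted), the median-FSA idea fits in a few lines of Python, using the standard FSA local estimate d_hat = ln 2 / ln(R_2k / R_k):

```python
import numpy as np
from scipy.spatial import cKDTree

def fsa_local_dims(X, k=10):
    """Local FSA estimates: d_hat = ln 2 / ln(R_2k / R_k), where R_k is
    the distance from a point to its k-th nearest neighbor."""
    tree = cKDTree(X)
    # query 2k+1 neighbors; column 0 is the point itself (distance 0)
    dists, _ = tree.query(X, k=2 * k + 1)
    return np.log(2.0) / np.log(dists[:, 2 * k] / dists[:, k])

def median_fsa(X, k=10):
    """Global intrinsic dimension: the median of the local FSA estimates."""
    return float(np.median(fsa_local_dims(X, k)))

rng = np.random.default_rng(0)
X = rng.uniform(size=(5000, 3))      # uniformly sampled 3D hypercube
print(median_fsa(X, k=20))           # typically close to 3
```

Without the edge correction, the raw median slightly underestimates the dimension on bounded supports such as the hypercube, which is exactly the bias the paper's calibration addresses.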
Dimension Detection with Local Homology
Detecting the dimension of a hidden manifold from a point sample has become
an important problem in the current data-driven era. Indeed, estimating the
shape dimension is often the first step in studying the processes or phenomena
associated with the data. Among the many dimension detection algorithms proposed
in various fields, only a few provide theoretical guarantees on the correctness
of the estimated dimension. Moreover, this correctness usually requires certain
regularity of the input: the input points are either uniformly randomly sampled
in a statistical setting, or they form the so-called
-sample, which can be neither too dense nor too sparse.
Here, we propose a purely topological technique to detect dimension. Our
algorithm is provably correct and works under a more relaxed sampling
condition: we do not require uniformity, and we also allow Hausdorff noise. Our
approach detects dimension by determining local homology. The computation of
this topological structure is much less sensitive to the local distribution of
points, which leads to the relaxed sampling conditions. Furthermore,
by leveraging various developments in computational topology, we show that this
local homology at a point can be computed exactly for manifolds
using Vietoris-Rips complexes whose vertices are confined to a local
neighborhood of the point. We implement our algorithm and demonstrate the accuracy
and robustness of our method on both synthetic and real data sets.
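To give a flavour of the local-homology idea in the simplest case, the hedged sketch below handles a 1-manifold: the link of a point on a curve is a pair of points, so a small annulus around a sample point should have two connected components (degree-0 homology only, on a toy example; this is not the paper's provably correct algorithm, and the radii and graph threshold are arbitrary choices):

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def annulus_components(X, center, r_in, r_out, eps):
    """beta_0 of a small annulus around `center`: build an eps-neighborhood
    graph on the shell points and count its connected components."""
    d = np.linalg.norm(X - center, axis=1)
    shell = X[(d > r_in) & (d < r_out)]
    adj = csr_matrix(cdist(shell, shell) < eps)
    n_comp, _ = connected_components(adj, directed=False)
    return n_comp

# sample a circle (a 1-manifold) in the plane
rng = np.random.default_rng(1)
t = rng.uniform(0, 2 * np.pi, 2000)
X = np.c_[np.cos(t), np.sin(t)]

# the link of a point on a curve is two points, so beta_0 = 2
b0 = annulus_components(X, X[0], r_in=0.1, r_out=0.3, eps=0.06)
print(b0)  # 2, consistent with local dimension 1
```

In general the link of a point in a d-manifold is a (d-1)-sphere, so detecting higher dimensions requires higher-degree homology (e.g. persistent H_1 of the annulus for surfaces), which is where the Vietoris-Rips machinery of the paper comes in.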
Quasi-orthogonality and intrinsic dimensions as measures of learning and generalisation
Finding the best architectures of learning machines, such as deep neural
networks, is a well-known technical and theoretical challenge. Recent work by
Mellor et al. (2021) showed that there may exist correlations between the
accuracies of trained networks and the values of some easily computable
measures defined on randomly initialised networks, which may enable searching
tens of thousands of neural architectures without training. Mellor et al. used
the Hamming distance evaluated over all ReLU neurons as such a measure.
Motivated by these findings, we ask whether other, perhaps more principled,
measures exist that could serve as
determinants of the success of a given neural architecture. In particular, we
examine whether the dimensionality and quasi-orthogonality of a neural network's
feature space are correlated with the network's performance after
training. Using the same setup as Mellor et al., we show that dimensionality
and quasi-orthogonality may jointly serve as discriminants of a network's
performance. In addition to offering new opportunities to accelerate neural
architecture search, our findings suggest important relationships between a
network's final performance and properties of its randomly initialised
feature space: data dimension and quasi-orthogonality.
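As a rough illustration of the two quantities involved, the sketch below computes a simple quasi-orthogonality score (mean absolute off-diagonal cosine similarity of the feature vectors) and a PCA participation ratio as a dimensionality proxy, for a randomly initialised ReLU network. Both scores are common proxies chosen here for illustration; the paper's exact measures may differ:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu_features(X, widths):
    """Features of a randomly initialised ReLU MLP (He-style init)."""
    H = X
    for w in widths:
        W = rng.normal(0, np.sqrt(2.0 / H.shape[1]), size=(H.shape[1], w))
        H = np.maximum(H @ W, 0.0)
    return H

def quasi_orthogonality(H):
    """Mean absolute off-diagonal cosine similarity (0 = fully orthogonal)."""
    Hn = H / np.linalg.norm(H, axis=1, keepdims=True)
    C = Hn @ Hn.T
    return float(np.abs(C[~np.eye(len(C), dtype=bool)]).mean())

def participation_ratio(H):
    """PCA participation ratio, a simple proxy for feature-space dimension."""
    lam = np.clip(np.linalg.eigvalsh(np.cov(H.T)), 0, None)
    return float(lam.sum() ** 2 / (lam ** 2).sum())

X = rng.normal(size=(256, 32))            # a batch of random inputs
H = relu_features(X, widths=[128, 128])   # untrained feature space
print(quasi_orthogonality(H), participation_ratio(H))
```

The point of measures like these is that they can be evaluated on thousands of untrained architectures at negligible cost, which is what makes training-free architecture search feasible.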
Complete Inference of Causal Relations between Dynamical Systems
From the philosophers of ancient times to modern economists, biologists, and other
researchers, people have been engaged in revealing causal relations. The most challenging
problem is inferring the type of a causal relationship: whether it is uni- or
bi-directional, or only apparent, i.e., implied by a hidden common cause. Modern
technology provides us with tools to record data from complex systems, such as the
ecosystem of our planet or the human brain, but understanding their functioning
requires detecting and distinguishing the causal relationships of the system's
components without interventions. Here we present a new method that
distinguishes and assigns probabilities to the presence of all possible
causal relations between two or more time series from dynamical systems. The
new method is validated on synthetic datasets and applied to EEG
(electroencephalographic) data recorded from epileptic patients. Given the
universality of our method, it may find applications in many fields of science.
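The full method assigns probabilities to every possible causal configuration; as a simplified illustration of the state-space reasoning behind such approaches, here is a minimal convergent-cross-mapping-style sketch (a related but simpler technique, not the paper's method) on coupled logistic maps where x drives y unidirectionally:

```python
import numpy as np
from scipy.spatial import cKDTree

def embed(x, dim=3, tau=1):
    """Takens delay embedding of a scalar time series."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau: i * tau + n] for i in range(dim)])

def cross_map_skill(x, y, dim=3, tau=1, k=4):
    """Correlation between y and its prediction from x's reconstructed
    state space (simplex-style nearest-neighbour weighted average)."""
    Mx = embed(x, dim, tau)
    y = y[(dim - 1) * tau:]
    dists, idx = cKDTree(Mx).query(Mx, k=k + 1)  # neighbour 0 is the point itself
    w = np.exp(-dists[:, 1:] / (dists[:, 1:2] + 1e-12))
    w /= w.sum(axis=1, keepdims=True)
    y_hat = (w * y[idx[:, 1:]]).sum(axis=1)
    return float(np.corrcoef(y, y_hat)[0, 1])

# coupled logistic maps: x is autonomous, y is driven by x
n = 3000
x, y = np.empty(n), np.empty(n)
x[0], y[0] = 0.4, 0.2
for t in range(n - 1):
    x[t + 1] = x[t] * (3.8 - 3.8 * x[t])
    y[t + 1] = y[t] * (3.5 - 3.5 * y[t] - 0.1 * x[t])

# the driven variable's states encode information about the driver, not vice versa
print(cross_map_skill(y, x), cross_map_skill(x, y))
```

The asymmetry in cross-map skill recovers the direction of coupling without intervention, which is the basic phenomenon the paper's probabilistic framework builds on and extends to hidden common causes.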
A Language and Its Dimensions: Intrinsic Dimensions of Language Fractal Structures
The present paper introduces a novel object of study: a language fractal
structure. We hypothesize that the set of embeddings of all n-grams of a
natural language constitutes a representative sample of this fractal set. (We
use the term Hailonakea to refer to the sum total of all language fractal
structures, over all n.) The paper estimates the intrinsic (genuine) dimensions
of language fractal structures for the Russian and English languages. To this
end, we employ methods based on (1) topological data analysis and (2) the minimum
spanning tree of a data graph for the cloud of points considered (Steele's
theorem). For both languages, for all n, the intrinsic dimensions appear to
be non-integer values (typical of fractal sets), close to 9 for both
Russian and English.
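The minimum-spanning-tree route, method (2) above, can be sketched directly: by Steele's theorem the Euclidean MST length grows as L(n) ~ n^((d-1)/d), so the slope of log L against log n yields the dimension. The sample sizes, subsampling scheme, and synthetic 2D data below are illustrative assumptions:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def mst_length(X):
    """Total edge length of the Euclidean minimum spanning tree."""
    return minimum_spanning_tree(squareform(pdist(X))).sum()

def mst_dimension(X, sizes=(200, 400, 800, 1600), seed=0):
    """Steele's theorem: L(n) ~ n^((d-1)/d), so the log-log slope a of
    MST length vs sample size gives d = 1 / (1 - a)."""
    rng = np.random.default_rng(seed)
    lengths = [mst_length(X[rng.choice(len(X), n, replace=False)])
               for n in sizes]
    a, _ = np.polyfit(np.log(sizes), np.log(lengths), 1)
    return 1.0 / (1.0 - a)

rng = np.random.default_rng(0)
X = rng.uniform(size=(3200, 2))   # planar point cloud, true dimension 2
print(mst_dimension(X))           # roughly 2
```

Replacing the synthetic cloud with a cloud of n-gram embeddings gives the kind of estimate the paper reports, though real embedding clouds need larger samples and care with the fitting range.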
Neural frames: A Tool for Studying the Tangent Bundles Underlying Image Datasets and How Deep Learning Models Process Them
The assumption that many forms of high-dimensional data, such as images,
actually live on low-dimensional manifolds, sometimes known as the manifold
hypothesis, underlies much of our intuition for how and why deep learning
works. Despite the central role that they play in our intuition, data manifolds
are surprisingly hard to measure in the case of high-dimensional, sparsely
sampled image datasets. This is particularly frustrating since the capability
to measure data manifolds would provide a revealing window into the inner
workings and dynamics of deep learning models. Motivated by this, we introduce
neural frames, a novel and easy-to-use tool inspired by the notion of a frame
from differential geometry. Neural frames can be used to explore the local
neighborhoods of data manifolds as they pass through the hidden layers of
neural networks even when one only has a single datapoint available. We present
a mathematical framework for neural frames and explore some of their
properties. We then use them to make a range of observations about how modern
model architectures and training routines, such as heavy augmentation and
adversarial training, affect the local behavior of a model.
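The underlying idea, pushing an orthonormal frame at a datapoint through a layer via the layer's Jacobian and observing how each direction is stretched, can be sketched in a few lines. The toy tanh layer, finite-difference Jacobian, and random frame below are illustrative assumptions, not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, W, b):
    """One tanh layer of a toy randomly initialised network."""
    return np.tanh(W @ x + b)

def jacobian(f, x, eps=1e-6):
    """Finite-difference Jacobian of f at x."""
    fx = f(x)
    J = np.empty((fx.size, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        J[:, i] = (f(x + dx) - fx) / eps
    return J

d_in, d_out = 8, 16
W = rng.normal(0, 1 / np.sqrt(d_in), (d_out, d_in))
b = rng.normal(0, 0.1, d_out)
x = rng.normal(size=d_in)          # a single datapoint suffices

# an orthonormal frame of tangent directions at x (random for illustration)
frame, _ = np.linalg.qr(rng.normal(size=(d_in, 3)))

# push the frame through the layer with the Jacobian
J = jacobian(lambda v: layer(v, W, b), x)
pushed = J @ frame

# how the layer stretches or shrinks each frame direction
print(np.linalg.norm(pushed, axis=0))
```

Repeating this layer by layer traces how a neighborhood of the data manifold is deformed as it flows through the network, which is the kind of observation the paper makes about augmentation and adversarial training.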