
    Manifold-adaptive dimension estimation revisited

    Data dimensionality informs us about data complexity and sets limits on the structure of successful signal processing pipelines. In this work we revisit and improve the manifold-adaptive Farahmand-Szepesvári-Audibert (FSA) dimension estimator, making it one of the best nearest-neighbor-based dimension estimators available. We compute the probability density function of local FSA estimates when the local manifold density is uniform. Based on this density, we propose the median of local estimates as a basic global measure of intrinsic dimensionality, and we demonstrate the advantages of this asymptotically unbiased estimator over the previously proposed statistics: the mode and the mean. Additionally, from the probability density function we derive the maximum likelihood formula for the global intrinsic dimensionality under an i.i.d. assumption. We tackle edge and finite-sample effects with an exponential correction formula calibrated on hypercube datasets. We compare the performance of the corrected median-FSA estimator with kNN estimators: maximum likelihood (Levina-Bickel), 2NN and two implementations of DANCo (R and MATLAB). We show that the corrected median-FSA estimator beats the maximum likelihood estimator and is on an equal footing with DANCo on standard synthetic benchmarks according to mean percentage error and error rate metrics. With the median-FSA algorithm, we reveal diverse changes in neural dynamics during resting state and epileptic seizures. We identify brain areas with lower-dimensional dynamics that are possible causal sources and candidates for being seizure onset zones.
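
    A minimal sketch of the local FSA estimate and the median aggregation described above, using scikit-learn's NearestNeighbors; the edge and finite-sample correction mentioned in the abstract is omitted, so this is an illustration rather than the authors' calibrated implementation.

```python
# Minimal sketch of a median-FSA intrinsic dimension estimate (the exponential
# finite-sample correction described in the abstract is omitted).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def median_fsa_dimension(X, k=5):
    """Median of local FSA estimates d_i = ln(2) / ln(R_2k / R_k)."""
    # distances to the 2k nearest neighbours; column 0 is the point itself
    nn = NearestNeighbors(n_neighbors=2 * k + 1).fit(X)
    dist, _ = nn.kneighbors(X)
    r_k = dist[:, k]        # distance to the k-th neighbour
    r_2k = dist[:, 2 * k]   # distance to the 2k-th neighbour
    local = np.log(2.0) / np.log(r_2k / r_k)
    return np.median(local)

# toy check: points on a 2-D plane embedded in 5-D space
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2)) @ rng.normal(size=(2, 5))
print(median_fsa_dimension(X))   # expected to be close to 2
```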

    Dimension Detection with Local Homology

    Detecting the dimension of a hidden manifold from a point sample has become an important problem in the current data-driven era. Indeed, estimating the shape dimension is often the first step in studying the processes or phenomena associated with the data. Among the many dimension detection algorithms proposed in various fields, only a few can provide a theoretical guarantee on the correctness of the estimated dimension. However, the correctness usually requires certain regularity of the input: the input points are either uniformly randomly sampled in a statistical setting, or they form a so-called (ε,δ)-sample, which can be neither too dense nor too sparse. Here, we propose a purely topological technique to detect dimensions. Our algorithm is provably correct and works under a more relaxed sampling condition: we do not require uniformity, and we also allow Hausdorff noise. Our approach detects dimension by determining local homology. The computation of this topological structure is much less sensitive to the local distribution of points, which leads to the relaxation of the sampling conditions. Furthermore, by leveraging various developments in computational topology, we show that this local homology at a point z can be computed exactly for manifolds using Vietoris-Rips complexes whose vertices are confined within a local neighborhood of z. We implement our algorithm and demonstrate the accuracy and robustness of our method on both synthetic and real data sets.
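
    The abstract describes the method only at a high level; the following is a rough, heuristic proxy for the idea, assuming the ripser package: points in a small annulus around a query point z on a d-manifold should look like a (d-1)-sphere, so a single long-lived class in H_{d-1} of their Rips complex hints at dimension d. It is not the provably correct algorithm of the paper.

```python
# Heuristic proxy only: inspect the persistent homology of an annulus around z.
import numpy as np
from ripser import ripser  # pip install ripser

def local_dimension_hint(X, z, r_in, r_out, maxdim=2):
    d = np.linalg.norm(X - z, axis=1)
    link = X[(d > r_in) & (d < r_out)]          # annulus ("link") sample around z
    dgms = ripser(link, maxdim=maxdim)['dgms']  # persistence diagrams H_0..H_maxdim
    hints = []
    for q, dgm in enumerate(dgms):
        pers = dgm[:, 1] - dgm[:, 0]
        pers = pers[np.isfinite(pers)]
        # count features whose persistence exceeds half of the longest finite bar
        long_lived = int(np.sum(pers > 0.5 * pers.max())) if len(pers) else 0
        hints.append((q, long_lived))
    return hints  # a single long bar in H_{d-1} suggests local dimension d
```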

    Quasi-orthogonality and intrinsic dimensions as measures of learning and generalisation

    Finding the best architectures of learning machines, such as deep neural networks, is a well-known technical and theoretical challenge. Recent work by Mellor et al. (2021) showed that there may exist correlations between the accuracies of trained networks and the values of some easily computable measures defined on randomly initialised networks, which may enable searching tens of thousands of neural architectures without training. Mellor et al. used the Hamming distance evaluated over all ReLU neurons as such a measure. Motivated by these findings, we ask whether other, perhaps more principled, measures exist that could be used as determinants of the success of a given neural architecture. In particular, we examine whether the dimensionality and quasi-orthogonality of a neural network's feature space are correlated with the network's performance after training. We show, using the same setup as Mellor et al., that dimensionality and quasi-orthogonality may jointly serve as discriminants of a network's performance. In addition to offering new opportunities to accelerate neural architecture search, our findings suggest important relationships between a network's final performance and the properties of its randomly initialised feature space: data dimension and quasi-orthogonality.
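
    As a hedged illustration of the kind of measures the abstract refers to, the sketch below computes two simple proxies on a randomly initialised PyTorch network: the mean absolute cosine similarity between feature vectors of different inputs (quasi-orthogonality) and the participation ratio of the feature covariance spectrum (dimensionality). The exact definitions used in the paper may differ.

```python
# Two simple proxies evaluated on an untrained network; illustrative only.
import torch
import torch.nn as nn

def feature_measures(model, x):
    with torch.no_grad():
        f = model(x)                                  # features of an untrained net
    f = f - f.mean(dim=0, keepdim=True)
    # quasi-orthogonality proxy: mean |cosine| between features of different inputs
    fn = nn.functional.normalize(f, dim=1)
    cos = fn @ fn.T
    off_diag = cos - torch.diag(torch.diag(cos))
    quasi_orth = off_diag.abs().sum() / (cos.numel() - cos.shape[0])
    # dimensionality proxy: participation ratio of the feature covariance spectrum
    eig = torch.linalg.eigvalsh(f.T @ f / (f.shape[0] - 1))
    part_ratio = eig.sum() ** 2 / (eig ** 2).sum()
    return quasi_orth.item(), part_ratio.item()

net = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 128))  # untrained
q, d = feature_measures(net, torch.randn(512, 32))
print(f"quasi-orthogonality ~ {q:.3f}, participation-ratio dimension ~ {d:.1f}")
```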

    Complete Inference of Causal Relations between Dynamical Systems

    From the philosophers of ancient times to modern economists and biologists, researchers have been engaged in revealing causal relations. The most challenging problem is inferring the type of the causal relationship: whether it is uni- or bi-directional, or only apparent, implied by a hidden common cause. Modern technology provides us with tools to record data from complex systems such as the ecosystem of our planet or the human brain, but understanding their functioning requires detecting and distinguishing the causal relationships of the system components without interventions. Here we present a new method which distinguishes and assigns probabilities to the presence of all possible causal relations between two or more time series from dynamical systems. The new method is validated on synthetic datasets and applied to EEG (electroencephalographic) data recorded from epileptic patients. Given the universality of our method, it may find application in many fields of science.
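
    The abstract does not detail the algorithm; the sketch below only illustrates one dimension-comparison heuristic in this spirit, combining time-delay embeddings with a simple intrinsic dimension estimator, and omits the probabilistic assignment described above. Function names and parameters are illustrative assumptions, not the paper's method.

```python
# Illustrative dimension-comparison heuristic for two time series, not the
# paper's probabilistic inference procedure.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def delay_embed(x, dim=4, tau=1):
    """Time-delay embedding of a scalar series into R^dim."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau:i * tau + n] for i in range(dim)])

def dim_estimate(X, k=5):
    # simple FSA-style median estimator of intrinsic dimension
    dist, _ = NearestNeighbors(n_neighbors=2 * k + 1).fit(X).kneighbors(X)
    return np.median(np.log(2.0) / np.log(dist[:, 2 * k] / dist[:, k]))

def causal_hint(x, y, dim=4, tau=1):
    Ex, Ey = delay_embed(x, dim, tau), delay_embed(y, dim, tau)
    J = np.hstack([Ex, Ey])                      # joint reconstructed manifold
    dx, dy, dj = dim_estimate(Ex), dim_estimate(Ey), dim_estimate(J)
    # heuristics: dj ~ dy hints at x -> y, dj ~ dx hints at y -> x,
    # dj ~ dx ~ dy hints at circular coupling, dj ~ dx + dy hints at independence
    return dx, dy, dj
```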

    A Language and Its Dimensions: Intrinsic Dimensions of Language Fractal Structures

    The present paper introduces a novel object of study: a language fractal structure. We hypothesize that a set of embeddings of all n-grams of a natural language constitutes a representative sample of this fractal set. (We use the term Hailonakea to refer to the sum total of all language fractal structures, over all n.) The paper estimates the intrinsic (genuine) dimensions of language fractal structures for the Russian and English languages. To this end, we employ methods based on (1) topological data analysis and (2) a minimum spanning tree of a data graph for a cloud of points (Steele's theorem). For both languages and all n, the intrinsic dimensions appear to be non-integer values (typical for fractal sets), close to 9 for both Russian and English.
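
    A minimal sketch of the minimum-spanning-tree (Steele-theorem) dimension estimate mentioned above, applied to a generic point cloud rather than n-gram embeddings: the total MST length of n points sampled from a d-dimensional set scales roughly as n^((d-1)/d), so the slope of log-length versus log-n yields d.

```python
# MST-based (Steele-theorem) dimension estimate on subsamples of growing size.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def mst_length(X):
    return minimum_spanning_tree(squareform(pdist(X))).sum()

def mst_dimension(X, sizes=(250, 500, 1000, 2000), seed=0):
    """Fit log L(n) ~ alpha * log n; Steele's theorem gives alpha = (d-1)/d."""
    rng = np.random.default_rng(seed)
    logs_n, logs_L = [], []
    for n in sizes:
        idx = rng.choice(len(X), size=n, replace=False)
        logs_n.append(np.log(n))
        logs_L.append(np.log(mst_length(X[idx])))
    alpha = np.polyfit(logs_n, logs_L, 1)[0]
    return 1.0 / (1.0 - alpha)

# toy check on a 3-D Gaussian cloud (expect a value roughly near 3)
X = np.random.default_rng(1).normal(size=(4000, 3))
print(mst_dimension(X))
```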

    Neural frames: A Tool for Studying the Tangent Bundles Underlying Image Datasets and How Deep Learning Models Process Them

    The assumption that many forms of high-dimensional data, such as images, actually live on low-dimensional manifolds, sometimes known as the manifold hypothesis, underlies much of our intuition for how and why deep learning works. Despite the central role that they play in our intuition, data manifolds are surprisingly hard to measure in the case of high-dimensional, sparsely sampled image datasets. This is particularly frustrating since the capability to measure data manifolds would provide a revealing window into the inner workings and dynamics of deep learning models. Motivated by this, we introduce neural frames, a novel and easy-to-use tool inspired by the notion of a frame from differential geometry. Neural frames can be used to explore the local neighborhoods of data manifolds as they pass through the hidden layers of neural networks, even when only a single datapoint is available. We present a mathematical framework for neural frames and explore some of their properties. We then use them to make a range of observations about how modern model architectures and training routines, such as heavy augmentation and adversarial training, affect the local behavior of a model.
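
    As a hedged illustration (not the paper's neural-frames construction), the sketch below pushes an orthonormal frame of tangent directions at a single datapoint through one network block via its Jacobian and inspects how the frame is stretched.

```python
# Push an orthonormal frame at a datapoint through a layer via the Jacobian;
# illustrative only, not the paper's construction.
import torch
import torch.nn as nn

def push_frame(layer, x, frame):
    """Map a frame of tangent vectors at x through `layer` using its Jacobian."""
    J = torch.autograd.functional.jacobian(layer, x)   # shape: out_dim x in_dim
    pushed = frame @ J.T                               # rows are pushed-forward vectors
    return pushed, torch.linalg.svdvals(pushed)        # singular values = local stretching

layer = nn.Sequential(nn.Linear(64, 128), nn.Tanh())   # one hidden block, untrained
x = torch.randn(64)
frame = torch.linalg.qr(torch.randn(64, 8)).Q.T        # 8 orthonormal tangent directions
pushed, stretch = push_frame(layer, x, frame)
print(stretch)   # how strongly the layer stretches the frame directions
```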