12 research outputs found

    Estimating the intrinsic dimension of datasets by a minimal neighborhood information

    Analyzing large volumes of high-dimensional data is an issue of fundamental importance in data science, molecular simulations and beyond. Several approaches work on the assumption that the important content of a dataset lies on a manifold whose intrinsic dimension (ID) is much lower than the raw number of coordinates. Such a manifold is generally twisted and curved; in addition, the points on it are non-uniformly distributed: two factors that make identifying the ID and exploiting it genuinely hard. Here we propose a new ID estimator that uses only the distances to the first and second nearest neighbor of each point in the sample. This extreme minimality allows us to reduce the effects of curvature and density variation, as well as the computational cost. The ID estimator is theoretically exact for uniformly distributed datasets and provides consistent measures in general. When used in combination with block analysis, it allows discriminating the relevant dimensions as a function of the block size, so the ID can be estimated even when the data lie on a manifold perturbed by high-dimensional noise, a situation often encountered in real-world data sets. We demonstrate the usefulness of the approach on molecular simulations and image analysis.
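    The estimator described in this abstract can be sketched in a few lines. This is a hedged reconstruction from the abstract alone — the ratio of second- to first-neighbor distances and its maximum-likelihood inversion — not the authors' reference implementation; the function name and brute-force distance computation are our own choices.

```python
import numpy as np

def two_nn_id(X):
    """Two-nearest-neighbor intrinsic-dimension estimate for samples X (n x features)."""
    # Brute-force pairwise Euclidean distances (fine for a small sketch).
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(-1))
    np.fill_diagonal(D, np.inf)          # a point is not its own neighbor
    r = np.sort(D, axis=1)
    mu = r[:, 1] / r[:, 0]               # ratio of 2nd to 1st neighbor distance
    # For locally uniform data, mu follows a Pareto law with exponent d;
    # the maximum-likelihood inversion of that law gives:
    return len(mu) / np.log(mu).sum()
```

    On points drawn uniformly from a d-dimensional region the estimate should come out close to d, which matches the abstract's claim that the estimator is exact for uniformly distributed datasets.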

    Non-parametric learning critical behavior in Ising partition functions: PCA entropy and intrinsic dimension

    We provide and critically analyze a framework to learn critical behavior in classical partition functions through the application of non-parametric methods to data sets of thermal configurations. We illustrate our approach on phase transitions in 2D and 3D Ising models. First, we extend previous studies on the intrinsic dimension of 2D partition function data sets by exploring the effect of volume in 3D Ising data. We find that, as opposed to 2D systems, for which this quantity has been successfully used in unsupervised characterizations of critical phenomena, in the 3D case its estimation is far more challenging. To circumvent this limitation, we then use the principal component analysis (PCA) entropy, a "Shannon entropy" of the normalized spectrum of the covariance matrix. We find a striking qualitative similarity to the thermodynamic entropy, which the PCA entropy approaches asymptotically. The latter allows us to extract, through a conventional finite-size scaling analysis with modest lattice sizes, the critical temperature with less than 1% error for both 2D and 3D models while being computationally efficient. The PCA entropy can readily be applied to characterize correlations and critical phenomena in a huge variety of many-body problems and suggests a (direct) link between easy-to-compute quantities and entropies.
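    The PCA entropy described here is simple to compute. Below is a minimal sketch, assuming the configurations are arranged as a samples-by-features matrix; the normalization of the covariance spectrum to a probability distribution follows the abstract's description, while the function name and the eigenvalue clipping are our own choices.

```python
import numpy as np

def pca_entropy(X):
    """Shannon entropy of the normalized covariance spectrum of X (samples x features)."""
    lam = np.linalg.eigvalsh(np.cov(X, rowvar=False))
    lam = np.clip(lam, 0.0, None)   # guard against tiny negative eigenvalues
    p = lam / lam.sum()             # normalized spectrum as a probability distribution
    p = p[p > 0]
    return -(p * np.log(p)).sum()
```

    The entropy is maximal, log(n_features), when all principal components carry equal variance, and drops toward zero as the data concentrate along a single direction.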

    Optic Flow Statistics and Intrinsic Dimensionality

    Different kinds of visual sub-structures can be distinguished by the intrinsic dimensionality of the local signals. The concept of intrinsic dimensionality has mostly been exercised using discrete formulations. A recent work (Kruger and Felsberg, 2003; Felsberg and Kruger, 2003) introduced a continuous definition and showed that the inherent structure of the intrinsic dimensionality has essentially the form of a triangle. The current study analyzes the distribution of signals according to the continuous interpretation of intrinsic dimensionality and its relation to orientation and optic flow features of image patches. Among other things, we give a quantitative interpretation of the distribution of signals according to their intrinsic dimensionality that reveals specific patterns associated with established sub-structures in computer vision. Furthermore, we link quantitative and qualitative properties of the distribution of optic-flow error estimates to these patterns.
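    A common way to make the i0D/i1D/i2D distinction operational is through the eigenvalues of the local gradient structure tensor. The sketch below maps an image patch to three coordinates summing to one, loosely mirroring the triangle picture; it illustrates the general idea only, not the continuous formulation of the cited work, and the noise scale `tau` is a hypothetical parameter of ours.

```python
import numpy as np

def id_coordinates(patch, tau=1e-2, eps=1e-12):
    """Map a gray-value patch to (i0D, i1D, i2D) coordinates via the
    eigenvalues of its averaged gradient structure tensor."""
    gy, gx = np.gradient(patch.astype(float))
    J = np.array([[(gx * gx).mean(), (gx * gy).mean()],
                  [(gx * gy).mean(), (gy * gy).mean()]])
    lam2, lam1 = np.linalg.eigvalsh(J)     # ascending: lam2 <= lam1
    s = lam1 + lam2 + eps
    t = s / (s + tau)                      # confidence that any structure exists
    c_i1 = t * (lam1 - lam2) / s           # one dominant orientation (edge, line)
    c_i2 = t * 2.0 * lam2 / s              # two comparable directions (corner, texture)
    c_i0 = 1.0 - c_i1 - c_i2               # homogeneous patch
    return c_i0, c_i1, c_i2
```

    A constant patch lands near the i0D corner, an oriented ramp near i1D, and an unstructured noise patch near i2D.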

    A Novel Representation for Two-dimensional Image Structures

    Get PDF
    This paper presents a novel approach to modeling two-dimensional (2D) image structures. To obtain more degrees of freedom, a 2D image signal is embedded into a certain geometric algebra. Coupling methods from differential geometry, tensor algebra, the monogenic signal and quadrature filters, we design a general model for 2D structures as the monogenic extension of a curvature tensor. Based on this model, a local representation of the intrinsically two-dimensional (i2D) structure is derived as the monogenic curvature signal, from which the independent features of local amplitude, phase and orientation are extracted simultaneously. In addition, a monogenic curvature scale-space can be built by applying a Poisson kernel to the monogenic curvature signal. Compared with related work, the remarkable advantage of our approach lies in the rotationally invariant phase evaluation of 2D structures in a multi-scale framework, which gives access to phase-based processing in many computer vision tasks.
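    The monogenic signal underlying this construction extends the 1D analytic signal to 2D via the Riesz transform. The sketch below computes the standard frequency-domain Riesz transform and the resulting local amplitude, phase and orientation; it illustrates the plain monogenic signal only, not the paper's monogenic curvature tensor, and assumes the input image is already band-passed.

```python
import numpy as np

def monogenic(f):
    """Monogenic signal of a 2D image: local amplitude, phase and orientation."""
    F = np.fft.fft2(f)
    u = np.fft.fftfreq(f.shape[1])[None, :]    # horizontal frequencies
    v = np.fft.fftfreq(f.shape[0])[:, None]    # vertical frequencies
    mag = np.hypot(u, v)
    mag[0, 0] = 1.0                            # avoid division by zero at DC
    r1 = np.fft.ifft2(-1j * u / mag * F).real  # Riesz component along x
    r2 = np.fft.ifft2(-1j * v / mag * F).real  # Riesz component along y
    amp = np.sqrt(f ** 2 + r1 ** 2 + r2 ** 2)  # local amplitude
    phase = np.arctan2(np.hypot(r1, r2), f)    # local phase in [0, pi]
    orient = np.arctan2(r2, r1)                # local orientation
    return amp, phase, orient
```

    For a pure cosine grating the Riesz transform reduces to the Hilbert transform along the grating direction, so the recovered local amplitude is constant — the split into amplitude, phase and orientation is exactly the independence property the abstract refers to.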

    Signal-Symbol Feedback Process in a


    Intrinsic dimensionality in vision: Nonlinear filter design and applications

    Biological vision and computer vision can no longer be treated independently. The digital revolution and the emergence of ever more sophisticated technical applications have caused a symbiosis between the two communities. Competitive technical devices challenging human performance rely increasingly on algorithms motivated by the human visual system. On the other hand, computational methods can be used to gain a richer understanding of neural behavior, e.g. the behavior of populations of multiple processing units. The relations between computational approaches and biological findings range from low-level vision to cortical areas responsible for higher cognitive abilities.

    In early stages of the visual cortex, cells have been recorded that cannot be explained by the standard approach of orientation- and frequency-selective linear filters. These cells do not respond to straight lines or simple gratings, but fire whenever a more complicated stimulus, like a corner or an end-stopped line, is presented within the receptive field. Using the concept of intrinsic dimensionality, these cells can be classified as intrinsic-two-dimensional systems. The intrinsic dimensionality determines the number of degrees of freedom in the domain that is required to completely determine a signal. A constant image has dimension zero; straight lines and trigonometric functions in one direction have dimension one; and the remaining signals, which require the full number of degrees of freedom, have dimension two. In these terms, the reported cells respond only to intrinsically two-dimensional signals. Motivated by the classical approach, which can be realized by orientation- and frequency-selective Gabor filter functions, a generalized Gabor framework is developed in the context of second-order Volterra systems. The generalized Gabor approach is then used to design intrinsic-two-dimensional systems that have the same selectivity properties as the reported cells in early visual cortex.

    Numerical cognition is commonly assumed to be a higher cognitive ability of humans. Estimating the number of things in the environment requires a high degree of abstraction. Several studies have shown that humans and other species have access to this abstract information, but it is still unclear how it can be extracted by neural hardware. Addressing this issue requires thinking about the immense invariance property of number: a large number of operations can be applied to objects without changing their number. In this work, the problem is considered from a topological perspective. Well-known relations between differential geometry and topology are used to develop a computational model. Surprisingly, the resulting operators providing the features integrated in the system are intrinsic-two-dimensional operators. This model is used to conduct standard number-estimation experiments, and the results are compared to reported human behavior.

    The last topic of this work is active object recognition. The ability to move the information-gathering device, as humans move their eyes, provides the opportunity to choose the next action. Studies of human saccade behavior suggest that this is not done in a random manner. In order to decrease the time an active object recognition system needs to reach a certain level of performance, several action-selection strategies are investigated. The strategies considered in this work are based on information-theoretical and probabilistic concepts, and are finally compared to a strategy based on an intrinsic-two-dimensional operator. All three topics are investigated with respect to their relation to the concept of intrinsic dimensionality from a mathematical point of view.
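    An information-theoretical action-selection strategy of the kind mentioned in this abstract can be sketched as expected-information-gain maximization over a class belief. This is a generic illustration under our own assumptions — a discrete observation model `likelihood` indexed by action, class and observation — not the specific strategies evaluated in the thesis.

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a discrete distribution."""
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def best_action(belief, likelihood):
    """Return the index of the action with the largest expected information gain.

    belief:     P(class), shape (C,)
    likelihood: P(observation | class, action), shape (A, C, O)
    """
    gains = []
    for L in likelihood:                        # L has shape (C, O)
        p_obs = belief @ L                      # predictive P(o) under this action
        post = belief[:, None] * L / p_obs      # P(class | o), one column per o
        exp_H = sum(p_obs[o] * entropy(post[:, o]) for o in range(L.shape[1]))
        gains.append(entropy(belief) - exp_H)   # expected entropy reduction
    return int(np.argmax(gains))
```

    An action whose observations are independent of the class yields zero expected gain, so the strategy always prefers a view that discriminates between the current class hypotheses.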