4 research outputs found

    Détection des courts-circuits pour la réduction de dimension basée sur les graphes de voisinages

    National audience. Résumé (translated from French): Processing high-dimensional data generally requires a dimension reduction step so as to work in the intrinsic dimension of the data. When the data are noisy, nonlinear dimension reduction methods can be misled by the appearance of shortcuts in the neighborhood graph. The proposed method aims to remove these shortcuts using a sparse graph that approximates the structure of the data and whose construction is based on the estimated density of the data. Abstract: Processing high-dimensional datasets often makes use of a dimension reduction step, because high-dimensional data generally lie on a low-dimensional underlying structure. When the data are noisy, dimension reduction may fail because of shortcuts appearing in the graph that captures the underlying structure. Our paper presents a method to suppress shortcuts in the underlying structure graph, based on a sparse graph that approximates the data structure and that is built using an estimate of the data probability density.
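To make the failure mode in this abstract concrete, the sketch below shows how a single noisy point can create a shortcut in a neighborhood graph. It illustrates the problem only: the paper's density-based sparse graph is not reproduced here. The U-shaped toy data, the fixed-radius neighborhood rule (used instead of k nearest neighbors for simplicity), and all function names are assumptions made for illustration.

```python
import heapq
import math


def radius_graph(points, radius):
    """Connect every pair of points no farther apart than `radius`."""
    graph = {i: [] for i in range(len(points))}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            if d <= radius:
                graph[i].append((j, d))
                graph[j].append((i, d))
    return graph


def graph_distance(graph, source, target):
    """Shortest-path length between two nodes (Dijkstra's algorithm)."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist
        if dist > best.get(node, math.inf):
            continue  # stale queue entry
        for neighbor, weight in graph[node]:
            candidate = dist + weight
            if candidate < best.get(neighbor, math.inf):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return math.inf


def u_shaped_points():
    """Toy data: two parallel arms (y = 0 and y = 1) joined only at x = 10."""
    xs = [0.5 * k for k in range(21)]  # 0.0, 0.5, ..., 10.0
    lower = [(x, 0.0) for x in xs]
    upper = [(x, 1.0) for x in xs]
    return lower + upper + [(10.0, 0.5)]


clean = u_shaped_points()
noisy = clean + [(0.0, 0.5)]  # one off-structure point between the open ends

# Index 0 is (0, 0) on the lower arm; index 21 is (0, 1) on the upper arm.
clean_dist = graph_distance(radius_graph(clean, 0.6), 0, 21)
noisy_dist = graph_distance(radius_graph(noisy, 0.6), 0, 21)
print(clean_dist, noisy_dist)  # prints: 21.0 1.0
```

Without the noisy point, the graph distance between the two open ends follows the whole U (length 21). With it, the path collapses to length 1, which is the kind of distortion that can mislead graph-based nonlinear dimension reduction.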

    Intrinsic Dimension Estimation: Relevant Techniques and a Benchmark Framework

    When dealing with datasets of high-dimensional points, it is usually advantageous to discover some structure in the data. A fundamental piece of information needed for this purpose is the minimum number of parameters required to describe the data while minimizing information loss. This number, usually called the intrinsic dimension, can be interpreted as the dimension of the manifold from which the input data are assumed to be drawn. Because it is useful in many theoretical and practical problems, the concept of intrinsic dimension has gained considerable attention in the scientific community in recent decades, motivating the large number of intrinsic dimensionality estimators proposed in the literature. However, the problem is still open, since most techniques cannot efficiently deal with datasets drawn from manifolds of high intrinsic dimension that are nonlinearly embedded in higher-dimensional spaces. This paper surveys some of the most interesting, widely used, and advanced state-of-the-art methodologies. Unfortunately, since no benchmark database exists in this research field, an objective comparison among different techniques is not possible. Consequently, we suggest a benchmark framework and apply it to comparatively evaluate relevant state-of-the-art estimators.
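As a minimal illustration of what an intrinsic dimension estimator computes, the sketch below uses a two-nearest-neighbor ratio estimator (often called TwoNN). It is one simple example of such an estimator, not the benchmark framework or any specific method evaluated in the paper. The toy dataset (a 2-dimensional plane linearly embedded in 5 dimensions) and the function names are assumptions chosen for illustration.

```python
import math
import random


def two_nn_dimension(points):
    """Estimate intrinsic dimension from 2nd-to-1st neighbor distance ratios."""
    log_ratios = []
    for i, p in enumerate(points):
        # Brute-force distances from p to every other point (O(n^2) overall).
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        r1, r2 = dists[0], dists[1]
        if r1 > 0.0:  # skip exact duplicates, where the ratio is undefined
            log_ratios.append(math.log(r2 / r1))
    # Maximum-likelihood estimate assuming the ratios follow a Pareto law
    # whose exponent equals the intrinsic dimension.
    return len(log_ratios) / sum(log_ratios)


def sample_plane_in_5d(n, seed=0):
    """Hypothetical test data: a 2-D square mapped linearly into 5-D space."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        u, v = rng.random(), rng.random()
        points.append((u, v, u + v, u - v, 0.5 * u))
    return points


if __name__ == "__main__":
    data = sample_plane_in_5d(400)
    print(f"estimated intrinsic dimension: {two_nn_dimension(data):.2f}")
```

Although each point has five coordinates, only two parameters (u and v) generate the data, so the estimate should land near 2. Brute-force neighbor search keeps the sketch dependency-free but scales quadratically with the number of points.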