Shortcut Detection for Neighborhood-Graph-Based Dimension Reduction
National audience. Abstract – Processing high-dimensional data generally requires a dimension reduction step so that the data can be handled in their intrinsic dimension; indeed, high-dimensional data generally lie on an underlying low-dimensional structure. When the data are noisy, nonlinear dimension reduction methods can be misled by shortcuts that appear in the neighborhood graph capturing this structure. The proposed method removes these shortcuts using a sparse graph that approximates the data structure and is built from an estimate of the data probability density.
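The abstract does not describe how the density-based sparse graph is built, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes an unnormalized Gaussian kernel density estimate and a hypothetical rule (`flag_shortcuts`, with illustrative `bandwidth` and `ratio` parameters) that marks a k-nearest-neighbor edge as a suspected shortcut when the estimated density at the edge's midpoint falls far below the density at its endpoints, i.e. when the edge crosses an empty region between parts of the data.

```python
import numpy as np


def kde_density(samples, queries, bandwidth):
    """Unnormalized Gaussian kernel density of `samples`, evaluated at `queries`."""
    # Squared distance from every query point to every sample point.
    sq_dist = np.sum((queries[:, None, :] - samples[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2)).mean(axis=1)


def knn_edges(points, k):
    """Undirected k-nearest-neighbor edges as a set of (i, j) pairs with i < j."""
    points = np.asarray(points, dtype=float)
    sq_dist = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(sq_dist, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(sq_dist, axis=1)[:, :k]
    return {
        (min(i, j), max(i, j))
        for i in range(len(points))
        for j in neighbors[i].tolist()
    }


def flag_shortcuts(points, k=4, bandwidth=0.2, ratio=0.1):
    """Hypothetical rule (not from the paper): flag a k-NN edge as a suspected
    shortcut when the density at its midpoint is below `ratio` times the
    smaller of its two endpoint densities."""
    points = np.asarray(points, dtype=float)
    edges = sorted(knn_edges(points, k))
    # Endpoint densities include each point's own kernel contribution.
    endpoint_density = kde_density(points, points, bandwidth)
    midpoints = np.array([(points[i] + points[j]) / 2.0 for i, j in edges])
    midpoint_density = kde_density(points, midpoints, bandwidth)
    return [
        (i, j)
        for (i, j), d_mid in zip(edges, midpoint_density)
        if d_mid < ratio * min(endpoint_density[i], endpoint_density[j])
    ]
```

For example, with two small, well-separated clusters of three points each and `k=3`, every point has only two neighbors in its own cluster, so the k-NN graph is forced to jump the gap; those cross-cluster edges have near-zero midpoint density and are the ones flagged. The bandwidth and ratio values are illustrative and would need tuning on real, noisy data.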
Graph Construction for Manifold Discovery
Manifold learning is a class of machine learning methods that exploits the observation that high-dimensional data tend to lie on a smooth lower-dimensional manifold. Manifold discovery, the essential first component of manifold learning methods, infers the manifold structure from the available data. This task is typically posed as a graph construction problem: selecting the set of vertices and edges that most closely approximates the true underlying manifold. The quality of this learned graph is critical to the overall accuracy of the manifold learning method, so accurate, efficient, and reliable algorithms for constructing manifold approximation graphs are essential. To aid in this investigation of graph construction methods, we propose new methods for evaluating graph quality. These quality measures act as a proxy for ground-truth manifold approximation error and remain applicable even when prior information about the dataset is limited. Next, we develop an incremental update scheme for some of these measures, demonstrating their usefulness for efficient parameter tuning. We then propose two novel graph construction methods, the Manifold Spanning Graph and the Mutual Neighbors Graph algorithms. Each method leverages assumptions about the structure of both the input data and the subsequent manifold learning task. The algorithms are validated experimentally against state-of-the-art graph construction techniques on a multidisciplinary set of application domains, including image classification, directional audio prediction, and spectroscopic analysis. The final contribution of the thesis is a method for aligning sequential datasets while still respecting each set's internal manifold structure. Using high-quality manifold approximation graphs enables accurate alignments with few ground-truth correspondences.
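The abstract names a Mutual Neighbors Graph algorithm without describing it. As an illustration only, here is a sketch of the standard mutual k-nearest-neighbor graph, a well-known construction that keeps an edge only when each endpoint is among the other's k nearest neighbors. Whether the thesis's algorithm matches or extends this rule is not stated in the abstract, and the function name `mutual_knn_graph` is hypothetical.

```python
import numpy as np


def mutual_knn_graph(points, k):
    """Standard mutual k-nearest-neighbor graph as a set of (i, j) pairs, i < j.

    An edge is kept only when j is among the k nearest neighbors of i AND
    i is among the k nearest neighbors of j.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    sq_dist = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(sq_dist, np.inf)  # exclude self-loops
    neighbors = np.argsort(sq_dist, axis=1)[:, :k]
    # Directed relation: is_nn[i, j] is True when j is a k-NN of i.
    is_nn = np.zeros((n, n), dtype=bool)
    is_nn[np.arange(n)[:, None], neighbors] = True
    # Keep only pairs where the relation holds in both directions.
    mutual = is_nn & is_nn.T
    rows, cols = np.nonzero(np.triu(mutual))
    return {(int(i), int(j)) for i, j in zip(rows, cols)}
```

Compared with the ordinary symmetrized k-NN graph, the mutual rule drops one-sided edges, which tends to remove links reaching across sparse regions; the trade-off is that the resulting graph can become disconnected, and every vertex has degree at most k.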
Intrinsic Dimension Estimation: Relevant Techniques and a Benchmark Framework
When dealing with datasets comprising high-dimensional points, it is usually advantageous to discover some data structure. A fundamental piece of information needed for this purpose is the minimum number of parameters required to describe the data while minimizing information loss. This number, usually called the intrinsic dimension, can be interpreted as the dimension of the manifold from which the input data are supposed to be drawn. Because of its usefulness in many theoretical and practical problems, the concept of intrinsic dimension has gained considerable attention in the scientific community over the last decades, motivating the large number of intrinsic dimension estimators proposed in the literature. However, the problem is still open, since most techniques cannot efficiently handle datasets drawn from manifolds of high intrinsic dimension that are nonlinearly embedded in higher-dimensional spaces. This paper surveys some of the most interesting, widely used, and advanced state-of-the-art methodologies. Unfortunately, since no benchmark database exists in this research field, an objective comparison among different techniques is not possible. Consequently, we suggest a benchmark framework and apply it to comparatively evaluate relevant state-of-the-art estimators.
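As a concrete example of the kind of estimator such a survey covers, the sketch below implements the TwoNN maximum-likelihood estimator of Facco et al. (2017), chosen here only for its brevity; the abstract does not say whether this estimator is among those evaluated. For each point, the ratio μ = r₂/r₁ of the distances to its second and first nearest neighbors is, under the model's local-uniformity assumption, Pareto-distributed with exponent d, so log μ is exponential with rate d and the maximum-likelihood estimate is d̂ = N / Σ log μᵢ. The function name `two_nn_dimension` is hypothetical.

```python
import numpy as np


def two_nn_dimension(points):
    """TwoNN maximum-likelihood estimate of intrinsic dimension.

    Assumes no duplicate points (r1 > 0). For each point, mu = r2 / r1;
    log(mu) is modeled as exponential with rate d, so d_hat = N / sum(log mu).
    """
    points = np.asarray(points, dtype=float)
    sq_dist = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(sq_dist, np.inf)  # ignore each point's zero self-distance
    nearest_two = np.sort(sq_dist, axis=1)[:, :2]
    r1 = np.sqrt(nearest_two[:, 0])
    r2 = np.sqrt(nearest_two[:, 1])
    log_mu = np.log(r2 / r1)
    return len(points) / np.sum(log_mu)
```

On a square isometrically embedded in five dimensions, the estimate should land near 2, and on a line segment near 1. Because this sketch builds the full pairwise-distance matrix, its memory use is quadratic in N; larger datasets would call for a nearest-neighbor index instead.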