HyperNP: Interactive Visual Exploration of Multidimensional Projection Hyperparameters
Projection algorithms such as t-SNE or UMAP are useful for the visualization
of high dimensional data, but depend on hyperparameters which must be tuned
carefully. Unfortunately, iteratively recomputing projections to find the
optimal hyperparameter value is computationally intensive and unintuitive due
to the stochastic nature of these methods. In this paper we propose HyperNP, a
scalable method that allows for real-time interactive hyperparameter
exploration of projection methods by training neural network approximations.
HyperNP can be trained on a fraction of the total data instances and
hyperparameter configurations and can compute projections for new data and
hyperparameters at interactive speeds. HyperNP is compact in size and fast to
compute, thus allowing it to be embedded in lightweight visualization systems
such as web browsers. We evaluate HyperNP across three datasets in terms of
accuracy and speed. The results suggest that HyperNP is accurate, scalable,
interactive, and appropriate for use in real-world settings.
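The idea sketched in the abstract (train an approximator on a few hyperparameter configurations, then project new data and unseen hyperparameter values at interactive speed) can be illustrated with a toy example. This is a minimal sketch, not the paper's method: a linear least-squares regressor stands in for the neural network, and the "precomputed layouts" are random placeholders for real t-SNE runs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for HyperNP: learn a map (data point, hyperparameter) -> 2D layout.
n, d = 200, 5
X = rng.normal(size=(n, d))
perplexities = np.array([5.0, 30.0, 50.0])   # hyperparameter values seen in training

# Pretend these are precomputed t-SNE layouts, one per hyperparameter value.
# (Random here; in practice they come from running the projection offline.)
layouts = {p: rng.normal(size=(n, 2)) for p in perplexities}

# Training set: features are [x, perplexity], targets are 2D coordinates.
feats = np.vstack([np.hstack([X, np.full((n, 1), p)]) for p in perplexities])
targets = np.vstack([layouts[p] for p in perplexities])

# Fit the approximator (with a bias column) by least squares.
A = np.hstack([feats, np.ones((feats.shape[0], 1))])
W, *_ = np.linalg.lstsq(A, targets, rcond=None)

def project(x_new, perplexity):
    """Interactive-speed projection for new data and hyperparameter values."""
    a = np.hstack([x_new, [perplexity, 1.0]])
    return a @ W

emb = project(X[0], 20.0)   # an unseen perplexity value
print(emb.shape)            # (2,)
```

Once `W` is fitted, evaluating `project` is a single matrix-vector product, which is what makes interactive hyperparameter exploration feasible.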
Local Distance Preserving Auto-encoders using Continuous k-Nearest Neighbours Graphs
Auto-encoder models that preserve similarities in the data are a popular tool
in representation learning. In this paper we introduce several auto-encoder
models that preserve local distances when mapping from the data space to the
latent space. We use a local distance preserving loss based on the
continuous k-nearest neighbours graph, which is known to capture topological
features at all scales simultaneously. To improve training performance, we
formulate learning as a constraint optimisation problem with local distance
preservation as the main objective and reconstruction accuracy as a constraint.
We generalise this approach to hierarchical variational auto-encoders thus
learning generative models with geometrically consistent latent and data
spaces. Our method provides state-of-the-art performance across several
standard datasets and evaluation metrics.
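The core quantity here, a loss that compares data-space and latent-space distances restricted to neighbouring pairs, can be sketched as follows. This is an illustration only: a plain kNN graph replaces the continuous kNN graph used in the paper, and `encode` is a hypothetical stand-in for the auto-encoder's encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d, k = 100, 10, 5
X = rng.normal(size=(n, d))

def encode(X):
    # Hypothetical encoder: a fixed random linear map to a 2D latent space.
    P = rng.normal(size=(X.shape[1], 2))
    return X @ P

def knn_pairs(X, k):
    # All pairwise distances, then the k nearest neighbours of each point.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    nbrs = np.argsort(D, axis=1)[:, :k]
    return [(i, j) for i in range(len(X)) for j in nbrs[i]]

def local_distance_loss(X, Z, pairs):
    # Penalise discrepancies between data-space and latent-space distances,
    # but only over pairs that are neighbours in the kNN graph.
    dx = np.array([np.linalg.norm(X[i] - X[j]) for i, j in pairs])
    dz = np.array([np.linalg.norm(Z[i] - Z[j]) for i, j in pairs])
    return np.mean((dx - dz) ** 2)

Z = encode(X)
loss = local_distance_loss(X, Z, knn_pairs(X, k))
print(loss >= 0.0)  # True
```

In the constrained formulation described above, this term would be the main objective, with reconstruction accuracy enforced as a constraint rather than mixed into a single weighted sum.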
Graph-based Methods for Visualization and Clustering
The amount of data that we produce and consume is larger than at any point in history, and it keeps growing exponentially. All this information, gathered in overwhelming volumes, often comes with two problematic characteristics: it is complex and lacks semantic context. A common step to address those issues is to embed raw data in lower dimensions, by finding a mapping which preserves the similarity between data points from their original space to a new one. Measuring similarity between large sets of high-dimensional objects is, however, problematic for two main reasons: first, high-dimensional points are subject to the curse of dimensionality, and second, the number of pairwise distances grows quadratically with the number of data points. Both problems can be addressed by using nearest neighbours graphs to capture the structure of the data. As a matter of fact, most dimensionality reduction methods use similarity matrices that can be interpreted as graph adjacency matrices. Yet, despite recent progress, dimensionality reduction remains very challenging when applied to very large datasets. Indeed, although recent methods specifically address the problem of scalability, processing datasets of millions of elements remains a very lengthy process. In this thesis, we propose new contributions which address the problem of scalability using the framework of Graph Signal Processing (GSP), which extends traditional signal processing to graphs. We do so motivated by the premise that graphs are well suited to represent the structure of the data. In the first part of this thesis, we look at quantitative measures for the evaluation of dimensionality reduction methods. Using tools from graph theory and Graph Signal Processing, we show that specific characteristics related to quality can be assessed by taking measures on the graph, which indirectly validates the hypothesis relating graphs to structure.
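The observation that similarity matrices can be read as graph adjacency matrices is concrete enough to illustrate. The sketch below (toy data, not from the thesis) builds a symmetric kNN adjacency matrix and its combinatorial Laplacian L = D - A, the basic object on which Graph Signal Processing measures operate.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d, k = 50, 8, 4
X = rng.normal(size=(n, d))

# Pairwise distances; exclude self-distances before picking neighbours.
Dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
np.fill_diagonal(Dist, np.inf)
nbrs = np.argsort(Dist, axis=1)[:, :k]

# kNN adjacency matrix, symmetrised: i ~ j if either is a neighbour of the other.
A = np.zeros((n, n))
for i in range(n):
    A[i, nbrs[i]] = 1.0
A = np.maximum(A, A.T)

# Combinatorial graph Laplacian L = D - A.
L = np.diag(A.sum(axis=1)) - A

# L is positive semi-definite and its smallest eigenvalue is 0
# (the constant vector is always in its null space).
eigvals = np.linalg.eigvalsh(L)
print(abs(eigvals[0]) < 1e-8)  # True
```

Spectral embeddings use the eigenvectors of exactly this matrix, which is why measures taken on the graph can say something about the quality of a dimensionality reduction.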
The second contribution is a new method for fast eigenspace approximation of the graph Laplacian. Using principles of GSP and random matrices, we show that an approximated eigensubspace can be recovered very efficiently and used for fast spectral clustering or visualization. Next, we propose a compressive scheme to accelerate any dimensionality reduction technique. The idea is based on compressive sampling and transductive learning on graphs: after computing the embedding for a small subset of data points, we propagate the information everywhere using transductive inference. The key components of this technique are a good sampling strategy to select the subset and the application of transductive learning on graphs. Finally, we address the problem of over-discriminative feature spaces by proposing a hierarchical clustering structure combined with multi-resolution graphs. Using efficient coarsening and refinement procedures on this structure, we show that dimensionality reduction algorithms can be run on intermediate levels and up-sampled to all points, leading to a very fast dimensionality reduction method. For all contributions, we provide extensive experiments on both synthetic and natural datasets, including large-scale problems. This allows us to show the pertinence of our models and the validity of our proposed algorithms. Following reproducibility principles, we provide everything needed to repeat the examples and the experiments presented in this work.
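The compressive idea, embed a small subset and propagate to the rest via the graph, can be sketched with one standard form of transductive inference: harmonic interpolation, where the unknown values solve a Laplacian linear system. The graph, sample, and 1D "embedding" below are toy choices; the thesis's sampling strategy and propagation scheme are more refined.

```python
import numpy as np

# Path graph on 6 nodes: 0 - 1 - 2 - 3 - 4 - 5.
n = 6
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

sampled = np.array([0, 5])                  # subset whose embedding is computed directly
unknown = np.setdiff1d(np.arange(n), sampled)
z_s = np.array([0.0, 5.0])                  # toy 1D "embedding" of the sampled nodes

# Harmonic extension: solve L_uu z_u = -L_us z_s for the remaining nodes,
# i.e. each unknown value becomes the average of its neighbours.
L_uu = L[np.ix_(unknown, unknown)]
L_us = L[np.ix_(unknown, sampled)]
z_u = np.linalg.solve(L_uu, -L_us @ z_s)

z = np.empty(n)
z[sampled], z[unknown] = z_s, z_u
print(z)  # linear interpolation along the path: [0. 1. 2. 3. 4. 5.]
```

On a path graph the harmonic extension is exactly linear interpolation between the sampled endpoints, which makes the propagation step easy to verify by eye.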