Supervised dimension reduction mappings

Author: Biehl M.
Bunte K.
Hammer B.
Publication venue: d-side publishing
Publication date: 01/01/2011

Dissertations of the University of Groningen

Dimensionality Reduction Mappings

Author: Biehl Michael
Bunte Kerstin
Hammer Barbara
Publication venue: IEEE Computational Intelligence Society
Publication date: 01/01/2011

A wealth of powerful dimensionality reduction methods has been established which can be used for data visualization and preprocessing. These are accompanied by formal evaluation schemes, which allow a quantitative evaluation along general principles and which even lead to further visualization schemes based on these objectives. Most methods, however, provide a mapping only for a finite set of points given a priori, and require additional steps for out-of-sample extensions. We propose a general view on dimensionality reduction based on the concept of cost functions and, based on this general principle, extend dimensionality reduction to explicit mappings of the data manifold. This offers simple out-of-sample extensions. Further, it opens a way towards a theory of data visualization taking the perspective of its generalization ability to new data points. We demonstrate the approach based on a simple global linear mapping as well as prototype-based local linear mappings.
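The idea of training an explicit mapping on a cost function can be illustrated with a minimal sketch. It assumes a classical stress cost (preserving pairwise distances) and a single global linear map trained by plain gradient descent. The abstract does not say which cost the authors use, so the cost, the function names and the hyperparameters below are illustrative assumptions only.

```python
import numpy as np

def pairwise_diffs(X):
    """Difference vectors x_i - x_j for all pairs i < j."""
    i, j = np.triu_indices(len(X), k=1)
    return X[i] - X[j]

def stress(A, diffs, D):
    """Mean squared gap between original and projected pairwise distances."""
    d = np.linalg.norm(diffs @ A, axis=1)
    return np.mean((D - d) ** 2)

def fit_linear_map(X, dim=2, lr=0.01, steps=500, seed=0):
    """Gradient descent on the stress cost for a linear map A (assumed setup)."""
    rng = np.random.default_rng(seed)
    diffs = pairwise_diffs(X)
    D = np.linalg.norm(diffs, axis=1)           # distances in the data space
    A = rng.normal(scale=0.1, size=(X.shape[1], dim))
    for _ in range(steps):
        P = diffs @ A                           # projected difference vectors
        d = np.linalg.norm(P, axis=1) + 1e-12   # distances in the projection
        coeff = 2.0 * (d - D) / d
        grad = diffs.T @ (P * coeff[:, None]) / len(D)
        A -= lr * grad
    return A
```

Because the result is an explicit matrix `A`, a point that was not in the training set is embedded with `x_new @ A`, with no additional optimisation step.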

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Publications at Bielefeld University

University of Groningen Digital Archive

Dissertations of the University of Groningen

Representation Learning for Clustering: A Statistical Framework

Author: Ashtiani Hassan
Ben-David Shai
Publication date: 19/06/2015

We address the problem of communicating domain knowledge from a user to the designer of a clustering algorithm. We propose a protocol in which the user provides a clustering of a relatively small random sample of a data set. The algorithm designer then uses that sample to come up with a data representation under which k-means clustering results in a clustering (of the full data set) that is aligned with the user's clustering. We provide a formal statistical model for analyzing the sample complexity of learning a clustering representation with this paradigm. We then introduce a notion of capacity of a class of possible representations, in the spirit of the VC-dimension, showing that classes of representations that have finite such dimension can be successfully learned with sample size error bounds, and end our discussion with an analysis of that dimension for classes of representations induced by linear embeddings. Comment: To be published in Proceedings of UAI 201
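The protocol described above can be sketched as follows. This is an illustration, not the authors' procedure: it assumes a small finite list of candidate linear embeddings, a basic Lloyd-style k-means, and the Rand index as the measure of agreement with the user's clustering. All function names are hypothetical.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Basic Lloyd's algorithm; returns a cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def rand_index(a, b):
    """Fraction of point pairs on which two clusterings agree."""
    a, b = np.asarray(a), np.asarray(b)
    iu = np.triu_indices(len(a), k=1)
    same_a = (a[:, None] == a[None, :])[iu]
    same_b = (b[:, None] == b[None, :])[iu]
    return float(np.mean(same_a == same_b))

def select_embedding(sample_X, user_labels, candidates, k):
    """Pick the candidate embedding whose k-means result best matches the user."""
    scores = [rand_index(kmeans(sample_X @ W, k), user_labels)
              for W in candidates]
    best = int(np.argmax(scores))
    return best, scores[best]
```

The selected matrix would then be applied to the full data set before running k-means on it. How large the user-labelled sample must be for this choice to generalise is the sample-complexity question the abstract addresses.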

arXiv.org e-Print Archive

CiteSeerX

Decomposing feature-level variation with Covariate Gaussian Process Latent Variable Models

Author: Campbell Kieran R.
Märtens Kaspar
Yau Christopher
Publication date: 01/01/2019

The interpretation of complex high-dimensional data typically requires the use of dimensionality reduction techniques to extract explanatory low-dimensional representations. However, in many real-world problems these representations may not be sufficient to aid interpretation on their own, and it would be desirable to interpret the model in terms of the original features themselves. Our goal is to characterise how feature-level variation depends on latent low-dimensional representations, external covariates, and non-linear interactions between the two. In this paper, we propose to achieve this through a structured kernel decomposition in a hybrid Gaussian Process model which we call the Covariate Gaussian Process Latent Variable Model (c-GPLVM). We demonstrate the utility of our model on simulated examples and applications in disease progression modelling from high-dimensional gene expression data in the presence of additional phenotypes. In each setting we show how the c-GPLVM can extract low-dimensional structures from high-dimensional data sets whilst allowing a breakdown of feature-level variability that is not present in other commonly used dimensionality reduction approaches.
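One plausible reading of a structured kernel decomposition is an additive kernel with a latent term, a covariate term, and a product term for their interaction. The sketch below builds such a kernel from RBF components. The abstract does not specify the kernels, so the RBF choice, the lengthscale and the function names are assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-0.5 * sq / lengthscale ** 2)

def structured_kernel(Z, C):
    """Additive kernel over latent positions Z and covariates C (assumed form)."""
    K_z = rbf_kernel(Z, Z)      # variation explained by the latent variable
    K_c = rbf_kernel(C, C)      # variation explained by the covariate
    K_int = K_z * K_c           # elementwise product models their interaction
    return {"latent": K_z, "covariate": K_c, "interaction": K_int,
            "total": K_z + K_c + K_int}
```

Keeping the three components separate is what would allow the variance of each feature to be attributed to the latent variable, the covariate, or their interaction.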

arXiv.org e-Print Archive

The University of Manchester - Institutional Repository

Multi-view Learning as a Nonparametric Nonlinear Inter-Battery Factor Analysis

Author: Damianou Andreas
Ek Carl Henrik
Lawrence Neil D.
Publication date: 17/04/2016

Factor analysis aims to determine latent factors, or traits, which summarize a given data set. Inter-battery factor analysis extends this notion to multiple views of the data. In this paper we show how a nonlinear, nonparametric version of these models can be recovered through the Gaussian process latent variable model. This gives us a flexible formalism for multi-view learning where the latent variables can be used both for exploratory purposes and for learning representations that enable efficient inference for ambiguous estimation tasks. Learning is performed in a Bayesian manner through the formulation of a variational compression scheme which gives a rigorous lower bound on the log likelihood. Our Bayesian framework provides strong regularization during training, allowing the structure of the latent space to be determined efficiently and automatically. We demonstrate this by producing the first (to our knowledge) published results of learning from dozens of views, even when data is scarce. We further show experimental results on several different types of multi-view data sets and for different kinds of tasks, including exploratory data analysis, generation, ambiguity modelling through latent priors and classification. Comment: 49 pages including appendi

arXiv.org e-Print Archive

Explore Bristol Research