66 research outputs found

Learning a kernel matrix for nonlinear dimensionality reduction

Authors: Saul, Lawrence K.; Sha, Fei; Weinberger, Kilian Q.
Publication venue: ScholarlyCommons
Publication date: 01/01/2004

We investigate how to learn a kernel matrix for high dimensional data that lies on or near a low dimensional manifold. Noting that the kernel matrix implicitly maps the data into a nonlinear feature space, we show how to discover a mapping that unfolds the underlying manifold from which the data was sampled. The kernel matrix is constructed by maximizing the variance in feature space subject to local constraints that preserve the angles and distances between nearest neighbors. The main optimization involves an instance of semidefinite programming, a fundamentally different computation from previous algorithms for manifold learning, such as Isomap and locally linear embedding. The optimized kernels perform better than polynomial and Gaussian kernels for problems in manifold learning, but worse for problems in large margin classification. We explain these results in terms of the geometric properties of different kernels and comment on various interpretations of other manifold learning algorithms as kernel methods.
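
The abstract leans on two facts about any kernel matrix: squared distances in feature space can be read directly from its entries (K_ii + K_jj - 2 K_ij), and the variance being maximized is its trace once the matrix is centered. The NumPy sketch below illustrates only those two facts and how an embedding is read off from the top eigenvectors. It does not solve the semidefinite program from the paper; a centered linear kernel stands in for a learned one, and the function names are hypothetical.

```python
import numpy as np

def center_kernel(K):
    """Center a kernel matrix so the implied feature vectors have zero mean."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def feature_space_sq_dists(K):
    """Squared feature-space distances: K_ii + K_jj - 2 * K_ij."""
    d = np.diag(K)
    return d[:, None] + d[None, :] - 2.0 * K

def embed_from_kernel(K, dim):
    """Coordinates from the top `dim` eigenvectors, scaled by sqrt(eigenvalue)."""
    vals, vecs = np.linalg.eigh(K)          # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:dim]    # indices of the largest `dim`
    top_vals = np.clip(vals[order], 0.0, None)
    return vecs[:, order] * np.sqrt(top_vals)

# Assumption: a centered linear kernel is used as a stand-in for a learned kernel.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
K = center_kernel(X @ X.T)

D2 = feature_space_sq_dists(K)   # distances a local constraint would pin down
variance = np.trace(K)           # the quantity the objective would maximize
Y = embed_from_kernel(K, dim=3)  # low dimensional coordinates implied by K
```

In a learned-kernel setting, the entries of `D2` for nearest-neighbor pairs would be constrained to match the input-space distances while `variance` is maximized over positive semidefinite matrices.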

CiteSeerX

ScholarlyCommons@Penn

Somoclu: An Efficient Parallel Library for Self-Organizing Maps

Authors: Gao, Shi Chao; Lim, Ik Soo; Wittek, Peter; Zhao, Li
Publication venue: Foundation for Open Access Statistics
Publication date: 01/01/2017

Somoclu is a massively parallel tool, written in C++, for training self-organizing maps on large data sets. It builds on OpenMP for multicore execution, and on MPI for distributing the workload across the nodes in a cluster. It can also accelerate training with CUDA if graphics processing units are available. A sparse kernel is included, which is useful for high-dimensional but sparse data, such as the vector spaces common in text mining workflows. Python, R and MATLAB interfaces facilitate interactive use. Apart from fast execution, memory use is highly optimized, enabling the training of large emergent maps even on a single computer.

Comment: 26 pages, 9 figures. The code is available at https://peterwittek.github.io/somoclu
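
For readers unfamiliar with self-organizing maps, the following is a minimal single-process NumPy sketch of map training. It is not Somoclu's code or API, and it uses none of OpenMP, MPI, or CUDA. The batch-style update, the Gaussian neighborhood, the linear radius decay, and all function and parameter names are assumptions chosen for illustration.

```python
import numpy as np

def train_som_batch(data, n_rows, n_cols, n_epochs=10, radius0=None, seed=0):
    """Train a small rectangular SOM with a batch update (illustrative only)."""
    rng = np.random.default_rng(seed)
    n_samples = data.shape[0]
    n_nodes = n_rows * n_cols

    # Grid coordinates of each map node and squared grid distances between nodes.
    grid = np.array([(r, c) for r in range(n_rows) for c in range(n_cols)],
                    dtype=float)
    grid_d2 = ((grid[:, None, :] - grid[None, :, :]) ** 2).sum(-1)

    # Initialize the codebook from randomly chosen data points.
    codebook = data[rng.choice(n_samples, size=n_nodes, replace=True)].astype(float)

    if radius0 is None:
        radius0 = max(n_rows, n_cols) / 2.0

    for epoch in range(n_epochs):
        # Neighborhood radius moves linearly from radius0 to 1.
        frac = epoch / max(n_epochs - 1, 1)
        radius = radius0 + (1.0 - radius0) * frac

        # Best matching unit (closest codebook vector) for every sample.
        d2 = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        bmu = d2.argmin(axis=1)

        # Gaussian neighborhood weight of each node for each sample's BMU.
        h = np.exp(-grid_d2[:, bmu] / (2.0 * radius ** 2))

        # Batch update: each node becomes a weighted mean of all samples.
        codebook = (h @ data) / h.sum(axis=1, keepdims=True)

    return codebook

def quantization_error(data, codebook):
    """Mean distance from each sample to its best matching unit."""
    d2 = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return float(np.sqrt(d2.min(axis=1)).mean())

rng = np.random.default_rng(1)
data = rng.random((200, 3))
codebook = train_som_batch(data, n_rows=4, n_cols=4, n_epochs=10)
qe = quantization_error(data, codebook)
```

In this sketch the best-matching-unit search and the weighted sums are independent across samples, which is the kind of work that can in principle be split across cores, nodes, or a GPU.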

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

Journal of Statistical Software

Bangor University Research Portal