Somoclu: An Efficient Parallel Library for Self-Organizing Maps

Gao, Shi Chao; Lim, Ik Soo; Wittek, Peter; Zhao, Li

Authors: Shi Chao Gao, Ik Soo Lim, Peter Wittek, Li Zhao
Publication date: 1 January 2017
Publisher: Foundation for Open Access Statistics

Abstract

Somoclu is a massively parallel tool, written in C++, for training self-organizing maps on large data sets. It builds on OpenMP for multicore execution and on MPI for distributing the workload across the nodes of a cluster. It can also accelerate training with CUDA when graphics processing units are available. A sparse kernel is included, which is useful for high-dimensional but sparse data, such as the vector spaces common in text mining workflows. Python, R and MATLAB interfaces facilitate interactive use. Apart from fast execution, memory use is highly optimized, enabling the training of large emergent maps even on a single computer.

Comment: 26 pages, 9 figures. The code is available at https://peterwittek.github.io/somoclu
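For readers unfamiliar with the computation the abstract refers to, the following is a minimal pure-Python sketch of training a self-organizing map on a planar, rectangular grid. It is not Somoclu's code or API: the function names `best_matching_unit` and `train_batch_som`, the batch-style update, the Gaussian neighborhood, and the linear radius schedule are all assumptions chosen for illustration. The loop over data points is the part a parallel implementation would distribute across cores or nodes, since each point's contribution to the accumulators is independent of the others.

```python
import math
import random


def best_matching_unit(codebook, x):
    """Return the index of the codebook vector closest to x (squared Euclidean)."""
    distances = [sum((w - v) ** 2 for w, v in zip(node, x)) for node in codebook]
    return distances.index(min(distances))


def train_batch_som(data, n_rows, n_columns, epochs=10, seed=0):
    """Train a small SOM with batch updates (illustrative sketch only)."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Grid coordinates of each node, in row-major order.
    grid = [(r, c) for r in range(n_rows) for c in range(n_columns)]
    # Initialize the codebook with randomly chosen data points.
    codebook = [list(rng.choice(data)) for _ in grid]
    radius0 = max(n_rows, n_columns) / 2.0

    for epoch in range(epochs):
        # Move the neighborhood radius linearly from radius0 to 1.
        t = epoch / max(epochs - 1, 1)
        radius = radius0 + t * (1.0 - radius0)

        # Accumulators for the weighted means; each data point contributes
        # independently, which is what makes the batch update parallelizable.
        numerators = [[0.0] * dim for _ in grid]
        denominators = [0.0] * len(grid)
        for x in data:
            bmu_row, bmu_col = grid[best_matching_unit(codebook, x)]
            for j, (row, col) in enumerate(grid):
                d2 = (row - bmu_row) ** 2 + (col - bmu_col) ** 2
                h = math.exp(-d2 / (2.0 * radius * radius))
                denominators[j] += h
                for k in range(dim):
                    numerators[j][k] += h * x[k]

        # Replace each node by the neighborhood-weighted mean of the data.
        for j in range(len(grid)):
            if denominators[j] > 0.0:
                codebook[j] = [n / denominators[j] for n in numerators[j]]
    return codebook
```

Calling `train_batch_som(points, 3, 4)` on a list of equal-length numeric vectors returns 12 codebook vectors in row-major grid order. Because every update is a weighted mean of the data, each codebook vector stays inside the bounding box of the input.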