Parallel Framework for Dimensionality Reduction of Large-Scale Datasets
Dimensionality reduction refers to a set of mathematical techniques used to reduce the complexity of high-dimensional data while preserving selected properties of the original data. Improvements in simulation strategies and experimental data collection methods are producing a deluge of heterogeneous, high-dimensional data, which often makes dimensionality reduction the only viable way to gain qualitative and quantitative understanding of the data. However, existing dimensionality reduction software often does not scale to datasets arising in real-life applications, which may consist of thousands of points with millions of dimensions. In this paper, we propose a parallel framework for dimensionality reduction of large-scale data. We identify the key components underlying spectral dimensionality reduction techniques and propose efficient parallel implementations of them. We show that the resulting framework can process datasets consisting of millions of points when executed on a 16,000-core cluster, which is beyond the reach of currently available methods. To further demonstrate the applicability of our framework, we perform dimensionality reduction of 75,000 images representing morphology evolution during the manufacturing of organic solar cells, in order to identify how processing parameters affect morphology evolution.
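The abstract does not list the key components it identifies. As a minimal single-machine sketch, assume two components common to spectral methods: a pairwise distance matrix and a top-eigenvector computation (shown here as classical MDS). The sketch illustrates why this kind of work parallelizes well: each row block of the distance matrix can be computed independently. The block partitioning, the thread pool, and the names `distance_block`, `pairwise_distances_blocked`, and `classical_mds` are hypothetical illustrations, not the authors' framework, which targets a 16,000-core cluster.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def distance_block(X, start, stop):
    """Euclidean distances between rows X[start:stop] and all rows of X."""
    A = X[start:stop]
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b; clip tiny negatives from rounding
    sq = (A * A).sum(1)[:, None] + (X * X).sum(1)[None, :] - 2.0 * A @ X.T
    return np.sqrt(np.clip(sq, 0.0, None))


def pairwise_distances_blocked(X, n_blocks=4, workers=4):
    """Assemble the n x n distance matrix from independent row blocks.

    Hypothetical decomposition: each block is one task. NumPy's matrix
    multiply typically releases the GIL, so threads can overlap the work.
    """
    n = X.shape[0]
    bounds = np.linspace(0, n, n_blocks + 1, dtype=int)
    D = np.empty((n, n))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            (s, e): pool.submit(distance_block, X, s, e)
            for s, e in zip(bounds[:-1], bounds[1:])
        }
        for (s, e), fut in futures.items():
            D[s:e] = fut.result()
    return D


def classical_mds(D, d=2):
    """Spectral step: double-center squared distances, keep the top-d eigenpairs."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    B = 0.5 * (B + B.T)                       # remove rounding asymmetry
    vals, vecs = np.linalg.eigh(B)            # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:d]          # indices of the d largest
    return vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0.0, None))


# Usage: 2-D data embedded isometrically in 50-D is recovered up to rotation.
rng = np.random.default_rng(0)
Y = rng.normal(size=(200, 2))
Q, _ = np.linalg.qr(rng.normal(size=(50, 50)))
X = Y @ Q[:2]                                 # Q[:2] has orthonormal rows
D = pairwise_distances_blocked(X, n_blocks=8)
Z = classical_mds(D, d=2)
```

Because the row blocks share no state, the same decomposition maps naturally onto distributed workers. The eigendecomposition, by contrast, needs the assembled matrix (or a distributed eigensolver), which is one reason it is harder to scale.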
Error Metrics for Learning Reliable Manifolds from Streaming Data
Spectral dimensionality reduction is frequently used to identify low-dimensional structure in high-dimensional data. However, learning manifolds, especially from streaming data, is computationally and memory expensive. In this paper, we argue that a stable manifold can be learned using only a fraction of the stream, and that the remainder of the stream can be mapped to the manifold at significantly lower cost. The key step is identifying the transition point at which the manifold becomes stable. We present error metrics that identify the transition point for a given stream by quantitatively assessing the quality of a manifold learned using Isomap. We further propose an efficient mapping algorithm, called S-Isomap, that maps new samples onto the stable manifold. We describe experiments on a variety of data sets showing that the proposed approach is computationally efficient without sacrificing accuracy.
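To make the learn-then-map idea concrete, the sketch below fits standard Isomap (k-nearest-neighbor graph, Floyd-Warshall geodesic distances, classical MDS) on a base set. It then places a new sample with the landmark-MDS triangulation formula, reusing the stored eigenvectors instead of recomputing the eigendecomposition. This out-of-sample rule is a stand-in chosen for illustration, not the S-Isomap algorithm from the paper. The error metrics for detecting the transition point are also not shown. The helper names, `k`, and the helix data are hypothetical.

```python
import numpy as np


def knn_graph(X, k):
    """Symmetric k-NN graph with Euclidean edge weights; np.inf marks no edge."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    n = len(X)
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    nbrs = np.argsort(D, axis=1)[:, 1:k + 1]   # column 0 is the point itself (no duplicates assumed)
    rows = np.repeat(np.arange(n), k)
    G[rows, nbrs.ravel()] = D[rows, nbrs.ravel()]
    return np.minimum(G, G.T)                  # keep an edge if either endpoint chose it


def geodesic_distances(G):
    """All-pairs shortest paths via Floyd-Warshall, O(n^3)."""
    G = G.copy()
    for m in range(len(G)):
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    return G


def isomap_fit(X, k=6, d=1):
    """Isomap on a base set: geodesics followed by classical MDS."""
    geo = geodesic_distances(knn_graph(X, k))
    if np.isinf(geo).any():
        raise ValueError("neighborhood graph is disconnected; increase k")
    sq = geo ** 2
    n = len(X)
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ sq @ J
    B = 0.5 * (B + B.T)
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:d]
    vals, vecs = vals[idx], vecs[:, idx]
    model = {"X": X, "geo": geo, "k": k, "vals": vals, "vecs": vecs,
             "mean_sq": sq.mean(axis=0)}
    return vecs * np.sqrt(vals), model


def map_new_sample(x, model):
    """Place one new sample without recomputing the eigendecomposition."""
    X, geo, k = model["X"], model["geo"], model["k"]
    d_euc = np.linalg.norm(X - x, axis=1)
    near = np.argsort(d_euc)[:k]
    # Geodesic estimate: hop to a nearby base point, then follow base geodesics.
    g_new = np.min(d_euc[near, None] + geo[near, :], axis=0)
    # Landmark-MDS triangulation: y = -1/2 * Lambda^{-1/2} V^T (delta - mean delta)
    return -0.5 * (model["vecs"].T @ (g_new ** 2 - model["mean_sq"])) / np.sqrt(model["vals"])


# Usage: a helix is a 1-D manifold in 3-D; a new point falls between its neighbors.
t = np.linspace(0.0, 4 * np.pi, 200)
helix = np.column_stack([np.cos(t), np.sin(t), 0.5 * t])
Z, model = isomap_fit(helix, k=6, d=1)
t_new = 0.5 * (t[100] + t[101])
y_new = map_new_sample(np.array([np.cos(t_new), np.sin(t_new), 0.5 * t_new]), model)
```

Mapping a sample costs one distance computation against the base set plus an O(n) geodesic update, compared with O(n^3) for refitting. This gap is the kind of saving that motivates learning the manifold from only part of the stream.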
Parallel Framework for Dimensionality Reduction of Large-Scale Datasets
This is an article published as Samudrala, Sai Kiranmayee, Jaroslaw Zola, Srinivas Aluru, and Baskar Ganapathysubramanian. "Parallel framework for dimensionality reduction of large-scale datasets." Scientific Programming 2015 (2015). Posted with permission.