Repository landing page
Randomized Approximation Methods for the Efficient Compression and Analysis of Hyperspectral Data
Abstract
Hyperspectral imaging techniques such as matrix-assisted laser desorption ionization (MALDI) mass spectrometry imaging produce large, information-rich datasets that are frequently too large to be analyzed as a whole. In addition, the “curse of dimensionality” places fundamental limits on what can be done with such data, regardless of the resources available. We propose and evaluate random matrix-based methods for the analysis of such data, in this case a MALDI mass spectrometry image from a section of rat brain. By constructing a randomized orthonormal basis for the data, we are able to achieve reductions in dimensionality and data size of over 100 times. Furthermore, this compression is reversible to within noise limits. This allows more conventional multivariate analysis techniques, such as principal component analysis (PCA) and clustering methods, to be applied directly to the compressed data, so that the results can easily be back-projected and interpreted in the original measurement space. PCA on the compressed data is shown to be nearly identical to the same analysis on the original data, but the run time is reduced from over an hour to 8 seconds. We also demonstrate that the method generalizes to other data sets, namely a hyperspectral optical image of leaves and a Raman spectroscopy image of an artificial ligament. To allow for the full evaluation of these methods on a wide range of data, we have made all software and sample data freely available.
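The compression step described in the abstract can be illustrated with a generic randomized range-finder. The sketch below is not the authors' released software; it assumes the data are arranged as a matrix of spectral channels by pixels, and the function names (`compress`, `reconstruct`, `pca_from_compressed`), the target rank `k`, and the `oversample` parameter are all hypothetical choices made for illustration.

```python
import numpy as np

def compress(A, k, oversample=10, seed=0):
    """Project A (channels x pixels) onto a randomized orthonormal basis.

    Returns Q (channels x l) with orthonormal columns and the compressed
    data B = Q.T @ A (l x pixels), where l = k + oversample.
    """
    rng = np.random.default_rng(seed)
    l = k + oversample
    omega = rng.standard_normal((A.shape[1], l))  # random test matrix
    Y = A @ omega                                 # random samples of the column space of A
    Q, _ = np.linalg.qr(Y)                        # orthonormalize the samples
    B = Q.T @ A                                   # compressed representation
    return Q, B

def reconstruct(Q, B):
    """Back-project compressed data into the original measurement space."""
    return Q @ B

def pca_from_compressed(Q, B, n_components):
    """PCA on the compressed data, with loadings back-projected via Q."""
    # Centering B across pixels equals projecting the centered A onto Q.
    Bc = B - B.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(Bc, full_matrices=False)
    loadings = Q @ U[:, :n_components]                   # channels x components
    scores = s[:n_components, None] * Vt[:n_components]  # components x pixels
    return loadings, scores

# Synthetic stand-in for a hyperspectral image: low-rank signal plus noise.
rng = np.random.default_rng(42)
signal = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 1000))
data = signal + 1e-3 * rng.standard_normal((200, 1000))

Q, B = compress(data, k=5)
rel_err = np.linalg.norm(data - reconstruct(Q, B)) / np.linalg.norm(data)
print("compressed shape:", B.shape, "relative error:", rel_err)
```

In this sketch the compressed matrix `B` has `k + oversample` rows instead of one row per spectral channel, so the size reduction is roughly the ratio of those two numbers; how closely the reconstruction matches the original depends on how quickly the singular values of the real data decay.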
- Dataset
- Biophysics
- Genetics
- Biotechnology
- Inorganic Chemistry
- Space Science
- Mathematical Sciences not elsewhere classified
- Chemical Sciences not elsewhere classified
- Physical Sciences not elsewhere classified
- Information Systems not elsewhere classified
- data sets
- mass spectrometry imaging
- multivariate analysis techniques
- Hyperspectral Data
- Hyperspectral imaging techniques
- measurement space
- 100 times
- Raman spectroscopy image
- Randomized Approximation Methods
- PCA
- rat brain
- method
- component analysis
- sample data
- 8 seconds
- Efficient Compression
- noise limits
- randomized orthonormal basis
- data size
- MALDI mass spectrometry image