6 research outputs found
Testing for Multivariate Normality in Mass Spectrometry Imaging Data: A Robust Statistical Approach for Clustering Evaluation and the Generation of Synthetic Mass Spectrometry Imaging Data Sets
Spatial clustering is a powerful tool in mass spectrometry imaging
(MSI) and has been demonstrated to be capable of differentiating tumor
types, visualizing intratumor heterogeneity, and segmenting anatomical
structures. Several clustering methods have been applied to mass spectrometry
imaging data, but a principled comparison and evaluation of different
clustering techniques presents a significant challenge. We propose
that testing whether the data has a multivariate normal distribution
within clusters can be used to evaluate the performance when using
algorithms that assume normality in the data, such as <i>k</i>-means clustering. In cases where clustering has been performed using
the cosine distance, conversion of the data to polar coordinates prior
to normality testing should be performed to ensure normality is tested
in the correct coordinate system. In addition to these evaluations
of internal consistency, we demonstrate that the multivariate normal
distribution can then be used as a basis for statistical modeling
of MSI data. This allows the generation of synthetic MSI data sets
with known ground truth, providing a means of external clustering
evaluation. To demonstrate this, reference data from seven anatomical
regions of an MSI image of a coronal section of mouse brain were modeled.
A set of synthetic data was then generated from this model.
Results of <i>r</i><sup>2</sup> fitting of the chi-squared
quantile–quantile plots on the seven anatomical regions confirmed
that the data acquired from each spatial region were closer
to normally distributed in polar space than in Euclidean space. Finally,
principal component analysis was applied to a single data set that
included synthetic and real data. No significant differences were
found between the two data types, indicating the suitability of these
methods for generating realistic synthetic data.
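The within-cluster normality check described in this abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes that "<i>r</i><sup>2</sup> fitting" means the squared correlation between sorted squared Mahalanobis distances and theoretical chi-squared quantiles, and it uses the Wilson-Hilferty approximation for those quantiles to avoid a SciPy dependency. The names `chi2_quantiles` and `qq_r_squared` are hypothetical, and the "cluster" is simulated Gaussian data rather than MSI data. The last lines show how a fitted mean and covariance could seed synthetic data with known ground truth.

```python
import numpy as np
from statistics import NormalDist


def chi2_quantiles(probs, dof):
    """Approximate chi-squared quantiles (Wilson-Hilferty approximation)."""
    z = np.array([NormalDist().inv_cdf(p) for p in probs])
    c = 2.0 / (9.0 * dof)
    # Clip at zero so the cube never produces a negative quantile.
    return dof * np.maximum(1.0 - c + z * np.sqrt(c), 0.0) ** 3


def qq_r_squared(X):
    """r^2 of the chi-squared Q-Q plot for one cluster (rows = pixels)."""
    n, d = X.shape
    centered = X - X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # Squared Mahalanobis distance of every row to the cluster mean.
    d2 = np.einsum("ij,jk,ik->i", centered, np.linalg.inv(cov), centered)
    probs = (np.arange(1, n + 1) - 0.5) / n
    theory = chi2_quantiles(probs, d)
    r = np.corrcoef(np.sort(d2), theory)[0, 1]
    return r ** 2


# Simulated stand-in for one cluster of pixels in a reduced feature space.
rng = np.random.default_rng(0)
cluster = rng.multivariate_normal([0.0, 0.0, 0.0], np.eye(3), size=2000)
r2 = qq_r_squared(cluster)

# Synthetic pixels drawn from the multivariate normal fitted to the cluster.
synthetic = rng.multivariate_normal(
    cluster.mean(axis=0), np.cov(cluster, rowvar=False), size=500
)
```

An `r2` close to 1 suggests the cluster is approximately multivariate normal in the chosen coordinate system. For data clustered with the cosine distance, the same function would be applied after converting the data to polar coordinates, as the abstract recommends.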
MALDI Imaging of Liquid Extraction Surface Analysis Sampled Tissue
Combined mass spectrometry imaging methods in which two different
techniques are executed on the same sample have recently been reported
for a number of sample types. Such an approach can be used to examine
the sampling effects of the first technique with a second, higher
resolution method and also combines the advantages of each technique
for a more complete analysis. In this work, matrix-assisted laser desorption
ionization mass spectrometry imaging (MALDI MSI) was used to study
the effects of liquid extraction surface analysis (LESA) sampling
on mouse brain tissue. Complementary multivariate analysis techniques
including principal component analysis, non-negative matrix factorization,
and <i>t</i>-distributed stochastic neighbor embedding were
applied to MALDI MS images acquired from tissue which had been sampled
by LESA to gain a better understanding of localized tissue washing
in LESA sampling. It was found that MALDI MS images could be used
to visualize regions sampled by LESA. The variability in sampling
area, spatial precision, and delocalization of analytes in tissue
induced by LESA were assessed using both single-ion images and images
provided by multivariate analysis.
LESA FAIMS Mass Spectrometry for the Spatial Profiling of Proteins from Tissue
We have shown previously that coupling of high field asymmetric
waveform ion mobility spectrometry (FAIMS), also known as differential
ion mobility, with liquid extraction surface analysis (LESA) mass
spectrometry of tissue results in significant improvements in the
resulting protein mass spectra. Here, we demonstrate LESA FAIMS mass
spectrometry imaging of proteins in sections of mouse brain and liver
tissue. The results are compared with LESA mass spectrometry images
obtained in the absence of FAIMS. The results show that the number
of different protein species detected can be significantly increased
by incorporating FAIMS into the workflow. A total of 34 proteins were
detected by LESA FAIMS mass spectrometry imaging of mouse brain, of
which 26 were unique to FAIMS, compared with 15 proteins (7 unique)
detected by LESA mass spectrometry imaging. A number of proteins were
identified including α-globin, 6.8 kDa mitochondrial proteolipid,
macrophage migration inhibitory factor, ubiquitin, β-thymosin
4, and calmodulin. A total of 40 species were detected by LESA FAIMS
mass spectrometry imaging of mouse liver, of which 29 were unique
to FAIMS, compared with 24 proteins (13 unique) detected by LESA mass
spectrometry imaging. The spatial distributions of proteins identified
in both LESA mass spectrometry imaging and LESA FAIMS mass spectrometry
imaging were in good agreement, indicating that FAIMS is a suitable
tool for inclusion in mass spectrometry imaging workflows.
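The protein counts reported in this abstract can be cross-checked with a short arithmetic sketch. It assumes that "unique" means detected in only one of the two workflows, so that total minus unique gives the number of proteins shared by both. The helper `overlap_summary` is hypothetical and uses only the figures quoted in the abstract.

```python
def overlap_summary(with_faims, unique_faims, without_faims, unique_without):
    """Derive shared and combined counts from the reported totals."""
    shared_from_faims = with_faims - unique_faims
    shared_from_plain = without_faims - unique_without
    if shared_from_faims != shared_from_plain:
        raise ValueError("reported totals and unique counts are inconsistent")
    return {
        "shared": shared_from_faims,
        # Everything seen by at least one of the two workflows.
        "distinct_total": unique_faims + unique_without + shared_from_faims,
    }


brain = overlap_summary(34, 26, 15, 7)
liver = overlap_summary(40, 29, 24, 13)
```

Under that reading the reported figures are internally consistent: 8 proteins are common to both workflows in brain and 11 in liver.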
Raster-Mode Continuous-Flow Liquid Microjunction Mass Spectrometry Imaging of Proteins in Thin Tissue Sections
Mass spectrometry imaging by use of continuous-flow liquid microjunction
sampling at discrete locations (array mode) has previously been demonstrated.
In this Letter, we demonstrate continuous-flow liquid microjunction
mass spectrometry imaging of proteins from thin tissue sections in
raster mode and discuss advantages (a 10-fold reduction in analysis
time) and challenges (suitable solvent systems, data interpretation)
of the approach. Visualization of data is nontrivial, requiring correlation
of solvent-flow, mass spectral data acquisition rate, data quality,
and liquid microjunction sampling area. The latter is particularly
important for determining optimum pixel size. The minimum achievable
pixel size is related to the scan time of the instrument used. Here
we show a minimum achievable pixel size of 50 μm (<i>x</i>-dimension) when using an Orbitrap Elite; however, a pixel size of
600 μm is recommended in order to minimize the effects of oversampling
on image accuracy.
Memory Efficient Principal Component Analysis for the Dimensionality Reduction of Large Mass Spectrometry Imaging Data Sets
A memory efficient algorithm for the computation of principal component
analysis (PCA) of large mass spectrometry imaging data sets is presented.
Mass spectrometry imaging (MSI) enables two- and three-dimensional
overviews of hundreds of unlabeled molecular species in complex samples
such as intact tissue. PCA, in combination with data binning or other
reduction algorithms, has been widely used in the unsupervised processing
of MSI data and as a dimensionality reduction method prior to clustering
and spatial segmentation. Standard implementations of PCA require
the data to be stored in random access memory. This imposes an upper
limit on the amount of data that can be processed, necessitating a
compromise between the number of pixels and the number of peaks to
include. With increasing interest in multivariate analysis of large
3D multislice data sets and ongoing improvements in instrumentation,
the ability to retain all pixels and many more peaks is increasingly
important. We present a new method which has no limitation on the
number of pixels and allows an increased number of peaks to be retained.
The new technique was validated against the MATLAB (The MathWorks
Inc., Natick, Massachusetts) implementation of PCA (<i>princomp</i>) and then used to reduce, without discarding peaks or pixels, multiple
serial sections acquired from a single mouse brain which was too large
to be analyzed with <i>princomp</i>. Then, <i>k</i>-means clustering was performed on the reduced data set. We further
demonstrate with simulated data of 83 slices, comprising 20 535
pixels per slice and equaling 44 GB of data, that the new method can
be used in combination with existing tools to process an entire organ.
MATLAB code implementing the memory efficient PCA algorithm is provided.
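One common way to remove the dependence of PCA memory use on the number of pixels is to accumulate the column sums and the peaks-by-peaks cross-product matrix one chunk of pixels at a time, then eigendecompose the resulting covariance matrix. The sketch below illustrates that general idea in Python. It is not the authors' MATLAB implementation and may differ from their algorithm. The function name `chunked_pca` and the simulated data are assumptions made for illustration.

```python
import numpy as np


def chunked_pca(chunks, n_components):
    """PCA from a stream of (pixels x peaks) chunks.

    Only a peaks-length sum and a peaks x peaks matrix are held in memory,
    so the number of pixels is not limited by RAM.
    """
    n = 0
    total = None
    gram = None
    for chunk in chunks:
        chunk = np.asarray(chunk, dtype=float)
        if total is None:
            p = chunk.shape[1]
            total = np.zeros(p)
            gram = np.zeros((p, p))
        n += chunk.shape[0]
        total += chunk.sum(axis=0)
        gram += chunk.T @ chunk
    mean = total / n
    # Sample covariance recovered from the accumulated sums.
    cov = (gram - n * np.outer(mean, mean)) / (n - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, order], eigvals[order]


# Simulated data standing in for pixels x peaks intensities.
rng = np.random.default_rng(1)
data = rng.normal(size=(1000, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])
mean, components, variances = chunked_pca(np.array_split(data, 10), n_components=2)

# Scores can also be computed chunk by chunk; here, the first 100 pixels.
scores = (data[:100] - mean) @ components
```

In practice each chunk would be read from disk rather than split from an in-memory array. Subtracting `n * outer(mean, mean)` from the accumulated cross-products can lose precision when the mean is large relative to the variance, so a production implementation might prefer a numerically safer update.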
Two-Phase and Graph-Based Clustering Methods for Accurate and Efficient Segmentation of Large Mass Spectrometry Images
Clustering is widely used in MSI to segment anatomical features
and differentiate tissue types, but existing approaches are both CPU-
and memory-intensive, limiting their application to small, single
data sets. We propose a new approach that uses a graph-based algorithm
with a two-phase sampling method that overcomes this limitation. We
demonstrate the algorithm on a range of sample types and show that
it can segment anatomical features that are not identified using commonly
employed algorithms in MSI, and we validate our results on synthetic
MSI data. We show that the algorithm is robust to fluctuations in
data quality by successfully clustering data with a designed-in variance
using data acquired with varying laser fluence. Finally, we show that
this method is capable of generating accurate segmentations of large
MSI data sets acquired on the newest generation of MSI instruments
and evaluate these results by comparison with histopathology.