Randomized Approximation Methods for the Efficient Compression and Analysis of Hyperspectral Data
Hyperspectral imaging techniques
such as matrix-assisted laser
desorption ionization (MALDI) mass spectrometry imaging produce large,
information-rich datasets that are frequently too large to be analyzed
as a whole. In addition, the “curse of dimensionality”
adds fundamental limits to what can be done with such data, regardless
of the resources available. We propose and evaluate random matrix-based
methods for the analysis of such data, in this case, a MALDI mass
spectrometry image from a section of rat brain. By constructing a
randomized orthonormal basis for the data, we are able to achieve
reductions in dimensionality and data size of over 100 times. Furthermore,
this compression is reversible to within noise limits. This allows
more-conventional multivariate analysis techniques such as principal
component analysis (PCA) and clustering methods to be directly applied
to the compressed data such that the results can easily be back-projected
and interpreted in the original measurement space. PCA on the compressed
data is shown to be nearly identical to the same analysis on the
original data but the run time was reduced from over an hour to 8
seconds. We also demonstrate the generality of the method to other data
sets, namely, a hyperspectral optical image of leaves, and a Raman
spectroscopy image of an artificial ligament. In order to allow for
the full evaluation of these methods on a wide range of data, we have
made all software and sample data freely available.
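The idea of compressing against a randomized orthonormal basis and back-projecting the results can be sketched as follows. This is a minimal illustration of a standard randomized range finder in NumPy, not the authors' released software; the function name `randomized_basis`, the synthetic data, and all sizes are assumptions made for the example.

```python
import numpy as np

def randomized_basis(X, k, seed=0):
    """Orthonormal basis Q (channels x k) for the dominant spectral subspace of X.

    X is pixels x channels. Random combinations of the pixel spectra are formed
    and orthonormalised with a QR decomposition.
    """
    rng = np.random.default_rng(seed)
    omega = rng.standard_normal((X.shape[0], k))  # random test matrix
    Y = X.T @ omega                               # channels x k sketch of the data
    Q, _ = np.linalg.qr(Y)                        # orthonormal columns
    return Q

# Synthetic stand-in for an image: 500 pixels, 2000 channels, 5 underlying
# components plus a little noise (illustrative assumptions only).
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 5)) @ rng.standard_normal((5, 2000))
X += 0.01 * rng.standard_normal(X.shape)

Q = randomized_basis(X, k=15)
compressed = X @ Q            # 500 x 15 instead of 500 x 2000
restored = compressed @ Q.T   # back-projection into the original channel space
rel_error = np.linalg.norm(X - restored) / np.linalg.norm(X)

# PCA in the compressed space; loadings are back-projected through Q so they
# can be read in the original measurement space.
centred = compressed - compressed.mean(axis=0)
_, _, Vt = np.linalg.svd(centred, full_matrices=False)
loadings = Q @ Vt.T           # channels x 15
```

Because `Q` has orthonormal columns, multiplying by `Q.T` undoes the projection for everything that lies in the retained subspace, which is why the reconstruction error in this toy case stays near the noise level.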
Testing for Multivariate Normality in Mass Spectrometry Imaging Data: A Robust Statistical Approach for Clustering Evaluation and the Generation of Synthetic Mass Spectrometry Imaging Data Sets
Spatial clustering
is a powerful tool in mass spectrometry imaging
(MSI) and has been demonstrated to be capable of differentiating tumor
types, visualizing intratumor heterogeneity, and segmenting anatomical
structures. Several clustering methods have been applied to mass spectrometry
imaging data, but a principled comparison and evaluation of different
clustering techniques presents a significant challenge. We propose
that testing whether the data has a multivariate normal distribution
within clusters can be used to evaluate the performance when using
algorithms that assume normality in the data, such as <i>k</i>-means clustering. In cases where clustering has been performed using
the cosine distance, conversion of the data to polar coordinates prior
to normality testing should be performed to ensure normality is tested
in the correct coordinate system. In addition to these evaluations
of internal consistency, we demonstrate that the multivariate normal
distribution can then be used as a basis for statistical modeling
of MSI data. This allows the generation of synthetic MSI data sets
with known ground truth, providing a means of external clustering
evaluation. To demonstrate this, reference data from seven anatomical
regions of an MSI image of a coronal section of mouse brain were modeled.
From this, a set of synthetic data based on this model was generated.
Results of <i>r</i><sup>2</sup> fitting of the chi-squared
quantile–quantile plots for the seven anatomical regions confirmed
that the data acquired from each spatial region were closer
to normally distributed in polar space than in Euclidean space. Finally,
principal component analysis was applied to a single data set that
included synthetic and real data. No significant differences were
found between the two data types, indicating the suitability of these
methods for generating realistic synthetic data.
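A chi-squared quantile–quantile check of this kind, together with sampling from the fitted distribution, might look like the sketch below. It is an assumption-laden illustration rather than the authors' code: the helper names are hypothetical, the "region" is simulated rather than real MSI data, and the Wilson-Hilferty approximation is swapped in for exact chi-squared quantiles so that only NumPy and the standard library are needed.

```python
from statistics import NormalDist
import numpy as np

def chi2_quantiles_wh(probs, dof):
    """Approximate chi-squared quantiles (Wilson-Hilferty formula)."""
    z = np.array([NormalDist().inv_cdf(float(p)) for p in probs])
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * np.sqrt(c)) ** 3

def qq_r2(X):
    """r^2 of the chi-squared Q-Q plot of squared Mahalanobis distances."""
    n, dof = X.shape
    centred = X - X.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
    # Squared Mahalanobis distance of every row from the sample mean.
    d2 = np.einsum("ij,jk,ik->i", centred, inv_cov, centred)
    probs = (np.arange(1, n + 1) - 0.5) / n
    theoretical = chi2_quantiles_wh(probs, dof)
    r = np.corrcoef(np.sort(d2), theoretical)[0, 1]
    return r ** 2

rng = np.random.default_rng(0)
cov = 0.5 * np.eye(5) + 0.5 * np.ones((5, 5))

# Simulated "cluster" that really is multivariate normal, and a skewed copy.
region = rng.multivariate_normal(np.zeros(5), cov, size=2000)
skewed = np.exp(region)

r2_normal = qq_r2(region)
r2_skewed = qq_r2(skewed)

# Synthetic data with known ground truth: sample from the fitted model.
synthetic = rng.multivariate_normal(
    region.mean(axis=0), np.cov(region, rowvar=False), size=1000
)
```

An `r2` value close to 1 indicates that the squared Mahalanobis distances follow the chi-squared distribution expected under multivariate normality; the skewed copy should score visibly lower.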
Memory Efficient Principal Component Analysis for the Dimensionality Reduction of Large Mass Spectrometry Imaging Data Sets
A memory efficient algorithm for
the computation of principal component
analysis (PCA) of large mass spectrometry imaging data sets is presented.
Mass spectrometry imaging (MSI) enables two- and three-dimensional
overviews of hundreds of unlabeled molecular species in complex samples
such as intact tissue. PCA, in combination with data binning or other
reduction algorithms, has been widely used in the unsupervised processing
of MSI data and as a dimensionality reduction method prior to clustering
and spatial segmentation. Standard implementations of PCA require
the data to be stored in random access memory. This imposes an upper
limit on the amount of data that can be processed, necessitating a
compromise between the number of pixels and the number of peaks to
include. With increasing interest in multivariate analysis of large
3D multislice data sets and ongoing improvements in instrumentation,
the ability to retain all pixels and many more peaks is increasingly
important. We present a new method which has no limitation on the
number of pixels and allows an increased number of peaks to be retained.
The new technique was validated against the MATLAB (The MathWorks
Inc., Natick, Massachusetts) implementation of PCA (<i>princomp</i>) and then used to reduce, without discarding peaks or pixels, multiple
serial sections acquired from a single mouse brain which was too large
to be analyzed with <i>princomp</i>. Then, <i>k</i>-means clustering was performed on the reduced data set. We further
demonstrate with simulated data of 83 slices, comprising 20 535
pixels per slice and equaling 44 GB of data, that the new method can
be used in combination with existing tools to process an entire organ.
MATLAB code implementing the memory efficient PCA algorithm is provided.
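The abstract does not spell out the algorithm, so the following is only one plausible way to remove the limit on pixel count: accumulate the peaks-by-peaks covariance matrix chunk by chunk, so that memory scales with the number of peaks rather than the number of pixels. The function `chunked_pca` and the toy data are hypothetical, and the sketch is written in Python with NumPy rather than MATLAB.

```python
import numpy as np

def chunked_pca(chunks, n_components):
    """PCA from an iterable of (pixels x peaks) chunks, one chunk in memory at a time."""
    n = 0
    total = None
    gram = None
    for chunk in chunks:
        chunk = np.asarray(chunk, dtype=float)
        if total is None:
            n_peaks = chunk.shape[1]
            total = np.zeros(n_peaks)
            gram = np.zeros((n_peaks, n_peaks))
        n += chunk.shape[0]
        total += chunk.sum(axis=0)   # running sum of spectra
        gram += chunk.T @ chunk      # running sum of outer products
    mean = total / n
    # Sample covariance from the accumulated sums. This form can lose precision
    # when the mean is very large relative to the spread of the data.
    cov = (gram - n * np.outer(mean, mean)) / (n - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, order], eigvals[order]

# Toy data: 1000 pixels, 20 peaks, processed in 10 chunks of 100 pixels.
rng = np.random.default_rng(0)
data = rng.standard_normal((1000, 20)) * np.linspace(1.0, 3.0, 20) + 2.0
mean, components, eigvals = chunked_pca(np.array_split(data, 10), n_components=3)

# Scores are computed in a second pass, again one chunk at a time.
scores_first_chunk = (data[:100] - mean) @ components
```

The reduced scores can then be passed to a clustering routine such as <i>k</i>-means, since they occupy far less memory than the original pixels-by-peaks matrix.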
Electroporation confocal microscopy images.
<p>Left: Cy3 and Cy5 tagged DNA duplex (<b>S1:S2</b>) added to cells via electroporation and imaged using confocal microscopy. Images A/E represent the Cy3 channel; B/F the Cy5 channel the nuclear stain channel; C/G the bright field channel and D/H an overlay of all the channels. Images A–D are excited with a 543 nm laser only. Images E–H are excited with both the 543 and 633 nm lasers. Right: Intracellular fluorescence intensity from images A/B and E/F. Data are expressed as mean ± s.e.m from at least ten cells (p = 0.001 to 0.01).</p>
Emission spectra of Cy3 and Cy5 DNA.
<p>Titration of Cy5 tagged DNA (<b>S2</b>) into Cy3 tagged DNA (<b>S1</b>), showing resulting Cy5-Cy3 FRET upon duplex formation (excitation wavelength = 554 nm). The emission intensities centred at 570 nm and 670 nm correspond to emission from Cy3 and Cy5 respectively (conditions: 1 µM DNA, 100 mM NaCl, and pH 7.0 sodium phosphate buffer). The spectrum of <b>S2</b> alone, excited at 554 nm, has been subtracted from the spectra; it gave a small signal caused by direct excitation of the Cy5 chromophore.</p>
Fixed cell confocal microscopy images.
<p>Left: Cy3 and Cy5 tagged DNA duplex (<b>S1:S2</b>) added to fixed/permeabilised cells and imaged using confocal microscopy. Images A/E represent the Cy3 channel; B/F the Cy5 channel; C/G the bright field channel and D/H an overlay of all the channels. Images A–D are excited with the 543 nm laser. Images E–H are excited with both the 543 and 633 nm lasers. Right: Intracellular fluorescence intensity from images A/B and E/F. Data are expressed as mean ± s.e.m from at least ten cells (p<0.001).</p>
Electroporation and Microinjection Successfully Deliver Single-Stranded and Duplex DNA into Live Cells as Detected by FRET Measurements
<div><p>Förster resonance energy transfer (FRET) technology relies on the close proximity of two compatible fluorophores for energy transfer. Tagged (Cy3 and Cy5) complementary DNA strands forming a stable duplex and a doubly-tagged single strand were shown to demonstrate FRET outside of a cellular environment. FRET was also observed after transfecting these DNA strands into fixed and live cells using methods such as microinjection and electroporation, but not when using lipid-based transfection reagents, unless in the presence of the endosomal acidification inhibitor bafilomycin. Avoiding the endocytosis pathway is essential for efficient delivery of intact DNA probes into cells.</p></div>
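For orientation, the dependence on "close proximity" is usually expressed through the standard Förster relation for transfer efficiency. The symbols below are not used in the abstract and are introduced here only for illustration: E is the transfer efficiency, r the donor-acceptor distance, and R_0 the Förster distance at which E = 0.5 for the dye pair.

```latex
E = \frac{1}{1 + \left( r / R_0 \right)^{6}}
```

The sixth-power dependence means that efficiency falls off steeply once r exceeds R_0, which is consistent with FRET being observed only while the tagged strands remain paired or intact.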
Schematic of Cy3 and Cy5 tagged DNA.
<p>a) Complementary DNA strands are individually tagged with Cy3 and Cy5 fluorophores (<b>S1</b> and <b>S2</b> respectively). When in close enough proximity, the Cy3 can donate energy to Cy5 through FRET. In this case, FRET can only occur when the two complementary strands form a duplex. b) Single-stranded DNA can be tagged at either end with Cy3 and Cy5 (<b>S3</b>). FRET can occur as long as the single strand remains intact.</p>
Chemical transfection confocal microscopy images.
<p>Left: Cy3 and Cy5 tagged DNA duplex (<b>S1:S2</b>) added to cells via chemical transfection using Lipofectamine and imaged using confocal microscopy. Images A/E represent the Cy3 channel; B/F the nuclear stain channel; C/G the Cy5 channel and D/H an overlay of all the channels. Images A–D are excited with a 543 nm laser only. Images E–H are excited with both the 543 and 633 nm lasers. Right: Intracellular fluorescence intensity from images A/C and E/G. Data are expressed as mean ± s.e.m from at least ten cells (p<0.001).</p>