We develop a novel peak detection algorithm for the analysis of comprehensive
two-dimensional gas chromatography time-of-flight mass spectrometry
(GC×GC-TOF MS) data using normal-exponential-Bernoulli (NEB) and mixture
probability models. The algorithm first performs baseline correction and
denoising simultaneously using the NEB model, which also defines peak regions.
Peaks are then picked using a mixture of probability distributions to handle
co-eluting peaks. Peak merging is further carried out based on the mass
spectral similarities among the peaks within the same peak group. The algorithm
is evaluated using experimental data to study the effect of different cutoffs
of the conditional Bayes factors and the effect of different mixture models
including Poisson, truncated Gaussian, Gaussian, Gamma and exponentially
modified Gaussian (EMG) distributions, and the best-performing version is
selected by trial and error. We then compare the new algorithm with two
existing algorithms in terms of compound identification. Data analysis shows
that the developed algorithm can detect the peaks with lower false discovery
rates than the existing algorithms, and that a less complicated peak-picking model
is a promising alternative to the more complicated and widely used EMG mixture
models.

Comment: Published at http://dx.doi.org/10.1214/14-AOAS731 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
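
The abstract contrasts EMG mixtures with simpler peak-picking models. As a rough illustration only, the following Python sketch writes out the standard Gaussian and EMG densities and a generic finite mixture of peak shapes. The function names, the parameterization (mean `mu`, standard deviation `sigma`, exponential rate `lam`), and the example values are assumptions made for this sketch; they are not taken from the paper and do not reproduce its fitting procedure.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of a Gaussian peak with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def emg_pdf(x, mu, sigma, lam):
    """Density of an exponentially modified Gaussian (EMG).

    The EMG is the distribution of a Gaussian(mu, sigma) variable plus an
    independent exponential variable with rate lam, which yields a peak
    with a right-hand tail. This direct formula can overflow far to the
    left of the peak; it is meant for illustration, not production use.
    """
    arg = 0.5 * lam * (2.0 * mu + lam * sigma * sigma - 2.0 * x)
    tail = math.erfc((mu + lam * sigma * sigma - x) / (math.sqrt(2.0) * sigma))
    return 0.5 * lam * math.exp(arg) * tail


def mixture_pdf(x, weights, components):
    """Weighted sum of component densities; weights are assumed to sum to 1."""
    return sum(w * pdf(x) for w, pdf in zip(weights, components))


# Hypothetical example: two co-eluting peaks, one symmetric and one tailed.
components = [
    lambda x: gaussian_pdf(x, 10.0, 1.0),
    lambda x: emg_pdf(x, 13.0, 1.0, 0.5),
]
weights = [0.6, 0.4]
density_at_12 = mixture_pdf(12.0, weights, components)
```

The Gaussian component needs two parameters per peak and the EMG three, which is one sense in which a Gaussian-type mixture is the less complicated model.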