We present a novel nonparametric Bayesian approach based on L\'{e}vy Adaptive
Regression Kernels (LARK) to model spectral data arising from MALDI-TOF (Matrix
Assisted Laser Desorption Ionization Time-of-Flight) mass spectrometry. This
model-based approach provides identification and quantification of proteins
through model parameters that are directly interpretable as the number of
proteins, mass and abundance of proteins and peak resolution, while having the
ability to adapt to unknown smoothness as in wavelet-based methods. Informative
prior distributions on resolution are key to distinguishing true peaks from
background noise and resolving broad peaks into individual peaks for multiple
protein species. Posterior distributions are obtained using a reversible jump
Markov chain Monte Carlo algorithm and provide inference about the number of
peaks (proteins), their masses and abundance. We show through simulation
studies that the procedure has desirable true-positive and false-discovery
rates. Finally, we illustrate the method on five example spectra: a blank
spectrum, a spectrum containing only the matrix (a low-molecular-weight substance
used to embed target proteins), a spectrum with known proteins, and a single
spectrum and an average of ten spectra from an individual lung cancer patient.

Comment: Published at http://dx.doi.org/10.1214/10-AOAS450 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
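
To make the interpretation of the model parameters concrete, the following is a minimal sketch of a kernel expansion of the kind the abstract describes, in which the expected spectrum is a sum of peaks, one per protein. It is not the authors' implementation: the Gaussian peak shape, the function names `gaussian_kernel` and `lark_mean`, and the example values are all assumptions made for illustration, and the sketch omits the prior distributions and the reversible jump sampler entirely.

```python
import math


def gaussian_kernel(t, center, width):
    # Assumed peak shape (hypothetical): a Gaussian bump centred at `center`.
    # `width` plays the role of peak resolution; smaller means a sharper peak.
    return math.exp(-0.5 * ((t - center) / width) ** 2)


def lark_mean(t, peaks):
    # Expected intensity at location t as a sum over peaks.
    # Each peak is a (mass, abundance, width) tuple, so the length of `peaks`
    # corresponds to the number of proteins in the model.
    return sum(
        abundance * gaussian_kernel(t, mass, width)
        for mass, abundance, width in peaks
    )


# Two overlapping peaks (illustrative values only, not data from the paper).
peaks = [(1000.0, 5.0, 2.0), (1004.0, 3.0, 2.0)]

# Evaluate the mean spectrum on a regular grid from 990 to 1020.
grid = [990.0 + 0.5 * i for i in range(61)]
spectrum = [lark_mean(t, peaks) for t in grid]
```

With an empty list of peaks the mean is zero everywhere, which loosely corresponds to the blank-spectrum case. Two peaks whose masses are close relative to their widths merge into one broad peak in `spectrum`, which is the situation in which, according to the abstract, informative priors on resolution help separate individual protein species.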