One important preprocessing step in the analysis of microarray data is
background subtraction. In high-density oligonucleotide arrays it is
recognized as crucial to the overall performance of the analysis pipeline
that converts raw intensities into expression values.
We propose here an algorithm for background estimation based on a model
whose cost function is quadratic in a set of fitting parameters, so that
minimization can be performed with linear algebra. The model incorporates
two effects: 1) correlated intensities between neighboring features on the chip
and 2) sequence-dependent affinities for non-specific hybridization, fitted by
an extended nearest-neighbor model.
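To illustrate why a cost function that is quadratic in the fitting parameters reduces to linear algebra, the following is a minimal sketch, not the implementation described in this paper. It assumes a hypothetical linear model in which the log background intensity of a probe is a weighted sum of one neighbor-intensity term and the counts of the 16 stacked nearest-neighbor base pairs in the probe sequence. The function names (`pair_counts`, `design_matrix`, `fit_background`) and the exact form of the model are assumptions made for illustration only.

```python
import itertools
import numpy as np

BASES = "ACGT"
# The 16 possible stacked nearest-neighbor pairs (AA, AC, ..., TT).
PAIRS = ["".join(p) for p in itertools.product(BASES, repeat=2)]


def pair_counts(seq):
    """Count how often each of the 16 nearest-neighbor pairs occurs in seq."""
    counts = np.zeros(len(PAIRS))
    for a, b in zip(seq, seq[1:]):
        counts[PAIRS.index(a + b)] += 1
    return counts


def design_matrix(seqs, neighbor_log_intensity):
    """Build one row per probe: [neighbor term, 16 pair counts].

    Assumption: both effects enter the log background linearly.
    """
    counts = np.array([pair_counts(s) for s in seqs])
    return np.column_stack([neighbor_log_intensity, counts])


def fit_background(seqs, neighbor_log_intensity, log_intensity):
    """Minimize the quadratic cost ||A @ theta - y||^2 by linear least squares."""
    A = design_matrix(seqs, neighbor_log_intensity)
    theta, *_ = np.linalg.lstsq(A, log_intensity, rcond=None)
    return theta


if __name__ == "__main__":
    # Synthetic data only: random 25-mers and made-up parameters.
    rng = np.random.default_rng(0)
    seqs = ["".join(rng.choice(list(BASES), size=25)) for _ in range(400)]
    neighbor = rng.normal(5.0, 1.0, size=400)
    theta_true = rng.normal(size=1 + len(PAIRS))
    y = design_matrix(seqs, neighbor) @ theta_true
    theta_fit = fit_background(seqs, neighbor, y)
    print("max parameter error:", np.max(np.abs(theta_fit - theta_true)))
```

Because the cost is quadratic in the parameters, the minimum is found in a single least-squares solve with no iterative optimization, which is consistent with the speed claimed for this class of model.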
The algorithm has been tested on 360 GeneChips from publicly available data
of recent expression experiments. The algorithm is fast and accurate. Strong
correlations between the fitted values for different experiments as well as
between the free-energy parameters and their counterparts in aqueous solution
indicate that the model captures a significant part of the underlying physical
chemistry.
Comment: 21 pages, 5 figures