Genomic copy number variation (CNV) is a major source of variation between organisms, and its consequences include phenotypic differences and genetic disorders. CNVs are commonly detected by hybridizing genomic DNA to microarrays of nucleic acid probes. System noise caused by variability in operations and probe performance complicates the interpretation of these data. To minimize the distortion of genetic signal by system noise, we have characterized this noise in an archive of hybridizations in which no genetic signal is expected. The archive consists of 'self-self' data, obtained by comparative genomic hybridization (CGH) of a sample in one channel to the same sample in the other channel. These self-self hybridizations capture a variety of system noise inherent in sample-reference (test) data. Through singular value decomposition (SVD) of self-self data, we have determined the principal components of system noise. Assuming simple linear models of noise generation, the linear correction of test data with self-self data, or 'system normalization', reduces local and long-range correlations and improves signal-to-noise metrics, yet does not introduce detectable spurious signal. Using this method, 90% of hybridizations displayed improved signal-to-noise ratios, with an average increase of 7.0%, due mainly to a reduced median absolute deviation (MAD). In addition, we have found that principal component loadings correlate with specific probe variables, including array coordinates, base composition, and proximity to the 5' ends of genes. The correlation of the principal component loadings with the test data depends on operational variables, such as the temporal order of processing and the localization of individual samples within 96-well plates.
Comment: 10 figures; 3 tables
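The linear correction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`noise_basis`, `system_normalize`, `mad`), the number of retained components, the column centering, and the synthetic data are all assumptions made for illustration. The sketch takes the leading left singular vectors of a self-self matrix as probe-level noise patterns and subtracts from a test profile its least-squares projection onto them.

```python
import numpy as np


def noise_basis(self_self, n_components):
    """Return the leading probe-level noise patterns of a self-self archive.

    self_self: array of shape (n_probes, n_hybridizations) of log ratios.
    The column centering below is an assumption of this sketch.
    """
    centered = self_self - self_self.mean(axis=0, keepdims=True)
    # Left singular vectors span the probe-space patterns shared across arrays.
    u, _, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components]  # orthonormal columns


def system_normalize(test, basis):
    """Subtract the least-squares projection of `test` onto `basis`."""
    # Because the basis columns are orthonormal, the least-squares
    # coefficients are simply the inner products with the test profile.
    coeffs = basis.T @ test
    return test - basis @ coeffs, coeffs


def mad(x):
    """Median absolute deviation from the median."""
    return np.median(np.abs(x - np.median(x)))


# Synthetic example (hypothetical data): two hidden noise patterns over 500 probes.
rng = np.random.default_rng(0)
n_probes, n_self = 500, 20
patterns = rng.normal(size=(n_probes, 2))

# Self-self archive: mixtures of the noise patterns plus a little residual noise.
self_self = patterns @ rng.normal(size=(2, n_self))
self_self += 0.01 * rng.normal(size=(n_probes, n_self))

# Test profile: the same system noise with its own weights, plus residual noise.
test_profile = patterns @ np.array([1.0, -0.5]) + 0.05 * rng.normal(size=n_probes)

basis = noise_basis(self_self, n_components=2)
corrected, coeffs = system_normalize(test_profile, basis)

print("MAD before:", mad(test_profile))
print("MAD after: ", mad(corrected))
```

In this toy setting the test profile contains no genetic signal, so the drop in MAD reflects noise removal only. Whether real copy number signal survives the correction depends on the data and on how many components are retained, which the sketch does not address.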