BACKGROUND: Array-based comparative genome hybridization (aCGH) is a tool for rapid comparison of genomes from different bacterial strains. The purpose of such an analysis is to detect highly divergent or absent genes in a sample strain compared to an index strain. Development of methods for analyzing aCGH data has primarily focused on copy number aberrations in cancer research. In microbial aCGH analyses, genes are typically ranked by log-ratios, and classification into divergent or present is done by choosing a cutoff log-ratio, either manually or from statistics calculated on the log-ratio distribution. As experimental settings vary considerably, it is not possible to develop a classical discriminant or statistical learning approach. METHODS: We introduce a more efficient method for analyzing microbial aCGH data using a finite mixture model and a data rotation scheme. Averaging the posterior probabilities from the model fitted to the log-ratios before and after rotation gives a score for each gene, and we demonstrate the advantages of this score for ranking and detecting divergent genes with improved specificity and sensitivity. RESULTS: The procedure is tested and compared to other approaches on simulated data sets, as well as on four experimental validation data sets for aCGH analysis on fully sequenced strains of Staphylococcus aureus and Streptococcus pneumoniae. CONCLUSION: When tested on simulated data as well as on four different experimental validation data sets from experiments with only fully sequenced strains, our procedure outperforms the standard procedure of using a simple log-ratio cutoff for classification into present and divergent genes.
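The scoring idea in METHODS can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which mixture components are used or how the rotation is defined, so the two-component Gaussian mixture, the EM fitting routine, and every function name below are assumptions made for illustration. The rotated log-ratios are taken as an input rather than computed.

```python
import numpy as np

def _component_densities(x, w, mu, sd):
    """Weighted Gaussian density of each component at each value of x."""
    z = (x[:, None] - mu) / sd
    return w * np.exp(-0.5 * z ** 2) / (sd * np.sqrt(2.0 * np.pi))

def fit_two_gaussians(x, n_iter=200, tol=1e-8):
    """Fit a two-component 1-D Gaussian mixture by EM (assumed model)."""
    x = np.asarray(x, dtype=float)
    # Start one component in the lower tail and one in the upper tail.
    mu = np.percentile(x, [10, 90]).astype(float)
    sd = np.full(2, x.std() + 1e-6)
    w = np.array([0.5, 0.5])
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibility of each component for each gene.
        dens = _component_densities(x, w, mu, sd)
        total = dens.sum(axis=1) + 1e-300
        resp = dens / total[:, None]
        # M-step: update weights, means and standard deviations.
        nk = resp.sum(axis=0) + 1e-12
        w = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sd = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-6
        ll = np.log(total).sum()
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return w, mu, sd

def posterior_divergent(x, w, mu, sd):
    """Posterior probability of the low-mean (assumed divergent) component."""
    x = np.asarray(x, dtype=float)
    dens = _component_densities(x, w, mu, sd)
    k = int(np.argmin(mu))
    return dens[:, k] / (dens.sum(axis=1) + 1e-300)

def averaged_score(logratio, logratio_rotated):
    """Average the posteriors fitted before and after rotation.

    logratio_rotated must be supplied by the caller; the rotation
    itself is not specified in the abstract and is not modelled here.
    """
    scores = []
    for x in (logratio, logratio_rotated):
        params = fit_two_gaussians(x)
        scores.append(posterior_divergent(x, *params))
    return np.mean(scores, axis=0)
```

Genes would then be ranked by this score, with values near 1 suggesting divergence, instead of being classified by a single log-ratio cutoff.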

Aakra, Ågot

Aastveit, Are

Nyquist, Ludvig

Repsilber, Dirk

Snipen, Lars

Ziegler, Andreas

English

PubMed

Springer - Publisher Connector

Detection of divergent genes in microbial aCGH experiments

Directory of Open Access Journals

BMC Bioinformatics

http://doaj.org/search?source=%7B%22query%22%3A%7B%22bool%22%3A%7B%22must%22%3A%5B%7B%22term%22%3A%7B%22id%22%3A%22e81824ae831e4e0ca477e0a5451bc403%22%7D%7D%5D%7D%7D%7D
