Gene Cluster Statistics with Gene Families

By Narayanan Raghupathy and Dannie Durand


Identifying genomic regions that descended from a common ancestor is important for understanding the function and evolution of genomes. In distantly related genomes, clusters of homologous gene pairs are evidence of candidate homologous regions. Demonstrating the statistical significance of such “gene clusters” is an essential component of comparative genomic analyses. However, there are currently no practical statistical tests for gene clusters that model the influence of the number of homologs in each gene family on cluster significance. In this work, we demonstrate empirically that failure to incorporate gene family size in gene cluster statistics results in overestimation of significance, leading to incorrect conclusions. We further present novel analytical methods for estimating gene cluster significance that take gene family size into account. Our methods do not require complete genome data and are suitable for testing individual clusters found in local regions, such as contigs in an unfinished assembly. We consider pairs of regions drawn from the same genome (paralogous clusters), as well as regions drawn from two different genomes (orthologous clusters).

Topics: Research Articles
Publisher: Oxford University Press
Provided by: PubMed Central
