207 research outputs found

    Species Choice for Comparative Genomics: Being Greedy Works

    Several projects investigating genetic function and evolution through sequencing and comparison of multiple genomes are now underway. These projects consume many resources, and appropriate planning should be devoted to choosing which species to sequence, potentially involving cooperation among different sequencing centres. A widely discussed criterion for species choice is the maximisation of evolutionary divergence. Our mathematical formalization of this problem surprisingly shows that the best long-term cooperative strategy coincides with the seemingly short-term “greedy” strategy of always choosing the next best single species. Other criteria influencing species choice, such as medical relevance or sequencing costs, can also be accommodated in our approach, suggesting our results' broad relevance in scientific policy decisions
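The greedy strategy described in this abstract can be illustrated with a minimal sketch. The abstract does not give the formal definition of divergence, so the sketch assumes a rooted tree and measures divergence as the total branch length covered by the paths from the root to the chosen species. The tree encoding, the function names, and the toy branch lengths are hypothetical, chosen only for illustration.

```python
def path_to_root(parent, node):
    """Return the edges (named by their child node) from `node` up to the root."""
    edges = set()
    while node in parent:          # the root has no entry in `parent`
        edges.add(node)
        node = parent[node][0]
    return edges


def greedy_species_choice(parent, leaves, k):
    """Pick up to k species, each time adding the one that covers the most
    branch length not already covered by earlier choices.

    `parent` maps node -> (parent_node, branch_length).
    Returns (chosen species in order, total covered branch length).
    """
    covered = set()
    chosen = []
    remaining = set(leaves)
    for _ in range(min(k, len(remaining))):
        def gain(leaf):
            # Branch length this leaf would add to what is already covered.
            return sum(parent[e][1] for e in path_to_root(parent, leaf) - covered)

        # sorted() makes tie-breaking deterministic (alphabetical).
        best = max(sorted(remaining), key=gain)
        chosen.append(best)
        covered |= path_to_root(parent, best)
        remaining.remove(best)
    total = sum(parent[e][1] for e in covered)
    return chosen, total


# Hypothetical toy tree: root -> X (1.0) -> human (1.0), mouse (2.0); root -> fish (5.0)
toy_tree = {
    "X": ("root", 1.0),
    "human": ("X", 1.0),
    "mouse": ("X", 2.0),
    "fish": ("root", 5.0),
}
print(greedy_species_choice(toy_tree, ["human", "mouse", "fish"], 2))
```

On this toy tree the first pick is the species on the longest path from the root, and the second pick is whichever remaining species adds the most uncovered branch length.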

    Genome-Wide Identification of Human Functional DNA Using a Neutral Indel Model

    It has become clear that a large proportion of functional DNA in the human genome does not code for protein. Identification of this non-coding functional sequence using comparative approaches is proving difficult and has previously been thought to require deep sequencing of multiple vertebrates. Here we introduce a new model and comparative method that, instead of nucleotide substitutions, uses the evolutionary imprint of insertions and deletions (indels) to infer the past consequences of selection. The model predicts the distribution of indels under neutrality, and shows an excellent fit to human–mouse ancestral repeat data. Across the genome, many unusually long ungapped regions are detected that are unaccounted for by the neutral model, and which we predict to be highly enriched in functional DNA that has been subject to purifying selection with respect to indels. We use the model to determine the proportion under indel-purifying selection to be between 2.56% and 3.25% of human euchromatin. Since annotated protein-coding genes comprise only 1.2% of euchromatin, these results lend further weight to the proposition that more than half the functional complement of the human genome is non-protein-coding. The method is surprisingly powerful at identifying selected sequence using only two or three mammalian genomes. Applying the method to the human, mouse, and dog genomes, we identify 90 Mb of human sequence under indel-purifying selection, at a predicted 10% false-discovery rate and 75% sensitivity. As expected, most of the identified sequence represents unannotated material, while the recovered proportions of known protein-coding and microRNA genes closely match the predicted sensitivity of the method. The method's high sensitivity to functional sequence such as microRNAs suggests that as yet unannotated microRNA genes are enriched among the sequences identified. Furthermore, its independence of substitutions allowed us to identify sequence that has been subject to heterogeneous selection, that is, sequence subject to both positive selection with respect to substitutions and purifying selection with respect to indels. The ability to identify elements under heterogeneous selection enables, for the first time, the genome-wide investigation of positive selection on functional elements other than protein-coding genes

    Adaptive Evolution of Conserved Noncoding Elements in Mammals

    Conserved noncoding elements (CNCs) are an abundant feature of vertebrate genomes. Some CNCs have been shown to act as cis-regulatory modules, but the function of most CNCs remains unclear. To study the evolution of CNCs, we have developed a statistical method called the “shared rates test” to identify CNCs that show significant variation in substitution rates across branches of a phylogenetic tree. We report an application of this method to alignments of 98,910 CNCs from the human, chimpanzee, dog, mouse, and rat genomes. We find that ∼68% of CNCs evolve according to a null model where, for each CNC, a single parameter models the level of constraint acting throughout the phylogeny linking these five species. The remaining ∼32% of CNCs show departures from the basic model including speed-ups and slow-downs on particular branches and occasionally multiple rate changes on different branches. We find that a subset of the significant CNCs have evolved significantly faster than the local neutral rate on a particular branch, providing strong evidence for adaptive evolution in these CNCs. The distribution of these signals on the phylogeny suggests that adaptive evolution of CNCs occurs in occasional short bursts of evolution. Our analyses suggest a large set of promising targets for future functional studies of adaptation

    A Phylogenomic Study of Human, Dog, and Mouse

    In recent years the phylogenetic relationship of mammalian orders has been addressed in a number of molecular studies. These analyses have frequently yielded inconsistent results with respect to some basal ordinal relationships. For example, the relative placement of primates, rodents, and carnivores has differed in various studies. Here, we attempt to resolve this phylogenetic problem by using data from completely sequenced nuclear genomes to base the analyses on the largest possible amount of data. To minimize the risk of reconstruction artifacts, the trees were reconstructed under different criteria—distance, parsimony, and likelihood. For the distance trees, distance metrics that measure independent phenomena (amino acid replacement, synonymous substitution, and gene reordering) were used, as it is highly improbable that all of the trees would be affected the same way by any reconstruction artifact. In contradiction to the currently favored classification, our results based on full-genome analysis of the phylogenetic relationship between human, dog, and mouse yielded overwhelming support for a primate–carnivore clade with the exclusion of rodents

    Transcription Factor Map Alignment of Promoter Regions

    We address the problem of comparing and characterizing the promoter regions of genes with similar expression patterns. This remains a challenging problem in sequence analysis, because often the promoter regions of co-expressed genes do not show discernible sequence conservation. In our approach, thus, we have not directly compared the nucleotide sequence of promoters. Instead, we have obtained predictions of transcription factor binding sites, annotated the predicted sites with the labels of the corresponding binding factors, and aligned the resulting sequences of labels—to which we refer here as transcription factor maps (TF-maps). To obtain the global pairwise alignment of two TF-maps, we have adapted an algorithm initially developed to align restriction enzyme maps. We have optimized the parameters of the algorithm in a small, but well-curated, collection of human–mouse orthologous gene pairs. Results in this dataset, as well as in an independent much larger dataset from the CISRED database, indicate that TF-map alignments are able to uncover conserved regulatory elements, which cannot be detected by the typical sequence alignments

    On the Origin and Evolution of Vertebrate Olfactory Receptor Genes: Comparative Genome Analysis Among 23 Chordate Species

    Olfaction is a primitive sense in organisms. Both vertebrates and insects have receptors for detecting odor molecules in the environment, but the evolutionary origins of these genes are different. Among studied vertebrates, mammals have ∼1,000 olfactory receptor (OR) genes, whereas teleost fishes have much smaller (∼100) numbers of OR genes. To investigate the origin and evolution of vertebrate OR genes, I attempted to determine near-complete OR gene repertoires by searching whole-genome sequences of 14 nonmammalian chordates, including cephalochordates (amphioxus), urochordates (ascidian and larvacean), and vertebrates (sea lamprey, elephant shark, five teleost fishes, frog, lizard, and chicken), followed by a large-scale phylogenetic analysis in conjunction with mammalian OR genes identified from nine species. This analysis showed that the amphioxus has >30 vertebrate-type OR genes though it lacks distinctive olfactory organs, whereas all OR genes appear to have been lost in the urochordate lineage. Some groups of genes (θ, κ, and λ) that are phylogenetically nested within vertebrate OR genes showed few gene gains and losses, which is in sharp contrast to the evolutionary pattern of OR genes, suggesting that they are actually non-OR genes. Moreover, the analysis demonstrated a great difference in OR gene repertoires between aquatic and terrestrial vertebrates, reflecting the necessity for the detection of water-soluble and airborne odorants, respectively. However, a minor group (β) of genes that are atypically present in both aquatic and terrestrial vertebrates was also found. These findings should provide a critical foundation for further physiological, behavioral, and evolutionary studies of olfaction in various organisms

    Comparative studies of glycosylphosphatidylinositol-anchored high-density lipoprotein-binding protein 1: evidence for a eutherian mammalian origin for the GPIHBP1 gene from an LY6-like gene

    Glycosylphosphatidylinositol-anchored high-density lipoprotein-binding protein 1 (GPIHBP1) functions as a platform and transport agent for lipoprotein lipase (LPL) which functions in the hydrolysis of chylomicrons, principally in heart, skeletal muscle and adipose tissue capillary endothelial cells. Previous reports of genetic deficiency for this protein have described severe chylomicronemia. Comparative GPIHBP1 amino acid sequences and structures and GPIHBP1 gene locations were examined using data from several mammalian genome projects. Mammalian GPIHBP1 genes usually contain four coding exons on the positive strand. Mammalian GPIHBP1 sequences shared 41–96% identities as compared with 9–32% sequence identities with other LY6-domain-containing human proteins (LY6-like). The human N-glycosylation site was predominantly conserved among other mammalian GPIHBP1 proteins except cow, dog and pig. Sequence alignments, key amino acid residues and conserved predicted secondary structures were also examined, including the N-terminal signal peptide, the acidic amino acid sequence region which binds LPL, the glycosylphosphatidylinositol linkage group, the Ly6 domain and the C-terminal α-helix. Comparative and phylogenetic studies of mammalian GPIHBP1 suggested that it originated in eutherian mammals from a gene duplication event of an ancestral LY6-like gene and subsequent integration of exon 2, which may have been derived from BCL11A (B-cell CLL/lymphoma 11A gene) encoding an extended acidic amino acid sequence

    How repetitive are genomes?

    BACKGROUND: Genome sequences vary strongly in their repetitiveness and the causes for this are still debated. Here we propose a novel measure of genome repetitiveness, the index of repetitiveness, I(r), which can be computed in time proportional to the length of the sequences analyzed. We apply it to 336 genomes from all three domains of life. RESULTS: The expected value of I(r) is zero for random sequences of any G/C content and greater than zero for sequences with excess repeats. We find that the I(r) of archaea is significantly smaller than that of eubacteria, which in turn is smaller than that of eukaryotes. Mouse chromosomes have a significantly higher I(r) than human chromosomes and within each genome the Y chromosome is most repetitive. A sliding window analysis reveals that the human HOXA cluster and two surrounding genes are characterized by local minima in I(r). A program for calculating the I(r) is freely available at . CONCLUSION: The general measure of DNA repetitiveness proposed in this paper can be efficiently computed on a genomic scale. This reveals a broad spectrum of repetitiveness among diverse genomes which agrees qualitatively with previous studies of repeat content. A sliding window analysis helps to analyze the intragenomic distribution of repeats

    Mouse versus Rat: Profound Differences in Meiotic Regulation at the Level of the Isolated Oocyte

    Cumulus cell-enclosed oocytes (CEO), denuded oocytes (DO), or dissected follicles were obtained 44–48 hr after priming immature mice (20–23 days old) with 5 IU or immature rats (25–27 days old) with 12.5 IU of equine chorionic gonadotropin, and exposed to a variety of culture conditions. Mouse oocytes were more effectively maintained in meiotic arrest by hypoxanthine, dbcAMP, IBMX, milrinone, and 8-Br-cGMP. Atrial natriuretic peptide, a guanylate cyclase activator, suppressed maturation in CEO from both species, but mycophenolic acid reversed IBMX-maintained meiotic arrest in mouse CEO with little activity in rat CEO. IBMX-arrested mouse, but not rat, CEO were induced to undergo germinal vesicle breakdown (GVB) by follicle-stimulating hormone (FSH) and amphiregulin, while human chorionic gonadotropin (hCG) was ineffective in both species. Nevertheless, FSH and amphiregulin stimulated cumulus expansion in both species. FSH and hCG were both effective inducers of GVB in cultured mouse and rat follicles while amphiregulin was stimulatory only in mouse follicles. Changing the culture medium or altering macromolecular supplementation had no effect on FSH-induced maturation in rat CEO. The AMP-activated protein kinase (AMPK) activator, AICAR, was a potent stimulator of maturation in mouse CEO and DO, but only marginally stimulatory in rat CEO and ineffective in rat DO. The AMPK inhibitor, compound C, blocked meiotic induction more effectively in hCG-treated mouse follicles and heat-treated mouse CEO. Both agents produced contrasting results on polar body formation in cultured CEO in the two species. Active AMPK was detected in germinal vesicles of immature mouse, but not rat, oocytes prior to hCG-induced maturation in vivo; it colocalized with chromatin after GVB in rat and mouse oocytes, but did not appear at the spindle poles in rat oocytes as it did in mouse oocytes. Finally, cultured mouse and rat CEO displayed disparate maturation responses to energy substrate manipulation. These data highlight significant differences in meiotic regulation between the two species, and demonstrate a greater potential in mice for control at the level of the cumulus CEO

    Genome-scale relationships between cytosine methylation and dinucleotide abundances in animals

    In mammalian genomes CpGs occur at one-fifth their expected frequency. This is accepted as resulting from cytosine methylation and deamination of 5-methylcytosine leading to TpG and CpA dinucleotides. The corollary that a CpG deficit should correlate with TpG excess has not hitherto been systematically tested at a genomic level. I analyzed genome sequences (human, chimpanzee, mouse, pufferfish, zebrafish, sea squirt, fruitfly, mosquito, and nematode) to do this and generally to assess the hypothesis that CpG deficit, TpG excess, and other data are accountable in terms of 5-methylcytosine mutation. In all methylated genomes local CpG deficit decreases with higher G + C content. Local TpG surplus, while positively associated with G + C level in mammalian genomes but negatively associated with G + C in nonmammalian methylated genomes, is always explicable in terms of the CpG trend under the methylation model. Covariance of dinucleotide abundances with G + C demonstrates that correlation analyses should control for G + C. Doing this reveals a strong negative correlation between local CpG and TpG abundances in methylated genomes, in accord with the methylation hypothesis. CpG deficit also correlates with CpT excess in mammals, which may reflect enhanced cytosine mutation in the context 5′-YCG-3′. Analyses with repeat-masked sequences show that the results are not attributable to repetitive elements
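The notion of a dinucleotide occurring at a fraction of its "expected frequency" can be made concrete with a short sketch. The abstract does not state the exact normalization used, so the sketch assumes the common observed/expected ratio in which the expected count is the product of the two base counts divided by the sequence length. The function name and the example sequence are hypothetical.

```python
def dinucleotide_obs_exp(seq, dinuc):
    """Observed/expected ratio for a dinucleotide such as "CG" or "TG".

    Assumed normalization: expected = count(first) * count(second) / len(seq).
    A ratio below 1 indicates a deficit, above 1 an excess.
    """
    seq = seq.upper()
    n = len(seq)
    first, second = dinuc[0], dinuc[1]
    # Count overlapping occurrences by scanning every adjacent pair.
    observed = sum(1 for i in range(n - 1) if seq[i:i + 2] == dinuc)
    expected = seq.count(first) * seq.count(second) / n if n else 0.0
    return observed / expected if expected else float("nan")


# Hypothetical toy sequence; real analyses would use genomic windows.
example = "ACGTACGT"
print(dinucleotide_obs_exp(example, "CG"))
print(dinucleotide_obs_exp(example, "TG"))
```

Applied to sliding windows along a genome, ratios of this kind for CpG and TpG could then be compared while controlling for G + C content, as the abstract recommends.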