Oxford University Research Archive

Adaptive Evolution of Conserved Noncoding Elements in Mammals

Author: Emmanouil T Dermitzakis
International Chicken Genome Sequencing Consortium
International Human Genome Sequencing Consortium
Jonathan K Pritchard
Mouse Genome Sequencing Consortium
Su Yeon Kim
Publication venue: Public Library of Science
Publication date: 01/09/2007

Conserved noncoding elements (CNCs) are an abundant feature of vertebrate genomes. Some CNCs have been shown to act as cis-regulatory modules, but the function of most CNCs remains unclear. To study the evolution of CNCs, we have developed a statistical method called the “shared rates test” to identify CNCs that show significant variation in substitution rates across branches of a phylogenetic tree. We report an application of this method to alignments of 98,910 CNCs from the human, chimpanzee, dog, mouse, and rat genomes. We find that ∼68% of CNCs evolve according to a null model where, for each CNC, a single parameter models the level of constraint acting throughout the phylogeny linking these five species. The remaining ∼32% of CNCs show departures from the basic model including speed-ups and slow-downs on particular branches and occasionally multiple rate changes on different branches. We find that a subset of the significant CNCs have evolved significantly faster than the local neutral rate on a particular branch, providing strong evidence for adaptive evolution in these CNCs. The distribution of these signals on the phylogeny suggests that adaptive evolution of CNCs occurs in occasional short bursts of evolution. Our analyses suggest a large set of promising targets for future functional studies of adaptation

Nutrigenetics and personalised/stratified approaches to the provision of dietary advice

Author: AM Minihane
ES Lander
International Human Genome Sequencing Consortium
Publication venue: Springer Science and Business Media LLC
Publication date: 06/06/2014

University of East Anglia digital repository

A Phylogenomic Study of Human, Dog, and Mouse

Author: Adrian Schneider
Gaston Gonnet
Gina Cannarozzi
International Human Genome Consortium
Mouse Genome Sequencing Consortium
Philip E Bourne
Publication venue: Public Library of Science
Publication date: 01/01/2007

In recent years the phylogenetic relationship of mammalian orders has been addressed in a number of molecular studies. These analyses have frequently yielded inconsistent results with respect to some basal ordinal relationships. For example, the relative placement of primates, rodents, and carnivores has differed in various studies. Here, we attempt to resolve this phylogenetic problem by using data from completely sequenced nuclear genomes to base the analyses on the largest possible amount of data. To minimize the risk of reconstruction artifacts, the trees were reconstructed under different criteria—distance, parsimony, and likelihood. For the distance trees, distance metrics that measure independent phenomena (amino acid replacement, synonymous substitution, and gene reordering) were used, as it is highly improbable that all of the trees would be affected the same way by any reconstruction artifact. In contradiction to the currently favored classification, our results based on full-genome analysis of the phylogenetic relationship between human, dog, and mouse yielded overwhelming support for a primate–carnivore clade with the exclusion of rodents

Repository for Publications and Research Data

Transition-Transversion Bias Is Not Universal: A Counter Example from Grasshopper Pseudogenes

Author: David L Stern
Douda Bensasson
International Human Genome Sequencing Consortium
Irene Keller
Richard A Nichols
Publication venue: Public Library of Science
Publication date: 01/01/2007

Comparisons of the DNA sequences of metazoa show an excess of transitional over transversional substitutions. Part of this bias is due to the relatively high rate of mutation of methylated cytosines to thymine. Postmutation processes also introduce a bias, particularly selection for codon-usage bias in coding regions. It is generally assumed, however, that there is a universal bias in favour of transitions over transversions, possibly as a result of the underlying chemistry of mutation. Surprisingly, this underlying trend has been evaluated only in two types of metazoan, namely Drosophila and the Mammalia. Here, we investigate a third group, and find no such bias. We characterize the point substitution spectrum in Podisma pedestris, a grasshopper species with a very large genome. The accumulation of mutations was surveyed in two pseudogene families, nuclear mitochondrial and ribosomal DNA sequences. The cytosine-guanine (CpG) dinucleotides exhibit the high transition frequencies expected of methylated sites. The transition rate at other cytosine residues is significantly lower. After accounting for this methylation effect, there is no significant difference between transition and transversion rates. These results contrast with reports from other taxa and lead us to reject the hypothesis of a universal transition/transversion bias. Instead we suggest fundamental interspecific differences in point substitution processes
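The transition/transversion distinction this abstract relies on can be illustrated with a minimal Python sketch. It assumes two pre-aligned sequences of equal length and simply classifies each mismatch; the function names are hypothetical, and the sketch does not reproduce the authors' analysis (in particular, it makes no correction for CpG methylation effects).

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify_substitution(a, b):
    """Classify one aligned base pair.

    Returns 'transition' (purine<->purine or pyrimidine<->pyrimidine),
    'transversion' (purine<->pyrimidine), or None when the bases are
    identical or either character is not A, C, G, or T (e.g. a gap).
    """
    a, b = a.upper(), b.upper()
    if a == b or a not in BASES or b not in BASES:
        return None
    same_class = (a in PURINES) == (b in PURINES)
    return "transition" if same_class else "transversion"


def count_substitutions(seq1, seq2):
    """Count transitions and transversions between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    counts = {"transition": 0, "transversion": 0}
    for a, b in zip(seq1, seq2):
        kind = classify_substitution(a, b)
        if kind is not None:
            counts[kind] += 1
    return counts


if __name__ == "__main__":
    # Toy alignment: A/G is a transition; G/T and A/C are transversions.
    print(count_substitutions("ACGT-A", "GCTTAC"))
```

Because there are twice as many possible transversions as transitions, equal per-path rates would give a raw transition:transversion count ratio near 1:2, which is worth keeping in mind when reading raw counts like these.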

CiteSeerX

The University of Manchester - Institutional Repository

Queen Mary Research Online

Bias of Selection on Human Copy-Number Variants

Author: Barbara Trask
Caleb Webber
Chimpanzee Sequencing and Analysis Consortium
Chris P Ponting
Duc-Quang Nguyen
International Human Genome Sequencing Consortium
Publication venue: Public Library of Science
Publication date: 01/01/2005

Although large-scale copy-number variation is an important contributor to conspecific genomic diversity, whether these variants frequently contribute to human phenotype differences remains unknown. If they have few functional consequences, then copy-number variants (CNVs) might be expected both to be distributed uniformly throughout the human genome and to encode genes that are characteristic of the genome as a whole. We find that human CNVs are significantly overrepresented close to telomeres and centromeres and in simple tandem repeat sequences. Additionally, human CNVs were observed to be unusually enriched in those protein-coding genes that have experienced significantly elevated synonymous and nonsynonymous nucleotide substitution rates, estimated between single human and mouse orthologues. CNV genes encode disproportionately large numbers of secreted, olfactory, and immunity proteins, although they contain fewer than expected genes associated with Mendelian disease. Despite mouse CNVs also exhibiting a significant elevation in synonymous substitution rates, in most other respects they do not differ significantly from the genomic background. Nevertheless, they encode proteins that are depleted in olfactory function, and they exhibit significantly decreased amino acid sequence divergence. Natural selection appears to have acted discriminately among human CNV genes. The significant overabundance, within human CNVs, of genes associated with olfaction, immunity, protein secretion, and elevated coding sequence divergence, indicates that a subset may have been retained in the human population due to the adaptive benefit of increased gene dosage. By contrast, the functional characteristics of mouse CNVs either suggest that advantageous gene copies have been depleted during recent selective breeding of laboratory mouse strains or suggest that they were preferentially fixed as a consequence of the larger effective population size of wild mice. 
It thus appears that CNV differences among mouse strains do not provide an appropriate model for large-scale sequence variations in the human population.

CiteSeerX

Online Research @ Cardiff

Oxford University Research Archive

A Macaque's-Eye View of Human Insertions and Deletions: Differences in Mechanisms

Author: Erika M Kvikstad
Francesca Chiaromonte
International Human Genome Sequencing Consortium
Kateryna D Makova
Rhesus Macaque Genome Sequencing and Analysis Consortium
Svitlana Tyekucheva
The Chimpanzee Sequencing and Analysis Consortium
Wen-Hsiung Li
Publication venue: Public Library of Science
Publication date: 01/01/2007

Insertions and deletions (indels) cause numerous genetic diseases and lead to pronounced evolutionary differences among genomes. The macaque sequences provide an opportunity to gain insights into the mechanisms generating these mutations on a genome-wide scale by establishing the polarity of indels occurring in the human lineage since its divergence from the chimpanzee. Here we apply novel regression techniques and multiscale analyses to demonstrate an extensive regional indel rate variation stemming from local fluctuations in divergence, GC content, male and female recombination rates, proximity to telomeres, and other genomic factors. We find that both replication and, surprisingly, recombination are significantly associated with the occurrence of small indels. Intriguingly, the relative inputs of replication versus recombination differ between insertions and deletions, thus the two types of mutations are likely guided in part by distinct mechanisms. Namely, insertions are more strongly associated with factors linked to recombination, while deletions are mostly associated with replication-related features. Indel as a term misleadingly groups the two types of mutations together by their effect on a sequence alignment. However, here we establish that the correct identification of a small gap as an insertion or a deletion (by use of an outgroup) is crucial to determining its mechanism of origin. In addition to providing novel insights into insertion and deletion mutagenesis, these results will assist in gap penalty modeling and eventually lead to more reliable genomic alignments

Archivio della ricerca della Scuola Superiore Sant'Anna

SSRD: Simple Sequence Repeats Database of the Human Genome

Author: Albanese
Dib
Gur-Arie
Hui
International Human Genome Sequencing Consortium
Jurka
Kashi
Majewski
Pearson
Primmer
Sinden
Slatkin
Subramanian
Thangaraj
Toth
Wren
Publication venue: Hindawi Publishing Corporation
Publication date: 01/01/2003

Simple sequence repeats are predominantly found in most organisms. They play a major role in studies of genetic diversity, and are useful as diagnostic markers for many diseases. The simple sequence repeats database (SSRD) for the human genome was created for easy access to such repeats, for analysis, and to be used to understand their biological significance. The data includes the abundance and distribution of SSRs in the coding and non-coding regions of the genome, as well as their association with the UTRs of genes. The exact locations of repeats with respect to genomic regions (such as UTRs, exons, introns or intergenic regions) and their association with STS markers are also highlighted. The resource will facilitate repeat sequence analysis in the human genome and the understanding of the functional and evolutionary significance of simple sequence repeats. SSRD is available through two websites, http://www.ccmb.res.in/ssr and http://www.ingenovis.com/ssr
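As a rough illustration of what locating simple sequence repeats involves, the sketch below scans a DNA string for tandem repeats with a regular expression. The function name, the unit-length range (1 to 6 bases), and the minimum repeat count (4) are assumptions chosen for the example; the abstract does not state the criteria SSRD uses.

```python
import re


def find_ssrs(sequence, min_unit=1, max_unit=6, min_repeats=4):
    """Find tandem repeats of short units in a DNA sequence.

    Returns a list of (start, end, unit, repeat_count) tuples, with
    0-based start and exclusive end. All thresholds are illustrative.
    """
    # Lazy quantifier: prefer the shortest repeating unit at each position.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        repeat_count = len(match.group(0)) // len(unit)
        hits.append((match.start(), match.end(), unit, repeat_count))
    return hits


if __name__ == "__main__":
    # The AC dinucleotide is repeated four times starting at index 3.
    print(find_ssrs("TTGACACACACGG"))
```

Mapping hits like these to exons, introns, UTRs, or intergenic regions would additionally require gene annotation coordinates, which this sketch does not attempt.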

Intronic Alternative Splicing Regulators Identified by Comparative Genomics in Nematodes

Author: Alan M Zahler
Gary Stormo
Hiram Clawson
International Human Genome Sequencing Consortium
Jennifer L Kabat
Paul McKenna
Sergio Barberan-Soler
Tracy Farrer
Publication venue: Public Library of Science
Publication date: 01/01/2005

Many alternative splicing events are regulated by pentameric and hexameric intronic sequences that serve as binding sites for splicing regulatory factors. We hypothesized that intronic elements that regulate alternative splicing are under selective pressure for evolutionary conservation. Using a Wobble Aware Bulk Aligner genomic alignment of Caenorhabditis elegans and Caenorhabditis briggsae, we identified 147 alternatively spliced cassette exons that exhibit short regions of high nucleotide conservation in the introns flanking the alternative exon. In vivo experiments on the alternatively spliced let-2 gene confirm that these conserved regions can be important for alternative splicing regulation. Conserved intronic element sequences were collected into a dataset and the occurrence of each pentamer and hexamer motif was counted. We compared the frequency of pentamers and hexamers in the conserved intronic elements to a dataset of all C. elegans intron sequences in order to identify short intronic motifs that are more likely to be associated with alternative splicing. High-scoring motifs were examined for upstream or downstream preferences in introns surrounding alternative exons. Many of the high-scoring nematode pentamer and hexamer motifs correspond to known mammalian splicing regulatory sequences, such as (T)GCATG, indicating that the mechanism of alternative splicing regulation is well conserved in metazoans. A comparison of the analysis of the conserved intronic elements, and analysis of the entire introns flanking these same exons, reveals that focusing on intronic conservation can increase the sensitivity of detecting putative splicing regulatory motifs. This approach also identified novel sequences whose role in splicing is under investigation and has allowed us to take a step forward in defining a catalog of splicing regulatory elements for an organism. In vivo experiments confirm that one novel high-scoring sequence from our analysis, (T)CTATC, is important for alternative splicing regulation of the unc-52 gene.
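The motif-counting step described in this abstract (counting pentamers and hexamers in conserved intronic elements and comparing their frequency with all intron sequences) can be sketched as follows. The function names and the pseudocount-smoothed log2 ratio are assumptions made for illustration; the abstract does not specify the scoring statistic the authors used.

```python
import math
from collections import Counter


def count_kmers(sequences, k):
    """Count every overlapping k-mer made only of A, C, G, T."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def log2_enrichment(foreground, background, k, pseudocount=1.0):
    """Score each foreground k-mer by log2(foreground freq / background freq).

    The pseudocount (an assumption of this sketch) avoids division by zero
    for k-mers that are absent from the background set.
    """
    fg = count_kmers(foreground, k)
    bg = count_kmers(background, k)
    n_possible = 4 ** k
    fg_total = sum(fg.values()) + pseudocount * n_possible
    bg_total = sum(bg.values()) + pseudocount * n_possible
    scores = {}
    for kmer, n in fg.items():
        fg_freq = (n + pseudocount) / fg_total
        bg_freq = (bg[kmer] + pseudocount) / bg_total
        scores[kmer] = math.log2(fg_freq / bg_freq)
    return scores


if __name__ == "__main__":
    conserved = ["TTGCATGTT", "AAGCATGAA"]    # toy conserved elements
    all_introns = ["ATATATATATATAT", "TTTTTTTTTTTT"]  # toy background
    scores = log2_enrichment(conserved, all_introns, k=5)
    for kmer, score in sorted(scores.items(), key=lambda kv: -kv[1])[:3]:
        print(kmer, round(score, 3))
```

Running the same function with `k=6` covers hexamers. A real analysis would also need a significance test and a correction for the number of motifs examined, which this sketch omits.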

Repository for Publications and Research Data

Penalized likelihood for sparse contingency tables with an application to full-length cDNA libraries

Author: A Mironov
BS Everitt
C Southan
Corinne Dahinden
D Brett
F Liang
Giovanni Parmigiani
International Human Genome Sequencing Consortium
M Yuan
M Zavolan
Mark C Emerick
MR Regan
Peter Bühlmann
R Christensen
R Tibshirani
S Rosset
SL Lauritzen
T Imanishi
The FANTOM Consortium
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: The joint analysis of several categorical variables is a common task in many areas of biology, and is becoming central to systems biology investigations whose goal is to identify potentially complex interaction among variables belonging to a network. Interactions of arbitrary complexity are traditionally modeled in statistics by log-linear models. It is challenging to extend these to the high dimensional and potentially sparse data arising in computational biology. An important example, which provides the motivation for this article, is the analysis of so-called full-length cDNA libraries of alternatively spliced genes, where we investigate relationships among the presence of various exons in transcript species. Results: We develop methods to perform model selection and parameter estimation in log-linear models for the analysis of sparse contingency tables, to study the interaction of two or more factors. Maximum Likelihood estimation of log-linear model coefficients might not be appropriate because of the presence of zeros in the table's cells, and new methods are required. We propose a computationally efficient ℓ1-penalization approach extending the Lasso algorithm to this context, and compare it to other procedures in a simulation study. We then illustrate these algorithms on contingency tables arising from full-length cDNA libraries. Conclusion: We propose regularization methods that can be used successfully to detect complex interaction patterns among categorical variables in a broad range of biological problems involving categorical variables.

Springer - Publisher Connector