Genomic Selective Constraints in Murid Noncoding DNA
Recent work has suggested that there are many more selectively constrained, functional noncoding than coding sites in mammalian genomes. However, little is known about how selective constraint varies amongst different classes of noncoding DNA. We estimated the magnitude of selective constraint on a large dataset of mouse-rat gene orthologs and their surrounding noncoding DNA. Our analysis indicates that there are more than three times as many selectively constrained, nonrepetitive sites within noncoding DNA as in coding DNA in murids. The majority of these constrained noncoding sites appear to be located within intergenic regions, at distances greater than 5 kilobases from known genes. Our study also shows that in murids, intron length and mean intronic selective constraint are negatively correlated with intron ordinal number. Our results therefore suggest that functional intronic sites tend to accumulate toward the 5' end of murid genes. Our analysis also reveals that mean number of selectively constrained noncoding sites varies substantially with the function of the adjacent gene. We find that, among others, developmental and neuronal genes are associated with the greatest numbers of putatively functional noncoding sites compared with genes involved in electron transport and a variety of metabolic processes. Combining our estimates of the total number of constrained coding and noncoding bases we calculate that over twice as many deleterious mutations have occurred in intergenic regions as in known genic sequence and that the total genomic deleterious point mutation rate is 0.91 per diploid genome, per generation. This estimated rate is over twice as large as a previous estimate in murids
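The abstract above does not spell out how constraint is computed. As a rough illustration only, a common way to express selective constraint is as the fractional shortfall of observed substitutions relative to the number expected under neutrality. The sketch below assumes that definition; the function names and the input numbers are hypothetical and are not taken from the study.

```python
def selective_constraint(observed_subs: float, expected_neutral_subs: float) -> float:
    """Constraint C = 1 - observed/expected (assumed definition, not the paper's exact estimator)."""
    if expected_neutral_subs <= 0:
        raise ValueError("expected_neutral_subs must be positive")
    return 1.0 - observed_subs / expected_neutral_subs


def constrained_site_count(n_sites: int, constraint: float) -> float:
    """Approximate number of constrained sites in a region of n_sites bases."""
    return n_sites * constraint


if __name__ == "__main__":
    # Hypothetical region: 1,000 sites, 70 substitutions seen, 100 expected if neutral.
    c = selective_constraint(70, 100)
    print(round(c, 3), round(constrained_site_count(1000, c), 1))
```

Under this definition, a region evolving at the neutral rate has C near 0, and a region with no substitutions at all has C equal to 1.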
Evolutionary origin and diversification of epidermal barrier proteins in amniotes.
The evolution of amniotes has involved major molecular innovations in the epidermis. In particular, distinct structural proteins that undergo covalent cross-linking during cornification of keratinocytes facilitate the formation of mechanically resilient superficial cell layers and help to limit water loss to the environment. Special modes of cornification generate amniote-specific skin appendages such as claws, feathers, and hair. In mammals, many protein substrates of cornification are encoded by a cluster of genes, termed the epidermal differentiation complex (EDC). To provide a basis for hypotheses about the evolution of cornification proteins, we screened for homologs of the EDC in non-mammalian vertebrates. By comparative genomics, de novo gene prediction and gene expression analyses, we show that, in contrast to fish and amphibians, the chicken and the green anole lizard have EDC homologs comprising genes that are specifically expressed in the epidermis and in skin appendages. Our data suggest that an important component of the cornified protein envelope of mammalian keratinocytes, that is, loricrin, has originated in a common ancestor of modern amniotes, perhaps during the acquisition of a fully terrestrial lifestyle. Moreover, we provide evidence that the sauropsid-specific beta-keratins have evolved as a subclass of EDC genes. Based on the comprehensive characterization of the arrangement, exon-intron structures and conserved sequence elements of EDC genes, we propose new scenarios for the evolutionary origin of epidermal barrier proteins via fusion of neighboring S100A and peptidoglycan recognition protein genes, subsequent loss of exons and highly divergent sequence evolution
Evidence for Pervasive Adaptive Protein Evolution in Wild Mice
The relative contributions of neutral and adaptive substitutions to molecular evolution have been one of the most controversial issues in evolutionary biology for more than 40 years. The analysis of within-species nucleotide polymorphism and between-species divergence data supports a widespread role for adaptive protein evolution in certain taxa. For example, estimates of the proportion of adaptive amino acid substitutions (alpha) are 50% or more in enteric bacteria and Drosophila. In contrast, recent estimates of alpha for hominids have been at most 13%. Here, we estimate alpha for protein sequences of murid rodents based on nucleotide polymorphism data from multiple genes in a population of the house mouse subspecies Mus musculus castaneus, which inhabits the ancestral range of the Mus species complex, and on nucleotide divergence between M. m. castaneus and M. famulus or the rat. We estimate that 57% of amino acid substitutions in murids have been driven by positive selection. Hominids, therefore, are exceptional in having low apparent levels of adaptive protein evolution. The high frequency of adaptive amino acid substitutions in wild mice is consistent with their large effective population size, leading to effective natural selection at the molecular level. Effective natural selection also manifests itself as a paucity of effectively neutral nonsynonymous mutations in M. m. castaneus compared to humans.
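The abstract does not say which estimator of alpha was used. For orientation only, the simplest estimator built from polymorphism and divergence counts is the McDonald-Kreitman style formula alpha = 1 - (Ds * Pn) / (Dn * Ps). The sketch below implements that simple formula, which may differ from the authors' method; the function name and the counts are hypothetical.

```python
def alpha_mk(dn: float, ds: float, pn: float, ps: float) -> float:
    """Simple McDonald-Kreitman style estimate of the adaptive fraction.

    dn, ds: nonsynonymous and synonymous divergence counts (between species)
    pn, ps: nonsynonymous and synonymous polymorphism counts (within species)
    This is an illustrative estimator, not necessarily the one used in the study.
    """
    if dn <= 0 or ps <= 0:
        raise ValueError("dn and ps must be positive")
    return 1.0 - (ds * pn) / (dn * ps)


if __name__ == "__main__":
    # Hypothetical counts chosen only to show the arithmetic.
    print(alpha_mk(dn=100, ds=200, pn=20, ps=100))
```

With these made-up counts the neutral expectation for Dn is Ds * Pn / Ps = 40, so 60 of the 100 nonsynonymous differences are attributed to adaptive substitution.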
A genomic approach to examine the complex evolution of laurasiatherian mammals
Recent phylogenomic studies have failed to conclusively resolve certain branches of the placental mammalian tree, despite the evolutionary analysis of genomic data from 32 species. Previous analyses of single genes and retroposon insertion data yielded support for different phylogenetic scenarios for the most basal divergences. The results indicated that some mammalian divergences were best interpreted not as a single bifurcating tree, but as an evolutionary network. In these studies the relationships among some orders of the super-clade Laurasiatheria were poorly supported, albeit not studied in detail. Therefore, 4775 protein-coding genes (6,196,263 nucleotides) were collected and aligned in order to analyze the evolution of this clade. Additionally, over 200,000 introns were screened in silico, resulting in 32 phylogenetically informative long interspersed nuclear elements (LINE) insertion events.
The present study shows that the genome evolution of Laurasiatheria may best be understood as an evolutionary network. Thus, contrary to the common expectation to resolve major evolutionary events as a bifurcating tree, genome analyses unveil complex speciation processes even in deep mammalian divergences. We exemplify this on a subset of 1159 suitable genes that have individual histories, most likely due to incomplete lineage sorting or introgression, processes that can make the genealogy of mammalian genomes complex.
These unexpected results have major implications for the understanding of evolution in general, because the evolution of even some higher level taxa such as mammalian orders may sometimes not be interpreted as a simple bifurcating pattern
Stability domains of actin genes and genomic evolution
In eukaryotic genes the protein coding sequence is split into several
fragments, the exons, separated by non-coding DNA stretches, the introns.
Prokaryotes do not have introns in their genome. We report the calculations of
stability domains of actin genes for various organisms in the animal, plant and
fungi kingdoms. Actin genes have been chosen because they have been highly
conserved during evolution. In these genes all introns were removed so as to
mimic ancient genes at the time of early eukaryotic development, i.e.
before intron insertion. Common stability boundaries are found in evolutionarily
distant organisms, which implies that these boundaries date from the early
origin of eukaryotes. In general, boundaries correspond to the intron positions
of vertebrate and other animal actins, but much less so for plants and fungi. The
sharpest boundary is found in a locus where fungi, algae and animals have
introns in positions separated by one nucleotide only, which identifies a
hot-spot for insertion. These results suggest that some introns may have been
incorporated into the genomes through a thermodynamically driven mechanism, in
agreement with previous observations on human genes. They also suggest a
different mechanism for intron insertion in plants and animals. Comment: 9 pages, 7 figures. Phys. Rev. E in press
Structural dynamics and divergence of the polygalacturonase gene family in land plants
A distinct feature of eukaryotic genomes is the presence of gene families. The polygalacturonase (PG) (EC3.2.1.15) gene family is one of the largest gene families in plants. PG is a pectin-digesting enzyme with a glycoside hydrolase 28 domain. It is involved in numerous plant developmental processes. The evolutionary processes accounting for the functional divergence and the specialized functions of PGs in land plants are unclear. Here, phylogenetic and gene structure analysis of PG genes in algae and land plants revealed that land plant PG genes resulted from differential intron gain and loss, with the latter event predominating. PG genes in land plants contained 15 homologous intron blocks and 13 novel intron blocks. Intron position and phase were not conserved between PGs of algae and land plants but were conserved among PG genes of land plants from moss to vascular plants, indicating that the current introns in the PGs of land plants appeared after the split between unicellular algae and multicellular land plants. These findings demonstrate that the functional divergence and differentiation of PGs in land plants is attributable to intronic loss. Moreover, they underscore the importance of intron gain and loss in genomic adaptation to selective pressure.
The Alternative Choice of Constitutive Exons throughout Evolution
Alternative cassette exons are known to originate from two processes:
exonization of intronic sequences and exon shuffling. Herein, we suggest an
additional mechanism by which constitutively spliced exons become alternative
cassette exons during evolution. We compiled a dataset of orthologous exons
from human and mouse that are constitutively spliced in one species but
alternatively spliced in the other. Examination of these exons suggests that
the common ancestors were constitutively spliced. We show that relaxation of
the 5' splice site during evolution is one of the molecular mechanisms by which
exons shift from constitutive to alternative splicing. This shift is associated
with the fixation of exonic splicing regulatory sequences (ESRs) that are
essential for exon definition and control the inclusion level only after the
transition to alternative splicing. The effect of each ESR on splicing and the
combinatorial effects between two ESRs are conserved from fish to human. Our
results uncover an evolutionary pathway that increases transcriptome diversity
by shifting exons from constitutive to alternative splicing.
In search of lost introns
Many fundamental questions concerning the emergence and subsequent evolution
of eukaryotic exon-intron organization are still unsettled. Genome-scale
comparative studies, which can shed light on crucial aspects of eukaryotic
evolution, require adequate computational tools.
We describe novel computational methods for studying spliceosomal intron
evolution. Our goal is to give a reliable characterization of the dynamics of
intron evolution. Our algorithmic innovations address the identification of
orthologous introns, and the likelihood-based analysis of intron data. We
discuss a compression method for the evaluation of the likelihood function,
which is noteworthy for phylogenetic likelihood problems in general. We prove
that after preprocessing time, subsequent evaluations take time almost surely in the Yule-Harding random model of -taxon
phylogenies, where is the input sequence length.
We illustrate the practicality of our methods by compiling and analyzing a
data set involving 18 eukaryotes, more than in any other study to date. The
study yields the surprising result that ancestral eukaryotes were fairly
intron-rich. For example, the bilaterian ancestor is estimated to have had more
than 90% as many introns as vertebrates do now
Identifying statistical dependence in genomic sequences via mutual information estimates
Questions of understanding and quantifying the representation and amount of
information in organisms have become a central part of biological research, as
they potentially hold the key to fundamental advances. In this paper, we
demonstrate the use of information-theoretic tools for the task of identifying
segments of biomolecules (DNA or RNA) that are statistically correlated. We
develop a precise and reliable methodology, based on the notion of mutual
information, for finding and extracting statistical as well as structural
dependencies. A simple threshold function is defined, and its use in
quantifying the level of significance of dependencies between biological
segments is explored. These tools are used in two specific applications. First,
for the identification of correlations between different parts of the maize
zmSRp32 gene. There, we find significant dependencies between the 5'
untranslated region in zmSRp32 and its alternatively spliced exons. This
observation may indicate the presence of as-yet unknown alternative splicing
mechanisms or structural scaffolds. Second, using data from the FBI's Combined
DNA Index System (CODIS), we demonstrate that our approach is particularly well
suited for the problem of discovering short tandem repeats, an application of
importance in genetic profiling. Comment: Preliminary version. Final version in EURASIP Journal on
Bioinformatics and Systems Biology. See http://www.hindawi.com/journals/bsb
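To make the idea of mutual information between sequence segments concrete, here is a minimal plug-in (empirical frequency) estimator for two equal-length symbol sequences, in bits. It is a sketch under stated assumptions: the function name is hypothetical, the inputs are toy strings, and the paper's threshold function and segment-search procedure are not reproduced here.

```python
from collections import Counter
from math import log2


def mutual_information(xs: str, ys: str) -> float:
    """Plug-in estimate of I(X;Y) in bits from paired symbols xs[i], ys[i]."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(xs)
    count_x = Counter(xs)            # marginal counts of X
    count_y = Counter(ys)            # marginal counts of Y
    count_xy = Counter(zip(xs, ys))  # joint counts of (X, Y)
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


if __name__ == "__main__":
    print(mutual_information("ACGT", "ACGT"))  # fully dependent toy example
    print(mutual_information("AACC", "ACAC"))  # independent toy example
```

On short sequences the plug-in estimate is biased upward, which is one reason a significance threshold of the kind the abstract mentions is needed before calling a dependency real.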