Identifying statistical dependence in genomic sequences via mutual information estimates
Questions of understanding and quantifying the representation and amount of
information in organisms have become a central part of biological research, as
they potentially hold the key to fundamental advances. In this paper, we
demonstrate the use of information-theoretic tools for the task of identifying
segments of biomolecules (DNA or RNA) that are statistically correlated. We
develop a precise and reliable methodology, based on the notion of mutual
information, for finding and extracting statistical as well as structural
dependencies. A simple threshold function is defined, and its use in
quantifying the level of significance of dependencies between biological
segments is explored. These tools are used in two specific applications. First,
we identify correlations between different parts of the maize
zmSRp32 gene. There, we find significant dependencies between the 5'
untranslated region in zmSRp32 and its alternatively spliced exons. This
observation may indicate the presence of as-yet unknown alternative splicing
mechanisms or structural scaffolds. Second, using data from the FBI's Combined
DNA Index System (CODIS), we demonstrate that our approach is particularly well
suited for the problem of discovering short tandem repeats, an application of
importance in genetic profiling. Comment: Preliminary version. Final version in EURASIP Journal on
Bioinformatics and Systems Biology. See http://www.hindawi.com/journals/bsb
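As a rough illustration of the idea in the abstract above, the following minimal sketch computes a plug-in (empirical frequency) estimate of the mutual information between two equal-length symbol sequences. It is not the authors' estimator or their threshold function, neither of which is given in this listing; the function name `mutual_information` and the position-by-position pairing of the two segments are assumptions made for the example.

```python
from collections import Counter
from math import log2


def mutual_information(x: str, y: str) -> float:
    """Plug-in estimate of I(X;Y) in bits from the paired symbols x[i], y[i].

    Hypothetical helper for illustration only: it assumes the two segments
    are already aligned position by position.
    """
    if len(x) != len(y) or not x:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))  # counts of symbol pairs (a, b)
    px = Counter(x)             # marginal counts for x
    py = Counter(y)             # marginal counts for y
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * log2(c * n / (px[a] * py[b]))
    return mi
```

Two identical segments that use all four bases equally often give 2 bits, and segments whose symbols pair up independently give 0 bits. On short segments the plug-in estimate is biased upward, so any significance threshold would need to account for segment length.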
Quantitative Genetics and Functional-Structural Plant Growth Models: Simulation of Quantitative Trait Loci Detection for Model Parameters and Application to Potential Yield Optimization
Background and Aims: Prediction of phenotypic traits from new genotypes under
untested environmental conditions is crucial to build simulations of breeding
strategies to improve target traits. Although the plant response to
environmental stresses is characterized by both architectural and functional
plasticity, recent attempts to integrate biological knowledge into genetics
models have mainly concerned specific physiological processes or crop models
without architecture, and thus may prove limited when studying genotype x
environment interactions. Consequently, this paper presents a simulation study
introducing genetics into a functional-structural growth model, which gives
access to more fundamental traits for quantitative trait loci (QTL) detection
and thus to promising tools for yield optimization. Methods: The GreenLab model
was selected as a reasonable choice to link growth model parameters to QTL.
Virtual genes and virtual chromosomes were defined to build a simple genetic
model that drove the settings of the species-specific parameters of the model.
The QTL Cartographer software was used to study QTL detection of simulated
plant traits. A genetic algorithm was implemented to define the ideotype for
yield maximization based on the model parameters and the associated allelic
combination. Key Results and Conclusions: By keeping the environmental factors
constant and using a virtual population with a large number of individuals
generated by a Mendelian genetic model, results for an ideal case could be
simulated. Virtual QTL detection was compared in the case of phenotypic traits
- such as cob weight - and when traits were model parameters, and was found to
be more accurate in the latter case. The practical interest of this approach is
illustrated by calculating the parameters (and the corresponding genotype)
associated with yield optimization of a GreenLab maize model. The paper
discusses the potentials of GreenLab to represent environment x genotype
interactions, in particular through its main state variable, the ratio of
biomass supply over demand.
A Model-Based Analysis of GC-Biased Gene Conversion in the Human and Chimpanzee Genomes
GC-biased gene conversion (gBGC) is a recombination-associated process that favors the fixation of G/C alleles over A/T alleles. In mammals, gBGC is hypothesized to contribute to variation in GC content, rapidly evolving sequences, and the fixation of deleterious mutations, but its prevalence and general functional consequences remain poorly understood. gBGC is difficult to incorporate into models of molecular evolution and so far has primarily been studied using summary statistics from genomic comparisons. Here, we introduce a new probabilistic model that captures the joint effects of natural selection and gBGC on nucleotide substitution patterns, while allowing for correlations along the genome in these effects. We implemented our model in a computer program, called phastBias, that can accurately detect gBGC tracts about 1 kilobase or longer in simulated sequence alignments. When applied to real primate genome sequences, phastBias predicts gBGC tracts that cover roughly 0.3% of the human and chimpanzee genomes and account for 1.2% of human-chimpanzee nucleotide differences. These tracts fall in clusters, particularly in subtelomeric regions; they are enriched for recombination hotspots and fast-evolving sequences; and they display an ongoing fixation preference for G and C alleles. They are also significantly enriched for disease-associated polymorphisms, suggesting that they contribute to the fixation of deleterious alleles. The gBGC tracts provide a unique window into historical recombination processes along the human and chimpanzee lineages. They supply additional evidence of long-term conservation of megabase-scale recombination rates accompanied by rapid turnover of hotspots. Together, these findings shed new light on the evolutionary, functional, and disease implications of gBGC. The phastBias program and our predicted tracts are freely available. © 2013 Capra et al
Training-free Measures Based on Algorithmic Probability Identify High Nucleosome Occupancy in DNA Sequences
We introduce and study a set of training-free methods of
information-theoretic and algorithmic complexity nature applied to DNA
sequences to identify their potential capabilities to determine nucleosomal
binding sites. We test our measures on well-studied genomic sequences of
different sizes drawn from different sources. The measures reveal the known in
vivo versus in vitro predictive discrepancies and uncover their potential to
pinpoint (high) nucleosome occupancy. We explore different possible signals
within and beyond the nucleosome length and find that complexity indices are
informative of nucleosome occupancy. We compare against the gold standard
(Kaplan model) and find similar and complementary results, with the main
difference being that our sequence complexity approach is training-free. For
example, for high occupancy, complexity-based scores outperform the Kaplan model for predicting
binding, representing a significant advancement in predicting the highest
nucleosome occupancy following a training-free approach. Comment: 8 pages main text (4 figures), 12 total with Supplementary (1 figure)
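To make the notion of a training-free, sequence-only score concrete, here is a minimal sketch that computes the Shannon entropy of k-mers in sliding windows along a DNA string. Shannon entropy is used here only as a simple stand-in; the abstract above does not specify its indices, and this is not the authors' algorithmic-probability measure. The names `kmer_entropy` and `sliding_entropy` and the parameters `window`, `step`, and `k` are assumptions for the example.

```python
from collections import Counter
from math import log2


def kmer_entropy(seq: str, k: int = 1) -> float:
    """Shannon entropy (bits) of the overlapping k-mer distribution in seq."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        raise ValueError("sequence is shorter than k")
    n = len(kmers)
    counts = Counter(kmers)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def sliding_entropy(seq: str, window: int, step: int = 1, k: int = 1) -> list:
    """Entropy of each full window of length `window`, advancing by `step`."""
    return [
        kmer_entropy(seq[i:i + window], k)
        for i in range(0, len(seq) - window + 1, step)
    ]
```

A homopolymer window such as "AAAA" scores 0 bits and a window with all four bases in equal proportion scores 2 bits. No model fitting is involved, which is the sense in which a score of this kind is training-free.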
The asexual genome of Drosophila
The rate of recombination affects the mode of molecular evolution. In
high-recombining sequence, the targets of selection are individual genetic
loci; under low recombination, selection collectively acts on large,
genetically linked genomic segments. Selection under linkage can induce clonal
interference, a specific mode of evolution by competition of genetic clades
within a population. This mode is well known in asexually evolving microbes,
but has not been traced systematically in an obligate sexual organism. Here we
show that the Drosophila genome is partitioned into two modes of evolution: a
local interference regime with limited effects of genetic linkage, and an
interference condensate with clonal competition. We map these modes by
differences in mutation frequency spectra, and we show that the transition
between them occurs at a threshold recombination rate that is predictable from
genomic summary statistics. We find the interference condensate in segments of
low-recombining sequence that are located primarily in chromosomal regions
flanking the centromeres and cover about 20% of the Drosophila genome.
Condensate regions have characteristics of asexual evolution that impact gene
function: the efficacy of selection and the speed of evolution are lower and
the genetic load is higher than in regions of local interference. Our results
suggest that multicellular eukaryotes can harbor heterogeneous modes and tempi
of evolution within one genome. We argue that this variation generates
selection on genome architecture.
Genome-wide signatures of complex introgression and adaptive evolution in the big cats.
The great cats of the genus Panthera comprise a recent radiation whose evolutionary history is poorly understood. Their rapid diversification poses challenges to resolving their phylogeny while offering opportunities to investigate the historical dynamics of adaptive divergence. We report the sequence, de novo assembly, and annotation of the jaguar (Panthera onca) genome, a novel genome sequence for the leopard (Panthera pardus), and comparative analyses encompassing all living Panthera species. Demographic reconstructions indicated that all of these species have experienced variable episodes of population decline during the Pleistocene, ultimately leading to small effective sizes in present-day genomes. We observed pervasive genealogical discordance across Panthera genomes, caused by both incomplete lineage sorting and complex patterns of historical interspecific hybridization. We identified multiple signatures of species-specific positive selection, affecting genes involved in craniofacial and limb development, protein metabolism, hypoxia, reproduction, pigmentation, and sensory perception. There was remarkable concordance in pathways enriched in genomic segments implicated in interspecies introgression and in positive selection, suggesting that these processes were connected. We tested this hypothesis by developing exome capture probes targeting ~19,000 Panthera genes and applying them to 30 wild-caught jaguars. We found at least two genes (DOCK3 and COL4A5, both related to optic nerve development) bearing significant signatures of interspecies introgression and within-species positive selection. These findings indicate that post-speciation admixture has contributed genetic material that facilitated the adaptive evolution of big cat lineages