The Mathematics of Phylogenomics
The grand challenges in biology today are being shaped by powerful
high-throughput technologies that have revealed the genomes of many organisms,
global expression patterns of genes and detailed information about variation
within populations. We are therefore able to ask, for the first time,
fundamental questions about the evolution of genomes, the structure of genes
and their regulation, and the connections between genotypes and phenotypes of
individuals. The answers to these questions are all predicated on progress in a
variety of computational, statistical, and mathematical fields.
The rapid growth in the characterization of genomes has led to the
advancement of a new discipline called Phylogenomics. This discipline results
from the combination of two major fields in the life sciences: Genomics, i.e.,
the study of the function and structure of genes and genomes; and Molecular
Phylogenetics, i.e., the study of the hierarchical evolutionary relationships
among organisms and their genomes. The objective of this article is to offer
mathematicians a first introduction to this emerging field, and to discuss
specific mathematical problems and developments arising from phylogenomics.
Comment: 41 pages, 4 figures
Genetic Evolution and Molecular Selection of the HE Gene of Influenza C Virus
Influenza C virus (ICV) was first identified in humans and swine, but recently also in cattle, indicating a wider host range and potential threat to both the livestock industry and public health than was originally anticipated. The ICV hemagglutinin-esterase (HE) glycoprotein has multiple functions in the viral replication cycle and is the major determinant of antigenicity. Here, we developed a comparative approach integrating genetics, molecular selection analysis, and structural biology to identify the codon usage and adaptive evolution of ICV. We show that ICV can be classified into six lineages, consistent with previous studies. The HE gene has a low codon usage bias, which may facilitate ICV replication by reducing competition during evolution. Natural selection, dinucleotide composition, and mutation pressure shape the codon usage patterns of the ICV HE gene, with natural selection being the most important factor. Codon adaptation index (CAI) and relative codon deoptimization index (RCDI) analysis revealed that the greatest adaptation of ICV was to humans, followed by cattle and swine. Additionally, similarity index (SiD) analysis revealed that swine exerted a stronger evolutionary pressure on ICV than humans, which are considered the primary reservoir. Furthermore, a similar tendency was also observed in the M gene. Of note, we found HE residues 176, 194, and 198 to be under positive selection, which may be the result of escape from antibody responses. Our study provides useful information on the genetic evolution of ICV from a new perspective that can help devise prevention and control strategies.
Neutrality: A Necessity for Self-Adaptation
Self-adaptation is used in all main paradigms of evolutionary computation to
increase efficiency. We claim that the basis of self-adaptation is the use of
neutrality. In the absence of external control, neutrality allows a variation of
the search distribution without the risk of fitness loss.
Comment: 6 pages, 3 figures, LaTeX
REPARATION: ribosome profiling assisted (re-)annotation of bacterial genomes
Prokaryotic genome annotation is highly dependent on automated methods, as manual curation cannot keep up with the exponential growth of sequenced genomes. Current automated methods depend heavily on sequence composition and often underestimate the complexity of the proteome. We developed RibosomE Profiling Assisted (re-)AnnotaTION (REPARATION), a de novo machine learning algorithm that takes advantage of experimental protein synthesis evidence from ribosome profiling (Ribo-seq) to delineate translated open reading frames (ORFs) in bacteria, independent of genome annotation (https://github.com/Biobix/REPARATION). REPARATION evaluates all possible ORFs in the genome and estimates minimum thresholds based on a growth curve model to screen for spurious ORFs. We applied REPARATION to three annotated bacterial species to obtain a more comprehensive mapping of their translation landscape in support of experimental data. In all cases, we identified hundreds of novel (small) ORFs, including variants of previously annotated ORFs, and >70% of all (variants of) annotated protein coding ORFs were predicted by REPARATION to be translated. Our predictions are supported by matching mass spectrometry proteomics data, sequence composition and conservation analysis. REPARATION is unique in that it makes use of experimental translation evidence to intrinsically perform a de novo ORF delineation in bacterial genomes, irrespective of the sequence features linked to open reading frames.
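As a rough illustration of what "evaluating all possible ORFs" can mean, the sketch below enumerates candidate ORFs on the forward strand of a DNA string. It is not the REPARATION implementation: the function name `find_orfs`, the choice of start codons (ATG, GTG, TTG), the `min_codons` cutoff, and the restriction to the forward strand are all assumptions made for this example, and no Ribo-seq evidence or growth curve threshold is involved.

```python
# Hypothetical helper, not part of REPARATION: list candidate ORFs on the
# forward strand, pairing every in-frame start codon with the next stop codon.
START_CODONS = {"ATG", "GTG", "TTG"}   # assumed set of bacterial start codons
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end) index pairs of candidate ORFs, end exclusive.

    The reported span includes the stop codon. min_codons counts the codons
    before the stop codon, start codon included.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        open_starts = []  # start positions not yet closed by a stop codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in STOP_CODONS:
                for s in open_starts:
                    if (i - s) // 3 >= min_codons:
                        orfs.append((s, i + 3))
                open_starts = []
            elif codon in START_CODONS:
                open_starts.append(i)
    return orfs


if __name__ == "__main__":
    print(find_orfs("ATGAAATAG"))
```

A real annotation pipeline would also scan the reverse complement and then filter these candidates using experimental evidence, which is the part the abstract describes and this sketch omits.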
Simulation from endpoint-conditioned, continuous-time Markov chains on a finite state space, with applications to molecular evolution
Analyses of serially-sampled data often begin with the assumption that the
observations represent discrete samples from a latent continuous-time
stochastic process. The continuous-time Markov chain (CTMC) is one such
generative model whose popularity extends to a variety of disciplines ranging
from computational finance to human genetics and genomics. A common theme among
these diverse applications is the need to simulate sample paths of a CTMC
conditional on realized data that is discretely observed. Here we present a
general solution to this sampling problem when the CTMC is defined on a
discrete and finite state space. Specifically, we consider the generation of
sample paths, including intermediate states and times of transition, from a
CTMC whose beginning and ending states are known across a time interval of
length $T$. We first unify the literature through a discussion of the three
predominant approaches: (1) modified rejection sampling, (2) direct sampling,
and (3) uniformization. We then give analytical results for the complexity and
efficiency of each method in terms of the instantaneous transition rate matrix
$Q$ of the CTMC, its beginning and ending states, and the length of sampling
time $T$. In doing so, we show that no method dominates the others across all
model specifications, and we give explicit proof of which method prevails for
any given $Q$, $T$, and endpoints. Finally, we introduce and compare three
applications of CTMCs to demonstrate the pitfalls of choosing an inefficient
sampler.
Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/)
at http://dx.doi.org/10.1214/09-AOAS247 by the Institute of
Mathematical Statistics (http://www.imstat.org)
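To make the sampling problem concrete, here is a minimal sketch of plain (unmodified) rejection sampling for an endpoint-conditioned CTMC: simulate forward from the starting state and keep the path only if it ends in the required state. This is the naive baseline, not the modified rejection sampler, direct sampler, or uniformization scheme analyzed in the paper. The function names, the list-of-lists rate matrix `Q`, and the `max_tries` guard are assumptions made for this example.

```python
import random


def simulate_path(Q, a, T, rng):
    """Forward-simulate a CTMC with rate matrix Q from state a on [0, T].

    Returns a list of (jump_time, new_state) pairs, starting with (0.0, a).
    """
    path = [(0.0, a)]
    t, state = 0.0, a
    while True:
        rate = -Q[state][state]          # total rate of leaving the state
        if rate <= 0:                    # absorbing state: no more jumps
            break
        t += rng.expovariate(rate)       # exponential holding time
        if t >= T:
            break
        targets = [j for j in range(len(Q)) if j != state]
        weights = [Q[state][j] for j in targets]
        state = rng.choices(targets, weights=weights)[0]
        path.append((t, state))
    return path


def sample_endpoint_conditioned(Q, a, b, T, rng=None, max_tries=100_000):
    """Naive rejection sampler: retry until the path started at a ends in b."""
    rng = rng or random.Random()
    for _ in range(max_tries):
        path = simulate_path(Q, a, T, rng)
        if path[-1][1] == b:
            return path
    raise RuntimeError("no accepted path; the endpoint may be too unlikely")


if __name__ == "__main__":
    Q = [[-1.0, 1.0], [2.0, -2.0]]       # illustrative two-state rate matrix
    print(sample_endpoint_conditioned(Q, 0, 1, 1.5, random.Random(0)))
```

The acceptance rate equals the probability of reaching the ending state at time T, so this baseline becomes slow when that probability is small, which is one reason the choice of sampler matters.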
Back-translation for discovering distant protein homologies
Frameshift mutations in protein-coding DNA sequences produce a drastic change
in the resulting protein sequence, which prevents classic protein alignment
methods from revealing the proteins' common origin. Moreover, when a large
number of substitutions are additionally involved in the divergence, the
homology detection becomes difficult even at the DNA level. To cope with this
situation, we propose a novel method to infer distant homology relations between two
proteins that accounts for frameshift and point mutations that may have
affected the coding sequences. We design a dynamic programming alignment
algorithm over memory-efficient graph representations of the complete set of
putative DNA sequences of each protein, with the goal of determining the two
putative DNA sequences which have the best scoring alignment under a powerful
scoring system designed to reflect the most probable evolutionary process. This
allows us to uncover evolutionary information that is not captured by
traditional alignment methods, which is confirmed by biologically significant
examples.
Comment: The 9th International Workshop on Algorithms in Bioinformatics
(WABI), Philadelphia, United States of America (2009)
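The "complete set of putative DNA sequences" of a protein can be illustrated with a small sketch that back-translates a protein under the standard genetic code. It enumerates the sequences explicitly, which grows exponentially with protein length; the paper instead works on compact graph representations, which are not reproduced here. The function names and the codon table construction are assumptions made for this example.

```python
from itertools import product

# Standard genetic code, with codons ordered over the bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid (and '*' for stop) to the codons that encode it.
CODONS_FOR = {}
for i, aa in enumerate(AMINO):
    codon = BASES[i // 16] + BASES[(i // 4) % 4] + BASES[i % 4]
    CODONS_FOR.setdefault(aa, []).append(codon)


def count_back_translations(protein):
    """Number of DNA sequences that translate to the given protein."""
    total = 1
    for aa in protein:
        total *= len(CODONS_FOR[aa])
    return total


def back_translations(protein):
    """Yield every putative coding DNA sequence of the protein."""
    for combo in product(*(CODONS_FOR[aa] for aa in protein)):
        yield "".join(combo)


if __name__ == "__main__":
    print(count_back_translations("MLS"))
    print(list(back_translations("MW")))
```

Because the count is a product of per-residue codon degeneracies, even short proteins have very many putative coding sequences, which is why a memory-efficient representation is needed before any alignment over this set is practical.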
Self-adaptive exploration in evolutionary search
We address a primary question of computational as well as biological research
on evolution: How can an exploration strategy adapt in such a way as to exploit
the information gained about the problem at hand? We first introduce an
integrated formalism of evolutionary search which provides a unified view on
different specific approaches. On this basis we discuss the implications of
indirect modeling (via a "genotype-phenotype mapping") on the exploration
strategy. Notions such as modularity, pleiotropy and functional phenotypic
complex are discussed as implications. Then, rigorously reflecting the notion
of self-adaptability, we introduce a new definition that captures
self-adaptability of exploration: different genotypes that map to the same
phenotype may represent (also topologically) different exploration strategies;
self-adaptability requires a variation of exploration strategies along such a
"neutral space". By this definition, the concept of neutrality becomes a
central concern of this paper. Finally, we present examples of these concepts:
For a specific grammar-type encoding, we observe a large variability of
exploration strategies for a fixed phenotype, and a self-adaptive drift towards
short representations with a highly structured exploration strategy that matches
the "problem's structure".
Comment: 24 pages, 5 figures
Towards Understanding the Origin of Genetic Languages
Molecular biology is a nanotechnology that works--it has worked for billions
of years and in an amazing variety of circumstances. At its core is a system
for acquiring, processing and communicating information that is universal, from
viruses and bacteria to human beings. Advances in genetics and experience in
designing computers have taken us to a stage where we can understand the
optimisation principles at the root of this system, from the availability of
basic building blocks to the execution of tasks. The languages of DNA and
proteins are argued to be the optimal solutions to the information processing
tasks they carry out. The analysis also suggests simpler predecessors to these
languages, and provides fascinating clues about their origin. Obviously, a
comprehensive unraveling of the puzzle of life would have a lot to say about
what we may design or convert ourselves into.
Comment: (v1) 33 pages, contributed chapter to "Quantum Aspects of Life",
edited by D. Abbott, P. Davies and A. Pati, (v2) published version with some
editing