662 research outputs found
Comparative Phylogeography of a Coevolved Community: Concerted Population Expansions in Joshua Trees and Four Yucca Moths
Comparative phylogeographic studies have had mixed success in identifying common phylogeographic patterns among co-distributed organisms. Whereas some have found broadly similar patterns across a diverse array of taxa, others have found that the histories of different species are more idiosyncratic than congruent. The variation in the results of comparative phylogeographic studies could indicate that the extent to which sympatrically-distributed organisms share common biogeographic histories varies depending on the strength and specificity of ecological interactions between them. To test this hypothesis, we examined demographic and phylogeographic patterns in a highly specialized, coevolved community – Joshua trees (Yucca brevifolia) and their associated yucca moths. This tightly-integrated, mutually interdependent community is known to have experienced significant range changes at the end of the last glacial period, so there is a strong a priori expectation that these organisms will show common signatures of demographic and distributional changes over time. Using a database of >5000 GPS records for Joshua trees, and multi-locus DNA sequence data from the Joshua tree and four species of yucca moth, we combined paleaodistribution modeling with coalescent-based analyses of demographic and phylgeographic history. We extensively evaluated the power of our methods to infer past population size and distributional changes by evaluating the effect of different inference procedures on our results, comparing our palaeodistribution models to Pleistocene-aged packrat midden records, and simulating DNA sequence data under a variety of alternative demographic histories. Together the results indicate that these organisms have shared a common history of population expansion, and that these expansions were broadly coincident in time. However, contrary to our expectations, none of our analyses indicated significant range or population size reductions at the end of the last glacial period, and the inferred demographic changes substantially predate Holocene climate changes
A General Framework for Updating Belief Distributions
We propose a framework for general Bayesian inference. We argue that a valid
update of a prior belief distribution to a posterior can be made for parameters
which are connected to observations through a loss function rather than the
traditional likelihood function, which is recovered under the special case of
using self information loss. Modern application areas make it is increasingly
challenging for Bayesians to attempt to model the true data generating
mechanism. Moreover, when the object of interest is low dimensional, such as a
mean or median, it is cumbersome to have to achieve this via a complete model
for the whole data distribution. More importantly, there are settings where the
parameter of interest does not directly index a family of density functions and
thus the Bayesian approach to learning about such parameters is currently
regarded as problematic. Our proposed framework uses loss-functions to connect
information in the data to functionals of interest. The updating of beliefs
then follows from a decision theoretic approach involving cumulative loss
functions. Importantly, the procedure coincides with Bayesian updating when a
true likelihood is known, yet provides coherent subjective inference in much
more general settings. Connections to other inference frameworks are
highlighted.Comment: This is the pre-peer reviewed version of the article "A General
Framework for Updating Belief Distributions", which has been accepted for
publication in the Journal of Statistical Society - Series B. This article
may be used for non-commercial purposes in accordance with Wiley Terms and
Conditions for Self-Archivin
Reliable Generation of Native-Like Decoys Limits Predictive Ability in Fragment-Based Protein Structure Prediction
Our previous work with fragment-assembly methods has demonstrated specific deficiencies in conformational sampling behaviour that, when addressed through improved sampling algorithms, can lead to more reliable prediction of tertiary protein structure when good fragments are available, and when score values can be relied upon to guide the search to the native basin. In this paper, we present preliminary investigations into two important questions arising from more difficult prediction problems. First, we investigated the extent to which native-like conformational states are generated during multiple runs of our search protocols. We determined that, in cases of difficult prediction, native-like decoys are rarely or never generated. Second, we developed a scheme for decoy retention that balances the objectives of retaining low-scoring structures and retaining conformationally diverse structures sampled during the course of the search. Our method succeeds at retaining more diverse sets of structures, and, for a few targets, more native-like solutions are retained as compared to our original, energy-based retention scheme. However, in general, we found that the rate at which native-like structural states are generated has a much stronger effect on eventual distributions of predictive accuracy in the decoy sets, as compared to the specific decoy retention strategy used. We found that our protocols show differences in their ability to access native-like states for some targets, and this may explain some of the differences in predictive performance seen between these methods. There appears to be an interaction between fragment sets and move operators, which influences the accessibility of native-like structures for given targets. Our results point to clear directions for further improvements in fragment-based methods, which are likely to enable higher accuracy predictions
From RNA folding to inverse folding: a computational study: Folding and design of RNA molecules
Since the discovery of the structure of DNA in the early 1953s and its double-chained complement of information hinting at its means of replication, biologists have recognized the strong connection between molecular structure and function. In the past two decades, there has been a surge of research on an ever-growing class of RNA molecules that are non-coding but whose various folded structures allow a diverse array of vital functions. From the well-known splicing and modification of ribosomal RNA, non-coding RNAs (ncRNAs) are now known to be intimately involved in possibly every stage of DNA translation and protein transcription, as well as RNA signalling and gene regulation processes.
Despite the rapid development and declining cost of modern molecular methods, they typically can only describe ncRNA's structural conformations in vitro, which differ from their in vivo counterparts. Moreover, it is estimated that only a tiny fraction of known ncRNAs has been documented experimentally, often at a high cost. There is thus a growing realization that computational methods must play a central role in the analysis of ncRNAs. Not only do computational approaches hold the promise of rapidly characterizing many ncRNAs yet to be described, but there is also the hope that by understanding the rules that determine their structure, we will gain better insight into their function and design. Many studies revealed that the ncRNA functions are performed by high-level structures that often depend on their low-level structures, such as the secondary structure. This thesis studies the computational folding mechanism and inverse folding of ncRNAs at the secondary level.
In this thesis, we describe the development of two bioinformatic tools that have the potential to improve our understanding of RNA secondary structure. These tools are as follows: (1) RAFFT for efficient prediction of pseudoknot-free RNA folding pathways using the fast Fourier transform (FFT)}; (2) aRNAque, an evolutionary algorithm inspired by Lévy flights for RNA inverse folding with or without pseudoknot (A secondary structure that often poses difficulties for bio-computational detection).
The first tool, RAFFT, implements a novel heuristic to predict RNA secondary structure formation pathways that has two components: (i) a folding algorithm and (ii) a kinetic ansatz. When considering the best prediction in the ensemble of 50 secondary structures predicted by RAFFT, its performance matches the recent deep-learning-based structure prediction methods. RAFFT also acts as a folding kinetic ansatz, which we tested on two RNAs: the CFSE and a classic bi-stable sequence. In both test cases, fewer structures were required to reproduce the full kinetics, whereas known methods (such as Treekin) required a sample of 20,000 structures and more.
The second tool, aRNAque, implements an evolutionary algorithm (EA) inspired by the Lévy flight, allowing both local global search and which supports pseudoknotted target structures. The number of point mutations at every step of aRNAque's EA is drawn from a Zipf distribution. Therefore, our proposed method increases the diversity of designed RNA sequences and reduces the average number of evaluations of the evolutionary algorithm. The overall performance showed improved empirical results compared to existing tools through intensive benchmarks on both pseudoknotted and pseudoknot-free datasets.
In conclusion, we highlight some promising extensions of the versatile RAFFT method to RNA-RNA interaction studies. We also provide an outlook on both tools' implications in studying evolutionary dynamics
Streamlining and Large Ancestral Genomes in Archaea Inferred with a Phylogenetic Birth-and-Death Model
Homologous genes originate from a common ancestor through vertical inheritance, duplication, or horizontal gene transfer. Entire homolog families spawned by a single ancestral gene can be identified across multiple genomes based on protein sequence similarity. The sequences, however, do not always reveal conclusively the history of large families. To study the evolution of complete gene repertoires, we propose here a mathematical framework that does not rely on resolved gene family histories. We show that so-called phylogenetic profiles, formed by family sizes across multiple genomes, are sufficient to infer principal evolutionary trends. The main novelty in our approach is an efficient algorithm to compute the likelihood of a phylogenetic profile in a model of birth-and-death processes acting on a phylogeny
Investigating DNA-, RNA-, and protein-based features as a means to discriminate pathogenic synonymous variants
Synonymous single-nucleotide variants (SNVs), although they do not alter the encoded protein sequences, have been implicated in many genetic diseases. Experimental studies indicate that synonymous SNVs can lead to changes in the secondary and tertiary structures of DNA and RNA, thereby affecting translational efficiency, cotranslational protein folding as well as the binding of DNA-/RNA-binding proteins. However, the importance of these various features in disease phenotypes is not clearly understood. Here, we have built a support vector machine (SVM) model (termed DDIG-SN) as a means to discriminate disease-causing synonymous variants. The model was trained and evaluated on nearly 900 disease-causing variants. The method achieves robust performance with the area under the receiver operating characteristic curve of 0.84 and 0.85 for protein-stratified 10-fold cross-validation and independent testing, respectively. We were able to show that the disease-causing effects in the immediate proximity to exon–intron junctions (1–3 bp) are driven by the loss of splicing motif strength, whereas the gain of splicing motif strength is the primary cause in regions further away from the splice site (4–69 bp). The method is available as a part of the DDIG server at http://sparks-lab.org/ddig
Quantitative Immunology for Physicists
The adaptive immune system is a dynamical, self-organized multiscale system
that protects vertebrates from both pathogens and internal irregularities, such
as tumours. For these reason it fascinates physicists, yet the multitude of
different cells, molecules and sub-systems is often also petrifying. Despite
this complexity, as experiments on different scales of the adaptive immune
system become more quantitative, many physicists have made both theoretical and
experimental contributions that help predict the behaviour of ensembles of
cells and molecules that participate in an immune response. Here we review some
recent contributions with an emphasis on quantitative questions and
methodologies. We also provide a more general methods section that presents
some of the wide array of theoretical tools used in the field.Comment: 78 page revie
Frustration in Biomolecules
Biomolecules are the prime information processing elements of living matter.
Most of these inanimate systems are polymers that compute their structures and
dynamics using as input seemingly random character strings of their sequence,
following which they coalesce and perform integrated cellular functions. In
large computational systems with a finite interaction-codes, the appearance of
conflicting goals is inevitable. Simple conflicting forces can lead to quite
complex structures and behaviors, leading to the concept of "frustration" in
condensed matter. We present here some basic ideas about frustration in
biomolecules and how the frustration concept leads to a better appreciation of
many aspects of the architecture of biomolecules, and how structure connects to
function. These ideas are simultaneously both seductively simple and perilously
subtle to grasp completely. The energy landscape theory of protein folding
provides a framework for quantifying frustration in large systems and has been
implemented at many levels of description. We first review the notion of
frustration from the areas of abstract logic and its uses in simple condensed
matter systems. We discuss then how the frustration concept applies
specifically to heteropolymers, testing folding landscape theory in computer
simulations of protein models and in experimentally accessible systems.
Studying the aspects of frustration averaged over many proteins provides ways
to infer energy functions useful for reliable structure prediction. We discuss
how frustration affects folding, how a large part of the biological functions
of proteins are related to subtle local frustration effects and how frustration
influences the appearance of metastable states, the nature of binding
processes, catalysis and allosteric transitions. We hope to illustrate how
Frustration is a fundamental concept in relating function to structural
biology.Comment: 97 pages, 30 figure
- …