10,011 research outputs found

    Frustration in Biomolecules

    Get PDF
    Biomolecules are the prime information processing elements of living matter. Most of these inanimate systems are polymers that compute their structures and dynamics using as input seemingly random character strings of their sequence, following which they coalesce and perform integrated cellular functions. In large computational systems with a finite interaction-codes, the appearance of conflicting goals is inevitable. Simple conflicting forces can lead to quite complex structures and behaviors, leading to the concept of "frustration" in condensed matter. We present here some basic ideas about frustration in biomolecules and how the frustration concept leads to a better appreciation of many aspects of the architecture of biomolecules, and how structure connects to function. These ideas are simultaneously both seductively simple and perilously subtle to grasp completely. The energy landscape theory of protein folding provides a framework for quantifying frustration in large systems and has been implemented at many levels of description. We first review the notion of frustration from the areas of abstract logic and its uses in simple condensed matter systems. We discuss then how the frustration concept applies specifically to heteropolymers, testing folding landscape theory in computer simulations of protein models and in experimentally accessible systems. Studying the aspects of frustration averaged over many proteins provides ways to infer energy functions useful for reliable structure prediction. We discuss how frustration affects folding, how a large part of the biological functions of proteins are related to subtle local frustration effects and how frustration influences the appearance of metastable states, the nature of binding processes, catalysis and allosteric transitions. We hope to illustrate how Frustration is a fundamental concept in relating function to structural biology.Comment: 97 pages, 30 figure

    Population genomics of domestic and wild yeasts

    Get PDF
    The natural genetics of an organism is determined by the distribution of sequences of its genome. Here we present one- to four-fold, with some deeper, coverage of the genome sequences of over seventy isolates of the domesticated baker's yeast, _Saccharomyces cerevisiae_, and its closest relative, the wild _S. paradoxus_, which has never been associated with human activity. These were collected from numerous geographic locations and sources (including wild, clinical, baking, wine, laboratory and food spoilage). These sequences provide an unprecedented view of the population structure, natural (and artificial) selection and genome evolution in these species. Variation in gene content, SNPs, indels, copy numbers and transposable elements provide insights into the evolution of different lineages. Phenotypic variation broadly correlates with global genome-wide phylogenetic relationships however there is no correlation with source. _S. paradoxus_ populations are well delineated along geographic boundaries while the variation among worldwide _S. cerevisiae_ isolates show less differentiation and is comparable to a single _S. paradoxus_ population. Rather than one or two domestication events leading to the extant baker's yeasts, the population structure of _S. cerevisiae_ shows a few well defined geographically isolated lineages and many different mosaics of these lineages, supporting the notion that human influence provided the opportunity for outbreeding and production of new combinations of pre-existing variation

    Automatically extracting functionally equivalent proteins from SwissProt

    Get PDF
    In summary, FOSTA provides an automated analysis of annotations in UniProtKB/Swiss-Prot to enable groups of proteins already annotated as functionally equivalent, to be extracted. Our results demonstrate that the vast majority of UniProtKB/Swiss-Prot functional annotations are of high quality, and that FOSTA can interpret annotations successfully. Where FOSTA is not successful, we are able to highlight inconsistencies in UniProtKB/Swiss-Prot annotation. Most of these would have presented equal difficulties for manual interpretation of annotations. We discuss limitations and possible future extensions to FOSTA, and recommend changes to the UniProtKB/Swiss-Prot format, which would facilitate text-mining of UniProtKB/Swiss-Prot

    Nonlinear Models Using Dirichlet Process Mixtures

    Full text link
    We introduce a new nonlinear model for classification, in which we model the joint distribution of response variable, y, and covariates, x, non-parametrically using Dirichlet process mixtures. We keep the relationship between y and x linear within each component of the mixture. The overall relationship becomes nonlinear if the mixture contains more than one component. We use simulated data to compare the performance of this new approach to a simple multinomial logit (MNL) model, an MNL model with quadratic terms, and a decision tree model. We also evaluate our approach on a protein fold classification problem, and find that our model provides substantial improvement over previous methods, which were based on Neural Networks (NN) and Support Vector Machines (SVM). Folding classes of protein have a hierarchical structure. We extend our method to classification problems where a class hierarchy is available. We find that using the prior information regarding the hierarchical structure of protein folds can result in higher predictive accuracy

    Phylogenomic analysis reveals extensive phylogenetic mosaicism in the Human GPCR Superfamily

    Get PDF
    A novel high throughput phylogenomic analysis (HTP) was applied to the rhodopsin G-protein coupled receptor (GPCR) family. Instances of phylogenetic mosaicism between receptors were found to be frequent, often as instances of correlated mosaicism and repeated mosaicism. A null data set was constructed with the same phylogenetic topology as the rhodopsin GPCRs. Comparison of the two data sets revealed that mosaicism was found in GPCRs in a higher frequency than would be expected by homoplasy or the effects of topology alone. Various evolutionary models of differential conservation, recombination and homoplasy are explored which could result in the patterns observed in this analysis. We find that the results are most consistent with frequent recombination events. A complex evolutionary history is illustrated in which it is likely frequent recombination has endowed GPCRs with new functions. The pattern of mosaicism is shown to be informative for functional prediction for orphan receptors. HTP analysis is complementary to conventional phylogenomic analyses revealing mosaicism that would not otherwise have been detectable through conventional phylogenetics

    Islands of linkage in an ocean of pervasive recombination reveals two-speed evolution of human cytomegalovirus genomes

    Get PDF
    Human cytomegalovirus (HCMV) infects most of the population worldwide, persisting throughout the host's life in a latent state with periodic episodes of reactivation. While typically asymptomatic, HCMV can cause fatal disease among congenitally infected infants and immunocompromised patients. These clinical issues are compounded by the emergence of antiviral resistance and the absence of an effective vaccine, the development of which is likely complicated by the numerous immune evasins encoded by HCMV to counter the host's adaptive immune responses, a feature that facilitates frequent super-infections. Understanding the evolutionary dynamics of HCMV is essential for the development of effective new drugs and vaccines. By comparing viral genomes from uncultivated or low-passaged clinical samples of diverse origins, we observe evidence of frequent homologous recombination events, both recent and ancient, and no structure of HCMV genetic diversity at the whole-genome scale. Analysis of individual gene-scale loci reveals a striking dichotomy: while most of the genome is highly conserved, recombines essentially freely and has evolved under purifying selection, 21 genes display extreme diversity, structured into distinct genotypes that do not recombine with each other. Most of these hyper-variable genes encode glycoproteins involved in cell entry or escape of host immunity. Evidence that half of them have diverged through episodes of intense positive selection suggests that rapid evolution of hyper-variable loci is likely driven by interactions with host immunity. It appears that this process is enabled by recombination unlinking hyper-variable loci from strongly constrained neighboring sites. It is conceivable that viral mechanisms facilitating super-infection have evolved to promote recombination between diverged genotypes, allowing the virus to continuously diversify at key loci to escape immune detection, while maintaining a genome optimally adapted to its asymptomatic infectious lifecycle

    Genetic diversity, infection prevalence, and possible transmission routes of Bartonella spp. in vampire bats

    Get PDF
    Bartonella spp. are globally distributed bacteria that cause endocarditis in humans and domestic animals. Recent work has suggested bats as zoonotic reservoirs of some human Bartonella infections; however, the ecological and spatiotemporal patterns of infection in bats remain largely unknown. Here we studied the genetic diversity, prevalence of infection across seasons and years, individual risk factors, and possible transmission routes of Bartonella in populations of common vampire bats (Desmodus rotundus) in Peru and Belize, for which high infection prevalence has previously been reported. Phylogenetic analysis of the gltA gene for a subset of PCR-positive blood samples revealed sequences that were related to Bartonella described from vampire bats from Mexico, other Neotropical bat species, and streblid bat flies. Sequences associated with vampire bats clustered significantly by country but commonly spanned Central and South America, implying limited spatial structure. Stable and nonzero Bartonella prevalence between years supported endemic transmission in all sites. The odds of Bartonella infection for individual bats was unrelated to the intensity of bat flies ectoparasitism, but nearly all infected bats were infested, which precluded conclusive assessment of support for vector-borne transmission. While metagenomic sequencing found no strong evidence of Bartonella DNA in pooled bat saliva and fecal samples, we detected PCR positivity in individual saliva and feces, suggesting the potential for bacterial transmission through both direct contact (i.e., biting) and environmental (i.e., fecal) exposures. Further investigating the relative contributions of direct contact, environmental, and vector-borne transmission for bat Bartonella is an important next step to predict infection dynamics within bats and the risks of human and livestock exposures

    On the entropy of protein families

    Get PDF
    Proteins are essential components of living systems, capable of performing a huge variety of tasks at the molecular level, such as recognition, signalling, copy, transport, ... The protein sequences realizing a given function may largely vary across organisms, giving rise to a protein family. Here, we estimate the entropy of those families based on different approaches, including Hidden Markov Models used for protein databases and inferred statistical models reproducing the low-order (1-and 2-point) statistics of multi-sequence alignments. We also compute the entropic cost, that is, the loss in entropy resulting from a constraint acting on the protein, such as the fixation of one particular amino-acid on a specific site, and relate this notion to the escape probability of the HIV virus. The case of lattice proteins, for which the entropy can be computed exactly, allows us to provide another illustration of the concept of cost, due to the competition of different folds. The relevance of the entropy in relation to directed evolution experiments is stressed.Comment: to appear in Journal of Statistical Physic
    corecore