57 research outputs found

    Comparison between Suitable Priors for Additive Bayesian Networks

    Full text link
    Additive Bayesian networks are types of graphical models that extend the usual Bayesian generalized linear model to multiple dependent variables through the factorisation of the joint probability distribution of the underlying variables. When fitting an ABN model, the choice of the prior of the parameters is of crucial importance. If an inadequate prior - like a too weakly informative one - is used, data separation and data sparsity lead to issues in the model selection process. In this work a simulation study between two weakly and a strongly informative priors is presented. As weakly informative prior we use a zero mean Gaussian prior with a large variance, currently implemented in the R-package abn. The second prior belongs to the Student's t-distribution, specifically designed for logistic regressions and, finally, the strongly informative prior is again Gaussian with mean equal to true parameter value and a small variance. We compare the impact of these priors on the accuracy of the learned additive Bayesian network in function of different parameters. We create a simulation study to illustrate Lindley's paradox based on the prior choice. We then conclude by highlighting the good performance of the informative Student's t-prior and the limited impact of the Lindley's paradox. Finally, suggestions for further developments are provided.Comment: 8 pages, 4 figure

    Sequence-based prediction for vaccine strain selection and identification of antigenic variability in foot-and-mouth disease virus

    Get PDF
    Identifying when past exposure to an infectious disease will protect against newly emerging strains is central to understanding the spread and the severity of epidemics, but the prediction of viral cross-protection remains an important unsolved problem. For foot-and-mouth disease virus (FMDV) research in particular, improved methods for predicting this cross-protection are critical for predicting the severity of outbreaks within endemic settings where multiple serotypes and subtypes commonly co-circulate, as well as for deciding whether appropriate vaccine(s) exist and how much they could mitigate the effects of any outbreak. To identify antigenic relationships and their predictors, we used linear mixed effects models to account for variation in pairwise cross-neutralization titres using only viral sequences and structural data. We identified those substitutions in surface-exposed structural proteins that are correlates of loss of cross-reactivity. These allowed prediction of both the best vaccine match for any single virus and the breadth of coverage of new vaccine candidates from their capsid sequences as effectively as or better than serology. Sub-sequences chosen by the model-building process all contained sites that are known epitopes on other serotypes. Furthermore, for the SAT1 serotype, for which epitopes have never previously been identified, we provide strong evidence - by controlling for phylogenetic structure - for the presence of three epitopes across a panel of viruses and quantify the relative significance of some individual residues in determining cross-neutralization. Identifying and quantifying the importance of sites that predict viral strain cross-reactivity not just for single viruses but across entire serotypes can help in the design of vaccines with better targeting and broader coverage. These techniques can be generalized to any infectious agents where cross-reactivity assays have been carried out. As the parameterization uses pre-existing datasets, this approach quickly and cheaply increases both our understanding of antigenic relationships and our power to control disease

    Integrated Analysis of Residue Coevolution and Protein Structure in ABC Transporters

    Get PDF
    Intraprotein side chain contacts can couple the evolutionary process of amino acid substitution at one position to that at another. This coupling, known as residue coevolution, may vary in strength. Conserved contacts thus not only define 3-dimensional protein structure, but also indicate which residue-residue interactions are crucial to a protein’s function. Therefore, prediction of strongly coevolving residue-pairs helps clarify molecular mechanisms underlying function. Previously, various coevolution detectors have been employed separately to predict these pairs purely from multiple sequence alignments, while disregarding available structural information. This study introduces an integrative framework that improves the accuracy of such predictions, relative to previous approaches, by combining multiple coevolution detectors and incorporating structural contact information. This framework is applied to the ABC-B and ABC-C transporter families, which include the drug exporter P-glycoprotein involved in multidrug resistance of cancer cells, as well as the CFTR chloride channel linked to cystic fibrosis disease. The predicted coevolving pairs are further analyzed based on conformational changes inferred from outward- and inward-facing transporter structures. The analysis suggests that some pairs coevolved to directly regulate conformational changes of the alternating-access transport mechanism, while others to stabilize rigid-body-like components of the protein structure. Moreover, some identified pairs correspond to residues previously implicated in cystic fibrosis

    Prevalence of Epistasis in the Evolution of Influenza A Surface Proteins

    Get PDF
    The surface proteins of human influenza A viruses experience positive selection to escape both human immunity and, more recently, antiviral drug treatments. In bacteria and viruses, immune-escape and drug-resistant phenotypes often appear through a combination of several mutations that have epistatic effects on pathogen fitness. However, the extent and structure of epistasis in influenza viral proteins have not been systematically investigated. Here, we develop a novel statistical method to detect positive epistasis between pairs of sites in a protein, based on the observed temporal patterns of sequence evolution. The method rests on the simple idea that a substitution at one site should rapidly follow a substitution at another site if the sites are positively epistatic. We apply this method to the surface proteins hemagglutinin and neuraminidase of influenza A virus subtypes H3N2 and H1N1. Compared to a non-epistatic null distribution, we detect substantial amounts of epistasis and determine the identities of putatively epistatic pairs of sites. In particular, using sequence data alone, our method identifies epistatic interactions between specific sites in neuraminidase that have recently been demonstrated, in vitro, to confer resistance to the drug oseltamivir; these epistatic interactions are responsible for widespread drug resistance among H1N1 viruses circulating today. This experimental validation demonstrates the predictive power of our method to identify epistatic sites of importance for viral adaptation and public health. We conclude that epistasis plays a large role in shaping the molecular evolution of influenza viruses. In particular, sites with , which would normally not be identified as positively selected, can facilitate viral adaptation through epistatic interactions with their partner sites. The knowledge of specific interactions among sites in influenza proteins may help us to predict the course of antigenic evolution and, consequently, to select more appropriate vaccines and drugs

    Modelling the Evolution and Spread of HIV Immune Escape Mutants

    Get PDF
    During infection with human immunodeficiency virus (HIV), immune pressure from cytotoxic T-lymphocytes (CTLs) selects for viral mutants that confer escape from CTL recognition. These escape variants can be transmitted between individuals where, depending upon their cost to viral fitness and the CTL responses made by the recipient, they may revert. The rates of within-host evolution and their concordant impact upon the rate of spread of escape mutants at the population level are uncertain. Here we present a mathematical model of within-host evolution of escape mutants, transmission of these variants between hosts and subsequent reversion in new hosts. The model is an extension of the well-known SI model of disease transmission and includes three further parameters that describe host immunogenetic heterogeneity and rates of within host viral evolution. We use the model to explain why some escape mutants appear to have stable prevalence whilst others are spreading through the population. Further, we use it to compare diverse datasets on CTL escape, highlighting where different sources agree or disagree on within-host evolutionary rates. The several dozen CTL epitopes we survey from HIV-1 gag, RT and nef reveal a relatively sedate rate of evolution with average rates of escape measured in years and reversion in decades. For many epitopes in HIV, occasional rapid within-host evolution is not reflected in fast evolution at the population level

    Phylodynamic Reconstruction Reveals Norovirus GII.4 Epidemic Expansions and their Molecular Determinants

    Get PDF
    Noroviruses are the most common cause of viral gastroenteritis. An increase in the number of globally reported norovirus outbreaks was seen the past decade, especially for outbreaks caused by successive genogroup II genotype 4 (GII.4) variants. Whether this observed increase was due to an upswing in the number of infections, or to a surveillance artifact caused by heightened awareness and concomitant improved reporting, remained unclear. Therefore, we set out to study the population structure and changes thereof of GII.4 strains detected through systematic outbreak surveillance since the early 1990s. We collected 1383 partial polymerase and 194 full capsid GII.4 sequences. A Bayesian MCMC coalescent analysis revealed an increase in the number of GII.4 infections during the last decade. The GII.4 strains included in our analyses evolved at a rate of 4.3–9.0×10−3 mutations per site per year, and share a most recent common ancestor in the early 1980s. Determinants of adaptation in the capsid protein were studied using different maximum likelihood approaches to identify sites subject to diversifying or directional selection and sites that co-evolved. While a number of the computationally determined adaptively evolving sites were on the surface of the capsid and possible subject to immune selection, we also detected sites that were subject to constrained or compensatory evolution due to secondary RNA structures, relevant in virus-replication. We highlight codons that may prove useful in identifying emerging novel variants, and, using these, indicate that the novel 2008 variant is more likely to cause a future epidemic than the 2007 variant. While norovirus infections are generally mild and self-limiting, more severe outcomes of infection frequently occur in elderly and immunocompromized people, and no treatment is available. The observed pattern of continually emerging novel variants of GII.4, causing elevated numbers of infections, is therefore a cause for concern

    Correlated Evolution of Nearby Residues in Drosophilid Proteins

    Get PDF
    Here we investigate the correlations between coding sequence substitutions as a function of their separation along the protein sequence. We consider both substitutions between the reference genomes of several Drosophilids as well as polymorphisms in a population sample of Zimbabwean Drosophila melanogaster. We find that amino acid substitutions are “clustered” along the protein sequence, that is, the frequency of additional substitutions is strongly enhanced within ≈10 residues of a first such substitution. No such clustering is observed for synonymous substitutions, supporting a “correlation length” associated with selection on proteins as the causative mechanism. Clustering is stronger between substitutions that arose in the same lineage than it is between substitutions that arose in different lineages. We consider several possible origins of clustering, concluding that epistasis (interactions between amino acids within a protein that affect function) and positional heterogeneity in the strength of purifying selection are primarily responsible. The role of epistasis is directly supported by the tendency of nearby substitutions that arose on the same lineage to preserve the total charge of the residues within the correlation length and by the preferential cosegregation of neighboring derived alleles in our population sample. We interpret the observed length scale of clustering as a statistical reflection of the functional locality (or modularity) of proteins: amino acids that are near each other on the protein backbone are more likely to contribute to, and collaborate toward, a common subfunction

    Inference of Co-Evolving Site Pairs: an Excellent Predictor of Contact Residue Pairs in Protein 3D structures

    Get PDF
    Residue-residue interactions that fold a protein into a unique three-dimensional structure and make it play a specific function impose structural and functional constraints on each residue site. Selective constraints on residue sites are recorded in amino acid orders in homologous sequences and also in the evolutionary trace of amino acid substitutions. A challenge is to extract direct dependences between residue sites by removing indirect dependences through other residues within a protein or even through other molecules. Recent attempts of disentangling direct from indirect dependences of amino acid types between residue positions in multiple sequence alignments have revealed that the strength of inferred residue pair couplings is an excellent predictor of residue-residue proximity in folded structures. Here, we report an alternative attempt of inferring co-evolving site pairs from concurrent and compensatory substitutions between sites in each branch of a phylogenetic tree. First, branch lengths of a phylogenetic tree inferred by the neighbor-joining method are optimized as well as other parameters by maximizing a likelihood of the tree in a mechanistic codon substitution model. Mean changes of quantities, which are characteristic of concurrent and compensatory substitutions, accompanied by substitutions at each site in each branch of the tree are estimated with the likelihood of each substitution. Partial correlation coefficients of the characteristic changes along branches between sites are calculated and used to rank co-evolving site pairs. Accuracy of contact prediction based on the present co-evolution score is comparable to that achieved by a maximum entropy model of protein sequences for 15 protein families taken from the Pfam release 26.0. Besides, this excellent accuracy indicates that compensatory substitutions are significant in protein evolution.Comment: 17 pages, 4 figures, and 4 tables with supplementary information of 5 figure
    corecore