
    A unified analysis of evolutionary and population constraint in protein domains highlights structural features and pathogenic sites

    Protein evolution is constrained by structure and function, creating patterns in residue conservation that are routinely exploited to predict structure and other features. Similar constraints should affect variation across individuals, but it is only with the growth of human population sequencing that this has been tested at scale. Human population constraint now has established applications in pathogenicity prediction, but it has not yet been explored for structural inference. Here, we map 2.4 million population variants to 5885 protein families and quantify residue-level constraint with a new Missense Enrichment Score (MES). Analysis of 61,214 structures from the PDB spanning 3661 families shows that missense-depleted sites are enriched in buried residues or those involved in small-molecule or protein binding. MES is complementary to evolutionary conservation, and a combined analysis allows a new classification of residues according to a conservation plane. This approach finds functional residues that are evolutionarily diverse, which can be related to specificity, as well as family-wide conserved sites that are critical for folding or function. We also find a possible contrast between lethal and non-lethal pathogenic sites, and a surprising clinical variant hot spot at a subset of missense-enriched positions.
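
The abstract above does not state how the Missense Enrichment Score is computed. As a rough illustration of what a residue-level enrichment score can look like, the sketch below computes an odds ratio comparing missense counts at one alignment column with the rest of the alignment. The function name, the 2x2 table layout, and the pseudocount are all assumptions made for illustration and should not be read as the authors' definition.

```python
def missense_odds_ratio(site_missense: int, site_other: int,
                        rest_missense: int, rest_other: int,
                        pseudocount: float = 0.5) -> float:
    """Odds ratio of missense variation at one column versus the rest.

    Hypothetical 2x2 table (an assumption, not taken from the abstract):
        site_missense / site_other : residues with / without a missense
                                     variant at the column of interest
        rest_missense / rest_other : the same counts for all other columns
    A pseudocount (Haldane-Anscombe style) guards against zero cells.
    """
    a = site_missense + pseudocount
    b = site_other + pseudocount
    c = rest_missense + pseudocount
    d = rest_other + pseudocount
    return (a / b) / (c / d)


# Illustrative, made-up counts: a column with few missense variants.
score = missense_odds_ratio(2, 98, 50, 950, pseudocount=0.0)
label = "missense-depleted" if score < 1 else "missense-enriched"
print(round(score, 3), label)
```

Under this assumed definition, values below 1 correspond to missense-depleted positions and values above 1 to missense-enriched positions.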

    Classification of likely functional class for ligand binding sites identified from fragment screening

    Fragment screening is used to identify binding sites and leads in drug discovery, but it is often unclear which binding sites are functionally important. Here, data from 37 experiments and 1309 protein structures binding to 1601 ligands were analysed. A method to group ligands by binding sites is introduced, and sites are clustered according to profiles of relative solvent accessibility. This identified 293 unique ligand binding sites, grouped into four clusters (C1-4). C1 includes larger, buried, conserved, and population missense-depleted sites, enriched in known functional sites. C4 comprises smaller, accessible, divergent, missense-enriched sites, depleted in functional sites. A site in C1 is 28 times more likely to be functional than one in C4. Seventeen sites in 13 proteins, which to the best of our knowledge are novel, are identified as likely to be functionally important, with examples from human tenascin and 5-aminolevulinate synthase highlighted. A multi-layer perceptron and a K-nearest neighbours model are presented to predict cluster labels for ligand binding sites with accuracies of 96% and 100%, respectively, allowing functional classification of sites for proteins not in this set. Our findings will be of interest to those studying protein-ligand interactions and developing new drugs or function modulators.

    Analytical challenges in estimating the effect of exposures that are bounded by follow-up time: experiences from the Blood Stream Infection—Focus on Outcomes study

    Objective: To illustrate the challenges of estimating the effect of an exposure that is bounded by duration of follow-up on all-cause 28-day mortality, whilst simultaneously addressing missing data and time-varying covariates. Study design and methods: BSI-FOO is a multicentre cohort study with the primary aim of quantifying the effect of modifiable risk factors, including time to initiation of therapy, on all-cause 28-day mortality in patients with bloodstream infection. The primary analysis involved two Cox proportional hazards models, the first for non-modifiable risk factors and the second for modifiable risk factors, with a risk score calculated from the first model included as a covariate in the second. Modifiable risk factors considered in this study were recorded daily for a maximum of 28 days after infection. Follow-up was split at daily intervals from day 0 to 28, with values of daily collected data updated at each interval (i.e., one row per patient per day). Analytical challenges: Estimating the effect of time to initiation of treatment on survival is analytically challenging, since only those who survive to time t can wait until time t to start treatment, which introduces immortal time bias. Time-varying covariates representing cumulative counts were used for variables bounded by survival time, e.g. the cumulative count of days before first receipt of treatment. Multiple imputation using chained equations was used to impute missing data, with conditional imputation to avoid imputing non-applicable data, e.g. ward data after discharge. Conclusion: Using time-varying covariates represented by cumulative counts within a one-row-per-patient-per-day framework can reduce the risk of bias in effect estimates. The approach uses established methodology and is easily implemented in standard statistical packages.
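
The one-row-per-patient-per-day layout described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: the `Patient` fields, the column names, and the convention that an untreated day counts towards the cumulative total on that same day are all assumptions. The resulting rows are in the counting-process form (start, stop, event) that standard survival packages accept for time-varying covariates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    # All field names are hypothetical, chosen for illustration only.
    patient_id: str
    last_day: int                  # last day of follow-up (0 to 28)
    died: bool                     # True if the patient died on last_day
    treatment_day: Optional[int]   # day therapy started; None if never


def expand_to_person_days(p: Patient, max_day: int = 28) -> list:
    """Split follow-up into daily rows with a cumulative untreated-day count."""
    rows = []
    days_without_treatment = 0
    final_day = min(p.last_day, max_day)
    for day in range(final_day + 1):
        treated = p.treatment_day is not None and day >= p.treatment_day
        if not treated:
            # The count can only grow while the patient is alive and
            # untreated, so it never uses information from the future.
            days_without_treatment += 1
        rows.append({
            "patient_id": p.patient_id,
            "start": day,
            "stop": day + 1,
            "treated": int(treated),
            "days_without_treatment": days_without_treatment,
            "event": int(p.died and day == final_day),
        })
    return rows


rows = expand_to_person_days(Patient("A", last_day=4, died=True, treatment_day=2))
for r in rows:
    print(r)
```

Because each row only reflects what was known up to that day, a patient who dies early cannot be credited with a long wait for treatment, which is the mechanism by which this layout limits immortal time bias.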
