51 research outputs found

    AI is a viable alternative to high throughput screening: a 318-target study

    High throughput screening (HTS) is routinely used to identify bioactive small molecules. This requires physical compounds, which limits coverage of accessible chemical space. Computational approaches combined with vast on-demand chemical libraries can access far greater chemical space, provided that the predictive accuracy is sufficient to identify useful molecules. Through the largest and most diverse virtual HTS campaign reported to date, comprising 318 individual projects, we demonstrate that our AtomNet® convolutional neural network successfully finds novel hits across every major therapeutic area and protein class. We address historical limitations of computational screening by demonstrating success for target proteins without known binders, high-quality X-ray crystal structures, or manual cherry-picking of compounds. We show that the molecules selected by the AtomNet® model are novel drug-like scaffolds rather than minor modifications to known bioactive compounds. Our empirical results suggest that computational methods can substantially replace HTS as the first step of small-molecule drug discovery.

    Monofluoroacetate poisoning in animals


    Biologically inspired de novo protein structure prediction

    Fragment-based approaches are the current standard for de novo protein structure prediction. These approaches rely on accurate and reliable fragment libraries to generate good structural models. We demonstrate that fragments presenting different predominant predicted secondary structures should be treated differently during fragment library generation. Using this information, we developed Flib and show that it generates fragment libraries with higher precision and coverage than two other methods. We explored co-evolution to identify pairs of residues that are in contact, which were then used to improve model generation. We performed a comparative analysis of nine methods in terms of their precision and their usefulness to de novo structure prediction. Our results show that metaPSICOV stage 2 produces the most accurate predictions and that metaPSICOV stage 1 generates the best modelling results. In general, contact predictors are good at identifying contacts between β-strands and poor at identifying contacts between α-helices. We also show that the ratio of satisfied predicted contacts can be used to assess whether correct models were generated for a given target. We also investigated whether the biological process of cotranslational protein folding, the notion that proteins fold as they are being synthesized, can be used to improve de novo protein structure prediction. Our tool for this investigation is SAINT2. SAINT2 differs from conventional fragment-assembly approaches in that it can perform predictions sequentially from the N- to the C-terminus, starting with a small peptide that is extended as the simulation progresses (SAINT2 Cotranslational). SAINT2 is also able to generate decoys in a standard non-sequential fashion (SAINT2 In Vitro). We compared SAINT2 Cotranslational to SAINT2 In Vitro and show that SAINT2 Cotranslational generally produces better answers, generating an individual decoy between 1.5 and 2.5 times faster than SAINT2 In Vitro.
Our results suggest that biologically inspired structure prediction can improve search heuristics and final model quality.
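The abstract above mentions using the ratio of satisfied predicted contacts to judge whether a model is likely correct. The following is only a minimal sketch of how such a ratio could be computed, not the authors' implementation. The function name `satisfied_contact_ratio`, the coordinate dictionary, and the 8 Å distance cutoff are all assumptions made for illustration; the abstract does not specify them.

```python
import math

def satisfied_contact_ratio(coords, predicted_contacts, cutoff=8.0):
    """Return the fraction of predicted contacts that the model satisfies.

    coords: dict mapping residue index -> (x, y, z) of one representative atom.
    predicted_contacts: list of (i, j) residue index pairs.
    cutoff: distance in angstroms below which a pair counts as in contact
            (8.0 is an assumed value, not taken from the abstract).
    """
    if not predicted_contacts:
        return 0.0
    satisfied = 0
    for i, j in predicted_contacts:
        # A predicted contact is satisfied if the two residues are close enough.
        if math.dist(coords[i], coords[j]) <= cutoff:
            satisfied += 1
    return satisfied / len(predicted_contacts)

# Toy model (hypothetical data): four residues on a line, 5 angstroms apart.
model = {
    0: (0.0, 0.0, 0.0),
    1: (5.0, 0.0, 0.0),
    2: (10.0, 0.0, 0.0),
    3: (15.0, 0.0, 0.0),
}
predicted = [(0, 1), (0, 2), (0, 3), (1, 3)]
ratio = satisfied_contact_ratio(model, predicted)
print(ratio)
```

In this toy case only the pair (0, 1) falls within the cutoff, so one of the four predicted contacts is satisfied. A higher ratio across a set of decoys would, under the idea described in the abstract, suggest that correct models are more likely to be present.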

    RFQAmodel: Random Forest Quality Assessment to identify a predicted protein structure in the correct fold.

    While template-free protein structure prediction protocols now produce good quality models for many targets, modelling failure remains common. For these methods to be useful it is important that users can both choose the best model from the hundreds to thousands of models that are commonly generated for a target, and determine whether this model is likely to be correct. We have developed Random Forest Quality Assessment (RFQAmodel), which assesses whether models produced by a protein structure prediction pipeline have the correct fold. RFQAmodel uses a combination of existing quality assessment scores with two predicted contact map alignment scores. These alignment scores are able to identify correct models for targets that are not otherwise captured. Our classifier was trained on a large set of protein domains that are structurally diverse and evenly balanced in terms of protein features known to have an effect on modelling success, and then tested on a second set of 244 protein domains with a similar spread of properties. When models for each target in this second set were ranked according to the RFQAmodel score, the highest-ranking model had a high-confidence RFQAmodel score for 67 modelling targets, of which 52 had the correct fold. At the other end of the scale RFQAmodel correctly predicted that for 59 targets the highest-ranked model was incorrect. In comparisons to other methods we found that RFQAmodel is better able to identify correct models for targets where only a few of the models are correct. We found that RFQAmodel achieved a similar performance on the model sets for CASP12 and CASP13 free-modelling targets. Finally, by iteratively generating models and running RFQAmodel until a model is produced that is predicted to be correct with high confidence, we demonstrate how such a protocol can be used to focus computational efforts on difficult modelling targets. 
RFQAmodel and the accompanying data can be downloaded from http://opig.stats.ox.ac.uk/resources
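The iterative protocol described at the end of the abstract above (generate models, score them, stop once one is predicted correct with high confidence) can be sketched roughly as follows. This is an illustrative loop only, not RFQAmodel's code: `generate_until_confident`, `generate_batch`, `score_model`, the 0.5 threshold, and the round limit are hypothetical names and values chosen for the example.

```python
def generate_until_confident(generate_batch, score_model, threshold=0.5, max_rounds=10):
    """Generate batches of models until the best score reaches the threshold.

    generate_batch: callable returning an iterable of new models (hypothetical).
    score_model: callable mapping a model to a confidence score (hypothetical).
    threshold: assumed high-confidence cutoff; the abstract gives no value.
    Returns (best_model, best_score, rounds_used).
    """
    best_model, best_score = None, float("-inf")
    for round_number in range(1, max_rounds + 1):
        for model in generate_batch():
            score = score_model(model)
            # Keep track of the highest-scoring model seen so far.
            if score > best_score:
                best_model, best_score = model, score
        # Stop early once a model is predicted correct with high confidence.
        if best_score >= threshold:
            return best_model, best_score, round_number
    return best_model, best_score, max_rounds

# Toy usage: each "model" is simply its own score (made-up numbers).
batches = iter([[0.1, 0.2], [0.3, 0.7], [0.9]])
result = generate_until_confident(lambda: next(batches), lambda m: m)
print(result)
```

With these made-up batches the loop stops after the second round, because 0.7 is the first score at or above the assumed threshold. Targets that never reach the threshold within `max_rounds` would be the difficult ones on which further computational effort could be focused.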

    STCRDab: the structural T-cell receptor database

    The Structural T-cell Receptor Database (STCRDab; http://opig.stats.ox.ac.uk/webapps/stcrdab) is an online resource that automatically collects and curates TCR structural data from the Protein Data Bank. For each entry, the database provides annotations, such as the α/β or γ/δ chain pairings, major histocompatibility complex details, and where available, antigen binding affinities. In addition, the orientation between the variable domains and the canonical forms of the complementarity-determining region loops are also provided. Users can select, view, and download individual or bulk sets of structures based on these criteria. Where available, STCRDab also finds antibody structures that are similar to TCRs, helping users explore the relationship between TCRs and antibodies.