
    Structure-based prediction of protein allostery

    Allostery is the functional change at one site on a protein caused by a change at a distant site. In order for the benefits of allostery to be taken advantage of, both for basic understanding of proteins and to develop new classes of drugs, the structure-based prediction of allosteric binding sites, modulators and communication pathways is necessary. Here we review the recently emerging field of allosteric prediction, focusing mainly on computational methods. We also describe the search for cryptic binding pockets and attempts to design allostery into proteins. The development and adoption of such methods is essential, or the long-preached potential of allostery will remain elusive.

    Protein flexibility, not disorder, is intrinsic to molecular recognition.

    An 'intrinsically disordered protein' (IDP) is assumed to be unfolded in the cell and perform its biological function in that state. We contend that most intrinsically disordered proteins are in fact proteins waiting for a partner (PWPs), parts of a multi-component complex that do not fold correctly in the absence of other components. Flexibility, not disorder, is an intrinsic property of proteins, exemplified by X-ray structures of many enzymes and protein-protein complexes. Disorder is often observed with purified proteins in vitro and sometimes also in crystals, where it is difficult to distinguish from flexibility. In the crowded environment of the cell, disorder is not compatible with the known mechanisms of protein-protein recognition, and, foremost, with its specificity. The self-assembly of multi-component complexes may, nevertheless, involve the specific recognition of nascent polypeptide chains that are incompletely folded, but then disorder is transient, and it must remain under the control of molecular chaperones and of the quality control apparatus that obviates the toxic effects it can have on the cell.

    Protein structure-based evaluation of missense variants: Resources, challenges and future directions.

    We provide an overview of the methods that can be used for protein structure-based evaluation of missense variants. The algorithms can be broadly divided into those that calculate the difference in free energy (ΔΔG) between the wild type and variant structures and those that use structural features to predict the damaging effect of a variant without providing a ΔΔG. A wide range of machine learning approaches have been employed to develop those algorithms. We also discuss challenges and opportunities for variant interpretation in view of the recent breakthrough in three-dimensional structural modelling using deep learning.
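
For reference, the free-energy difference named in the abstract above is commonly written as below. The sign convention is an assumption made here for illustration (variant minus wild type, with each folding free energy taken as folded minus unfolded, so that a positive value indicates a destabilising variant); some tools report the opposite sign.

```latex
\Delta\Delta G
  = \Delta G_{\mathrm{fold}}^{\mathrm{variant}}
  - \Delta G_{\mathrm{fold}}^{\mathrm{wild\ type}},
\qquad
\Delta G_{\mathrm{fold}} = G_{\mathrm{folded}} - G_{\mathrm{unfolded}}
```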

    AlloPred: prediction of allosteric pockets on proteins using normal mode perturbation analysis

    BACKGROUND: Despite being hugely important in biological processes, allostery is poorly understood and no universal mechanism has been discovered. Allosteric drugs are a largely unexplored prospect with many potential advantages over orthosteric drugs. Computational methods to predict allosteric sites on proteins are needed to aid the discovery of allosteric drugs, as well as to advance our fundamental understanding of allostery. RESULTS: AlloPred, a novel method to predict allosteric pockets on proteins, was developed. AlloPred uses perturbation of normal modes alongside pocket descriptors in a machine learning approach that ranks the pockets on a protein. AlloPred ranked an allosteric pocket top for 23 out of 40 known allosteric proteins, showing comparable and complementary performance to two existing methods. In 28 of 40 cases an allosteric pocket was ranked first or second. The AlloPred web server, freely available at http://www.sbg.bio.ic.ac.uk/allopred/home, allows visualisation and analysis of predictions. The source code and dataset information are also available from this site. CONCLUSIONS: Perturbation of normal modes can enhance our ability to predict allosteric sites on proteins. Computational methods such as AlloPred assist drug discovery efforts by suggesting sites on proteins for further experimental study.

    k-SLAM: Accurate and ultra-fast taxonomic classification and gene identification for large metagenomic datasets

    k-SLAM is a highly efficient algorithm for the characterisation of metagenomic data. Unlike other ultra-fast metagenomic classifiers, full sequence alignment is performed, allowing for gene identification and variant calling in addition to accurate taxonomic classification. A k-mer based method provides greater taxonomic accuracy than other classifiers and a three orders of magnitude speed increase over alignment based approaches. The use of alignments to find variants and genes along with their taxonomic origins enables novel strains to be characterised. k-SLAM's speed allows a full taxonomic classification and gene identification to be tractable on modern large datasets. A pseudo-assembly method is used to increase classification accuracy by up to 40% for species which have high sequence homology within their genus.

    Exploring the cellular basis of human disease through a large-scale mapping of deleterious genes to cell types

    Table S5. Disease–cell-type association P values computed using the GSC method. (XLS 686 kb)

    Landscape of pleiotropic proteins causing human disease: structural and system biology insights

    Pleiotropy is the phenomenon by which the same gene can result in multiple phenotypes. Pleiotropic proteins are emerging as important contributors to rare and common disorders. Nevertheless, little is known about the mechanisms underlying pleiotropy and the characteristics of pleiotropic proteins. We analysed disease-causing proteins reported in UniProt and observed that 12% are pleiotropic (variants in the same protein cause more than one disease). Pleiotropic proteins were enriched in deleterious and rare variants, but not in common variants. Pleiotropic proteins were more likely to be involved in the pathogenesis of neoplasms, neurological and circulatory diseases, and congenital malformations, whereas non-pleiotropic proteins were more likely to be involved in endocrine and metabolic disorders. Pleiotropic proteins were more essential and had a higher number of interacting partners compared to non-pleiotropic proteins. Significantly more pleiotropic than non-pleiotropic proteins contained at least one intrinsically long disordered region (p<0.001). Deleterious variants occurring in structurally disordered regions were more commonly found in pleiotropic, rather than non-pleiotropic proteins. In conclusion, pleiotropic proteins are an important contributor to human disease. They represent a biologically different class of proteins compared to non-pleiotropic proteins, and a better understanding of their characteristics and genetic variants can greatly aid in the interpretation of genetic studies and drug design.

    PhenoRank: reducing study bias in gene prioritisation through simulation

    Motivation: Genome-wide association studies have identified thousands of loci associated with human disease, but identifying the causal genes at these loci is often difficult. Several methods prioritise genes most likely to be disease causing through the integration of biological data, including protein-protein interaction and phenotypic data. Data availability is not the same for all genes however, potentially influencing the performance of these methods. Results: We demonstrate that whilst disease genes tend to be associated with greater numbers of data, this may be at least partially a result of them being better studied. With this observation we develop PhenoRank, which prioritises disease genes whilst avoiding being biased towards genes with more available data. Bias is avoided by comparing gene scores generated for the query disease against gene scores generated using simulated sets of phenotype terms, which ensures that differences in data availability do not affect the ranking of genes. We demonstrate that whilst existing prioritisation methods are biased by data availability, PhenoRank is not similarly biased. Avoiding this bias allows PhenoRank to effectively prioritise genes with fewer available data and improves its overall performance. PhenoRank outperforms three available prioritisation methods in cross-validation (PhenoRank area under receiver operating characteristic curve [AUC]=0.89, DADA AUC=0.87, EXOMISER AUC=0.71, PRINCE AUC=0.83, P < 2.2 × 10^-16). Availability: PhenoRank is freely available for download at https://github.com/alexjcornish/PhenoRank. Contact: [email protected]. Supplementary information: Supplementary data are available at Bioinformatics online.
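
The comparison against simulated phenotype-term sets described in the abstract above can be illustrated with a minimal sketch. This is not PhenoRank's code: the empirical p-value formula, the function name `empirical_p_values`, and the toy gene names and scores are all assumptions made for illustration. The idea shown is that a gene with abundant data scores highly for simulated queries too, so a high observed score is unremarkable for it.

```python
import random


def empirical_p_values(observed, simulated):
    """Hypothetical helper: compare each gene's observed score with the
    scores the same gene received for simulated phenotype-term sets.

    observed:  dict mapping gene -> score for the query disease
    simulated: list of dicts, each mapping gene -> score for one simulation
    Returns a dict mapping gene -> empirical p-value (smaller is better).
    """
    n = len(simulated)
    p_values = {}
    for gene, score in observed.items():
        # Count simulations in which this gene scored at least as well.
        exceed = sum(1 for sim in simulated if sim[gene] >= score)
        # Add-one correction keeps the p-value strictly above zero.
        p_values[gene] = (exceed + 1) / (n + 1)
    return p_values


# Toy data (assumed): both genes get the same raw score for the query,
# but GENE_A is data-rich and scores highly in simulations as well.
observed = {"GENE_A": 0.9, "GENE_B": 0.9}
rng = random.Random(0)
simulated = [
    {"GENE_A": rng.uniform(0.8, 1.0), "GENE_B": rng.uniform(0.0, 0.5)}
    for _ in range(999)
]

p_values = empirical_p_values(observed, simulated)
ranking = sorted(p_values, key=p_values.get)
print(ranking)
```

Ranking by the empirical p-value rather than the raw score places GENE_B first in this toy case, because its score is unusual relative to its own simulated distribution.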

    AMBIENT: Active Modules for Bipartite Networks - using high-throughput transcriptomic data to dissect metabolic response

    BACKGROUND: With the continued proliferation of high-throughput biological experiments, there is a pressing need for tools to integrate the data produced in ways that produce biologically meaningful conclusions. Many microarray studies have analysed transcriptomic data from a pathway perspective, for instance by testing for KEGG pathway enrichment in sets of upregulated genes. However, the increasing availability of species-specific metabolic models provides the opportunity to analyse these data in a more objective, system-wide manner. RESULTS: Here we introduce ambient (Active Modules for Bipartite Networks), a simulated annealing approach to the discovery of metabolic subnetworks (modules) that are significantly affected by a given genetic or environmental change. The metabolic modules returned by ambient are connected parts of the bipartite network that change coherently between conditions, providing a more detailed view of metabolic changes than standard approaches based on pathway enrichment. CONCLUSIONS: ambient is an effective and flexible tool for the analysis of high-throughput data in a metabolic context. The same approach can be applied to any system in which reactions (or metabolites) can be assigned a score based on some biological observation, without the limitation of predefined pathways. A Python implementation of ambient is available at http://www.theosysbio.bio.ic.ac.uk/ambient