14 research outputs found

    FLORA: a novel method to predict protein function from structure in diverse superfamilies

    Predicting protein function from structure remains an active area of interest, particularly for the structural genomics initiatives where a substantial number of structures are initially solved with little or no functional characterisation. Although global structure comparison methods can be used to transfer functional annotations, the relationship between fold and function is complex, particularly in functionally diverse superfamilies that have evolved through different secondary structure embellishments to a common structural core. The majority of prediction algorithms employ local templates built on known or predicted functional residues. Here, we present a novel method (FLORA) that automatically generates structural motifs associated with different functional sub-families (FSGs) within functionally diverse domain superfamilies. Templates are created purely on the basis of their specificity for a given FSG, and the method neither makes any prior prediction of functional sites nor assumes specific physico-chemical properties of residues. FLORA is able to accurately discriminate between homologous domains with different functions and substantially outperforms (a 2–3-fold increase in coverage at low error rates) popular structure comparison methods and a leading function prediction method. We benchmark FLORA on a large data set of enzyme superfamilies from all three major protein classes (α, β, αβ) and demonstrate the functional relevance of the motifs it identifies. We also provide novel predictions of enzymatic activity for a large number of structures solved by the Protein Structure Initiative. Overall, we show that FLORA is able to effectively detect functionally similar protein domain structures purely by using patterns of structural conservation of all residues.

    Protein function prediction from structure in structural genomics and its contribution to the study of health and disease

    The various structural genomics projects throughout the globe have developed high-throughput protein structure determination pipelines, which have been responsible for the deposition of a vast number of protein structures. As a consequence of the need for rapid data release and their target selection strategy, these projects have deposited a large number of proteins with little or no functional information. As the experimental characterization of protein function is expensive and time-consuming, the bioinformatics community was prompted to address the problem of protein function prediction from sequence and structure. Over the years, many methods have been developed, with varying degrees of success. Here we discuss the main types of approach and the problems faced and, with examples from the Midwest Center for Structural Genomics (MCSG), illustrate how these structures and the techniques developed can have a significant impact on the study of health and disease.