    The enzymatic nature of an anonymous protein sequence cannot reliably be inferred from superfamily level structural information alone

    As the largest fraction of any proteome does not carry out enzymatic functions, and in order to leverage 3D structural data for the annotation of ever-increasing volumes of sequence data, we wanted to assess the strength of the link between coarse-grained structural data (i.e., at the homologous superfamily level) and the enzymatic versus non-enzymatic nature of protein sequences. To probe this relationship, we took advantage of 41 phylogenetically diverse genomes (encompassing 11 distinct phyla) recently sequenced within the GEBA initiative, for which we integrated structural information, as defined by CATH, with enzyme-level information, as defined by Enzyme Commission (EC) numbers. This analysis revealed that only a very small fraction (about 1%) of the domain sequences occurring in the analyzed genomes was associated with homologous superfamilies strongly indicative of enzymatic function. Resorting to less stringent criteria to define enzyme-biased versus non-enzyme-biased structural classes, or excluding highly prevalent folds from the analysis, had only a modest effect on this proportion. Thus, the low genomic coverage by structurally anchored protein domains strongly associated with catalytic activities indicates that, on its own, the power of coarse-grained structural information to infer the general property of being an enzyme is rather limited.
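The coverage figure reported in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `enzyme_biased_coverage`, the `threshold` and `min_size` parameters, and the toy superfamily labels are all assumptions made for illustration. The sketch counts, for each superfamily, the share of its domain sequences that carry an EC number, flags superfamilies above a chosen cutoff as enzyme-biased, and reports the fraction of all domain sequences that fall into those superfamilies.

```python
from collections import defaultdict


def enzyme_biased_coverage(domains, threshold=0.9, min_size=5):
    """Return (enzyme-biased superfamilies, fraction of domains they cover).

    domains: iterable of (superfamily_id, has_ec) pairs, one per domain sequence.
    threshold and min_size are hypothetical cutoffs, not values from the paper.
    """
    totals = defaultdict(int)      # domain sequences per superfamily
    enzymatic = defaultdict(int)   # of those, how many carry an EC number
    for superfamily, has_ec in domains:
        totals[superfamily] += 1
        if has_ec:
            enzymatic[superfamily] += 1

    # A superfamily counts as enzyme-biased when it is large enough and
    # nearly all of its members have an EC annotation.
    biased = {
        sf for sf, n in totals.items()
        if n >= min_size and enzymatic[sf] / n >= threshold
    }

    n_domains = sum(totals.values())
    covered = sum(totals[sf] for sf in biased)
    coverage = covered / n_domains if n_domains else 0.0
    return biased, coverage


# Toy data with made-up labels: SF_A is purely enzymatic, SF_B is mixed.
toy = (
    [("SF_A", True)] * 10
    + [("SF_B", True)] * 30
    + [("SF_B", False)] * 60
)
biased, coverage = enzyme_biased_coverage(toy)
print(sorted(biased), round(coverage, 3))
```

Lowering `threshold` corresponds to the "less stringent criteria" mentioned in the abstract, and filtering the most populous superfamilies out of `domains` before the call corresponds to excluding highly prevalent folds.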