Classification-driven and structure-assisted rule based annotation of protein function

Abstract

The Protein Information Resource (PIR) classifies protein sequences into homeomorphic and monophyletic PIRSF families for the functional annotation of proteins. PIRSF is a network classification system based on the evolutionary relationships of whole proteins. Position-specific feature rules for annotating and propagating functional sites, active sites, and binding sites are being developed from manually curated multiple sequence alignments and Hidden Markov Models (HMMs) of homeomorphic families and subfamilies, starting with those that contain at least one known 3D structure with experimentally verified site information. Active site information is taken from PDB SITE records, the LIGPLOT interaction diagrams available in the PDBsum database, and the published scientific literature. We have developed a Rulebase curation interface that maps PIRSF families to protein structures based on sequence identity. Curation involves manual rule definition with visualization of sequences and structures, followed by computational propagation of site features along with evidence attribution. In our approach, a rule is an HMM built from the conserved regions containing the active site residues in the multiple sequence alignment of a PIRSF family. This classification-driven, rule-based approach to propagating functional features across the large body of protein sequences should facilitate functional proteomics and genomic research.
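
The propagation step described above can be illustrated with a minimal sketch. It is not the PIR implementation: it assumes a plain pairwise alignment between a structure-backed template and a family member, whereas the abstract describes rules built as HMMs over conserved regions. The function name `propagate_sites`, the `sites` dictionary layout, and the example sequences are all hypothetical.

```python
def propagate_sites(template_aln, target_aln, sites):
    """Map template site positions onto an aligned target sequence.

    template_aln, target_aln: equal-length aligned strings, '-' marks a gap.
    sites: dict mapping a 1-based position in the ungapped template to the
           set of residues accepted at that site (hypothetical rule format).
    Returns a list of (template_pos, target_pos, residue) for every site
    whose aligned target residue is in the accepted set. Sites aligned to
    a gap, or to a residue outside the set, are not propagated.
    """
    if len(template_aln) != len(target_aln):
        raise ValueError("aligned sequences must have equal length")

    hits = []
    t_pos = 0  # 1-based position in the ungapped template
    q_pos = 0  # 1-based position in the ungapped target
    for t_res, q_res in zip(template_aln, target_aln):
        if t_res != "-":
            t_pos += 1
        if q_res != "-":
            q_pos += 1
        if t_res == "-" or q_res == "-":
            continue  # no residue-to-residue correspondence in this column
        allowed = sites.get(t_pos)
        if allowed is not None and q_res in allowed:
            hits.append((t_pos, q_pos, q_res))
    return hits


# Made-up sequences: template sites at positions 3 (His), 5 (Ser or Cys), 6 (Gly).
example_sites = {3: {"H"}, 5: {"S", "C"}, 6: {"G"}}
hits = propagate_sites("MK-HDSGA", "MRAHD-GA", example_sites)
# A stricter rule could require every site to match before annotating anything.
all_sites_matched = len(hits) == len(example_sites)
```

In this example the site at template position 5 falls opposite a gap in the target, so only two of the three sites are transferred; a curator-defined rule could treat that as a failed match and withhold the annotation.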
