12 research outputs found

    Automatic structure classification of small proteins using random forest

    Background: Random forest, an ensemble-based supervised machine learning algorithm, is used to predict the SCOP structural classification for a target structure, based on the similarity of its structural descriptors to those of a template structure with an equal number of secondary structure elements (SSEs). An initial assessment of random forest is carried out for domains consisting of three SSEs. The usability of random forest in classifying larger domains is demonstrated by applying it to domains consisting of four, five and six SSEs. Results: Random forest, trained on SCOP version 1.69, achieves a predictive accuracy of up to 94% on an independent and non-overlapping test set derived from SCOP version 1.73. For classification to the SCOP Class, Fold, Super-family or Family levels, the predictive quality of the model in terms of the Matthews correlation coefficient (MCC) ranged from 0.61 to 0.83. As the number of constituent SSEs increases, the MCC for classification to different structural levels decreases. Conclusions: The utility of random forest in classifying domains from the place-holder classes of SCOP to the true Class, Fold, Super-family or Family levels is demonstrated. Issues such as the introduction of a new structural level in SCOP and the merger of singleton levels can also be addressed using random forest. A real-world scenario is mimicked by predicting the classification for those protein structures from the PDB that are yet to be assigned to the SCOP classification hierarchy.

    Characterization of a Gene Family Encoding SEA (Sea-urchin Sperm Protein, Enterokinase and Agrin)-Domain Proteins with Lectin-Like and Heme-Binding Properties from Schistosoma japonicum

    Background: We previously identified a novel gene family dispersed in the genome of Schistosoma japonicum by a retrotransposon-mediated gene duplication mechanism. Although many transcripts were identified, no homolog was readily identifiable from sequence information. Methodology/Principal Findings: Here, we utilized structural homology modeling and biochemical methods to identify remote homologs, and characterized the gene products as SEA (sea-urchin sperm protein, enterokinase and agrin)-domain containing proteins. A common extracellular domain in this family was structurally similar to the SEA-domain. The SEA-domain is primarily a structural domain, known to assist or regulate binding to glycans. Recombinant proteins from three members of this gene family specifically interacted with glycosaminoglycans with high affinity, with potential implications for ligand acquisition and immune evasion. A similar approach was used to identify a heme-binding site on the SEA-domain. The heme-binding mode showed a heme molecule inserted into a hydrophobic pocket, with the heme iron putatively coordinated to two histidine axial ligands. Heme-binding properties were confirmed using biochemical assays and UV-visible absorption spectroscopy, which showed high-affinity heme binding (KD = 1.605 × 10⁻⁶ M) and cognate spectroscopic attributes of hexa-coordinated heme iron. The native proteins were oligomeric and antigenic, and were localized on the adult worm tegument and gastrodermis, which are major host-parasite interfaces and sites for heme detoxification and acquisition. Conclusions: The results suggest a potential role, at least in the nucleation step of heme crystallization (hemozoin formation), and as receptors for heme uptake. Survival strategies exploited by parasites, including heme homeostasis mechanisms in hemoparasites, are paramount for successful parasitism. Thus, assessing prospects for application in disease intervention is warranted.

    Challenges in Data Life Cycle Management for Sustainable Cyber-Physical Production Systems

    Rapid technological advances present new opportunities to use industrial Big Data to monitor and improve performance more systematically and more holistically. The ongoing fourth industrial revolution, also known as Industrie 4.0, holds the promise of supporting the implementation of sustainability principles in manufacturing. However, many of these opportunities are missed because social and environmental performance are still largely treated as an afterthought or an add-on to business as usual. This paper reviews existing data life cycle models and discusses their usefulness for sustainable manufacturing performance management. Finally, we suggest possible directions for further research to promote more sustainable cyber-physical production systems.