3 research outputs found

    Improving the performance of DomainDiscovery of protein domain boundary assignment using inter-domain linker index

    BACKGROUND: Knowledge of protein domain boundaries is critical for the characterisation and understanding of protein function. The ability to identify domains without knowledge of the structure, using sequence information only, is an essential step in many types of protein analyses. In the present study, we demonstrate that the performance of DomainDiscovery is improved significantly by including the inter-domain linker index value for domain identification from sequence-based information. Improved DomainDiscovery uses a Support Vector Machine (SVM) approach and a unique training dataset built on the principle of consensus among experts in defining domains in protein structure. The SVM was trained using a PSSM (Position Specific Scoring Matrix), secondary structure, solvent accessibility information and the inter-domain linker index to detect possible domain boundaries for a target sequence. RESULTS: Improved DomainDiscovery is compared with other methods by benchmarking against a structurally non-redundant dataset and also CASP5 targets. Improved DomainDiscovery achieves 70% accuracy for domain boundary identification in multi-domain proteins. CONCLUSION: Improved DomainDiscovery compares favourably to the performance of other methods and excels in the identification of domain boundaries for multi-domain proteins as a result of introducing a support vector machine with the benchmark_2 dataset.
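The abstract above does not state how the per-residue features (PSSM columns, secondary structure, solvent accessibility, linker index) are presented to the SVM. A common arrangement for this kind of sequence-based classifier is a sliding window centred on each residue, and the following is a minimal sketch under that assumption. The function names, the zero-padding at the termini, and the `tolerance` parameter are all hypothetical and are not taken from DomainDiscovery.

```python
from typing import List, Sequence


def window_features(per_residue: Sequence[Sequence[float]],
                    half_width: int) -> List[List[float]]:
    """Build one feature row per residue from a sliding window.

    per_residue[i] is the feature vector of residue i (for example PSSM
    scores followed by secondary-structure, accessibility and linker-index
    values). The row for residue i concatenates the vectors of residues
    i - half_width .. i + half_width. Positions beyond either terminus are
    zero-padded (an assumption of this sketch).
    """
    n = len(per_residue)
    dim = len(per_residue[0]) if n else 0
    pad = [0.0] * dim
    rows = []
    for i in range(n):
        row: List[float] = []
        for j in range(i - half_width, i + half_width + 1):
            row.extend(per_residue[j] if 0 <= j < n else pad)
        rows.append(row)
    return rows


def boundary_labels(length: int, boundaries: Sequence[int],
                    tolerance: int) -> List[int]:
    """Label residue i as 1 if it lies within `tolerance` residues of an
    annotated domain boundary, otherwise 0."""
    return [
        1 if any(abs(i - b) <= tolerance for b in boundaries) else 0
        for i in range(length)
    ]
```

The resulting rows and labels could then be passed to any SVM implementation for training. The window width and the boundary tolerance would need to be tuned on held-out data.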

    Identifying foldable regions in protein sequence from the hydrophobic signal

    Structural genomics initiatives aim to elucidate representative 3D structures for the majority of protein families over the next decade, but many obstacles must be overcome. The correct design of constructs is extremely important, since many proteins will be too large or contain unstructured regions and will not be amenable to crystallization. It is therefore essential to identify regions in protein sequences that are likely to be suitable for structural study. Scooby-Domain is a fast and simple method to identify globular domains in protein sequences. Domains are compact units of protein structure, and their correct delineation will aid structural elucidation through a divide-and-conquer approach. Scooby-Domain predictions are based on the observed lengths and hydrophobicities of domains from proteins with known tertiary structure. The prediction method employs an A*-search to identify sequence regions that form a globular structure and those that are unstructured. On a test set of 173 proteins with consensus CATH and SCOP domain definitions, Scooby-Domain has a sensitivity of 50% and an accuracy of 29%, which is better than current state-of-the-art methods. The method does not rely on homology searches and can therefore identify previously unknown domains.
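The "hydrophobic signal" in the abstract above can be illustrated by averaging a per-residue hydrophobicity scale over windows of different lengths. The sketch below shows only that windowed signal. It does not implement the A*-search or the length and hydrophobicity statistics of Scooby-Domain, and the choice of the Kyte-Doolittle scale is an assumption made for illustration, since the abstract does not name a scale.

```python
from typing import Dict, List

# Kyte-Doolittle hydropathy values (assumed scale for this sketch).
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydrophobicity(sequence: str, window: int) -> List[float]:
    """Mean hydrophobicity of every window of the given length.

    Element s of the result covers residues s .. s + window - 1.
    Raises KeyError for residues outside the 20 standard amino acids.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in sequence]
    # Prefix sums let each window mean be computed in constant time.
    prefix = [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
    return [
        (prefix[s + window] - prefix[s]) / window
        for s in range(len(values) - window + 1)
    ]


def hydrophobicity_profile(sequence: str, min_window: int,
                           max_window: int) -> Dict[int, List[float]]:
    """Windowed means for every window length from min_window up to
    max_window, capped at the sequence length."""
    top = min(max_window, len(sequence))
    return {w: mean_hydrophobicity(sequence, w)
            for w in range(min_window, top + 1)}
```

A search procedure could then score candidate regions by comparing each window's length and mean hydrophobicity against values observed for known domains.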

    Ab initio methods for protein structure prediction

    Recent breakthroughs in DNA and protein sequencing have unlocked many secrets of molecular biology. A complete understanding of gene function, however, requires a protein structure in addition to its sequence. Modern protein structure determination methods such as NMR, cryo-EM and X-ray crystallography are unable to keep pace with automated sequencing techniques, creating a serious gap between available sequences and structures. This thesis describes several ab initio computational methods designed, in the near term, to facilitate structure determination experiments and, in the long term, to predict protein structure completely and reliably. First, VecFold is a novel method for predicting the global tertiary structure topologies of proteins. VecFold applies fragment assembly to construct structural models from a target sequence by folding a chain of predicted secondary structure elements; these elements are represented either as Cα-based rigid bodies or as vectors. The knowledge-based energy function OPUS-Ca or a knowledge-based geometric packing potential is used to guide the folding process. The newest version of VecFold is demonstrated to modestly outperform Rosetta, one of the leading ab initio predictors, on the CASP8 benchmark set. In our protein domain boundary prediction method OPUS-Dom, VecFold generates a large ensemble of folded structure models, and the domain boundaries of each model are labeled by a domain parsing algorithm. OPUS-Dom then derives consensus domain boundaries from the statistical distribution of the putative boundaries; the original version is also aided by three empirical sequence-based domain profiles. The latest version of OPUS-Dom outperformed, in terms of prediction sensitivity, several state-of-the-art domain prediction algorithms over various multi-domain protein sets. Even though many VecFold-generated structures contain large errors, collectively these structures provide a more robust delineation of domain boundaries. The success of OPUS-Dom suggests that the arrangement of protein domains is more a consequence of limited coordination patterns per domain arising from tertiary packing of secondary structure segments than of sequence-specific constraints. Finally, the knowledge-based energy function OPUS-Core was applied to the problem of protein folding core prediction, and it was shown to outperform two leading computational methods on a benchmark set of 29 well-characterized protein targets.
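The consensus step described for OPUS-Dom (deriving boundaries from the distribution of putative boundaries across an ensemble of models) can be illustrated with a simple voting scheme. The sketch below is one plausible way to pool per-model boundaries and is not the OPUS-Dom algorithm itself. The function name and the parameters `smooth`, `min_fraction` and `min_separation` are hypothetical, and the fragment assembly and domain parsing that produce the per-model boundaries are not reproduced.

```python
from typing import List, Sequence


def consensus_boundaries(model_boundaries: Sequence[Sequence[int]],
                         length: int,
                         smooth: int = 5,
                         min_fraction: float = 0.3,
                         min_separation: int = 20) -> List[int]:
    """Pool putative boundaries from many models into consensus positions.

    model_boundaries[k] lists the boundary residues assigned to model k.
    A model supports position i if it has a boundary within `smooth`
    residues of i, and contributes a weight that decays linearly with
    distance. Positions supported by at least `min_fraction` of models are
    candidates; the highest-scoring candidates are kept greedily, skipping
    any closer than `min_separation` residues to one already chosen.
    """
    n_models = len(model_boundaries)
    if n_models == 0:
        return []
    scored = []
    for i in range(length):
        support = 0
        score = 0
        for boundaries in model_boundaries:
            dists = [abs(i - b) for b in boundaries if abs(i - b) <= smooth]
            if dists:
                support += 1
                score += smooth + 1 - min(dists)
        if support / n_models >= min_fraction:
            scored.append((score, i))
    # Highest score first; ties are broken by the lower residue index.
    scored.sort(key=lambda t: (-t[0], t[1]))
    chosen: List[int] = []
    for _, i in scored:
        if all(abs(i - c) >= min_separation for c in chosen):
            chosen.append(i)
    return sorted(chosen)
```

Because each model only casts a vote, an individual model with a badly misplaced boundary has little effect unless enough other models agree with it, which is consistent with the abstract's observation that error-prone models can still give a robust boundary collectively.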