
    CATHEDRAL: A Fast and Effective Algorithm to Predict Folds and Domain Boundaries from Multidomain Protein Structures

    We present CATHEDRAL, an iterative protocol for determining the location of previously observed protein folds in novel multidomain protein structures. CATHEDRAL builds on the features of a fast secondary-structure–based method (using graph theory) to locate known folds within a multidomain context and a residue-based, double-dynamic programming algorithm, which is used to align members of the target fold groups against the query protein structure to identify the closest relative and assign domain boundaries. To increase the fidelity of the assignments, a support vector machine is used to provide an optimal scoring scheme. Once a domain is verified, it is excised, and the search protocol is repeated in an iterative fashion until all recognisable domains have been identified. We have performed an initial benchmark of CATHEDRAL against other publicly available structure comparison methods using a consensus dataset of domains derived from the CATH and SCOP domain classifications. CATHEDRAL shows superior performance in fold recognition and alignment accuracy when compared with many equivalent methods. If a novel multidomain structure contains a known fold, CATHEDRAL will locate it in 90% of cases, with <1% false positives. For nearly 80% of assigned domains in a manually validated test set, the boundaries were correctly delineated within a tolerance of ten residues. For the remaining cases, previously classified domains were very remotely related to the query chain so that embellishments to the core of the fold caused significant differences in domain sizes and manual refinement of the boundaries was necessary. To put this performance in context, a well-established sequence method based on hidden Markov models was only able to detect 65% of domains, with 33% of the subsequent boundaries assigned within ten residues. 
Since, on average, 50% of newly determined protein structures contain more than one domain unit, and typically 90% or more of these domains are already classified in CATH, CATHEDRAL will considerably facilitate the automation of protein structure classification.
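The boundary criterion quoted above (boundaries delineated within a tolerance of ten residues) can be illustrated with a minimal sketch. The function names and the way predicted and reference boundaries are paired (sorted order, equal counts) are assumptions made for illustration; the abstract does not specify how CATHEDRAL's benchmark matched boundaries.

```python
def boundaries_within_tolerance(predicted, reference, tolerance=10):
    """Return True if every predicted boundary lies within `tolerance`
    residues of the corresponding reference boundary.

    Assumption: boundaries are residue indices, paired in sorted order.
    """
    if len(predicted) != len(reference):
        return False
    pairs = zip(sorted(predicted), sorted(reference))
    return all(abs(p - r) <= tolerance for p, r in pairs)


def fraction_correct(cases, tolerance=10):
    """Fraction of (predicted, reference) cases that pass the tolerance test."""
    if not cases:
        return 0.0
    passed = sum(boundaries_within_tolerance(p, r, tolerance) for p, r in cases)
    return passed / len(cases)


# Hypothetical boundary positions, for illustration only.
example_cases = [
    ([120, 245], [118, 252]),  # both boundaries within ten residues
    ([95], [130]),             # off by 35 residues
]
print(fraction_correct(example_cases))
```

A stricter or looser benchmark only changes the `tolerance` argument; the pairing rule is the part most likely to differ between evaluations.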

    New Algorithms for Protein Structure Comparison and Protein Structure Prediction

    Proteins show a great variety of 3D conformations, which can be used to infer their evolutionary relationships and to classify them into more general groups; algorithms for protein structure alignment, protein similarity search and protein structure prediction are therefore very helpful to protein biologists. We developed new algorithms for problems in this field. The algorithms are tested with structures from the Protein Data Bank (PDB) and SCOP, the Structural Classification of Proteins database. The experimental results show that our tools are more efficient than some well-known systems for finding similar protein structures and predicting protein structures.

    Sequence-specific sequence comparison using pairwise statistical significance

    Sequence comparison is one of the most fundamental computational problems in bioinformatics, and many approaches to it have been and are still being developed. In particular, pairwise sequence alignment forms the crux of both DNA and protein sequence comparison techniques, which in turn form the basis of many other applications in bioinformatics. Pairwise sequence alignment methods align two sequences using a substitution matrix consisting of pairwise scores for aligning different residues with each other (such as BLOSUM62), and give an alignment score for the given sequence pair. Biologists routinely use such pairwise alignment programs to identify similar or, more specifically, related sequences (those having a common ancestor). It is widely accepted that the relatedness of two sequences is better judged by the statistical significance of the alignment score than by the alignment score alone. This research addresses the problem of accurately estimating the statistical significance of pairwise alignments for the purpose of identifying related sequences, by making the sequence comparison process more sequence-specific. The major contributions of this research work are as follows. Firstly, sequence-specific strategies for pairwise sequence alignment are used in conjunction with sequence-specific strategies for statistical significance estimation, and accurate methods for pairwise statistical significance estimation using standard, sequence-specific, and position-specific substitution matrices are developed. Secondly, pairwise statistical significance is used to improve the performance of the most popular database search program, PSI-BLAST. Thirdly, heuristics are designed and implemented that speed up pairwise statistical significance estimation by a factor of more than 200. The implementations of all the methods developed in this work are freely available online.
With the all-pervasive application of sequence alignment methods in bioinformatics to ever-increasing sequence data, this work is expected to offer useful contributions to the research community.
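To make the idea of judging an alignment by its significance rather than its raw score concrete, here is a minimal sketch of a generic shuffle-based estimate. It is not the estimation method developed in the work above: it assumes a simple match/mismatch scoring scheme instead of BLOSUM62, a linear gap penalty, and an empirical p-value obtained by realigning against shuffled copies of the second sequence. All function names and parameter values are illustrative.

```python
import random


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def empirical_p_value(a, b, n_shuffles=200, seed=0):
    """Estimate how often a shuffled copy of `b` scores at least as well
    against `a` as the real `b` does (pair-specific null distribution)."""
    rng = random.Random(seed)
    observed = local_alignment_score(a, b)
    letters = list(b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        if local_alignment_score(a, "".join(letters)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_shuffles + 1)


# Illustrative sequence only.
query = "MKTAYIAKQRQISFVKSHFSRQ"
score, p = empirical_p_value(query, query)
print(score, p)
```

Because the null distribution is rebuilt for each sequence pair, the estimate reflects the composition and length of the two sequences being compared, which is the sense in which such an estimate is pair-specific. The cost is one alignment per shuffle, which is why speed-up heuristics matter in practice.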

    Template Based Modeling and Structural Refinement of Protein-Protein Interactions.

    Determining protein structures from sequence is a fundamental problem in molecular biology, as protein structure is essential to understanding protein function. In this study, I developed one of the first fully automated pipelines for template-based quaternary structure prediction starting from sequence. Two critical steps in template-based modeling are identifying the correct homologous structures by threading, which generates sequence-to-structure alignments, and refining the initial threading template coordinates closer to the native conformation. I developed SPRING (single-chain-based prediction of interactions and geometries), a monomer-threading to dimer-template mapping program, which was compared to the dimer co-threading program COTH using 1838 non-homologous target complex structures. SPRING’s similarity score outperformed COTH in first-place ranking of templates, with the two programs correctly identifying 798 and 527 interfaces, respectively. More importantly, the results were found to be complementary, and the programs could be combined in a consensus-based threading program showing a 5.1% improvement over SPRING. Template-based modeling requires a structural analog to be present in the PDB. A full search of the PDB, using threading and structural alignment, revealed that only 48.7% of the PDB has a suitable template, whereas only 39.4% of the PDB has templates that can be identified by threading. To circumvent this, I included intramolecular domain-domain interfaces in the PDB library to boost template recognition of protein dimers; merging the two classes of interfaces improved recognition of heterodimers by 40% using benchmark settings. Next, the template-based assembly of protein complexes pipeline, TACOS, was created. The pipeline combines threading templates and domain knowledge from the PDB into a knowledge-based energy score.
The energy score is integrated into a Monte Carlo sampling simulation that drives the initial template closer to the native topology. The full pipeline was benchmarked using 350 non-homologous structures and compared to two state-of-the-art programs for dimeric structure prediction: ZDOCK and MODELLER. On average, the global and interface structures of TACOS models are of better quality than those of the models generated by MODELLER and ZDOCK.
    PhD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/135847/1/bgovi_1.pd
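How an energy score can drive a Monte Carlo search is sketched below using the standard Metropolis acceptance rule. This is a generic illustration, not the TACOS implementation: the one-dimensional toy energy, the proposal move, the temperature and all function names are assumptions chosen to keep the example self-contained.

```python
import math
import random


def metropolis_accept(delta_e, temperature, rng):
    """Always accept downhill moves; accept uphill moves with
    probability exp(-delta_e / temperature)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def monte_carlo_minimize(energy, start, propose, steps=2000,
                         temperature=1.0, seed=0):
    """Run a Metropolis walk and return the lowest-energy state visited."""
    rng = random.Random(seed)
    current, current_e = start, energy(start)
    best, best_e = current, current_e
    for _ in range(steps):
        candidate = propose(current, rng)
        candidate_e = energy(candidate)
        if metropolis_accept(candidate_e - current_e, temperature, rng):
            current, current_e = candidate, candidate_e
            if current_e < best_e:
                best, best_e = current, current_e
    return best, best_e


# Toy stand-ins: a quadratic "energy" with its minimum at x = 3 and a
# small random perturbation as the move set.
def toy_energy(x):
    return (x - 3.0) ** 2


def toy_move(x, rng):
    return x + rng.uniform(-0.5, 0.5)


best_x, best_energy = monte_carlo_minimize(toy_energy, 0.0, toy_move,
                                           temperature=0.1)
print(best_x, best_energy)
```

In a structure-refinement setting the state would be a set of coordinates and the move a small rigid-body or conformational perturbation; accepting occasional uphill moves is what lets the search escape local minima of the energy score.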

    Exploration of the Disambiguation of Amino Acid Types to Chi-1 Rotamer Types in Protein Structure Prediction and Design

    A protein’s global fold provides insight into function; however, functional specificity often lies in sidechain orientation. Thus, determining the rotamer conformations is often crucial in the contexts of protein structure/function prediction and design. For all non-glycine and non-alanine types, chi-1 rotamers occupy a small number of discrete states. Herein, we explore the possibility of describing evolution from the perspective of the sidechains’ structure rather than the traditional twenty amino acid types. To validate our hypothesis that this perspective is more crucial to our understanding of evolutionary relationships, we investigate its use in evolutionary substitution matrices for sequence alignment in fold recognition and in computational protein design, with a specific focus on designing beta-sheet environments, where previous studies have considered amino acid types alone. Throughout this study, we also propose the concept of the “chi-1 rotamer sequence”, which describes the chi-1 rotamer composition of a protein. We also present attempts to predict these sequences and real-value torsion angles from amino acid sequence information. First, we describe our development of log-odds scoring matrices for sequence alignments. Log-odds substitution matrices are widely used in sequence alignments for their ability to determine evolutionary relationships between proteins. Traditionally, databases of sequence information guide the construction of these matrices, which illustrates their power in discovering distant or weak homologs. Weak homologs, typically those that share low sequence identity (< 30%), are often difficult to identify when using only basic amino acid sequence alignment. While protein threading approaches have addressed this issue, many of these approaches include sequence-based information or profiles guided by amino acid-based substitution matrices, namely BLOSUM62.
Here, we generated a structure-based substitution matrix derived from TM-align structural alignments that captures both the sequence mutation rate within the same protein family folds and the chi-1 rotamer that represents each amino acid. These rotamer substitution matrices (ROTSUMs) discover new homologs and improved alignments in the PDB that traditional substitution matrices, based solely on sequence information, cannot identify. Certain tools and algorithms to estimate rotamer torsion angles have been developed, but they typically require knowledge of backbone coordinates and/or experimental data to help guide the prediction. Herein, we developed a fragment-based algorithm, Rot1Pred, to determine the chi-1 states at each position of a given amino acid sequence, yielding a chi-1 rotamer sequence. This approach employs fragment matching of the query sequence to sequence-structure fragment pairs in the PDB to predict the query’s sidechain structure information. Real-value torsion angles were also predicted and compared against SCWRL4. Results show that, overall and for most amino acid types, Rot1Pred can calculate chi-1 torsion angles significantly closer to native angles than SCWRL4 when evaluated on I-TASSER-generated model backbones. Finally, we developed and explored chi-1-rotamer-based statistical potentials and evolutionary profiles constructed for de novo computational protein design. Previous analyses that aim to energetically describe the preference of amino acid types in beta-sheet environments (parallel vs antiparallel packing or N- and C-terminal beta-strand capping) have been performed with amino acid types, although no explicit rotamer representation is given in their scoring functions. In our study, we construct statistical functions that describe chi-1 rotamer preferences in these environments and illustrate their improvement over previous methods.
These specialized knowledge-based energy functions have generated sequences whose I-TASSER-predicted models are structurally similar to their input structures yet have low sequence identity.
    PhD, Chemical Biology, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/145951/1/jarrettj_1.pd
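The log-odds construction mentioned in this abstract can be sketched in a few lines. The code below is a generic illustration of how a log-odds substitution score, log2(q_xy / (p_x p_y)), is computed from counts of aligned symbol pairs. It is not the ROTSUM procedure itself, and the symbol encoding used in the example (amino acid plus a chi-1 state label such as `V:t`) is a hypothetical choice made here to show that the alphabet need not be the twenty amino acid types.

```python
import math
from collections import Counter


def log_odds_matrix(aligned_pairs, pseudocount=1.0):
    """Return {(x, y): log2(q_xy / (p_x * p_y))} from aligned symbol pairs.

    q_xy is the observed joint frequency of x aligned with y, and p_x is
    the background frequency of x implied by the same alignments.
    """
    symbols = sorted({s for pair in aligned_pairs for s in pair})
    counts = Counter()
    for x, y in aligned_pairs:
        # Count each aligned pair in both orders so the matrix is symmetric.
        counts[(x, y)] += 0.5
        counts[(y, x)] += 0.5
    cells = {(x, y): counts[(x, y)] + pseudocount
             for x in symbols for y in symbols}
    total = sum(cells.values())
    joint = {key: value / total for key, value in cells.items()}
    background = {x: sum(joint[(x, y)] for y in symbols) for x in symbols}
    scores = {}
    for (x, y), q in joint.items():
        expected = background[x] * background[y]
        scores[(x, y)] = math.log2(q / expected) if q > 0 else float("-inf")
    return scores


# Hypothetical aligned positions, each symbol being "residue:chi-1 state".
pairs = [("V:t", "V:t"), ("V:t", "I:t"), ("I:t", "I:t"),
         ("V:g", "V:g"), ("V:t", "V:g")]
matrix = log_odds_matrix(pairs)
print(matrix[("V:t", "I:t")])
```

Positive scores mark pairs aligned more often than their background frequencies predict, negative scores the opposite. The pseudocount avoids taking the logarithm of zero for pairs never observed together; its value is a tuning choice, not something taken from the source.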

    MULTICOM: a multi-level combination approach to protein structure prediction and its assessments in CASP8

    Motivation: Protein structure prediction is one of the most important problems in structural bioinformatics. Here we describe MULTICOM, a multi-level combination approach to improve the various steps in protein structure prediction. In contrast to methods that look for the best templates, alignments and models, our approach tries to combine complementary and alternative templates, alignments and models to achieve better accuracy on average.

    Efficient protein alignment algorithm for protein search

    © 2010 Lu et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution Licens