131,352 research outputs found

    Structure alignment based on coding of local geometric measures

    Get PDF
    BACKGROUND: A structure alignment method based on a local geometric property is presented and its performance is tested in pairwise and multiple structure alignments. In this approach, the writhing number, a quantity originating from integral formulas of Vassiliev knot invariants, is used as a local geometric measure. This measure is used in a sliding window to calculate the local writhe down the length of the protein chain. By encoding the distribution of writhing numbers across all the structures in the protein databank (PDB), protein geometries are represented in a 20-letter alphabet. This encoding transforms the structure alignment problem into a sequence alignment problem and allows the well-established algorithms of sequence alignment to be employed. Such geometric alignments offer distinct advantages over structural alignments in Cartesian coordinates as it better handles structural subtleties associated with slight twists and bends that distort one structure relative to another. RESULTS: The performance of programs for pairwise local alignment (TLOCAL) and multiple alignment (TCLUSTALW) are readily adapted from existing code for Smith-Waterman pairwise alignment and for multiple sequence alignment using CLUSTALW. The alignment algorithms employed a blocked scoring matrix (TBLOSUM) generated using the frequency of changes in the geometric alphabet of a block of protein structures. TLOCAL was tested on a set of 10 difficult proteins and found to give high quality alignments that compare favorably to those generated by existing pairwise alignment programs. A set of protein comparison involving hinged structures was also analyzed and TLOCAL was seen to compare favorably to other alignment methods. TCLUSTALW was tested on a family of protein kinases and reveal conserved regions similar to those previously identified by a hand alignment. CONCLUSION: These results show that the encoding of the writhing number as a geometric measure allow high quality structure alignments to be generated using standard algorithms of sequence alignment. This approach provides computationally efficient algorithms that allow fast database searching and multiple structure alignment. Because the geometric measure can employ different window sizes, the method allows the exploration of alignments on different, well-defined length scales

    Protein structure database search and evolutionary classification

    Get PDF
    As more protein structures become available and structural genomics efforts provide structural models in a genome-wide strategy, there is a growing need for fast and accurate methods for discovering homologous proteins and evolutionary classifications of newly determined structures. We have developed 3D-BLAST, in part, to address these issues. 3D-BLAST is as fast as BLAST and calculates the statistical significance (E-value) of an alignment to indicate the reliability of the prediction. Using this method, we first identified 23 states of the structural alphabet that represent pattern profiles of the backbone fragments and then used them to represent protein structure databases as structural alphabet sequence databases (SADB). Our method enhanced BLAST as a search method, using a new structural alphabet substitution matrix (SASM) to find the longest common substructures with high-scoring structured segment pairs from an SADB database. Using personal computers with Intel Pentium4 (2.8 GHz) processors, our method searched more than 10 000 protein structures in 1.3 s and achieved a good agreement with search results from detailed structure alignment methods. [3D-BLAST is available at

    Rapid detection of similarity in protein structure and function through contact metric distances

    Get PDF
    The characterization of biological function among newly determined protein structures is a central challenge in structural genomics. One class of computational solutions to this problem is based on the similarity of protein structure. Here, we implement a simple yet efficient measure of protein structure similarity, the contact metric. Even though its computation avoids structural alignments and is therefore nearly instantaneous, we find that small values correlate with geometrical root mean square deviations obtained from structural alignments. To test whether the contact metric detects functional similarity, as defined by Gene Ontology (GO) terms, it was compared in large-scale computational experiments to four other measures of structural similarity, including alignment algorithms as well as alignment independent approaches. The contact metric was the fastest method and its sensitivity, at any given specificity level, was a close second only to Fast Alignment and Search Tool—a structural alignment method that is slower by three orders of magnitude. Critically, nearly 40% of correct functional inferences by the contact metric were not identified by any other approach, which shows that the contact metric is complementary and computationally efficient in detecting functional relationships between proteins. A public ‘Contact Metric Internet Server’ is provided

    A novel method to compare protein structures using local descriptors

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Protein structure comparison is one of the most widely performed tasks in bioinformatics. However, currently used methods have problems with the so-called "difficult similarities", including considerable shifts and distortions of structure, sequential swaps and circular permutations. There is a demand for efficient and automated systems capable of overcoming these difficulties, which may lead to the discovery of previously unknown structural relationships.</p> <p>Results</p> <p>We present a novel method for protein structure comparison based on the formalism of local descriptors of protein structure - DEscriptor Defined Alignment (DEDAL). Local similarities identified by pairs of similar descriptors are extended into global structural alignments. We demonstrate the method's capability by aligning structures in difficult benchmark sets: curated alignments in the SISYPHUS database, as well as SISY and RIPC sets, including non-sequential and non-rigid-body alignments. On the most difficult RIPC set of sequence alignment pairs the method achieves an accuracy of 77% (the second best method tested achieves 60% accuracy).</p> <p>Conclusions</p> <p>DEDAL is fast enough to be used in whole proteome applications, and by lowering the threshold of detectable structure similarity it may shed additional light on molecular evolution processes. It is well suited to improving automatic classification of structure domains, helping analyze protein fold space, or to improving protein classification schemes. DEDAL is available online at <url>http://bioexploratorium.pl/EP/DEDAL</url>.</p

    Structural alignment using network properties

    Get PDF
    Understanding the structural means of protein function via structural comparisons have wide range of applications such as protein fold classification, protein structure modelling and design. In this thesis, a novel structural alignment algorithm based on a amino acid network model is presented. The method we present models proteins as an amino acid network, derived from contact map representation of proteins. By using this model, we obtain fast tertiary structure comparisons, and combine them with primary and secondary structure comparisons to develop an overall similarity function. The similarity function drives a dynamic programming based alignment algorithm to obtain fast and accurate structural alignments. The structural alignments obtained are used to discover functional structural subunits called domains and to discover overall structural similarity of two proteins. We compared our domain prediction results with existing domain recognition methods and saw that our method correlates well with existing methods. Our global structural alignment results are compared with CE alignments

    Alignment-free local structural search by writhe decomposition

    Get PDF
    Motivation: Rapid methods for protein structure search enable biological discoveries based on flexibly defined structural similarity, unleashing the power of the ever greater number of solved protein structures. Projection methods show promise for the development of fast structural database search solutions. Projection methods map a structure to a point in a high-dimensional space and compare two structures by measuring distance between their projected points. These methods offer a tremendous increase in speed over residue-level structural alignment methods. However, current projection methods are not practical, partly because they are unable to identify local similarities

    CATHEDRAL: A Fast and Effective Algorithm to Predict Folds and Domain Boundaries from Multidomain Protein Structures

    Get PDF
    We present CATHEDRAL, an iterative protocol for determining the location of previously observed protein folds in novel multidomain protein structures. CATHEDRAL builds on the features of a fast secondary-structure–based method (using graph theory) to locate known folds within a multidomain context and a residue-based, double-dynamic programming algorithm, which is used to align members of the target fold groups against the query protein structure to identify the closest relative and assign domain boundaries. To increase the fidelity of the assignments, a support vector machine is used to provide an optimal scoring scheme. Once a domain is verified, it is excised, and the search protocol is repeated in an iterative fashion until all recognisable domains have been identified. We have performed an initial benchmark of CATHEDRAL against other publicly available structure comparison methods using a consensus dataset of domains derived from the CATH and SCOP domain classifications. CATHEDRAL shows superior performance in fold recognition and alignment accuracy when compared with many equivalent methods. If a novel multidomain structure contains a known fold, CATHEDRAL will locate it in 90% of cases, with <1% false positives. For nearly 80% of assigned domains in a manually validated test set, the boundaries were correctly delineated within a tolerance of ten residues. For the remaining cases, previously classified domains were very remotely related to the query chain so that embellishments to the core of the fold caused significant differences in domain sizes and manual refinement of the boundaries was necessary. To put this performance in context, a well-established sequence method based on hidden Markov models was only able to detect 65% of domains, with 33% of the subsequent boundaries assigned within ten residues. Since, on average, 50% of newly determined protein structures contain more than one domain unit, and typically 90% or more of these domains are already classified in CATH, CATHEDRAL will considerably facilitate the automation of protein structure classification

    In Silico characterization of growth hormone from freshwater ornamental fishes: Sequence analysis, molecular modelling and phylogeny

    Get PDF
    The present investigation includes in Silico sequence analysis, three-dimensional (3D) structure prediction and evolutionary profile of growth hormone (GH) from 14 ornamental freshwater fishes. The analyses were performed using the sequence data of growth hormone gene (gh) and its encoded GH protein. The evolutionary analyses were performed using maximum likelihood (ML) estimate and maximum parsimony (MP) methods. Bootstrap test (1000 replicates) was performed to validate the phylogenetic tree. The tertiary structures of GH were predicted using the comparative modelling method. The suitable template for comparative modeling protein databank (PDB IDs: 1HWG A) has been selected on the basis of basic local alignment search tool (BLASTp) and fast analysis (FASTA) results. The target-template alignment, model building, loop modelling and evaluation have been performed in Modeller 9.10. The tertiary structure of GH is α-helix structure connected by loops, which forms a compressed complex maintained by two disulfide bridges. The resultant 3D models are verified by ERRAT and ProCheck programmes. After fruitful verification, the tertiary structures of GH have been deposited to protein model database (PMDB). Sequence analyses and RNA secondary structure prediction was performed by CLC genomics workbench version 4.0. The computational models of GH could be of use for further evaluation of molecular mechanism of function.Keywords: Growth hormone, in Silico, somatotropin, growth hormone gene (gh) mRNA, freshwater ornamental fis
    corecore