84 research outputs found

    Accuracy of structure-based sequence alignment of automatic methods

    BACKGROUND: Accurate sequence alignments are essential for homology searches and for building three-dimensional structural models of proteins. Since structure is better conserved than sequence, structure alignments have been used to guide sequence alignments and are commonly used as the gold standard for sequence alignment evaluation. Nonetheless, as far as we know, there is no report of a systematic evaluation of pairwise structure alignment programs in terms of the sequence alignment accuracy. RESULTS: In this study, we evaluate CE, DaliLite, FAST, LOCK2, MATRAS, SHEBA and VAST in terms of the accuracy of the sequence alignments they produce, using sequence alignments from NCBI's human-curated Conserved Domain Database (CDD) as the standard of truth. We find that 4 to 9% of the residues on average are either not aligned or aligned with more than 8 residues of shift error and that an additional 6 to 14% of residues on average are misaligned by 1–8 residues, depending on the program and the data set used. The fraction of correctly aligned residues generally decreases as the sequence similarity decreases or as the RMSD between the Cα positions of the two structures increases. It varies significantly across CDD superfamilies whether shift error is allowed or not. Also, alignments with different shift errors occur between proteins within the same CDD superfamily, leading to inconsistent alignments between superfamily members. In general, residue pairs that are more than 3.0 Å apart in the reference alignment are heavily (>= 25% on average) misaligned in the test alignments. In addition, each method shows a different pattern of relative weaknesses for different SCOP classes. CE gives relatively poor results for β-sheet-containing structures (all-β, α/β, and α+β classes), DaliLite for the "others" class where all but the major four classes are combined, and LOCK2 and VAST for the all-β and "others" classes. CONCLUSION: When the sequence similarity is low, structure-based methods produce better sequence alignments than methods that use sequence similarity alone. However, current structure-based methods still mis-align 11–19% of the conserved core residues when compared to the human-curated CDD alignments. The alignment quality of each program depends on the protein structural type and similarity, with DaliLite showing the most agreement with CDD on average.
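
    The shift-error categories reported in this abstract can be illustrated with a minimal sketch. It assumes each alignment is given as a mapping from a residue index in one protein to the aligned residue index in the other; this data format, the function name `shift_error_summary` and the parameter `max_shift` are hypothetical and are not taken from the paper.

```python
def shift_error_summary(reference, test, max_shift=8):
    """Classify reference-aligned residues by their shift error in a test alignment.

    reference, test: dicts mapping a residue index in protein A to the residue
    index in protein B it is aligned with (an assumed, illustrative format).
    Returns the fraction of reference residues in each category.
    """
    counts = {"correct": 0, "shifted": 0, "missed": 0}
    for i, j_ref in reference.items():
        j_test = test.get(i)
        if j_test is None:
            counts["missed"] += 1      # residue not aligned in the test alignment
        elif j_test == j_ref:
            counts["correct"] += 1     # identical to the reference pairing
        elif abs(j_test - j_ref) <= max_shift:
            counts["shifted"] += 1     # misaligned by 1..max_shift residues
        else:
            counts["missed"] += 1      # shift error larger than max_shift
    total = len(reference)
    if total == 0:
        return counts
    return {name: value / total for name, value in counts.items()}


if __name__ == "__main__":
    reference = {1: 1, 2: 2, 3: 3, 4: 4}
    test = {1: 1, 2: 3, 3: 20}         # residue 4 is left unaligned
    print(shift_error_summary(reference, test))
```

    Grouping unaligned residues together with shifts larger than 8 mirrors the way the abstract reports the two cases as one figure.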

    Iterative refinement of structure-based sequence alignments by Seed Extension

    BACKGROUND: Accurate sequence alignment is required in many bioinformatics applications but, when sequence similarity is low, it is difficult to obtain accurate alignments based on sequence similarity alone. The accuracy improves when the structures are available, but current structure-based sequence alignment procedures still mis-align substantial numbers of residues. In order to correct such errors, we previously explored the possibility of replacing the residue-based dynamic programming algorithm in structure alignment procedures with the Seed Extension algorithm, which does not use a gap penalty. Here, we describe a new procedure called RSE (Refinement with Seed Extension) that iteratively refines a structure-based sequence alignment. RESULTS: RSE uses SE (Seed Extension) in its core, which is an algorithm that we reported recently for obtaining a sequence alignment from two superimposed structures. The RSE procedure was evaluated by comparing the correctly aligned fractions of residues before and after the refinement of the structure-based sequence alignments produced by popular programs. CE, DaliLite, FAST, LOCK2, MATRAS, MATT, TM-align, SHEBA and VAST were included in this analysis and the NCBI's CDD root node set was used as the reference alignments. RSE improved the average accuracy of sequence alignments for all programs tested when no shift error was allowed. The amount of improvement varied depending on the program. The average improvements were small for DaliLite and MATRAS but about 5% for CE and VAST. More substantial improvements have been seen in many individual cases. The additional computation times required for the refinements were negligible compared to the times taken by the structure alignment programs. CONCLUSION: RSE is a computationally inexpensive way of improving the accuracy of a structure-based sequence alignment. 
It can be used as a standalone procedure following a regular structure-based sequence alignment or to replace the traditional iterative refinement procedures, based on a residue-level dynamic programming algorithm, in many structure alignment programs.

    SE: an algorithm for deriving sequence alignment from a pair of superimposed structures

    BACKGROUND: Generating sequence alignments from superimposed structures is an important part of many structure comparison programs. The accuracy of the alignment affects structure recognition, classification and possibly function prediction. Many programs use a dynamic programming algorithm to generate the sequence alignment from superimposed structures. However, this procedure requires using a gap penalty and, depending on the value of the penalty used, can introduce spurious gaps and misalignments. Here we present a new algorithm, Seed Extension (SE), for generating the sequence alignment from a pair of superimposed structures. The SE algorithm first finds "seeds", which are the pairs of residues, one from each structure, that meet certain stringent criteria for being structurally equivalent. Three consecutive seeds form a seed segment, which is extended along the diagonal of the alignment matrix in both directions. Distance and the amino acid type similarity between the residues are used to resolve conflicts that arise during extension of more than one diagonal. The manually curated alignments in the Conserved Domain Database were used as the standard to assess the quality of the sequence alignments. RESULTS: SE gave an average accuracy of 95.9% over 582 pairs of superimposed proteins tested, while CHIMERA, LSQMAN, and DP extracted from SHEBA, which all use a dynamic programming algorithm, yielded 89.9%, 90.2% and 91.0%, respectively. For pairs of proteins with low sequence or structural similarity, SE produced alignments up to 18% more accurate on average than the next best scoring program. Improvement was most pronounced when the two superimposed structures contained equivalent helices or beta-strands that crossed at an angle. When the SE algorithm was implemented in SHEBA to replace the dynamic programming routine, the alignment accuracy improved by 10% on average for structure pairs with RMSD between 2 and 4 Å. SE also used considerably less CPU time than DP. CONCLUSION: The Seed Extension algorithm is fast and, without using a gap penalty, produces more accurate sequence alignments from superimposed structures than three other programs tested that use a dynamic programming algorithm.
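
    The seed-and-extend idea described in this abstract can be sketched roughly as follows. This is not the published SE implementation: the seed criterion (mutual nearest neighbours within `seed_cut`), the cutoff values, and the first-come conflict rule are all assumptions made for illustration. The paper resolves conflicts using distance and amino acid type similarity, which this sketch omits, and the sketch does not enforce sequential order between separate segments.

```python
import math


def seed_extension(coords_a, coords_b, seed_cut=1.5, extend_cut=3.0):
    """Toy seed-and-extend alignment of two already superimposed structures.

    coords_a, coords_b: lists of (x, y, z) C-alpha coordinates.
    seed_cut, extend_cut: assumed distance cutoffs in angstroms.
    Returns a dict mapping residue indices of A to residue indices of B.
    """
    n, m = len(coords_a), len(coords_b)
    dist = [[math.dist(a, b) for b in coords_b] for a in coords_a]

    def is_seed(i, j):
        # Assumed criterion: close in space and mutually nearest neighbours.
        d = dist[i][j]
        return (d <= seed_cut
                and d == min(dist[i])
                and d == min(dist[k][j] for k in range(n)))

    aligned = {}
    used_b = set()
    for i in range(n - 2):
        for j in range(m - 2):
            # A seed segment is three consecutive seeds on one diagonal.
            if not all(is_seed(i + k, j + k) for k in range(3)):
                continue
            lo, hi = 0, 2
            # Extend backwards along the diagonal.
            while (i + lo - 1 >= 0 and j + lo - 1 >= 0
                   and dist[i + lo - 1][j + lo - 1] <= extend_cut):
                lo -= 1
            # Extend forwards along the diagonal.
            while (i + hi + 1 < n and j + hi + 1 < m
                   and dist[i + hi + 1][j + hi + 1] <= extend_cut):
                hi += 1
            for k in range(lo, hi + 1):
                a, b = i + k, j + k
                # Simplified conflict rule: the first assignment wins.
                if a not in aligned and b not in used_b:
                    aligned[a] = b
                    used_b.add(b)
    return aligned


if __name__ == "__main__":
    chain_a = [(3.8 * i, 0.0, 0.0) for i in range(6)]
    # Chain B has one extra, distant residue followed by a slightly offset copy of A.
    chain_b = [(100.0, 100.0, 100.0)] + [(x, 0.5, z) for x, _, z in chain_a]
    print(seed_extension(chain_a, chain_b))
```

    No gap penalty appears anywhere in the sketch: gaps arise only where no diagonal can be extended, which is the property the abstract emphasizes.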

    Sodium 5,6-Dihydro-2-thiouracil-6-sulfonate Monohydrate

    This is the publisher's version, also available electronically from http://scripts.iucr.org/cgi-bin/paper?S056774087800437

    (±)-9-exo-Amino-5,6,7,8-tetrahydro-5,8-methano-9H-benzocyclohepten-8-ol Hydrochloride

    This is the published version, also available here: http://www.dx.doi.org/10.1107/S0567740878004458

    Towards an automatic classification of protein structural domains based on structural similarity

    BACKGROUND: Formal classification of a large collection of protein structures aids the understanding of evolutionary relationships among them. Classifications involving manual steps, such as SCOP and CATH, face the challenge of increasing volume of available structures. Automatic methods, such as FSSP or Dali Domain Dictionary, yield divergent classifications, for reasons not yet fully investigated. One possible reason is that the pairwise similarity scores used in automatic classification do not adequately reflect the judgments made in manual classification. Another possibility is the difference between manual and automatic classification procedures. We explore the degree to which these two factors might affect the final classification. RESULTS: We use DALI, SHEBA and VAST pairwise scores on the SCOP C class domains to investigate a variety of hierarchical clustering procedures. The constructed dendrogram is cut in a variety of ways to produce a partition, which is compared to the SCOP fold classification. Ward's method dendrograms led to partitions closest to the SCOP fold classification. Dendrogram- or tree-cutting strategies fell into four categories according to the similarity of resulting partitions to the SCOP fold partition. Two strategies, which optimize similarity to SCOP, gave an average true positive rate (TPR) of 72% at a 1% false positive rate. Cutting the largest size cluster at each step gave an average of 61% TPR, which was one of the best strategies not making use of prior knowledge of SCOP. Cutting the longest branch at each step produced one of the worst strategies. We also developed a method to detect irreducible differences between the best possible automatic partitions and SCOP, regardless of the cutting strategy. These differences are substantial. Visual examination of hard-to-classify proteins confirms our previous finding that global structural similarity of domains is not the only criterion used in the SCOP classification. CONCLUSION: Different clustering procedures give rise to different levels of agreement between automatic and manual protein classifications. None of the tested procedures completely eliminates the divergence between automatic and manual protein classifications. Achieving full agreement between these two approaches would apparently require additional information.
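
    One common way to compare an automatic partition with a reference classification is to count pairs of domains that are grouped together or apart in each. The sketch below assumes this pair-counting definition of the true and false positive rates; the abstract does not state how the paper defines them, and the function name `pairwise_rates` is hypothetical.

```python
from itertools import combinations


def pairwise_rates(predicted, reference):
    """Pair-counting TPR and FPR of a predicted partition against a reference.

    predicted, reference: dicts mapping a domain identifier to a cluster label.
    A pair is "positive" when the reference puts both domains in the same group.
    """
    tp = fp = fn = tn = 0
    for a, b in combinations(sorted(reference), 2):
        same_ref = reference[a] == reference[b]
        same_pred = predicted[a] == predicted[b]
        if same_ref and same_pred:
            tp += 1
        elif same_ref:
            fn += 1
        elif same_pred:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


if __name__ == "__main__":
    reference = {"d1": "fold1", "d2": "fold1", "d3": "fold2", "d4": "fold2"}
    predicted = {"d1": "X", "d2": "X", "d3": "X", "d4": "Y"}
    print(pairwise_rates(predicted, reference))
```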

    Mesothelin-MUC16 binding is a high affinity, N-glycan dependent interaction that facilitates peritoneal metastasis of ovarian tumors

    BACKGROUND: The mucin MUC16 and the glycosylphosphatidylinositol anchored glycoprotein mesothelin likely facilitate the peritoneal metastasis of ovarian tumors. The biochemical basis and the kinetics of the binding between these two glycoproteins are not clearly understood. Here we have addressed this deficit and provide further evidence supporting the role of the MUC16-mesothelin interaction in facilitating cell-cell binding under conditions that mimic the peritoneal environment. RESULTS: In this study we utilize recombinant Fc-tagged human mesothelin to measure the binding kinetics of this glycoprotein to MUC16 expressed on the ovarian tumor cell line OVCAR-3. OVCAR-3 derived sublines that did not express MUC16 showed no affinity for mesothelin. In a flow cytometry-based assay, mesothelin binds with very high affinity to the MUC16 on the OVCAR-3 cells, with an apparent K_d of 5–10 nM. Maximum interaction occurs within 5 min of incubation of the recombinant mesothelin with the OVCAR-3 cells, and significant binding is observed even after 10 sec. A five-fold molar excess of soluble MUC16 was unable to completely inhibit the binding of mesothelin to the OVCAR-3 cells. Oxidation of the MUC16 glycans, removal of its N-linked oligosaccharides, and treatment of the mucin with wheat germ agglutinin and erythroagglutinating phytohemagglutinin abrogate its binding to mesothelin. These observations suggest that at least a subset of the MUC16-associated N-glycans is required for binding to mesothelin. We also demonstrate that MUC16 positive ovarian tumor cells exhibit increased adherence to A431 cells transfected with mesothelin (A431-Meso(+)). Only minimal adhesion is observed between MUC16 knockdown cells and A431-Meso(+) cells. The binding between the MUC16 expressing ovarian tumor cells and the A431-Meso(+) cells occurs even in the presence of ascites from patients with ovarian cancer. CONCLUSION: The strong binding kinetics of the mesothelin-MUC16 interaction and the cell adhesion between ovarian tumor cells and A431-Meso(+) cells, even in the presence of peritoneal fluid, strongly support the importance of these two glycoproteins in the peritoneal metastasis of ovarian tumors. The demonstration that N-linked glycans are essential for mediating mesothelin-MUC16 binding may lead to novel therapeutic targets to control the spread of ovarian carcinoma.

    Neighbor Overlap Is Enriched in the Yeast Interaction Network: Analysis and Implications

    The yeast protein-protein interaction network has been shown to have distinct topological features such as a scale-free degree distribution and a high level of clustering. Here we analyze an additional feature called Neighbor Overlap, which reflects the number of shared neighbors between a pair of proteins. We show that Neighbor Overlap is enriched in the yeast protein-protein interaction network compared with control networks carefully designed to match the characteristics of the yeast network in terms of degree distribution and clustering coefficient. Our analysis also reveals that pairs of proteins with high Neighbor Overlap have higher sequence similarity, more similar GO annotations and stronger genetic interactions than pairs with low Neighbor Overlap. Finally, we demonstrate that pairs of proteins with redundant functions tend to have high Neighbor Overlap. We suggest that a combination of three mechanisms is the basis for this feature: the abundance of protein complexes, selection for backup of function, and the need to allow functional variation.
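
    Counting shared neighbors for a pair of proteins can be sketched as below. The edge-list input format and the function name `neighbor_overlap` are assumptions for illustration, and the abstract does not say whether the paper normalizes the count, so this sketch returns the raw number.

```python
def neighbor_overlap(edges, u, v):
    """Number of neighbours shared by nodes u and v in an undirected network.

    edges: iterable of (protein_a, protein_b) interaction pairs.
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    shared = neighbors.get(u, set()) & neighbors.get(v, set())
    # Exclude u and v themselves (relevant only if self-interactions are listed).
    return len(shared - {u, v})


if __name__ == "__main__":
    interactions = [("A", "C"), ("B", "C"), ("A", "D"),
                    ("B", "D"), ("A", "B"), ("A", "E")]
    print(neighbor_overlap(interactions, "A", "B"))
```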