84 research outputs found

Accuracy of structure-based sequence alignment of automatic methods

Author: Kim Changhoon
Lee Byungkook
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: Accurate sequence alignments are essential for homology searches and for building three-dimensional structural models of proteins. Since structure is better conserved than sequence, structure alignments have been used to guide sequence alignments and are commonly used as the gold standard for sequence alignment evaluation. Nonetheless, as far as we know, there is no report of a systematic evaluation of pairwise structure alignment programs in terms of the sequence alignment accuracy. RESULTS: In this study, we evaluate CE, DaliLite, FAST, LOCK2, MATRAS, SHEBA and VAST in terms of the accuracy of the sequence alignments they produce, using sequence alignments from NCBI's human-curated Conserved Domain Database (CDD) as the standard of truth. We find that 4 to 9% of the residues on average are either not aligned or aligned with more than 8 residues of shift error and that an additional 6 to 14% of residues on average are misaligned by 1–8 residues, depending on the program and the data set used. The fraction of correctly aligned residues generally decreases as the sequence similarity decreases or as the RMSD between the Cα positions of the two structures increases. It varies significantly across CDD superfamilies whether shift error is allowed or not. Also, alignments with different shift errors occur between proteins within the same CDD superfamily, leading to inconsistent alignments between superfamily members. In general, residue pairs that are more than 3.0 Å apart in the reference alignment are heavily (≥ 25% on average) misaligned in the test alignments. In addition, each method shows a different pattern of relative weaknesses for different SCOP classes. CE gives relatively poor results for β-sheet-containing structures (all-β, α/β, and α+β classes), DaliLite for the "others" class where all but the major four classes are combined, and LOCK2 and VAST for the all-β and "others" classes. CONCLUSION: When the sequence similarity is low, structure-based methods produce better sequence alignments than methods that use sequence similarity alone. However, current structure-based methods still mis-align 11–19% of the conserved core residues when compared to the human-curated CDD alignments. The alignment quality of each program depends on the protein structural type and similarity, with DaliLite showing the most agreement with CDD on average.
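The shift-error bookkeeping described in this abstract can be illustrated with a minimal sketch. It assumes each alignment is stored as a mapping from a residue index in the first protein to its partner index in the second, and that shift error is the absolute difference between the partner indices in the test and reference alignments. The function name, the data layout and the three category labels are hypothetical choices for illustration, not the authors' implementation.

```python
def shift_error_profile(reference, test, max_shift=8):
    """Classify each reference-aligned residue by its shift error in a test alignment.

    reference, test: dicts mapping residue index in protein A -> residue index in protein B.
    max_shift: largest shift still counted as "shifted" (8 mirrors the abstract's threshold).
    Returns the fraction of reference residues in each category.
    """
    if not reference:
        raise ValueError("reference alignment is empty")
    counts = {"correct": 0, "shifted": 0, "missed": 0}
    for i, j_ref in reference.items():
        j_test = test.get(i)
        if j_test is None:
            counts["missed"] += 1       # residue not aligned at all in the test alignment
            continue
        shift = abs(j_test - j_ref)     # assumed definition of shift error
        if shift == 0:
            counts["correct"] += 1
        elif shift <= max_shift:
            counts["shifted"] += 1      # misaligned by 1..max_shift residues
        else:
            counts["missed"] += 1       # shift larger than max_shift
    total = len(reference)
    return {name: n / total for name, n in counts.items()}


# Toy example: four reference pairs, one exact, one shifted by 1,
# one shifted by 10, one unaligned in the test alignment.
reference = {0: 0, 1: 1, 2: 2, 3: 3}
test = {0: 0, 1: 2, 2: 12}
profile = shift_error_profile(reference, test)
```

Grouping unaligned residues with large shifts follows the abstract's wording ("either not aligned or aligned with more than 8 residues of shift error"); a different grouping would only require changing the two `missed` branches.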

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Detecting internally symmetric protein structures

Author: Basner Jodi
Kim Changhoon
Lee Byungkook
Publication venue: BioMed Central
Publication date: 01/01/2010

Crossref

Springer - Publisher Connector

PubMed Central

Iterative refinement of structure-based sequence alignments by Seed Extension

Author: Byungkook Lee
Changhoon Kim
Chin-Hsien Tai
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2009

BACKGROUND: Accurate sequence alignment is required in many bioinformatics applications but, when sequence similarity is low, it is difficult to obtain accurate alignments based on sequence similarity alone. The accuracy improves when the structures are available, but current structure-based sequence alignment procedures still mis-align substantial numbers of residues. In order to correct such errors, we previously explored the possibility of replacing the residue-based dynamic programming algorithm in structure alignment procedures with the Seed Extension algorithm, which does not use a gap penalty. Here, we describe a new procedure called RSE (Refinement with Seed Extension) that iteratively refines a structure-based sequence alignment. RESULTS: RSE uses SE (Seed Extension) at its core, which is an algorithm that we reported recently for obtaining a sequence alignment from two superimposed structures. The RSE procedure was evaluated by comparing the correctly aligned fractions of residues before and after the refinement of the structure-based sequence alignments produced by popular programs. CE, DaliLite, FAST, LOCK2, MATRAS, MATT, TM-align, SHEBA and VAST were included in this analysis and the NCBI's CDD root node set was used as the reference alignments. RSE improved the average accuracy of sequence alignments for all programs tested when no shift error was allowed. The amount of improvement varied depending on the program. The average improvements were small for DaliLite and MATRAS but about 5% for CE and VAST. More substantial improvements have been seen in many individual cases. The additional computation times required for the refinements were negligible compared to the times taken by the structure alignment programs. CONCLUSION: RSE is a computationally inexpensive way of improving the accuracy of a structure-based sequence alignment. It can be used as a standalone procedure following a regular structure-based sequence alignment or to replace the traditional iterative refinement procedures based on a residue-level dynamic programming algorithm in many structure alignment programs.

Crossref

PubMed Central

SE: an algorithm for deriving sequence alignment from a pair of superimposed structures

Author: Kim Changhoon
Lee Byungkook
Tai Chin-Hsien
Vincent James J
Publication venue: BioMed Central
Publication date: 01/01/2009

BACKGROUND: Generating sequence alignments from superimposed structures is an important part of many structure comparison programs. The accuracy of the alignment affects structure recognition, classification and possibly function prediction. Many programs use a dynamic programming algorithm to generate the sequence alignment from superimposed structures. However, this procedure requires using a gap penalty and, depending on the value of the penalty used, can introduce spurious gaps and misalignments. Here we present a new algorithm, Seed Extension (SE), for generating the sequence alignment from a pair of superimposed structures. The SE algorithm first finds "seeds", which are the pairs of residues, one from each structure, that meet certain stringent criteria for being structurally equivalent. Three consecutive seeds form a seed segment, which is extended along the diagonal of the alignment matrix in both directions. Distance and the amino acid type similarity between the residues are used to resolve conflicts that arise during extension of more than one diagonal. The manually curated alignments in the Conserved Domain Database were used as the standard to assess the quality of the sequence alignments. RESULTS: SE gave an average accuracy of 95.9% over 582 pairs of superimposed proteins tested, while CHIMERA, LSQMAN, and DP extracted from SHEBA, which all use a dynamic programming algorithm, yielded 89.9%, 90.2% and 91.0%, respectively. For pairs of proteins with low sequence or structural similarity, SE produced alignments up to 18% more accurate on average than the next best scoring program. Improvement was most pronounced when the two superimposed structures contained equivalent helices or beta-strands that crossed at an angle. When the SE algorithm was implemented in SHEBA to replace the dynamic programming routine, the alignment accuracy improved by 10% on average for structure pairs with RMSD between 2 and 4 Å. SE also used considerably less CPU time than DP. CONCLUSION: The Seed Extension algorithm is fast and, without using a gap penalty, produces more accurate sequence alignments from superimposed structures than the three other programs tested, which use a dynamic programming algorithm.
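The seed, seed-segment and diagonal-extension steps named in this abstract can be sketched roughly as follows. This is only an illustration under loud assumptions: the abstract does not state the "stringent criteria" for a seed, so a plain Cα distance cutoff is substituted here; both cutoff values are hypothetical; and the conflict resolution between competing diagonals (by distance and amino acid similarity) is not implemented.

```python
import math


def find_seeds(coords_a, coords_b, seed_cutoff=2.0):
    """Return residue pairs (i, j) whose superimposed coordinates lie within seed_cutoff.

    A distance cutoff is an assumed stand-in for the paper's unspecified seed criteria.
    """
    return {
        (i, j)
        for i, a in enumerate(coords_a)
        for j, b in enumerate(coords_b)
        if math.dist(a, b) <= seed_cutoff
    }


def seed_segments(seeds):
    """Return start pairs (i, j) of three consecutive seeds on one diagonal."""
    return sorted(
        (i, j) for (i, j) in seeds
        if (i + 1, j + 1) in seeds and (i + 2, j + 2) in seeds
    )


def extend_segment(start, coords_a, coords_b, extend_cutoff=4.0):
    """Extend a three-seed segment along its diagonal in both directions.

    Extension continues while the next residue pair stays within extend_cutoff
    (a hypothetical, looser threshold than the seed cutoff).
    """
    lo_i, lo_j = start
    hi_i, hi_j = start[0] + 2, start[1] + 2
    while lo_i > 0 and lo_j > 0 and \
            math.dist(coords_a[lo_i - 1], coords_b[lo_j - 1]) <= extend_cutoff:
        lo_i, lo_j = lo_i - 1, lo_j - 1
    while hi_i + 1 < len(coords_a) and hi_j + 1 < len(coords_b) and \
            math.dist(coords_a[hi_i + 1], coords_b[hi_j + 1]) <= extend_cutoff:
        hi_i, hi_j = hi_i + 1, hi_j + 1
    return [(lo_i + k, lo_j + k) for k in range(hi_i - lo_i + 1)]


# Toy superimposed structures: residue j of B sits next to residue j + 1 of A.
coords_a = [(0.0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (11.4, 0, 0), (15.2, 0, 0)]
coords_b = [(3.9, 0, 0), (7.7, 0, 0), (11.5, 0, 0), (15.3, 0, 0)]
starts = seed_segments(find_seeds(coords_a, coords_b))
aligned = extend_segment(starts[0], coords_a, coords_b)
```

No gap penalty appears anywhere in the sketch, which is the property the abstract emphasizes: gaps simply fall between extended diagonal segments instead of being scored.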

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Sodium 5,6-Dihydro-2-thiouracil-6-sulfonate Monohydrate

Author: Bowman-James Kristin
Jain N. B.
Lee Byungkook
Pitman Ian H.
Publication venue: International Union of Crystallography (IUCr)
Publication date: 07/01/2015

This is the publisher's version, also available electronically from http://scripts.iucr.org/cgi-bin/paper?S056774087800437

KU ScholarWorks

(±)-9-exo-Amino-5,6,7,8-tetrahydro-5,8-methano-9H-benzocyclohepten-8-ol Hydrochloride

Author: Grunewald Gary L.
Lee Byungkook
Rodgers James
Ruble John R.
Staples Mark
Walters D. Eric
Publication venue: International Union of Crystallography (IUCr)
Publication date: 19/05/2015

This is the published version, also available here: http://www.dx.doi.org/10.1107/S0567740878004458

KU ScholarWorks

Towards an automatic classification of protein structural domains based on structural similarity

Author: Garnier Jean
Gibrat Jean-Francois
Lee Byungkook
Munson Peter J
Sam Vichetra
Tai Chin-Hsien
Publication venue: BioMed Central
Publication date: 01/01/2008

BACKGROUND: Formal classification of a large collection of protein structures aids the understanding of evolutionary relationships among them. Classifications involving manual steps, such as SCOP and CATH, face the challenge of increasing volume of available structures. Automatic methods such as FSSP or Dali Domain Dictionary yield divergent classifications, for reasons not yet fully investigated. One possible reason is that the pairwise similarity scores used in automatic classification do not adequately reflect the judgments made in manual classification. Another possibility is the difference between manual and automatic classification procedures. We explore the degree to which these two factors might affect the final classification. RESULTS: We use DALI, SHEBA and VAST pairwise scores on the SCOP C class domains to investigate a variety of hierarchical clustering procedures. The constructed dendrogram is cut in a variety of ways to produce a partition, which is compared to the SCOP fold classification. Ward's method dendrograms led to partitions closest to the SCOP fold classification. Dendrogram- or tree-cutting strategies fell into four categories according to the similarity of resulting partitions to the SCOP fold partition. Two strategies that optimize similarity to SCOP gave an average true positive rate (TPR) of 72% at a 1% false positive rate. Cutting the largest cluster at each step gave an average TPR of 61%, which made it one of the best strategies not making use of prior knowledge of SCOP. Cutting the longest branch at each step was one of the worst strategies. We also developed a method to detect irreducible differences between the best possible automatic partitions and SCOP, regardless of the cutting strategy. These differences are substantial. Visual examination of hard-to-classify proteins confirms our previous finding that global structural similarity of domains is not the only criterion used in the SCOP classification. CONCLUSION: Different clustering procedures give rise to different levels of agreement between automatic and manual protein classifications. None of the tested procedures completely eliminates the divergence between automatic and manual protein classifications. Achieving full agreement between these two approaches would apparently require additional information.
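One common way to compare an automatic partition with a reference classification is to count domain pairs, which gives a true positive rate and a false positive rate of the kind quoted in this abstract. The sketch below assumes that pair-counting definition (a pair is a positive when both domains share a reference fold, and a predicted positive when both fall in the same cluster). The abstract does not spell out how its rates were computed, so the function and the toy labels are hypothetical.

```python
from itertools import combinations


def pairwise_rates(predicted, reference):
    """Return (TPR, FPR) of a predicted partition against a reference partition.

    predicted, reference: dicts mapping each domain identifier to a cluster/fold label.
    Every unordered pair of domains is treated as one classification decision.
    """
    tp = fp = fn = tn = 0
    for a, b in combinations(sorted(reference), 2):
        same_ref = reference[a] == reference[b]
        same_pred = predicted[a] == predicted[b]
        if same_ref and same_pred:
            tp += 1
        elif same_ref:
            fn += 1
        elif same_pred:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Toy example with made-up domain names and labels.
reference = {"d1": "foldA", "d2": "foldA", "d3": "foldA", "d4": "foldB", "d5": "foldB"}
predicted = {"d1": 1, "d2": 1, "d3": 2, "d4": 2, "d5": 2}
tpr, fpr = pairwise_rates(predicted, reference)
```

Evaluating each cut of a dendrogram this way yields one (FPR, TPR) point per cut, so a statement such as "TPR at a 1% false positive rate" corresponds to reading off the TPR of the cut whose FPR is closest to 0.01.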

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

Mesothelin-MUC16 binding is a high affinity, N-glycan dependent interaction that facilitates peritoneal metastasis of ovarian tumors

Author: Belisle Jennifer
Bera Tapan K
Connor Joseph
Gubbels Jennifer AA
Ho Mitchell
Lee Byungkook
Migneault Martine
Onda Masanori
Pastan Ira
Patankar Manish S
Rancourt Claudine
Sathyanarayana Bangalore K
Publication venue: BioMed Central
Publication date: 01/10/2006

BACKGROUND: The mucin MUC16 and the glycosylphosphatidylinositol anchored glycoprotein mesothelin likely facilitate the peritoneal metastasis of ovarian tumors. The biochemical basis and the kinetics of the binding between these two glycoproteins are not clearly understood. Here we have addressed this deficit and provide further evidence supporting the role of the MUC16-mesothelin interaction in facilitating cell-cell binding under conditions that mimic the peritoneal environment. RESULTS: In this study we utilize recombinant-Fc tagged human mesothelin to measure the binding kinetics of this glycoprotein to MUC16 expressed on the ovarian tumor cell line OVCAR-3. OVCAR-3 derived sublines that did not express MUC16 showed no affinity for mesothelin. In a flow cytometry-based assay mesothelin binds with very high affinity to the MUC16 on the OVCAR-3 cells with an apparent K(d) of 5–10 nM. Maximum interaction occurs within 5 min of incubation of the recombinant mesothelin with the OVCAR-3 cells and significant binding is observed even after 10 sec. A five-fold molar excess of soluble MUC16 was unable to completely inhibit the binding of mesothelin to the OVCAR-3 cells. Oxidation of the MUC16 glycans, removal of its N-linked oligosaccharides, and treatment of the mucin with wheat germ agglutinin and erythroagglutinating phytohemagglutinin abrogate its binding to mesothelin. These observations suggest that at least a subset of the MUC16-associated N-glycans is required for binding to mesothelin. We also demonstrate that MUC16 positive ovarian tumor cells exhibit increased adherence to A431 cells transfected with mesothelin (A431-Meso(+)). Only minimal adhesion is observed between MUC16 knockdown cells and A431-Meso(+) cells. The binding between the MUC16 expressing ovarian tumor cells and the A431-Meso(+) cells occurs even in the presence of ascites from patients with ovarian cancer. CONCLUSION: The strong binding kinetics of the mesothelin-MUC16 interaction and the cell adhesion between ovarian tumor cells and A431-Meso(+) cells even in the presence of peritoneal fluid strongly support the importance of these two glycoproteins in the peritoneal metastasis of ovarian tumors. The demonstration that N-linked glycans are essential for mediating mesothelin-MUC16 binding may lead to novel therapeutic targets to control the spread of ovarian carcinoma.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Neighbor Overlap Is Enriched in the Yeast Interaction Network: Analysis and Implications

Author: A Wagner
AC Gavin
AH Tong
Anna Tramontano
Ariel Feiglin
B Errede
Byungkook Lee
C Lin
CJ Roberts
DE Levin
E Ravasz
E Zotenko
F Radicchi
G Giaever
H Frohlich
H Jeong
I Ulitsky
I Wapinski
J Xiang
JG Cook
John Moult
K Irie
K Ohkuni
K Tedford
KA Olson
L Bardwell
L Grassi
M Jimenez-Sanchez
M Kellis
M Kupiec
M Schuldiner
M Soler
MC Gustin
MF Manolson
MP Samanta
NJ Krogan
P Rice
R Albert
R Kafri
R Kelley
R Milo
Ron Unger
S Maslov
S Pu
S Sun
SR Collins
T Roemer
V Cherkasova
V Spirin
X He
Y Artzy-Randrup
Y Guan
Yanay Ofran
Publication venue: Public Library of Science
Publication date: 26/06/2012

The yeast protein-protein interaction network has been shown to have distinct topological features such as a scale-free degree distribution and a high level of clustering. Here we analyze an additional feature called Neighbor Overlap, which reflects the number of neighbors shared by a pair of proteins. We show that Neighbor Overlap is enriched in the yeast protein-protein interaction network compared with control networks carefully designed to match the characteristics of the yeast network in terms of degree distribution and clustering coefficient. Our analysis also reveals that pairs of proteins with high Neighbor Overlap have higher sequence similarity, more similar GO annotations and stronger genetic interactions than pairs with low Neighbor Overlap. Finally, we demonstrate that pairs of proteins with redundant functions tend to have high Neighbor Overlap. We suggest that a combination of three mechanisms is the basis for this feature: the abundance of protein complexes, selection for backup of function, and the need to allow functional variation.
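As a small illustration of the quantity described in this abstract, the sketch below counts the neighbors shared by two proteins in an undirected interaction network given as an edge list. It assumes the raw count is the measure of interest; the paper may normalize or otherwise adjust it. The function names and protein labels are made up for the example.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (protein, protein) interaction pairs."""
    adjacency = defaultdict(set)
    for u, v in edges:
        if u != v:                      # ignore self-interactions
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency


def neighbor_overlap(adjacency, p, q):
    """Count the proteins that interact with both p and q (p and q themselves excluded)."""
    shared = adjacency.get(p, set()) & adjacency.get(q, set())
    return len(shared - {p, q})


# Toy network with hypothetical protein names.
edges = [("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("A", "B"), ("B", "E")]
adjacency = build_adjacency(edges)
overlap_ab = neighbor_overlap(adjacency, "A", "B")   # C and D are shared
overlap_ae = neighbor_overlap(adjacency, "A", "E")   # only B is shared
```

Testing for enrichment, as the abstract describes, would then compare these counts with the same counts in randomized control networks that preserve degree distribution and clustering coefficient; building such controls is outside this sketch.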

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare