
Sequence-similar, structure-dissimilar protein pairs in the PDB

By Mickey Kosloff and Rachel Kolodny


It is often assumed that in the Protein Data Bank (PDB), two proteins with similar sequences will also have similar structures. Accordingly, it has proved useful to develop subsets of the PDB from which “redundant” structures have been removed, based on a sequence-based criterion for similarity. Similarly, when predicting protein structure using homology modeling, selecting a template structure for a target sequence by sequence alone implicitly assumes that all sequence-similar templates are equivalent. Here, we show that this assumption is often not correct and that standard approaches to creating subsets of the PDB can lead to the loss of structurally and functionally important information. We have carried out sequence-based structural superpositions and geometry-based structural alignments of a large number of protein pairs to determine the extent to which sequence similarity ensures structural similarity. We find many examples where two proteins that are similar in sequence have structures that differ significantly from one another. The source of the structural differences usually has a functional basis. The number of such protein pairs that are identified and the magnitude of the dissimilarity depend on the approach used to calculate the differences; in particular, sequence-based structural superposition identifies a larger number of structurally dissimilar pairs than geometry-based structural alignment. When two sequences can be aligned in a statistically meaningful way, sequence-based structural superposition provides a meaningful measure of structural differences. This approach and geometry-based structural alignment reveal somewhat different information, and one or the other might be preferable in a given application. Our results suggest that in some cases, notably homology modeling, the common use of nonredundant datasets, culled from the PDB based on sequence, may mask important structural and functional information.
We have established a database of sequence-similar, structurally dissimilar protein pairs that will help address this problem (
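
As a rough illustration of what "sequence-based structural superposition" can mean in practice, the sketch below pairs residues by their columns in a gapped sequence alignment, then superimposes the paired coordinates with a least-squares rotation (the Kabsch procedure) and reports the RMSD. This is not the authors' pipeline, which the abstract does not specify. The function names (`aligned_pairs`, `kabsch_rmsd`, `sequence_based_rmsd`) are hypothetical, and the code assumes one coordinate per residue (for example Cα atoms) supplied as NumPy arrays together with a precomputed alignment.

```python
import numpy as np


def aligned_pairs(aln_a, aln_b, gap="-"):
    """Return (i, j) residue index pairs for alignment columns without gaps.

    aln_a and aln_b are the two rows of a pairwise alignment (equal length).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("alignment rows must have equal length")
    pairs = []
    i = j = 0  # running residue indices in the ungapped sequences
    for a, b in zip(aln_a, aln_b):
        if a != gap and b != gap:
            pairs.append((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def kabsch_rmsd(P, Q):
    """RMSD between paired N x 3 coordinate sets after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Covariance matrix and its SVD give the least-squares rotation.
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    # Flip the last axis if needed so the result is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))


def sequence_based_rmsd(coords_a, coords_b, aln_a, aln_b):
    """Superimpose two structures using only sequence-aligned residue pairs."""
    pairs = aligned_pairs(aln_a, aln_b)
    if len(pairs) < 3:
        raise ValueError("need at least three aligned residue pairs")
    idx_a = [i for i, _ in pairs]
    idx_b = [j for _, j in pairs]
    return kabsch_rmsd(np.asarray(coords_a)[idx_a], np.asarray(coords_b)[idx_b])
```

Because the residue correspondence is fixed by the sequence alignment, a local rearrangement such as a hinge motion contributes fully to the RMSD. A geometry-based aligner is instead free to choose a different correspondence that lowers the measured difference, which is consistent with the abstract's observation that the sequence-based approach flags more pairs as dissimilar.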

Topics: Research Article
Publisher: Wiley Subscription Services, Inc., A Wiley Company
Provided by: PubMed Central

