56 research outputs found

    CAB-Align: A Flexible Protein Structure Alignment Method Based on the Residue-Residue Contact Area

    <div><p>Proteins are flexible, and this flexibility has an essential functional role. Flexibility can be observed in loop regions, rearrangements between secondary structure elements, and conformational changes between entire domains. However, most protein structure alignment methods treat protein structures as rigid bodies. Thus, these methods fail to identify the equivalences of residue pairs in regions with flexibility. In this study, we considered that the evolutionary relationship between proteins corresponds directly to the residue–residue physical contacts rather than the three-dimensional (3D) coordinates of proteins. Thus, we developed a new protein structure alignment method, contact area-based alignment (CAB-align), which uses the residue–residue contact area to identify regions of similarity. The main purpose of CAB-align is to identify homologous relationships at the residue level between related protein structures. The CAB-align procedure comprises two main steps: First, a rigid-body alignment method based on local and global 3D structure superposition is employed to generate a sufficient number of initial alignments. Then, iterative dynamic programming is executed to find the optimal alignment. We evaluated the performance and advantages of CAB-align based on four main points: (1) agreement with the gold standard alignment, (2) alignment quality based on an evolutionary relationship without 3D coordinate superposition, (3) consistency of the multiple alignments, and (4) classification agreement with the gold standard classification. 
Comparisons of CAB-align with other state-of-the-art protein structure alignment methods (TM-align, FATCAT, and DaliLite) using our benchmark dataset showed that CAB-align performed robustly in obtaining high-quality alignments and generating consistent multiple alignments with high coverage and accuracy rates, and it performed extremely well when discriminating between homologous and nonhomologous pairs of proteins in both single and multi-domain comparisons. The CAB-align software is freely available to academic users as stand-alone software at <a href="http://www.pharm.kitasato-u.ac.jp/bmd/bmd/Publications.html" target="_blank">http://www.pharm.kitasato-u.ac.jp/bmd/bmd/Publications.html</a>.</p></div>
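The second step described in the abstract, iterative dynamic programming, can be illustrated with a minimal sketch. It is not the CAB-align implementation: the names `dp_align`, `iterate_alignment`, and `score_fn`, the linear gap penalty, and the stopping rule are all assumptions made for illustration, and the contact-area score itself is not reproduced here.

```python
from typing import Callable, List, Tuple

Alignment = List[Tuple[int, int]]
Matrix = List[List[float]]


def dp_align(sim: Matrix, gap: float = 0.0) -> Alignment:
    """Global alignment over a residue-residue similarity matrix.

    Assumption: a simple linear gap penalty; the abstract does not
    state how gaps are scored.
    """
    n = len(sim)
    m = len(sim[0]) if sim else 0
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] - gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] - gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sim[i - 1][j - 1],  # align i with j
                score[i - 1][j] - gap,                    # gap in protein b
                score[i][j - 1] - gap,                    # gap in protein a
            )
    # Trace back from the bottom-right corner to recover aligned pairs.
    pairs: Alignment = []
    i, j = n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + sim[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs


def iterate_alignment(
    initial: Alignment,
    score_fn: Callable[[Alignment], Matrix],
    gap: float = 0.0,
    max_iter: int = 20,
) -> Alignment:
    """Re-score and re-align until the alignment stops changing.

    `score_fn` is a hypothetical callback that rebuilds the similarity
    matrix from the current alignment.
    """
    current = initial
    for _ in range(max_iter):
        updated = dp_align(score_fn(current), gap)
        if updated == current:
            break
        current = updated
    return current
```

In this sketch the initial alignment would come from the rigid-body superposition step, and `score_fn` stands in for whatever alignment-dependent similarity is used in practice.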

    Computational time.

    <p><sup>a</sup>Average computational time for an alignment pair, excluding the preprocessing step.</p><p><sup>b</sup>Average computational time required to preprocess a PDB file.</p>

    Consistency of alignments based on six datasets.

    <p>(<b>A</b>) SCOPe_NR10_all (7,384 triplets), (<b>B</b>) SCOPe_NR10_e10 (2,173 triplets), (<b>C</b>) SCOPe_FAMILY_all (50,630 triplets), (<b>D</b>) SCOPe_FAMILY_e10 (14,689 triplets), (<b>E</b>) PDB30_e5 (1,403,291 triplets), and (<b>F</b>) PDB30_e10 (790,623 triplets). PDB, protein data bank.</p>

    Calculation of the similarity matrix <i>M</i>′ with a window size of three.

    <p>(<b>A</b>) In the matrix <i>M</i>′, the similarity score for the residue pair (<i>k</i><sub><i>a</i></sub>,<i>k</i><sub><i>b</i></sub>) is calculated from the two orange cells (<i>M</i>(<i>k</i><sub><i>a</i></sub>–1,<i>k</i><sub><i>b</i></sub>–1) and <i>M</i>(<i>k</i><sub><i>a</i></sub>+1,<i>k</i><sub><i>b</i></sub>+1)) and the red cell <i>M</i>(<i>k</i><sub><i>a</i></sub>,<i>k</i><sub><i>b</i></sub>). (<b>B</b>) Similarity score for the residue pair (<i>k</i><sub><i>a</i></sub>–1,<i>k</i><sub><i>b</i></sub>–1). (<b>C</b>) Similarity score for the residue pair (<i>k</i><sub><i>a</i></sub>,<i>k</i><sub><i>b</i></sub>). (<b>D</b>) Similarity score for the residue pair (<i>k</i><sub><i>a</i></sub>+1,<i>k</i><sub><i>b</i></sub>+1). The black cells represent the aligned positions in the given alignment. The gray cells represent the ignored pairs.</p>
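The windowing in this caption can be sketched as follows. The caption says only that M′(k_a, k_b) is calculated from M(k_a−1, k_b−1), M(k_a, k_b), and M(k_a+1, k_b+1); combining the three cells by a plain sum, and skipping neighbours that fall outside the matrix, are assumptions of this sketch, as is the function name `window_similarity`.

```python
from typing import List

Matrix = List[List[float]]


def window_similarity(M: Matrix, window: int = 3) -> Matrix:
    """Build M' by combining each cell with its diagonal neighbours.

    Assumptions: cells are combined by summation, and neighbours that
    fall outside the matrix are skipped.
    """
    half = window // 2
    n = len(M)
    m = len(M[0]) if M else 0
    M_prime = [[0.0] * m for _ in range(n)]
    for ka in range(n):
        for kb in range(m):
            total = 0.0
            # Walk along the diagonal through (ka, kb).
            for d in range(-half, half + 1):
                i, j = ka + d, kb + d
                if 0 <= i < n and 0 <= j < m:
                    total += M[i][j]
            M_prime[ka][kb] = total
    return M_prime
```

With `window=3` each interior cell draws on exactly the three diagonal cells named in the caption; a window of one returns a copy of M.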

    Examples of structural alignments between 2c2f_A and 1j30_A.

    <p>(<b>A</b>) Alignment graph produced using SISYPHUS and CAB-align. (<b>B</b>) Alignment graph produced using SISYPHUS and DaliLite. CAB-align, contact area-based alignment.</p>

    PRCs obtained using the NR10 and FAMILY datasets in the superfamily recognition test.

    <p>(<b>A</b>) NR10 benchmark dataset. (<b>B</b>) FAMILY benchmark dataset. (<b>C</b>) PDB30 benchmark dataset. PDB, protein data bank; PRC, precision-recall curve.</p>
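For readers unfamiliar with PRCs, the following minimal sketch shows how such a curve is generally computed from ranked alignment scores. It is a generic illustration, not the evaluation code behind these figures: the function name, the use of a boolean "same superfamily" label, and the treatment of each rank as its own threshold (ties are not merged) are assumptions.

```python
from typing import List, Sequence, Tuple


def precision_recall_points(
    scores: Sequence[float], labels: Sequence[bool]
) -> List[Tuple[float, float]]:
    """Return (recall, precision) after each pair in descending score order.

    `labels[i]` is True when pair i is a true positive (assumed here to
    mean the two proteins belong to the same superfamily).
    """
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    total_pos = sum(1 for _, is_pos in ranked if is_pos)
    points: List[Tuple[float, float]] = []
    true_pos = 0
    for rank, (_, is_pos) in enumerate(ranked, start=1):
        if is_pos:
            true_pos += 1
        recall = true_pos / total_pos if total_pos else 0.0
        precision = true_pos / rank
        points.append((recall, precision))
    return points
```

Plotting precision against recall for each method on the same axes gives curves of the kind summarized in the panels above.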

    PRC for the five alignment methods using the alignments returned by all methods.

    <p>(<b>A</b>) NR10 benchmark dataset. (<b>B</b>) FAMILY benchmark dataset. (<b>C</b>) PDB30 benchmark dataset. PDB, protein data bank; PRC, precision-recall curve.</p>

    Comparison of the AQ for six benchmark datasets.

    <p>(<b>A</b>) SCOPe_NR10_all (6,799 pairs), (<b>B</b>) SCOPe_NR10_e10 (3,660 pairs), (<b>C</b>) SCOPe_FAMILY_all (15,790 pairs), (<b>D</b>) SCOPe_FAMILY_e10 (5,730 pairs), (<b>E</b>) PDB30_e5 (182,907 pairs), and (<b>F</b>) PDB30_e10 (122,626 pairs). The methods are shown in order from top to bottom on the left (<i>n</i> = 0) of (<b>A</b>): CAB-align, DaliLite, FATCAT, and TM-align. CAB-align, contact area-based alignment; PDB, protein data bank.</p>

    Comparison of DaliLite and CAB-align in terms of the agreement, reliability, and <i>NormS</i> values based on SISYPHUS_ID10.

    <p>(<b>A</b>) Scatter plots of agreement, (<b>B</b>) reliability, and (<b>C</b>) <i>NormS</i>. The numbers of pairs belonging to each area are indicated. For example, 557 CAB-align alignments had better agreement values than DaliLite, where the agreement values for CAB-align and DaliLite were both higher than 0.5. CAB-align, contact area-based alignment.</p>