An Algorithm for Protein Helix Assignment Using Helix Geometry
<div><p>Helices are among the most common secondary structure elements in proteins and were among the earliest to be recognized. The assignment of helices in a protein underlies the analysis of its structure and function. Although the mathematical expression for a helical curve is simple, no previous assignment program has used a genuine helical curve as the model for helix assignment. In this paper we present a two-step assignment algorithm. The first step searches for a series of bona fide helical curves, each of which best fits the coordinates of four successive backbone C<sub>α</sub> atoms. The second step uses these best-fit helical curves as input to make the helix assignment. Application to the protein structures in the PDB (Protein Data Bank) shows that the algorithm accurately assigns not only regular α-helices but also 3<sub>10</sub> and π helices, as well as their left-handed versions. One salient feature of the algorithm is that the helices it assigns are structurally more uniform than those assigned by previous programs. This structural uniformity should be useful for protein structure classification and prediction, while the accurate assignment of a helix to a particular type underlies structure-function relationships in proteins.</p></div>
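As a rough illustration of what the first step works with, the sketch below estimates radius, twist per residue, rise per residue and handedness from four successive Cα coordinates. It is a swapped-in technique, not the authors' fitting procedure: it uses a closed-form bisector construction found in some helix-analysis tools rather than a least-squares fit of a helical curve. The function name `local_helix_params` is hypothetical, and the four atoms are assumed not to be collinear (the formulas divide by zero for a straight segment).

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _add(a: Vec, b: Vec) -> Vec:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def local_helix_params(p1: Vec, p2: Vec, p3: Vec, p4: Vec) -> dict:
    """Closed-form helix parameters for four successive C-alpha atoms (hypothetical helper)."""
    # Bisectors of the C-alpha angles at p2 and p3; for an ideal helix each
    # points straight at the helix axis and has length 2r(1 - cos(twist)).
    v1 = _add(_sub(p1, p2), _sub(p3, p2))
    v2 = _add(_sub(p2, p3), _sub(p4, p3))
    n1, n2 = math.sqrt(_dot(v1, v1)), math.sqrt(_dot(v2, v2))
    cos_t = max(-1.0, min(1.0, _dot(v1, v2) / (n1 * n2)))
    # The axis direction is perpendicular to both bisectors.
    axis = _cross(v1, v2)
    na = math.sqrt(_dot(axis, axis))
    axis = (axis[0] / na, axis[1] / na, axis[2] / na)
    # Signed rise along the axis: positive for right-handed, negative for left-handed.
    rise = _dot(_sub(p3, p2), axis)
    return {
        "twist_deg": math.degrees(math.acos(cos_t)),
        "rise": abs(rise),
        "radius": math.sqrt(n1 * n2) / (2.0 * (1.0 - cos_t)),
        "handedness": "right" if rise > 0 else "left",
    }
```

For ideal α-helix coordinates (Cα radius 2.3 Å, 100° per residue, 1.5 Å rise) the function recovers those three values and reports a right-handed helix; reversing the sense of rotation reports a left-handed one.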
A gene pathway enrichment method based on improved TF-IDF algorithm
Gene pathway enrichment analysis is a widely used method for determining whether a gene set is statistically enriched in a given biological pathway network. Current gene pathway enrichment methods commonly consider only the local importance of genes in pathways, without accounting for the interactions between genes. In this paper, we propose a gene pathway enrichment method (GIGSEA) based on an improved TF-IDF algorithm. The method uses gene interaction data to calculate the influence of each gene from both its local importance in a pathway and its global specificity. Computational experiments show that, compared with the traditional gene set enrichment analysis method, the proposed method finds more specific phenotype-related enriched pathways with higher efficiency.
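The abstract does not give the exact GIGSEA weighting, so the following is only a toy analogue of the TF-IDF idea under stated assumptions: the "term frequency" of a gene is its add-one-smoothed interaction degree inside a pathway (local importance), the "inverse document frequency" is log(N / number of pathways containing the gene) (global specificity), and a pathway's score for a query gene set is the sum of the query genes' weights. The function names are hypothetical, and no significance test is included.

```python
import math
from collections import Counter


def tfidf_weights(pathways: dict[str, set[str]],
                  interactions: set[frozenset[str]]) -> dict[str, dict[str, float]]:
    """Weight each gene in each pathway by local importance x global specificity."""
    n_pathways = len(pathways)
    # Document frequency: the number of pathways each gene occurs in.
    df = Counter(g for genes in pathways.values() for g in genes)
    weights: dict[str, dict[str, float]] = {}
    for name, genes in pathways.items():
        # Local importance: interactions whose both partners lie in this pathway.
        degree = {g: sum(1 for e in interactions if g in e and e <= genes) for g in genes}
        # Add-one smoothing so that genes without in-pathway interactions keep a weight.
        denom = sum(degree.values()) + len(genes)
        weights[name] = {
            g: (degree[g] + 1) / denom * math.log(n_pathways / df[g]) for g in genes
        }
    return weights


def enrich(query: set[str], weights: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Score each pathway by the summed weights of the query genes, highest first."""
    scores = [(name, sum(w.get(g, 0.0) for g in query)) for name, w in weights.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

Genes that occur in every pathway get an IDF of zero, which is the TF-IDF way of discounting non-specific genes.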
The normal distribution parameters (<i>μ</i>, <i>σ</i>) for <i>r</i>, <i>p</i>, <i>t</i>.
<p>The residues have been divided into five groups (H, G, I, E, R) based on the secondary structure elements assigned by dssp. The last row (<i>δ</i><sub><i>P</i></sub>) shows the difference in <i>μ</i> between H and each of the other four groups. Both <i>r</i> and <i>p</i> are in Å, while <i>t</i> is in degrees.</p>
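In outline, entries like these come from fitting a normal distribution to each group's values and subtracting means. This sketch assumes the per-residue values of one parameter (<i>r</i>, <i>p</i> or <i>t</i>) are already grouped by dssp label; it uses the maximum-likelihood (population) standard deviation, and both the sign convention μ<sub>H</sub> − μ<sub>X</sub> for <i>δ</i><sub><i>P</i></sub> and the function names are assumptions.

```python
from statistics import fmean, pstdev


def normal_params(values_by_group: dict[str, list[float]]) -> dict[str, tuple[float, float]]:
    """Maximum-likelihood normal fit (mu, sigma) for each secondary-structure group."""
    return {g: (fmean(v), pstdev(v)) for g, v in values_by_group.items()}


def delta_mu(params: dict[str, tuple[float, float]], ref: str = "H") -> dict[str, float]:
    """Difference in mean between the reference group and every other group."""
    mu_ref = params[ref][0]
    return {g: mu_ref - mu for g, (mu, _) in params.items() if g != ref}
```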
The distributions of the lengths of the <i>α</i>-helices assigned by our algorithm, dssp and stride (a), and an example of a 4-residue <i>α</i>-helix assigned by dssp (b).
<p>The x-axis in (a) is the helix length, and the y-axis is the number of helices with that length. The two arrows point to the most frequent helix lengths assigned by dssp and by both our algorithm and stride. Panel (b) depicts a dssp-assigned 4-residue <i>α</i>-helix in a protein (pdbid 1CC5) that is not assigned as a helix by our algorithm.</p>
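A length distribution like the one in (a) can be tallied from per-residue assignment strings. This sketch assumes a dssp-style one-letter code per residue with "H" for α-helix and treats each maximal run of "H" as one helix, so two abutting helices would be counted as one; the function names are hypothetical.

```python
import re
from collections import Counter


def helix_lengths(ss: str, code: str = "H") -> list[int]:
    """Lengths of the maximal runs of `code` in a per-residue assignment string."""
    return [len(m.group()) for m in re.finditer(f"{re.escape(code)}+", ss)]


def length_histogram(assignments: list[str], code: str = "H") -> tuple[Counter, int]:
    """Histogram of helix lengths over many chains, plus the most frequent length."""
    counts = Counter(n for ss in assignments for n in helix_lengths(ss, code))
    mode = counts.most_common(1)[0][0] if counts else 0
    return counts, mode
```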
The clusters of <i>α</i>-helices assigned by our algorithm, dssp and p-sea.
<p>The helix sets on the left are 12 residues long, while those on the right are 24 residues long. The 12-residue set (11,756 helices) and the 24-residue set (1,211 helices) assigned by our algorithm are classified into 12 and 17 clusters, respectively. The dssp-assigned 12-residue set (12,631 helices) and 24-residue set (1,285 helices) are classified into 21 and 35 clusters, respectively, while the p-sea-assigned 12-residue set (5,306 helices) and 24-residue set (574 helices) are classified into 10 and 24 clusters, respectively. The clusters are produced using our geometric clustering algorithm [<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0129674#pone.0129674.ref022" target="_blank">22</a>]. The RMSD threshold for clustering is 1.5 Å.</p>
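The clustering algorithm of reference [22] is not reproduced here. As a stand-in, the sketch below groups equal-length helices with greedy leader clustering on Cα RMSD after optimal (Kabsch) superposition, using the 1.5 Å threshold from the caption. It assumes NumPy is available and that all segments have the same number of residues; leader clustering is order-dependent and is not claimed to be the authors' method, and the function names are hypothetical.

```python
import numpy as np


def kabsch_rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """C-alpha RMSD between two (N, 3) coordinate sets after optimal superposition."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    # The SVD of the 3x3 covariance matrix yields the optimal rotation (Kabsch).
    u, s, vt = np.linalg.svd(a.T @ b)
    # If the best orthogonal map is a reflection, flip the smallest singular value.
    d = np.sign(np.linalg.det(u @ vt))
    e0 = float((a ** 2).sum() + (b ** 2).sum())
    msd = (e0 - 2.0 * (s[0] + s[1] + d * s[2])) / len(a)
    return float(np.sqrt(max(msd, 0.0)))  # clamp tiny negative round-off


def leader_cluster(segments: list[np.ndarray], threshold: float = 1.5) -> list[int]:
    """Assign each segment to the first cluster whose leader is within `threshold`."""
    leaders: list[int] = []  # index of the first member (leader) of each cluster
    labels: list[int] = []
    for i, seg in enumerate(segments):
        for c, j in enumerate(leaders):
            if kabsch_rmsd(seg, segments[j]) <= threshold:
                labels.append(c)
                break
        else:
            leaders.append(i)  # no leader close enough: start a new cluster
            labels.append(len(leaders) - 1)
    return labels
```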
A histogram of helix axis angle <i>a</i> (a) and the threshold <i>a</i><sub><i>max</i></sub> (b).
<p>The x-axis in (a) is the angle in degrees, and the y-axis is the number of residues. In (b), residue 28 (colored red and located in the middle of a segment; top figure) in a protein (pdbid 4CXF) has an axis angle of <i>a</i><sub>28</sub> = 37.72°, while residue 53 (colored red and located in the middle of a segment; bottom figure) in a protein (pdbid 1SQG) has <i>a</i><sub>53</sub> = 44.95°. With a threshold of <i>a</i><sub><i>max</i></sub> = 40.0°, our algorithm assigns the first segment as a single <i>α</i>-helix but divides the second segment into two different helices. In contrast, dssp assigns each segment as a single helix.</p>
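The threshold test in (b) amounts to cutting a helix wherever the local axis bends too sharply. This sketch assumes that the axis angle at residue <i>i</i> is the angle between the local helix axes of consecutive fitting windows (the caption does not give the exact definition) and takes the axis directions as already computed; the function names are hypothetical.

```python
import math

Vec = tuple[float, float, float]


def axis_angle(u: Vec, v: Vec) -> float:
    """Angle in degrees between two helix-axis direction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / (nu * nv)))))


def split_by_axis_angle(axes: list[Vec], a_max: float = 40.0) -> list[list[int]]:
    """Split a run of residues wherever the local axis turns by more than a_max."""
    if not axes:
        return []
    segments = [[0]]
    for i in range(1, len(axes)):
        if axis_angle(axes[i - 1], axes[i]) > a_max:
            segments.append([i])  # bend too sharp: start a new helix
        else:
            segments[-1].append(i)
    return segments
```

With <i>a</i><sub><i>max</i></sub> = 40.0°, a bend of 37.72° keeps one helix while a bend of 44.95° produces two, matching the two cases in (b).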
Illustration of the differences between the assignments made by our algorithm and by dssp.
<p>In (a), dssp assigns the entire segment (51–65, excluding P50) in a protein (pdbid 3OY9) as an <i>α</i>-helix. Our algorithm divides it into two helices: a 3<sub>10</sub>-helix (51–52, red) and an <i>α</i>-helix (53–63, green, purple, yellow). The <i>α</i>-helix stops at N63 because the C<sub><i>α</i></sub> RMSD <i>δ</i> values for residues 64 and 65 are 0.541 and 0.431 respectively, neither of which is less than <i>d</i><sub><i>max</i></sub> = 0.3. In contrast, the dssp-assigned helix extends to residue 65. However, as the left figure shows, the C<sub><i>α</i></sub> coordinates of both residues 64 and 65 clearly deviate from a helical curve. In (b), a segment of residues 153–172 in a protein (pdbid 1MHY) is assigned as a single <i>α</i>-helix by dssp, while our algorithm divides it into four helices: 3<sub>10</sub>-helix (154–156, yellow)–<i>α</i>-helix (157–163, green)–3<sub>10</sub>-helix (164–166, purple)–<i>α</i>-helix (167–171, green). However, a careful examination of the hydrogen bond energies for these residues suggests that they could in fact also be assigned to a 3<sub>10</sub>-helix even by the dssp standard (<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0129674#pone.0129674.s002" target="_blank">S2 Fig</a>).</p>
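The stopping rule in (a) can be sketched as taking maximal runs of consecutive residues whose <i>δ</i> is below <i>d</i><sub><i>max</i></sub>. This assumes each residue already carries a single <i>δ</i> value (how <i>δ</i> is attributed from four-atom windows to individual residues is not stated here) and leaves out the subsequent classification into helix types; `helix_runs` is a hypothetical name.

```python
def helix_runs(deltas: dict[int, float], d_max: float = 0.3) -> list[tuple[int, int]]:
    """Maximal runs of consecutive residue numbers whose delta is below d_max."""
    runs: list[tuple[int, int]] = []
    start = prev = None
    for res in sorted(deltas):
        ok = deltas[res] < d_max
        if ok and start is not None and res == prev + 1:
            prev = res  # extend the current run
        elif ok:
            if start is not None:  # a gap in numbering closes the previous run
                runs.append((start, prev))
            start = prev = res
        else:
            if start is not None:  # delta too large: the helix stops here
                runs.append((start, prev))
            start = prev = None
    if start is not None:
        runs.append((start, prev))
    return runs
```

With the caption's values, residues 64 (<i>δ</i> = 0.541) and 65 (<i>δ</i> = 0.431) both fail the <i>δ</i> &lt; 0.3 test, so a run that reaches residue 63 ends there.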
Amino Acids in Nine Ligand-Prefer Ramachandran Regions
Several secondary structures, such as the π-helix and the left-handed helix, have frequently been identified at protein ligand-binding sites. Each secondary structure is considered to be constrained to a specific region of dihedral angles. However, a comprehensive analysis of the correlation between main chain dihedral angles and ligand-binding sites has not been performed. We undertook an extensive analysis of the relationship between the dihedral angles of residues in proteins and their distance to ligand-binding sites, frequency of occurrence, molecular potential energy, amino acid composition, van der Waals contacts, and hydrogen bonds with ligands. The results showed that dihedral angle values in certain regions of the Ramachandran plot have a strong preference for ligand-binding sites. We discovered that amino acids preceding the ligand-prefer ϕ/ψ box residues are more exposed to solvent, whereas amino acids following ligand-prefer ϕ/ψ box residues form more hydrogen bonds and van der Waals contacts with ligands. Our method performed similarly to the program Ligsite-csc for both ligand-bound and ligand-free structures when only one ligand-binding site was predicted. These results should be useful for predicting protein ligand-binding sites and for analysing the relationship between structure and function.
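One simple way to quantify how strongly a Ramachandran region prefers ligand-binding sites is an enrichment ratio: the fraction of residues inside a ϕ/ψ box that lie near a ligand, divided by the same fraction over all residues. The nine boxes are not defined in this abstract, so any box passed in is hypothetical, as are the 4 Å distance cutoff and the names below; distances to the nearest ligand atom are assumed to be precomputed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhiPsiBox:
    """A rectangular region of the Ramachandran plot, angles in degrees."""
    phi_min: float
    phi_max: float
    psi_min: float
    psi_max: float

    def contains(self, phi: float, psi: float) -> bool:
        return self.phi_min <= phi < self.phi_max and self.psi_min <= psi < self.psi_max


def binding_enrichment(residues: list[tuple[float, float, float]], box: PhiPsiBox,
                       cutoff: float = 4.0) -> float:
    """Binding-site fraction inside the box divided by the overall binding-site fraction.

    Each residue is (phi, psi, distance to the nearest ligand atom in angstroms).
    """
    near = [d <= cutoff for _, _, d in residues]
    in_box = [box.contains(phi, psi) for phi, psi, _ in residues]
    overall = sum(near) / len(residues)
    n_box = sum(in_box)
    if n_box == 0 or overall == 0:
        return float("nan")  # ratio undefined without box residues or binding residues
    box_frac = sum(n and b for n, b in zip(near, in_box)) / n_box
    return box_frac / overall
```

A ratio above 1 means residues in the box sit near ligands more often than residues in general.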
Helix assignment on a residue and a helix basis by our algorithm, dssp and stride.
<p>For each helix type, the third column gives the range of helix lengths and the most frequent helix length.</p>
The assignments on
<p>The assignments are made for a set of 100 x-ray structures with different resolutions. The first row is the total number of residues. All the other rows give the agreement between a pair of programs as a percentage, computed as \( \frac{n}{n_1 + n + n_2} \times 100 \), where <i>n</i> is the number of residues assigned by both programs, and <i>n</i><sub>1</sub> and <i>n</i><sub>2</sub> are the numbers of residues assigned only by the first and second programs, respectively.</p>
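Because <i>n</i><sub>1</sub> + <i>n</i> + <i>n</i><sub>2</sub> is the number of residues assigned by either program, this agreement is the Jaccard index of the two residue sets, expressed as a percentage. A minimal sketch, assuming each program's assignment for one helix type is given as a set of residue identifiers (`agreement` is a hypothetical name):

```python
def agreement(first: set[int], second: set[int]) -> float:
    """Percentage agreement n / (n1 + n + n2) x 100 between two residue assignments."""
    n = len(first & second)    # assigned by both programs
    n1 = len(first - second)   # assigned only by the first program
    n2 = len(second - first)   # assigned only by the second program
    total = n1 + n + n2
    return 100.0 * n / total if total else float("nan")  # undefined if neither assigns any
```

For example, if one program assigns residues 1–10 and the other residues 3–12, then <i>n</i> = 8, <i>n</i><sub>1</sub> = <i>n</i><sub>2</sub> = 2, and the agreement is 8/12, about 66.7%.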