An Algorithm for Protein Helix Assignment Using Helix Geometry

Author: Chen Cao (376652)
Lincong Wang (763090)
Shutan Xu (763091)
Publication date: 01/07/2015

Helices are one of the most common secondary structure elements in proteins and were among the earliest recognized. The assignment of helices in a protein underlies the analysis of its structure and function. Though the mathematical expression for a helical curve is simple, no previous assignment program has used a genuine helical curve as a model for helix assignment. In this paper we present a two-step assignment algorithm. The first step searches for a series of bona fide helical curves, each of which best fits the coordinates of four successive backbone Cα atoms. The second step uses the best-fit helical curves as input to make the helix assignment. Application to the protein structures in the PDB (Protein Data Bank) demonstrates that the algorithm can accurately assign not only regular α-helices but also 3₁₀- and π-helices, as well as their left-handed versions. One salient feature of the algorithm is that the assigned helices are structurally more uniform than those assigned by previous programs. This structural uniformity should be useful for protein structure classification and prediction, while the accurate assignment of a helix to a particular type underlies the structure-function relationship in proteins.
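The "simple" mathematical expression for a helical curve mentioned in the abstract can be illustrated with a minimal sketch. The function names and the default parameter values (textbook approximations for the Cα trace of an ideal α-helix: radius of about 2.3 Å, rise of 1.5 Å per residue, 100° of rotation per residue) are assumptions for illustration and are not taken from the paper. The sketch only generates points on a helix; it does not implement the authors' fitting step.

```python
import math

def helix_point(radius, rise, twist_deg, i):
    """Return (x, y, z) of the i-th point on a circular helix along the z-axis.

    radius    -- distance of each point from the helix axis (Å)
    rise      -- translation along the axis per point (Å)
    twist_deg -- rotation about the axis per point (degrees)
    """
    angle = math.radians(twist_deg) * i
    return (radius * math.cos(angle), radius * math.sin(angle), rise * i)

def ideal_ca_trace(n, radius=2.3, rise=1.5, twist_deg=100.0):
    """Generate n successive points; the defaults are assumed textbook values."""
    return [helix_point(radius, rise, twist_deg, i) for i in range(n)]

# Four successive points, mirroring the four-atom window described above.
trace = ideal_ca_trace(4)

# Distance between neighbouring points on the generated curve.
ca_ca = math.dist(trace[0], trace[1])
print(round(ca_ca, 2))
```

With these assumed parameters the distance between neighbouring points comes out close to the roughly 3.8 Å spacing usually quoted for consecutive Cα atoms, which is a quick sanity check on the parametrization.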

Public Library of Science (PLOS)

Directory of Open Access Journals

The Francis Crick Institute

A gene pathway enrichment method based on improved TF-IDF algorithm

Author: Chenjing Zhang
Guofu Feng
Ming Chen
Shutan Xu
Yinhui Leng
Publication venue: Elsevier BV
Publication date: 01/07/2023

Gene pathway enrichment analysis is a widely used method for testing whether a gene set is statistically enriched in a particular biological pathway network. Current gene pathway enrichment methods commonly consider the local importance of genes in pathways without considering the interactions between genes. In this paper, we propose a gene pathway enrichment method (GIGSEA) based on an improved TF-IDF algorithm. The method uses gene interaction data to calculate the influence of each gene from both its local importance in a pathway and its global specificity. Computational experiments show that, compared with the traditional gene set enrichment analysis method, the proposed method finds more specific enriched pathways related to the phenotype, and does so with higher efficiency.
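To make the "local importance" and "global specificity" terms concrete, here is a minimal sketch of classic TF-IDF applied to genes and pathways. It is not the improved algorithm of the paper, which additionally uses gene interaction data that the abstract does not specify. The function name, the toy pathways and the gene labels are all hypothetical.

```python
import math

def tf_idf_gene_scores(pathways):
    """Score genes with plain TF-IDF.

    pathways -- dict mapping a pathway name to the set of genes it contains.
    Returns a dict: pathway name -> {gene: score}.
    """
    n_pathways = len(pathways)

    # Number of pathways each gene appears in (the "document frequency").
    df = {}
    for genes in pathways.values():
        for gene in genes:
            df[gene] = df.get(gene, 0) + 1

    scores = {}
    for name, genes in pathways.items():
        if not genes:
            scores[name] = {}
            continue
        # Local importance: each gene occurs once, so TF is 1 / pathway size.
        tf = 1.0 / len(genes)
        # Global specificity: genes found in few pathways get a larger IDF.
        scores[name] = {g: tf * math.log(n_pathways / df[g]) for g in genes}
    return scores

# Hypothetical toy data: gene "A" is in every pathway, the others are specific.
pathways = {
    "P1": {"A", "B"},
    "P2": {"A", "C"},
    "P3": {"A", "D", "E"},
}
scores = tf_idf_gene_scores(pathways)
```

In this toy example the ubiquitous gene "A" scores zero everywhere, while a pathway-specific gene scores higher in a small pathway than in a large one.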

Directory of Open Access Journals

The normal distribution parameters (μ, σ) for r, p, t.

Author: Chen Cao (376652)
Lincong Wang (763090)
Shutan Xu (763091)

The residues have been divided into five groups (H, G, I, E, R) based on the secondary structure elements assigned by dssp. The last row (δP) shows the difference in μ between H and each of the other four groups. The unit for both r and p is Å, while t is in degrees.

The Francis Crick Institute

The distributions of the lengths of the α-helices from our algorithm, dssp and stride (a), and an example of a 4-residue α-helix by dssp (b).

Author: Chen Cao (376652)
Lincong Wang (763090)
Shutan Xu (763091)

The x-axis in (a) is the helix length, while the y-axis is the number of helices with that particular length. The two arrows point to the most frequent helix lengths assigned by dssp and by both our algorithm and stride. The right figure (b) depicts a dssp-assigned 4-residue α-helix in a protein (pdbid 1CC5) that is not assigned to a helix by our algorithm.

The Francis Crick Institute

The clusters of α-helices by our algorithm, dssp and p-sea.

Author: Chen Cao (376652)
Lincong Wang (763090)
Shutan Xu (763091)

The helix sets on the left have a length of 12 residues, while those on the right have a length of 24 residues. The 12-residue set (11,756 helices) and 24-residue set (1,211 helices) assigned by our algorithm are classified into 12 and 17 clusters, respectively. The dssp-assigned 12-residue set (12,631 helices) and 24-residue set (1,285 helices) are classified into 21 and 35 clusters, respectively, while the p-sea-assigned 12-residue set (5,306 helices) and 24-residue set (574 helices) are classified into 10 and 24 clusters, respectively. The clusters are produced using our geometric clustering algorithm [22]. The RMSD threshold for clustering is 1.5 Å.

The Francis Crick Institute

A histogram of helix axis angle a (a) and the threshold amax (b).

Author: Chen Cao (376652)
Lincong Wang (763090)
Shutan Xu (763091)

The x-axis in (a) is the angle in degrees, while the y-axis is the number of residues. In (b), residue 28 (colored in red and located in the middle of a segment, top figure) in a protein (pdbid 4CXF) has an axis angle of a28 = 37.72°, while residue 53 (colored in red and located in the middle of a segment, bottom figure) in a protein (pdbid 1SQG) has an axis angle of a53 = 44.95°. With a threshold of amax = 40.0°, our algorithm assigns the first segment as a single α-helix and divides the second segment into two different helices. In contrast, dssp assigns both segments as a single helix.
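The thresholding described in this caption can be sketched as follows. This is only an illustration of splitting at amax, not the authors' implementation: the function name is hypothetical, dropping the breaking residue from both resulting helices is an assumption, and every angle other than a28 = 37.72° and a53 = 44.95° is a made-up placeholder.

```python
def split_by_axis_angle(residues, angles, a_max=40.0):
    """Split a run of residues wherever the axis angle exceeds a_max.

    residues -- residue numbers in sequence order
    angles   -- axis angle (degrees) for each residue, same order
    Assumption: a residue whose angle exceeds a_max ends the current helix
    and is itself left out of both neighbouring helices.
    """
    helices, current = [], []
    for res, angle in zip(residues, angles):
        if angle > a_max:
            if current:
                helices.append(current)
            current = []
        else:
            current.append(res)
    if current:
        helices.append(current)
    return helices

# Only the middle angle in each list comes from the caption; the rest are
# placeholder values chosen to stay well below the threshold.
one_helix = split_by_axis_angle([26, 27, 28, 29, 30],
                                [10.0, 12.0, 37.72, 11.0, 9.0])
two_helices = split_by_axis_angle([51, 52, 53, 54, 55],
                                  [10.0, 12.0, 44.95, 11.0, 9.0])
```

Under these assumptions the first segment stays whole because 37.72° is below the 40.0° threshold, while the second is cut at residue 53.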

The Francis Crick Institute

The illustration of the differences in assignment by our algorithm and dssp.

Author: Chen Cao (376652)
Lincong Wang (763090)
Shutan Xu (763091)

In (a), dssp assigns the entire segment (51–65, excluding P50) in a protein (pdbid 3OY9) as an α-helix. Our algorithm divides it into two helices: a 3₁₀-helix (51–52, red) and an α-helix (53–63, green, purple, yellow). The α-helix stops at N63 because the Cα RMSD δ values for residues 64 and 65 are 0.541 and 0.431, respectively, neither of which is below dmax = 0.3. In contrast, the dssp-assigned helix extends to residue 65. However, as shown in the left figure, the Cα coordinates of both residues 64 and 65 deviate clearly from a helical curve. In (b), a segment of residues 153–172 in a protein (pdbid 1MHY) is assigned as a single α-helix by dssp, while our algorithm divides it into four helices: 3₁₀-helix (154–156, yellow), α-helix (157–163, green), 3₁₀-helix (164–166, purple) and α-helix (167–171, green). However, a careful examination of the hydrogen bond energies for these residues suggests that they could also be assigned to a 3₁₀-helix even by the dssp standard (S2 Fig).

The Francis Crick Institute

Amino Acids in Nine Ligand-Prefer Ramachandran Regions

Author: Chen Cao
Lincong Wang
Xiaoyang Chen
Shuxue Zou
Guishen Wang
Shutan Xu
Publication venue: Hindawi Limited
Publication date: 01/01/2015

Several secondary structures, such as the π-helix and the left-handed helix, have been frequently identified at protein ligand-binding sites. A secondary structure is considered to be constrained to a specific region of dihedral angles. However, a comprehensive analysis of the correlation between main chain dihedral angles and ligand-binding sites has not been performed. We undertook an extensive analysis of the relationship between dihedral angles in proteins and their distance to ligand-binding sites, frequency of occurrence, molecular potential energy, amino acid composition, van der Waals contacts, and hydrogen bonds with ligands. The results showed that ligand-binding sites strongly prefer dihedral angle values in certain regions of the Ramachandran plot. We discovered that amino acids preceding the ligand-prefer ϕ/ψ box residues are more exposed to solvent, whereas amino acids following ligand-prefer ϕ/ψ box residues form more hydrogen bonds and van der Waals contacts with ligands. Our method performed similarly to the program Ligsite-csc for both ligand-bound and ligand-free structures when just one ligand-binding site was predicted. These results should be useful for the prediction of protein ligand-binding sites and for analysing the relationship between structure and function.

Crossref

Directory of Open Access Journals

Helix assignment on a residue and a helix basis by our algorithm, dssp and stride.

Author: Chen Cao (376652)
Lincong Wang (763090)
Shutan Xu (763091)

For each helix type, the third column gives the range of helix lengths and the most frequent helix length, respectively.

The Francis Crick Institute

The assignments on

Author: Chen Cao (376652)
Lincong Wang (763090)
Shutan Xu (763091)

The assignments are made for a set of 100 x-ray structures with different resolutions. The first row is the total number of residues. All the other rows give the agreement between a pair of programs as a percentage. The percentage is computed as $\frac{n}{n_1 + n + n_2}$, where $n$ is the number of residues assigned by both programs, while $n_1$ and $n_2$ are the numbers of residues assigned only by the first and only by the second program, respectively.
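The agreement measure in this caption can be computed directly from the two sets of assigned residues. The sketch below is a minimal illustration: the function name, the choice to return 0.0 when neither program assigns any residue, and the example residue numbers are assumptions, not part of the source.

```python
def agreement_percent(assigned_first, assigned_second):
    """Agreement between two assignment programs, in percent.

    assigned_first, assigned_second -- iterables of residue identifiers that
    each program assigned to the helix type under comparison.
    """
    first, second = set(assigned_first), set(assigned_second)
    n = len(first & second)    # residues assigned by both programs
    n1 = len(first - second)   # residues assigned only by the first program
    n2 = len(second - first)   # residues assigned only by the second program
    total = n1 + n + n2
    if total == 0:
        return 0.0             # assumption: no assignments means no agreement
    return 100.0 * n / total

# Hypothetical residue numbers: two shared, two only in the first set,
# one only in the second, so the ratio is 2 / (2 + 2 + 1).
example = agreement_percent([1, 2, 3, 4], [3, 4, 5])
```

The ratio is the same quantity as the Jaccard index of the two residue sets, so it is symmetric in the two programs and equals 100 only when both assign exactly the same residues.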

The Francis Crick Institute
