11 research outputs found

NMRDSP: An Accurate Prediction of Protein Shape Strings from NMR Chemical Shifts and Sequence Data

Author: Longjian Lu (501295)
Peisheng Cong (121213)
Tonghua Li (121211)
Wusong Mao (501294)
Zhiheng Wang (424546)
Zhongliang Zhu (487392)
Publication venue
Publication date: 23/12/2013
Field of study

A shape string is a structural sequence and an important representation of protein backbone conformation. Nuclear magnetic resonance (NMR) chemical shifts correlate strongly with local protein structure and are used, together with computational approaches, to predict protein structures. Here we present NMRDSP, an approach that predicts the protein shape string from NMR chemical shifts and structure profiles obtained from sequence data. NMRDSP uses six chemical shifts (HA, H, N, CA, CB and C) and eight structure-profile elements as features, a non-redundant set of 1,003 entries as the training set, and a conditional random field as the classification algorithm. On an independent testing set of 203 entries, it achieved an accuracy of 75.8% for S8 (eight-state accuracy) and 87.8% for S3 (three-state accuracy). This is higher than the accuracy obtained using chemical shifts or sequence data alone, and confirms that both the chemical shift and the structure profile are significant features for shape string prediction and that combining them improves the predictor's accuracy. We have built the NMRDSP web server and believe it could serve as a platform for predicting other protein structures and functions. The NMRDSP web server is freely available at http://cal.tongji.edu.cn/NMRDSP/index.jsp.
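The S8 and S3 figures quoted in the abstract are per-residue accuracies at two levels of label granularity. The sketch below shows one way such a percentage could be computed, assuming accuracy is simply the fraction of residues whose predicted state matches the observed one. The function name `state_accuracy`, the toy labels `a` to `d`, and the toy reduction map are all hypothetical; they are not the actual shape-string alphabet or the authors' eight-to-three state mapping, neither of which is given in this record.

```python
def state_accuracy(predicted, observed, reduce_map=None):
    """Return the percentage of positions where predicted == observed.

    If reduce_map is given, both sequences are first collapsed to a
    coarser label set (for example, many fine states to a few classes).
    """
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    if len(observed) == 0:
        raise ValueError("sequences must not be empty")
    if reduce_map is not None:
        predicted = [reduce_map[s] for s in predicted]
        observed = [reduce_map[s] for s in observed]
    hits = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * hits / len(observed)


# Toy data only: four made-up fine states collapsed to two coarse classes.
toy_predicted = "aabbccdd"
toy_observed = "aabcccda"
toy_map = {"a": "x", "b": "y", "c": "y", "d": "x"}  # hypothetical mapping

fine = state_accuracy(toy_predicted, toy_observed)
coarse = state_accuracy(toy_predicted, toy_observed, reduce_map=toy_map)
```

In this toy case the coarse accuracy is at least as high as the fine one, because errors between states that collapse to the same class no longer count; this is consistent with S3 exceeding S8 in the abstract.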

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

The Francis Crick Institute

A comparison of performances of NS203 by NMRDSP, DSP and Frag1D (%).

Author: Longjian Lu (501295)
Peisheng Cong (121213)
Tonghua Li (121211)
Wusong Mao (501294)
Zhiheng Wang (424546)
Zhongliang Zhu (487392)

The Francis Crick Institute

An example of normalization and alphabetization of Cystine C NMR CS data.

Author: Longjian Lu (501295)
Peisheng Cong (121213)
Tonghua Li (121211)
Wusong Mao (501294)
Zhiheng Wang (424546)
Zhongliang Zhu (487392)

After normalization, the NMR CS values range from zero to one (horizontal axis). After alphabetization, each sub-region is represented by a character (top). The performance of each pre-processing step is given in Table 1 (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0083532#pone-0083532-t001).
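The two pre-processing steps in this caption can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes min-max scaling for the normalization and equal-width sub-regions for the alphabetization, and the number of sub-regions (`n_bins`), the letters used, and the example shift values are all hypothetical.

```python
from string import ascii_uppercase


def normalize(shifts):
    """Min-max scale chemical shifts to the range [0, 1] (assumed scheme)."""
    lo, hi = min(shifts), max(shifts)
    if hi == lo:
        # All values identical: map everything to zero to avoid dividing by zero.
        return [0.0 for _ in shifts]
    return [(s - lo) / (hi - lo) for s in shifts]


def alphabetize(normalized, n_bins=5):
    """Replace each normalized value with the letter of its sub-region.

    Sub-regions are assumed to be equal-width intervals of [0, 1].
    """
    letters = []
    for v in normalized:
        # A value of exactly 1.0 is placed in the last sub-region.
        idx = min(int(v * n_bins), n_bins - 1)
        letters.append(ascii_uppercase[idx])
    return "".join(letters)


# Hypothetical shift values in ppm, for illustration only.
example_shifts = [170.0, 172.5, 175.0, 177.5, 180.0]
norm = normalize(example_shifts)
symbols = alphabetize(norm, n_bins=5)
```

Turning continuous shifts into a small alphabet gives discrete symbols, which are a natural input for a sequence-labelling model such as the conditional random field named in the abstract.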

The Francis Crick Institute

Performances of NMRDSP, DSP and Frag1D for three classes of different sequence identities.

Author: Longjian Lu (501295)
Peisheng Cong (121213)
Tonghua Li (121211)
Wusong Mao (501294)
Zhiheng Wang (424546)
Zhongliang Zhu (487392)

The Francis Crick Institute

Performances on NS1003 set by using the original NMR CS data, the normalized data and the alphabetized data (5-fold cross validation, %).

Author: Longjian Lu (501295)
Peisheng Cong (121213)
Tonghua Li (121211)
Wusong Mao (501294)
Zhiheng Wang (424546)
Zhongliang Zhu (487392)

The Francis Crick Institute

Performances of leave one feature out validations and using all six features on NS800 (5-fold cross validation, %).

Author: Longjian Lu (501295)
Peisheng Cong (121213)
Tonghua Li (121211)
Wusong Mao (501294)
Zhiheng Wang (424546)
Zhongliang Zhu (487392)

The Francis Crick Institute

Performances of different feature combinations on NS800 (5-fold cross validation, %).

Author: Longjian Lu (501295)
Peisheng Cong (121213)
Tonghua Li (121211)
Wusong Mao (501294)
Zhiheng Wang (424546)
Zhongliang Zhu (487392)

The Francis Crick Institute

Performances of using PSSM, SPSSM, NMR CS and DS_Profile features on NS800 (5-fold cross validation, %).

Author: Longjian Lu (501295)
Peisheng Cong (121213)
Tonghua Li (121211)
Wusong Mao (501294)
Zhiheng Wang (424546)
Zhongliang Zhu (487392)

The Francis Crick Institute

Performances of NS203 independent testing set (%) based on NS800 training set.

Author: Longjian Lu (501295)
Peisheng Cong (121213)
Tonghua Li (121211)
Wusong Mao (501294)
Zhiheng Wang (424546)
Zhongliang Zhu (487392)

The Francis Crick Institute

The flowchart of NMRDSP.

Author: Longjian Lu (501295)
Peisheng Cong (121213)
Tonghua Li (121211)
Wusong Mao (501294)
Zhiheng Wang (424546)
Zhongliang Zhu (487392)

There are four procedures in the flowchart. Normalization and alphabetization are the pre-processing steps applied to the NMR CS data. DSP is used to generate the shape string profiles. The 14 features (six chemical shifts and eight structure-profile elements) are then input to the CRF.
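As a rough illustration of the last step, the sketch below assembles the 14 per-residue features into one record of the kind a CRF toolkit typically accepts. The six shift types come from the article abstract; everything else (the function name `residue_features`, the feature keys, and the example values) is hypothetical, and no particular CRF library is assumed.

```python
CS_ATOMS = ("HA", "H", "N", "CA", "CB", "C")  # six shift types named in the abstract
N_PROFILE = 8  # number of structure-profile elements per residue


def residue_features(cs_symbols, profile):
    """Combine six alphabetized shifts and eight profile values into one dict."""
    if set(cs_symbols) != set(CS_ATOMS):
        raise ValueError("expected exactly one symbol per shift type")
    if len(profile) != N_PROFILE:
        raise ValueError("expected eight profile elements")
    feats = {f"cs_{atom}": cs_symbols[atom] for atom in CS_ATOMS}
    feats.update({f"profile_{i}": float(p) for i, p in enumerate(profile)})
    return feats


# Made-up values for a single residue, for illustration only.
example_cs = {"HA": "B", "H": "C", "N": "A", "CA": "D", "CB": "B", "C": "E"}
example_profile = [0.10, 0.05, 0.40, 0.05, 0.20, 0.10, 0.05, 0.05]
features = residue_features(example_cs, example_profile)
```

A protein would then be represented as a list of such records, one per residue, paired with its shape-string labels for training.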

The Francis Crick Institute