Predicting Secondary Structures, Contact Numbers, and Residue-wise Contact Orders of Native Protein Structure from Amino Acid Sequence by Critical Random Networks
Prediction of one-dimensional protein structures such as secondary structures
and contact numbers is useful for the three-dimensional structure prediction
and important for understanding sequence-structure relationships. Here we
present a new machine-learning method, critical random networks (CRNs), for
predicting one-dimensional structures, and apply it, with position-specific
scoring matrices, to the prediction of secondary structures (SS), contact
numbers (CN), and residue-wise contact orders (RWCO). The present method
achieves, on average, an accuracy of 77.8% for SS and correlation coefficients
of 0.726 and 0.601 for CN and RWCO, respectively. The accuracy of the SS
prediction is comparable to other state-of-the-art methods, and that of the CN
prediction is a significant improvement over previous methods. We give a
detailed formulation of the CRN-based prediction scheme and
examine the context dependence of prediction accuracies. To study
nonlinear and multi-body effects, we compare the CRN-based method with a
purely linear method based on position-specific scoring matrices. Although not
superior to the CRN-based method, the surprisingly good accuracy achieved by
the linear method highlights the difficulty of extracting higher-order structural
features from amino acid sequences beyond those provided by
position-specific scoring matrices.
Comment: 20 pages, 1 figure, 5 tables; minor revision; accepted for
publication in BIOPHYSICS
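The abstract does not define CN or RWCO. The sketch below assumes common definitions: residues i and j are in contact when their representative atoms (for example Cβ) lie closer than a hypothetical 12 Å cutoff and |i − j| ≥ 3; CN_i counts such contacts, and RWCO_i = (1/L) Σ_j |i − j| c_ij for a chain of length L. The hard cutoff, the separation threshold and the 1/L normalization are assumptions, not necessarily the exact definitions used in the paper (a smooth sigmoid contact function is another common choice).

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def contact_map(coords: Sequence[Point], cutoff: float = 12.0, min_sep: int = 3) -> List[List[int]]:
    """c[i][j] = 1 if residues i and j are closer than `cutoff` (angstroms) and at
    least `min_sep` apart in sequence; both thresholds are assumptions."""
    n = len(coords)
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if abs(i - j) >= min_sep and math.dist(coords[i], coords[j]) < cutoff:
                c[i][j] = 1
    return c


def contact_numbers(c: List[List[int]]) -> List[int]:
    """CN_i: number of residues in contact with residue i."""
    return [sum(row) for row in c]


def residue_wise_contact_orders(c: List[List[int]]) -> List[float]:
    """RWCO_i = (1/L) * sum_j |i - j| * c_ij, with L the chain length (assumed normalization)."""
    n = len(c)
    return [sum(abs(i - j) * c[i][j] for j in range(n)) / n for i in range(n)]


# Toy example: a straight chain with 3.8 angstrom spacing between consecutive residues.
chain = [(3.8 * k, 0.0, 0.0) for k in range(6)]
cmap = contact_map(chain)
print(contact_numbers(cmap))              # [1, 1, 1, 1, 1, 1]
print(residue_wise_contact_orders(cmap))  # [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
```

In this toy chain only residues exactly three apart (11.4 Å) fall inside the cutoff, so every residue has CN = 1 and RWCO = 3/6 = 0.5.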
Wang-Landau molecular dynamics technique to search for low-energy conformational space of proteins
Multicanonical molecular dynamics (MD) is a powerful technique for sampling
conformations on rugged potential surfaces such as those of proteins. However, it is
notoriously difficult to estimate the multicanonical temperature effectively.
Wang and Landau developed a convenient method for estimating the density of
states based on a multicanonical Monte Carlo method. In their method, the
density of states is calculated autonomously during a simulation. In this paper
we develop a set of techniques to effectively apply the Wang-Landau method to
MD simulations. In multicanonical MD, estimating the derivative of
the density of states is critical. To estimate it accurately, we
devise two original improvements. First, the correction to the density of
states is smoothed by using the Gaussian distribution obtained from a short
canonical simulation. Second, the derivative is approximated on the basis of
the Gaussian distribution and the multiple weighted histogram
technique. The method was tested on two small polypeptides, Met-enkephalin
and Trp-cage, demonstrating that Wang-Landau MD is consistent with
replica-exchange MD but can sample a much larger conformational space.
Comment: 8 pages, 7 figures, accepted for publication in Physical Review
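For reference, the original Wang-Landau Monte Carlo scheme that the paper adapts to MD can be sketched on a toy model whose density of states is known exactly: n independent two-state units with energy E equal to the number of units in the "up" state, so that g(E) = C(n, E). Moves are accepted with probability min(1, g(E_old)/g(E_new)); after every visit ln g(E) is raised by ln f, and ln f is halved whenever the energy histogram is flat. The 80% flatness criterion, the halving schedule, the batch size and the function name are conventional or hypothetical choices assumed here; the paper's MD-specific improvements (Gaussian smoothing of the correction and the derivative approximation) are not reproduced.

```python
import math
import random
from typing import List


def wang_landau_toy(n_units: int = 10, ln_f_final: float = 1e-4, flatness: float = 0.8,
                    batch: int = 10_000, seed: int = 0) -> List[float]:
    """Wang-Landau estimate of ln g(E) for n independent two-state units, E = number 'up'.

    Returns ln g shifted so that ln g(0) = 0; the exact answer is ln C(n, E).
    """
    rng = random.Random(seed)
    state = [0] * n_units
    energy = 0
    n_levels = n_units + 1
    ln_g = [0.0] * n_levels
    hist = [0] * n_levels
    ln_f = 1.0
    while ln_f > ln_f_final:
        for _ in range(batch):
            k = rng.randrange(n_units)
            new_energy = energy + (1 if state[k] == 0 else -1)
            # Accept with probability min(1, g(E_old) / g(E_new)).
            if rng.random() < math.exp(min(0.0, ln_g[energy] - ln_g[new_energy])):
                state[k] ^= 1
                energy = new_energy
            # Penalize the current energy level and record the visit.
            ln_g[energy] += ln_f
            hist[energy] += 1
        # Flat histogram: every level visited more than `flatness` times the mean count.
        if min(hist) > flatness * sum(hist) / n_levels:
            hist = [0] * n_levels
            ln_f /= 2.0
    return [x - ln_g[0] for x in ln_g]


estimate = wang_landau_toy()
exact = [math.log(math.comb(10, e)) for e in range(11)]
print(max(abs(a - b) for a, b in zip(estimate, exact)))
```

The printed value is the largest deviation from the exact ln C(10, E); it shrinks as ln_f_final is lowered, at the cost of longer runs.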
SPRITE and ASSAM: web servers for side chain 3D-motif searching in protein structures
Similarities in the 3D patterns of amino acid side chains can provide insights into protein function despite the absence of any detectable sequence or fold similarity. Search for protein sites (SPRITE) and amino acid pattern search for substructures and motifs (ASSAM) are graph-theoretical programs that search protein structures for 3D matches of amino acid side chains by representing the side chains as pseudo-atoms. The geometric arrangement of the pseudo-atoms in a pattern can be represented as a labeled graph whose nodes are the pseudo-atoms and whose edges are the inter-pseudo-atom distances. Both programs require input files in PDB format. SPRITE identifies side-chain arrangements in a query structure that match patterns of characterized function. Conversely, ASSAM searches available PDB structures for existing occurrences of a 3D pattern of interest. Both programs are freely accessible without any login requirement. SPRITE is available at http://mfrlab.org/grafss/sprite/ and ASSAM at http://mfrlab.org/grafss/assam/
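The labeled-graph matching criterion can be illustrated with a brute-force sketch, assuming one pseudo-atom per residue given as a (residue label, xyz) pair and a hypothetical distance tolerance of 1.0 Å. A match assigns each query node to a distinct target node with the same label such that every pairwise distance (edge label) agrees within the tolerance. This backtracking search only illustrates the criterion; it is not the algorithm implemented in SPRITE or ASSAM, and the function name and data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

PseudoAtom = Tuple[str, Tuple[float, float, float]]  # (residue label, xyz)


def find_matches(query: Sequence[PseudoAtom], target: Sequence[PseudoAtom],
                 tol: float = 1.0) -> List[Tuple[int, ...]]:
    """Every assignment of query nodes to distinct target nodes whose labels agree
    and whose pairwise distances agree within `tol` (hypothetical tolerance)."""
    matches: List[Tuple[int, ...]] = []

    def extend(partial: List[int]) -> None:
        k = len(partial)
        if k == len(query):
            matches.append(tuple(partial))
            return
        q_label, q_xyz = query[k]
        for t, (t_label, t_xyz) in enumerate(target):
            if t in partial or t_label != q_label:
                continue
            # Check the new edges between query node k and the already matched nodes.
            if all(abs(math.dist(q_xyz, query[i][1]) - math.dist(t_xyz, target[p][1])) <= tol
                   for i, p in enumerate(partial)):
                extend(partial + [t])

    extend([])
    return matches


# Hypothetical three-residue pattern and a target containing a translated copy plus decoys.
query = [("SER", (0.0, 0.0, 0.0)), ("HIS", (3.0, 0.0, 0.0)), ("ASP", (3.0, 4.0, 0.0))]
target = [("HIS", (13.0, 0.0, 0.0)), ("GLY", (0.0, 0.0, 0.0)), ("SER", (10.0, 0.0, 0.0)),
          ("ASP", (13.0, 4.0, 0.0)), ("ASP", (30.0, 30.0, 30.0))]
print(find_matches(query, target))  # [(2, 0, 3)]
```

Because only labels and internal distances are compared, the match is independent of where the pattern sits in the target frame, which is the property that lets such searches find similar sites without sequence or fold similarity.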
Unique Interplay between Sugar and Lipid in Determining the Antigenic Potency of Bacterial Antigens for NKT Cells
Structural and biophysical studies reveal the induced-fit mechanism underlying the stringent specificity of invariant natural killer T cells for unique glycolipid antigens from the pathogen Streptococcus pneumoniae
Composite structural motifs of binding sites for delineating biological functions of proteins
Most biological processes are described as a series of interactions between
proteins and other molecules, and interactions are in turn described in terms
of atomic structures. To annotate protein functions as sets of interaction
states at atomic resolution, and thereby to better understand the relation
between protein interactions and biological functions, we conducted exhaustive
all-against-all atomic structure comparisons of all known binding sites for
ligands including small molecules, proteins and nucleic acids, and identified
recurring elementary motifs. By integrating the elementary motifs associated
with each subunit, we defined composite motifs that represent
context-dependent combinations of elementary motifs. We demonstrate that
functional similarity is better inferred from composite-motif similarity
than from the similarity of protein sequences or of individual binding sites.
By integrating the composite motifs associated with each protein function, we
define meta-composite motifs, each of which can be regarded as a time-independent
diagrammatic representation of a biological process. It is shown that
meta-composite motifs provide richer annotations of biological processes than
sequence clusters. The present results serve as a basis for bridging atomic
structures to higher-order biological phenomena by classification and
integration of binding-site structures.
Comment: 34 pages, 7 figures
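The abstract does not specify the data structures behind composite motifs. Purely as an illustration of the hierarchy it describes, one can model an elementary motif as an identifier, a composite motif as the set of elementary motifs found across a subunit's binding sites, a meta-composite motif as the union over all proteins sharing a function, and similarity as the Jaccard index. All of these choices are assumptions: a plain set discards the context dependence that the paper's composite motifs are said to capture, and the motif IDs and function names below are hypothetical.

```python
from typing import FrozenSet, Iterable, Set

Motif = FrozenSet[str]


def composite_motif(sites: Iterable[Iterable[str]]) -> Motif:
    """Hypothetical: a subunit's composite motif as the set of elementary-motif
    IDs found across all of its binding sites."""
    return frozenset(motif for site in sites for motif in site)


def meta_composite_motif(composites: Iterable[Motif]) -> Motif:
    """Hypothetical: merge the composite motifs of all proteins sharing one function."""
    merged: Set[str] = set()
    for c in composites:
        merged |= c
    return frozenset(merged)


def jaccard(a: Motif, b: Motif) -> float:
    """Set overlap in [0, 1], used here as a stand-in similarity score."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


subunit_a = composite_motif([["EM1", "EM2"], ["EM2", "EM3"]])
subunit_b = composite_motif([["EM2", "EM3", "EM4"]])
print(jaccard(subunit_a, subunit_b))  # 0.5
```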
Predicting residue-wise contact orders in proteins by support vector regression
BACKGROUND: The residue-wise contact order (RWCO) describes the sequence separations between a residue of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and can be considered a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides comprehensive and indispensable information for reconstructing the protein three-dimensional structure from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and could give deep insights into protein sequence-structure relationships. RESULTS: We developed a novel approach to predict residue-wise contact order values in proteins from primary amino acid sequences, based on support vector regression (SVR). We explored seven different sequence encoding schemes to examine their effects on prediction performance: local sequence in the form of PSI-BLAST profiles; local sequence plus amino acid composition; local sequence plus molecular weight; local sequence plus secondary structure predicted by PSIPRED; local sequence plus molecular weight and amino acid composition; local sequence plus molecular weight and predicted secondary structure; and local sequence plus molecular weight, amino acid composition and predicted secondary structure. Using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we could predict the RWCO distribution with a Pearson correlation coefficient (CC) of 0.55 between the predicted and observed RWCO values and a root mean square error (RMSE) of 0.82 on a well-defined dataset of 680 protein sequences.
Moreover, by incorporating global features such as molecular weight and amino acid composition, we could further improve the prediction performance, raising the CC to 0.57 and lowering the RMSE to 0.79. In addition, incorporating the secondary structure predicted by PSIPRED was found to significantly improve the prediction performance and yielded the best accuracy, with a CC of 0.60 and an RMSE of 0.78, at least comparable to that of other existing methods. CONCLUSION: The SVR method shows prediction performance competitive with, or at least comparable to, the previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is well suited to estimating the raw value profiles of the samples. The successful application of SVR in this study reinforces the view that support vector regression is a powerful tool for extracting the protein sequence-structure relationship and for estimating protein structural profiles from amino acid sequences.
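The "local sequence" encoding and the two reported metrics can be sketched as follows, assuming the PSI-BLAST profile is an L × 20 list of rows, a hypothetical window of 15 residues (half-width 7) with zero padding beyond the chain ends, and CC and RMSE computed over all residues pooled together. The regressor itself is omitted: an off-the-shelf implementation such as scikit-learn's sklearn.svm.SVR could be fitted on these feature rows, but the kernel and hyperparameters used in the paper are not given here, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def window_features(profile: Sequence[Sequence[float]], half_width: int = 7) -> List[List[float]]:
    """For each residue i, concatenate profile rows i-w .. i+w; rows past the
    chain ends are replaced by zeros (assumed padding scheme)."""
    n = len(profile)
    width = len(profile[0]) if n else 0
    feats: List[List[float]] = []
    for i in range(n):
        row: List[float] = []
        for j in range(i - half_width, i + half_width + 1):
            row.extend(profile[j] if 0 <= j < n else [0.0] * width)
        feats.append(row)
    return feats


def pearson_cc(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Pearson correlation between predicted and observed values (undefined for constant input)."""
    n = len(pred)
    mp, mo = sum(pred) / n, sum(obs) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    return cov / (sp * so)


def rmse(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Root mean square error between predicted and observed values."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred))


# Toy 2-column profile (a real PSSM row has 20 columns).
profile = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(window_features(profile, half_width=1)[0])  # [0.0, 0.0, 1.0, 2.0, 3.0, 4.0]
print(pearson_cc([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]), rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Adding global features such as molecular weight or amino acid composition, as in the other encoding schemes, would amount to appending the same extra values to every feature row of a protein.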
Unveiling exotic magnetic phase diagram of a non-Heisenberg quasicrystal approximant
A magnetic phase diagram of the non-Heisenberg Tsai-type 1/1 Au-Ga-Tb
approximant crystal (AC) has been established across a wide electron-per-atom
(e/a) range via magnetization and powder neutron diffraction measurements. The
diagram revealed exotic ferromagnetic (FM) and antiferromagnetic (AFM) orders
that originate from the unique local spin icosahedron common to icosahedral
quasicrystals (iQCs) and ACs: the noncoplanar whirling AFM order is stabilized
as the ground state at e/a ≤ 1.72, whereas a noncoplanar whirling
FM order was found at a larger e/a of 1.80, with magnetic moments tangential
to the Tb icosahedron in both cases. Moreover, the FM/AFM phase selection rule
was unveiled in terms of the nearest neighbour (J1) and next nearest neighbour
(J2) interactions by numerical calculations on a non-Heisenberg single
icosahedron. The present findings will pave the way for understanding the
intriguing magnetic orders of not only non-Heisenberg FM/AFM ACs but also
non-Heisenberg FM/AFM iQCs, the latter of which are yet to be discovered.
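The abstract describes the J1/J2 analysis only qualitatively. A minimal sketch of a single-icosahedron calculation follows, under loudly stated assumptions: each of the 12 moments is restricted to ±e_i along a local easy axis e_i (one simple way to make the model non-Heisenberg), the energy is E = J1 Σ_nn S_i·S_j + J2 Σ_nnn S_i·S_j with J > 0 antiferromagnetic, and the tangential axes built from a reference vector are hypothetical, not the measured anisotropy of Au-Ga-Tb. The ground state is found by exhaustive search over all 2^12 sign patterns.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]
Bond = Tuple[int, int, float]
PHI = (1.0 + math.sqrt(5.0)) / 2.0


def icosahedron() -> List[Vec]:
    """The 12 vertices (0, ±1, ±PHI) and their cyclic permutations; edge length 2."""
    verts: List[Vec] = []
    for s1 in (1.0, -1.0):
        for s2 in (1.0, -1.0):
            verts += [(0.0, s1, s2 * PHI), (s1, s2 * PHI, 0.0), (s2 * PHI, 0.0, s1)]
    return verts


def neighbour_pairs(verts: Sequence[Vec]) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]:
    """Split vertex pairs into nearest (distance 2) and next-nearest (2 * PHI) neighbours."""
    nn, nnn = [], []
    for i, j in itertools.combinations(range(len(verts)), 2):
        d = math.dist(verts[i], verts[j])
        if math.isclose(d, 2.0):
            nn.append((i, j))
        elif math.isclose(d, 2.0 * PHI):
            nnn.append((i, j))
    return nn, nnn


def tangent_axes(verts: Sequence[Vec], ref: Vec = (1.0, 2.0, 3.0)) -> List[Vec]:
    """Hypothetical easy axes: the part of `ref` perpendicular to each radius, normalized."""
    axes: List[Vec] = []
    for v in verts:
        r = math.sqrt(sum(x * x for x in v))
        proj = sum(a * b for a, b in zip(ref, v)) / r
        t = [a - proj * b / r for a, b in zip(ref, v)]
        n = math.sqrt(sum(x * x for x in t))
        axes.append((t[0] / n, t[1] / n, t[2] / n))
    return axes


def couplings(axes: Sequence[Vec], j1: float, j2: float) -> List[Bond]:
    """Bond list (i, j, J * e_i . e_j) for all nearest and next-nearest pairs."""
    nn, nnn = neighbour_pairs(icosahedron())

    def dot(i: int, j: int) -> float:
        return sum(a * b for a, b in zip(axes[i], axes[j]))

    return [(i, j, j1 * dot(i, j)) for i, j in nn] + [(i, j, j2 * dot(i, j)) for i, j in nnn]


def energy(signs: Sequence[int], bonds: Sequence[Bond]) -> float:
    """E = sum over bonds of J * (sign_i e_i) . (sign_j e_j)."""
    return sum(c * signs[i] * signs[j] for i, j, c in bonds)


def ground_state(axes: Sequence[Vec], j1: float, j2: float) -> Tuple[float, Tuple[int, ...]]:
    """Exhaustive search over all 2**12 sign patterns of the easy-axis moments."""
    bonds = couplings(axes, j1, j2)
    return min((energy(s, bonds), s) for s in itertools.product((1, -1), repeat=len(axes)))


e_gs, signs_gs = ground_state(tangent_axes(icosahedron()), j1=1.0, j2=0.5)
print(e_gs, signs_gs)
```

Scanning the ratio J2/J1 and checking whether the net moment Σ sign_i e_i of the ground state vanishes would give a toy FM/AFM selection map for these assumed axes.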
Nature of protein family signatures: Insights from singular value analysis of position-specific scoring matrices
Position-specific scoring matrices (PSSMs) are useful for detecting weak
homology in protein sequence analysis, and they are thought to contain some
essential signatures of the protein families. In order to elucidate what kind
of ingredients constitute such family-specific signatures, we apply singular
value decomposition to a set of PSSMs and examine the properties of dominant
right and left singular vectors. The first right singular vectors were
correlated with various amino acid indices, including relative mutability, amino
acid composition in the protein interior, hydropathy, or turn propensity, depending
on the protein. A significant correlation between the first left singular vector
and a measure of site conservation was observed. It is shown that the
contribution of the first singular component to the PSSMs acts to disfavor
residues at conserved sites that might otherwise be falsely identified as functionally important. The
second right singular vectors were highly correlated with hydrophobicity
scales, and the corresponding left singular vectors with contact numbers of
protein structures. It is suggested that sequence alignment with a PSSM is
essentially equivalent to threading supplemented with functional information.
The presented method may be used to separate functionally important sites from
structurally important ones, and thus it may be a useful tool for predicting
protein functions.
Comment: 22 pages, 7 figures, 4 tables
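As a minimal sketch of the decomposition described above, assume the PSSM is stored as an L × 20 list of rows (one row per site, one column per amino acid), so that left singular vectors index sites and right singular vectors index amino acids. The first singular triplet can then be obtained by power iteration on MᵀM, and the second by deflating the first and repeating. In practice a library routine such as numpy.linalg.svd returns all components at once; the hand-rolled version only makes the linear algebra explicit, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def first_singular_triplet(m: Matrix, iters: int = 1000,
                           tol: float = 1e-12) -> Tuple[float, List[float], List[float]]:
    """Dominant singular value s, left vector u (one entry per site) and right
    vector v (one entry per amino acid), by power iteration on m^T m."""
    rows, cols = len(m), len(m[0])
    v = [1.0 / math.sqrt(cols)] * cols  # start vector, assumed not orthogonal to the answer
    for _ in range(iters):
        mv = [sum(row[k] * v[k] for k in range(cols)) for row in m]       # m v
        w = [sum(m[i][k] * mv[i] for i in range(rows)) for k in range(cols)]  # m^T (m v)
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    mv = [sum(row[k] * v[k] for k in range(cols)) for row in m]
    s = math.sqrt(sum(x * x for x in mv))
    u = [x / s for x in mv]  # u = m v / s
    return s, u, v


def deflate(m: Matrix, s: float, u: List[float], v: List[float]) -> List[List[float]]:
    """Subtract the rank-one component s * u v^T so the next power iteration finds the second one."""
    return [[m[i][k] - s * u[i] * v[k] for k in range(len(v))] for i in range(len(u))]


toy = [[3.0, 0.0], [0.0, 1.0]]
s1, u1, v1 = first_singular_triplet(toy)
s2, u2, v2 = first_singular_triplet(deflate(toy, s1, u1, v1))
print(round(s1, 6), round(s2, 6))  # 3.0 1.0
```

Correlating u with a per-site conservation score, or v with an amino acid index such as a hydrophobicity scale, then reduces to a Pearson correlation over sites or over the 20 amino acids.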
- …