6 research outputs found

    Prediction of protein structural classes for low-homology sequences based on predicted secondary structure

    Background: Prediction of protein structural classes (α, β, α + β and α/β) from amino acid sequences is of great importance, as it benefits the study of protein function, regulation and interactions. Many methods have been developed for high-homology protein sequences, and their prediction accuracies can reach up to 90%. However, for low-homology sequences whose average pairwise sequence identity lies between 20% and 40%, they perform relatively poorly, often yielding prediction accuracies below 60%. Results: We propose a new method to predict protein structural classes on the basis of features extracted from the predicted secondary structures of proteins rather than directly from their amino acid sequences. It first uses PSIPRED to predict the secondary structure for each protein sequence. Then, the chaos game representation is employed to represent the predicted secondary structure as two time series, from which we generate a comprehensive set of 24 features using recurrence quantification analysis, K-string based information entropy and segment-based analysis. The resulting feature vectors are finally fed into a simple yet powerful Fisher's discriminant algorithm for the prediction of protein structural classes. We tested the proposed method on three low-homology benchmark datasets and achieved overall prediction accuracies of 82.9%, 83.1% and 81.3%, respectively. Comparisons with ten existing methods showed that our method consistently performs better on all the tested datasets, with overall accuracy improvements ranging from 2.3% to 27.5%.
A web server that implements the proposed method is freely available at http://www1.spms.ntu.edu.sg/~chenxin/RKS_PPSC/. Conclusion: The high prediction accuracy achieved by our proposed method is attributed to the design of a comprehensive feature set on the predicted secondary structure sequences, which is capable of characterizing the sequence order information, local interactions of the secondary structural elements, and spatial arrangements of α helices and β strands. Thus, it is a valuable method for predicting protein structural classes, particularly for low-homology amino acid sequences.
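The abstract names K-string based information entropy as one source of features but does not give its exact definition. As a minimal sketch, assuming it means the Shannon entropy of the overlapping length-K substrings of a three-state (H/E/C) predicted secondary structure string, it could look like the following. The function name and the example string are illustrative assumptions, not taken from the paper.

```python
from collections import Counter
from math import log2


def k_string_entropy(ss, k):
    """Shannon entropy (in bits) of the overlapping length-k substrings of ss.

    Assumption: this is one plausible reading of "K-string based information
    entropy"; the paper's exact formula is not given in the abstract.
    """
    if k < 1 or len(ss) < k:
        raise ValueError("k must be between 1 and len(ss)")
    # Count every overlapping window of length k.
    counts = Counter(ss[i:i + k] for i in range(len(ss) - k + 1))
    total = sum(counts.values())
    # Standard Shannon entropy over the observed k-string frequencies.
    return -sum((c / total) * log2(c / total) for c in counts.values())


# Hypothetical three-state string of the kind a predictor such as PSIPRED emits.
predicted_ss = "CCHHHHHHCCEEEEECC"
for k in (1, 2, 3):
    print(k, round(k_string_entropy(predicted_ss, k), 3))
```

A string made of a single repeated state gives an entropy of zero, while a string that uses many different K-strings evenly gives a high value, so the feature summarises how varied the local secondary structure patterns are.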

    Insight of Tp53 Mutations and their effect on Protein in Different Feline and Canine Neoplasms

    Background: Mutations in Tp53, a tumor suppressor gene, may cause dysfunction in growing cells and hinder apoptosis, an alleged cause of tumorigenesis. The gene is involved in conservation of the genome and DNA repair; mutations in it may cause damaged cells to grow continuously. Methods: The types of molecular changes in the Tp53 gene and their effects on the physicochemical and structural properties of the protein in various canine and feline cancers were examined in this study using online bioinformatics tools. Results: Our results indicated that lymphomas and perianal adenocarcinomas (PAC) have the same mutation at c. 104, while mammary tumors and canine transmissible venereal tumor (CTVT) contain different mutations. With respect to changes in the protein, synonymous mutations were observed in granulomas, while certain mutations in squamous cell carcinoma (SCC) and head & neck tumors were detected in Canis familiaris. In Felis catus, the mutant protein was similar to the wild-type protein, with the exception of mutant 5 of mammary tumor, which had a deletion at amino acid position 287. Conclusion: The insight gathered on the p53 mutant proteins in both species aided our understanding of the in vivo fate of the p53 protein and its isoforms and of the effects that morphological changes can have on the fate of cells. Furthermore, isolation of this protein may augment our understanding of the structural biology of these proteins.

    Accurate Prediction of Protein Structural Class

    Because of the increasing gap between the data from sequencing and structural genomics, the accurate prediction of the structural class of a protein domain solely from the primary sequence has remained a challenging problem in structural biology. Traditional sequence-based predictors generally select several sequence features and then feed them directly into a classification program to identify the structural class. The current best sequence-based predictor achieved an overall accuracy of 74.1% when tested on 25PDB, a widely used non-homologous benchmark dataset. In the present work, we built a multiple linear regression (MLR) model to convert the 440-dimensional (440D) sequence feature vector extracted from the Position Specific Scoring Matrix (PSSM) of a protein domain to a 4-dimensional (4D) structural feature vector, which could then be used to predict the four major structural classes. We performed 10-fold cross-validation and jackknife tests of the method on a large non-homologous dataset containing 8,244 domains distributed among the four major classes. Our approach outperformed all of the existing sequence-based methods, with an overall accuracy of 83.1%, which is even higher than the results of methods based on predicted secondary structure.
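The idea of using multiple linear regression to map a high-dimensional feature vector to a 4D vector can be sketched as below. This is a toy illustration under stated assumptions, not the authors' pipeline: the regression targets are assumed to be one-hot class indicators, the predicted class is assumed to be the largest of the four outputs, six synthetic features stand in for the 440D PSSM vector, and all function names are hypothetical.

```python
import random


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.

    a is an n x n list of lists (assumed non-singular); b is n x m.
    """
    n = len(a)
    aug = [list(ra) + list(rb) for ra, rb in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_mlr(features, labels, n_classes):
    """Least-squares fit of one-hot class targets via the normal equations."""
    x = [[1.0] + list(row) for row in features]  # prepend an intercept term
    d = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(d)] for i in range(d)]
    # X^T Y with one-hot Y: sum feature i over the samples of class k.
    xty = [[sum(r[i] for r, lab in zip(x, labels) if lab == k)
            for k in range(n_classes)] for i in range(d)]
    return solve(xtx, xty)  # (d x n_classes) weight matrix


def project(weights, row):
    """Map one feature vector to its low-dimensional (per-class) vector."""
    x = [1.0] + list(row)
    return [sum(xi * w[k] for xi, w in zip(x, weights))
            for k in range(len(weights[0]))]


def predict(weights, row):
    """Assumed decision rule: pick the class with the largest output."""
    scores = project(weights, row)
    return max(range(len(scores)), key=scores.__getitem__)


# Synthetic data only: class k has mean 4.0 in feature k and 0.0 elsewhere.
random.seed(0)
N_CLASSES, N_FEATURES = 4, 6


def sample(label):
    return [random.gauss(4.0 if j == label else 0.0, 0.5)
            for j in range(N_FEATURES)]


train = [(sample(k), k) for k in range(N_CLASSES) for _ in range(40)]
test = [(sample(k), k) for k in range(N_CLASSES) for _ in range(20)]
weights = fit_mlr([x for x, _ in train], [y for _, y in train], N_CLASSES)
accuracy = sum(predict(weights, x) == y for x, y in test) / len(test)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

The accuracy printed here describes only the synthetic toy data and says nothing about the results reported in the abstract.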

    A series of PDB related databases for everyday needs

    The Protein Data Bank (PDB) is the world-wide repository of macromolecular structure information. We present a series of databases that run parallel to the PDB. Each database holds one entry, if possible, for each PDB entry. DSSP holds the secondary structure of the proteins. PDBREPORT holds reports on the structure quality and lists errors. HSSP holds a multiple sequence alignment for all proteins. The PDBFINDER holds easy-to-parse summaries of the PDB file content, augmented with essentials from the other systems. PDB_REDO holds re-refined, and often improved, copies of all structures solved by X-ray. WHY_NOT summarizes why certain files could not be produced. All these systems are updated weekly. The data sets can be used for the analysis of properties of protein structures in areas ranging from structural genomics to cancer biology and protein design.

    Investigation into the role of sequence-driven-features and amino acid indices for the prediction of structural classes of proteins

    The work undertaken within this thesis is towards the development of a representative set of sequence-driven features for the prediction of structural classes of proteins. Proteins are biological molecules that make living things function; to determine the function of a protein, its structure must be known, because the structure dictates its physical capabilities. A protein is generally classified into one of the four main structural classes, namely all-α, all-β, α + β or α / β, which are based on the arrangements and gross content of the secondary structure elements. Current methods assign structural classes to proteins by manual inspection, which is a slow process. To address this problem, this thesis is concerned with the development of automated prediction of structural classes of proteins and the extraction of a small but robust set of sequence-driven features using the amino acid indices. The first main study undertook a comprehensive analysis of the largest collection of sequence-driven features, which includes an existing set of 1479 descriptor values grouped into ten different feature groups. The results show that composition-based feature groups are the most representative of the four main structural classes, achieving a predictive accuracy of 63.87%. This finding led to the second main study, the development of the generalised amino acid composition (GAAC) method, in which amino acid index values are used to weight the corresponding amino acids. The GAAC method results in a higher accuracy of 68.02%. The third study refined the amino acid indices database, which resulted in the highest accuracy of 75.52%. The main contributions of this thesis are the development of four computationally extracted sequence-driven feature sets based on the underused amino acid indices.
Two of these methods, GAAC and the hybrid method, have shown improvement over traditional sequence-driven features in terms of both feature-set size (smaller and more refined) and classification accuracy. A further contribution is the development of six novel non-redundant sets of the amino acid indices dataset, each of which is more representative than the original database. Finally, two large datasets at 25% and 40% homology were constructed, consisting of over 5000 and 7000 protein samples, respectively. A public web server has been developed at http://www.generalised-protein-sequence-features.com, which allows biologists and bioinformaticians to extract GAAC sequence-driven features from any input protein sequence.
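The abstract describes GAAC only as weighting amino acids by amino acid index values, without giving the formula. A minimal sketch of one plausible reading follows, in which each residue's composition fraction is multiplied by its index value. The Kyte-Doolittle hydropathy scale is used purely as an example index, and the function name and example sequence are hypothetical; none of these choices is confirmed by the thesis.

```python
# Kyte-Doolittle hydropathy scale, used here only as an example amino acid index.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def weighted_composition(sequence, index=KYTE_DOOLITTLE):
    """Return a 20-value feature dict: index value times composition fraction.

    Assumption: this is an illustrative reading of "amino acid index values
    are used to weight corresponding amino acids", not the thesis's formula.
    Residues that the index does not cover (e.g. X) are ignored.
    """
    residues = [aa for aa in sequence.upper() if aa in index]
    if not residues:
        raise ValueError("sequence contains no residues covered by the index")
    total = len(residues)
    return {aa: index[aa] * residues.count(aa) / total for aa in sorted(index)}


# Hypothetical short sequence, for illustration only.
features = weighted_composition("MKTAYIAKQR")
print([round(features[aa], 3) for aa in sorted(features)])
```

Passing a different index dictionary swaps in another physicochemical property while keeping the feature vector at 20 values.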

    Secondary structure-based template selection for fragment-assembly protein structure prediction

    Proteins play critical biochemical roles in all living organisms; in human beings, they are the targets of 50% of all drugs. Although the first protein structure was determined 60 years ago, experimental techniques are still time-consuming and costly. Consequently, in silico protein structure prediction, which is considered a main challenge in computational biology, is fundamental to deciphering the conformations of protein targets. This thesis contributes to the state of the art of fragment-assembly protein structure prediction. This category has been widely and thoroughly studied because it is applicable to any type of target. While the majority of research focuses on enhancing the functions used to score fragments by incorporating new terms and optimising their weights, another important issue is how to pick appropriate fragments from a large pool of candidate structures. Since prediction of the main structural classes, i.e. mainly-alpha, mainly-beta and alpha-beta, has recently reached quite a high level of accuracy, we have introduced a novel approach that reduces the pool of candidate structures to only those proteins that share the structural class the target is likely to adopt. Picking fragments from this customised set of known structures has not only helped generate decoys with a higher level of accuracy but has also eliminated irrelevant parts of the search space, which makes the selection of first models a less complicated process and addresses the inaccuracies of energy functions. In addition to the challenge of adopting a unique template structure for all targets, another arises from relying on the same amount of correction and fine-tuning for every target; such a phase may be damaging to “easy” targets, i.e. those that comprise a relatively significant percentage of alpha helices.
Owing to the sequence-structure correlation on which fragment-based protein structure prediction is founded, we have also proposed a customised correction phase based on the predicted structural class of the target in question. After using secondary structure prediction as a “global feature” of a target, i.e. its structural class, we have also investigated its use as a “local feature” to customise the number of candidate fragments, which is currently the same at all positions. Relying on known facts regarding the diversity of short fragments of helices, sheets and loops, the fragment insertion process has been adjusted to make “changes” relative to the expected complexity of each region. In this thesis we have shown the extent to which secondary structure features can be used implicitly or explicitly to enhance fragment-assembly protein structure prediction.
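The pool-reduction step described above, keeping only candidate structures that share the target's predicted structural class, can be sketched as a simple filter. This is an illustrative sketch, not the thesis's implementation: the `Template` record, the identifiers, and the fallback to the full pool when no template matches are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    """A candidate structure; both fields are hypothetical placeholders."""
    template_id: str
    structural_class: str  # "mainly-alpha", "mainly-beta" or "alpha-beta"


def restrict_pool(pool, predicted_class):
    """Keep only templates whose class matches the target's predicted class.

    Assumption: if nothing matches, fall back to the full pool so that
    fragment picking can still proceed. The source does not describe this.
    """
    selected = [t for t in pool if t.structural_class == predicted_class]
    return selected if selected else list(pool)


# Made-up pool for illustration; these are not real structure identifiers.
pool = [
    Template("tmpl_a1", "mainly-alpha"),
    Template("tmpl_b1", "mainly-beta"),
    Template("tmpl_a2", "mainly-alpha"),
    Template("tmpl_ab1", "alpha-beta"),
]
reduced = restrict_pool(pool, "mainly-alpha")
print([t.template_id for t in reduced])
```

Fragments would then be picked from `reduced` rather than from the whole pool, which is how the search space shrinks before any scoring takes place.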