
    Discrimination of soluble and aggregation-prone proteins based on sequence information

    Understanding the factors that govern protein solubility is key to grasping its underlying mechanisms and may provide insight into protein aggregation and misfolding-related diseases such as Alzheimer’s disease. In this work, we attempt to identify factors important to protein solubility using feature selection. First, we calculate 1438 features, including physicochemical properties and statistics, for each protein. The Random Forest algorithm is then used to select the smallest, most informative subset of features based on predictive performance. A predictive model is built on the 17 selected features. Compared with previous models, our model achieves better performance, with a sensitivity of 0.82, specificity of 0.85, accuracy (ACC) of 0.84, AUC of 0.91 and MCC of 0.67. Furthermore, a model using a redundancy-reduced dataset (sequence identity ≤ 30%) achieves the same performance as the model without redundancy reduction. Our results provide not only a reliable model for predicting protein solubility but also a list of features important to protein solubility. The predictive model is implemented as a freely available web application at http://shark.abl.ku.edu/ProS/
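
The abstract above reports sensitivity, specificity, ACC and MCC. As a reading aid, the sketch below shows how these four metrics follow from a binary confusion matrix using their standard definitions. The function name `binary_metrics` and the counts in the usage lines are hypothetical illustrations, not the paper's data; AUC is omitted because it cannot be derived from a single confusion matrix.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fn: positives predicted correctly/incorrectly,
    tn/fp: negatives predicted correctly/incorrectly.
    """
    sensitivity = tp / (tp + fn)            # true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts for a balanced test set of 100 positives and
# 100 negatives (NOT taken from the paper).
metrics = binary_metrics(tp=82, fn=18, tn=85, fp=15)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

MCC is useful alongside accuracy because it stays informative when the two classes are of unequal size.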

    Diagnostic prediction of complex diseases using phase-only correlation based on virtual sample template

    Motivation: Complex diseases induce perturbations to interaction and regulation networks in living systems, resulting in dynamic equilibrium states that differ among diseases and from the normal state. Identifying the gene expression patterns that correspond to different equilibrium states is therefore of great benefit to the diagnosis and treatment of complex diseases. However, the high dimensionality and small sample size of the complex disease gene expression datasets currently used for discovering such patterns remain a major challenge. Results: Here we present a phase-only correlation (POC) based classification method for recognizing the type of complex disease. First, a virtual sample template is constructed for each subclass by averaging all samples of that subclass in a training dataset. Then the label of a test sample is determined by measuring the similarity between the test sample and each template. This novel method can detect the similarity of overall patterns emerging from the differentially expressed genes or proteins while ignoring small mismatches. Conclusions: The experimental results obtained on seven publicly available complex disease datasets, including microarray and protein array data, demonstrate that the proposed POC-based disease classification method is effective and robust with regard to the number of initially selected features, and that its recognition accuracy is better than or comparable to that of other state-of-the-art machine learning methods. In addition, the proposed method does not require parameter tuning or data scaling, which can effectively reduce the occurrence of over-fitting and bias.
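
The two steps described in this abstract (average each subclass into a virtual sample template, then label a test sample by its similarity to each template) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it treats each sample as a one-dimensional feature vector, uses the peak of the standard phase-only correlation function as the similarity score, and relies on a naive DFT from the standard library. All function names and the toy data are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(n^2)); adequate for short vectors."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))
            for f in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[f] * cmath.exp(2j * math.pi * f * k / n)
                for f in range(n)) / n
            for k in range(n)]


def poc_peak(a, b, eps=1e-12):
    """Peak of the phase-only correlation between two equal-length vectors.

    The cross spectrum is normalized to (near) unit magnitude, so only
    phase information contributes to the score.
    """
    cross = [x * y.conjugate() for x, y in zip(dft(a), dft(b))]
    normalized = [c / max(abs(c), eps) for c in cross]
    return max(v.real for v in idft(normalized))


def class_templates(samples, labels):
    """Average all training samples of each class into one template."""
    groups = {}
    for sample, label in zip(samples, labels):
        groups.setdefault(label, []).append(sample)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}


def classify(sample, templates):
    """Assign the label whose template has the highest POC peak."""
    return max(templates, key=lambda label: poc_peak(sample, templates[label]))


# Toy training data with hypothetical class labels (not from the paper).
train = [
    [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
    [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0],
    [9.0, 1.0, 7.0, 2.0, 6.0, 3.0, 5.0, 4.0],
    [7.0, 1.0, 7.0, 2.0, 6.0, 3.0, 5.0, 4.0],
]
labels = ["A", "A", "B", "B"]
templates = class_templates(train, labels)

# A rescaled copy of template A: phase-only matching ignores overall scale.
test_sample = [3.0 * v for v in templates["A"]]
print(classify(test_sample, templates))
```

Because the score depends only on spectral phase, multiplying a sample by a positive constant does not change it, which is consistent with the abstract's remark that the method needs no data scaling. Note that taking the peak over all shifts also makes the score insensitive to circular shifts of the feature vector; whether the original method restricts the shift is not stated in the abstract.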

    In Silico Classification of Proteins from Acidic and Neutral Cytoplasms

    Protein acidostability is a common problem in biopharmaceutical and other industries. However, it remains a great challenge to engineer proteins for enhanced acidostability because our knowledge of protein acidostabilization is still very limited. In this paper, we present a comparative study of proteins from bacteria with acidic cytoplasms (AP) and neutral cytoplasms (NP) using an integrated statistical and machine learning approach. We construct a set of 393 non-redundant AP-NP ortholog pairs and calculate a total of 889 sequence-based features for these proteins. The pairwise alignments of these ortholog pairs are used to build a residue substitution propensity matrix between APs and NPs. We use the Gini importance provided by the Random Forest algorithm to rank the relative importance of these features. A scoring function using the 10 most significant features is developed and optimized using a hill climbing algorithm. The accuracy of the scoring function is 86.01% in predicting AP-NP ortholog pairs and 76.65% in predicting non-ortholog AP-NP pairs, suggesting that there are significant differences between APs and NPs which can be used to predict the relative acidostability of proteins. The overall trends uncovered in the study can be used as general guidelines for designing acidostable proteins. To the best of our knowledge, this work represents the first systematic comparative study of acidostable proteins and their non-acidostable orthologs.

    Function annotation of hepatic retinoid x receptor α based on genome-wide DNA binding and transcriptome profiling.

    Background: Retinoid x receptor α (RXRα) is abundantly expressed in the liver and is essential for the function of other nuclear receptors. Using chromatin immunoprecipitation sequencing and mRNA profiling data generated from wild type and RXRα-null mouse livers, the current study identifies the bona-fide hepatic RXRα targets and biological pathways. In addition, based on binding and motif analysis, the molecular mechanism by which RXRα regulates hepatic genes is elucidated in a high-throughput manner. Principal findings: Close to 80% of hepatic expressed genes were bound by RXRα, while 16% were expressed in an RXRα-dependent manner. Motif analysis predicted direct repeat with a spacer of one nucleotide as the most prevalent RXRα binding site. Many of the 500 strongest binding motifs overlapped with the binding motif of specific protein 1. Biological functional analysis of RXRα-dependent genes revealed that hepatic RXRα deficiency mainly resulted in up-regulation of steroid and cholesterol biosynthesis-related genes and down-regulation of translation- as well as anti-apoptosis-related genes. Furthermore, RXRα bound to many genes that encode nuclear receptors and their cofactors, suggesting the central role of RXRα in regulating nuclear receptor-mediated pathways. Conclusions: This study establishes the relationship between RXRα DNA binding and hepatic gene expression. RXRα binds extensively to the mouse genome. However, DNA binding does not necessarily affect the basal mRNA level. In addition to metabolism, RXRα dictates the expression of genes that regulate RNA processing, translation, and protein folding, illustrating the novel roles of hepatic RXRα in post-transcriptional regulation.

    Using multitask classification methods to investigate the kinase-specific phosphorylation sites

    Background: Identification of phosphorylation sites by computational methods is becoming increasingly important because it reduces labor-intensive and costly experiments and can improve our understanding of the common properties and underlying mechanisms of protein phosphorylation. Methods: A multitask learning framework for learning four kinase families simultaneously, instead of studying each kinase family of phosphorylation sites separately, is presented in the study. The framework includes two multitask classification methods: the Multi-Task Least Squares Support Vector Machines (MTLS-SVMs) and the Multi-Task Feature Selection (MT-Feat3). Results: Using the multitask learning framework, we successfully identify 18 common features shared by four kinase families of phosphorylation sites. The reliability of the selected features is demonstrated by their consistent performance in the two multitask learning methods. Conclusions: The selected features can be used to build efficient multitask classifiers with good performance, suggesting they are important to protein phosphorylation across four kinase families.

    Identification of Properties Important to Protein Aggregation Using Feature Selection

    Background: Protein aggregation is a significant problem in the biopharmaceutical industry (protein drug stability) and is associated medically with over 40 human diseases. Although a number of computational models have been developed for predicting aggregation propensity and identifying aggregation-prone regions in proteins, little systematic research has been done to determine the physicochemical properties relevant to aggregation and their relative importance to this process. Such studies may not only lead to accurate prediction of peptide aggregation propensities and identification of aggregation-prone regions in proteins, but also aid in discovering additional underlying mechanisms governing this process. Results: We use two feature selection algorithms to identify 16 features, out of a total of 560 physicochemical properties, presumably important to protein aggregation. Two predictors (ProA-SVM and ProA-RF) using the selected features are built for predicting peptide aggregation propensity and identifying aggregation-prone regions in proteins. Both methods compare favourably to other state-of-the-art algorithms in cross validation. The identified important properties are fairly consistent with previous studies and bring some new insights into protein and peptide aggregation. One interesting new finding is that aggregation-prone peptide sequences have similar properties to signal peptide and signal anchor sequences. Conclusions: Both predictors are implemented in a freely available web application (http://www.abl.ku.edu/ProA/). We suggest that the quaternary structure of protein aggregates, especially soluble oligomers, may allow the formation of new molecular recognition signals that guide aggregate targeting to specific cellular sites.

    Function Annotation of Hepatic Retinoid x Receptor α Based on Genome-Wide DNA Binding and Transcriptome Profiling

    Background: Retinoid x receptor α (RXRα) is abundantly expressed in the liver and is essential for the function of other nuclear receptors. Using chromatin immunoprecipitation sequencing and mRNA profiling data generated from wild type and RXRα-null mouse livers, the current study identifies the bona-fide hepatic RXRα targets and biological pathways. In addition, based on binding and motif analysis, the molecular mechanism by which RXRα regulates hepatic genes is elucidated in a high-throughput manner. Principal Findings: Close to 80% of hepatic expressed genes were bound by RXRα, while 16% were expressed in an RXRα-dependent manner. Motif analysis predicted direct repeat with a spacer of one nucleotide as the most prevalent RXRα binding site. Many of the 500 strongest binding motifs overlapped with the binding motif of specific protein 1. Biological functional analysis of RXRα-dependent genes revealed that hepatic RXRα deficiency mainly resulted in up-regulation of steroid and cholesterol biosynthesis-related genes and down-regulation of translation- as well as anti-apoptosis-related genes. Furthermore, RXRα bound to many genes that encode nuclear receptors and their cofactors, suggesting the central role of RXRα in regulating nuclear receptor-mediated pathways. Conclusions: This study establishes the relationship between RXRα DNA binding and hepatic gene expression. RXRα binds extensively to the mouse genome. However, DNA binding does not necessarily affect the basal mRNA level. In addition to metabolism, RXRα dictates the expression of genes that regulate RNA processing, translation, and protein folding, illustrating the novel roles of hepatic RXRα in post-transcriptional regulation. This work was supported by the National Institutes of Health (DK092100 and CA053596 to YYW). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

    A preliminary analysis of the formation of travertine and travertine cones in the Jifei hot spring, Yunnan, China

    The Jifei hot spring emerges in the form of a spring group in the Tibet–Yunnan geothermal zone, southwest of Yunnan Province, China. The spring waters range in temperature from 35 to 81°C and are mainly of HCO3–Na·Ca type. The total discharge of the hot spring is about 10 L/s. The spring is characterized by its huge travertine terrace, with an area of about 4,000 m², and as many as 18 travertine cones of different sizes. The tallest travertine cone is 7.1 m high. The travertine formation and evolution can be divided into three periods: the travertine terrace deposition period, the travertine cone formation period and the death period. The hydrochemical characteristics of the Jifei hot spring were analyzed and compared with those of a local non-travertine hot spring and six other famous travertine springs. The results indicate that the necessary hydrochemical conditions for the deposition of travertine and travertine cones in the Jifei area are (1) a high concentration of HCO3− and CO2; (2) about 52.9% deep-source CO2 with a significantly high PCO2 value; (3) a very high milliequivalent percentage of HCO3− (97.4%) with a not very high milliequivalent percentage of Ca2+ (24.4%); and (4) a large saturation index of calcite and aragonite in the hot water.