1,086 research outputs found

    Modelling the transcriptional regulation of androgen receptor in prostate cancer

    Transcription of genes and production of proteins are essential functions of a normal cell. When disturbed, misregulation of crucial genes leads to aberrant cell behaviour and, in some cases, to the development of diseased states such as cancer. One major mechanism of transcriptional regulation is the binding of transcription factors to enhancer sequences, which encourages or represses transcription depending on the role of the transcription factor. In prostate cells, misregulation of the androgen receptor (AR), a key transcriptional regulator, leads to the development and maintenance of prostate cancer. AR binds to numerous locations in the genome, but it is still unclear which other key transcription factors aid or repress AR-mediated transcription, and how. Here I analyzed data on the transcriptional activity of 4139 putative AR binding sites (ARBS) in the genome, measured with and without hormone using the STARR-seq assay. Only a small fraction of ARBS showed significant differential expression when treated with hormone. To understand the essential factors underlying hormone-dependent behaviour, we developed both machine learning and biophysical models to identify active enhancers in prostate cancer cells. We also identify potentially crucial transcription factors for androgen-dependent behaviour and discuss the benefits and shortcomings of each modelling method.

    APIS: accurate prediction of hot spots in protein interfaces by combining protrusion index with solvent accessibility

    Background: It is well known that most of the binding free energy of protein interaction is contributed by a few key hot spot residues. These residues are crucial for understanding the function of proteins and studying their interactions. Experimental hot spot detection methods such as alanine scanning mutagenesis are not applicable on a large scale since they are time consuming and expensive. Therefore, reliable and efficient computational methods for identifying hot spots are greatly desired and urgently required. Results: In this work, we introduce an efficient approach that uses a support vector machine (SVM) to predict hot spot residues in protein interfaces. We systematically investigate a wide variety of 62 features from a combination of protein sequence and structure information. Then, to remove redundant and irrelevant features and improve the prediction performance, feature selection is employed using the F-score method. Based on the selected features, nine individual-feature based predictors are developed to identify hot spots using SVMs. Furthermore, a new ensemble classifier, namely APIS (A combined model based on Protrusion Index and Solvent accessibility), is developed to further improve the prediction accuracy. The results on two benchmark datasets, ASEdb and BID, show that the proposed method yields significantly better prediction accuracy than methods previously published in the literature. In addition, we demonstrate the predictive power of the proposed method by modelling two protein complexes: the calmodulin/myosin light chain kinase complex and the heat shock locus gene products U and V complex. The results indicate that our method can identify more hot spots in these two complexes than other state-of-the-art methods. Conclusion: We have developed an accurate prediction model for hot spot residues, given the structure of a protein complex. A major contribution of this study is to propose several new features based on the protrusion index of amino acid residues, which are shown to significantly improve the prediction performance for hot spots. Moreover, we identify a compact and useful feature subset that has an important implication for identifying hot spot residues. Our results indicate that these features are more effective than conventional evolutionary conservation, pairwise residue potentials and other traditional features considered previously, and that combining our features with traditional ones may support the creation of a discriminative feature set for efficient prediction of hot spot residues. The data and source code are available at http://home.ustc.edu.cn/~jfxia/hotspot.html.
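The F-score feature selection step named in this abstract can be illustrated with a minimal sketch. It assumes the commonly used definition of the F-score (between-class separation of the feature means divided by the summed within-class sample variances); the abstract does not spell out the exact formula the authors used, and the function name `f_scores` and the toy data are hypothetical.

```python
import numpy as np

def f_scores(X, y):
    """Per-feature F-score for a binary-labelled feature matrix.

    Assumed definition (not confirmed by the abstract):
    ((mean_pos - mean_all)^2 + (mean_neg - mean_all)^2) / (var_pos + var_neg)
    with sample variances. Labels: 1 = hot spot, 0 = non hot spot.
    """
    X = np.asarray(X, dtype=float)
    is_pos = np.asarray(y).astype(bool)
    pos, neg = X[is_pos], X[~is_pos]
    mean_all = X.mean(axis=0)
    between = (pos.mean(axis=0) - mean_all) ** 2 + (neg.mean(axis=0) - mean_all) ** 2
    within = pos.var(axis=0, ddof=1) + neg.var(axis=0, ddof=1)
    scores = np.zeros_like(between)
    # Features with zero within-class variance keep a score of 0.
    np.divide(between, within, out=scores, where=within > 0)
    return scores

# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
X_toy = np.array([[1.0, 5.0], [1.2, 3.0], [0.8, 4.0],
                  [3.0, 4.0], [3.2, 5.0], [2.8, 3.0]])
y_toy = np.array([1, 1, 1, 0, 0, 0])

scores = f_scores(X_toy, y_toy)
ranking = np.argsort(scores)[::-1]  # feature indices, most discriminative first
print(scores, ranking)
```

Features would then be kept in ranked order up to some cutoff before training the SVM; the cutoff itself is a tuning choice that the abstract does not specify.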

    GENOME-WIDE ANALYSIS OF LONG NON-CODING RNA (LNCRNA) OF AUTOIMMUNE THYROID DISEASES USING BIOINFORMATICS APPROACHES

    Objective: Long non-coding RNAs (lncRNAs) have a crucial role in cancer biology. In this study, a genome sequence analysis of lncRNA expression in autoimmune thyroid disease is performed to identify novel targets for further study of the disease. Methods: All data were collected from DisGeNET and the Ensembl genome browser. Gene ontology and network analysis were performed using the standard enrichment annotation method. Associations between lncRNAs and their target mRNAs were analyzed with GeneMANIA. Results: Of the 334 lncRNA transcripts identified, only four had coding potential. The lncRNA transcripts ENST00000462973 and ENST00000555326 were involved in the autoimmune thyroid disease pathway, corresponding to thyroid peroxidase (TPO) and thyroid-stimulating hormone receptor (TSHR), which could provide better insights for therapeutics. Conclusion: Our study of the potential link between lncRNAs and autoimmune thyroid disease presents a novel area for further investigation into the target genes of such lncRNAs, which may lead to therapeutic strategies for the disease. Keywords: lncRNA, Autoimmune thyroid disease, GeneMANIA

    A machine learning approach for the identification of odorant binding proteins from sequence-derived properties

    Background: Odorant binding proteins (OBPs) are believed to shuttle odorants from the environment to the underlying odorant receptors, for which they could potentially serve as odorant presenters. Although several sequence-based search methods have been exploited for protein family prediction, less effort has been devoted to the prediction of OBPs from sequence data, and this area is more challenging due to poor sequence identity between these proteins. Results: In this paper, we propose a new algorithm that uses a Regularized Least Squares Classifier (RLSC) in conjunction with multiple physicochemical properties of amino acids to predict odorant binding proteins. The algorithm was applied to a dataset derived from the Pfam and GenDiS databases, and we obtained an overall prediction accuracy of 97.7% (94.5% and 98.4% for the positive and negative classes respectively). Conclusion: Our study suggests that RLSC is potentially useful for predicting odorant binding proteins from sequence-derived properties irrespective of sequence similarity. Our method predicts 92.8% of 56 odorant binding proteins non-homologous to any protein in the Swiss-Prot database and 97.1% of the 414 independent dataset proteins, suggesting the usefulness of the RLSC method for facilitating the prediction of odorant binding proteins from sequence information.
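For readers unfamiliar with RLSC, the idea can be sketched as ridge regression onto class labels of +1 and -1, followed by taking the sign of the prediction. The sketch below assumes a linear kernel solved in primal form; the abstract does not state the kernel, regularization strength, or feature encoding the authors used, so `rlsc_fit`, `rlsc_predict`, `lam`, and the toy features are hypothetical.

```python
import numpy as np

def rlsc_fit(X, y, lam=1.0):
    """Fit a linear regularized least squares classifier.

    Solves (X^T X + lam * I) w = X^T y with labels y in {+1, -1}.
    A constant column is appended so the last weight acts as a bias
    (regularized together with the other weights, for simplicity).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)

def rlsc_predict(X, w):
    """Return +1 or -1 according to the sign of the linear score."""
    X = np.asarray(X, dtype=float)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.where(Xb @ w >= 0, 1, -1)

# Hypothetical two-feature toy set standing in for physicochemical descriptors.
X_toy = np.array([[2.0, 2.0], [3.0, 1.0], [2.0, 3.0],
                  [-2.0, -2.0], [-3.0, -1.0], [-2.0, -3.0]])
y_toy = np.array([1, 1, 1, -1, -1, -1])

w = rlsc_fit(X_toy, y_toy, lam=1.0)
print(rlsc_predict(X_toy, w))
```

Because training reduces to one linear solve, RLSC is cheap to fit compared with an SVM; a kernel variant would solve for one coefficient per training example instead of one weight per feature.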

    11th German Conference on Chemoinformatics (GCC 2015) : Fulda, Germany. 8-10 November 2015.


    Prediction of G Protein-Coupled Receptors with SVM-Prot Features and Random Forest
