1,207 research outputs found

    Unraveling the transcriptional Cis-regulatory code

    Get PDF
    It is nowadays accepted that eukaryotic complexity is not dictated by the number of protein-coding genes of the genome, but rather achieved through the combinatorics of gene expression programs. Distinct aspects of the expression pattern of a gene are mediated by discrete regulatory sequences, known as cis-regulatory elements. The work described in this thesis was aimed at developing computational and statistical methods to guide the search and characterization of novel cis-regulatory elements

    Using multiple classifiers for predicting the risk of endovascular aortic aneurysm repair re-intervention through hybrid feature selection.

    Get PDF
    Feature selection is essential in medical area; however, its process becomes complicated with the presence of censoring which is the unique character of survival analysis. Most survival feature selection methods are based on Cox's proportional hazard model, though machine learning classifiers are preferred. They are less employed in survival analysis due to censoring which prevents them from directly being used to survival data. Among the few work that employed machine learning classifiers, partial logistic artificial neural network with auto-relevance determination is a well-known method that deals with censoring and perform feature selection for survival data. However, it depends on data replication to handle censoring which leads to unbalanced and biased prediction results especially in highly censored data. Other methods cannot deal with high censoring. Therefore, in this article, a new hybrid feature selection method is proposed which presents a solution to high level censoring. It combines support vector machine, neural network, and K-nearest neighbor classifiers using simple majority voting and a new weighted majority voting method based on survival metric to construct a multiple classifier system. The new hybrid feature selection process uses multiple classifier system as a wrapper method and merges it with iterated feature ranking filter method to further reduce features. Two endovascular aortic repair datasets containing 91% censored patients collected from two centers were used to construct a multicenter study to evaluate the performance of the proposed approach. The results showed the proposed technique outperformed individual classifiers and variable selection methods based on Cox's model such as Akaike and Bayesian information criterions and least absolute shrinkage and selector operator in p values of the log-rank test, sensitivity, and concordance index. This indicates that the proposed classifier is more powerful in correctly predicting the risk of re-intervention enabling doctor in selecting patients' future follow-up plan

    Supervised classification and mathematical optimization

    Get PDF
    Data Mining techniques often ask for the resolution of optimization problems. Supervised Classification, and, in particular, Support Vector Machines, can be seen as a paradigmatic instance. In this paper, some links between Mathematical Optimization methods and Supervised Classification are emphasized. It is shown that many different areas of Mathematical Optimization play a central role in off-the-shelf Supervised Classification methods. Moreover, Mathematical Optimization turns out to be extremely useful to address important issues in Classification, such as identifying relevant variables, improving the interpretability of classifiers or dealing with vagueness/noise in the data.Ministerio de Ciencia e InnovaciĆ³nJunta de AndalucĆ­

    Prediction of amphipathic in-plane membrane anchors in monotopic proteins using a SVM classifier

    Get PDF
    BACKGROUND: Membrane proteins are estimated to represent about 25% of open reading frames in fully sequenced genomes. However, the experimental study of proteins remains difficult. Considerable efforts have thus been made to develop prediction methods. Most of these were conceived to detect transmembrane helices in polytopic proteins. Alternatively, a membrane protein can be monotopic and anchored via an amphipathic helix inserted in a parallel way to the membrane interface, so-called in-plane membrane (IPM) anchors. This type of membrane anchor is still poorly understood and no suitable prediction method is currently available. RESULTS: We report here the "AmphipaSeeK" method developed to predict IPM anchors. It uses a set of 21 reported examples of IPM anchored proteins. The method is based on a pattern recognition Support Vector Machine with a dedicated kernel. CONCLUSION: AmphipaSeeK was shown to be highly specific, in contrast with classically used methods (e.g. hydrophobic moment). Additionally, it has been able to retrieve IPM anchors in naively tested sets of transmembrane proteins (e.g. PagP). AmphipaSeek and the list of the 21 IPM anchored proteins is available on NPS@, our protein sequence analysis server

    Learning the Regulatory Code of Gene Expression

    Get PDF
    Data-driven machine learning is the method of choice for predicting molecular phenotypes from nucleotide sequence, modeling gene expression events including protein-DNA binding, chromatin states as well as mRNA and protein levels. Deep neural networks automatically learn informative sequence representations and interpreting them enables us to improve our understanding of the regulatory code governing gene expression. Here, we review the latest developments that apply shallow or deep learning to quantify molecular phenotypes and decode the cis-regulatory grammar from prokaryotic and eukaryotic sequencing data. Our approach is to build from the ground up, first focusing on the initiating protein-DNA interactions, then specific coding and non-coding regions, and finally on advances that combine multiple parts of the gene and mRNA regulatory structures, achieving unprecedented performance. We thus provide a quantitative view of gene expression regulation from nucleotide sequence, concluding with an information-centric overview of the central dogma of molecular biology

    Residue propensities, discrimination and binding site prediction of adenine and guanine phosphates

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Adenine and guanine phosphates are involved in a number of biological processes such as cell signaling, metabolism and enzymatic cofactor functions. Binding sites in proteins for these ligands are often detected by looking for a previously known motif by alignment based search. This is likely to miss those where a similar binding site has not been previously characterized and when the binding sites do not follow the rule described by predefined motif. Also, it is intriguing how proteins select between adenine and guanine derivative with high specificity.</p> <p>Results</p> <p>Residue preferences for AMP, GMP, ADP, GDP, ATP and GTP have been investigated in details with additional comparison with cyclic variants cAMP and cGMP. We also attempt to predict residues interacting with these nucleotides using information derived from local sequence and evolutionary profiles. Results indicate that subtle differences exist between single residue preferences for specific nucleotides and taking neighbor environment and evolutionary context into account, successful models of their binding site prediction can be developed.</p> <p>Conclusion</p> <p>In this work, we explore how single amino acid propensities for these nucleotides play a role in the affinity and specificity of this set of nucleotides. This is expected to be helpful in identifying novel binding sites for adenine and guanine phosphates, especially when a known binding motif is not detectable.</p

    Machine learning for regulatory analysis and transcription factor target prediction in yeast

    Get PDF
    High throughput technologies, including array-based chromatin immunoprecipitation, have rapidly increased our knowledge of transcriptional mapsā€”the identity and location of regulatory binding sites within genomes. Still, the full identification of sites, even in lower eukaryotes, remains largely incomplete. In this paper we develop a supervised learning approach to site identification using support vector machines (SVMs) to combine 26 different data types. A comparison with the standard approach to site identification using position specific scoring matrices (PSSMs) for a set of 104 Saccharomyces cerevisiae regulators indicates that our SVM-based target classification is more sensitive (73 vs. 20%) when specificity and positive predictive value are the same. We have applied our SVM classifier for each transcriptional regulator to all promoters in the yeast genome to obtain thousands of new targets, which are currently being analyzed and refined to limit the risk of classifier over-fitting. For the purpose of illustration we discuss several results, including biochemical pathway predictions for Gcn4 and Rap1. For both transcription factors SVM predictions match well with the known biology of control mechanisms, and possible new roles for these factors are suggested, such as a function for Rap1 in regulating fermentative growth. We also examine the promoter melting temperature curves for the targets of YJR060W, and show that targets of this TF have potentially unique physical properties which distinguish them from other genes. The SVM output automatically provides the means to rank dataset features to identify important biological elements. We use this property to rank classifying k-mers, thereby reconstructing known binding sites for several TFs, and to rank expression experiments, determining the conditions under which Fhl1, the factor responsible for expression of ribosomal protein genes, is active. We can see that targets of Fhl1 are differentially expressed in the chosen conditions as compared to the expression of average and negative set genes. SVM-based classifiers provide a robust framework for analysis of regulatory networks. Processing of classifier outputs can provide high quality predictions and biological insight into functions of particular transcription factors. Future work on this method will focus on increasing the accuracy and quality of predictions using feature reduction and clustering strategies. Since predictions have been made on only 104 TFs in yeast, new classifiers will be built for the remaining 100 factors which have available binding data

    Computational and Experimental Approaches to Reveal the Effects of Single Nucleotide Polymorphisms with Respect to Disease Diagnostics

    Get PDF
    DNA mutations are the cause of many human diseases and they are the reason for natural differences among individuals by affecting the structure, function, interactions, and other properties of DNA and expressed proteins. The ability to predict whether a given mutation is disease-causing or harmless is of great importance for the early detection of patients with a high risk of developing a particular disease and would pave the way for personalized medicine and diagnostics. Here we review existing methods and techniques to study and predict the effects of DNA mutations from three different perspectives: in silico, in vitro and in vivo. It is emphasized that the problem is complicated and successful detection of a pathogenic mutation frequently requires a combination of several methods and a knowledge of the biological phenomena associated with the corresponding macromolecules

    In silico comparative genomic analysis of GABAA receptor transcriptional regulation

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Subtypes of the GABA<sub>A </sub>receptor subunit exhibit diverse temporal and spatial expression patterns. <it>In silico </it>comparative analysis was used to predict transcriptional regulatory features in individual mammalian GABA<sub>A </sub>receptor subunit genes, and to identify potential transcriptional regulatory components involved in the coordinate regulation of the GABA<sub>A </sub>receptor gene clusters.</p> <p>Results</p> <p>Previously unreported putative promoters were identified for the Ī²2, Ī³1, Ī³3, Īµ, Īø and Ļ€ subunit genes. Putative core elements and proximal transcriptional factors were identified within these predicted promoters, and within the experimentally determined promoters of other subunit genes. Conserved intergenic regions of sequence in the mammalian GABA<sub>A </sub>receptor gene cluster comprising the Ī±1, Ī²2, Ī³2 and Ī±6 subunits were identified as potential long range transcriptional regulatory components involved in the coordinate regulation of these genes. A region of predicted DNase I hypersensitive sites within the cluster may contain transcriptional regulatory features coordinating gene expression. A novel model is proposed for the coordinate control of the gene cluster and parallel expression of the Ī±1 and Ī²2 subunits, based upon the selective action of putative Scaffold/Matrix Attachment Regions (S/MARs).</p> <p>Conclusion</p> <p>The putative regulatory features identified by genomic analysis of GABA<sub>A </sub>receptor genes were substantiated by cross-species comparative analysis and now require experimental verification. The proposed model for the coordinate regulation of genes in the cluster accounts for the head-to-head orientation and parallel expression of the Ī±1 and Ī²2 subunit genes, and for the disruption of transcription caused by insertion of a neomycin gene in the close vicinity of the Ī±6 gene, which is proximal to a putative critical S/MAR.</p

    Residue Propensities, Discrimination and Binding Site Prediction of Adenine and Guanine Phosphates

    Get PDF
    Background: Adenine and guanine phosphates are involved in a number of biological processes such as cell signaling, metabolism and enzymatic cofactor functions. Binding sites in proteins for these ligands are often detected by looking for a previously known motif by alignment based search. This is likely to miss those where a similar binding site has not been previously characterized and when the binding sites do not follow the rule described by predefined motif. Also, it is intriguing how proteins select between adenine and guanine derivative with high specificity. Results: Residue preferences for AMP, GMP, ADP, GDP, ATP and GTP have been investigated in details with additional comparison with cyclic variants cAMP and cGMP. We also attempt to predict residues interacting with these nucleotides using information derived from local sequence and evolutionary profiles. Results indicate that subtle differences exist between single residue preferences for specific nucleotides and taking neighbor environment and evolutionary context into account, successful models of their binding site prediction can be developed. Conclusion: In this work, we explore how single amino acid propensities for these nucleotides play a role in the affinity and specificity of this set of nucleotides. This is expected to be helpful in identifying novel binding sites for adenine and guanine phosphates, especially when a known binding motif is not detectable
    • ā€¦
    corecore