
    Understanding Transcriptional Regulation Using De-novo Sequence Motif Discovery, Network Inference and Interactome Data

    Gene regulation is a complex process in which several genomic elements work in concert to drive spatio-temporal expression. The experimental characterization of gene regulatory elements is complex and resource-intensive. One of the major goals in computational biology is the in silico annotation of previously uncharacterized elements using results from the subset of known, previously annotated regulatory elements. The recent results of the ENCODE project (http://encode.nih.gov) presented an in-depth analysis of such functional (regulatory) non-coding elements for 1% of the human genome. It is hoped that the results obtained on this subset can be scaled to the rest of the genome. This is an extremely important effort which will enable faster dissection of other functional elements in key biological processes such as disease progression and organ development (\cite{Kleinjan2005}, \cite{Lieb2006}). The computational annotation of these hitherto uncharacterized regions requires identifying features that have good predictive value. In this work, we study transcriptional regulation as a problem in heterogeneous data integration across sequence, expression and interactome-level attributes. Using the Gata2 gene and its recently discovered urogenital enhancers \cite{Khandekar2004} as a case study, we examine the predictive value of various high-throughput functional genomic assays (from projects like ENCODE and SymAtlas) in characterizing these enhancers and their regulatory role. Based on the results of applying modern statistical learning methodologies to each of these data modalities, we propose a set of features that best discriminate these enhancers. Comment: 25 pages, 9 fig

    An intuitionistic approach to scoring DNA sequences against transcription factor binding site motifs

    Background: Transcription factors (TFs) control transcription by binding to specific regions of DNA called transcription factor binding sites (TFBSs). The identification of TFBSs is a crucial problem in computational biology and includes the subtask of predicting the location of known TFBS motifs in a given DNA sequence. It has previously been shown that, when scoring matches to known TFBS motifs, interdependencies between positions within a motif should be taken into account. However, this remains a challenging task because sequences similar to those of known TFBSs can occur by chance with a relatively high frequency. Here we present a new method for matching sequences to TFBS motifs based on intuitionistic fuzzy sets (IFS) theory, an approach that has been shown to be particularly appropriate for tackling problems that embody a high degree of uncertainty. Results: We propose SCintuit, a new scoring method for measuring sequence-motif affinity based on IFS theory. Unlike existing methods that consider dependencies between positions, SCintuit is designed to prevent overestimation of less conserved positions of TFBSs. For a given pair of bases, SCintuit is computed not only as a function of their combined probability of occurrence, but also taking into account the individual importance of each single base at its corresponding position. We used SCintuit to identify known TFBSs in DNA sequences. Our method provides excellent results on both synthetic and real data, outperforming two existing methods in sensitivity and specificity in all the experiments we performed. Conclusions: The results show that SCintuit improves on the prediction quality of existing approaches for TFs without compromising sensitivity. In addition, we show how SCintuit can be successfully applied to real research problems. In this study the reliability of IFS theory for motif discovery tasks is proven.
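
    For orientation, the conventional baseline that dependency-aware scores such as SCintuit aim to improve on can be sketched as a position-independent log-odds scan. This is not the SCintuit score, which the abstract does not specify in enough detail to reproduce. The motif probabilities, the uniform background, and the names `MOTIF`, `log_odds`, `scan` and `best_hit` are all illustrative assumptions.

```python
import math

# Hypothetical 4-position motif resembling "GATA": per-position base
# probabilities. The values are illustrative, not taken from any database.
MOTIF = [
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
]
# Assumed uniform background base frequencies.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_odds(window, motif=MOTIF, background=BACKGROUND):
    """Sum of per-position log2(motif probability / background probability).

    Each position is scored independently, so dependencies between
    positions are ignored by construction.
    """
    return sum(math.log2(col[base] / background[base])
               for col, base in zip(motif, window))


def scan(sequence, motif=MOTIF, background=BACKGROUND):
    """Score every window of motif length; return (start, score) pairs."""
    width = len(motif)
    return [(i, log_odds(sequence[i:i + width], motif, background))
            for i in range(len(sequence) - width + 1)]


def best_hit(sequence, motif=MOTIF, background=BACKGROUND):
    """Return the (start, score) pair with the highest log-odds score."""
    return max(scan(sequence, motif, background), key=lambda hit: hit[1])
```

    Calling `best_hit("CCGATACC")` returns the window starting at index 2, where the sequence reads GATA. Because every column contributes independently, a weakly conserved position can still add to the score of a chance match, which is the kind of overestimation the abstract says SCintuit is designed to prevent.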

    Nucleosome positioning and energetics: Recent advances in genomic and computational studies

    Chromatin is a complex of DNA, RNA and proteins whose primary function is to package genomic DNA into the tight confines of a cell nucleus. A fundamental repeating unit of chromatin is the nucleosome, an octamer of histone proteins around which 147 base pairs of DNA are wound in almost two turns of a left-handed superhelix. Chromatin is a dynamic structure which exerts profound influence on regulation of gene expression and other cellular functions. These chromatin-directed processes are facilitated by optimizing nucleosome positions throughout the genome and by remodeling nucleosomes in response to various external and internal signals such as environmental perturbations. Here we discuss large-scale maps of nucleosome positions made available through recent advances in parallel high-throughput sequencing and microarray technologies. We show that these maps reveal common features of nucleosome organization in eukaryotic genomes. We also survey computational models designed to predict nucleosome formation scores or energies, and demonstrate how these predictions can be used to position multiple nucleosomes on the genome without steric overlap. Comment: 41 pages, 11 figure
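
    One simple way to turn per-position formation scores into a set of non-overlapping nucleosomes is a dynamic program over start positions. The sketch below is an illustration under stated assumptions, not the method surveyed in the paper: it assumes that `scores[i]` is the score of a nucleosome starting at position `i`, that a higher score is more favorable (energies would be negated first), and that the goal is to maximize the total score. The function name `place_nucleosomes` is hypothetical.

```python
def place_nucleosomes(scores, footprint=147):
    """Choose non-overlapping start positions maximizing the total score.

    scores[i] is the (assumed) formation score of a nucleosome that starts
    at position i and covers positions i .. i + footprint - 1.
    Returns (sorted list of chosen starts, total score).
    """
    n = len(scores)
    # best[i] = best achievable total using start positions i .. n-1.
    best = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        skip = best[i + 1]
        # Taking position i blocks the next footprint - 1 start positions.
        take = scores[i] + best[min(i + footprint, n)]
        best[i] = max(skip, take)

    # Trace back through the table to recover the chosen positions.
    starts = []
    i = 0
    while i < n:
        if best[i] == best[i + 1]:
            i += 1                  # skipping position i loses nothing
        else:
            starts.append(i)        # position i is part of the optimum
            i += footprint
    return starts, best[0]
```

    With a toy footprint of 3, `place_nucleosomes([1.0, 5.0, 1.0, 1.0, 4.0, 1.0], footprint=3)` selects starts 1 and 4 for a total of 9.0. A hard maximization like this yields a single configuration; models that report occupancy profiles typically average over configurations instead, which this sketch does not attempt.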

    Electrostatic map of T7 DNA. Comparative analysis of functional and electrostatic properties of T7 RNA polymerase specific promoters

    The entire T7 bacteriophage genome contains 39937 base pairs (NCBI RefSeq database, NC_001604). Here, the electrostatic potential distribution around double-helical T7 DNA was calculated by the Coulomb method using the computer program of Sorokin A.A. Electrostatic profiles of 17 promoters recognized by T7 phage-specific RNA polymerase were analyzed. It was shown that the electrostatic profiles of all T7 RNA polymerase-specific promoters can be characterized by distinctive motifs which are specific for each promoter class. Comparative analysis of the electrostatic profiles of native T7 promoters of different classes demonstrates that T7 RNA polymerase can differentiate them due to their electrostatic features. Comment: This is an Author's Original Manuscript of an article submitted for consideration in the Journal of Biomolecular Structure & Dynamic

    Retrotransposon Tto1: functional analysis and engineering for insertional mutagenesis

    Retrotransposons are genomic parasites activated by stress conditions that can be seriously detrimental for their host. In this work I demonstrate that Tto1, a typical plant LTR retrotransposon with insertion preference into genes, can be turned into a synthetic molecular tool for gene tagging in plants and can be used to predict models for its replication steps. Although retrotransposons have already been used in plant mutagenesis, such applications always required establishing protocols for tissue culture and regeneration in vitro. Here, I show that sequence engineering of Tto1 makes it possible to obtain transposition in vivo, with a simple PCR-based screening method and with the advantage of skipping all in vitro manipulations. An artificial β-estradiol inducible promoter has been used to obtain transposition “on demand” in Arabidopsis plants, which generates stable unlinked insertions that follow Mendelian segregation in the progeny. Comparing serial deletions of the 3’ LTR of the engineered inducible Tto1 (iTto1), I have mapped its two natural terminators and identified the “minimal” R (redundant) region required to achieve the complete reverse transcription of the genomic mRNA into a new cDNA copy. Interestingly, the transcripts ending at the major “early” terminator cannot support reverse transcription, suggesting a mechanism of natural control on the expression. Transcripts with a more extended termination point contain 100 essential nucleotides that define the active nucleus of the R region. This sequence promotes the formation of a stable hairpin structure that “kisses” a complementary identical hairpin on the cDNA and determines the formation of the characteristic cDNA/mRNA heteroduplex. Since the LTR is a repeated sequence, defining a minimal redundant region also has the important implication of reducing the only possible target for sequence-based gene silencing, which should lead to an increase in the mutagenic efficiency of iTto1.
Additional investigations have been carried out in an attempt to identify ways to improve iTto1 performance. By sequence alignment I identified different versions of the integrase that might influence insertion efficiency. Furthermore, I tested the pOp6/LhGR-N system, which will provide higher expression levels in different host plants. The final goal of my work is to extend the application of iTto1 to crop mutagenesis; therefore, a large part of my work has been devoted to developing Tto1 constructs with activity in barley. Transgenic plants have been obtained; however, the constructs still need further experimentation.

    Attend and Predict: Understanding Gene Regulation by Selective Attention on Chromatin

    The past decade has seen a revolution in genomic technologies that enable a flood of genome-wide profiling of chromatin marks. Recent literature has tried to understand gene regulation by predicting gene expression from large-scale chromatin measurements. Two fundamental challenges exist for such learning tasks: (1) genome-wide chromatin signals are spatially structured, high-dimensional and highly modular; and (2) the core aim is to understand which factors are relevant and how they work together. Previous studies either failed to model complex dependencies among input signals or relied on separate feature analysis to explain the decisions. This paper presents an attention-based deep learning approach, which we call AttentiveChrome, that uses a unified architecture to model and to interpret dependencies among chromatin factors for controlling gene regulation. AttentiveChrome uses a hierarchy of multiple long short-term memory (LSTM) modules to encode the input signals and to model how various chromatin marks cooperate automatically. AttentiveChrome trains two levels of attention jointly with the target prediction, enabling it to attend differentially to relevant marks and to locate important positions per mark. We evaluate the model across 56 different cell types (tasks) in human. Not only is the proposed architecture more accurate, but its attention scores also provide a better interpretation than state-of-the-art feature visualization methods such as saliency maps. Code and data are shared at www.deepchrome.org Comment: 12 pages; At NIPS 201
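
    The core operation behind an attention layer can be sketched as a softmax-weighted sum of encoded inputs. This is a generic soft-attention illustration in plain Python, not the AttentiveChrome implementation: in the paper the vectors would come from LSTM encoders and the context vector would be learned jointly with the prediction, whereas here both are supplied by hand. The names `softmax` and `attend` are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax: non-negative weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(vectors, context):
    """Soft attention over a list of equal-length vectors.

    Each vector is scored by its dot product with `context` (assumed to be
    a learned parameter in a real model), the scores are normalized with
    softmax, and the vectors are averaged using those weights.
    Returns (weights, weighted summary vector).
    """
    scores = [sum(v_d * c_d for v_d, c_d in zip(vec, context))
              for vec in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    summary = [sum(w * vec[d] for w, vec in zip(weights, vectors))
               for d in range(dim)]
    return weights, summary
```

    A two-level arrangement like the one the abstract describes would apply `attend` once over the positions within each mark, then again over the resulting per-mark summaries. The returned weights are what make the model inspectable, since they indicate which positions or marks dominated the summary.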

    Evolving methods for rational de novo design of functional RNA molecules

    Artificial RNA molecules with novel functionality have many applications in synthetic biology, pharmacy and white biotechnology. The de novo design of such devices using computational methods and prediction tools is a resource-efficient alternative to experimental screening and selection pipelines. In this review, we describe methods common to many such computational approaches, thoroughly dissect these methods and highlight open questions for the individual steps. Initially, it is essential to investigate the biological target system, the regulatory mechanism that will be exploited, and the desired components in order to define design objectives. Subsequent computational design is needed to combine the selected components and to obtain novel functionality. This process can usually be split into constrained sequence sampling, the formulation of an optimization problem, and an in silico analysis to narrow down the number of candidates with respect to secondary goals. Finally, experimental analysis is important to check whether the defined design objectives are indeed met in the target environment, and detailed characterization experiments should be performed to improve the mechanistic models and detect missing design requirements. Comment: Published at METHODS, Issue title: Chemical Biology of RNA, Guest Editor: Michael Ryckelync