17,215 research outputs found

Understanding Transcriptional Regulation Using De-novo Sequence Motif Discovery, Network Inference and Interactome Data

Author: Engel James Douglas
Hero Alfred O.
Rao Arvind
States David J.
Publication date: 09/10/2007

Gene regulation is a complex process involving the role of several genomic elements which work in concert to drive spatio-temporal expression. The experimental characterization of gene regulatory elements is a very complex and resource-intensive process. One of the major goals in computational biology is the in silico annotation of previously uncharacterized elements using results from the subset of known, previously annotated, regulatory elements. The recent results of the ENCODE project (http://encode.nih.gov) presented in-depth analysis of such functional (regulatory) non-coding elements for 1% of the human genome. It is hoped that the results obtained on this subset can be scaled to the rest of the genome. This is an extremely important effort which will enable faster dissection of other functional elements in key biological processes such as disease progression and organ development (Kleinjan2005, Lieb2006). The computational annotation of these hitherto uncharacterized regions would require an identification of features that have good predictive value. In this work, we study transcriptional regulation as a problem in heterogeneous data integration, across sequence, expression and interactome level attributes. Using the example of the Gata2 gene and its recently discovered urogenital enhancers (Khandekar2004) as a case study, we examine the predictive value of various high throughput functional genomic assays (from projects like ENCODE and SymAtlas) in characterizing these enhancers and their regulatory role. Observing results from the application of modern statistical learning methodologies for each of these data modalities, we propose a set of features that are most discriminatory to find these enhancers. Comment: 25 pages, 9 figures

arXiv.org e-Print Archive

An intuitionistic approach to scoring DNA sequences against transcription factor binding site motifs

Author: A Sandelin
A Sharov
A Tomovic
Adrian J Shepherd
Armando Blanco
C Lawrence
D Denning
E Baker
E Szmidt
E Wingender
F Garcia
F Lam
F Lopez
F Offner
F Zare-Mirakabad
Fernando Garcia-Alcalde
G Chamilos
G Diop
G Hertz
J Hanley
J Hughes
J Sainz
J Van Helden
J Zhao
K Atanassov
K Won
L Liang
L Zadeh
M Bulyk
M Das
M Eisen
N Dror
N Kim
P Benos
P Bochud
P Schling
R Gordan
S De
T Bailey
T Fawcett
T Hehlgans
T Tamura
V Khatibi
W Hung
W Wasserman
X Chen
Y Haudry
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

Background: Transcription factors (TFs) control transcription by binding to specific regions of DNA called transcription factor binding sites (TFBSs). The identification of TFBSs is a crucial problem in computational biology and includes the subtask of predicting the location of known TFBS motifs in a given DNA sequence. It has previously been shown that, when scoring matches to known TFBS motifs, interdependencies between positions within a motif should be taken into account. However, this remains a challenging task because sequences similar to those of known TFBSs can occur by chance with a relatively high frequency. Here we present a new method for matching sequences to TFBS motifs based on intuitionistic fuzzy sets (IFS) theory, an approach that has been shown to be particularly appropriate for tackling problems that embody a high degree of uncertainty. Results: We propose SCintuit, a new scoring method for measuring sequence-motif affinity based on IFS theory. Unlike existing methods that consider dependencies between positions, SCintuit is designed to prevent overestimation of less conserved positions of TFBSs. For a given pair of bases, SCintuit is computed not only as a function of their combined probability of occurrence, but also taking into account the individual importance of each single base at its corresponding position. We used SCintuit to identify known TFBSs in DNA sequences. Our method provides excellent results when dealing with both synthetic and real data, outperforming the sensitivity and the specificity of two existing methods in all the experiments we performed. Conclusions: The results show that SCintuit improves on the prediction quality of the existing approaches for TFs without compromising sensitivity. In addition, we show how SCintuit can be successfully applied to real research problems. In this study the reliability of the IFS theory for motif discovery tasks is proven.
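
For orientation, the subtask named in this abstract (locating a known motif in a DNA sequence) can be sketched with a plain position weight matrix and log-odds scoring. This is not SCintuit and does not model dependencies between positions or intuitionistic fuzzy sets; it is a common position-independent baseline. The motif values, the uniform background, and the function names are illustrative assumptions, not taken from the paper.

```python
import math

# Hypothetical 4-position motif with consensus ACGT (illustrative probabilities only).
MOTIF = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform base frequency


def log_odds_score(window, motif=MOTIF, background=BACKGROUND):
    """Sum per-position log2(motif probability / background probability)."""
    return sum(math.log2(col[base] / background) for col, base in zip(motif, window))


def best_match(sequence, motif=MOTIF):
    """Slide the motif along the sequence; return (best score, start position)."""
    width = len(motif)
    scored = [
        (log_odds_score(sequence[i:i + width], motif), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(scored)


score, position = best_match("TTACGTTT")
print(position, round(score, 3))
```

Because every position is scored independently, a weakly conserved column contributes as freely as a strongly conserved one, which is the kind of overestimation the abstract says SCintuit is designed to prevent.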

Springer - Publisher Connector

Directory of Open Access Journals

Repositorio Institucional Universidad de Granada

Birkbeck Institutional Research Online

Nucleosome positioning and energetics: Recent advances in genomic and computational studies

Author: Morozov Alexandre V.
Tolkunov Denis
Publication date: 19/12/2009

Chromatin is a complex of DNA, RNA and proteins whose primary function is to package genomic DNA into the tight confines of a cell nucleus. A fundamental repeating unit of chromatin is the nucleosome, an octamer of histone proteins around which 147 base pairs of DNA are wound in almost two turns of a left-handed superhelix. Chromatin is a dynamic structure which exerts profound influence on regulation of gene expression and other cellular functions. These chromatin-directed processes are facilitated by optimizing nucleosome positions throughout the genome and by remodeling nucleosomes in response to various external and internal signals such as environmental perturbations. Here we discuss large-scale maps of nucleosome positions made available through recent advances in parallel high-throughput sequencing and microarray technologies. We show that these maps reveal common features of nucleosome organization in eukaryotic genomes. We also survey computational models designed to predict nucleosome formation scores or energies, and demonstrate how these predictions can be used to position multiple nucleosomes on the genome without steric overlap. Comment: 41 pages, 11 figures

arXiv.org e-Print Archive

Electrostatic map of T7 DNA. Comparative analysis of functional and electrostatic properties of T7 RNA polymerase specific promoters

Author: Beskaravainy P. M.
Dzhelyadin T. R.
Kamzolova S. G.
Osypov A. A.
Sorokin A. A.
Temlyakova E. A.
Publication venue: 'Informa UK Limited'
Publication date: 24/06/2013

The entire T7 bacteriophage genome contains 39937 base pairs (NCBI RefSeq database, NC_001604). Here, the electrostatic potential distribution around double-helical T7 DNA was calculated by the Coulomb method using the computer program of Sorokin A.A. Electrostatic profiles of 17 promoters recognized by T7 phage-specific RNA polymerase were analyzed. It was shown that the electrostatic profiles of all T7 RNA polymerase-specific promoters can be characterized by distinctive motifs which are specific for each promoter class. Comparative analysis of electrostatic profiles of native T7 promoters of different classes demonstrates that T7 RNA polymerase can differentiate them due to their electrostatic features. Comment: This is an Author's Original Manuscript of an article submitted for consideration in the Journal of Biomolecular Structure & Dynamics

arXiv.org e-Print Archive

Retrotransposon Tto1: functional analysis and engineering for insertional mutagenesis

Author: Tramontano Andrea
Publication venue: 'Elsevier BV'
Publication date: 21/02/2011

Retrotransposons are genomic parasites activated by stress conditions that can be seriously detrimental for their host. In this work I demonstrate that Tto1, a typical plant LTR retrotransposon with insertion preference into genes, can be turned into a synthetic molecular tool for gene tagging in plants and can be used to predict models for its replication steps. Although retrotransposons have already been used in plant mutagenesis, such application always required establishing protocols for tissue cultures and regeneration in vitro. Here, I show that sequence engineering of Tto1 provides the possibility to obtain transposition in vivo, with a simple screening method based on PCR and with the advantage of skipping all in vitro manipulations. An artificial β-estradiol inducible promoter has been used to obtain transposition “on demand” in Arabidopsis plants, which generates stable unlinked insertions that follow Mendelian segregation in the progeny. Comparing serial deletions of the 3’ LTR of the engineered inducible Tto1 (iTto1), I have mapped its two natural terminators and identified the “minimal” R (redundant) region required to achieve the complete reverse transcription of the genomic mRNA into a new cDNA copy. Interestingly, the transcripts ending at the major “early” terminator cannot support reverse transcription, suggesting a mechanism of natural control on the expression. Transcripts with a more extended termination point contain 100 essential nucleotides that define the active nucleus of the R region. This sequence promotes the formation of a stable hairpin structure that “kisses” a complementary identical hairpin on the cDNA and determines the formation of the characteristic cDNA/mRNA heteroduplex. Since the LTR is a repeated sequence, the definition of a minimal redundant region also has the important implication of reducing the only possible target for sequence-based gene silencing, which should lead to an increase of the mutagenic efficiency of iTto1.
Additional investigations have been carried out in an attempt to identify points of improvement of iTto1 performance. By sequence alignment I identified different versions of the integrase that might have an influence on insertion efficiency. Furthermore, I tested the pOp6/LhGR-N system that will provide higher expression levels in different host plants. The final goal of my work is to extend the application of iTto1 to crop mutagenesis; therefore a large part of my work has been spent developing Tto1 constructs with activity in barley. Transgenic plants have been obtained; however, the constructs still need further experimentation.

Sequence-based identification of recombination spots using pseudo nucleic acid representation and recursive feature extraction by linear kernel SVM.

Author: Huang Lan
Li Liqi
Li Yongsheng
Xiao Weidong
Yang Hua
Yu Sanjiu
Zheng Xiaoqi
Zhou Shiwen
Publication venue: eScholarship, University of California
Publication date: 20/11/2014

Background: Identification of the recombination hot/cold spots is critical for understanding the mechanism of recombination as well as the genome evolution process. However, experimental identification of recombination spots is both time-consuming and costly. Developing an accurate and automated method for reliably and quickly identifying recombination spots is thus urgently needed. Results: Here we proposed a novel approach by fusing features from pseudo nucleic acid composition (PseNAC), including NAC, n-tier NAC and pseudo dinucleotide composition (PseDNC). A recursive feature extraction by linear kernel support vector machine (SVM) was then used to rank the integrated feature vectors and extract optimal features. SVM was adopted for identifying recombination spots based on these optimal features. To evaluate the performance of the proposed method, jackknife cross-validation test was employed on a benchmark dataset. The overall accuracy of this approach was 84.09%, which was higher (from 0.37% to 3.79%) than those of state-of-the-art tools. Conclusions: Comparison results suggested that linear kernel SVM is a useful vehicle for identifying recombination hot/cold spots.
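
As a rough illustration of the simplest feature family mentioned in this abstract, the sketch below computes a plain k-mer (here dinucleotide) composition vector for a DNA sequence. It covers only raw composition. The pseudo components of PseDNC, the n-tier features, and the SVM-based feature ranking are not reproduced, and the function name and example sequence are assumptions made for illustration.

```python
from itertools import product


def kmer_composition(sequence, k=2):
    """Return normalized k-mer frequencies in fixed alphabetical order (AA, AC, ...)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if kmer in counts:  # skip windows containing ambiguous bases such as N
            counts[kmer] += 1
            total += 1
    # Guard against sequences shorter than k or made only of ambiguous bases.
    return [counts[m] / total if total else 0.0 for m in kmers]


features = kmer_composition("ACGTACGT")
print(len(features), features[1])  # vector length and the frequency of "AC"
```

A fixed-length vector like this one is the kind of input a linear classifier can consume directly, regardless of the length of the original sequence.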

eScholarship - University of California

Attend and Predict: Understanding Gene Regulation by Selective Attention on Chromatin

Author: Lanchantin Jack
Qi Yanjun
Sekhon Arshdeep
Singh Ritambhara
Publication date: 07/11/2017

The past decade has seen a revolution in genomic technologies that enable a flood of genome-wide profiling of chromatin marks. Recent literature tried to understand gene regulation by predicting gene expression from large-scale chromatin measurements. Two fundamental challenges exist for such learning tasks: (1) genome-wide chromatin signals are spatially structured, high-dimensional and highly modular; and (2) the core aim is to understand what the relevant factors are and how they work together. Previous studies either failed to model complex dependencies among input signals or relied on separate feature analysis to explain the decisions. This paper presents an attention-based deep learning approach, which we call AttentiveChrome, that uses a unified architecture to model and to interpret dependencies among chromatin factors for controlling gene regulation. AttentiveChrome uses a hierarchy of multiple Long short-term memory (LSTM) modules to encode the input signals and to model how various chromatin marks cooperate automatically. AttentiveChrome trains two levels of attention jointly with the target prediction, enabling it to attend differentially to relevant marks and to locate important positions per mark. We evaluate the model across 56 different cell types (tasks) in human. Not only is the proposed architecture more accurate, but its attention scores also provide a better interpretation than state-of-the-art feature visualization methods such as saliency maps. Code and data are shared at www.deepchrome.org. Comment: 12 pages; At NIPS 201

arXiv.org e-Print Archive

TCO, a Putative Transcriptional Regulator in Arabidopsis, Is a Target of the Protein Kinase CK2.

Author: Carey Nicholas S
Chow Brenda Y
Krogan Naden T
Krogan Nevan J
Running Katherine LD
Stevenson Erica J
Swaney Danielle L
Weinman Laina M
Publication venue: eScholarship, University of California
Publication date: 28/12/2018

As multicellular organisms grow, spatial and temporal patterns of gene expression are strictly regulated to ensure that developmental programs are invoked at appropriate stages. In this work, we describe a putative transcriptional regulator in Arabidopsis, TACO LEAF (TCO), whose overexpression results in the ectopic activation of reproductive genes during vegetative growth. Isolated as an activation-tagged allele, tco-1D displays gene misexpression and phenotypic abnormalities, such as curled leaves and early flowering, characteristic of chromatin regulatory mutants. A role for TCO in this mode of transcriptional regulation is further supported by the subnuclear accumulation patterns of TCO protein and genetic interactions between tco-1D and chromatin modifier mutants. The endogenous expression pattern of TCO and gene misregulation in tco loss-of-function mutants indicate that this factor is involved in seed development. We also demonstrate that specific serine residues of TCO protein are targeted by the ubiquitous kinase CK2. Collectively, these results identify TCO as a novel regulator of gene expression whose activity is likely influenced by phosphorylation, as is the case with many chromatin regulators.

eScholarship - University of California

Evolving methods for rational de novo design of functional RNA molecules

Author: Findeiß Sven
Günzel Christian
Hammer Stefan
Mörl Mario
Publication venue: 'Elsevier BV'
Publication date: 10/05/2019

Artificial RNA molecules with novel functionality have many applications in synthetic biology, pharmacy and white biotechnology. The de novo design of such devices using computational methods and prediction tools is a resource-efficient alternative to experimental screening and selection pipelines. In this review, we describe methods common to many such computational approaches, thoroughly dissect these methods and highlight open questions for the individual steps. Initially, it is essential to investigate the biological target system, the regulatory mechanism that will be exploited, as well as the desired components in order to define design objectives. Subsequent computational design is needed to combine the selected components and to obtain novel functionality. This process can usually be split into constrained sequence sampling, the formulation of an optimization problem and an in silico analysis to narrow down the number of candidates with respect to secondary goals. Finally, experimental analysis is important to check whether the defined design objectives are indeed met in the target environment, and detailed characterization experiments should be performed to improve the mechanistic models and detect missing design requirements. Comment: Published at METHODS, Issue title: Chemical Biology of RNA, Guest Editor: Michael Ryckelync

arXiv.org e-Print Archive

Principles of dimer-specific gene regulation revealed by a comprehensive characterization of NF-κB family DNA binding.

Author: Ahmed Bilal
Bulyk Martha L
Chang Abraham B
Ragoussis Jiannis
Siggers Trevor
Smale Stephen T
Teixeira Ana
Udalova Irina A
Williams Kevin J
Wong Daniel
Publication venue: eScholarship, University of California
Publication date: 20/11/2011

The unique DNA-binding properties of distinct NF-κB dimers influence the selective regulation of NF-κB target genes. To more thoroughly investigate these dimer-specific differences, we combined protein-binding microarrays and surface plasmon resonance to evaluate DNA sites recognized by eight different NF-κB dimers. We observed three distinct binding-specificity classes and clarified mechanisms by which dimers might regulate distinct sets of genes. We identified many new nontraditional NF-κB binding site (κB site) sequences and highlight the plasticity of NF-κB dimers in recognizing κB sites with a single consensus half-site. This study provides a database that can be used in efforts to identify NF-κB target sites and uncover gene regulatory circuitry.

eScholarship - University of California