Understanding Transcriptional Regulation Using De-novo Sequence Motif Discovery, Network Inference and Interactome Data
Gene regulation is a complex process involving several genomic
elements that work in concert to drive spatio-temporal expression. The
experimental characterization of gene regulatory elements is complex and
resource-intensive. One of the major goals in computational biology is
the \textit{in-silico} annotation of previously uncharacterized elements using
results from the subset of known, previously annotated regulatory elements.
The recent results of the ENCODE project (\emph{http://encode.nih.gov})
presented an in-depth analysis of such functional (regulatory) non-coding
elements for 1% of the human genome. It is hoped that the results obtained on
this subset can be scaled to the rest of the genome. This is an extremely
important effort that will enable faster dissection of other functional
elements in key biological processes such as disease progression and organ
development (\cite{Kleinjan2005}, \cite{Lieb2006}). The computational
annotation of these hitherto uncharacterized regions requires identifying
features that have good predictive value.
In this work, we study transcriptional regulation as a problem in
heterogeneous data integration across sequence-, expression- and
interactome-level attributes. Using the \textit{Gata2} gene and its recently
discovered urogenital enhancers \cite{Khandekar2004} as a case study, we
examine the predictive value of various high-throughput functional genomic
assays (from projects like ENCODE and SymAtlas) in characterizing these
enhancers and their regulatory role. Based on the results of applying modern
statistical learning methods to each of these data modalities, we propose a
set of features that are most discriminatory for finding these enhancers.
Comment: 25 pages, 9 figures
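The abstract does not say how the discriminatory features are ranked. As a purely illustrative stand-in, not the authors' method, a simple filter criterion such as the Fisher score can rank each feature by how well it separates enhancer regions from background regions. The feature names and values below are hypothetical toy inputs.

```python
from statistics import mean, pvariance

def fisher_score(pos, neg):
    """Fisher score of one feature: (mean_pos - mean_neg)^2 / (var_pos + var_neg)."""
    num = (mean(pos) - mean(neg)) ** 2
    den = pvariance(pos) + pvariance(neg)
    if den == 0:
        # Perfectly constant classes: infinitely separable if the means differ.
        return float("inf") if num > 0 else 0.0
    return num / den

def rank_features(features, labels):
    """Rank features (dict: name -> per-region values) by Fisher score.

    labels[i] is 1 for an enhancer region and 0 for a background region.
    """
    ranking = []
    for name, values in features.items():
        pos = [v for v, y in zip(values, labels) if y == 1]
        neg = [v for v, y in zip(values, labels) if y == 0]
        ranking.append((name, fisher_score(pos, neg)))
    return sorted(ranking, key=lambda item: item[1], reverse=True)

# Hypothetical toy data: four regions, the first two labelled as enhancers.
toy = {"conservation": [1.0, 1.2, 0.1, 0.0], "gc_content": [0.5, 0.4, 0.6, 0.5]}
print(rank_features(toy, [1, 1, 0, 0]))
```

A univariate filter like this ignores interactions between features; multivariate learners would be needed to capture those.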
An intuitionistic approach to scoring DNA sequences against transcription factor binding site motifs
Background: Transcription factors (TFs) control transcription by binding to specific regions of DNA called transcription factor binding sites (TFBSs). The identification of TFBSs is a crucial problem in computational biology and includes the subtask of predicting the location of known TFBS motifs in a given DNA sequence. It has previously been shown that, when scoring matches to known TFBS motifs, interdependencies between positions within a motif should be taken into account. However, this remains a challenging task owing to the fact that sequences similar to those of known TFBSs can occur by chance with a relatively high frequency. Here we present a new method for matching sequences to TFBS motifs based on intuitionistic fuzzy sets (IFS) theory, an approach that has been shown to be particularly appropriate for tackling problems that embody a high degree of uncertainty.
Results: We propose SCintuit, a new scoring method for measuring sequence-motif affinity based on IFS theory. Unlike existing methods that consider dependencies between positions, SCintuit is designed to prevent overestimation of less conserved positions of TFBSs. For a given pair of bases, SCintuit is computed not only as a function of their combined probability of occurrence, but also by taking into account the individual importance of each single base at its corresponding position. We used SCintuit to identify known TFBSs in DNA sequences. Our method provides excellent results on both synthetic and real data, outperforming two existing methods in both sensitivity and specificity in all the experiments we performed.
Conclusions: The results show that SCintuit improves on the prediction quality of existing approaches for TFs without compromising sensitivity. In addition, we show how SCintuit can be successfully applied to real research problems. This study demonstrates the reliability of IFS theory for motif discovery tasks.
Nucleosome positioning and energetics: Recent advances in genomic and computational studies
Chromatin is a complex of DNA, RNA and proteins whose primary function is to
package genomic DNA into the tight confines of a cell nucleus. A fundamental
repeating unit of chromatin is the nucleosome, an octamer of histone proteins
around which 147 base pairs of DNA are wound in almost two turns of a
left-handed superhelix. Chromatin is a dynamic structure which exerts profound
influence on regulation of gene expression and other cellular functions. These
chromatin-directed processes are facilitated by optimizing nucleosome positions
throughout the genome and by remodeling nucleosomes in response to various
external and internal signals such as environmental perturbations. Here we
discuss large-scale maps of nucleosome positions made available through recent
advances in parallel high-throughput sequencing and microarray technologies. We
show that these maps reveal common features of nucleosome organization in
eukaryotic genomes. We also survey computational models designed to predict
nucleosome formation scores or energies, and demonstrate how these predictions
can be used to position multiple nucleosomes on the genome without steric
overlap.
Comment: 41 pages, 11 figures
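The abstract does not give the placement algorithm. One common way to position several nucleosomes without steric overlap, shown here as an assumption rather than the specific method surveyed, is dynamic programming in the style of weighted interval scheduling: given a per-position formation score (higher meaning more favorable; a predicted energy could be negated first), choose start positions at least 147 bp apart that maximize the summed score.

```python
def place_nucleosomes(scores, footprint=147):
    """Pick non-overlapping nucleosome start positions maximizing the total score.

    scores[i] is the (assumed) favorability of a nucleosome occupying
    positions i .. i + footprint - 1; every index in `scores` is a valid start.
    Returns (best_total, sorted_start_positions).
    """
    m = len(scores)
    # best[i] = optimal total using only starts >= i (padded so i + footprint is valid).
    best = [0.0] * (m + footprint + 1)
    for i in range(m - 1, -1, -1):
        skip = best[i + 1]
        take = scores[i] + best[i + footprint]  # next nucleosome may start at i + footprint
        best[i] = max(skip, take)
    # Trace back the decisions made in the recurrence.
    starts, i = [], 0
    while i < m:
        if scores[i] + best[i + footprint] > best[i + 1]:
            starts.append(i)
            i += footprint
        else:
            i += 1
    return best[0], starts

# Toy example with a 3-bp footprint so the result is easy to check by hand.
print(place_nucleosomes([1, 5, 1, 1, 5, 1], footprint=3))  # -> (10.0, [1, 4])
```

This returns a single best configuration; probabilistic models instead average over all non-overlapping configurations to obtain occupancy, which this sketch does not attempt.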
Electrostatic map of T7 DNA. Comparative analysis of functional and electrostatic properties of T7 RNA polymerase specific promoters
The entire T7 bacteriophage genome contains 39,937 base pairs (NCBI RefSeq
NC_001604). Here, the electrostatic potential distribution around double-helical
T7 DNA was calculated by the Coulomb method using a computer program developed
by A.A. Sorokin. The electrostatic profiles of 17 promoters recognized by the
T7 phage-specific RNA polymerase were analyzed. The electrostatic profiles of
all T7 RNA polymerase-specific promoters were shown to be characterized by
distinctive motifs specific to each promoter class. Comparative analysis of the
electrostatic profiles of native T7 promoters of different classes demonstrates
that T7 RNA polymerase can differentiate them by their electrostatic
features.
Comment: This is an Author's Original Manuscript of an article submitted for
consideration in the Journal of Biomolecular Structure & Dynamics
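The abstract names the Coulomb method without giving the formula or the parameters of the program used. As a hedged sketch, the potential at a point is the sum of point-charge contributions, φ(r) = Σ_i q_i / (4π ε0 εr |r − r_i|). The charge positions (for example one negative charge per phosphate), the uniform relative permittivity `eps_r` and the unit conventions below are assumptions, not details from the study.

```python
import math

K_E = 8.9875517923e9        # Coulomb constant 1/(4*pi*eps0), N m^2 C^-2
E_CHARGE = 1.602176634e-19  # elementary charge, C
ANGSTROM = 1e-10            # metres per angstrom

def coulomb_potential(point, charges, eps_r=1.0):
    """Electrostatic potential in volts at `point` from a set of point charges.

    point: (x, y, z) in angstroms.
    charges: iterable of ((x, y, z), q) with positions in angstroms and q in units of e.
    eps_r: assumed uniform relative permittivity of the medium.
    """
    phi = 0.0
    for position, q in charges:
        r = math.dist(point, position) * ANGSTROM
        if r == 0:
            raise ValueError("evaluation point coincides with a charge")
        phi += K_E * q * E_CHARGE / (eps_r * r)
    return phi

# Hypothetical usage: one phosphate-like charge (-1 e) at the origin,
# potential evaluated 10 angstroms away with an assumed eps_r of 4.
print(coulomb_potential((10.0, 0.0, 0.0), [((0.0, 0.0, 0.0), -1.0)], eps_r=4.0))
```

An electrostatic profile along a promoter could then be obtained by evaluating this function at a series of points placed along the double helix, given atomic or phosphate coordinates for the sequence.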
Retrotransposon Tto1: functional analysis and engineering for insertional mutagenesis
Retrotransposons are genomic parasites, activated by stress conditions, that can be seriously detrimental to their host. In this work I demonstrate that Tto1, a typical plant LTR retrotransposon with an insertion preference for genes, can be turned into a synthetic molecular tool for gene tagging in plants and can be used to infer models of its replication steps. Although retrotransposons have already been used in plant mutagenesis, such applications have always required establishing protocols for tissue culture and in vitro regeneration. Here, I show that sequence engineering of Tto1 makes it possible to obtain transposition in vivo, with a simple PCR-based screening method and the advantage of skipping all in vitro manipulations. An artificial β-estradiol-inducible promoter has been used to obtain transposition “on demand” in Arabidopsis plants, which generates stable unlinked insertions that follow Mendelian segregation in the progeny.
By comparing serial deletions of the 3’ LTR of the engineered inducible Tto1 (iTto1), I have mapped its two natural terminators and identified the “minimal” R (redundant) region required to achieve complete reverse transcription of the genomic mRNA into a new cDNA copy. Interestingly, transcripts ending at the major “early” terminator cannot support reverse transcription, suggesting a natural mechanism for controlling expression. Transcripts with a more extended termination point contain 100 essential nucleotides that define the active core of the R region. This sequence promotes the formation of a stable hairpin structure that “kisses” a complementary identical hairpin on the cDNA and determines the formation of the characteristic cDNA/mRNA heteroduplex. Since the LTR is a repeated sequence, defining a minimal redundant region also has the important implication of reducing the only possible target for sequence-based gene silencing, which should increase the mutagenic efficiency of iTto1.
Additional investigations were carried out in an attempt to identify ways to improve iTto1 performance. By sequence alignment I identified different versions of the integrase that might influence insertion efficiency. Furthermore, I tested the pOp6/LhGR-N system, which is expected to provide higher expression levels in different host plants. The final goal of my work is to extend the application of iTto1 to crop mutagenesis; therefore, a large part of my work was spent developing Tto1 constructs with activity in barley. Transgenic plants have been obtained; however, the constructs still need further experimentation.
Sequence-based identification of recombination spots using pseudo nucleic acid representation and recursive feature extraction by linear kernel SVM.
Background: Identification of recombination hot/cold spots is critical for understanding the mechanism of recombination as well as the process of genome evolution. However, experimental identification of recombination spots is both time-consuming and costly, so an accurate, automated method for reliably and quickly identifying recombination spots is urgently needed.
Results: Here we propose a novel approach that fuses features from pseudo nucleic acid composition (PseNAC), including NAC, n-tier NAC and pseudo dinucleotide composition (PseDNC). Recursive feature extraction with a linear-kernel support vector machine (SVM) was then used to rank the integrated feature vectors and extract optimal features, and an SVM was used to identify recombination spots based on these optimal features. To evaluate the performance of the proposed method, a jackknife cross-validation test was performed on a benchmark dataset. The overall accuracy of this approach was 84.09%, which was higher (by 0.37% to 3.79%) than that of state-of-the-art tools.
Conclusions: The comparison results suggest that linear-kernel SVM is a useful vehicle for identifying recombination hot/cold spots.
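NAC and n-tier NAC are nucleotide-composition descriptors. Assuming, as the names suggest, that NAC is the single-nucleotide frequency vector and n-tier NAC concatenates k-mer frequencies for k = 1..n, the composition part of the feature vector can be sketched as below. PseDNC, which additionally encodes physicochemical correlations between dinucleotides, and the SVM-based recursive feature extraction are not reproduced here.

```python
from itertools import product

def kmer_composition(seq, k):
    """Normalized frequencies of all 4**k k-mers over ACGT, in lexicographic order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # windows containing N or other symbols are skipped
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in kmers]

def ntier_nac(seq, max_k=3):
    """Concatenate 1-mer (NAC), 2-mer, ..., max_k-mer compositions into one vector."""
    vector = []
    for k in range(1, max_k + 1):
        vector.extend(kmer_composition(seq, k))
    return vector

print(kmer_composition("AACG", 1))  # -> [0.5, 0.25, 0.25, 0.0]
print(len(ntier_nac("ACGTACGTAC", max_k=3)))  # 4 + 16 + 64 features
```

The resulting fixed-length vectors are what a feature-ranking step and a classifier such as an SVM would consume.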
Attend and Predict: Understanding Gene Regulation by Selective Attention on Chromatin
The past decade has seen a revolution in genomic technologies that enable a
flood of genome-wide profiling of chromatin marks. Recent studies have tried to
understand gene regulation by predicting gene expression from large-scale
chromatin measurements. Two fundamental challenges exist for such learning
tasks: (1) genome-wide chromatin signals are spatially structured,
high-dimensional and highly modular; and (2) the core aim is to understand which
factors are relevant and how they work together. Previous studies either
failed to model complex dependencies among input signals or relied on separate
feature analysis to explain the decisions. This paper presents an
attention-based deep learning approach, which we call AttentiveChrome, that uses a
unified architecture to model and to interpret dependencies among chromatin
factors for controlling gene regulation. AttentiveChrome uses a hierarchy of
multiple long short-term memory (LSTM) modules to encode the input signals and
to automatically model how various chromatin marks cooperate. AttentiveChrome
trains two levels of attention jointly with the target prediction, enabling it
to attend differentially to relevant marks and to locate important positions
per mark. We evaluate the model across 56 different human cell types (tasks).
Not only is the proposed architecture more accurate, but its attention
scores also provide a better interpretation than state-of-the-art feature
visualization methods such as saliency maps.
Code and data are shared at www.deepchrome.org.
Comment: 12 pages; At NIPS 201
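The two attention levels described above (important positions within each mark, then relevant marks) can be illustrated with a minimal soft-attention sketch. It assumes the per-bin hidden states (for example LSTM outputs) are already computed and uses fixed, hypothetical context vectors `bin_context` and `mark_context`; in a trained model these would be learned jointly with the expression prediction. The mark-level encoder and the final classifier are omitted.

```python
import math

def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(vectors, context):
    """Soft attention: score each vector by its dot product with `context`,
    normalize with softmax and return (weights, weighted sum of vectors)."""
    scores = [sum(v_i * c_i for v_i, c_i in zip(v, context)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return weights, pooled

def two_level_attention(bin_states, bin_context, mark_context):
    """bin_states: dict mark -> list of per-bin hidden vectors.
    Level 1 pools bins within each mark; level 2 pools the per-mark summaries."""
    marks = list(bin_states)
    bin_weights, mark_vectors = {}, []
    for mark in marks:
        weights, pooled = attend(bin_states[mark], bin_context)
        bin_weights[mark] = weights
        mark_vectors.append(pooled)
    mark_weights, gene_vector = attend(mark_vectors, mark_context)
    return dict(zip(marks, mark_weights)), bin_weights, gene_vector

# Hypothetical toy input: two marks, two bins each, 2-dimensional hidden states.
states = {"H3K4me3": [[1.0, 0.0], [0.0, 1.0]], "H3K27me3": [[0.0, 0.0], [0.0, 0.0]]}
mark_w, bin_w, gene_vec = two_level_attention(states, [5.0, 0.0], [1.0, 1.0])
print(mark_w, bin_w)
```

The returned weights are what would be inspected for interpretation: `bin_w` highlights positions within a mark and `mark_w` ranks the marks.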
TCO, a Putative Transcriptional Regulator in Arabidopsis, Is a Target of the Protein Kinase CK2.
As multicellular organisms grow, spatial and temporal patterns of gene expression are strictly regulated to ensure that developmental programs are invoked at appropriate stages. In this work, we describe a putative transcriptional regulator in Arabidopsis, TACO LEAF (TCO), whose overexpression results in the ectopic activation of reproductive genes during vegetative growth. Isolated as an activation-tagged allele, tco-1D displays gene misexpression and phenotypic abnormalities, such as curled leaves and early flowering, characteristic of chromatin regulatory mutants. A role for TCO in this mode of transcriptional regulation is further supported by the subnuclear accumulation patterns of TCO protein and genetic interactions between tco-1D and chromatin modifier mutants. The endogenous expression pattern of TCO and gene misregulation in tco loss-of-function mutants indicate that this factor is involved in seed development. We also demonstrate that specific serine residues of TCO protein are targeted by the ubiquitous kinase CK2. Collectively, these results identify TCO as a novel regulator of gene expression whose activity is likely influenced by phosphorylation, as is the case with many chromatin regulators.
Evolving methods for rational de novo design of functional RNA molecules
Artificial RNA molecules with novel functionality have many applications in
synthetic biology, pharmacy and white biotechnology. The de novo design of such
devices using computational methods and prediction tools is a
resource-efficient alternative to experimental screening and selection
pipelines. In this review, we describe methods common to many such
computational approaches, thoroughly dissect these methods and highlight open
questions for the individual steps. Initially, it is essential to investigate
the biological target system, the regulatory mechanism that will be exploited,
as well as the desired components in order to define design objectives.
Subsequent computational design is needed to combine the selected components
and to obtain novel functionality. This process can usually be split into
constrained sequence sampling, the formulation of an optimization problem and
an in silico analysis to narrow down the number of candidates with respect to
secondary goals. Finally, experimental analysis is important to check whether
the defined design objectives are indeed met in the target environment, and
detailed characterization experiments should be performed to improve the
mechanistic models and detect missing design requirements.
Comment: Published at METHODS, Issue title: Chemical Biology of RNA, Guest
Editor: Michael Ryckelync
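Constrained sequence sampling, the first of the computational design steps listed above, can be sketched for the simplest constraint, a target secondary structure in dot-bracket notation: paired positions receive Watson-Crick or GU pairs and unpaired positions are drawn uniformly. This is an illustrative assumption; practical pipelines add further sequence constraints and then evaluate candidates with a folding tool, which is not included here.

```python
import random

# Allowed base pairs (opening base, closing base): Watson-Crick plus GU wobble.
PAIRS = ["AU", "UA", "GC", "CG", "GU", "UG"]

def parse_dot_bracket(structure):
    """Return a pair table: table[i] is the partner of position i, or -1 if unpaired."""
    stack, table = [], [-1] * len(structure)
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            table[i], table[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected symbol {ch!r}")
    if stack:
        raise ValueError("unbalanced structure")
    return table

def sample_compatible(structure, rng=random):
    """Sample one RNA sequence that can form every base pair in `structure`."""
    table = parse_dot_bracket(structure)
    seq = [None] * len(structure)
    for i, j in enumerate(table):
        if j == -1:
            seq[i] = rng.choice("ACGU")
        elif i < j:  # assign both partners when the opening position is reached
            seq[i], seq[j] = rng.choice(PAIRS)
    return "".join(seq)

rng = random.Random(0)  # seeded for reproducible sampling
print([sample_compatible("((((...))))", rng) for _ in range(3)])
```

Compatibility with the target structure does not guarantee that the sequence actually folds into it, which is why the optimization and in silico analysis steps follow.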
Principles of dimer-specific gene regulation revealed by a comprehensive characterization of NF-κB family DNA binding.
The unique DNA-binding properties of distinct NF-κB dimers influence the selective regulation of NF-κB target genes. To more thoroughly investigate these dimer-specific differences, we combined protein-binding microarrays and surface plasmon resonance to evaluate DNA sites recognized by eight different NF-κB dimers. We observed three distinct binding-specificity classes and clarified mechanisms by which dimers might regulate distinct sets of genes. We identified many new nontraditional NF-κB binding site (κB site) sequences and highlighted the plasticity of NF-κB dimers in recognizing κB sites with a single consensus half-site. This study provides a database that can be used in efforts to identify NF-κB target sites and uncover gene regulatory circuitry.
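As a simple point of comparison with microarray-derived specificity data, a scan for a commonly cited κB consensus, 5'-GGGRNWYYCC-3' (IUPAC codes: R = A/G, W = A/T, Y = C/T, N = any base), can be written as below. Consensus matching is an assumption for illustration, not the study's method; the study's finding that many κB sites are nontraditional, including sites with a single consensus half-site, implies that such a scan will miss them.

```python
import re

# IUPAC nucleotide codes translated to regex character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "W": "[AT]", "S": "[CG]", "K": "[GT]", "M": "[AC]", "N": "[ACGT]"}
KB_CONSENSUS = "GGGRNWYYCC"  # assumed consensus form

def iupac_to_regex(motif):
    return "".join(IUPAC[c] for c in motif)

def revcomp(seq):
    """Reverse complement of an ACGT sequence (other symbols are left unchanged)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_sites(seq, motif=KB_CONSENSUS):
    """Return sorted (forward_start, strand) hits on both strands.

    A zero-width lookahead lets overlapping matches be reported.
    """
    seq = seq.upper()
    pattern = re.compile(f"(?={iupac_to_regex(motif)})")
    n, w = len(seq), len(motif)
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    # A match at offset p on the reverse complement starts at n - p - w on the forward strand.
    hits += [(n - m.start() - w, "-") for m in pattern.finditer(revcomp(seq))]
    return sorted(hits)

print(find_sites("TTGGGACTTTCCAA"))  # -> [(2, '+')]
```

Palindromic sites are reported once per strand, since the consensus matches both orientations.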