MEMOFinder: combining _de novo_ motif prediction methods with a database of known motifs
*Background:* Methods for finding overrepresented sequence motifs are useful in several key areas of computational biology. They aim to detect very weak signals responsible for biological processes that require robust sequence identification, such as transcription-factor binding to DNA or docking sites in proteins. Currently, the overall performance of model-based motif-finding methods is unsatisfactory; however, different methods succeed in different cases. This raises the practical problem of combining the results of different motif-finding tools while taking into account the knowledge collected in motif databases.
*Results:* We propose a new, complete service that allows researchers to submit their sequences for analysis by four different motif-finding methods, followed by clustering of the predicted motifs and comparison with a reference motif database. It is tailored for regulatory motif detection, but it allows substantial configuration of the sequence background, the motif database, and the parameters of the motif-finding methods.
*Availability:* The method is available online as a webserver at http://bioputer.mimuw.edu.pl/software/mmf/. In addition, the source code is released under the GNU General Public License.
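Comparing predicted motifs with each other and with a database requires some motif-to-motif similarity measure. The abstract does not say which measure MEMOFinder uses, so the sketch below swaps in one common choice, the average column-wise Pearson correlation between two position weight matrices (PWMs) at their best ungapped offset. All function names, the `min_overlap` default and the `column` helper are hypothetical illustrations.

```python
import math

BASES = "ACGT"

def column(base, p=0.97):
    """Hypothetical helper: a PWM column strongly favouring one base."""
    rest = (1.0 - p) / 3
    return [p if b == base else rest for b in BASES]

def pearson(x, y):
    """Pearson correlation of two equal-length vectors; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def pwm_similarity(p, q, min_overlap=3):
    """Best mean column-wise Pearson correlation over ungapped offsets.

    p, q: lists of columns, each column being [A, C, G, T] probabilities.
    Column i of p is aligned with column i - offset of q.
    """
    best = -1.0
    for offset in range(-(len(q) - min_overlap), len(p) - min_overlap + 1):
        pairs = [(p[i], q[i - offset]) for i in range(len(p)) if 0 <= i - offset < len(q)]
        if len(pairs) < min_overlap:
            continue  # too little overlap to compare meaningfully
        score = sum(pearson(a, b) for a, b in pairs) / len(pairs)
        best = max(best, score)
    return best

acgt = [column(b) for b in "ACGT"]
cgt = [column(b) for b in "CGT"]
print(pwm_similarity(acgt, cgt))  # the shifted sub-motif aligns perfectly at offset 1
```

A score near 1 marks near-identical motifs, so a threshold on such a score could drive both clustering of tool outputs and matching against database entries; a real tool would also consider reverse complements.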
Challenges for modeling global gene regulatory networks during development: Insights from Drosophila
Development is regulated by dynamic patterns of gene expression, which are orchestrated through the action of complex gene regulatory networks (GRNs). Substantial progress has been made in modeling transcriptional regulation in recent years, with models ranging from qualitative “coarse-grain” models operating at the gene level to very “fine-grain” quantitative models operating at the biophysical “transcription factor-DNA level”. Recent advances in genome-wide studies have revealed an enormous increase in the size and complexity of GRNs. Even relatively simple developmental processes can involve hundreds of regulatory molecules, with extensive interconnectivity and cooperative regulation. This leads to an explosion in the number of regulatory functions, effectively impeding Boolean-based qualitative modeling approaches. At the same time, the lack of information on the biophysical properties of the majority of transcription factors within a global network restricts quantitative approaches. In this review, we explore the current challenges in moving from modeling medium-scale, well-characterized networks to more poorly characterized global networks. We suggest integrating coarse- and fine-grain approaches to model gene regulatory networks in cis. We focus on two very well-studied examples from Drosophila, which likely represent typical developmental regulatory modules across metazoans.
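A back-of-envelope illustration (not a calculation taken from the review) of why the number of candidate regulatory functions explodes: a gene with k binary regulators has a truth table with 2^k rows, and each row can independently map to 0 or 1, giving 2^(2^k) possible Boolean update rules. The function name below is hypothetical.

```python
def boolean_functions(k: int) -> int:
    """Number of distinct Boolean functions of k binary inputs: 2 ** (2 ** k)."""
    return 2 ** (2 ** k)

# Growth with the number of regulators k of a single target gene.
for k in range(1, 6):
    print(k, boolean_functions(k))
```

Already at five regulators there are 4,294,967,296 candidate rules per gene, which is why exhaustive Boolean model selection becomes impractical for densely connected networks.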
RECORD: Reference-Assisted Genome Assembly for Closely Related Genomes
Background. Next-generation sequencing technologies now produce reads totalling several times the genome size from a single experiment. This is enough information to reconstruct at least some of the differences between the individual genome studied in the experiment and the reference genome of the species. However, in most typical protocols this information is disregarded and the reference genome is used unchanged.
Results. We provide a new approach that allows researchers to reconstruct genomes very closely related to the reference genome (e.g., mutants of the same species) directly from the reads used in the experiment. Our approach applies de novo assembly software to the experimental reads together with so-called pseudoreads and uses the resulting contigs to generate a modified reference sequence. In this way, it can very quickly, and at no additional sequencing cost, generate a new, modified reference sequence that is closer to the actual sequenced genome and has full coverage. In this paper, we describe our approach and test its implementation, called RECORD. We evaluate RECORD on both simulated and real data, and we have made the software publicly available on SourceForge.
Conclusion. Our tests show that on closely related sequences RECORD outperforms more general assisted-assembly software.
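The abstract does not specify how RECORD derives its pseudoreads. The sketch below assumes the simplest plausible scheme, tiling the reference with overlapping fixed-length windows so that every reference position is covered; mixed with the experimental reads, such pseudoreads would let a de novo assembler span regions that the real reads miss. The function name and the `read_len`/`step` defaults are hypothetical, and the assembly step itself is not shown.

```python
def make_pseudoreads(reference: str, read_len: int = 100, step: int = 50) -> list[str]:
    """Tile the reference with overlapping fixed-length windows (hypothetical pseudoreads)."""
    if read_len <= 0 or step <= 0:
        raise ValueError("read_len and step must be positive")
    if len(reference) <= read_len:
        return [reference]  # short reference: one pseudoread covers it
    starts = list(range(0, len(reference) - read_len + 1, step))
    # Ensure the 3' end is covered even when step does not divide evenly.
    if starts[-1] != len(reference) - read_len:
        starts.append(len(reference) - read_len)
    return [reference[s:s + read_len] for s in starts]

print(make_pseudoreads("ACGTACGTAC", read_len=4, step=3))
```

Keeping `step` smaller than `read_len` guarantees overlaps between consecutive pseudoreads, which an overlap- or de Bruijn-based assembler needs in order to chain them into contigs.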
Finding evolutionarily conserved cis-regulatory modules with a universal set of motifs
*Background:* Finding functional regulatory elements in DNA sequences is a very important problem in computational biology, and a reliable algorithm for this task would be a major step towards understanding regulatory mechanisms on a genome-wide scale. The major obstacles are that the amount of non-coding DNA is vast and that methods for predicting functional transcription factor binding sites tend to produce results with a high percentage of false positives. This makes it difficult to find regions significantly enriched in binding sites.
*Results:* We develop a novel method for predicting regulatory regions in DNA sequences, which is designed to exploit the evolutionary conservation of regulatory elements between species without assuming that the order of motifs is preserved across species. We have implemented our method and tested its predictive abilities on various datasets from different organisms.
*Conclusion:* We show that our approach enables us to find a majority of the known cis-regulatory modules (CRMs) using only sequence information from different species together with currently publicly available motif data. Our method is also robust enough to perform well in predicting CRMs despite differences in tissue specificity, and even across species, provided that the evolutionary distances between the compared species do not change substantially. The complexity of the proposed algorithm is polynomial, and the observed running times show that it can readily be applied in practice.
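To illustrate what "conservation without assuming motif order" can mean, the sketch below scores two orthologous sequence windows by the multiset intersection of their motif hits, which ignores where and in which order the hits occur. This is a deliberately simplified stand-in, not the paper's algorithm: it uses exact consensus words instead of PWM scanning, ignores reverse complements, and the function names and example motifs are hypothetical.

```python
from collections import Counter

def motif_counts(seq: str, motifs: list[str]) -> Counter:
    """Count (possibly overlapping) exact occurrences of each motif word in seq."""
    counts = Counter()
    for m in motifs:
        start = seq.find(m)
        while start != -1:
            counts[m] += 1
            start = seq.find(m, start + 1)
    return counts

def conserved_motif_score(seq_a: str, seq_b: str, motifs: list[str]) -> int:
    """Order-independent score: size of the multiset intersection of motif hits."""
    shared = motif_counts(seq_a, motifs) & motif_counts(seq_b, motifs)  # per-motif minimum
    return sum(shared.values())

motifs = ["TGACGTCA", "GATAAG"]
# The same two motifs in opposite order still count as fully conserved.
print(conserved_motif_score("TGACGTCAGATAAG", "GATAAGCCTGACGTCA", motifs))
```

Because `Counter.__and__` keeps the minimum count per motif, a motif contributes only as many hits as both species share, so a rearranged but conserved module still scores highly.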
Arabidopsis SWI/SNF chromatin remodeling complex binds both promoters and terminators to regulate gene expression
ATP-dependent chromatin remodeling complexes are important regulators of gene expression in eukaryotes. In plants, SWI/SNF-type complexes have been shown to be critical for transcriptional control of key developmental processes, growth and stress responses. To gain insight into the mechanisms underlying these roles, we performed genome-wide mapping of the SWI/SNF catalytic subunit BRM in Arabidopsis thaliana, combined with transcript profiling experiments. Our data show that BRM occupies thousands of sites in the Arabidopsis genome, most of which are located within or close to genes. Among the identified direct BRM transcriptional targets, almost equal numbers were up- and downregulated upon BRM depletion, suggesting that BRM can act as both an activator and a repressor of gene expression. Interestingly, in addition to genes showing the canonical pattern of BRM enrichment near the transcription start site, many other genes showed a BRM occupancy profile centred on the transcription termination site. We found that BRM-bound 3′ gene regions have promoter-like features, including the presence of TATA boxes and high H3K4me3 levels, and possess high antisense transcriptional activity that is subject to both activation and repression by the SWI/SNF complex. Our data suggest that binding to gene terminators and controlling transcription of non-coding RNAs is another way through which the SWI/SNF complex regulates expression of its targets.
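Distinguishing transcription start site (TSS)-centred from transcription termination site (TTS)-centred occupancy requires knowing gene orientation, because on the minus strand the TSS is the rightmost coordinate. The sketch below classifies a single binding-peak summit by its distance to a gene's TSS and TTS. It is only an illustration of that strand-aware logic under assumed inputs, not the study's profiling pipeline; the class, function and 500 bp `window` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Gene:
    start: int   # leftmost genomic coordinate
    end: int     # rightmost genomic coordinate
    strand: str  # "+" or "-"

    @property
    def tss(self) -> int:
        return self.start if self.strand == "+" else self.end

    @property
    def tts(self) -> int:
        return self.end if self.strand == "+" else self.start

def classify_peak(summit: int, gene: Gene, window: int = 500) -> str:
    """Label a peak summit as TSS-centred, TTS-centred, or other (hypothetical rule)."""
    d_tss = abs(summit - gene.tss)
    d_tts = abs(summit - gene.tts)
    if min(d_tss, d_tts) > window:
        return "other"  # too far from either gene end
    return "TSS-centred" if d_tss <= d_tts else "TTS-centred"

plus_gene = Gene(1000, 3000, "+")
print(classify_peak(1100, plus_gene), classify_peak(2950, plus_gene))
```

The same summit position can flip label when the gene is on the opposite strand, which is why strand handling matters when aggregating occupancy profiles across genes.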
Optimally choosing PWM motif databases and sequence scanning approaches based on ChIP-seq data
Additional file 1 of Taking promoters out of enhancers in sequence based predictions of tissue-specific mammalian enhancers
Supplementary Tables and Figures. This file contains additional tables and figures, such as a table of the datasets used for training, a feature-importance table, and predictions for recently added VISTA sequences. (PDF, 236 kb)