Identification of candidate regulatory sequences in mammalian 3' UTRs by statistical analysis of oligonucleotide distributions
3' untranslated regions (3' UTRs) contain binding sites for many regulatory
elements, and in particular for microRNAs (miRNAs). The importance of
miRNA-mediated post-transcriptional regulation has become increasingly clear in
the last few years.
We propose two complementary approaches to the statistical analysis of
oligonucleotide frequencies in mammalian 3' UTRs aimed at the identification of
candidate binding sites for regulatory elements. The first method is based on
the identification of sets of genes characterized by evolutionarily conserved
overrepresentation of an oligonucleotide. The second method is based on the
identification of oligonucleotides showing statistically significant strand
asymmetry in their distribution in 3' UTRs.
Both methods are able to identify many previously known binding sites located
in 3' UTRs, and in particular seed regions of known miRNAs. Many new candidates
are proposed for experimental verification.
Comment: Added two references.
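The strand-asymmetry idea can be illustrated with a small sketch. Because 3' UTR sequences are read in the mRNA sense orientation, the intuition is that an RNA-binding element such as a miRNA seed match may be enriched on that strand but not as its reverse complement. The scoring here is an assumption, not taken from the paper: each non-palindromic oligonucleotide's count is compared with that of its reverse complement using an exact two-sided binomial test with p = 0.5. The function names and the threshold `alpha` are hypothetical, and no multiple-testing correction is applied.

```python
import math
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")
VALID = set("ACGT")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string over the alphabet ACGT."""
    return seq.translate(COMPLEMENT)[::-1]

def count_kmers(seqs, k):
    """Count overlapping k-mers across all sequences, skipping non-ACGT windows."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            w = s[i:i + k]
            if set(w) <= VALID:
                counts[w] += 1
    return counts

def binomial_two_sided_p(n_a, n_b):
    """Exact two-sided binomial p-value for n_a vs n_b under p = 0.5."""
    n = n_a + n_b
    if n == 0:
        return 1.0
    k = min(n_a, n_b)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def strand_asymmetry(seqs, k, alpha=0.01):
    """Return (word, n_word, n_revcomp, p) for non-palindromic k-mers whose
    count differs significantly from that of their reverse complement."""
    counts = count_kmers(seqs, k)
    results, seen = [], set()
    for w in sorted(counts):
        rc = reverse_complement(w)
        if w == rc or w in seen:  # palindromes cannot be asymmetric
            continue
        seen.update((w, rc))
        n_w, n_rc = counts[w], counts[rc]  # Counter returns 0 if rc is absent
        p = binomial_two_sided_p(n_w, n_rc)
        if p < alpha:
            results.append((w, n_w, n_rc, p))
    return results
```

On real data the input would be the full set of 3' UTRs of one species, and the significant words would then be checked against known miRNA seed regions.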
Additive Functions in Boolean Models of Gene Regulatory Network Modules
Gene-on-gene regulation is a key component of every living organism. Dynamical abstract models of genetic regulatory networks help explain the genome's evolvability and robustness. These properties can be attributed to the structural topology of the graph formed by genes, as vertices, and regulatory interactions, as edges. Moreover, the specific interaction logic of each gene is believed to play a key role in the stability of the structure. With advances in biology, some effort has been devoted to developing update functions for Boolean models that incorporate recent knowledge. We combine real-life gene interaction networks with novel update functions in a Boolean model. We use two sub-networks of biological organisms, the yeast cell cycle and the mouse embryonic stem cell, as topological support for our system. On these structures, we replace the original random update functions with a novel threshold-based dynamic function in which the promoting and repressing effect of each interaction is considered. We use a third real-life regulatory network, along with its inferred Boolean update functions, to validate the proposed update function. The results of this validation hint at increased biological plausibility of the threshold-based function. To investigate the dynamical behavior of this new model, we visualized the phase transition between the ordered and chaotic regimes, passing through the critical regime, using Derrida plots. We complement the qualitative nature of Derrida plots with an alternative measure, the criticality distance, which also allows one to discriminate between regimes quantitatively. Simulations on both real-life genetic regulatory networks show that there exists a set of parameters that allows the systems to operate in the critical region. This new model includes experimentally derived biological information and recent discoveries, which makes it potentially useful for guiding experimental research.
The update function confers additional realism to the model while reducing its complexity and solution space, thus making it easier to investigate.
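A minimal sketch of a threshold-based Boolean update and a one-step Derrida curve follows. It assumes a common threshold rule rather than the paper's exact function: each gene sums the states of its regulators weighted +1 (promoting) or -1 (repressing), switches on above the threshold, off below it, and keeps its current state on a tie. The criticality distance is not reproduced. `threshold_step`, `derrida_curve` and the three-gene toy network are hypothetical.

```python
import random

def threshold_step(state, weights, theta=0):
    """One synchronous update. weights[i] maps regulator j to +1 (promoting)
    or -1 (repressing); theta is an assumed global threshold."""
    new = []
    for i, regs in enumerate(weights):
        h = sum(w * state[j] for j, w in regs.items())
        if h > theta:
            new.append(1)
        elif h < theta:
            new.append(0)
        else:
            new.append(state[i])  # tie: keep the current value
    return new

def derrida_curve(weights, n_samples=200, seed=0):
    """For each initial Hamming distance d, the average Hamming distance
    between the two states after one synchronous update."""
    rng = random.Random(seed)
    n = len(weights)
    curve = []
    for d in range(n + 1):
        total = 0
        for _ in range(n_samples):
            a = [rng.randint(0, 1) for _ in range(n)]
            b = a[:]
            for j in rng.sample(range(n), d):  # flip exactly d genes
                b[j] = 1 - b[j]
            na, nb = threshold_step(a, weights), threshold_step(b, weights)
            total += sum(x != y for x, y in zip(na, nb))
        curve.append(total / n_samples)
    return curve

if __name__ == "__main__":
    # Hypothetical toy network: gene 0 promoted by gene 2, gene 1 repressed
    # by gene 0, gene 2 promoted by gene 1 and repressed by gene 0.
    toy = [{2: 1}, {0: -1}, {1: 1, 0: -1}]
    print(derrida_curve(toy))
```

Plotting `d / n` against `curve[d] / n` gives the Derrida plot: points below the diagonal near the origin indicate the ordered regime (small perturbations shrink), points above it the chaotic regime, and a curve tracking the diagonal near the origin the critical regime.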
Computational identification of transcription factor binding sites by functional analysis of sets of genes sharing overrepresented upstream motifs
BACKGROUND: Transcriptional regulation is a key mechanism in the functioning
of the cell, and is mostly effected through transcription factors binding to
specific recognition motifs located upstream of the coding region of the
regulated gene. The computational identification of such motifs is made easier
by the fact that they often appear several times in the upstream region of the
regulated genes, so that the number of occurrences of relevant motifs is often
significantly larger than expected by pure chance. RESULTS: To exploit this
fact, we construct sets of genes characterized by the statistical
overrepresentation of a certain motif in their upstream regions. Then we study
the functional characterization of these sets by analyzing their annotation to
Gene Ontology terms. For the sets showing a statistically significant specific
functional characterization, we conjecture that the upstream motif
characterizing the set is a binding site for a transcription factor involved in
the regulation of the genes in the set. CONCLUSIONS: The method we propose is
able to identify many known binding sites in S. cerevisiae and new candidate
targets of regulation by known transcription factors. Its application to less
well studied organisms is likely to be valuable in the exploration of their
regulatory interaction network.
Comment: 19 pages, 1 figure. Published version with several improvements.
Supplementary material available from the author.
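The two steps of the RESULTS paragraph, building a gene set from motif overrepresentation and then testing that set for a Gene Ontology term, can be sketched as below. As assumptions not stated in the source, overrepresentation is scored with a Poisson tail probability under an independent-base background (which ignores the clumping of self-overlapping motifs), and functional characterization with a one-sided hypergeometric test. All names and thresholds are hypothetical.

```python
import math

def count_occurrences(seq, motif):
    """Number of (possibly overlapping) occurrences of motif in seq."""
    m = len(motif)
    return sum(1 for i in range(len(seq) - m + 1) if seq[i:i + m] == motif)

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)

def motif_gene_set(upstream, motif, base_freq=None, alpha=0.01):
    """Genes whose upstream region contains motif more often than expected
    under an independent-base background (uniform unless base_freq is given)."""
    base_freq = base_freq or {b: 0.25 for b in "ACGT"}
    p_site = math.prod(base_freq[b] for b in motif)
    selected = []
    for gene, seq in upstream.items():
        seq = seq.upper()
        lam = max(0, len(seq) - len(motif) + 1) * p_site  # expected count
        k = count_occurrences(seq, motif)
        if k > 0 and poisson_sf(k, lam) < alpha:
            selected.append(gene)
    return sorted(selected)

def hypergeom_sf(k, N, K, n):
    """P(X >= k): X = annotated genes in a random n-subset of N genes,
    K of which carry the GO term."""
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(k, min(n, K) + 1))
    return hits / math.comb(N, n)
```

For a gene set of size `n` in which `k` genes carry a given GO term, `hypergeom_sf(k, N, K, n)` is the enrichment p-value; across many motifs and terms it would need a multiple-testing correction.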
CLOE: Identification of putative functional relationships among genes by comparison of expression profiles between two species
BACKGROUND: Public repositories of microarray data contain a vast amount of information that is potentially relevant for exploring functional relationships among genes by meta-analysis of expression profiles. However, the widespread use of this resource by the scientific community is currently limited by the scarcity of effective analysis tools. Here we describe CLOE, a simple cDNA microarray data mining strategy based on meta-analysis of datasets from pairs of species. The method consists of ranking EST probes in the datasets of the two species according to the similarity of their expression profiles to that of a query EST probe, the two query probes (one per species) coming from orthologous genes, and extracting orthologous EST pairs from a given top interval of the ranked lists. The Gene Ontology annotation of the resulting candidate partners is then analyzed for keyword overrepresentation. RESULTS: We demonstrate the capabilities of the approach by testing its predictive power on three proteomically defined mammalian protein complexes, in comparison with single- and multiple-species meta-analysis approaches. Our results show that CLOE can find candidate partners for a greater number of genes than multiple-species co-expression analysis, while retaining comparable specificity even when applied to species as close as mouse and human. On the other hand, it is much more specific than single-organism co-expression analysis, strongly reducing the number of potential candidate partners for a given gene of interest. CONCLUSIONS: CLOE is a simple and effective data mining approach that can easily be used for meta-analysis of cDNA microarray experiments with very heterogeneous coverage. Importantly, for genes of interest it produces an average number of high-confidence putative partners that is within the range handled by standard experimental validation techniques.
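The ranking-and-intersection step of CLOE can be sketched as follows. The similarity measure (Pearson correlation here), the data layout (a dict from probe ID to expression profile for each species) and all function names are assumptions for illustration, not the published implementation; the GO keyword analysis is omitted.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def rank_probes(profiles, query):
    """All probes except the query, most similar to the query first."""
    q = profiles[query]
    others = [p for p in profiles if p != query]
    return sorted(others, key=lambda p: pearson(profiles[p], q), reverse=True)

def cloe_candidates(profiles_a, profiles_b, query_a, query_b, orthologs, top):
    """Orthologous probe pairs (a, b) where a is in the top interval of
    species A's ranking and b in the top interval of species B's ranking."""
    top_a = set(rank_probes(profiles_a, query_a)[:top])
    top_b = set(rank_probes(profiles_b, query_b)[:top])
    return sorted((a, b) for a, b in orthologs if a in top_a and b in top_b)
```

Because each species is ranked within its own dataset, the two species' profiles never need to share experimental conditions; only the ortholog map links them.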
The impact of TP53 activation and apoptosis in primary hereditary microcephaly
Autosomal recessive primary microcephaly (MCPH) is a constellation of disorders that share significant brain size reduction and mild to moderate intellectual disability, which may be accompanied by a wide variety of more disabling clinical signs. Extensive proliferation and differentiation of neural progenitor cells (NPCs) are essential to determine final brain size. Accordingly, the 30 MCPH loci mapped so far (MCPH1-MCPH30) encode proteins involved in microtubule and spindle organization, centriole biogenesis, the nuclear envelope, and DNA replication and repair, underscoring that a wide variety of cellular processes is required to sustain NPC expansion during development. Current models propose that an altered balance between symmetric and asymmetric division, as well as premature differentiation, are the main mechanisms leading to MCPH. Although studies of cellular alterations in microcephaly models have consistently shown the co-existence of high levels of DNA damage and apoptosis, these mechanisms are less often considered as primary factors. In this review we highlight how the molecular and cellular events produced by mutation of most MCPH genes may converge on apoptotic death of NPCs and neurons via TP53 activation. We propose that these mechanisms should be considered more carefully among the alterations to the delicate equilibrium between proliferation, differentiation and death produced by MCPH gene mutations. Given the potential druggability of apoptotic pathways, a better understanding of their role in MCPH may significantly facilitate the development of translational approaches.
Disease-gene discovery by integration of 3D gene expression and transcription factor binding affinities
Abstract
Motivation: The computational evaluation of candidate genes for hereditary disorders is a non-trivial task. Several excellent methods for disease-gene prediction have been developed in the past 2 decades, exploiting widely differing data sources to infer disease-relevant functional relationships between candidate genes and disorders. We have shown recently that spatially mapped, i.e. 3D, gene expression data from the mouse brain can be successfully used to prioritize candidate genes for human Mendelian disorders of the central nervous system.
Results: We improved on our previous work in two ways: (i) we demonstrate that condition-independent transcription factor binding affinities of the candidate genes' promoters are relevant for disease-gene prediction and can be integrated with our previous approach to significantly enhance its predictive power; and (ii) we define a novel similarity measure, termed Relative Intensity Overlap, for both 3D gene expression patterns and binding affinity profiles that better exploits their disease-relevant information content. Finally, we present novel disease-gene predictions for eight loci associated with different syndromes of unknown molecular basis that are characterized by mental retardation.
Supplementary information: Supplementary data are available at Bioinformatics online
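To make similarity-based prioritization concrete, here is a hypothetical sketch that scores the candidate genes in a locus by their mean similarity to known disease genes, combining a 3D expression vector (one value per voxel) and a TF binding-affinity vector per gene. Cosine similarity is used as a plain stand-in: it is not the Relative Intensity Overlap measure, which this sketch does not reproduce, and the equal `weight` between the two data sources is an arbitrary assumption.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)

def prioritize(candidates, known, expr, tfbs, weight=0.5):
    """Rank candidates by a weighted mix of mean expression similarity and
    mean TF-affinity similarity to the known disease genes.
    Returns (gene, score) pairs, best first."""
    scores = {}
    for g in candidates:
        s_expr = sum(cosine(expr[g], expr[k]) for k in known) / len(known)
        s_tf = sum(cosine(tfbs[g], tfbs[k]) for k in known) / len(known)
        scores[g] = weight * s_expr + (1 - weight) * s_tf
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Keeping the two similarities separate until the final weighted sum makes it easy to check how much each data source contributes to a given prediction.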
Ab initio identification of putative human transcription factor binding sites by comparative genomics
We discuss a simple and powerful approach for the ab initio identification of
cis-regulatory motifs involved in transcriptional regulation. The method we
present integrates several elements: human-mouse comparison, statistical
analysis of genomic sequences and the concept of coregulation. We apply it to a
complete scan of the human genome. By using the catalogue of conserved upstream
sequences collected in the CORG database we construct sets of genes sharing the
same overrepresented motif (short DNA sequence) in their upstream regions both
in human and in mouse. We perform this construction for all possible motifs
from 5 to 8 nucleotides in length and then filter the resulting sets looking
for two types of evidence of coregulation: first, we analyze the Gene Ontology
annotation of the genes in the set, searching for statistically significant
common annotations; second, we analyze the expression profiles of the genes in
the set as measured by microarray experiments, searching for evidence of
coexpression. The sets which pass one or both filters are conjectured to
contain a significant fraction of coregulated genes, and the upstream motifs
characterizing the sets are thus good candidates to be the binding sites of the
TFs involved in such regulation. In this way we find various known motifs and
also some new candidate binding sites.
Comment: 22 pages, 2 figures. Supplementary material available from the
author.
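The set-construction step (all motifs of 5 to 8 nucleotides, a gene being kept only when the motif is overrepresented upstream of it in both human and mouse) might look like the sketch below. The Poisson test with a uniform base background, the `min_count` and `min_size` filters and all names are assumptions; the Gene Ontology and coexpression filters are not shown.

```python
import math
from collections import Counter, defaultdict

VALID = set("ACGT")

def kmer_counts(seq, k):
    """Overlapping k-mer counts in one sequence, skipping non-ACGT windows."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        w = seq[i:i + k]
        if set(w) <= VALID:
            counts[w] += 1
    return counts

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)

def overrepresented(seq, k, alpha, min_count):
    """k-mers occurring at least min_count times and more often than
    expected under a uniform background (P < alpha)."""
    lam = max(0, len(seq) - k + 1) * 0.25 ** k
    return {w for w, n in kmer_counts(seq, k).items()
            if n >= min_count and poisson_sf(n, lam) < alpha}

def conserved_motif_sets(human, mouse, ks=range(5, 9), alpha=0.01,
                         min_count=2, min_size=2):
    """Map each motif to the genes in which it is overrepresented in both
    the human and the mouse upstream sequence; drop sets below min_size."""
    sets = defaultdict(set)
    for gene in human.keys() & mouse.keys():
        h_seq, m_seq = human[gene].upper(), mouse[gene].upper()
        for k in ks:
            shared = (overrepresented(h_seq, k, alpha, min_count)
                      & overrepresented(m_seq, k, alpha, min_count))
            for w in shared:
                sets[w].add(gene)
    return {w: genes for w, genes in sets.items() if len(genes) >= min_size}
```

Requiring the motif in both species acts as the conservation filter; genes lacking a mouse ortholog sequence are simply never considered.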