A multiple-instance scoring method to predict tissue-specific cis-regulatory motifs and regions
Transcription is the central process of gene regulation. In higher eukaryotes, the transcription of a gene is usually regulated by multiple cis-regulatory regions (CRRs). In different tissues, different transcription factors bind to their cis-regulatory motifs in these CRRs to drive tissue-specific expression patterns of their target genes. By combining genome-wide gene expression data with genomic sequence data, we proposed a multiple-instance scoring (MIS) method to predict tissue-specific motifs and the corresponding CRRs. The method is mainly based on the assumption that only a subset of the CRRs of an expressed gene should function in the studied tissue. In tests on simulated datasets and a fly muscle dataset, MIS identifies true motifs when noise is high and shows higher specificity for predicting the tissue-specific functions of CRRs.
TF2Network: predicting transcription factor regulators and gene regulatory networks in Arabidopsis using publicly available binding site information
A gene regulatory network (GRN) is a collection of regulatory interactions between transcription factors (TFs) and their target genes. GRNs control different biological processes and have been instrumental in understanding the organization and complexity of gene regulation. Although various experimental methods have been used to map GRNs in Arabidopsis thaliana, their limited throughput, combined with the large number of TFs, means that our knowledge of the regulating TFs is incomplete for many genes. We introduce TF2Network, a tool that exploits the vast amount of TF binding site information and enables the delineation of GRNs by detecting potential regulators for a set of co-expressed or functionally related genes. Validation using two experimental benchmarks reveals that TF2Network predicts the correct regulator in 75-92% of the test sets. Furthermore, our tool is robust to noise in the input gene sets, has a low false discovery rate, and recovers correct regulators better than other plant tools. TF2Network is accessible through a web interface where GRNs are interactively visualized and annotated with various types of experimental functional information. TF2Network was used to perform systematic functional and regulatory gene annotations, identifying new TFs involved in circadian rhythm and stress response.
Predicting Combinatorial Binding of Transcription Factors to Regulatory Elements in the Human Genome by Association Rule Mining
Cis-acting transcriptional regulatory elements in mammalian genomes typically contain specific combinations of binding sites for various transcription factors. Although some cis-regulatory elements have been well studied, the combinations of transcription factors that regulate normal expression levels for the vast majority of the 20,000 genes in the human genome are unknown. We hypothesized that it should be possible to discover transcription factor combinations that regulate gene expression in concert by identifying over-represented combinations of sequence motifs that occur together in the genome. To detect combinations of transcription factor binding motifs, we developed a data mining approach based on association rules, which are typically used in market basket analysis. We scored each segment of the genome for the presence or absence of each of 83 transcription factor binding motifs, then used association rule mining algorithms to mine this dataset, thus identifying frequently occurring pairs of distinct motifs within a segment. Results: Support for most pairs of transcription factor binding motifs was highly correlated across different chromosomes, although pair significance varied. Known true positive motif pairs showed higher association rule support, confidence, and significance than background. Our subsets of high-confidence, high-significance mined pairs of transcription factors showed enrichment for co-citation in PubMed abstracts relative to all pairs, and the predicted associations were often readily verifiable in the literature.
Conclusion: Functional elements in the genome where transcription factors bind to regulate expression in a combinatorial manner are more likely to be predicted by identifying statistically and biologically significant combinations of transcription factor binding motifs than by simply scanning the genome for the occurrence of binding sites for a single transcription factor.
NIAAA Alcohol Training Grant; National Science Foundation; Cellular and Molecular Biolog
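The support and confidence measures mentioned in this abstract can be illustrated with a minimal sketch. It assumes each genome segment is represented as a set of motif names; the function name `pair_rules`, the motif labels A, B, and C, and the toy segments are hypothetical and are not taken from the paper, which does not specify its implementation here. Lift is added only as a common companion measure.

```python
from itertools import combinations


def pair_rules(segments, min_support=0.0):
    """Compute support, confidence, and lift for every ordered motif pair.

    segments: list of sets, each holding the motif names scored as
    present in one genome segment (hypothetical input layout).
    """
    n = len(segments)
    single = {}  # number of segments containing each motif
    pair = {}    # number of segments containing both motifs of a pair
    for motifs in segments:
        for m in motifs:
            single[m] = single.get(m, 0) + 1
        for a, b in combinations(sorted(motifs), 2):
            pair[(a, b)] = pair.get((a, b), 0) + 1

    rules = []
    for (a, b), count in pair.items():
        support = count / n  # fraction of segments with both motifs
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / single[x]        # estimate of P(y | x)
            lift = confidence / (single[y] / n)   # P(y | x) / P(y)
            rules.append({"rule": (x, y), "support": support,
                          "confidence": confidence, "lift": lift})
    return rules


# Toy data: four segments and three made-up motifs.
segments = [{"A", "B"}, {"A", "B", "C"}, {"A"}, {"C"}]
rules = {r["rule"]: r for r in pair_rules(segments)}
```

In this toy input, every segment containing motif B also contains motif A, so the rule B -> A has confidence 1.0, while A -> B has confidence 2/3.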
A sparse regulatory network of copy-number driven expression reveals putative breast cancer oncogenes
The influence of DNA cis-regulatory elements on a gene's expression has been
intensively studied. However, little is known about expressions driven by
trans-acting DNA hotspots. DNA hotspots harboring copy number aberrations are
recognized to be important in cancer as they influence multiple genes on a
global scale. The challenge in detecting trans-effects is mainly due to the
computational difficulty in detecting weak and sparse trans-acting signals
amidst co-occurring passenger events. We propose an integrative approach to
learn a sparse interaction network of DNA copy-number regions with their
downstream targets in a breast cancer dataset. Information from this network
helps distinguish copy-number driven from copy-number independent expression
changes on a global scale. Our result further delineates cis- and trans-effects
in a breast cancer dataset, for which important oncogenes such as ESR1 and
ERBB2 appear to be highly copy-number dependent. Further, our model is shown to
be efficient and, in terms of goodness of fit, no worse than other
state-of-the-art predictors and network reconstruction models using both
simulated and real data.
Comment: Accepted at IEEE International Conference on Bioinformatics &
Biomedicine (BIBM 2010)
The Expanding Landscape of Alternative Splicing Variation in Human Populations.
Alternative splicing is a tightly regulated biological process by which the number of gene products for any given gene can be greatly expanded. Genomic variants in splicing regulatory sequences can disrupt splicing and cause disease. Recent developments in sequencing technologies and computational biology have allowed researchers to investigate alternative splicing at an unprecedented scale and resolution. Population-scale transcriptome studies have revealed many naturally occurring genetic variants that modulate alternative splicing and consequently influence phenotypic variability and disease susceptibility in human populations. Innovations in experimental and computational tools such as massively parallel reporter assays and deep learning have enabled the rapid screening of genomic variants for their causal impacts on splicing. In this review, we describe technological advances that have greatly increased the speed and scale at which discoveries are made about the genetic variation of alternative splicing. We summarize major findings from population transcriptomic studies of alternative splicing and discuss the implications of these findings for human genetics and medicine
A regulatory code for neurogenic gene expression in the Drosophila embryo
Bioinformatics methods have identified enhancers that mediate restricted expression in the Drosophila embryo. However, only a small fraction of the predicted enhancers actually work when tested in vivo. In the present study, co-regulated neurogenic enhancers that are activated by intermediate levels of the Dorsal regulatory gradient are shown to contain several shared sequence motifs. These motifs permitted the identification of new neurogenic enhancers with high precision: five out of seven predicted enhancers direct restricted expression within ventral regions of the neurogenic ectoderm. Mutations in some of the shared motifs disrupt enhancer function, and evidence is presented that the Twist and Su(H) regulatory proteins are essential for the specification of the ventral neurogenic ectoderm prior to gastrulation. The regulatory model of neurogenic gene expression defined in this study permitted the identification of a neurogenic enhancer in the distant Anopheles genome. We discuss the prospects for deciphering regulatory codes that link primary DNA sequence information with predicted patterns of gene expression
Evolution of substrate-specific gene expression and RNA editing in brown rot wood-decaying fungi.
Fungi that decay wood have characteristic associations with certain tree species, but the mechanistic bases for these associations are poorly understood. We studied substrate-specific gene expression and RNA editing in six species of wood-decaying fungi from the 'Antrodia clade' (Polyporales, Agaricomycetes) on three different wood substrates (pine, spruce, and aspen) in submerged cultures. We identified dozens to hundreds of substrate-biased genes (i.e., genes that are significantly upregulated in one substrate relative to the other two substrates) in each species, and these biased genes are correlated with their host ranges. Evolution of substrate-biased genes is associated with gene family expansion, gain and loss of genes, and variation in cis- and trans- regulatory elements, rather than changes in protein coding sequences. We also demonstrated widespread RNA editing events in the Antrodia clade, which differ from those observed in the Ascomycota in their distribution, substitution types, and the genomic environment. Moreover, we found that substrates could affect editing positions and frequency, including editing events occurring in mRNA transcribed from wood-decay-related genes. This work shows the extent to which gene expression and RNA editing differ among species and substrates, and provides clues into mechanisms by which wood-decaying fungi may adapt to different hosts
Application of regulatory sequence analysis and metabolic network analysis to the interpretation of gene expression data
We present two complementary approaches for the interpretation of clusters of
co-regulated genes, such as those obtained from DNA chips and related methods.
Starting from a cluster of genes with similar expression profiles, two basic
questions can be asked:
1. Which mechanism is responsible for the coordinated transcriptional response
of the genes? This question is approached by extracting motifs that are shared
between the upstream sequences of these genes. The motifs extracted are putative
cis-acting regulatory elements.
2. What is the physiological significance, for the cell, of expressing these
genes together? One way to answer the question is to search for potential metabolic
pathways that could be catalyzed by the products of the genes. This can be
done by selecting the genes from the cluster that code for enzymes, and trying
to assemble the catalyzed reactions to form metabolic pathways.
We present tools to answer these two questions, and we illustrate their use with
selected examples in the yeast Saccharomyces cerevisiae. The tools are available
on the web (http://ucmb.ulb.ac.be/bioinformatics/rsa-tools/;
http://www.ebi.ac.uk/research/pfbp/; http://www.soi.city.ac.uk/~msch/)
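The first question above, extracting motifs shared between upstream sequences, can be illustrated with a minimal sketch. It ranks k-mers by how much more frequent they are in the cluster's upstream sequences than in a background set, using a plain frequency ratio with a pseudocount. This is a simplification chosen for illustration and is not the statistic used by the tools cited in the abstract; the function names and the toy sequences are hypothetical.

```python
from collections import Counter


def kmer_counts(seqs, k):
    """Count every overlapping k-mer across a list of DNA sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def enriched_kmers(cluster_seqs, background_seqs, k=6, pseudocount=1.0):
    """Rank k-mers by cluster frequency divided by background frequency."""
    fg = kmer_counts(cluster_seqs, k)
    bg = kmer_counts(background_seqs, k)
    fg_total = sum(fg.values())
    bg_total = sum(bg.values())
    vocab = 4 ** k  # number of possible DNA k-mers, used for smoothing
    scored = []
    for kmer, c in fg.items():
        fg_freq = (c + pseudocount) / (fg_total + pseudocount * vocab)
        bg_freq = (bg[kmer] + pseudocount) / (bg_total + pseudocount * vocab)
        scored.append((kmer, fg_freq / bg_freq))
    return sorted(scored, key=lambda t: t[1], reverse=True)


# Toy upstream sequences (made up): each cluster sequence contains CACGTG.
cluster = ["TTCACGTGAA", "GGCACGTGCC", "AACACGTGTT"]
background = ["ATATATATAT", "GCGCGCGCGC", "AATTGGCCAA"]
ranked = enriched_kmers(cluster, background, k=6)
```

Here `ranked[0][0]` is the hexamer shared by all three cluster sequences. A real analysis would use a significance test and a background model estimated from the whole genome rather than a raw ratio.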