4,193 research outputs found
Detection of regulator genes and eQTLs in gene networks
Genetic differences between individuals associated with quantitative phenotypic
traits, including disease states, are usually found in non-coding genomic
regions. These genetic variants are often also associated with differences in
expression levels of nearby genes (they are "expression quantitative trait
loci" or eQTLs for short) and presumably play a gene regulatory role, affecting
the status of molecular networks of interacting genes, proteins and
metabolites. Computational systems biology approaches to reconstruct causal
gene networks from large-scale omics data have therefore become essential to
understand the structure of networks controlled by eQTLs together with other
regulatory genes, and to generate detailed hypotheses about the molecular
mechanisms that lead from genotype to phenotype. Here we review the main
analytical methods and software to identify eQTLs and their associated genes,
to reconstruct co-expression networks and modules, to reconstruct causal
Bayesian gene and module networks, and to validate predicted networks in
silico.
Comment: minor revision with typos corrected; review article; 24 pages, 2
figures
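The two simplest steps named in this abstract, testing a variant for association with a gene's expression and building a co-expression network, can be sketched as follows. This is a minimal illustration, not a method from the review: it assumes genotypes coded as allele dosages (0, 1, 2), a plain least-squares fit, and a hard Pearson correlation cutoff. The function names and the `threshold` parameter are hypothetical.

```python
import numpy as np

def eqtl_slope_and_r(genotype, expression):
    """Least-squares slope and Pearson r of expression regressed on genotype dosage."""
    g = np.asarray(genotype, dtype=float)
    e = np.asarray(expression, dtype=float)
    g_c = g - g.mean()  # center both variables
    e_c = e - e.mean()
    slope = (g_c @ e_c) / (g_c @ g_c)
    r = (g_c @ e_c) / np.sqrt((g_c @ g_c) * (e_c @ e_c))
    return slope, r

def coexpression_edges(expr, threshold=0.8):
    """expr is a genes x samples array; return gene index pairs with |Pearson r| >= threshold."""
    corr = np.corrcoef(np.asarray(expr, dtype=float))
    n = corr.shape[0]
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(corr[i, j]) >= threshold]
```

A real analysis would add covariates, multiple-testing correction and a principled threshold; the sketch only shows the shape of the computation.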
Formation of regulatory modules by local sequence duplication
Turnover of regulatory sequence and function is an important part of
molecular evolution. But what are the modes of sequence evolution leading to
rapid formation and loss of regulatory sites? Here, we show that a large
fraction of neighboring transcription factor binding sites in the fly genome
have formed from a common sequence origin by local duplications. This mode of
evolution is found to produce regulatory information: duplications can seed new
sites in the neighborhood of existing sites. Duplicate seeds evolve
subsequently by point mutations, often towards binding a different factor than
their ancestral neighbor sites. These results are based on a statistical
analysis of 346 cis-regulatory modules in the Drosophila melanogaster genome,
and a comparison set of intergenic regulatory sequence in Saccharomyces
cerevisiae. In fly regulatory modules, pairs of binding sites show
significantly enhanced sequence similarity up to distances of about 50 bp. We
analyze these data in terms of an evolutionary model with two distinct modes of
site formation: (i) evolution from independent sequence origin and (ii)
divergent evolution following duplication of a common ancestor sequence. Our
results suggest that pervasive formation of binding sites by local sequence
duplications distinguishes the complex regulatory architecture of higher
eukaryotes from the simpler architecture of unicellular organisms.
Identifying Regulators from Multiple Types of Biological Data in Cancer
Cancer genomes accumulate alterations that promote cancer cell proliferation and survival. Structural, genetic and epigenetic alterations that have a selective advantage for tumorigenesis affect key regulatory genes and microRNAs that in turn regulate the expression of many target genes. The goal of this dissertation is to leverage the alteration-rich landscape of cancer genomes to detect key regulatory genes and microRNAs. To this end, we designed a feature selection algorithm to identify DNA methylation signals around a gene that are highly predictive of its expression. We found that genes whose expression could be predicted accurately by DNA methylation were enriched in Gene Ontology terms related to the regulation of various biological processes. This suggests that genes controlled by DNA methylation are regulatory genes. We also developed two tools that infer relationships between regulatory genes and target genes leveraging structural and epigenetic data. The first tool, ProcessDriver, integrates copy number alteration and gene expression datasets to identify copy number cancer driver genes, target genes of these drivers and the disrupted biological processes. Our results showed that driver genes selected by ProcessDriver are enriched in known cancer genes. Using survival analysis, we showed that drivers are linked to new tumor events after initial treatment. The second tool was developed to leverage structural and epigenetic data to infer interactions between regulatory genes and targets at the network level. Our canonical correlation analysis-based approach utilized the DNA methylation or copy number states of potential regulators and the expression states of potential targets to score regulatory interactions. We then incorporated these regulatory interaction scores as prior knowledge in a dynamic Bayesian framework utilizing time series gene expression data.
Our results indicated that the canonical correlation analysis-based scores reflect the true interactions between genes with high accuracy, and the accuracy can be further increased by using the scores as a prior in the dynamic Bayesian framework. Finally, we are developing an algorithm to detect cancer-related microRNAs, associated targets and disrupted biological processes. Our preliminary results suggest that the modules of miRNAs and target genes identified in this approach are enriched in known microRNA-gene interactions.
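To make the canonical correlation step concrete, the sketch below computes the largest canonical correlation between a block of regulator features (for example methylation or copy number states) and a block of target expression values. It is an illustration of textbook canonical correlation analysis, not the dissertation's scoring procedure: the function name is hypothetical, and the code assumes more samples than features and full column rank in both blocks.

```python
import numpy as np

def first_canonical_correlation(X, Y):
    """Largest canonical correlation between X (samples x p) and Y (samples x q).

    Assumes full column rank and more samples than columns in each block.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    if Y.ndim == 1:
        Y = Y[:, None]
    Xc = X - X.mean(axis=0)  # center each feature
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)  # orthonormal basis of each centered block
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the canonical correlations.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(min(s[0], 1.0))  # clip tiny floating-point overshoot
```

A score near 1 means some linear combination of the regulator features tracks some linear combination of the target values closely; a score near 0 means no linear relationship was found.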
Transcriptome-based Gene Networks for Systems-level Analysis of Plant Gene Functions
Present day genomic technologies are evolving at an unprecedented rate, allowing interrogation of
cellular activities with increasing breadth and depth. However, we know very little about how the
genome functions and what the identified genes do. The lack of functional annotations of genes
greatly limits the post-analytical interpretation of new high throughput genomic datasets. For plant
biologists, the problem is much more severe. Less than 50% of all the identified genes in the model plant
Arabidopsis thaliana, and only about 20% of all genes in the crop model Oryza sativa have some
aspects of their functions assigned. Therefore, there is an urgent need to develop innovative
methods to predict and expand on the currently available functional annotations of plant genes.
With open-access catching the ‘pulse’ of modern day molecular research, an integration of the
copious amount of transcriptome datasets allows rapid prediction of gene functions in specific
biological contexts, providing added evidence beyond traditional homology-based functional
inference. The main goal of this dissertation was to develop data analysis strategies and tools
broadly applicable in systems biology research.
Two user-friendly interactive web applications are presented: The Rice Regulatory
Network (RRN) captures an abiotic-stress conditioned gene regulatory network designed to
facilitate the identification of transcription factor targets during induction of various environmental
stresses. The Arabidopsis Seed Active Network (SANe) is a transcriptional regulatory network
that encapsulates various aspects of seed formation, including embryogenesis, endosperm
development and seed-coat formation. Further, an edge-set enrichment analysis algorithm is
proposed that uses network density as a parameter to estimate the gain or loss in correlation of
pathways between two conditionally independent coexpression networks.
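The idea of using network density to compare a pathway across two coexpression networks can be illustrated with a small sketch. The abstract does not specify the edge-set enrichment algorithm, so this is only an assumed simplification: density is taken as the fraction of possible gene pairs in the pathway that are connected, and the change is the difference between the two networks. All function names here are hypothetical.

```python
from itertools import combinations

def subnetwork_density(edges, genes):
    """Fraction of possible pairs among `genes` that appear as edges (undirected)."""
    genes = set(genes)
    n = len(genes)
    if n < 2:
        return 0.0
    edge_set = {frozenset(e) for e in edges}
    present = sum(1 for pair in combinations(sorted(genes), 2)
                  if frozenset(pair) in edge_set)
    return present / (n * (n - 1) / 2)

def density_change(edges_a, edges_b, genes):
    """Density of the pathway in network B minus its density in network A."""
    return subnetwork_density(edges_b, genes) - subnetwork_density(edges_a, genes)
```

A positive value indicates that the pathway's genes are more densely co-expressed in the second network; judging significance would need a permutation test or a similar null model, which is omitted here.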
Cis-cop: Multiobjective identification of cis-regulatory modules based on constrains
Gene expression regulation is an intricate, dynamic phenomenon essential for
all biological functions. The necessary instructions for gene expression are
encoded in cis-regulatory elements that work together and interact with the
RNA polymerase to confer specific spatial and temporal patterns of
transcription. Therefore, the identification of these elements is currently an
active area of research in computational analysis of regulatory sequences.
However, the problem is difficult since the combinatorial interactions between
the regulating factors can be very complex. Here we present a web server,
Cis-cop, that identifies cis-regulatory modules given a set of transcription
factor binding sites and, additionally, RNA pol sites for a group of genes
- …