38,224 research outputs found

    Finding gene regulatory network candidates using the gene expression knowledge base

    Get PDF
    BACKGROUND: Network-based approaches for the analysis of large-scale genomics data have become well established. Biological networks provide a knowledge scaffold against which the patterns and dynamics of ‘omics’ data can be interpreted. The background information required for the construction of such networks is often dispersed across a multitude of knowledge bases in a variety of formats. The seamless integration of this information is one of the main challenges in bioinformatics. The Semantic Web offers powerful technologies for the assembly of integrated knowledge bases that are computationally comprehensible, thereby providing a potentially powerful resource for constructing biological networks and network-based analysis. RESULTS: We have developed the Gene eXpression Knowledge Base (GeXKB), a semantic web technology based resource that contains integrated knowledge about gene expression regulation. To affirm the utility of GeXKB we demonstrate how this resource can be exploited for the identification of candidate regulatory network proteins. We present four use cases that were designed from a biological perspective in order to find candidate members relevant for the gastrin hormone signaling network model. We show how a combination of specific query definitions and additional selection criteria derived from gene expression data and prior knowledge concerning candidate proteins can be used to retrieve a set of proteins that constitute valid candidates for regulatory network extensions. CONCLUSIONS: Semantic web technologies provide the means for processing and integrating various heterogeneous information sources. The GeXKB offers biologists such an integrated knowledge resource, allowing them to address complex biological questions pertaining to gene expression. This work illustrates how GeXKB can be used in combination with gene expression results and literature information to identify new potential candidates that may be considered for extending a gene regulatory network. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-014-0386-y) contains supplementary material, which is available to authorized users

    Gene expression in large pedigrees: analytic approaches.

    Get PDF
    BackgroundWe currently have the ability to quantify transcript abundance of messenger RNA (mRNA), genome-wide, using microarray technologies. Analyzing genotype, phenotype and expression data from 20 pedigrees, the members of our Genetic Analysis Workshop (GAW) 19 gene expression group published 9 papers, tackling some timely and important problems and questions. To study the complexity and interrelationships of genetics and gene expression, we used established statistical tools, developed newer statistical tools, and developed and applied extensions to these tools.MethodsTo study gene expression correlations in the pedigree members (without incorporating genotype or trait data into the analysis), 2 papers used principal components analysis, weighted gene coexpression network analysis, meta-analyses, gene enrichment analyses, and linear mixed models. To explore the relationship between genetics and gene expression, 2 papers studied expression quantitative trait locus allelic heterogeneity through conditional association analyses, and epistasis through interaction analyses. A third paper assessed the feasibility of applying allele-specific binding to filter potential regulatory single-nucleotide polymorphisms (SNPs). Analytic approaches included linear mixed models based on measured genotypes in pedigrees, permutation tests, and covariance kernels. To incorporate both genotype and phenotype data with gene expression, 4 groups employed linear mixed models, nonparametric weighted U statistics, structural equation modeling, Bayesian unified frameworks, and multiple regression.Results and discussionRegarding the analysis of pedigree data, we found that gene expression is familial, indicating that at least 1 factor for pedigree membership or multiple factors for the degree of relationship should be included in analyses, and we developed a method to adjust for familiality prior to conducting weighted co-expression gene network analysis. For SNP association and conditional analyses, we found FaST-LMM (Factored Spectrally Transformed Linear Mixed Model) and SOLAR-MGA (Sequential Oligogenic Linkage Analysis Routines -Major Gene Analysis) have similar type 1 and type 2 errors and can be used almost interchangeably. To improve the power and precision of association tests, prior knowledge of DNase-I hypersensitivity sites or other relevant biological annotations can be incorporated into the analyses. On a biological level, eQTL (expression quantitative trait loci) are genetically complex, exhibiting both allelic heterogeneity and epistasis. Including both genotype and phenotype data together with measurements of gene expression was found to be generally advantageous in terms of generating improved levels of significance and in providing more interpretable biological models.ConclusionsPedigrees can be used to conduct analyses of and enhance gene expression studies

    Cross-Species Network Analysis Uncovers Conserved Nitrogen-Regulated Network Modules in Rice

    Get PDF
    In this study, we used a cross-species network approach to uncover nitrogen-regulated network modules conserved across a model and a crop species. By translating gene “network knowledge” from the data-rich model Arabidopsis (Arabidopsis thaliana) to a crop (Oryza sativa), we identified evolutionarily conserved N-regulatory modules as targets for translational studies to improve N-use efficiency in transgenic plants. To uncover such conserved N-regulatory network modules, we first generated a N-regulatory network based solely on rice (O. sativa) transcriptome and gene interaction data. Next, we enhanced the “network knowledge” in the rice N-regulatory network using transcriptome and gene interaction data from Arabidopsis and new data from Arabidopsis and rice plants exposed to the same N-treatment conditions. This cross-species network analysis uncovered a set of N-regulated transcription factors (TFs) predicted to target the same genes and network modules in both species. Supernode analysis of the TFs and their targets in these conserved network modules uncovered genes directly related to nitrogen use (e.g. N-assimilation) and to other shared biological processes indirectly related to nitrogen. This cross-species network approach was validated with members of two TF families in the supernode network, bZIP-TGA and HRS1/HHO family, have recently been experimentally validated to mediate the N-response in Arabidopsis.Fil: Obertello, Mariana. University of New York; Estados Unidos. Consejo Nacional de Investigaciones Científicas y Técnicas. Instituto de Investigaciones en Ingeniería Genética y Biología Molecular ; ArgentinaFil: Shrivastava, Stuti. University of New York; Estados UnidosFil: Katari, Manpreet S.. University of New York; Estados UnidosFil: Coruzzi, Gloria M.. University of New York; Estados Unido

    Conservation and co-option in developmental programmes: the importance of homology relationships

    Get PDF
    One of the surprising insights gained from research in evolutionary developmental biology (evo-devo) is that increasing diversity in body plans and morphology in organisms across animal phyla are not reflected in similarly dramatic changes at the level of gene composition of their genomes. For instance, simplicity at the tissue level of organization often contrasts with a high degree of genetic complexity. Also intriguing is the observation that the coding regions of several genes of invertebrates show high sequence similarity to those in humans. This lack of change (conservation) indicates that evolutionary novelties may arise more frequently through combinatorial processes, such as changes in gene regulation and the recruitment of novel genes into existing regulatory gene networks (co-option), and less often through adaptive evolutionary processes in the coding portions of a gene. As a consequence, it is of great interest to examine whether the widespread conservation of the genetic machinery implies the same developmental function in a last common ancestor, or whether homologous genes acquired new developmental roles in structures of independent phylogenetic origin. To distinguish between these two possibilities one must refer to current concepts of phylogeny reconstruction and carefully investigate homology relationships. Particularly problematic in terms of homology decisions is the use of gene expression patterns of a given structure. In the future, research on more organisms other than the typical model systems will be required since these can provide insights that are not easily obtained from comparisons among only a few distantly related model species

    Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space?

    Get PDF
    The organization and mining of malaria genomic and post-genomic data is highly motivated by the necessity to predict and characterize new biological targets and new drugs. Biological targets are sought in a biological space designed from the genomic data from Plasmodium falciparum, but using also the millions of genomic data from other species. Drug candidates are sought in a chemical space containing the millions of small molecules stored in public and private chemolibraries. Data management should therefore be as reliable and versatile as possible. In this context, we examined five aspects of the organization and mining of malaria genomic and post-genomic data: 1) the comparison of protein sequences including compositionally atypical malaria sequences, 2) the high throughput reconstruction of molecular phylogenies, 3) the representation of biological processes particularly metabolic pathways, 4) the versatile methods to integrate genomic data, biological representations and functional profiling obtained from X-omic experiments after drug treatments and 5) the determination and prediction of protein structures and their molecular docking with drug candidate structures. Progresses toward a grid-enabled chemogenomic knowledge space are discussed.Comment: 43 pages, 4 figures, to appear in Malaria Journa

    Knowledge-based analysis of microarrays for the discovery of transcriptional regulation relationships

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>The large amount of high-throughput genomic data has facilitated the discovery of the regulatory relationships between transcription factors and their target genes. While early methods for discovery of transcriptional regulation relationships from microarray data often focused on the high-throughput experimental data alone, more recent approaches have explored the integration of external knowledge bases of gene interactions.</p> <p>Results</p> <p>In this work, we develop an algorithm that provides improved performance in the prediction of transcriptional regulatory relationships by supplementing the analysis of microarray data with a new method of integrating information from an existing knowledge base. Using a well-known dataset of yeast microarrays and the Yeast Proteome Database, a comprehensive collection of known information of yeast genes, we show that knowledge-based predictions demonstrate better sensitivity and specificity in inferring new transcriptional interactions than predictions from microarray data alone. We also show that comprehensive, direct and high-quality knowledge bases provide better prediction performance. Comparison of our results with ChIP-chip data and growth fitness data suggests that our predicted genome-wide regulatory pairs in yeast are reasonable candidates for follow-up biological verification.</p> <p>Conclusion</p> <p>High quality, comprehensive, and direct knowledge bases, when combined with appropriate bioinformatic algorithms, can significantly improve the discovery of gene regulatory relationships from high throughput gene expression data.</p
    corecore