    Bayesian variable selection and data integration for biological regulatory networks

    A substantial focus of research in molecular biology is the gene regulatory network: the set of transcription factors and target genes that control the involvement of different biological processes in living cells. Previous statistical approaches for identifying gene regulatory networks have used gene expression data, ChIP binding data or promoter sequence data, but each of these resources provides only partial information. We present a Bayesian hierarchical model that integrates all three data types in a principled variable selection framework. The gene expression data are modeled as a function of the unknown gene regulatory network, which has an informed prior distribution based upon both ChIP binding and promoter sequence data. We also present a variable weighting methodology for the principled balancing of multiple sources of prior information. We apply our procedure to the discovery of gene regulatory relationships in Saccharomyces cerevisiae (yeast), for which we can use several external sources of information to validate our results. Our inferred relationships show greater biological relevance on the external validation measures than previous data integration methods. Our model also estimates synergistic and antagonistic interactions between transcription factors, many of which are validated by previous studies. We also evaluate the results from our procedure for weighting multiple sources of prior information. Finally, we discuss our methodology in the context of previous approaches to data integration and Bayesian variable selection. Comment: Published at http://dx.doi.org/10.1214/07-AOAS130 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Annotation-based meta-analysis of microarray experiments

    We are developing software applications that perform meta-analysis of microarray experiments based on standardized experiment annotations, with the aim of identifying similar experiments and clustering experiments. The applications were tested on files obtained from the ArrayExpress public repository. Annotation terms were used to compute experiment dissimilarities in order to find experiments related to a query experiment. These applications may motivate efforts of bench biologists to better annotate experiments
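
The idea of ranking experiments by annotation-term dissimilarity can be illustrated with a minimal sketch. The abstract does not say which dissimilarity measure the authors used; the Jaccard dissimilarity below is an assumption chosen for illustration, and the function names, experiment identifiers and annotation terms are all hypothetical.

```python
def jaccard_dissimilarity(terms_a, terms_b):
    """Return 1 - |A & B| / |A | B| for two collections of annotation terms.

    Assumption: Jaccard is used purely for illustration; the source does
    not name the measure actually applied.
    """
    a, b = set(terms_a), set(terms_b)
    union = a | b
    if not union:
        # Two experiments with no annotations are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)


def rank_related(query_terms, experiments):
    """Sort experiments from most to least similar to the query.

    `experiments` maps a (hypothetical) experiment identifier to its
    annotation terms. Returns a list of (dissimilarity, identifier) pairs.
    """
    scored = [
        (jaccard_dissimilarity(query_terms, terms), name)
        for name, terms in experiments.items()
    ]
    return sorted(scored)


# Hypothetical annotations, not taken from ArrayExpress.
experiments = {
    "EXP-1": {"mouse", "liver", "time course"},
    "EXP-2": {"mouse", "liver", "dose response"},
    "EXP-3": {"yeast", "heat shock"},
}
ranking = rank_related({"mouse", "liver", "time course"}, experiments)
```

The same pairwise dissimilarities could also feed a standard clustering routine, which corresponds to the second goal the abstract mentions.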

    Functional genomics of the beta-cell: short-chain 3-hydroxyacyl-coenzyme A dehydrogenase regulates insulin secretion independent of K+ currents

    Recent advances in functional genomics afford the opportunity to interrogate the expression profiles of thousands of genes simultaneously and examine the function of these genes in a high-throughput manner. In this study, we describe a rational and efficient approach to identifying novel regulators of insulin secretion by the pancreatic beta-cell. Computational analysis of expression profiles of several mouse and cellular models of impaired insulin secretion identified 373 candidate genes involved in regulation of insulin secretion. Using RNA interference, we assessed the requirements of 10 of these candidates and identified four genes (40%) as being essential for normal insulin secretion. Among the genes identified was Hadhsc, which encodes short-chain 3-hydroxyacyl-coenzyme A dehydrogenase (SCHAD), an enzyme of mitochondrial beta-oxidation of fatty acids whose mutation results in congenital hyperinsulinism. RNA interference-mediated gene suppression of Hadhsc in insulinoma cells and primary rodent islets revealed enhanced basal but normal glucose-stimulated insulin secretion. This increase in basal insulin secretion was not attenuated by the opening of the KATP channel with diazoxide, suggesting that SCHAD regulates insulin secretion through a KATP channel-independent mechanism. Our results suggest a molecular explanation for the hyperinsulinemic hypoglycemia seen in patients with SCHAD deficiency

    FungiDB: An Integrated Bioinformatic Resource for Fungi and Oomycetes

    FungiDB (fungidb.org) is a free online resource for data mining and functional genomics analysis for fungal and oomycete species. FungiDB is part of the Eukaryotic Pathogen Genomics Database Resource (EuPathDB, eupathdb.org) platform that integrates genomic, transcriptomic, proteomic, and phenotypic datasets, and other types of data for pathogenic and non-pathogenic, free-living and parasitic organisms. FungiDB is one of the largest EuPathDB databases, containing nearly 100 genomes obtained from GenBank, AspGD, The Broad Institute, JGI, Ensembl, and other sources. FungiDB offers a user-friendly web interface with embedded bioinformatics tools that support custom in silico experiments that leverage FungiDB-integrated data. In addition, a Galaxy-based workspace enables users to generate custom pipelines for large-scale data analysis (e.g. RNA-Seq, variant calling). This review provides an introduction to the FungiDB resources and focuses on available features, tools and queries and how they can be used to mine data across a diverse range of integrated FungiDB datasets and records

    EuPathDB: the eukaryotic pathogen genomics database resource

    The Eukaryotic Pathogen Genomics Database Resource (EuPathDB, http://eupathdb.org) is a collection of databases covering 170+ eukaryotic pathogens (protists & fungi), along with relevant free-living and non-pathogenic species, and select pathogen hosts. To facilitate the discovery of meaningful biological relationships, the databases couple preconfigured searches with visualization and analysis tools for comprehensive data mining via intuitive graphical interfaces and APIs. All data are analyzed with the same workflows, including creation of gene orthology profiles, so data are easily compared across data sets, data types and organisms. EuPathDB is updated with numerous new analysis tools, features, data sets and data types. New tools include GO, metabolic pathway and word enrichment analyses plus an online workspace for analysis of personal, non-public, large-scale data. Expanded data content is mostly genomic and functional genomic data while new data types include protein microarray, metabolic pathways, compounds, quantitative proteomics, copy number variation, and polysomal transcriptomics. New features include consistent categorization of searches, data sets and genome browser tracks; redesigned gene pages; effective integration of alternative transcripts; and a EuPathDB Galaxy instance for private analyses of a user's data. Forthcoming upgrades include user workspaces for private integration of data with existing EuPathDB data and improved integration and presentation of host–pathogen interactions

    Gene discovery in the Apicomplexa as revealed by EST sequencing and assembly of a comparative gene database. Genome Res

    Large-scale EST sequencing projects for several important parasites within the phylum Apicomplexa were undertaken for the purpose of gene discovery. Included were several parasites of medical importance (Plasmodium falciparum, Toxoplasma gondii) and others of veterinary importance (Eimeria tenella, Sarcocystis neurona, and Neospora caninum). A total of 55,192 ESTs, deposited into dbEST/GenBank, were included in the analyses. The resulting sequences have been clustered into nonredundant gene assemblies and deposited into a relational database that supports a variety of sequence and text searches. This database has been used to compare the gene assemblies using BLAST similarity comparisons to the public protein databases to identify putative genes. Of these new entries, ∼15%-20% represent putative homologs with a conservative cutoff of p < 10^-9, thus identifying many conserved genes that are likely to share common functions with other well-studied organisms. Gene assemblies were also used to identify strain polymorphisms, examine stage-specific expression, and identify gene families. An interesting class of genes that are confined to members of this phylum and not shared by plants, animals, or fungi, was identified. These genes likely mediate the novel biological features of members of the Apicomplexa and hence offer great potential for biological investigation and as possible therapeutic targets

    EuPathDB: the eukaryotic pathogen database

    EuPathDB (http://eupathdb.org) resources include 11 databases supporting eukaryotic pathogen genomic and functional genomic data, isolate data and phylogenomics. EuPathDB resources are built using the same infrastructure and provide a sophisticated search strategy system enabling complex interrogations of underlying data. Recent advances in EuPathDB resources include the design and implementation of a new data loading workflow, a new database supporting Piroplasmida (i.e. Babesia and Theileria), the addition of large amounts of new data and data types and the incorporation of new analysis tools. New data include genome sequences and annotation, strand-specific RNA-seq data, splice junction predictions (based on RNA-seq), phosphoproteomic data, high-throughput phenotyping data, single nucleotide polymorphism data based on high-throughput sequencing (HTS) and expression quantitative trait loci data. New analysis tools enable users to search for DNA motifs and define genes based on their genomic colocation, view results from searches graphically (i.e. genes mapped to chromosomes or isolates displayed on a map) and analyze data from columns in result tables (word cloud and histogram summaries of column content). The manuscript herein describes updates to EuPathDB since the previous report published in NAR in 2010

    The MGED ontology: a framework for describing functional genomics experiments. Comparative and Functional Genomics, Conference Review

    The Microarray Gene Expression Data (MGED) society was formed with an initial focus on experiments involving microarray technology. Despite the diversity of applications, there are common concepts used and a common need to capture experimental information in a standardized manner. In building the MGED ontology, it was recognized that it would be impractical to cover all the different types of experiments on all the different types of organisms by listing and defining all the types of organisms and their properties. Our solution was to create a framework for describing microarray experiments with an initial focus on the biological sample and its manipulation. For concepts that are common for many species, we could provide a manageable listing of controlled terms. For concepts that are species-specific or whose values cannot be readily listed, we created an 'OntologyEntry' concept that referenced an external resource. The MGED ontology is a work in progress that needs additional instances and particularly needs constraints to be added. The ontology currently covers the experimental sample and design, and we have begun capturing aspects of the microarrays themselves as well. The primary application of the ontology will be to develop forms for entering information into databases, and consequently allowing queries, taking advantage of the structure provided by the ontology. The application of an ontology of experimental conditions extends beyond microarray experiments and, as the scope of MGED includes other aspects of functional genomics, so too will the MGED ontology

    EpoDB: a database of genes expressed during vertebrate erythropoiesis.

    EpoDB is a database designed for the study of gene regulation during differentiation and development of vertebrate red blood cells. In building EpoDB, we have taken the in-advance approach to the data integration problem: we have extracted data relevant to red blood cells from GenBank, SWISS-PROT, TRRD (transcriptional regulation data) and GERD (expression levels data) to create a single integrated, highly curated view. Tools have been developed to automate data extraction from online resources, cleanse data of errors, enter information manually from the primary literature, generate a uniform, canonical representation of information and maintain data currency. The database is organized around biological features (e.g. genes) rather than sequences, and these features are supported by a controlled and consistent vocabulary for gene names and gene family names. Beyond the standard database queries, the functionality of EpoDB includes the ability to extract features and subsequences, display sequences and features graphically using bioWidget viewers and integrated analysis tools. EpoDB may be accessed at: http://cbil.humgen.upenn.edu/epodb