23 research outputs found

Bayesian variable selection and data integration for biological regulatory networks

Author: Chen Guang
Jensen Shane T.
Stoeckert Jr, Christian J.
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2006

A substantial focus of research in molecular biology is gene regulatory networks: the set of transcription factors and target genes which control the involvement of different biological processes in living cells. Previous statistical approaches for identifying gene regulatory networks have used gene expression data, ChIP binding data or promoter sequence data, but each of these resources provides only partial information. We present a Bayesian hierarchical model that integrates all three data types in a principled variable selection framework. The gene expression data are modeled as a function of the unknown gene regulatory network, which has an informed prior distribution based upon both ChIP binding and promoter sequence data. We also present a variable weighting methodology for the principled balancing of multiple sources of prior information. We apply our procedure to the discovery of gene regulatory relationships in Saccharomyces cerevisiae (yeast), for which we can use several external sources of information to validate our results. Our inferred relationships show greater biological relevance on the external validation measures than previous data integration methods. Our model also estimates synergistic and antagonistic interactions between transcription factors, many of which are validated by previous studies. We also evaluate the results from our procedure for weighting multiple sources of prior information. Finally, we discuss our methodology in the context of previous approaches to data integration and Bayesian variable selection. Comment: Published at http://dx.doi.org/10.1214/07-AOAS130 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

CiteSeerX

ScholarlyCommons@Penn

Annotation-based meta-analysis of microarray experiments

Author: Christian J. Stoeckert Jr.
Elisabetta Manduchi
Jie Zheng
Junmin Liu
Publication date: 06/08/2009

We are developing software applications that perform meta-analysis of microarray experiments based on standardized experiment annotations, with the aim of identifying similar experiments and clustering them. The applications were tested on files obtained from the ArrayExpress public repository. Annotation terms were used to compute experiment dissimilarities to find experiments related to a query experiment. These applications may motivate efforts of bench biologists to better annotate experiments.
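The abstract does not say which dissimilarity measure the authors used. As a minimal sketch of the general idea, assuming each experiment is reduced to a set of annotation terms and that Jaccard dissimilarity is an acceptable stand-in, related experiments could be ranked against a query as follows. The function names, experiment identifiers, and annotation terms are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard_dissimilarity(a: Set[str], b: Set[str]) -> float:
    """Return 1 - |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def rank_related(
    query: Set[str], experiments: Dict[str, Set[str]]
) -> List[Tuple[str, float]]:
    """Sort experiments from most to least similar to the query term set."""
    scored = [
        (name, jaccard_dissimilarity(query, terms))
        for name, terms in experiments.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical annotation sets; real annotations would come from a repository.
experiments = {
    "EXP-A": {"Mus musculus", "liver", "time course"},
    "EXP-B": {"Homo sapiens", "blood", "disease state"},
    "EXP-C": {"Mus musculus", "liver", "compound treatment"},
}
query = {"Mus musculus", "liver", "time course"}
ranking = rank_related(query, experiments)
```

The same pairwise dissimilarities could also feed a standard clustering routine, which is the second use the abstract mentions.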

Crossref

Nature Precedings

Functional genomics of the beta-cell: short-chain 3-hydroxyacyl-coenzyme A dehydrogenase regulates insulin secretion independent of K+ currents

Author: Becker Thomas C.
Doliba Nicolai M.
Gupta Rana K.
Hardy Olga T.
Hohmeier Hans E.
Kaestner Klaus
Manduchi Elisabetta
Matschinsky Franz M.
Newgard Christopher B.
Stoeckert Christian J., Jr.
White Peter
Publication venue: eScholarship@UMassChan
Publication date: 23/03/2007

Recent advances in functional genomics afford the opportunity to interrogate the expression profiles of thousands of genes simultaneously and examine the function of these genes in a high-throughput manner. In this study, we describe a rational and efficient approach to identifying novel regulators of insulin secretion by the pancreatic beta-cell. Computational analysis of expression profiles of several mouse and cellular models of impaired insulin secretion identified 373 candidate genes involved in regulation of insulin secretion. Using RNA interference, we assessed the requirements of 10 of these candidates and identified four genes (40%) as being essential for normal insulin secretion. Among the genes identified was Hadhsc, which encodes short-chain 3-hydroxyacyl-coenzyme A dehydrogenase (SCHAD), an enzyme of mitochondrial beta-oxidation of fatty acids whose mutation results in congenital hyperinsulinism. RNA interference-mediated gene suppression of Hadhsc in insulinoma cells and primary rodent islets revealed enhanced basal but normal glucose-stimulated insulin secretion. This increase in basal insulin secretion was not attenuated by the opening of the KATP channel with diazoxide, suggesting that SCHAD regulates insulin secretion through a KATP channel-independent mechanism. Our results suggest a molecular explanation for the hyperinsulinemic hypoglycemia seen in patients with SCHAD deficiency.

eScholarship@UMMS

FungiDB: An Integrated Bioinformatic Resource for Fungi and Oomycetes

Author: Aurrecoechea Cristina
Basenko Evelina Y
Crouch Kathryn
Harb Omar S
Hertz-Fowler Christiane
Stoeckert Christian J, Jr
Kissinger Jessica C
Pulman Jane A
Roos David S
Shanmugasundram Achchuthan
Starns David
Warrenfeltz Susanne
Publication venue: 'MDPI AG'
Publication date: 01/03/2018

FungiDB (fungidb.org) is a free online resource for data mining and functional genomics analysis for fungal and oomycete species. FungiDB is part of the Eukaryotic Pathogen Genomics Database Resource (EuPathDB, eupathdb.org) platform that integrates genomic, transcriptomic, proteomic, and phenotypic datasets, and other types of data for pathogenic and non-pathogenic, free-living and parasitic organisms. FungiDB is one of the largest EuPathDB databases, containing nearly 100 genomes obtained from GenBank, AspGD, The Broad Institute, JGI, Ensembl, and other sources. FungiDB offers a user-friendly web interface with embedded bioinformatics tools that support custom in silico experiments that leverage FungiDB-integrated data. In addition, a Galaxy-based workspace enables users to generate custom pipelines for large-scale data analysis (e.g. RNA-Seq and variant calling). This review provides an introduction to the FungiDB resources and focuses on available features, tools and queries and how they can be used to mine data across a diverse range of integrated FungiDB datasets and records.

Multidisciplinary Digital Publishing Institute

University of Liverpool Repository

Crossref

Directory of Open Access Journals

Enlighten

EuPathDB: the eukaryotic pathogen genomics database resource

Author: Aurrecoechea Cristina
Barreto Ana
Basenko Evelina Y
Brestelli John
Brunk Brian P
Cade Shon
Crouch Kathryn
Doherty Ryan
Falke Dave
Fischer Steve
Gajria Bindu
Harb Omar S
Heiges Mark
Hertz-Fowler Christiane
Hu Sufen
Iodice John
Stoeckert Christian J, Jr
Kissinger Jessica C
Lawrence Cris
Li Wei
Pinney Deborah F
Pulman Jane A
Roos David S
Shanmugasundram Achchuthan
Silva-Franco Fatima
Spruill Drew
Steinbiss Sascha
Wang Haiming
Warrenfeltz Susanne
Zheng Jie
Publication venue: 'Oxford University Press (OUP)'
Publication date: 28/11/2016

The Eukaryotic Pathogen Genomics Database Resource (EuPathDB, http://eupathdb.org) is a collection of databases covering 170+ eukaryotic pathogens (protists & fungi), along with relevant free-living and non-pathogenic species, and select pathogen hosts. To facilitate the discovery of meaningful biological relationships, the databases couple preconfigured searches with visualization and analysis tools for comprehensive data mining via intuitive graphical interfaces and APIs. All data are analyzed with the same workflows, including creation of gene orthology profiles, so data are easily compared across data sets, data types and organisms. EuPathDB is updated with numerous new analysis tools, features, data sets and data types. New tools include GO, metabolic pathway and word enrichment analyses plus an online workspace for analysis of personal, non-public, large-scale data. Expanded data content is mostly genomic and functional genomic data while new data types include protein microarray, metabolic pathways, compounds, quantitative proteomics, copy number variation, and polysomal transcriptomics. New features include consistent categorization of searches, data sets and genome browser tracks; redesigned gene pages; effective integration of alternative transcripts; and a EuPathDB Galaxy instance for private analyses of a user's data. Forthcoming upgrades include user workspaces for private integration of data with existing EuPathDB data and improved integration and presentation of host–pathogen interactions

University of Liverpool Repository

Crossref

PubMed Central

Enlighten

Gene discovery in the Apicomplexa as revealed by EST sequencing and assembly of a comparative gene database (Genome Res)

Author: Brian P Brunk
Carmen Diaz
Christian J Stoeckert Jr
Daniel K Howe
David S Roos
Deana Pape
Emily A Johnson
Jay A Radke
Jennifer Anderson
Jessica C Kissinger
John Martin
Keliang Tang
L David Sibley
Li Li
Maria E Jerome
Michael White
Mike Dante
Paul Liberator
Robert H Cole
Robert H Waterston
Sandra W Clifton
Steven J Fogarty
Todd Wylie
Publication date: 01/01/2003

Large-scale EST sequencing projects for several important parasites within the phylum Apicomplexa were undertaken for the purpose of gene discovery. Included were several parasites of medical importance (Plasmodium falciparum, Toxoplasma gondii) and others of veterinary importance (Eimeria tenella, Sarcocystis neurona, and Neospora caninum). A total of 55,192 ESTs, deposited into dbEST/GenBank, were included in the analyses. The resulting sequences have been clustered into nonredundant gene assemblies and deposited into a relational database that supports a variety of sequence and text searches. This database has been used to compare the gene assemblies using BLAST similarity comparisons to the public protein databases to identify putative genes. Of these new entries, ∼15%-20% represent putative homologs with a conservative cutoff of p < 10^-9, thus identifying many conserved genes that are likely to share common functions with other well-studied organisms. Gene assemblies were also used to identify strain polymorphisms, examine stage-specific expression, and identify gene families. An interesting class of genes that are confined to members of this phylum and not shared by plants, animals, or fungi was identified. These genes likely mediate the novel biological features of members of the Apicomplexa and hence offer great potential for biological investigation and as possible therapeutic targets.

CiteSeerX

EuPathDB: the eukaryotic pathogen database

Author: Alan Gingle
Ana Barreto
Bindu Gajria
Brian P Brunk
Brian Pitts
Christian J Stoeckert Jr
Cristina Aurrecoechea
David S Roos
Deborah F Pinney
Eileen T Kraemer
Ganesh Srinivasamoorthy
Greg Grant
Haiming Wang
Jessica C Kissinger
John Brestelli
John Iodice
Mark Heiges
Omar S Harb
Ryan Doherty
Shon Cade
Steve Fischer
Sufen Hu
Susanne Warrenfeltz
Wei Li
Xin Gao
Publication date: 01/01/2013

EuPathDB (http://eupathdb.org) resources include 11 databases supporting eukaryotic pathogen genomic and functional genomic data, isolate data and phylogenomics. EuPathDB resources are built using the same infrastructure and provide a sophisticated search strategy system enabling complex interrogations of underlying data. Recent advances in EuPathDB resources include the design and implementation of a new data loading workflow, a new database supporting Piroplasmida (i.e. Babesia and Theileria), the addition of large amounts of new data and data types and the incorporation of new analysis tools. New data include genome sequences and annotation, strand-specific RNA-seq data, splice junction predictions (based on RNA-seq), phosphoproteomic data, high-throughput phenotyping data, single nucleotide polymorphism data based on high-throughput sequencing (HTS) and expression quantitative trait loci data. New analysis tools enable users to search for DNA motifs and define genes based on their genomic colocation, view results from searches graphically (i.e. genes mapped to chromosomes or isolates displayed on a map) and analyze data from columns in result tables (word cloud and histogram summaries of column content). The manuscript herein describes updates to EuPathDB since the previous report published in NAR in 2010.

CiteSeerX

The MGED ontology: a framework for describing functional genomics experiments (Comparative and Functional Genomics, Conference Review)

Author: Christian J Stoeckert Jr
Helen Parkinson
Publication date: 01/01/2003

The Microarray Gene Expression Data (MGED) society was formed with an initial focus on experiments involving microarray technology. Despite the diversity of applications, there are common concepts used and a common need to capture experimental information in a standardized manner. In building the MGED ontology, it was recognized that it would be impractical to cover all the different types of experiments on all the different types of organisms by listing and defining all the types of organisms and their properties. Our solution was to create a framework for describing microarray experiments with an initial focus on the biological sample and its manipulation. For concepts that are common to many species, we could provide a manageable listing of controlled terms. For concepts that are species-specific or whose values cannot be readily listed, we created an 'OntologyEntry' concept that referenced an external resource. The MGED ontology is a work in progress that needs additional instances and particularly needs constraints to be added. The ontology currently covers the experimental sample and design, and we have begun capturing aspects of the microarrays themselves as well. The primary application of the ontology will be to develop forms for entering information into databases, and consequently allowing queries, taking advantage of the structure provided by the ontology. The application of an ontology of experimental conditions extends beyond microarray experiments and, as the scope of MGED includes other aspects of functional genomics, so too will the MGED ontology.

CiteSeerX

EpoDB: a database of genes expressed during vertebrate erythropoiesis.

Author: Brian Brunk
Christian J. Stoeckert
Fidel Salas
G. Christian Overton
Jr.
Juergen Haas
Publication date: 01/01/1998

EpoDB is a database designed for the study of gene regulation during differentiation and development of vertebrate red blood cells. In building EpoDB, we have taken the in-advance approach to the data integration problem: we have extracted data relevant to red blood cells from GenBank, SWISS-PROT, TRRD (transcriptional regulation data) and GERD (expression levels data) to create a single integrated, highly curated view. Tools have been developed to automate data extraction from online resources, cleanse data of errors, enter information manually from the primary literature, generate a uniform, canonical representation of information and maintain data currency. The database is organized around biological features, e.g., genes, rather than sequences, and these features are supported by a controlled and consistent vocabulary for gene names and gene family names. Beyond the standard database queries, the functionality of EpoDB includes the ability to extract features and subsequences, and to display sequences and features graphically using bioWidget viewers and integrated analysis tools. EpoDB may be accessed at: http://cbil.humgen.upenn.edu/epodb

CiteSeerX

PubMed Central