32 research outputs found

    Implementation of GenePattern within the Stanford Microarray Database

    Hundreds of researchers across the world use the Stanford Microarray Database (SMD; http://smd.stanford.edu/) to store, annotate, view, analyze and share microarray data. In addition to providing registered users at Stanford access to their own data, SMD provides public data, and tools with which to analyze those data, to any user anywhere in the world. Previously, the addition of new microarray data analysis tools to SMD was limited by available engineering resources; in addition, the existing suite of tools did not provide a simple way to design, execute and share analysis pipelines, or to document such pipelines for publication. To address this, we have incorporated the GenePattern software package directly into SMD, providing access to many new analysis tools, as well as a plug-in architecture that allows users to directly integrate and share additional tools through SMD. In this article, we describe our implementation of the GenePattern microarray analysis software package in the SMD code base. This extension ships with the SMD source code, which is fully and freely available to others under an Open Source license, enabling other groups to create a local installation of SMD with enriched data analysis capability.

    Annotare—a tool for annotating high-throughput biomedical investigations and resulting data

    Summary: Computational methods in molecular biology will increasingly depend on standards-based annotations that describe biological experiments in an unambiguous manner. Annotare is a software tool that enables biologists to easily annotate their high-throughput experiments, biomaterials and data in a standards-compliant way that facilitates meaningful search and analysis.

    Effective knowledge management in translational medicine

    Background: The growing consensus that the most valuable data for biomedical discoveries are derived from human samples is clearly reflected in the growing number of translational medicine and translational science departments across pharma, as well as academic and government-supported initiatives such as the Clinical and Translational Science Awards (CTSA) in the US and the Seventh Framework Programme (FP7) of the EU, with their emphasis on translating research for human health.
    Methods: The pharmaceutical companies of Johnson & Johnson have established translational and biomarker departments and implemented an effective knowledge management framework, including a data warehouse and associated data mining applications. The implemented resource is built from open source systems such as i2b2 and GenePattern.
    Results: The system has been deployed across multiple therapeutic areas within the pharmaceutical companies of Johnson & Johnson and is being used actively to integrate and mine internal and public data to support drug discovery and development decisions, such as indication selection and trial design, in a translational medicine setting. Our results show that the established system allows scientists to quickly re-validate hypotheses or generate new ones through an intuitive graphical interface.
    Conclusions: The implemented resource can serve as the basis for precompetitive sharing and mining of studies involving samples from human subjects, thus enhancing our understanding of human biology and pathophysiology and ultimately leading to more effective treatment of diseases that represent unmet medical needs.

    STARNET 2: a web-based tool for accelerating discovery of gene regulatory networks using microarray co-expression data

    Background: Although expression microarrays have become a standard tool used by biologists, analysis of the data produced by microarray experiments may still present challenges. Comparison of data from different platforms, organisms and labs may involve complicated data processing, and inferring relationships between genes remains difficult.
    Results: STARNET 2 is a new web-based tool that allows post hoc visual analysis of correlations derived from expression microarray data. STARNET 2 facilitates user discovery of putative gene regulatory networks in a variety of species (human, rat, mouse, chicken, zebrafish, Drosophila, C. elegans, S. cerevisiae, Arabidopsis and rice) by graphing networks of genes that are closely co-expressed across a large heterogeneous set of preselected microarray experiments. For each of the represented organisms, raw microarray data were retrieved from NCBI's Gene Expression Omnibus for a selected Affymetrix platform. All pairwise Pearson correlation coefficients were computed for the expression profiles measured on each platform, respectively. These precompiled results were stored in a MySQL database and supplemented by additional data retrieved from NCBI. A web-based tool allows user-specified queries of the database, centered at a gene of interest. The result of a query includes graphs of correlation networks, graphs of known interactions involving genes and gene products present in the correlation networks, and initial statistical analyses. Two analyses may be performed in parallel to compare networks, which is facilitated by the new HEATSEEKER module.
    Conclusion: STARNET 2 is a useful tool for developing new hypotheses about regulatory relationships between genes and gene products, and has coverage for 10 species. Interpretation of the correlation networks is supported by a database of previously documented interactions, a test for enrichment of Gene Ontology terms, and heat maps of correlation distances that may be used to compare two networks. The list of genes in a STARNET network may be useful in developing a list of candidate genes for the inference of causal networks. The tool is freely available at http://vanburenlab.medicine.tamhsc.edu/starnet2.html and does not require user registration.
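The core computation the abstract describes — all pairwise Pearson correlation coefficients across a set of experiments, with a network drawn around a query gene — can be sketched as follows. This is a minimal illustration on toy data; the function name and the `threshold` parameter are assumptions, not STARNET 2's actual interface.

```python
import numpy as np

def correlation_network(expr, gene_names, center_gene, threshold=0.8):
    # Compute all pairwise Pearson correlations across experiments,
    # then keep edges around the query gene whose |r| clears the
    # cutoff. `threshold` is an illustrative parameter, not the
    # tool's real one.
    r = np.corrcoef(expr)  # genes x genes correlation matrix
    i = gene_names.index(center_gene)
    return [(center_gene, gene_names[j], float(r[i, j]))
            for j in range(len(gene_names))
            if j != i and abs(r[i, j]) >= threshold]

# toy data: 4 genes measured in 5 experiments
expr = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],   # gene A
                 [1.1, 2.1, 2.9, 4.2, 5.1],   # gene B, tracks A closely
                 [5.0, 4.0, 3.0, 2.0, 1.0],   # gene C, anti-correlated with A
                 [1.0, 1.0, 5.0, 1.0, 1.0]])  # gene D, unrelated
net = correlation_network(expr, ["A", "B", "C", "D"], "A")
```

Precomputing the full correlation matrix once, as the abstract describes, is what makes per-gene queries cheap: each query is then a single row lookup plus a threshold filter.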

    Inferring gene ontologies from pairwise similarity data.

    Motivation: While the manually curated Gene Ontology (GO) is widely used, inferring a GO directly from -omics data is a compelling new problem. Recognizing that ontologies are a directed acyclic graph (DAG) of terms and hierarchical relations, algorithms are needed that: analyze a full matrix of gene-gene pairwise similarities from -omics data; infer true hierarchical structure in these data rather than enforcing hierarchy as a computational artifact; and respect biological pleiotropy, by which a term in the hierarchy can relate to multiple higher-level terms. Methods addressing these requirements are just beginning to emerge; none has been evaluated for GO inference.
    Methods: We consider two algorithms [Clique Extracted Ontology (CliXO), LocalFitness] that uniquely satisfy these requirements, compared with methods including standard clustering. CliXO is a new approach that finds maximal cliques in a network induced by progressive thresholding of a similarity matrix. We evaluate each method's ability to reconstruct the GO biological process ontology from a similarity matrix based on (a) semantic similarities for GO itself or (b) three -omics datasets for yeast.
    Results: For task (a) using semantic similarity, CliXO accurately reconstructs GO (>99% precision, recall) and outperforms other approaches (<20% precision, <20% recall). For task (b) using -omics data, CliXO outperforms other methods on two of the -omics datasets and achieves ~30% precision and recall using YeastNet v3, similar to an earlier approach (Network Extracted Ontology) and better than LocalFitness or standard clustering (20-25% precision, recall).
    Conclusion: This study provides an algorithmic foundation for building gene ontologies by capturing the hierarchical and pleiotropic structure embedded in biomolecular data.
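The clique-extraction idea behind CliXO — maximal cliques in graphs induced by progressively loosening a similarity threshold, with cliques at stricter thresholds nesting inside cliques at looser ones — can be sketched on toy data. The function names, thresholds, and naive Bron–Kerbosch search below are illustrative assumptions; the published algorithm additionally handles missing edges and term collapsing.

```python
from itertools import combinations

def maximal_cliques(nodes, edges):
    # Bron-Kerbosch without pivoting: adequate for tiny toy graphs
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    cliques = []
    def bk(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            bk(r | {v}, p & adj[v], x & adj[v])
            p.remove(v)
            x.add(v)
    bk(set(), set(nodes), set())
    return cliques

def threshold_cliques(sim, names, thresholds):
    # Sweep the similarity threshold from strict to loose; at each
    # level, maximal cliques of the thresholded graph are candidate
    # ontology terms. A clique at a looser threshold containing one
    # from a stricter threshold would be its parent in the DAG.
    terms = []
    for t in sorted(thresholds, reverse=True):
        edges = [(a, b) for a, b in combinations(names, 2)
                 if sim[names.index(a)][names.index(b)] >= t]
        for c in maximal_cliques(names, edges):
            if len(c) > 1 and c not in terms:
                terms.append(c)
    return terms

# toy similarity matrix: A and B are tightly similar, C joins them at
# a looser threshold, D is unrelated
sim = [[1.0, 0.9, 0.5, 0.1],
       [0.9, 1.0, 0.5, 0.1],
       [0.5, 0.5, 1.0, 0.1],
       [0.1, 0.1, 0.1, 1.0]]
terms = threshold_cliques(sim, ["A", "B", "C", "D"], [0.8, 0.4])
```

Here the nested terms {A, B} within {A, B, C} suggest a two-level hierarchy, which is the kind of structure the method recovers from real similarity data.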

    High-throughput processing and normalization of one-color microarrays for transcriptional meta-analyses

    Background: Microarray experiments are becoming increasingly common in biomedical research, as is their deposition in publicly accessible repositories such as Gene Expression Omnibus (GEO). As such, there has been a surge of interest in using these microarray data for meta-analytic approaches, whether to increase sample size for a more powerful analysis of a specific disease (e.g. lung cancer) or to re-examine experiments for reasons different from those examined in the initial, publishing study that generated them. For the average biomedical researcher, there are a number of practical barriers to conducting such meta-analyses, such as manually aggregating, filtering and formatting the data. Methods to automatically process large repositories of microarray data into a standardized, directly comparable format will enable easier and more reliable access to microarray data for meta-analyses.
    Methods: We present a straightforward method, robust against potential outliers, for automatic quality control and pre-processing of tens of thousands of single-channel microarray data files. GEO GDS files are quality checked by comparing parametric distributions and quantile normalized to enable direct comparison of expression levels for subsequent meta-analyses.
    Results: 13,000 human 1-color experiments were processed to create a single gene expression matrix from which subsets can be extracted to conduct meta-analyses. Interestingly, we found that when conducting a global meta-analysis of gene-gene co-expression patterns across all 13,000 experiments to predict gene function, normalization offered minimal improvement over using the raw data.
    Conclusions: Normalization of microarray data appears to be of minimal importance for analyses based on co-expression patterns when the sample size is on the order of thousands of microarray datasets. Smaller subsets, however, are more prone to aberrations and artefacts, and effective means of automating normalization procedures not only empower meta-analytic approaches but also aid reproducibility by providing a standard way of approaching the problem.
    Data availability: A matrix containing the normalized expression of 20,813 genes across 13,000 experiments is available for download at . Source code for GDS file pre-processing is available from the authors upon request.
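The quantile normalization step named in the Methods can be sketched as follows. This is the textbook procedure (with ties broken arbitrarily rather than averaged), not necessarily the paper's exact pipeline: each array's values are replaced by the mean value at the corresponding rank, so every sample ends up with an identical distribution.

```python
import numpy as np

def quantile_normalize(expr):
    # expr is a genes x samples matrix. Rank each sample's values,
    # then substitute the across-sample mean of each rank, so all
    # samples share one reference distribution. Ties are broken by
    # position here, a simplification of the usual tie-averaging.
    ranks = np.argsort(np.argsort(expr, axis=0), axis=0)  # per-sample ranks
    mean_sorted = np.sort(expr, axis=0).mean(axis=1)      # mean at each rank
    return mean_sorted[ranks]

# toy matrix: 4 genes x 3 arrays with different scales
m = np.array([[5.0, 4.0, 3.0],
              [2.0, 1.0, 4.0],
              [3.0, 4.0, 6.0],
              [4.0, 2.0, 8.0]])
qn = quantile_normalize(m)
```

After normalization every column contains exactly the same set of values, which is what makes expression levels from different GEO experiments directly comparable in a single matrix.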

    MGEx-Udb: A Mammalian Uterus Database for Expression-Based Cataloguing of Genes across Conditions, Including Endometriosis and Cervical Cancer

    Gene expression profiling of uterus tissue has been performed in various contexts, but a significant amount of the data remains underutilized as it is not covered by the existing general resources. The MGEx-Udb database can be queried with gene names/IDs and sub-tissue locations, as well as various conditions such as cervical cancer, endometrial cycles and disorders, and experimental treatments. Accordingly, the output is a) transcribed and dormant genes listed for the queried condition/location, or b) the expression profile of the gene of interest across various uterine conditions. The results also include a reliability score for the expression status of each gene. MGEx-Udb also provides information related to Gene Ontology annotations, protein-protein interactions, transcripts, promoters, and expression status from other sequencing techniques, and facilitates various other types of analysis of individual genes or co-expressed gene clusters. In brief, MGEx-Udb enables easy cataloguing of co-expressed genes and also facilitates bio-marker discovery for various uterine conditions.

    Predicting gene ontology from a global meta-analysis of 1-color microarray experiments

    Background: Global meta-analysis (GMA) of microarray data to identify genes with highly similar co-expression profiles is emerging as an accurate method to predict gene function and phenotype, even in the absence of published data on the gene(s) being analyzed. With a third of human genes still uncharacterized, this approach is a promising way to direct experiments and rapidly understand the biological roles of genes. To predict function for a gene of interest, GMA relies on a guilt-by-association approach to identify sets of genes with known functions that are consistently co-expressed with it across different experimental conditions, suggesting coordinated regulation for a specific biological purpose. Our goal here is to define how sample size, dataset size and ranking parameters affect prediction performance.
    Results: 13,000 human 1-color microarrays were downloaded from GEO for GMA analysis. Prediction performance was benchmarked by calculating the distance within the Gene Ontology (GO) tree between predicted and annotated function for sets of 100 randomly selected genes. We find that the number of newly predicted functions rises as more datasets are added, but begins to saturate at a sample size of approximately 2,000 experiments. For the gene set used to predict function, precision is higher with smaller set sizes, with correspondingly poor recall; as set size increases, recall and F-measure also tend to increase, but at the cost of precision.
    Conclusions: Of the 20,813 genes expressed in 50 or more experiments, at least one predicted GO category was found for 72.5% of them. Of the 5,720 genes without GO annotation, 4,189 had at least one predicted ontology using the top 40 co-expressed genes for prediction. For the remaining 1,531 genes without GO predictions or annotations, ~17% (257 genes) had sufficient co-expression data yet no statistically significantly overrepresented ontologies, suggesting that their regulation may be more complex.
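The guilt-by-association step can be illustrated with a toy sketch: rank genes by co-expression with the target, then predict GO terms that recur among the top neighbors' existing annotations. The gene and term names, `top_n`, and the raw vote count below are invented for illustration; the paper uses the top 40 co-expressed genes and tests for statistically overrepresented ontologies rather than simple vote counting.

```python
from collections import Counter

def predict_go(target, coexpression, annotations, top_n=3, min_votes=2):
    # Take the genes most strongly co-expressed with the target and
    # predict any GO term that recurs among their annotations.
    # min_votes stands in for a proper enrichment test.
    scores = coexpression[target]
    neighbors = sorted(scores, key=scores.get, reverse=True)[:top_n]
    votes = Counter(t for g in neighbors for t in annotations.get(g, ()))
    return [term for term, n in votes.items() if n >= min_votes]

# hypothetical co-expression scores and annotations
coexpr = {"geneX": {"g1": 0.95, "g2": 0.90, "g3": 0.85, "g4": 0.10}}
anno = {"g1": ["GO:ribosome biogenesis", "GO:translation"],
        "g2": ["GO:ribosome biogenesis"],
        "g3": ["GO:ribosome biogenesis", "GO:rRNA processing"],
        "g4": ["GO:cell cycle"]}
preds = predict_go("geneX", coexpr, anno)
```

The set-size trade-off reported in the Results maps directly onto `top_n`: a small neighbor set keeps only the strongest associations (high precision, low recall), while a larger set admits more candidate terms at the cost of precision.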

    EcoliWiki: a wiki-based community resource for Escherichia coli

    EcoliWiki is the community annotation component of the PortEco (http://porteco.org; formerly EcoliHub) project, an online data resource that integrates information on laboratory strains of Escherichia coli, its phages, plasmids and mobile genetic elements. As one of the early adopters of the wiki approach to model organism databases, EcoliWiki was designed not only to facilitate community-driven sharing of biological knowledge about E. coli as a model organism, but also to be interoperable with other data resources. EcoliWiki content currently covers genes from five laboratory E. coli strains, 21 bacteriophage genomes, the F plasmid and eight transposons. EcoliWiki integrates the MediaWiki wiki platform with other open-source software tools and in-house software development to extend how wikis can be used for model organism databases. EcoliWiki can be accessed online at http://ecoliwiki.net.