269 research outputs found

GenomeGraphs: integrated genomic data visualization with R.

Author: Bullard James
Dudoit Sandrine
Durinck Steffen
Spellman Paul T
Publication venue: eScholarship, University of California
Publication date: 01/01/2009

Background: Biological studies involve a growing number of distinct high-throughput experiments to characterize samples of interest. There is a lack of methods to visualize these different genomic datasets in a versatile manner. In addition, genomic data analysis requires integrated visualization of experimental data along with constantly changing genomic annotation and statistical analyses. Results: We developed GenomeGraphs, as an add-on software package for the statistical programming environment R, to facilitate integrated visualization of genomic datasets. GenomeGraphs uses the biomaRt package to perform on-line annotation queries to Ensembl and translates these to gene/transcript structures in viewports of the grid graphics package. This allows genomic annotation to be plotted together with experimental data. GenomeGraphs can also be used to plot custom annotation tracks in combination with different experimental data types together in one plot using the same genomic coordinate system. Conclusion: GenomeGraphs is a flexible and extensible software package which can be used to visualize a multitude of genomic datasets within the statistical programming environment R.

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

The XBabelPhish MAGE-ML and XML Translator

Author: Catherine A Ball
Don Maier
Farrell Wymore
Gavin Sherlock
Paul T Spellman
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: MAGE-ML has been promoted as a standard format for describing microarray experiments and the data they produce. Two characteristics of the MAGE-ML format compromise its use as a universal standard: First, MAGE-ML files are exceptionally large – too large to be easily read by most people, and often too large to be read by most software programs. Second, the MAGE-ML standard permits many ways of representing the same information. As a result, different producers of MAGE-ML create different documents describing the same experiment and its data. Recognizing all the variants is an unwieldy software engineering task, resulting in software packages that can read and process MAGE-ML from some, but not all producers. This Tower of MAGE-ML Babel bars the unencumbered exchange of microarray experiment descriptions couched in MAGE-ML. Results: We have developed XBabelPhish – an XQuery-based technology for translating one MAGE-ML variant into another. XBabelPhish's use is not restricted to translating MAGE-ML documents. It can transform XML files independent of their DTD, XML schema, or semantic content. Moreover, it is designed to work on very large (> 200 Mb.) files, which are common in the world of MAGE-ML. Conclusion: XBabelPhish provides a way to inter-translate MAGE-ML variants for improved interchange of microarray experiment information. More generally, it can be used to transform most XML files, including very large ones that exceed the capacity of most XML tools.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Integrating biological knowledge into variable selection : an empirical Bayes approach with an application in cancer biology

Author: Bayani Nora
Gray Joe W.
Hill Steven M. (Mark)
Kuo Wen-Lin
Mukherjee Sach
Neve Richard M.
Spellman Paul T.
Ziyad Safiyyah
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2012

Background: An important question in the analysis of biochemical data is that of identifying subsets of molecular variables that may jointly influence a biological response. Statistical variable selection methods have been widely used for this purpose. In many settings, it may be important to incorporate ancillary biological information concerning the variables of interest. Pathway and network maps are one example of a source of such information. However, although ancillary information is increasingly available, it is not always clear how it should be used nor how it should be weighted in relation to primary data. Results: We put forward an approach in which biological knowledge is incorporated using informative prior distributions over variable subsets, with prior information selected and weighted in an automated, objective manner using an empirical Bayes formulation. We employ continuous, linear models with interaction terms and exploit biochemically-motivated sparsity constraints to permit exact inference. We show an example of priors for pathway- and network-based information and illustrate our proposed method on both synthetic response data and by an application to cancer drug response data. Comparisons are also made to alternative Bayesian and frequentist penalised-likelihood methods for incorporating network-based information. Conclusions: The empirical Bayes method proposed here can aid prior elicitation for Bayesian variable selection studies and help to guard against mis-specification of priors. Empirical Bayes, together with the proposed pathway-based priors, results in an approach with a competitive variable selection performance. In addition, the overall procedure is fast, deterministic, and has very few user-set parameters, yet is capable of capturing interplay between molecular players. The approach presented is general and readily applicable in any setting with multiple sources of biological prior knowledge

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Warwick Research Archives Portal Repository

The Cell Cycle–Regulated Genes of Schizosaccharomyces pombe

Author: Adam Rosebrock
Anna Oliva
Bruce Futcher
Francisco Ferrezuelo
Haiying Chen
Janet Leatherwood
Paul T. Spellman
Saumyadipta Pyne
Steve Skiena
Publication venue: Public Library of Science
Publication date: 01/01/2005

Many genes are regulated as an innate part of the eukaryotic cell cycle, and a complex transcriptional network helps enable the cyclic behavior of dividing cells. This transcriptional network has been studied in Saccharomyces cerevisiae (budding yeast) and elsewhere. To provide more perspective on these regulatory mechanisms, we have used microarrays to measure gene expression through the cell cycle of Schizosaccharomyces pombe (fission yeast). The 750 genes with the most significant oscillations were identified and analyzed. There were two broad waves of cell cycle transcription, one in early/mid G2 phase, and the other near the G2/M transition. The early/mid G2 wave included many genes involved in ribosome biogenesis, possibly explaining the cell cycle oscillation in protein synthesis in S. pombe. The G2/M wave included at least three distinctly regulated clusters of genes: one large cluster including mitosis, mitotic exit, and cell separation functions, one small cluster dedicated to DNA replication, and another small cluster dedicated to cytokinesis and division. S. pombe cell cycle genes have relatively long, complex promoters containing groups of multiple DNA sequence motifs, often of two, three, or more different kinds. Many of the genes, transcription factors, and regulatory mechanisms are conserved between S. pombe and S. cerevisiae. Finally, we found preliminary evidence for a nearly genome-wide oscillation in gene expression: 2,000 or more genes undergo slight oscillations in expression as a function of the cell cycle, although whether this is adaptive, or incidental to other events in the cell, such as chromatin condensation, we do not know

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Directory of Open Access Journals

PubMed Central

Repositori Obert UdL

Stony Brook University - SUNY

FigShare

A robust prognostic signature for hormone-positive node-negative breast cancer

Author: Collisson Eric A
Enache Oana M
Gray Joe W
Griffith Obi L
Heiser Laura M
Pepin Francois
Spellman Paul T
Publication venue: Digital Commons@Becker
Publication date: 01/01/2013

BACKGROUND: Systemic chemotherapy in the adjuvant setting can cure breast cancer in some patients that would otherwise recur with incurable, metastatic disease. However, since only a fraction of patients would have recurrence after surgery alone, the challenge is to stratify high-risk patients (who stand to benefit from systemic chemotherapy) from low-risk patients (who can safely be spared treatment related toxicities and costs). METHODS: We focus here on risk stratification in node-negative, ER-positive, HER2-negative breast cancer. We use a large database of publicly available microarray datasets to build a random forests classifier and develop a robust multi-gene mRNA transcription-based predictor of relapse free survival at 10 years, which we call the Random Forests Relapse Score (RFRS). Performance was assessed by internal cross-validation, multiple independent data sets, and comparison to existing algorithms using receiver-operating characteristic and Kaplan-Meier survival analysis. Internal redundancy of features was determined using k-means clustering to define optimal signatures with smaller numbers of primary genes, each with multiple alternates. RESULTS: Internal OOB cross-validation for the initial (full-gene-set) model on training data reported an ROC AUC of 0.704, which was comparable to or better than those reported previously or obtained by applying existing methods to our dataset. Three risk groups with probability cutoffs for low, intermediate, and high-risk were defined. Survival analysis determined a highly significant difference in relapse rate between these risk groups. Validation of the models against independent test datasets showed highly similar results. Smaller 17-gene and 8-gene optimized models were also developed with minimal reduction in performance. Furthermore, the signature was shown to be almost equally effective on both hormone-treated and untreated patients. 
CONCLUSIONS: RFRS allows flexibility in both the number and identity of genes utilized from thousands to as few as 17 or eight genes, each with multiple alternatives. The RFRS reports a probability score strongly correlated with risk of relapse. This score could therefore be used to assign systemic chemotherapy specifically to those high-risk patients most likely to benefit from further treatment

Crossref

Springer - Publisher Connector

Digital Commons@Becker

PubMed Central

Integrated analysis of breast cancer cell lines reveals unique signaling pathways

Author: Barbara L Weber
Carolyn L Talcott
Jeffrey R Jackson
Joe W Gray
Keith R Laderoute
Laura M Heiser
Merrill Knapp
Nicholas J Wang
Paul T Spellman
Richard F Wooster
Safiyyah Ziyad
Sylvie Laquerre
Wen-Lin Kuo
Yinghui Guan
Zhi Hu
Publication venue: BioMed Central
Publication date: 01/01/2009

Mapping of sub-networks in the EGFR-MAPK pathway in different breast cancer cell lines reveals that PAK1 may be a marker for sensitivity to MEK inhibitors

CiteSeerX

Crossref

Springer - Publisher Connector

PubMed Central

UNT Digital Library

A simple spreadsheet-based, MIAME-supportive format for microarray data: MAGE-TAB

Author: AI Saeed
Alvis Brazma
Anna Farne
AR Jones
B Dysvik
BR Zeeberg
Catherine A Ball
Christian J Stoeckert
Donald S Maier
E Manduchi
Ele Holloway
Farrell Wymore
Gavin Sherlock
Helen C Causton
Helen Parkinson
John Quackenbush
Joseph White
Junmin Liu
Kjell Petersen
M Navarange
Michael Miller
MT Vass
Patricia L Whetzel
Paul T Spellman
Philippe Rocca-Serra
R Anbazhagan
Rafael A Irizarry
Tim F Rayner
Ugis Sarkans
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Sharing of microarray data within the research community has been greatly facilitated by the development of the disclosure and communication standards MIAME and MAGE-ML by the MGED Society. However, the complexity of the MAGE-ML format has made its use impractical for laboratories lacking dedicated bioinformatics support. RESULTS: We propose a simple tab-delimited, spreadsheet-based format, MAGE-TAB, which will become a part of the MAGE microarray data standard and can be used for annotating and communicating microarray data in a MIAME compliant fashion. CONCLUSION: MAGE-TAB will enable laboratories without bioinformatics experience or support to manage, exchange and submit well-annotated microarray data in a standard format using a spreadsheet. The MAGE-TAB format is self-contained, and does not require an understanding of MAGE-ML or XML

University of Bergen

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

NORA - Norwegian Open Research Archives

Integrated Analyses of microRNAs Demonstrate Their Widespread Influence on Gene Expression in High-Grade Serous Ovarian Carcinoma

Author: Creighton Chad J.
Du Ying
Getz Gad
Gibbs Richard A.
Gunaratne Preethi H.
Hayes D. Neil
Hernandez-Herrera Anadulce
Jacobsen Anders
Larsson Erik
Levine Douglas A.
Mankoo Parminder
Perou Charles M.
Sander Chris
Schultz Nikolaus
Sheridan Robert
Spellman Paul T.
Wheeler David A.
Xiao Weimin
Zhang Yiqun
Publication venue: Public Library of Science
Publication date: 01/01/2012

The Cancer Genome Atlas (TCGA) Network recently comprehensively catalogued the molecular aberrations in 487 high-grade serous ovarian cancers, with much remaining to be elucidated regarding the microRNAs (miRNAs). Here, using TCGA ovarian data, we surveyed the miRNAs, in the context of their predicted gene targets. Integration of miRNA and gene patterns yielded evidence that proximal pairs of miRNAs are processed from polycistronic primary transcripts, and that intronic miRNAs and their host gene mRNAs derive from common transcripts. Patterns of miRNA expression revealed multiple tumor subtypes and a set of 34 miRNAs predictive of overall patient survival. In a global analysis, miRNA:mRNA pairs anti-correlated in expression across tumors showed a higher frequency of in silico predicted target sites in the mRNA 3'-untranslated region (with less frequency observed for coding sequence and 5'-untranslated regions). The miR-29 family and predicted target genes were among the most strongly anti-correlated miRNA:mRNA pairs; over-expression of miR-29a in vitro repressed several anti-correlated genes (including DNMT3A and DNMT3B) and substantially decreased ovarian cancer cell viability. This study establishes miRNAs as having a widespread impact on gene expression programs in ovarian cancer, further strengthening our understanding of miRNA biology as it applies to human cancer. As with gene transcripts, miRNAs exhibit high diversity reflecting the genomic heterogeneity within a clinically homogeneous disease population. Putative miRNA:mRNA interactions, as identified using integrative analysis, can be validated. TCGA data are a valuable resource for the identification of novel tumor suppressive miRNAs in ovarian as well as other cancers.

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

Carolina Digital Repository

ScholarBank@NUS

FigShare

Integrated analysis of germline and somatic variants in ovarian cancer

Author: Ding Li
Druley Todd E.
Fulton Robert S.
Goodfellow Paul J.
Graubert Timothy A.
Johnson Kimberly J.
Kanchi Krishna L.
Kandoth Cyriac
Koboldt Daniel C.
Larson David E.
Leiserson Mark D.M.
Lu Charles
Mardis Elaine R.
McLellan Michael D.
McMichael Joshua F.
Miller Christopher A.
Raphael Benjamin J.
Schmidt Heather K.
Spellman Paul T.
Wendl Michael C.
Wilson Richard K.
Wyczalkowski Matthew A.
Xie Mingchao
Zhang Qunyuan
Publication venue: Digital Commons@Becker
Publication date: 01/01/2014

We report the first large-scale exome-wide analysis of the combined germline-somatic landscape in ovarian cancer. Here we analyze germline and somatic alterations in 429 ovarian carcinoma cases and 557 controls. We identify 3,635 high confidence, rare truncation and 22,953 missense variants with predicted functional impact. We find germline truncation variants and large deletions across Fanconi pathway genes in 20% of cases. Enrichment of rare truncations is shown in BRCA1, BRCA2, and PALB2. Additionally, we observe germline truncation variants in genes not previously associated with ovarian cancer susceptibility (NF1, MAP3K4, CDKN2B, and MLL3). Evidence for loss of heterozygosity was found in 100% and 76% of cases with germline BRCA1 and BRCA2 truncations respectively. Germline-somatic interaction analysis combined with extensive bioinformatics annotation identifies 237 candidate functional germline truncation and missense variants, including 2 pathogenic BRCA1 and 1 TP53 deleterious variants. Finally, integrated analyses of germline and somatic variants identify significantly altered pathways, including the Fanconi, MAPK, and MLL pathways

Crossref

Digital Commons@Becker

PubMed Central