18 research outputs found

Multi-Edge Gene Set Networks Reveal Novel Insights into Global Relationships between Biological Themes

Author: Marto Jarrod
Parikh Jignesh R.
Xia Yu
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2012

Curated gene sets from databases such as KEGG Pathway and Gene Ontology are often used to systematically organize lists of genes or proteins derived from high-throughput data. However, the information content inherent to some relationships between the interrogated gene sets, such as pathway crosstalk, is often underutilized. A gene set network, where nodes representing individual gene sets such as KEGG pathways are connected to indicate a functional dependency, is well suited to visualize and analyze global gene set relationships. Here we introduce a novel gene set network construction algorithm that integrates gene lists derived from high-throughput experiments with curated gene sets to construct co-enrichment gene set networks. Along with previously described co-membership and linkage algorithms, we apply the co-enrichment algorithm to eight gene set collections to construct integrated multi-evidence gene set networks with multiple edge types connecting gene sets. We demonstrate the utility of our approach through examples of novel gene set networks such as the chromosome map co-differential expression gene set network. A total of twenty-four gene set networks are exposed via a web tool called MetaNet, where context-specific multi-edge gene set networks are constructed from enriched gene sets within user-defined gene lists. MetaNet is freely available at http://blaispathways.dfci.harvard.edu/metanet/

CiteSeerX

Public Library of Science (PLOS)

Harvard University - DASH

Directory of Open Access Journals

PubMed Central

FigShare

Discovering Causal Signaling Pathways Through Gene-Expression Patterns

Author: Blüthgen Nils
Klinger Bertram
Marto Jarrod
Parikh Jignesh R.
Xia Yu
Publication venue: 'Oxford University Press (OUP)'
Publication date: 11/12/2012

High-throughput gene-expression studies result in lists of differentially expressed genes. Most current meta-analyses of these gene lists include searching for significant membership of the translated proteins in various signaling pathways. However, such membership enrichment algorithms do not provide insight into which pathways caused the genes to be differentially expressed in the first place. Here, we present an intuitive approach for discovering upstream signaling pathways responsible for regulating these differentially expressed genes. We identify consistently regulated signature genes specific for signal transduction pathways from a panel of single-pathway perturbation experiments. An algorithm that detects overrepresentation of these signature genes in a gene group of interest is used to infer the signaling pathway responsible for regulation. We expose our novel resource and algorithm through a web server called SPEED: Signaling Pathway Enrichment using Experimental Data sets. SPEED can be freely accessed at http://speed.sys-bio.net/
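
The abstract above does not say which statistic SPEED uses to detect overrepresentation of signature genes. As a minimal sketch, assuming a one-sided hypergeometric test, the idea could look like the following. The function names, the `signatures` dictionary, and the gene identifiers are hypothetical and are not taken from the SPEED web server.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """Return P(X >= k) for a hypergeometric draw.

    N: background genes, K: signature genes for one pathway,
    n: genes in the query list, k: signature genes found in the query list.
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def rank_pathways(signatures, query, background_size):
    """Score each pathway by over-representation of its signature genes.

    signatures: hypothetical dict mapping pathway name -> set of signature genes.
    Returns (pathway, p-value) pairs sorted from most to least significant.
    """
    query = set(query)
    scores = []
    for pathway, genes in signatures.items():
        overlap = len(genes & query)
        p = hypergeom_sf(overlap, background_size, len(genes), len(query))
        scores.append((pathway, p))
    return sorted(scores, key=lambda item: item[1])
```

A pathway whose signature genes appear in the query list more often than chance predicts receives a small p-value and is ranked first. The sketch assumes that every signature gene and every query gene belongs to the same background of `background_size` genes.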

Harvard University - DASH

Multiplierz: An Extensible API Based Desktop Environment for Proteomics Data Analysis

Author: Askenazi Manor
Blank Nathaniel C.
Cashorali Tanya
Ficarro Scott B.
Marto Jarrod A.
Parikh Jignesh R.
Webber James T.
Zhang Yi
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

BACKGROUND. Efficient analysis of results from mass spectrometry-based proteomics experiments requires access to disparate data types, including native mass spectrometry files, output from algorithms that assign peptide sequence to MS/MS spectra, and annotation for proteins and pathways from various database sources. Moreover, proteomics technologies and experimental methods are not yet standardized; hence a high degree of flexibility is necessary for efficient support of high- and low-throughput data analytic tasks. Development of a desktop environment that is sufficiently robust for deployment in data analytic pipelines, and simultaneously supports customization for programmers and non-programmers alike, has proven to be a significant challenge. RESULTS. We describe multiplierz, a flexible and open-source desktop environment for comprehensive proteomics data analysis. We use this framework to expose a prototype version of our recently proposed common API (mzAPI) designed for direct access to proprietary mass spectrometry files. In addition to routine data analytic tasks, multiplierz supports generation of information rich, portable spreadsheet-based reports. Moreover, multiplierz is designed around a "zero infrastructure" philosophy, meaning that it can be deployed by end users with little or no system administration support. Finally, access to multiplierz functionality is provided via high-level Python scripts, resulting in a fully extensible data analytic environment for rapid development of custom algorithms and deployment of high-throughput data pipelines. CONCLUSION. Collectively, mzAPI and multiplierz facilitate a wide range of data analysis tasks, spanning technology development to biological annotation, for mass spectrometry-based proteomics research. Funding: Dana-Farber Cancer Institute; National Human Genome Research Institute (P50HG004233); National Science Foundation Integrative Graduate Education and Research Traineeship grant (DGE-0654108)

Crossref

Boston University Institutional Repository (OpenBU)

Springer - Publisher Connector

PubMed Central

multiplierz: An Extensible API Based Desktop Environment for Proteomics Data Analysis

Author: Askenazi Manor
Blank Nathaniel C
Cashorali Tanya
Ficarro Scott
Marto Jarrod
Parikh Jignesh R
Webber James T
Zhang Yi
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 11/02/2011

Background: Efficient analysis of results from mass spectrometry-based proteomics experiments requires access to disparate data types, including native mass spectrometry files, output from algorithms that assign peptide sequence to MS/MS spectra, and annotation for proteins and pathways from various database sources. Moreover, proteomics technologies and experimental methods are not yet standardized; hence a high degree of flexibility is necessary for efficient support of high- and low-throughput data analytic tasks. Development of a desktop environment that is sufficiently robust for deployment in data analytic pipelines, and simultaneously supports customization for programmers and non-programmers alike, has proven to be a significant challenge. Results: We describe multiplierz, a flexible and open-source desktop environment for comprehensive proteomics data analysis. We use this framework to expose a prototype version of our recently proposed common API (mzAPI) designed for direct access to proprietary mass spectrometry files. In addition to routine data analytic tasks, multiplierz supports generation of information rich, portable spreadsheet-based reports. Moreover, multiplierz is designed around a "zero infrastructure" philosophy, meaning that it can be deployed by end users with little or no system administration support. Finally, access to multiplierz functionality is provided via high-level Python scripts, resulting in a fully extensible data analytic environment for rapid development of custom algorithms and deployment of high-throughput data pipelines. Conclusion: Collectively, mzAPI and multiplierz facilitate a wide range of data analysis tasks, spanning technology development to biological annotation, for mass spectrometry-based proteomics research

Harvard University - DASH

Production of a reference transcriptome and transcriptomic database (PocilloporaBase) for the cauliflower coral, Pocillopora damicornis

Author: AM Kerr
AM Reitzel
B Chevreux
BE Brown
Brian R Granger
CL Hunter
CR Voolstra
DC Rio
E Meyer
EP Green
ER Mardis
GK Ostrander
I Letunic
JA Schwarz
Jarrod A Marto
JB Jackson
JF Ryan
JH Pinzon
Jignesh R Parikh
JM Pandolfi
JM Rothberg
John R Finnerty
KJ Portune
LC Grasso
LD Mydlarz
Les Kaufman
LK Bay
M Kanehisa
MJH Van Oppen
MK DeSalvo
NH Putnam
Nikki Traylor-Knowles
O Levy
RW Grigg
S Foret
S Sunagawa
SA Sandin
Sara Garamszegi
SE Edge
Tristan J Lubinski
Yu Xia
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Motivated by the precarious state of the world's coral reefs, there is currently a keen interest in coral transcriptomics. By identifying changes in coral gene expression that are triggered by particular environmental stressors, we can begin to characterize coral stress responses at the molecular level, which should lead to the development of more powerful diagnostic tools for evaluating the health of corals in the field. Furthermore, the identification of genetic variants that are more or less resilient in the face of particular stressors will help us to develop more reliable prognoses for particular coral populations. Toward this end, we performed deep mRNA sequencing of the cauliflower coral, Pocillopora damicornis, a geographically widespread Indo-Pacific species that exhibits a great diversity of colony forms and is able to thrive in habitats subject to a wide range of human impacts. Importantly, P. damicornis is particularly amenable to laboratory culture. We collected specimens from three geographically isolated Hawaiian populations subjected to qualitatively different levels of human impact. We isolated RNA from colony fragments ("nubbins") exposed to four environmental stressors (heat, desiccation, peroxide, and hypo-saline conditions) or control conditions. The RNA was pooled and sequenced using the 454 platform. Description: Both the raw reads (n = 1,116,551) and the assembled contigs (n = 70,786; mean length = 836 nucleotides) were deposited in a new publicly available relational database called PocilloporaBase (http://www.PocilloporaBase.org). Using BLASTX, 47.2% of the contigs were found to match a sequence in the NCBI database at an E-value threshold of ≤ 0.001; 93.6% of those contigs with matches in the NCBI database appear to be of metazoan origin and 2.3% bacterial origin, while most of the remaining 4.1% match to other eukaryotes, including algae and amoebae. Conclusions: P. damicornis now joins the handful of coral species for which extensive transcriptomic data are publicly available. Through PocilloporaBase (http://www.PocilloporaBase.org), one can obtain assembled contigs and raw reads and query the data according to a wide assortment of attributes including taxonomic origin, PFAM motif, KEGG pathway, and GO annotation.

Crossref

Harvard University - DASH

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

University of Miami: Scholarship Miami

KEGG Pathway co-membership and co-differential expression gene set networks.

Author: Jarrod A. Marto (104616)
Jignesh R. Parikh (136422)
Yu Xia (8822)

In the KEGG Pathway gene set networks, nodes represent KEGG Pathways; green nodes are metabolic pathways and purple nodes are non-metabolic pathways. A) The KEGG Pathway co-membership gene set network represents pathway crosstalk with an edge indicating a significant degree of crosstalk. B) The KEGG Pathway co-differential expression (co-DE) gene set network is constructed using the co-enrichment method applied to over five thousand differentially expressed gene lists derived from gene expression microarray data. C) A novel “Folding, Sorting, and Degradation” module is unique to the co-DE gene set network.

FigShare

Differences in co-differentially expressed chromosome loci with respect to 3D proximity and pathway participation.

Author: Jarrod A. Marto (104616)
Jignesh R. Parikh (136422)
Yu Xia (8822)

Data averages for same chromosome loci pairs are shown as green columns while data averages for different chromosome loci pairs are shown as purple columns. The columns are separated by whether the loci pairs are co-differentially expressed or not. A) The median number of Hi-C reads indicating contacts between pairs of chromosome loci in 3D space. B) The median pathway participation profile similarity between loci pairs computed based on the KEGG Pathway annotations of the corresponding genes. Asterisks (*) indicate a significant difference in values (Benjamini-corrected Wilcoxon-Mann-Whitney test p-value <0.05).
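
The caption does not specify which Benjamini procedure was applied to the Wilcoxon-Mann-Whitney p-values. As an illustrative sketch, assuming the Benjamini-Hochberg step-up correction is meant, adjusted p-values could be computed as follows. The function name and the input list are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value
    # is no larger than the adjusted values ranked after it.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

Under this assumption, a comparison would be marked with an asterisk when its adjusted p-value falls below 0.05.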

FigShare

Generalized gene set network construction methods.

Author: Jarrod A. Marto (104616)
Jignesh R. Parikh (136422)
Yu Xia (8822)

A) Co-membership gene set networks connect gene sets if there is significant overlap in the gene set members. B) Linkage gene set networks connect a pair of gene sets if there are a significant number of edges between the unique components of the gene sets in a reference single-biomolecule network. C) Co-enrichment gene set networks connect gene sets if there are a significant number of experiments where the unique components of the gene sets are enriched together. D) The application of each of the three gene set network methods to 8 different gene set collections; the number of gene sets in each collection is noted in parentheses. The Venn diagrams describe the overlap in gene set pairs (edges) between two or all three gene set networks per collection.
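
The co-enrichment rule in panel C can be sketched in a few lines. The caption does not name the significance test, so this sketch assumes a binomial tail test against independent enrichment. It also assumes that enrichment of each gene set's unique components has already been called per experiment. The function names, the `enriched` mapping, and the `alpha` threshold are hypothetical.

```python
from itertools import combinations
from math import comb


def binomial_sf(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def co_enrichment_edges(enriched, n_experiments, alpha=0.05):
    """Connect gene sets that are enriched together more often than expected.

    enriched: hypothetical dict mapping gene set name -> set of experiment
    ids in which that gene set (its unique components) was enriched.
    Returns a list of (set_a, set_b, p-value) edges.
    """
    edges = []
    for a, b in combinations(sorted(enriched), 2):
        both = len(enriched[a] & enriched[b])
        # Chance that both sets are enriched in one experiment if independent.
        p_joint = (len(enriched[a]) / n_experiments) * (len(enriched[b]) / n_experiments)
        pval = binomial_sf(both, n_experiments, p_joint)
        if both > 0 and pval < alpha:
            edges.append((a, b, pval))
    return edges
```

A real pipeline would also correct the edge p-values for the number of gene set pairs tested, which this sketch omits for brevity.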

FigShare

Most significant edge in each gene set network.

Author: Jarrod A. Marto (104616)
Jignesh R. Parikh (136422)
Yu Xia (8822)

Most significant edge in each gene set network.

FigShare