
EPEPT: A web service for enhanced P-value estimation in permutation tests

Author: A Subramanian
B Efron
E Edgington
G Benson
Hector Rovira
Ilya Shmulevich
J Boyle
Jake Lin
John Boyle
R Deidda
TA Knijnenburg
Theo A Knijnenburg
VG Tusher
Publication venue: BioMed Central
Publication date: 01/01/2011

arXiv.org e-Print Archive

ProbCD: enrichment analysis accounting for categorization uncertainty

Author: A Lewin
A Vinayagam
B Engelhardt
C Andersson
C Jones
D Martin
E Levy
I Rivals
Ilya Shmulevich
J Goeman
L Goodman
M Aubry
P Shannon
R Fisher
R Sealfon
R Vencio
Ricardo ZN Vêncio
S Carroll
S Maere
T Joshi
W Zhang
Z Jiang
Publication date: 01/01/2007

As in many other areas of science, systems biology makes extensive use of statistical association and significance estimates in contingency tables, a type of categorical data analysis known in this field as enrichment (also over-representation or enhancement) analysis. In spite of efforts to create probabilistic annotations, especially in the Gene Ontology context, or to deal with uncertainty in high-throughput-based datasets, current enrichment methods largely ignore this probabilistic information since they are mainly based on variants of the Fisher Exact Test. We developed an open-source R package to deal with probabilistic categorical data analysis, ProbCD, that does not require a static contingency table. The contingency table for the enrichment problem is built using the expectation of a Bernoulli Scheme stochastic process given the categorization probabilities. An on-line interface was created to allow usage by non-programmers and is available at: http://xerad.systemsbiology.net/ProbCD/. We present an analysis framework and software tools to address the issue of uncertainty in categorical data analysis. In particular, concerning the enrichment analysis, ProbCD can accommodate: (i) the stochastic nature of the high-throughput experimental techniques and (ii) probabilistic gene annotation.
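The idea of an expected contingency table can be illustrated with a minimal Python sketch. This is not ProbCD's code (ProbCD is an R package, and its interface is not shown in this abstract). The function name `expected_contingency_table`, its arguments, and the assumption that selection and annotation are independent Bernoulli events for each gene are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def expected_contingency_table(
    p_selected: Sequence[float], p_annotated: Sequence[float]
) -> List[List[float]]:
    """Expected 2x2 table from per-gene categorization probabilities.

    Assumption (not taken from the abstract): for each gene, "selected"
    and "annotated" are independent Bernoulli events, so the expected
    count in each cell is the sum of the per-gene joint probabilities.

    Rows: selected / not selected. Columns: annotated / not annotated.
    """
    if len(p_selected) != len(p_annotated):
        raise ValueError("both probability lists must have the same length")

    table = [[0.0, 0.0], [0.0, 0.0]]
    for p, q in zip(p_selected, p_annotated):
        if not (0.0 <= p <= 1.0 and 0.0 <= q <= 1.0):
            raise ValueError("probabilities must lie in [0, 1]")
        table[0][0] += p * q                # selected and annotated
        table[0][1] += p * (1 - q)          # selected, not annotated
        table[1][0] += (1 - p) * q          # not selected, annotated
        table[1][1] += (1 - p) * (1 - q)    # neither
    return table


# Hard 0/1 memberships reduce to the usual static count table.
hard_table = expected_contingency_table([1, 0, 1, 0], [1, 1, 0, 0])

# Uncertain memberships give fractional expected counts.
soft_table = expected_contingency_table([0.9, 0.2, 0.5], [0.5, 0.5, 1.0])
```

With probabilities restricted to 0 and 1, the sketch reproduces an ordinary count table, which is the static case that Fisher-type tests assume. In every case the four cells sum to the number of genes.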

CiteSeerX

arXiv.org e-Print Archive

Nature Precedings

Simcluster: clustering enumeration gene expression data on the simplex space

Author: Carlos A de B Pereira
E Dougherty
G Stolovitzky
H Thygesen
Helena Brentani
I Braslavsky
Ilya Shmulevich
J Aitchison
K Okubo
L Cai
L Hood
Leonardo Varuzza
M Bainbridge
M Brun
M de Hoon
M Gilchrist
M Margulies
M Schena
N Bolshakova
R Loganantharaj
R Page
R Vencio
RF Service
Ricardo ZN Vêncio
S Audic
S Brenner
S Datta
S Fodor
T Seo
V Velculescu
Publication date: 01/01/2007

Transcript enumeration methods such as SAGE, MPSS, and sequencing-by-synthesis EST "digital northern" are important high-throughput techniques for digital gene expression measurement. Like other counting or voting processes, these measurements constitute compositional data exhibiting properties particular to the simplex space, where the summation of the components is constrained. These properties are not present in regular Euclidean spaces, on which hybridization-based microarray data are often modeled. Therefore, pattern recognition methods commonly used for microarray data analysis may be non-informative for the data generated by transcript enumeration techniques, since they ignore certain fundamental properties of this space.

Here we present a software tool, Simcluster, designed to perform clustering analysis for data on the simplex space. We present Simcluster as a stand-alone command-line C package and as a user-friendly on-line tool. Both versions are available at: http://xerad.systemsbiology.net/simcluster.

Simcluster is designed in accordance with a well-established mathematical framework for compositional data analysis, which provides principled procedures for dealing with the simplex space, and is thus applicable in a number of contexts, including enumeration-based gene expression data.
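To make the contrast between simplex and Euclidean geometry concrete, the following Python sketch computes the Aitchison distance, a standard distance in compositional data analysis, using the centered log-ratio (clr) transform. It is an illustration only: the abstract does not state which distance Simcluster uses, and the function names `closure`, `clr` and `aitchison_distance` are hypothetical. The sketch also assumes strictly positive counts, so zero counts would need separate treatment.

```python
import math
from typing import List, Sequence


def closure(counts: Sequence[float]) -> List[float]:
    """Rescale positive counts so they sum to 1 (a point on the simplex)."""
    if any(c <= 0 for c in counts):
        raise ValueError("this sketch requires strictly positive counts")
    total = sum(counts)
    return [c / total for c in counts]


def clr(composition: Sequence[float]) -> List[float]:
    """Centered log-ratio transform: log of each part minus the mean log."""
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def aitchison_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance between the clr-transformed compositions."""
    if len(x) != len(y):
        raise ValueError("compositions must have the same number of parts")
    cx, cy = clr(closure(x)), clr(closure(y))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(cx, cy)))


# Two libraries with the same relative abundances but different depths.
same_profile = aitchison_distance([10, 20, 70], [1, 2, 7])

# Two libraries with different relative abundances.
different_profile = aitchison_distance([10, 20, 70], [70, 20, 10])
```

Because the counts are closed to proportions before the transform, two libraries that differ only in sequencing depth are at distance zero, whereas a plain Euclidean distance on raw counts would separate them.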

CiteSeerX

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

RCAAP - Repositório Científico de Acesso Aberto de Portugal

Universidade de São Paulo

Nature Precedings

Adaptable data management for systems biology investigations

Author: B Marzolf
Brazma Aea
C Bachman
CGJ Plaisant
Chris Cavnor
D Kelly
David Burdick
G Fischer
GLJ Kiczales
Hector Rovira
I Goldberg
Ilya Shmulevich
J Boyle
J Saltz
John Boyle
L Hood
LSP Haas
MCX Li
MLT Reich
Sarah Killcoyne
TUA Etzold
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Within research, each experiment is different, the focus changes, and the data are generated from a continually evolving barrage of technologies. There is a continual introduction of new techniques whose usage ranges from in-house protocols through to high-throughput instrumentation. To support these requirements, data management systems are needed that can be rapidly built and readily adapted for new usage.

Results: The adaptable data management system discussed is designed to support the seamless mining and analysis of biological experiment data that is commonly used in systems biology (e.g. ChIP-chip, gene expression, proteomics, imaging, flow cytometry). We use different content graphs to represent different views upon the data. These views are designed for different roles: equipment-specific views are used to gather instrumentation information; data-processing-oriented views are provided to enable the rapid development of analysis applications; and research-project-specific views are used to organize information for individual research experiments. This management system allows for both the rapid introduction of new types of information and the evolution of the knowledge it represents.

Conclusion: Data management is an important aspect of any research enterprise. It is the foundation on which most applications are built, and must be easily extended to serve new functionality for new scientific areas. We have found that adopting a three-tier architecture for data management, built around distributed standardized content repositories, allows us to rapidly develop new applications to support a diverse user community.

SEQADAPT: an adaptable system for the tracking, storage and analysis of high throughput sequencing experiments

Author: A Mortazavi
B Marzolf
Bruz Marzolf
Chris C Cavnor
David B Burdick
H Grosshans
H Ji
Hector Rovira
Ilya Shmulevich
J Boyle
J Eid
J Taylor
Jake Lin
Jeremy Handcock
John Boyle
L Jorgensen
ME Dinger
P Parameswaran
R Cronn
RJ Taft
Ryan Bressler
Sarah Killcoyne
Stephen A Ramsey
T Werner
Y Zhang
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: High-throughput sequencing has become an increasingly important tool for biological research. However, the existing software systems for managing and processing these data have not provided the flexible infrastructure that research requires.

Results: Existing software solutions provide static and well-established algorithms in a restrictive package. However, as high-throughput sequencing is a rapidly evolving field, such static approaches lack the ability to readily adopt the latest advances and techniques, which are often required by researchers. We have used a loosely coupled, service-oriented infrastructure to develop SeqAdapt. This system streamlines data management and allows for rapid integration of novel algorithms. Our approach also allows computational biologists to focus on developing and applying new methods instead of writing boilerplate infrastructure code.

Conclusion: The system is based around the Addama service architecture and is available at our website as a demonstration web application, an installable single download, and a collection of individual customizable services.

The cancer genome atlas pan-cancer analysis project

Author: Collisson Eric A.
Ellrott Kyle
Mills Shaw Kenna R.
Mills Gordon B.
Ozenberger Brad A.
Sander Chris
Shmulevich Ilya
Stuart Joshua M.
The Cancer Genome Atlas Research Network
Weinstein John N.
Publication date: 01/01/2013

The Cancer Genome Atlas (TCGA) Research Network has profiled and analyzed large numbers of human tumors to discover molecular aberrations at the DNA, RNA, protein and epigenetic levels. The resulting rich data provide a major opportunity to develop an integrated picture of commonalities, differences and emergent themes across tumor lineages. The Pan-Cancer initiative compares the first 12 tumor types profiled by TCGA. Analysis of the molecular aberrations and their functional roles across tumor types will teach us how to extend therapies effective in one cancer type to others with a similar genomic profile.

Carolina Digital Repository

Trade-off between Responsiveness and Noise Suppression in Biomolecular System Responses to Environmental Cues

Author: Alexander V. Ratushny
Andrey Rzhetsky
AV Oppenheim
AV Ratushny
B Ren
BW Andrews
D Hwang
D Lockshon
HC Berg
HW Platta
Ilya Shmulevich
JJ Smith
John D. Aitchison
JT Mettetal
M Acar
M Nykter
MG Koerkamp
MR Bennett
P Hersen
RB Northrop
RE Kalman
S Mangan
SA Ramsey
T Ideker
TJ Perkins
V Litvak
Publication venue: Public Library of Science
Publication date: 01/01/2011

When living systems detect changes in their external environment, their response must be measured to balance the need to react appropriately with the need to remain stable, ignoring insignificant signals. Because this is a fundamental challenge of all biological systems that execute programs in response to stimuli, we developed a generalized time-frequency analysis (TFA) framework to systematically explore the dynamical properties of biomolecular networks. Using TFA, we focused on two well-characterized yeast gene regulatory networks responsive to carbon-source shifts and a mammalian innate immune regulatory network responsive to lipopolysaccharides (LPS). The networks are composed of two different basic architectures. Dual positive and negative feedback loops make up the yeast galactose network, whereas overlapping positive and negative feed-forward loops are common to the yeast fatty-acid response network and the LPS-induced network of macrophages. TFA revealed remarkably distinct network behaviors in terms of trade-offs in responsiveness and noise suppression that are appropriately tuned to each biological response. The wild-type galactose network was found to be highly responsive, while the oleate network has greater noise suppression ability. The LPS network appeared more balanced, exhibiting less bias toward noise suppression or responsiveness. Exploration of the network parameter space exposed dramatic differences in system behaviors for each network. These studies highlight fundamental structural and dynamical principles that underlie each network, reveal constrained parameters of positive and negative feedback and feed-forward strengths that tune the networks appropriately for their respective biological roles, and demonstrate the general utility of the TFA approach for systems and synthetic biology.

CiteSeerX

Public Library of Science (PLOS)