
    A hidden spatial-temporal Markov random field model for network-based analysis of time course gene expression data

    Microarray time course (MTC) gene expression data are commonly collected to study the dynamic nature of biological processes. One important problem is to identify genes that show different expression profiles over time and pathways that are perturbed during a given biological process. While methods are available to identify genes with differential expression levels over time, there is a lack of methods that can incorporate pathway information when identifying the pathways being modified/activated during a biological process. In this paper we develop a hidden spatial-temporal Markov random field (hstMRF)-based method for identifying genes and subnetworks that are related to biological processes, where the dependency of the differential expression patterns of genes is modeled over time and over the network of pathways. Simulation studies indicated that the method is quite effective in identifying genes and modified subnetworks and has higher sensitivity than the commonly used procedures that do not use the pathway structure or time dependency information, with similar false discovery rates. Application to a microarray gene expression study of systemic inflammation in humans identified a core set of genes on the KEGG pathways that show clear differential expression patterns over time. In addition, the method confirmed that the TOLL-like signaling pathway plays an important role in immune response to endotoxins. (Comment: Published at http://dx.doi.org/10.1214/07-AOAS145 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).)
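The core idea of the abstract above, letting a gene's differential-expression call borrow strength from its pathway neighbors, can be illustrated with a minimal Markov-random-field label-smoothing step. This is a simplified iterated-conditional-modes sketch, not the paper's full hidden spatial-temporal model; the gene names, evidence scores, coupling parameter `beta`, and threshold are invented for illustration.

```python
# Minimal MRF label smoothing on a gene network (ICM-style sketch).
# Each gene has an evidence score (e.g. a z-statistic for differential
# expression); labels are smoothed toward agreement with pathway neighbors.

def icm_smooth(scores, edges, beta=1.0, threshold=1.0, iters=10):
    """Iterated conditional modes on a binary MRF.

    scores: {gene: evidence score}; edges: list of (gene, gene) pairs;
    beta: strength of neighbor agreement; threshold: evidence cutoff.
    """
    neighbors = {g: [] for g in scores}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    # Initialize labels from the evidence alone (1 = differentially expressed).
    labels = {g: int(s > threshold) for g, s in scores.items()}
    for _ in range(iters):
        changed = False
        for g in scores:
            # Local decision: evidence term plus neighbor-agreement term.
            support = sum(2 * labels[n] - 1 for n in neighbors[g])
            new = int(scores[g] - threshold + beta * support > 0)
            if new != labels[g]:
                labels[g], changed = new, True
        if not changed:
            break
    return labels

# A chain of genes: gene B has weak evidence on its own but is rescued by
# two strongly differential neighbors; gene D stays non-differential.
scores = {"A": 2.5, "B": 0.8, "C": 2.2, "D": -1.0}
edges = [("A", "B"), ("B", "C"), ("C", "D")]
print(icm_smooth(scores, edges))
```

Note how B flips to "differential" only because of its neighbors; thresholding each gene in isolation would miss it, which is the borrowing-of-strength effect the network model exploits.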

    Nonparametric false discovery rate control for identifying simultaneous signals

    It is frequently of interest to jointly analyze multiple sequences of multiple tests in order to identify simultaneous signals, defined as features tested in multiple studies whose test statistics are non-null in each. In many problems, however, the null distributions of the test statistics may be complicated or even unknown, and no existing procedures can be employed in these cases. This paper proposes a new nonparametric procedure that can identify simultaneous signals across multiple studies even without knowing the null distributions of the test statistics. The method is shown to asymptotically control the false discovery rate, and in simulations it had excellent power and error control. In an analysis of gene expression and histone acetylation patterns in the brains of mice exposed to a conspecific intruder, it identified genes that were both differentially expressed and next to differentially accessible chromatin. The proposed method is available in the R package at github.com/sdzhao/ssa.
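A common parametric baseline for this setting, useful for contrast with the nonparametric procedure described above, is to note that a feature can be a simultaneous signal only if it is significant in every study: combine each feature's p-values across studies via the maximum and apply Benjamini-Hochberg to the combined values. This conservative baseline is a sketch, not the paper's method, and the p-values below are made up.

```python
# Baseline simultaneous-signal detection: a feature qualifies only if it is
# significant in every study, so take the maximum p-value across studies and
# control the FDR on the combined values with Benjamini-Hochberg.

def benjamini_hochberg(pvals, alpha=0.05):
    """Return the set of indices rejected at FDR level alpha (BH step-up)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose p-value clears the BH line
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            k = rank
    return set(order[:k])

def simultaneous_signals(p_study1, p_study2, alpha=0.05):
    # max across studies: both studies must show a small p-value.
    combined = [max(a, b) for a, b in zip(p_study1, p_study2)]
    return benjamini_hochberg(combined, alpha)

p1 = [0.001, 0.002, 0.40, 0.03]
p2 = [0.004, 0.55, 0.001, 0.01]
print(sorted(simultaneous_signals(p1, p2)))
```

Note that this baseline requires valid p-values, i.e. known null distributions, which is exactly the assumption the paper's nonparametric procedure removes.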

    A Taxonomy of Big Data for Optimal Predictive Machine Learning and Data Mining

    Big data comes in various ways, types, shapes, forms and sizes. Indeed, almost all areas of science, technology, medicine, public health, economics, business, linguistics and social science are bombarded by ever increasing flows of data begging to be analyzed efficiently and effectively. In this paper, we propose a rough idea of a possible taxonomy of big data, along with some of the most commonly used tools for handling each particular category of bigness. The dimensionality p of the input space and the sample size n are usually the main ingredients in the characterization of data bigness. The specific statistical machine learning technique used to handle a particular big data set will depend on which category of the bigness taxonomy it falls into. Large p, small n data sets, for instance, require a different set of tools from the large n, small p variety. Among other tools, we discuss Preprocessing, Standardization, Imputation, Projection, Regularization, Penalization, Compression, Reduction, Selection, Kernelization, Hybridization, Parallelization, Aggregation, Randomization, Replication, Sequentialization. Indeed, it is important to emphasize right away that the so-called no free lunch theorem applies here, in the sense that there is no universally superior method that outperforms all other methods on all categories of bigness. It is also important to stress the fact that simplicity, in the sense of Ockham's razor and its non-plurality principle of parsimony, tends to reign supreme when it comes to massive data. We conclude with a comparison of the predictive performance of some of the most commonly used methods on a few data sets. (Comment: 18 pages, 2 figures, 3 tables)
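The n-versus-p characterization described in the abstract above can be made concrete with a small dispatcher. The category names, the cutoff on n, and the tool suggestions in the comments are illustrative stand-ins, not the paper's taxonomy.

```python
# Toy dispatcher for the n-vs-p characterization of data bigness.
# Category names, cutoffs, and the suggested tools are illustrative
# choices, not a definitive mapping.

def bigness_category(n, p):
    """Classify a data set by sample size n and input dimensionality p."""
    if p > n:
        return "large p, small n"  # e.g. regularization, selection, projection
    if n > 100_000:
        return "large n"           # e.g. compression, aggregation, parallelization
    return "classical"             # standard tools apply

print(bigness_category(50, 10_000))     # microarray-style data
print(bigness_category(5_000_000, 20))  # tall data
```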

    Listen to genes: dealing with microarray data in the frequency domain

    Background: We present a novel and systematic approach to analyze temporal microarray data. The approach includes normalization, clustering and network analysis of genes. Methodology: Genes are normalized using an error-model-based uniform normalization method aimed at identifying and estimating the sources of variation. The model minimizes the correlation among error terms across replicates. The normalized gene expressions are then clustered in terms of their power spectrum density. The method of complex Granger causality is introduced to reveal interactions between sets of genes. Complex Granger causality, along with partial Granger causality, is applied in both time and frequency domains to selected as well as all the genes to reveal interesting networks of interactions. The approach is successfully applied to Arabidopsis leaf microarray data generated from 31,000 genes observed at 22 time points over 22 days. Three circuits are analyzed in detail: a circadian gene circuit, an ethylene circuit, and a new global circuit showing a hierarchical structure that determines the initiators of leaf senescence. Conclusions: We use a totally data-driven approach to form biological hypotheses. Clustering using power-spectrum analysis helps us identify genes of potential interest. Their dynamics can be captured accurately in the time and frequency domains using the methods of complex and partial Granger causality. With the rise in availability of temporal microarray data, such methods can be useful tools in uncovering hidden biological interactions. We show our method step by step with the help of toy models as well as a real biological dataset. We also analyze three distinct gene circuits of potential interest to Arabidopsis researchers.
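The Granger-causality idea underlying the abstract above can be sketched in its ordinary bivariate, time-domain form: x "Granger-causes" y if adding lagged values of x to an autoregression of y on its own past reduces the residual variance. This is a simplified stand-in for the complex and partial Granger causality used in the paper; the synthetic series and lag order below are invented for illustration.

```python
import numpy as np

# Ordinary bivariate Granger causality (time domain): compare the residual
# variance of y regressed on its own lags (restricted) against y regressed
# on its own lags plus lags of x (full). A positive log-ratio suggests x
# helps predict y beyond y's own history.

def granger_index(x, y, lag=2):
    rows_r, rows_f, target = [], [], []
    for t in range(lag, len(y)):
        rows_r.append(y[t - lag:t])                             # y lags only
        rows_f.append(np.concatenate([y[t - lag:t], x[t - lag:t]]))  # + x lags
        target.append(y[t])
    A_r, A_f, b = np.asarray(rows_r), np.asarray(rows_f), np.asarray(target)
    res_r = b - A_r @ np.linalg.lstsq(A_r, b, rcond=None)[0]
    res_f = b - A_f @ np.linalg.lstsq(A_f, b, rcond=None)[0]
    return float(np.log(res_r.var() / res_f.var()))

# Synthetic example: y is driven by lagged x, with no feedback.
rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.8 * x[t - 1] + 0.1 * rng.standard_normal()

print(granger_index(x, y))  # clearly positive: x -> y
print(granger_index(y, x))  # near zero: no feedback
```

The frequency-domain variants in the paper decompose this same predictability gain across frequencies, which is what lets power-spectrum clusters and causal structure be examined together.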

    Random Forests: some methodological insights

    This paper examines, from an experimental perspective, random forests, the increasingly used statistical method for classification and regression problems introduced by Leo Breiman in 2001. It first aims at confirming known but sparse advice for using random forests and at proposing some complementary remarks, both for standard problems and for high-dimensional ones in which the number of variables hugely exceeds the sample size. But the main contribution of this paper is twofold: to provide some insights into the behavior of the variable importance index based on random forests and, in addition, to investigate two classical variable selection problems. The first is to find important variables for interpretation; the second is more restrictive and tries to design a good prediction model. The strategy involves a ranking of explanatory variables using the random forests importance score and a stepwise ascending variable introduction strategy.
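The selection strategy described above, rank variables by importance and introduce them one at a time, can be sketched as follows. The importance scores and the error oracle here are toy stand-ins; in the paper the ranking comes from the random-forests importance index and the error from out-of-bag estimates.

```python
# Sketch of a stepwise ascending variable-introduction strategy: rank
# variables by an importance score, then add them one at a time, keeping
# a variable only if it lowers the prediction error.

def stepwise_ascending(importances, error_of, tol=0.0):
    """importances: {var: score}; error_of: callable(set of vars) -> error."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    selected = set()
    best = error_of(selected)
    for var in ranked:
        err = error_of(selected | {var})
        if err < best - tol:  # keep the variable only if error drops
            selected.add(var)
            best = err
    return selected, best

# Toy error oracle: only x1 and x3 are truly informative; x2 is a
# high-importance but redundant variable that gets filtered out.
def toy_error(vars_):
    return 1.0 - 0.4 * ("x1" in vars_) - 0.3 * ("x3" in vars_)

imp = {"x1": 0.9, "x2": 0.5, "x3": 0.4, "x4": 0.1}
print(stepwise_ascending(imp, toy_error))
```

The `tol` parameter mirrors the practical need to ignore tiny error fluctuations when deciding whether a newly introduced variable genuinely helps prediction.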