2,604 research outputs found

A unified framework for finding differentially expressed genes from microarray experiments

Author: C Tang
C Zhang
D Stekel
DL Davies
G Casella
G Getz
GJ McLachlan
H Hui-Huang
H Sahai
I Guyon
I Lonnstedt
IB Jeffery
J Shaik
Jahangheer S Shaik
JD Storey
Mohammed Yeasin
P Tamayo
RA Fisher
RL Fernando
RM Miller
RO Duda
S Mukherjee
S Tavazoie
T Li
TR Golub
U Alon
VG Tusher
X Chen
Y Benjamini
Y Su
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: This paper presents a unified framework for finding differentially expressed genes (DEGs) from microarray data. The proposed framework has three interrelated modules: (i) gene ranking, (ii) significance analysis of genes, and (iii) validation. The first module ranks the genes using two gene selection algorithms, namely (a) two-way clustering and (b) combined adaptive ranking. The second module converts the gene ranks into p-values using an R-test and fuses the two sets of p-values using Fisher's omnibus criterion. The DEGs are then selected using FDR analysis. The third module performs a three-fold validation of the obtained DEGs. The robustness of the proposed unified framework in gene selection is first illustrated using false discovery rate analysis. In addition, clustering-based validation of the DEGs is performed by applying an adaptive subspace-based clustering algorithm to the training and test datasets. Finally, a projection-based visualization is performed to validate the DEGs obtained using the unified framework.
Results: The performance of the unified framework is compared with well-known ranking algorithms such as t-statistics, Significance Analysis of Microarrays (SAM), Adaptive Ranking, Combined Adaptive Ranking, and Two-way Clustering. The performance curves obtained using 50 simulated microarray datasets, each following two different distributions, indicate the superiority of the unified framework over the other reported algorithms. Further analyses on 3 real cancer datasets and 3 Parkinson's datasets show a similar improvement in performance. First, a three-fold validation process is provided for the two-sample cancer datasets. In addition, the analysis of 3 sets of Parkinson's data demonstrates the scalability of the proposed method to multi-sample microarray datasets.
Conclusion: This paper presents a unified framework for the robust selection of genes from two-sample as well as multi-sample microarray experiments. The two different ranking methods used in module 1 bring diversity to the selection of genes. The conversion of ranks to p-values, the fusion of p-values, and FDR analysis aid in the identification of significant genes, which cannot be judged from gene ranking alone. The three-fold validation, namely robustness in the selection of genes using FDR analysis, clustering, and visualization, demonstrates the relevance of the DEGs. Empirical analyses on 50 artificial datasets and 6 real microarray datasets illustrate the efficacy of the proposed approach. The analyses on 3 cancer datasets demonstrate the utility of the proposed approach on microarray datasets with two classes of samples. The scalability of the proposed unified approach to multi-sample (more than two sample classes) microarray datasets is addressed using three sets of Parkinson's data. Empirical analyses show that the unified framework outperformed other gene selection methods in selecting differentially expressed genes from microarray data.
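
The fusion and selection steps named in the abstract (Fisher's omnibus criterion followed by FDR analysis) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `fisher_combine` and `benjamini_hochberg` are hypothetical, the rank-to-p-value conversion (the R-test) is not shown, the two p-values per gene are assumed to be independent, and the Benjamini-Hochberg procedure is assumed as the FDR method because the abstract does not say which one is used.

```python
import math


def fisher_combine(pvalues):
    """Combine independent p-values for one gene with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null, where k = len(pvalues).
    For even degrees of freedom the survival function has a closed form,
    so no external library is needed.
    """
    k = len(pvalues)
    # half_x is X / 2; clamp p-values away from 0 to keep log() defined
    half_x = -sum(math.log(max(p, 1e-300)) for p in pvalues)
    # P(chi2_{2k} > X) = exp(-X/2) * sum_{j<k} (X/2)^j / j!
    return math.exp(-half_x) * sum(
        half_x ** j / math.factorial(j) for j in range(k)
    )


def benjamini_hochberg(pvalues, q=0.05):
    """Return the indices of hypotheses rejected at FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # keep the largest rank whose p-value is under its step-up threshold
        if pvalues[idx] <= q * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Hypothetical p-values from two ranking methods for four genes
p_method_a = [0.001, 0.20, 0.03, 0.60]
p_method_b = [0.004, 0.35, 0.01, 0.80]
fused = [fisher_combine([a, b]) for a, b in zip(p_method_a, p_method_b)]
selected = benjamini_hochberg(fused, q=0.05)
```

Here `selected` holds the indices of the genes whose fused p-values pass the FDR threshold. Fisher's method assumes independent tests, so p-values derived from two rankings of the same data would in practice need that assumption checked.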

University of Memphis Digital Commons

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Bisociative knowledge discovery for microarray data analysis

Author: Gruden Kristina
Kulovesi Kimmo
Lavrac Nada
Motaln Helena
Mozetic Igor
Novak Petra Kralj
Petek Marko
Podpecan Vid
Toivonen Hannu
Publication venue: Department of Informatics Engineering University of Coimbra
Publication date: 01/01/2010

Peer reviewed

CiteSeerX

Helsingin yliopiston digitaalinen arkisto

Diverse correlation structures in gene expression data and their utility in improving statistical inference

Author: Klebanov Lev
Yakovlev Andrei
Publication venue: Institute of Mathematical Statistics
Publication date: 13/12/2007

It is well known that correlations in microarray data represent a serious nuisance that degrades the performance of gene selection procedures. This paper is intended to demonstrate that the correlation structure of microarray data provides a rich source of useful information. We discuss distinct correlation substructures revealed in microarray gene expression data by an appropriate ordering of genes. These substructures include stochastic proportionality of expression signals in a large percentage of all gene pairs, negative correlations hidden in ordered gene triples, and a long sequence of weakly dependent random variables associated with ordered pairs of genes. The reported striking regularities are of general biological interest, and they also have far-reaching implications for the theory and practice of statistical methods of microarray data analysis. We illustrate the latter point with a method for testing differential expression of nonoverlapping gene pairs. While designed for testing a different null hypothesis, this method provides an order of magnitude more accurate control of the type 1 error rate compared to conventional methods of individual gene expression profiling. In addition, this method is robust to technical noise. Quantitative inference of the correlation structure has the potential to extend the analysis of microarray data far beyond currently practiced methods. Comment: Published at http://dx.doi.org/10.1214/07-AOAS120 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

Crossref

Sequential stopping for high-throughput experiments

Author: Armstrong
Campo Dell'orto
D. Rossell
P. Muller
Tibshirani
Yang
Zien
Publication venue: Oxford University Press (OUP)
Publication date: 20/08/2012

In high-throughput experiments, the sample size is typically chosen informally. Most formal sample-size calculations depend critically on prior knowledge. We propose a sequential strategy that, by updating knowledge when new data are available, depends less critically on prior assumptions. Experiments are stopped or continued based on the potential benefits of obtaining additional data. The underlying decision-theoretic framework guarantees that the design proceeds in a coherent fashion. We propose intuitively appealing, easy-to-implement utility functions. As in most sequential design problems, an exact solution is prohibitive. We propose a simulation-based approximation that uses decision boundaries. We apply the method to RNA-seq, microarray, and reverse-phase protein array studies and show its potential advantages. The approach has been added to the Bioconductor package gaga.

Crossref

PubMed Central

Warwick Research Archives Portal Repository

Understanding pathways

Author: Soh Donny
Publication venue: Computing, Imperial College London
Publication date: 01/03/2011

The challenge with today's microarray experiments is to infer biological conclusions from them. There are two crucial difficulties to be surmounted in this challenge: (1) the lack of a suitable biological repository that can be easily integrated into computational algorithms, and (2) the inability of contemporary algorithms for analyzing microarray data to draw consistent biological results from diverse datasets of the same disease. To deal with the first difficulty, we believe a core database that unifies available biological repositories is important. Towards this end, we create a unified biological database from three popular biological repositories (KEGG, Ingenuity and Wikipathways). This database gives computer scientists the flexibility of easily integrating biological information using simple API calls or SQL queries. To deal with the second difficulty of deriving consistent biological results from the experiments, we first conceptualize the notion of “subnetworks”, which refers to a connected portion of a biological pathway. Then we propose a method that identifies subnetworks that are consistently expressed by patients of the same disease phenotype. We test our technique on independent datasets of several diseases, including ALL, DMD and lung cancer. For each of these diseases, we obtain two independent microarray datasets produced by distinct labs on distinct platforms. In each case, our technique consistently produces overlapping lists of significant nontrivial subnetworks from two independent sets of microarray data. The gene-level agreement of these significant subnetworks is between 66.67% and 91.87%. In contrast, when the same pairs of microarray datasets were analysed using GSEA and t-test, this percentage fell between 37% and 55.75% (GSEA) and between 2.55% and 19.23% (t-test). Furthermore, the genes selected using GSEA and t-test do not form subnetworks of substantial size.
Thus it is more probable that the subnetworks selected by our technique can provide the researcher with more descriptive information on the portions of the pathway that are actually associated with the disease. Keywords: pathway analysis, microarra

Spiral - Imperial College Digital Repository

An Automated Bayesian Framework for Integrative Gene Expression Analysis and Predictive Medicine

Author: Alterovitz Gil
Parikh Neena
Zollanvari Amin
Publication venue: American Medical Informatics Association
Publication date: 21/03/2013

Motivation: This work constructs a closed loop Bayesian Network framework for predictive medicine via integrative analysis of publicly available gene expression findings pertaining to various diseases. Results: An automated pipeline was successfully constructed. Integrative models were made based on gene expression data obtained from GEO experiments relating to four different diseases using Bayesian statistical methods. Many of these models demonstrated a high level of accuracy and predictive ability. The approach described in this paper can be applied to any complex disorder and can include any number and type of genome-scale studies

Harvard University - DASH

A statistical framework for the analysis of microarray probe-level data

Author: Irizarry Rafael A.
Wu Zhijin
Publication venue: Institute of Mathematical Statistics
Publication date: 13/12/2007

In microarray technology, a number of critical steps are required to convert the raw measurements into the data relied upon by biologists and clinicians. These data manipulations, referred to as preprocessing, influence the quality of the ultimate measurements and of the studies that rely upon them. The standard operating procedure for microarray researchers is to use preprocessed data as the starting point for the statistical analyses that produce reported results. This has prevented many researchers from carefully considering their choice of preprocessing methodology. Furthermore, the fact that the preprocessing step affects the stochastic properties of the final statistical summaries is often ignored. In this paper we propose a statistical framework that permits the integration of preprocessing into the standard statistical analysis flow of microarray data. This general framework is relevant to many microarray platforms and motivates targeted analysis methods for specific applications. We demonstrate its usefulness by applying the idea in three different applications of the technology. Comment: Published at http://dx.doi.org/10.1214/07-AOAS116 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

Crossref

Next station in microarray data analysis: GEPAS

Author: Al-Shahrour Fátima
Alloza Eva
Burguet Jordi
Conde Lucía
Dopazo Joaquín
Herrero Javier
Huerta-Cepas Jaime
Minguez Pablo
Montaner David
Mukherjee Sach
Pujana Miguel A. G.
Tárraga Joaquín
Valls Joan
Vaquerizas Juan M.
Vera Javier
Publication venue: Oxford University Press
Publication date: 01/01/2006

The Gene Expression Profile Analysis Suite (GEPAS) has been running for more than four years. During this time it has evolved to keep pace with the new interests and trends in the still-changing world of microarray data analysis. GEPAS has been designed to provide an intuitive yet powerful web-based interface that offers diverse analysis options, from the early step of preprocessing (normalization of Affymetrix and two-colour microarray experiments and other preprocessing options) to the final step of the functional annotation of the experiment (using Gene Ontology, pathways, PubMed abstracts etc.), and includes different possibilities for clustering, gene selection, class prediction and array-comparative genomic hybridization management. GEPAS is extensively used by researchers in many countries, and its records indicate an average usage rate of 400 experiments per day. The web-based pipeline for microarray gene expression data, GEPAS, is available at

Diposit Digital de la Universitat de Barcelona