
370,194 research outputs found

Independent component analysis of Alzheimer's DNA microarray gene expression data

Author: Chen Zhongxue
Huang Xudong
Kong Wei
Liu Qingzhong
Mou Xiaoyang
Rogers Jack T
Vanderburg Charles R
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract

Background: Gene microarray technology is an effective tool for investigating the simultaneous activity of multiple cellular pathways across hundreds to thousands of genes. However, the large volumes of data generated by DNA microarrays are usually complex, noisy, high-dimensional, and often hindered by low statistical power, which makes them difficult to exploit. Two unsupervised analysis methods have been developed to address these problems: principal component analysis (PCA) and independent component analysis (ICA). PCA projects the data into a new space spanned by mutually orthonormal principal components. The constraint of mutual orthogonality and the reliance on second-order statistics within PCA algorithms, however, may not suit the biological systems studied. Extracting and characterizing the most informative features of biological signals requires higher-order statistics.

Results: ICA is an unsupervised algorithm that can extract higher-order statistical structure from data and has been applied to DNA microarray gene expression data analysis. We applied the FastICA method to DNA microarray gene expression data from Alzheimer's disease (AD) hippocampal tissue samples and performed the subsequent gene clustering. Experimental results showed that the ICA method can improve the clustering results of AD samples and identify significant genes. More than 50 significant genes with high expression levels in severe AD were extracted, representing immunity-related protein, metal-related protein, membrane protein, lipoprotein, neuropeptide, cytoskeleton protein, cellular binding protein, and ribosomal protein. Within the aforementioned categories, our method also found 37 significant genes with low expression levels. Moreover, it is worth noting that some oncogenes and phosphorylation-related proteins are expressed at low levels. In comparison to the PCA and support vector machine recursive feature elimination (SVM-RFE) methods, which are widely used in microarray data analysis, ICA can identify more AD-related genes. Furthermore, we have validated and identified many genes that are associated with AD pathogenesis.

Conclusion: We demonstrated that ICA exploits higher-order statistics to identify gene expression profiles as linear combinations of elementary expression patterns that lead to the construction of potential AD-related pathogenic pathways. Our computational results also validated that the ICA model outperformed PCA and the SVM-RFE method. This report shows that ICA as a microarray data analysis tool can help us to elucidate the molecular taxonomy of AD and other multifactorial and polygenic complex diseases.
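The abstract's distinction between second-order and higher-order statistics can be illustrated with a minimal sketch. It is not the authors' pipeline: the two synthetic signals, the sample size, and the helper names `standardize` and `excess_kurtosis` are assumptions made here for illustration. Both signals are rescaled to unit variance, so a purely second-order summary cannot tell them apart, while excess kurtosis (a fourth-order statistic and one common measure of non-Gaussianity used by ICA methods) separates them clearly.

```python
import math
import random

def standardize(xs):
    """Rescale a sample to zero mean and unit variance."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / sd for x in xs]

def excess_kurtosis(xs):
    """Fourth standardized moment minus 3 (zero for a Gaussian)."""
    z = standardize(xs)
    return sum(v ** 4 for v in z) / len(z) - 3.0

# Hypothetical synthetic signals, not microarray data.
rng = random.Random(0)
n = 20000
uniform = [rng.uniform(-1.0, 1.0) for _ in range(n)]
# The difference of two exponentials follows a Laplace (heavy-tailed) law.
laplace = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)]

for name, sample in (("uniform", uniform), ("laplace", laplace)):
    z = standardize(sample)
    variance = sum(v * v for v in z) / len(z)  # 1.0 by construction
    print(name, round(variance, 3), round(excess_kurtosis(sample), 2))
```

After standardization the variances agree, but the uniform signal has negative excess kurtosis and the Laplace signal has positive excess kurtosis, which is the kind of structure a second-order method such as PCA does not use.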

Crossref

Scholarly Works @ SHSU (Sam Houston State University)

Harvard University - DASH

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

A novel dimensionality reduction technique based on independent component analysis for modeling microarray gene expression data

Author: Kustra Rafal
Liu Han
Zhang Ji
Publication venue: CSREA Press
Publication date: 01/01/2004

DNA microarray experiments, which generate thousands of gene expression measurements, are being used to gather information from tissue and cell samples about gene expression differences that will be useful in diagnosing disease. One challenge of microarray studies is that the number n of samples collected is small relative to the number p of genes per sample, which is usually in the thousands. In statistical terms, this very large number of predictors compared to a small number of samples or observations makes the classification problem difficult. This is known as the "curse of dimensionality". An efficient way to address this problem is to use dimensionality reduction techniques. Principal Component Analysis (PCA) is a leading method for dimensionality reduction of gene expression data and is optimal in the sense of least-squares error. In this paper we propose a new dimensionality reduction technique for specific bioinformatics applications based on Independent Component Analysis (ICA). Because it can exploit higher-order statistics to identify a linear model, this ICA-based dimensionality reduction technique outperforms PCA in both statistical and biological significance. We present experiments on the NCI 60 dataset to show this result.
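As a rough illustration of the PCA baseline in the setting where n is much smaller than p, the sketch below finds the leading principal axis by power iteration on the small n x n Gram matrix rather than the p x p covariance matrix. It is a toy under stated assumptions: the 4 x 6 data matrix, the `trend` vector, and the function names are hypothetical, and it does not reproduce the paper's ICA-based method or its NCI 60 experiments.

```python
import math

def center_columns(X):
    """Subtract each gene's (column's) mean across samples."""
    n, p = len(X), len(X[0])
    means = [sum(row[k] for row in X) / n for k in range(p)]
    return [[row[k] - means[k] for k in range(p)] for row in X]

def first_principal_axis(Xc, iters=200):
    """Leading principal axis via power iteration on the n x n Gram matrix."""
    n, p = len(Xc), len(Xc[0])
    gram = [[sum(Xc[i][k] * Xc[j][k] for k in range(p)) for j in range(n)]
            for i in range(n)]
    u = [float(i + 1) for i in range(n)]  # arbitrary non-constant start
    for _ in range(iters):
        w = [sum(gram[i][j] * u[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        u = [x / norm for x in w]
    # Map the sample-space eigenvector back to gene space and normalize.
    v = [sum(Xc[i][k] * u[i] for i in range(n)) for k in range(p)]
    v_norm = math.sqrt(sum(x * x for x in v))
    return [x / v_norm for x in v]

# Hypothetical toy data: 4 samples, 6 genes, one dominant trend plus small wiggle.
trend = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0]
X = [[5.0 + t * trend[k] + 0.05 * math.sin(7 * t + 3 * k) for k in range(6)]
     for t in range(4)]

Xc = center_columns(X)
axis = first_principal_axis(Xc)
scores = [sum(row[k] * axis[k] for k in range(6)) for row in Xc]
total = sum(x * x for row in Xc for x in row)
explained = sum(s * s for s in scores) / total
print("fraction of variation on first axis:", round(explained, 4))
```

Working with the Gram matrix keeps the eigenproblem at size n, which is the practical reason this trick is used when samples are far fewer than genes.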

University of Southern Queensland ePrints

Multiclass microarray gene expression classification based on fusion of correlation features

Author: Chetty Girija
Chetty Madhu
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2010

In this paper, we propose novel algorithmic models based on fusion of independent and correlated gene features for multiclass microarray gene expression classification. It is possible for genes to be co-expressed via different pathways. Moreover, a gene may or may not be co-active for all samples. We approach this problem with an optimal feature selection technique that uses statistical analysis to model the complex interactions between genes. Two types of correlation modelling techniques, based on cross-modal factor analysis (CFA) and canonical correlation analysis (CCA), were examined. The subsequent fusion of CCA/CFA features with principal component analysis (PCA) features, at feature level and at score level, results in a significant enhancement in classification accuracy for different multiclass microarray gene expression data sets.

Crossref

University of Canberra Research Repository

Federation ResearchOnline

Accurate modeling of confounding variation in eQTL studies leads to a great increase in power to detect trans-regulatory effects

Author: Neil Lawrence
Nicolo Fusi
Oliver Stegle
Publication venue
Publication date: 02/06/2011

Expression quantitative trait loci (eQTL) studies are an integral tool to investigate the genetic component of gene expression variation. A major challenge in the analysis of such studies is hidden confounding factors, such as unobserved covariates or unknown environmental influences. These factors can induce a pronounced artifactual correlation structure in the expression profiles, which may create spurious associations or mask real genetic association signals.

Here, we report PANAMA (Probabilistic ANAlysis of genoMic dAta), a novel probabilistic model to account for confounding factors within an eQTL analysis. In contrast to previous methods, PANAMA learns hidden factors jointly with the effect of prominent genetic regulators. As a result, PANAMA can more accurately distinguish between true genetic association signals and confounding variation.

We applied our model and compared it to existing methods on a variety of datasets and biological systems. PANAMA consistently performs better than alternative methods and, in particular, finds substantially more trans regulators. Importantly, PANAMA not only identifies a greater number of associations, but also yields hits that are biologically more plausible and can be better reproduced between independent studies.

Nature Precedings

Multivariate curve resolution of time course microarray data

Author: A de Juan
BM Kim
Christopher P Allen
DM Rocke
DP Kreil
E Segal
ER Malinowski
H Kim
JH Jiang
L Liu
M Juanita Martinez
M Schuemans
M Van Benthem
Margaret Werner-Washburne
MCK Yang
MN Leger
NS Holter
O Alter
OS Borgen
P Lu
PD Wentzell
Peter D Wentzell
PJ Gemperline
PT Spellman
R Rajkó
R Tauler
S Bergmann
S Raychaudhuri
S Van Huffel
SI Lee
Sushmita Roy
T Ideker
Tobias K Karakach
W Huber
W Liebermeister
W Windig
WH Lawton
X Cui
Y Chen
Z Bar-Joseph
Publication venue: BioMed Central
Publication date: 01/07/2006

BACKGROUND: Modeling of gene expression data from time course experiments often involves the use of linear models such as those obtained from principal component analysis (PCA), independent component analysis (ICA), or other methods. Such methods do not generally yield factors with a clear biological interpretation. Moreover, implicit assumptions about the measurement errors often limit the application of these methods to log-transformed data, destroying linear structure in the untransformed expression data. RESULTS: In this work, a method for the linear decomposition of gene expression data by multivariate curve resolution (MCR) is introduced. The MCR method is based on an alternating least-squares (ALS) algorithm implemented with a weighted least squares approach. The new method, MCR-WALS, extracts a small number of basis functions from untransformed microarray data using only non-negativity constraints. Measurement error information can be incorporated into the modeling process and missing data can be imputed. The utility of the method is demonstrated through its application to yeast cell cycle data. CONCLUSION: Profiles extracted by MCR-WALS exhibit a strong correlation with cell cycle-associated genes, but also suggest new insights into the regulation of those genes. The unique features of the MCR-WALS algorithm are its freedom from assumptions about the underlying linear model other than the non-negativity of gene expression, its ability to analyze non-log-transformed data, and its use of measurement error information to obtain a weighted model and accommodate missing measurements.
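The combination of alternating least squares, weighting, non-negativity, and missing-data imputation described above can be sketched in a deliberately reduced form. The following is a rank-1 toy, not the MCR-WALS algorithm itself: the function name, the 3 x 4 data matrix, and the 0/1 weights are assumptions made for illustration, whereas the published method extracts several components and derives weights from measurement error information. A weight of zero marks a missing measurement, which is then imputed from the fitted model.

```python
def weighted_nonneg_als_rank1(X, W, iters=50):
    """Fit X[i][j] ~ c[i] * s[j] with c, s >= 0 by weighted alternating least squares."""
    n, p = len(X), len(X[0])
    c = [1.0] * n
    s = [1.0] * p
    for _ in range(iters):
        # Update each profile value s[j] with c held fixed, then clip at zero.
        for j in range(p):
            num = sum(W[i][j] * X[i][j] * c[i] for i in range(n))
            den = sum(W[i][j] * c[i] ** 2 for i in range(n))
            s[j] = max(0.0, num / den) if den > 0 else 0.0
        # Update each contribution c[i] with s held fixed, then clip at zero.
        for i in range(n):
            num = sum(W[i][j] * X[i][j] * s[j] for j in range(p))
            den = sum(W[i][j] * s[j] ** 2 for j in range(p))
            c[i] = max(0.0, num / den) if den > 0 else 0.0
    return c, s

# Hypothetical noise-free data: outer product of [1, 2, 3] and [0.5, 1, 2, 4].
X = [[0.5, 1.0, 2.0, 4.0],
     [1.0, 2.0, 0.0, 8.0],   # the 0.0 is a placeholder for a missing value
     [1.5, 3.0, 6.0, 12.0]]
W = [[1.0, 1.0, 1.0, 1.0],
     [1.0, 1.0, 0.0, 1.0],   # zero weight excludes the missing entry from the fit
     [1.0, 1.0, 1.0, 1.0]]

c_hat, s_hat = weighted_nonneg_als_rank1(X, W)
imputed = c_hat[1] * s_hat[2]
print("imputed value for the missing entry:", round(imputed, 6))
```

For a single component, clipping each scalar update at zero gives the exact non-negative least-squares solution of that subproblem; with several components a proper non-negative least-squares solver would be needed instead.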

Crossref

Directory of Open Access Journals

PubMed Central

Explorative data analysis of MCL reveals gene expression networks implicated in survival and prognosis supported by explorative CGH analysis

Author: A Alizadeh
A Rosenwald
A Sala
Andreas Rosenwald
B Lapeyre
B Stefansson
C Bogner
C Norbury
C Schrader
C Weiss
CJF Ter Braak
D Monk
DL DeWitt
E Aleem
E Grinstein
F Bosch
F Mueller-Pillasch
F Rubio-Moscardo
F Wilcoxon
FW Wiese
G Ambrosini
G Maga
GA Velders
GK Smyth
Hans K Müller-Hermelink
HR Herschman
I Salaverria
I Wittke
J Bond
J Golay
J Li
J Nielsen
J Oksanen
Julia C Engelmann
Jörg Schultz
K Milde-Langosch
K Monica
KA Schafer
KB Marcu
LH Argatoff
M Derenzini
M Hartl
M Malumbres
M Srivastava
M Weniger
MA Lampson
Markus Weniger
ME Crosby
N Sethi
P Andersen
PS Knoepfler
R Development Core Team
R Gentleman
R Raty
R Starr
RM Garavito
RN Eisenman
S Akira
S Horstmann
S Pelengaris
S Peri
Stefan Pinkert
Steffen Blenk
T Hla
T Katzenberger
T Therneau
Thomas Dandekar
TJ Yen
Tobias Müller
V Cesi
VA Beardmore
WN Venables
Y Benjamini
Z Tang
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract

Background: Mantle cell lymphoma (MCL) is an incurable B cell lymphoma and accounts for 6% of all non-Hodgkin's lymphomas. On the genetic level, MCL is characterized by the hallmark translocation t(11;14), which is present in most cases with few exceptions. Both gene expression and comparative genomic hybridization (CGH) data vary considerably between patients, with implications for their prognosis.

Methods: We compare patients above and below the median survival. Exploratory principal component analysis of gene expression data showed that the second principal component correlates well with patient survival. Explorative analysis of CGH data shows the same correlation.

Results: On chromosomes 7 and 9, specific genes and bands are delineated which improve prognosis prediction independent of the previously described proliferation signature. We identify a compact survival predictor of seven genes for MCL patients. After extensive re-annotation using GEPAT, we established protein networks correlating with prognosis. Well-known genes (CDC2, CCND1) and further proliferation markers (WEE1, CDC25, aurora kinases, BUB1, PCNA, E2F1) form a tight interaction network, but non-proliferative genes (SOCS1, TUBA1B, CEBPB) are also shown to be associated with prognosis. Furthermore, we show that aggressive MCL implicates a gene network shift to higher expressed genes in late cell cycle states, and we refine the set of non-proliferative genes implicated in poor prognosis in MCL.

Conclusion: The results from explorative data analysis of gene expression and CGH data are complementary to each other. Including further tests such as the Wilcoxon rank test, we point to both proliferative and non-proliferative gene networks implicated in inferior prognosis of MCL and identify suitable markers in both gene expression and CGH data.

University of Regensburg Publication Server

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central