Analysis of Gene Expression Data Using BRB-Array Tools
BRB-ArrayTools is an integrated software system for the comprehensive analysis of DNA microarray experiments. It was developed by professional biostatisticians experienced in the design and analysis of DNA microarray studies and incorporates methods developed by leading statistical laboratories. The software is designed for biomedical scientists who want access to state-of-the-art statistical methods for the analysis of gene expression data and training in the statistical analysis of high-dimensional data. The software provides the most extensive set of tools available for predictive classifier development and complete cross-validation. It offers extensive links to genomic websites for gene annotation and analysis tools for pathway analysis. An archive of over 100 datasets of published microarray data with associated clinical data is provided, and BRB-ArrayTools automatically imports data from the Gene Expression Omnibus public archive at the National Center for Biotechnology Information.
Robust Group-Level Inference in Neuroimaging Genetic Studies
Gene-neuroimaging studies involve high-dimensional data that have a complex statistical structure and that are likely to be contaminated with outliers. Robust, outlier-resistant methods are an alternative to prior outlier removal, which is a difficult task in high-dimensional unsupervised settings. In this work, we consider robust regression and its application to neuroimaging through an example gene-neuroimaging study on a large cohort of 300 subjects. We use randomized brain parcellation to sample a set of adapted low-dimensional spatial models to analyse the data. We combine this approach with robust regression in an analysis method that we show outperforms state-of-the-art neuroimaging analysis methods.
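To illustrate why robust regression can replace prior outlier removal, the following is a minimal sketch that assumes a Huber-type estimator fitted by iteratively reweighted least squares. The abstract does not say which robust estimator the authors use, so the function names, the tuning constant `delta=1.345`, and the simulated cohort are all illustrative assumptions, and the randomized parcellation step is not shown.

```python
import numpy as np


def huber_irls(x, y, delta=1.345, n_iter=100, tol=1e-8):
    """Fit y ~ intercept + slope * x with Huber weights via IRLS (illustrative)."""
    design = np.column_stack([np.ones(len(y)), x])
    # Start from the ordinary least squares solution.
    beta = np.linalg.lstsq(design, y, rcond=None)[0]
    for _ in range(n_iter):
        resid = y - design @ beta
        # Robust residual scale from the median absolute deviation.
        mad = np.median(np.abs(resid - np.median(resid)))
        scale = max(mad / 0.6745, 1e-12)
        u = np.abs(resid) / scale
        # Huber weights: 1 for small residuals, delta / |u| for large ones.
        weights = delta / np.maximum(u, delta)
        sw = np.sqrt(weights)
        new_beta = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta


def simulate_contaminated(n=300, n_outliers=30, seed=0):
    """Hypothetical cohort: a linear signal plus a block of gross outliers."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
    y[:n_outliers] += 20.0  # contaminate the first subjects
    return x, y
```

Calling `huber_irls(*simulate_contaminated())` returns the intercept and slope. On contaminated data of this kind, the down-weighting keeps the estimates closer to the generating values than an ordinary least squares fit does, without any subject being removed beforehand.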
Principal component gene set enrichment (PCGSE)
Motivation: Although principal component analysis (PCA) is widely used for
the dimensional reduction of biomedical data, interpretation of PCA results
remains daunting. Most existing methods attempt to explain each principal
component (PC) in terms of a small number of variables by generating
approximate PCs with few non-zero loadings. Although useful when just a few
variables dominate the population PCs, these methods are often inadequate for
characterizing the PCs of high-dimensional genomic data. For genomic data,
reproducible and biologically meaningful PC interpretation requires methods
based on the combined signal of functionally related sets of genes. While gene
set testing methods have been widely used in supervised settings to quantify
the association of groups of genes with clinical outcomes, these methods have
seen only limited application for testing the enrichment of gene sets relative
to sample PCs. Results: We describe a novel approach, principal component gene
set enrichment (PCGSE), for computing the statistical association between gene
sets and the PCs of genomic data. The PCGSE method performs a two-stage
competitive gene set test using the correlation between each gene and each PC
as the gene-level test statistic with flexible choice of both the gene set test
statistic and the method used to compute the null distribution of the gene set
statistic. Using simulated data with simulated gene sets and real gene
expression data with curated gene sets, we demonstrate that biologically
meaningful and computationally efficient results can be obtained from a simple
parametric version of the PCGSE method that performs a correlation-adjusted
two-sample t-test between the gene-level test statistics for gene set members
and genes not in the set. Availability:
http://cran.r-project.org/web/packages/PCGSE/index.html Contact:
[email protected] or [email protected]
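The two-stage test described in this abstract can be sketched as follows. This is a simplified illustration and not the PCGSE package's implementation: it uses the absolute gene-PC correlation as the gene-level statistic and a plain pooled-variance two-sample t-statistic, and it omits the correlation adjustment that the abstract mentions. The function names and the simulated data are hypothetical.

```python
import numpy as np


def gene_pc_correlations(X, pc_index=0):
    """Pearson correlation between each gene (column of X) and one PC's scores."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, pc_index] * S[pc_index]  # sample scores on the chosen PC
    cov = Xc.T @ scores / (X.shape[0] - 1)
    return cov / (Xc.std(axis=0, ddof=1) * scores.std(ddof=1))


def two_sample_t(stats, in_set):
    """Pooled-variance t-statistic: gene set members versus all other genes."""
    a, b = stats[in_set], stats[~in_set]
    n1, n2 = a.size, b.size
    pooled = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled * (1.0 / n1 + 1.0 / n2))


def simulate_example(n_samples=50, n_genes=100, set_size=10, seed=0):
    """Hypothetical data: the first `set_size` genes share one latent factor."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, n_genes))
    factor = rng.normal(size=n_samples)
    X[:, :set_size] += 3.0 * factor[:, None]
    in_set = np.zeros(n_genes, dtype=bool)
    in_set[:set_size] = True
    return X, in_set
```

A typical call is `two_sample_t(np.abs(gene_pc_correlations(X)), in_set)`. Taking absolute values sidesteps the arbitrary sign of a principal component. Because genes in a set are usually correlated with one another, the unadjusted statistic shown here tends to overstate significance, which is the problem the correlation-adjusted version in the abstract is meant to address.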
Predictive response-relevant clustering of expression data provides insights into disease processes
This article describes and illustrates a novel method of microarray data analysis that couples model-based clustering and binary classification to form clusters of 'response-relevant' genes; that is, genes that are informative when discriminating between the different values of the response. Predictions are subsequently made using an appropriate statistical summary of each gene cluster, which we call the 'meta-covariate' representation of the cluster, in a probit regression model. We first illustrate this method by analysing a leukaemia expression dataset, before focusing closely on the meta-covariate analysis of a renal gene expression dataset in a rat model of salt-sensitive hypertension. We explore the biological insights provided by our analysis of these data. In particular, we identify a highly influential cluster of 13 genes, including three transcription factors (Arntl, Bhlhe41 and Npas2), that is implicated as being protective against hypertension in response to increased dietary sodium. Functional and canonical pathway analysis of this cluster using Ingenuity Pathway Analysis implicated transcriptional activation and circadian rhythm signalling, respectively. Although we illustrate our method using only expression data, the method is applicable to any high-dimensional dataset.