Error, reproducibility and sensitivity : a pipeline for data processing of Agilent oligonucleotide expression arrays
Background
Expression microarrays are increasingly used to obtain large scale transcriptomic information on a wide range of biological samples. Nevertheless, there is still much debate on the best ways to process data, to design experiments and analyse the output. Furthermore, many of the more sophisticated mathematical approaches to data analysis in the literature remain inaccessible to much of the biological research community. In this study we examine ways of extracting and analysing a large data set obtained using the Agilent long oligonucleotide transcriptomics platform, applied to a set of human macrophage and dendritic cell samples.
Results
We describe and validate a series of data extraction, transformation and normalisation steps which are implemented via a new R function. Analysis of replicate normalised reference data demonstrates that intra-array variability is small (only around 2% of the mean log signal), while interarray variability from replicate array measurements has a standard deviation (SD) of around 0.5 log2 units (6% of mean). The common practice of working with ratios of Cy5/Cy3 signal offers little further improvement in terms of reducing error. Comparison to expression data obtained using Arabidopsis samples demonstrates that the large number of genes in each sample showing a low level of transcription reflects the real complexity of the cellular transcriptome. Multidimensional scaling is used to show that the processed data identify an underlying structure which reflects some of the key biological variables which define the data set. This structure is robust, allowing reliable comparison of samples collected over a number of years and by a variety of operators.
Conclusions
This study outlines a robust and easily implemented pipeline for extracting, transforming, normalising and visualising transcriptomic array data from the Agilent expression platform. The analysis is used to obtain quantitative estimates of the SD arising from experimental (non-biological) intra- and interarray variability, and of a lower threshold for determining whether an individual gene is expressed. The study provides a reliable basis for further, more extensive studies of the systems biology of eukaryotic cells.
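The interarray variability figures quoted in this abstract (an SD in log2 units, also expressed as a fraction of the mean log signal) can be illustrated with a minimal Python sketch. This is not the authors' R function: the function names, the input layout and the intensities below are assumptions made up for illustration, and no normalisation step is included.

```python
import math
import statistics


def log2_signals(raw):
    """Log2-transform raw intensities (assumed strictly positive)."""
    return [math.log2(x) for x in raw]


def interarray_sd(replicates):
    """Per-probe SD of the log2 signal across replicate arrays.

    `replicates` is a list of arrays; each array is a list of raw
    intensities with probes in the same order (an assumed layout).
    """
    logged = [log2_signals(arr) for arr in replicates]
    per_probe = zip(*logged)  # group the same probe across arrays
    return [statistics.stdev(values) for values in per_probe]


# Made-up intensities: three replicate arrays, four probes each.
arrays = [
    [256.0, 1024.0, 64.0, 4096.0],
    [512.0, 1024.0, 32.0, 4096.0],
    [256.0, 2048.0, 64.0, 8192.0],
]

sds = interarray_sd(arrays)

# Express each SD relative to the overall mean log2 signal.
mean_log = statistics.mean(v for arr in arrays for v in log2_signals(arr))
relative = [sd / mean_log for sd in sds]
```

Working on the log2 scale means a one-unit difference corresponds to a two-fold change in raw intensity, which is why the SD is reported in log2 units.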
Nonparametric estimation of genewise variance for microarray data
Estimation of genewise variance arises from two important applications in
microarray data analysis: selecting significantly differentially expressed
genes and validation tests for normalization of microarray data. We approach
the problem by introducing a two-way nonparametric model, which is an extension
of the famous Neyman--Scott model and is applicable beyond microarray data. The
problem itself poses interesting challenges because the number of nuisance
parameters is proportional to the sample size and it is not obvious how the
variance function can be estimated when measurements are correlated. In such a
high-dimensional nonparametric problem, we proposed two novel nonparametric
estimators for genewise variance function and semiparametric estimators for
measurement correlation, via solving a system of nonlinear equations. Their
asymptotic normality is established. The finite sample property is demonstrated
by simulation studies. The estimators also improve the power of the tests for
detecting statistically differentially expressed genes. The methodology is
illustrated by the data from the microarray quality control (MAQC) project.
Comment: Published at http://dx.doi.org/10.1214/10-AOS802 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
Nonparametric methods for the analysis of single-color pathogen microarrays
Background
The analysis of oligonucleotide microarray data in pathogen surveillance and discovery is a challenging task. Target template concentration, nucleic acid integrity, and host nucleic acid composition can each have a profound effect on signal distribution. Exploratory analysis of fluorescent signal distribution in clinical samples has revealed deviations from normality, suggesting that distribution-free approaches should be applied.
Results
Positive predictive value and false positive rates were examined to assess the utility of three well-established nonparametric methods for the analysis of viral array hybridization data: (1) Mann-Whitney U, (2) the Spearman correlation coefficient and (3) the chi-square test. Of the three tests, the chi-square proved most useful.
Conclusions
The acceptance of microarray use for routine clinical diagnostics will require that the technology be accompanied by simple yet reliable analytic methods. We report that our implementation of the chi-square test yielded a combination of low false positive rates and a high degree of predictive accuracy.
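To make the chi-square test mentioned in this abstract concrete, here is a minimal Python sketch of Pearson's chi-square statistic on a 2x2 contingency table. The abstract does not describe the authors' implementation, so the table layout (probes for a candidate virus versus all other probes, counted as above or below a signal threshold) and the counts are hypothetical.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 table of counts.

    `table` is ((a, b), (c, d)); all expected counts must be non-zero.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row_total in enumerate(row_totals):
        for j, col_total in enumerate(col_totals):
            expected = row_total * col_total / n
            observed = table[i][j]
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts (assumption, not from the paper):
# rows    = probes for the candidate virus, all other probes
# columns = signal above threshold, signal below threshold
counts = ((30, 10), (10, 30))
statistic = chi_square_2x2(counts)

# 3.841 is the 5% critical value of chi-square with 1 degree of freedom.
significant = statistic > 3.841
```

Because the statistic depends only on counts, it makes no assumption about the shape of the signal distribution, which matches the distribution-free motivation given in the abstract.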
Global gene expression profiling of healthy human brain and its application in studying neurological disorders
The human brain is the most complex structure known to mankind, and one of the greatest challenges in modern biology is to understand how it is built and organized. The power of the brain arises from its variety of cells and structures, and ultimately from where and when different genes are switched on and off throughout the brain tissue. In other words, brain function depends on the precise regulation of gene expression in its sub-anatomical structures. However, our understanding of the complexity and dynamics of the transcriptome of the human brain is still incomplete. To fill this need, we designed a gene expression model that accurately defines the consistent blueprint of the brain transcriptome, thereby identifying the core brain-specific transcriptional processes conserved across individuals. Functionally characterizing this model would provide profound insights into the transcriptional landscape, biological pathways and the expression distribution of neurotransmitter systems.
In this dissertation, we developed an expression model by capturing the similarly expressed gene patterns across congruently annotated brain structures in six individual brains, using data from the Allen Brain Atlas (ABA). We found that 84% of genes are expressed in at least one of the 190 brain structures. By employing hierarchical clustering, we were able to show that distinct structures of a larger brain region can cluster together while still retaining their expression identity. Further, weighted correlation network analysis identified 19 robust modules of coexpressed genes in the brain that demonstrated a wide range of functional associations. Since signatures of local phenomena can be masked by larger signatures, we performed local analysis on each distinct brain structure. Pathway and gene ontology enrichment analysis on these structures showed striking enrichment for brain-region-specific processes. We also mapped the structural distribution of the expression profiles of genes associated with major neurotransmission systems in the human brain. We also postulated that healthy brain tissue gene expression can be used to predict potential genes involved in a neurological disorder, in the absence of data from diseased tissues. To this end, we developed a supervised classification model, which achieved an accuracy of 84% and an AUC (Area Under the Curve) of 0.81 from ROC plots, for predicting autism-implicated genes using the healthy expression model as the baseline. This study represents the first use of healthy brain gene expression to predict genes implicated in autism, and this generic methodology can be applied to predict genes involved in other neurological disorders.
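The two evaluation figures reported above, accuracy and ROC AUC, can be computed as in the following Python sketch. It does not reproduce the dissertation's classifier: the labels, scores and the 0.5 decision threshold are made-up assumptions, and AUC is computed through its pairwise-ranking interpretation (the probability that a randomly chosen positive scores higher than a randomly chosen negative).

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of items whose thresholded score matches the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


# Made-up example: 1 = gene implicated in the disorder, 0 = not implicated.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

auc = roc_auc(labels, scores)
acc = accuracy(labels, scores)
```

Accuracy depends on the chosen threshold, whereas AUC summarises ranking quality across all thresholds, so the two numbers are complementary rather than interchangeable.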