
    BeadArray Expression Analysis Using Bioconductor

    Illumina whole-genome expression BeadArrays are a popular choice in gene profiling studies. Aside from the vendor-provided software tools for analyzing BeadArray expression data (GenomeStudio/BeadStudio), the Bioconductor project offers a comprehensive set of open-source analysis tools, many of which have been tailored to exploit the unique properties of this platform. In this article, we explore a number of these software packages and demonstrate how to perform a complete analysis of BeadArray data in various formats. We cover the key steps of importing data, quality assessment, preprocessing, and annotation in the common setting of assessing differential expression in designed experiments.
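
    The article itself works in R/Bioconductor (packages such as beadarray and limma); as a language-neutral illustration of the pipeline shape described above (import, preprocess, test, adjust), here is a minimal sketch in Python. The file name, the two-group design, and the plain per-probe t-test are assumptions of the sketch, not the article's method.

        # Minimal sketch of the analysis steps named in the abstract.
        # "expression.csv" and the group labels are hypothetical.
        import numpy as np
        import pandas as pd
        from scipy import stats

        # Import: a probes x samples matrix of summarized intensities.
        expr = pd.read_csv("expression.csv", index_col=0)

        # Preprocess: log-transform with a small offset to stabilize variance.
        logged = np.log2(expr + 1.0)

        # Differential expression: ordinary t-test per probe, two groups.
        groups = np.array(["control"] * 4 + ["treated"] * 4)  # hypothetical design
        a = logged.loc[:, groups == "control"].to_numpy()
        b = logged.loc[:, groups == "treated"].to_numpy()
        t, p = stats.ttest_ind(a, b, axis=1)

        # Benjamini-Hochberg adjustment across all probes.
        n = len(p)
        order = np.argsort(p)
        q = np.empty(n)
        q[order] = np.minimum.accumulate((p[order] * n / np.arange(1, n + 1))[::-1])[::-1]
        q = np.clip(q, 0, 1)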

    BeadDataPackR: A Tool to Facilitate the Sharing of Raw Data from Illumina BeadArray Studies

    Microarray technologies have become an increasingly important tool in cancer research over the last decade, and a number of initiatives have stressed the importance of providing and sharing raw microarray data. Illumina BeadArrays pose a particular problem in this regard, as their random construction simultaneously adds value to analysis of the raw data and obstructs the sharing of those data.

    Optimizing the noise versus bias trade-off for Illumina whole genome expression BeadChips

    Five strategies for pre-processing intensities from Illumina expression BeadChips are assessed from the point of view of precision and bias. The strategies include a popular variance stabilizing transformation and model-based background corrections that either use or ignore the control probes. Four calibration data sets are used to evaluate precision, bias and false discovery rate (FDR). The original algorithms are shown to have operating characteristics that are not easily comparable. Some tend to minimize noise while others minimize bias. Each original algorithm is shown to have an innate intensity offset, by which unlogged intensities are bounded away from zero, and the size of this offset determines its position on the noise–bias spectrum. By adding extra offsets, a continuum of related algorithms with different noise–bias trade-offs is generated, allowing direct comparison of the performance of the strategies on equivalent terms. Adding a positive offset is shown to decrease the FDR of each original algorithm. The potential of each strategy to generate an algorithm with an optimal noise–bias trade-off is explored by finding the offset that minimizes its FDR. The use of control probes as part of the background correction and normalization strategy is shown to achieve the lowest FDR for a given bias.
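
    To make the offset idea concrete, the sketch below simulates the effect described above: adding a constant k to unlogged intensities before taking logs reduces the variance of log-intensities near background (less noise) but compresses a true two-fold change toward zero (more bias). All numbers are simulated for illustration; they are not from the paper's calibration data sets.

        # Noise-bias trade-off of an intensity offset, on simulated data.
        import numpy as np

        rng = np.random.default_rng(0)
        true_signal = 50.0                          # unlogged expression level
        noise = rng.normal(0.0, 20.0, size=100_000) # additive background noise
        x = np.clip(true_signal + noise, 1.0, None)

        for k in (0, 16, 50, 100):
            sd = np.log2(x + k).std()               # noise on the log scale
            # Observed log2 fold change for a true 2-fold change in signal:
            fc = np.log2((2 * true_signal + k) / (true_signal + k))
            print(f"offset {k:3d}: log2 sd = {sd:.3f}, "
                  f"observed log2 FC = {fc:.2f} (true = 1.00)")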

    Generalization of the normal-exponential model: exploration of a more accurate parametrisation for the signal distribution on Illumina BeadArrays

    Motivation: Illumina BeadArray technology includes negative control features that allow a precise estimation of the background noise. As an alternative to the background subtraction proposed in BeadStudio, which leads to a substantial loss of information by generating negative values, a background correction method has been developed that models the observed intensities as the sum of an exponentially distributed signal and normally distributed noise. Nevertheless, Wang and Ye (2011) present a kernel-based estimator of the signal distribution on Illumina BeadArrays and suggest that a gamma distribution would model the signal density better. Hence, the normal-exponential model may not be appropriate for Illumina data, and background corrections derived from it may lead to inaccurate estimates. Results: We propose a more flexible model based on a gamma-distributed signal and normally distributed background noise, and we develop the associated background correction. Our model proves markedly more accurate for Illumina BeadArrays: on the one hand, it fits the observed intensities better; on the other hand, the operating characteristics of several background correction procedures are highly similar on spike-in data and on normal-gamma simulated data, reinforcing the validity of the normal-gamma model. The performance of the background corrections based on the normal-gamma and normal-exponential models is compared on two dilution data sets. Surprisingly, we observe that implementing a more accurate parametrisation in the model-based background correction does not increase sensitivity. These results may be explained by the operating characteristics of the estimators: the normal-gamma background correction offers an improvement in terms of bias, but at the cost of a loss in precision.
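
    For reference, a minimal sketch of the normal-exponential ("normexp") correction this abstract builds on: observed intensity X = S + B with signal S ~ Exp(mean alpha) and background B ~ N(mu, sigma^2); the corrected value is E[S | X = x], the mean of a normal distribution truncated to s > 0, which is always positive. The parameter values below are invented, and real implementations (for example limma's backgroundCorrect in R) estimate mu, sigma and alpha from the data, often via the negative control probes mentioned above. The paper's normal-gamma variant replaces the exponential signal density with a gamma density, whose posterior mean has no comparably simple closed form.

        # Normal-exponential background correction: posterior mean E[S | X = x].
        import numpy as np
        from scipy.stats import norm

        def normexp_correct(x, mu, sigma, alpha):
            """E[S | X = x] with S ~ Exp(mean alpha), B ~ N(mu, sigma^2)."""
            m = x - mu - sigma**2 / alpha   # location of the truncated normal
            z = m / sigma
            return m + sigma * norm.pdf(z) / norm.cdf(z)

        # A probe at or barely above background stays positive, unlike plain
        # subtraction (x - mu), which would go negative for x < mu.
        print(normexp_correct(np.array([90.0, 110.0, 500.0]),
                              mu=100.0, sigma=15.0, alpha=200.0))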

    Differential gene expression analysis in blood of first episode psychosis patients

    Background: Psychosis is a condition influenced by an interaction of environmental and genetic factors. Gene expression studies can capture these interactions; however, such studies are usually performed in patients who are in remission. This study uses blood of first episode psychosis patients in order to characterise dysregulated pathways associated with psychosis symptom dimensions. Methods: Peripheral blood from 149 healthy controls and 131 first episode psychosis patients was profiled using Illumina HT-12 microarrays. A case/control differential expression analysis was performed, followed by correlation of gene expression with Positive and Negative Syndrome Scale (PANSS) scores. Enrichment analyses were performed on the associated gene lists. We also tested for pathway differences between first episode psychosis patients who qualify for a schizophrenia diagnosis and those who do not. Results: A total of 978 genes were differentially expressed and were enriched for pathways associated with immune function and the mitochondria. Using PANSS scores, we found that positive symptom severity correlated with immune function, while negative symptoms correlated with mitochondrial pathways. Conclusions: Our results identify gene expression changes correlated with symptom severity and show that key pathways are modulated by positive and negative symptom dimensions.
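
    The correlation step described in the Methods can be sketched as follows: for each gene, compute the Pearson correlation between its expression across patients and a PANSS subscale score. The data shapes and names are hypothetical, and any covariate adjustment the study performed is omitted here.

        # Per-gene correlation of expression with a clinical score.
        import numpy as np
        from scipy import stats

        def cor_with_score(expr, score):
            """Pearson r of each row of expr (genes x patients) with score."""
            xc = expr - expr.mean(axis=1, keepdims=True)
            yc = score - score.mean()
            r = (xc @ yc) / (np.sqrt((xc**2).sum(axis=1)) * np.sqrt((yc**2).sum()))
            # Two-sided p-value via the t-distribution with n - 2 dof.
            n = expr.shape[1]
            t = r * np.sqrt((n - 2) / (1 - r**2))
            p = 2 * stats.t.sf(np.abs(t), df=n - 2)
            return r, p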

    Meta-Analysis of Mesenchymal Stem Cell Gene Expression Data from Obese and Non-Obese Patients

    The prevalence of gene expression microarray datasets in public repositories makes it possible to analyze biologically interesting datasets without running the laboratory work in house. Such experimentation is expensive in terms of finances, time, and expertise, which often results in low numbers of replicates. Meta-analysis techniques attempt to overcome the problems of few biological or technical replicates by combining separate experiments to increase statistical power. Proper statistical considerations help to offset issues such as simultaneous testing of thousands of genes, unintended hybridization, and other sources of noise. Microarrays record light intensities from tens of thousands of hybridized probes, giving a measure of gene expression for much of the human genome. This work focuses on identifying differentially expressed genes between obese and non-obese patients using microarray data from two studies of mesenchymal stem cell samples. Obesity is associated with poorer-quality stem cells that are less readily able to differentiate, and it is of interest to identify genes associated with this condition. Meta-analysis of low-replicate microarray experiments increases statistical power and gives a clearer picture of the gene expression differences between obese and non-obese individuals than the results of either individual study. Increased statistical power translates to an improved ability to discover genes or sets of genes associated with the observed decrease in differentiation efficacy. Furthermore, pathway analysis could be completed to identify pathways of interest from this differential expression analysis. --Abstract, p. ii
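
    The abstract does not say which combination method the thesis uses; one standard choice, shown below purely as an illustration, is Fisher's method, where under the null hypothesis -2 * sum(log p_i) follows a chi-squared distribution with 2k degrees of freedom for k studies.

        # Fisher's method for combining per-gene p-values from two studies.
        import numpy as np
        from scipy import stats

        def fisher_combine(p1, p2):
            """Combine two arrays of per-gene p-values into one."""
            x = -2.0 * (np.log(p1) + np.log(p2))
            return stats.chi2.sf(x, df=4)   # k = 2 studies -> 2k = 4 dof

        # Two modest signals combine into stronger evidence (~0.009):
        print(fisher_combine(np.array([0.04]), np.array([0.03])))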

    Small data: practical modeling issues in human-model -omic data

    This thesis is based on the following articles:

    Chapter 2: Holsbø, E., Perduca, V., Bongo, L.A., Lund, E. & Birmelé, E. (Manuscript). Stratified time-course gene preselection shows a pre-diagnostic transcriptomic signal for metastasis in blood cells: a proof of concept from the NOWAC study. Available at https://doi.org/10.1101/141325.
    Chapter 3: Bøvelstad, H.M., Holsbø, E., Bongo, L.A. & Lund, E. (Manuscript). A Standard Operating Procedure For Outlier Removal In Large-Sample Epidemiological Transcriptomics Datasets. Available at https://doi.org/10.1101/144519.
    Chapter 4: Holsbø, E. & Perduca, V. (2018). Shrinkage estimation of rate statistics. Case Studies in Business, Industry and Government Statistics 7(1), 14-25. Also available at http://hdl.handle.net/10037/14678.

    Human-model data are very valuable and important in biomedical research. Ethical and economic constraints limit access to such data, and consequently these datasets rarely comprise more than a few hundred observations. As measurements are comparatively cheap, the tendency is to measure as many things as possible for the few, valuable participants in a study. With -omics technologies it is cheap and simple to make hundreds of thousands of measurements simultaneously. This "few observations, many measurements" setting is known technically as a high-dimensional problem. Most gene expression experiments measure the expression levels of 10 000–15 000 genes for fewer than 100 subjects. I refer to this as the small data setting.

    This dissertation is an exercise in practical data analysis as it happens in a large epidemiological cohort study. It comprises three main projects: (i) predictive modeling of breast cancer metastasis from whole-blood transcriptomics measurements; (ii) standardizing microarray data quality assessment in the Norwegian Women and Cancer (NOWAC) postgenome cohort; and (iii) shrinkage estimation of rates. These are all small data analyses, for various reasons.

    Predictive modeling in the small data setting is very challenging. Several modern methods have been built to tackle high-dimensional data, but these methods need to be evaluated against one another when analyzing data in practice. Through the metastasis prediction work we learned first-hand that common practices in machine learning can be inefficient or even harmful, especially for small data. I outline some of the more important issues.

    In a large project such as NOWAC there is a need to centralize and disseminate knowledge and procedures. The standardization of NOWAC quality assessment was a project born of necessity. The standard operating procedure for outlier removal was developed so that preprocessing of the NOWAC microarray material happens the same way every time. We took this procedure from an archaic R script that resided in people's email inboxes to a well-documented, open-source R package, and we present the NOWAC guidelines for microarray quality control. The procedure is built around the inherent high value of a single observation.

    Small data are plagued by high variance. When working with small data it is usually profitable to bias models by shrinkage or by borrowing information from elsewhere. We present a pseudo-Bayesian estimator of rates in an informal crime rate study, exhibit the value of such procedures in a small data setting, and demonstrate some novel considerations about the coverage properties of such a procedure (see the sketch following this abstract).

    In short, I gather some common practices in predictive modeling as applied to small data and assess their practical implications. I argue that, with more focus on human-based datasets in biomedicine, these data need particular consideration in a small data paradigm to allow for reliable analysis. I present what I believe to be sensible guidelines.
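
    As a concrete illustration of the shrinkage idea from project (iii), the sketch below pulls raw rates toward the pooled rate via a Gamma-Poisson (pseudo-Bayesian) posterior mean: units with little exposure are shrunk hard, while large units barely move. The prior choice and its strength are assumptions of this sketch, not necessarily the estimator of Holsbø & Perduca (2018).

        # Gamma-Poisson shrinkage of rates toward the pooled rate.
        import numpy as np

        def shrink_rates(events, exposure, prior_strength=100.0):
            """Posterior-mean rates under a Gamma prior centred on the pooled rate.

            prior_strength is the prior's weight in pseudo-exposure units.
            """
            pooled = events.sum() / exposure.sum()
            a = pooled * prior_strength   # Gamma shape
            b = prior_strength            # Gamma rate
            return (events + a) / (exposure + b)

        # The small unit's raw rate (0.0) is pulled toward the pooled rate;
        # the large unit's rate (0.02) is essentially unchanged.
        print(shrink_rates(np.array([0.0, 2000.0]), np.array([100.0, 100000.0])))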

    Transcriptional Analysis of Reciprocal Tumor-Microenvironment Interactions in Glioblastoma

    In the last twenty years, both computational biology and cancer biology have made great strides, and in the last five years the merger of the two has helped revolutionize our knowledge of personalized targeted therapy and of the diversity of cancer. In cancer, cell-to-cell interactions between tumor cells and their microenvironment are critical determinants of tumor tissue biology and therapeutic responses. Interactions between glioblastoma (GBM) cells and endothelial cells (ECs) establish a purported stem cell niche. We hypothesized that the genes mediating these interactions would be important, particularly as therapeutic targets. Using a novel computational approach to deconvoluting expression data from mixed physical cocultures of GBM cells and ECs, we identified a previously undescribed upregulation of the cAMP-specific phosphodiesterase PDE7B in GBM cells in response to ECs. We further found that elevated PDE7B expression occurs in most GBM cases and has a negative effect on survival. PDE7B overexpression resulted in the expansion of a stem-like cell subpopulation, increased tumor aggressiveness, and increased growth in an intracranial GBM model. This deconvolution algorithm provides a new tool for cancer biology, particularly for studying cell-to-cell interactions, and these results identify PDE7B as a therapeutic target in GBM.
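
    The paper's deconvolution algorithm is not reproduced here; the sketch below is a generic linear-unmixing illustration of the underlying idea: model each coculture expression profile as a non-negative mixture of pure GBM and pure EC reference profiles, then flag genes (such as PDE7B) whose coculture expression deviates from the fitted mixture as candidates for interaction-induced regulation. Function and variable names are hypothetical.

        # Linear unmixing of a coculture expression profile.
        import numpy as np
        from scipy.optimize import nnls

        def unmix(coculture, gbm_pure, ec_pure):
            """Fit coculture ~ w1*gbm_pure + w2*ec_pure with w >= 0."""
            A = np.column_stack([gbm_pure, ec_pure])   # genes x 2 references
            w, _ = nnls(A, coculture)                  # non-negative weights
            residual = coculture - A @ w               # genes off the mixture fit
            return w / w.sum(), residual               # proportions, deviations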